<a href="https://colab.research.google.com/github/yotam-biu/ps9/blob/main/parkinsons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```python
%load_ext autoreload
%autoreload 2

# Download the data from your GitHub repository
# (the /content path assumes Google Colab, whose working directory is /content)
!wget https://raw.githubusercontent.com/yotam-biu/ps9/main/parkinsons.csv -O /content/parkinsons.csv
```

```text
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/content/parkinsons.csv: No such file or directory
```




## 1. **Load the dataset:**  

   After running the first cell of this notebook, the file `parkinsons.csv` will appear in the `Files` folder.
   Load the file as a DataFrame.  
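A minimal sketch of the loading step with a couple of cheap sanity checks (row count, missing values). The helper name `load_parkinsons` is hypothetical, and an in-memory string with made-up values stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_parkinsons(source) -> pd.DataFrame:
    """Read the CSV (a path or file-like object) and run a few cheap sanity checks."""
    data = pd.read_csv(source)
    if data.empty:
        # A header-only file gives an empty frame; usually a sign the download failed
        raise ValueError("the CSV contains no rows")
    missing = data.isna().sum()
    if missing.any():
        print("Columns with missing values:")
        print(missing[missing > 0])
    print(f"Loaded {data.shape[0]} rows x {data.shape[1]} columns")
    return data


# Tiny in-memory stand-in for parkinsons.csv; the values are made up for illustration
sample_csv = io.StringIO(
    "name,HNR,RPDE,status\n"
    "sample_1,21.0,0.41,1\n"
    "sample_2,26.8,0.37,0\n"
)
demo = load_parkinsons(sample_csv)
```

On the real file this would be `data = load_parkinsons('parkinsons.csv')`.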




```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler

# Load your dataset
data = pd.read_csv('parkinsons.csv')
```

## 2. **Select features:**  

   - Choose **two features** as inputs for the model.  
   - Identify **one feature** to use as the output for the model.  

  #### Advice:  
  - You can refer to the paper available in the GitHub repository for insights into the dataset and guidance on identifying key features for the input and output.  
  - Alternatively, consider creating pair plots or using other EDA methods we learned in the last lecture to explore the relationships between features and determine which ones are most relevant.  
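One quick EDA check, complementary to pair plots, is to rank the numeric columns by their absolute Pearson correlation with the column you plan to predict. A minimal sketch on synthetic data; `rank_features_by_correlation` is a hypothetical helper, and correlation only captures linear, one-feature-at-a-time relationships, so a low score does not by itself rule a feature out.

```python
import numpy as np
import pandas as pd


def rank_features_by_correlation(data: pd.DataFrame, target: str) -> pd.Series:
    """Return |Pearson r| between every numeric column and `target`, strongest first."""
    numeric = data.select_dtypes(include="number")  # drops text columns such as `name`
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Synthetic stand-in data: "f_strong" is built to track the target, "f_noise" is not
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=200)
demo = pd.DataFrame({
    "f_strong": target + rng.normal(0, 0.3, size=200),
    "f_noise": rng.normal(0, 1, size=200),
    "status": target,
})
ranking = rank_features_by_correlation(demo, "status")
print(ranking)
```

On the real data, assuming `status` is the column to predict, the call would be `rank_features_by_correlation(data, 'status')`.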


```python
# Selecting relevant features based on the paper
# Using the provided column names
features = ['HNR', 'RPDE']
output = 'PPE'

X = data[features]
y = data[output]
```

## 3. **Scale the data:**

   Apply the `MinMaxScaler` to scale the two input columns to a range between 0 and 1.  
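`MinMaxScaler` maps each column independently with `x' = (x - min) / (max - min)`, where the minimum and maximum come from the data passed to `fit`. A small worked example on made-up arrays; fitting on training rows only and reusing those statistics for new rows is the usual way to keep validation data out of preprocessing, and new values outside the training range land outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new_demo = np.array([[2.5, 40.0]])

scaler_demo = MinMaxScaler()  # default feature_range=(0, 1)
X_train_scaled_demo = scaler_demo.fit_transform(X_train_demo)  # learns per-column min/max
X_new_scaled_demo = scaler_demo.transform(X_new_demo)           # reuses the training min/max

# Training rows scale to 0, 0.5 and 1 in both columns
print(X_train_scaled_demo)
# (2.5 - 1) / (3 - 1) = 0.75 and (40 - 10) / (30 - 10) = 1.5, which is above 1
print(X_new_scaled_demo)
```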


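```python
# Scale the two input features to [0, 1], as the task asks
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Scale the target with its own scaler (range [-1, 1]) so `scaler` stays fitted to the input features
y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = y_scaler.fit_transform(y.values.reshape(-1, 1))
```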

## 4. **Split the data:**

   Divide the dataset into a training set and a validation set.
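`train_test_split` shuffles the rows and holds out a `test_size` fraction of them. When the target is a class label with unequal class sizes, passing `stratify=y` keeps the class proportions the same in both parts. A minimal sketch with synthetic labels (150 ones, 50 zeros):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary labels: 75% ones, 25% zeros
y_demo = np.array([1] * 150 + [0] * 50)
X_demo = np.arange(len(y_demo)).reshape(-1, 1).astype(float)

X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(len(X_tr), len(X_val))      # 160 40
print(y_tr.mean(), y_val.mean())  # both 0.75: the class ratio is preserved
```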





```python
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2, random_state=42)
```

## 5. **Choose a model:**  

   Select a model to train on the data.  

   #### Advice:  
   - Consider using the model discussed in the paper from the GitHub repository as a reference.  
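The notebook uses `SVR`, which predicts a continuous value. If the goal is to separate recordings of people with Parkinson's from healthy controls (the dataset's `status` column: 1 = Parkinson's, 0 = healthy), a support vector *classifier* (`SVC`) matches that framing; the paper in the repository remains the reference for its exact setup. A minimal sketch, on synthetic two-feature data, of an RBF-kernel `SVC` with the scaler folded into a pipeline; `build_classifier` is a hypothetical helper name.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_classifier(C: float = 1.0, gamma="scale"):
    """Min-max scaling followed by an RBF-kernel SVM classifier, as one estimator."""
    return make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=C, gamma=gamma))


# Synthetic two-feature data: class 1 centred at (1, 1), class 0 at (-1, -1)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(1, 0.5, (100, 2)), rng.normal(-1, 0.5, (100, 2))])
y_demo = np.array([1] * 100 + [0] * 100)

clf = build_classifier().fit(X_demo, y_demo)
print(clf.predict([[1.0, 1.0], [-1.0, -1.0]]))  # [1 0]
```

On the real data this would be something like `build_classifier().fit(X_train, y_train)` with unscaled feature columns and `y = data['status']`.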


```python
# Building the SVM regression model with a radial basis function (RBF) kernel
svm_model = SVR(kernel='rbf', C=1.0, gamma='scale')

# Training the model
svm_model.fit(X_train, y_train.ravel())

# Making predictions
y_pred = svm_model.predict(X_test)
```

## 6. **Test the accuracy:**  

   Evaluate the model's accuracy on the test set. Ensure that the accuracy is at least **0.8**.  
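The evaluation cell in this step reports R² (0.4569 in the recorded run), which is a regression score rather than accuracy. Accuracy, the fraction of held-out samples whose predicted class is correct, applies to a classification target. Assuming the 0.8 threshold refers to that, a minimal sketch of the check with `sklearn.metrics.accuracy_score`, on synthetic data; `evaluate_accuracy` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def evaluate_accuracy(model, X_val, y_val, threshold: float = 0.8) -> float:
    """Return validation accuracy and report whether it reaches `threshold`."""
    acc = accuracy_score(y_val, model.predict(X_val))
    verdict = "meets" if acc >= threshold else "is below"
    print(f"Accuracy: {acc:.4f} ({verdict} the {threshold} target)")
    return acc


# Synthetic demo data; in the notebook, pass the fitted classifier and the held-out split instead
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(1, 0.5, (100, 2)), rng.normal(-1, 0.5, (100, 2))])
y_demo = np.array([1] * 100 + [0] * 100)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

model = make_pipeline(MinMaxScaler(), SVC()).fit(X_tr, y_tr)
acc = evaluate_accuracy(model, X_val, y_val)
```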


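```python
# Evaluating the model (MSE is in the units of the scaled target;
# R² is a regression score, not classification accuracy)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```

```text
Mean Squared Error: 0.0751
R-squared: 0.4569
```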


## 7. **Save and upload the model:**  

   After you are happy with your results, save the model with the `.joblib` extension and upload it to your GitHub repository main folder.
   
   Additionally, update the `config.yaml` file with the list of selected features and the model's joblib file name.  


example:  
```yaml
selected_features: ["A", "B"]  
path: "my_model.joblib"  
```
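The notebook saves only `svm_model`, but that model was trained on min-max-scaled inputs, so anyone loading `svm_model.joblib` has to reproduce the scaling exactly. One way to avoid this is to wrap the scaler and the estimator in a scikit-learn pipeline and save that single object. A minimal sketch: the helper names `save_model_and_config` and `load_from_config` are hypothetical, the demo model is a synthetic classifier written to a temporary directory, and whether the course's checker feeds raw or pre-scaled features is an assumption worth verifying.

```python
import os
import tempfile

import joblib
import numpy as np
import yaml
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def save_model_and_config(model, features, model_path, config_path="config.yaml"):
    """Dump the model with joblib and write the config.yaml layout shown above."""
    joblib.dump(model, model_path)
    config = {"selected_features": list(features), "path": os.path.basename(model_path)}
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f)
    return config


def load_from_config(config_path="config.yaml"):
    """Read config.yaml and load the model file it points to (relative to the config)."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    base = os.path.dirname(os.path.abspath(config_path))
    model = joblib.load(os.path.join(base, config["path"]))
    return model, config["selected_features"]


# Synthetic model: the scaler travels inside the saved pipeline
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)
pipeline = make_pipeline(MinMaxScaler(), SVC()).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config.yaml")
    save_model_and_config(pipeline, ["HNR", "RPDE"], os.path.join(tmp, "svm_model.joblib"), cfg_path)
    reloaded, selected = load_from_config(cfg_path)
    same_predictions = np.array_equal(reloaded.predict(X_demo), pipeline.predict(X_demo))

print(selected, same_predictions)
```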

```python
import yaml
import joblib

# Save the trained model to a file
# Note: only the fitted SVR is saved, not the fitted scaler, so new inputs
# must be min-max scaled the same way before calling predict()
model_filename = 'svm_model.joblib'
joblib.dump(svm_model, model_filename)
print(f"Model saved as {model_filename}")

# Create and save the config.yaml file
config = {
    'selected_features': features,
    'path': model_filename
}

with open('config.yaml', 'w') as config_file:
    yaml.dump(config, config_file)
print("Configuration file 'config.yaml' created.")
```

```text
Model saved as svm_model.joblib
Configuration file 'config.yaml' created.
```


## 8. **Copy the code:**  

   Copy and paste all the code from this notebook into a `main.py` file in the GitHub repository.  
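IPython syntax such as `%load_ext`, `%autoreload` and `!wget` only works inside a notebook; a plain `python main.py` run stops with a `SyntaxError` at the first such line. A minimal sketch of `main.py`, assuming `parkinsons.csv` has already been downloaded next to the script. It mirrors the notebook's feature choice, SVR model and file names; the function name `train_and_save` is hypothetical, and the `__main__` block only trains when the CSV exists, otherwise it prints the download command.

```python
"""main.py: a sketch of the notebook as a plain Python script (IPython magics removed)."""
import os

import joblib
import pandas as pd
import yaml
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

DATA_URL = "https://raw.githubusercontent.com/yotam-biu/ps9/main/parkinsons.csv"
CSV_PATH = "parkinsons.csv"
FEATURES = ["HNR", "RPDE"]
OUTPUT = "PPE"
MODEL_PATH = "svm_model.joblib"


def train_and_save(data, features=FEATURES, output=OUTPUT,
                   model_path=MODEL_PATH, config_path="config.yaml"):
    """Scale, split, fit the SVR, print metrics, then save the model and config.yaml."""
    X = MinMaxScaler().fit_transform(data[features])
    y = MinMaxScaler(feature_range=(-1, 1)).fit_transform(data[[output]]).ravel()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = SVR(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred):.4f}")
    print(f"R-squared: {r2:.4f}")

    joblib.dump(model, model_path)
    with open(config_path, "w") as f:
        yaml.dump({"selected_features": list(features),
                   "path": os.path.basename(model_path)}, f)
    return model, r2


if __name__ == "__main__":
    if os.path.exists(CSV_PATH):
        train_and_save(pd.read_csv(CSV_PATH))
    else:
        # `!wget` is notebook-only; fetch the file from a shell before running this script
        print(f"{CSV_PATH} not found. Download it first, e.g.: wget {DATA_URL} -O {CSV_PATH}")
```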
