
The objective of this assignment is to become familiar with the Support Vector Machine (SVM) concept and to gain practical experience training both regression and classification SVM models.

# Problem 01

In this problem you will work with the "Life Expectancy Data.csv" dataset. <br>
Import the dataset and
1. Drop all the missing values from the dataset at the beginning of the project.
2. Drop the "Country" column from the dataset.
3. Split the dataset into train and test.
4. Prepare the dataset using a pipeline.

In [None]:
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

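# Load the dataset (the CSV is expected in the working directory).
df = pd.read_csv("Life Expectancy Data.csv")

# Drop rows with missing values, then drop the "Country" column.
df = df.dropna().drop(columns="Country")

# Separate the features from the target.
x = df.drop(columns="Life expectancy")
y = df["Life expectancy"]

# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Impute and scale the numeric columns. The imputer changes nothing here
# because rows with missing values were already dropped.
numeric_features = x.select_dtypes(include=['float64', 'int64']).columns
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Columns not listed below are dropped (remainder='drop' is the default).
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features)
    ])

# Fit the preprocessing on the training set only, then reuse it on the test set.
x_train_prepared = preprocessor.fit_transform(x_train)
x_test_prepared = preprocessor.transform(x_test)

(x_train_prepared.shape, x_test_prepared.shape, y_train.shape, y_test.shape)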

((1675, 17), (419, 17), (1675,), (419,))
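
The preprocessing step can also be chained with the estimator, so that imputation and scaling are always fit on whatever data is passed to `fit`. The following is a minimal sketch on a hypothetical toy frame: the column names, values, and the names `toy`, `toy_target`, `prep`, and `full_model` are made up for illustration and are not taken from the dataset. It also shows that a text column left out of the `ColumnTransformer` is dropped.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({
    "Status": ["Developing", "Developed", "Developing", "Developed"],
    "GDP": [500.0, 40000.0, 1200.0, 35000.0],
    "Schooling": [8.0, 16.0, 10.0, 15.0],
})
toy_target = pd.Series([60.0, 81.0, 66.0, 80.0])

# Select only the numeric columns; "Status" is not listed, so it is dropped.
numeric_cols = toy.select_dtypes(include="number").columns
prep = ColumnTransformer([
    ("num", Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), numeric_cols),
])

# Chain preprocessing and the model into a single estimator.
full_model = Pipeline([("prep", prep), ("svr", SVR(kernel="linear"))])
full_model.fit(toy, toy_target)

# Inspect what the fitted preprocessing step produces.
prepared = full_model.named_steps["prep"].transform(toy)
print(prepared.shape)  # (4, 2): two numeric columns survive
```

With a single pipeline object, `full_model.predict(new_frame)` applies the same fitted scaling before predicting, which avoids keeping separate `*_prepared` arrays in sync.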

Train a linear SVM model to predict life expectancy. <br>
Test the model on both train and test datasets and print out the RMSEs.

In [None]:
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

svm_model = SVR(kernel='linear')
svm_model.fit(x_train_prepared, y_train)


y_train_pred = svm_model.predict(x_train_prepared)
y_test_pred = svm_model.predict(x_test_prepared)


rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))

rmse_train, rmse_test


(4.029706753853388, 4.090193442396489)
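
For reference, RMSE is the square root of the mean squared difference between the targets and the predictions. The small sketch below uses made-up values (`y_true_demo` and `y_pred_demo` are illustrative, not from the dataset) to show that the manual formula agrees with the `mean_squared_error` route used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions for illustration only.
y_true_demo = np.array([70.0, 65.0, 80.0])
y_pred_demo = np.array([72.0, 64.0, 77.0])

# Errors are -2, 1, 3; squared errors are 4, 1, 9; their mean is 14/3.
rmse_manual = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))

# Same quantity via scikit-learn's MSE followed by a square root.
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print(rmse_manual, rmse_sklearn)  # both about 2.16
```

Because RMSE is in the same units as the target, a value of about 4 here means predictions are off by roughly four years of life expectancy on average.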

Train a second-degree polynomial SVM using the kernel trick. <br>
Test the model on both train and test datasets and print out the RMSEs. <br>
Important points:
1. The RMSEs of the polynomial model should be smaller than the RMSEs of the linear model that you trained earlier. If they are not, find a way to solve the problem.
2. Eliminate the overfitting problem, **IF** you are overfitting.

In [None]:

svm_poly_model_optimized = SVR(kernel='poly', degree=2, C=0.01, epsilon=0.1)
svm_poly_model_optimized.fit(x_train_prepared, y_train)

y_train_pred_poly_opt = svm_poly_model_optimized.predict(x_train_prepared)
y_test_pred_poly_opt = svm_poly_model_optimized.predict(x_test_prepared)


rmse_train_poly_opt = np.sqrt(mean_squared_error(y_train, y_train_pred_poly_opt))
rmse_test_poly_opt = np.sqrt(mean_squared_error(y_test, y_test_pred_poly_opt))

rmse_train_poly_opt, rmse_test_poly_opt


(9.819574924481083, 9.990382946594236)
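
The RMSEs above (about 9.8 and 10.0) are larger than those of the linear model (about 4.0 and 4.1), so the first requirement is not yet met. Train and test errors are close to each other, which points to underfitting rather than overfitting; a very small `C` such as 0.01 regularizes heavily. One possible way forward, sketched here on synthetic data because the CSV is not bundled with this text, is a cross-validated search over `C`, `coef0`, and `epsilon`. The grid values, the synthetic target, and the names `X_demo`, `y_demo`, and `search` are assumptions chosen for illustration, not tuned settings for the life expectancy data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with a linear and a quadratic component.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (70 + 3 * X_demo[:, 0] + 2 * X_demo[:, 1] ** 2
          + rng.normal(scale=0.5, size=300))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Scale inside the pipeline so each CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), SVR(kernel="poly", degree=2))

# coef0 > 0 lets the degree-2 kernel also represent linear terms.
param_grid = {
    "svr__C": [0.1, 1, 10, 100],
    "svr__coef0": [0, 1],
    "svr__epsilon": [0.1, 0.5],
}
search = GridSearchCV(pipe, param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X_tr, y_tr)

# Evaluate the best configuration on both splits.
best = search.best_estimator_
rmse_tr = np.sqrt(mean_squared_error(y_tr, best.predict(X_tr)))
rmse_te = np.sqrt(mean_squared_error(y_te, best.predict(X_te)))
print(search.best_params_, rmse_tr, rmse_te)
```

Applied to the notebook's data, the same search would take `x_train` and `y_train` in place of the synthetic arrays. If the selected model then shows a much lower train RMSE than test RMSE, reducing `C` or increasing `epsilon` are the usual levers against overfitting.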

# Problem 02

In this problem you will work with the Iris dataset. <br>
Import the dataset and
1. **DO NOT** split the dataset into train and test; use the whole dataset as the training set.
2. Scale the dataset.
3. Use all four inputs as the input of the project and all the classes as the output of the project (in class, I used two inputs and one class).

In [None]:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler


iris = load_iris()
x = iris.data
y = iris.target

x_scaler = StandardScaler()
x_scaled = x_scaler.fit_transform(x)


x_scaled[:5], y[:5]


(array([[-0.90068117,  1.01900435, -1.34022653, -1.3154443 ],
        [-1.14301691, -0.13197948, -1.34022653, -1.3154443 ],
        [-1.38535265,  0.32841405, -1.39706395, -1.3154443 ],
        [-1.50652052,  0.09821729, -1.2833891 , -1.3154443 ],
        [-1.02184904,  1.24920112, -1.34022653, -1.3154443 ]]),
 array([0, 0, 0, 0, 0]))
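
As a quick sanity check on the scaling step, each standardized column should have a mean of approximately 0 and a standard deviation of approximately 1. The sketch below reloads Iris so that it runs on its own; the names `features`, `scaled`, `col_means`, and `col_stds` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Reload the four Iris features and standardize them.
features = load_iris().data
scaled = StandardScaler().fit_transform(features)

# Column-wise statistics after scaling.
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
```

`StandardScaler` divides by the population standard deviation, which is also what NumPy's `std` computes by default, so the check compares like with like.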

Train a linear SVM model and print out:
1. Confusion matrix
2. Precision score
3. Recall score
<br>

**Important:** you need to train and test the model on the same dataset, because the dataset was not split and there is no separate test set.

In [None]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, precision_score, recall_score

svm_model = SVC(kernel='linear')
svm_model.fit(x_scaled, y)


y_pred = svm_model.predict(x_scaled)

conf_matrix = confusion_matrix(y, y_pred)
precision = precision_score(y, y_pred, average='weighted')
recall = recall_score(y, y_pred, average='weighted')

conf_matrix, precision, recall


(array([[50,  0,  0],
        [ 0, 46,  4],
        [ 0,  1, 49]]),
 0.9677505687140372,
 0.9666666666666667)
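
The weighted scores can be recovered from the confusion matrix itself, which helps when reading the output. In the sketch below the matrix is copied from the result above; rows are true classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the linear SVM output above.
cm = np.array([[50, 0, 0],
               [0, 46, 4],
               [0, 1, 49]])

true_pos = np.diag(cm)

# Precision: correct predictions divided by all predictions of that class.
precision_per_class = true_pos / cm.sum(axis=0)

# Recall: correct predictions divided by all true members of that class.
recall_per_class = true_pos / cm.sum(axis=1)

# "weighted" averaging weights each class by its number of true samples.
support = cm.sum(axis=1)
weighted_precision = np.average(precision_per_class, weights=support)
weighted_recall = np.average(recall_per_class, weights=support)

print(precision_per_class, recall_per_class)
print(weighted_precision, weighted_recall)
```

The per-class values show where the errors sit: four samples of the second class are predicted as the third, and one sample of the third class is predicted as the second, while the first class is separated perfectly.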

Train a third-degree polynomial SVM model and print out:
1. Confusion matrix
2. Precision score
3. Recall score
<br>

**Important:** you need to train and test the model on the same dataset, because the dataset was not split and there is no separate test set.

In [None]:

svm_poly_model = SVC(kernel='poly', degree=3)
svm_poly_model.fit(x_scaled, y)


y_pred_poly = svm_poly_model.predict(x_scaled)

conf_matrix_poly = confusion_matrix(y, y_pred_poly)
precision_poly = precision_score(y, y_pred_poly, average='weighted')
recall_poly = recall_score(y, y_pred_poly, average='weighted')

conf_matrix_poly, precision_poly, recall_poly


(array([[50,  0,  0],
        [ 0, 50,  0],
        [ 0,  7, 43]]),
 0.9590643274853801,
 0.9533333333333334)