# Machine Learning Regression example

This example uses the Scikit-learn library to build a machine learning model for a regression problem. We will fit linear and non-linear models to synthetic data.

##### Problem definition:
Develop a predictive non-linear regression model that predicts the target variable (y) from the independent variable (x).

### 1. Import Python packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error 
from sklearn.metrics import mean_absolute_error

### 2. Generate data

In [None]:
# X = np.sort(5 * np.random.rand(40, 1), axis=0)
# y = np.sin(X).ravel()
# y[::5] += 3 * (0.5 - np.random.rand(8))

In [None]:
X = np.sort(5 * np.random.rand(140, 1), axis=0)
y = np.sin(X).ravel()

In [None]:
# add small uniform noise to every target
y += 0.3 * np.random.rand(len(y))
# perturb every 5th target more strongly to create outliers
y[::5] += 3 * (0.2 - np.random.rand(len(y[::5])))

In [None]:
X.shape

In [None]:
y.shape
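
The cells above draw from NumPy's global random state, so every run produces a different data set. The following is a minimal sketch of a seeded variant of the same recipe. The helper name `make_data` and the seed value `0` are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def make_data(n_samples=140, seed=0):
    """Hypothetical helper: noisy sine data following the recipe above, but seeded."""
    rng = np.random.default_rng(seed)  # the seed value is an arbitrary assumption
    X = np.sort(5 * rng.random((n_samples, 1)), axis=0)  # x values in [0, 5), sorted
    y = np.sin(X).ravel()
    y += 0.3 * rng.random(len(y))                  # small uniform noise on every target
    y[::5] += 3 * (0.2 - rng.random(len(y[::5])))  # stronger perturbation on every 5th target
    return X, y

X_demo, y_demo = make_data()
print(X_demo.shape, y_demo.shape)
```

Calling `make_data()` twice with the same seed returns identical arrays, which makes the error metrics later in the notebook comparable between runs.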

### 3. Visualize the data

In [None]:
dataset = pd.DataFrame({'x': X[:,0], 'y': y}, columns=['x', 'y'])

In [None]:
sns.scatterplot(data=dataset, x="x", y="y")

### 4. Split the data into train and test sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

print("Training set ", X_train.shape)
print("Testing set ", X_test.shape)

### 5. Fit the model 

In [None]:
# create an instance of the model
# TODO: try regressor = SVR(kernel="linear") later
regressor = DecisionTreeRegressor()

In [None]:
# fit the model over the training data
regressor.fit(X_train, y_train)
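
The notebook imports `LinearRegression`, `DecisionTreeRegressor` and `SVR`, but fits only the decision tree. The sketch below shows one possible way to fit all three on the same split and collect their test errors. It regenerates a seeded data set so that it runs on its own. The variable names (`X_cmp`, `candidates`, `scores`), the seed, and the choice of an RBF kernel for `SVR` are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Seeded noisy sine data, similar in spirit to the notebook's data (assumption: seed 0)
rng = np.random.default_rng(0)
X_cmp = np.sort(5 * rng.random((140, 1)), axis=0)
y_cmp = np.sin(X_cmp).ravel() + 0.3 * rng.random(140)

X_tr, X_te, y_tr, y_te = train_test_split(X_cmp, y_cmp, test_size=0.1, random_state=42)

# One linear and two non-linear candidates, all with default hyperparameters
candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "svr_rbf": SVR(kernel="rbf"),
}

# Fit each model on the training split and record its test MSE
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_squared_error(y_te, model.predict(X_te))

for name, mse in scores.items():
    print(f"{name}: test MSE = {mse:.3f}")
```

With a test set of only 14 points the ranking can change from split to split, so the printed numbers are best read as a rough indication.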

### 6. Evaluate the model

In [None]:
y_pred = regressor.predict(X_test)

In [None]:
mean_squared_error(y_test, y_pred)

In [None]:
mean_absolute_error(y_test, y_pred)
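
To make the two metrics concrete, here is a small worked example on made-up numbers (the arrays `y_true_demo` and `y_pred_demo` are illustrative, not results from this notebook). It computes the mean squared error and mean absolute error by hand and checks them against Scikit-learn. It also derives the root mean squared error, which is in the same units as `y`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values for illustration only
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.0, 5.0])

errors = y_true_demo - y_pred_demo      # [-0.5, 0.0, 1.0, -1.0]
mse_manual = np.mean(errors ** 2)       # (0.25 + 0 + 1 + 1) / 4 = 0.5625
mae_manual = np.mean(np.abs(errors))    # (0.5 + 0 + 1 + 1) / 4 = 0.625
rmse_manual = np.sqrt(mse_manual)       # sqrt(0.5625) = 0.75

# The manual values agree with the library functions
assert np.isclose(mse_manual, mean_squared_error(y_true_demo, y_pred_demo))
assert np.isclose(mae_manual, mean_absolute_error(y_true_demo, y_pred_demo))

print(mse_manual, mae_manual, rmse_manual)
```

Because the errors are squared, the MSE weights large deviations (such as the perturbed every-5th targets) more heavily than the MAE does.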

### 7. Visualize the model over the scatterplot 

In [None]:
X_plot = np.linspace(X.min(), X.max(), 100)
y_plot = regressor.predict(X_plot.reshape(-1, 1))

In [None]:
plot_data = pd.DataFrame({'x': X_plot, 'y': y_plot}, columns=['x', 'y'])

In [None]:
sns.scatterplot(data=dataset, x="x", y="y")
plt.plot(X_plot, y_plot, color='r')