# Machine Learning 
### Objective: Use past data to predict future data. 

#### Note:
- What are methods


#### Machine Learning general template:
1. Import libraries (code that other people have already written - you can import it and use its functions to make your life easier)
2. Load the dataset as a pandas dataframe
3. Analyse the data (know which columns are your features and which is the target variable, i.e. the thing you are predicting) and clean it (delete irrelevant columns, handle missing values, remove outliers, etc.)
4. Shuffle the data so that any ordering in the rows does not carry over into the later split
5. Split the data into features and target, and convert the loaded pandas dataframe into numpy arrays, the format scikit-learn works with
6. Save the cleaned numpy arrays for later use
7. Normalise the data
8. Split the features into training_inputs and testing_inputs (usually 80/20 or 70/30, depending on your data) and, in the same call, split the target into training_classes and testing_classes
9. There are many different classifiers that will do the bulk of the predicting for you - load your training data into each one to train it, then record the score of each classifier you test
10. When you have tested several, pick the one with the best score to make predictions with your testing data - using the testing data instead of the training data means the classifier is scored on data it has not seen, which gives an honest measure of its accuracy
11. Load in the testing data to get predictions and a score
12. Save the trained classifier as a model you can reuse whenever you want to predict new data
13. Input your own feature data and normalize it
14. Run the prediction and print the result
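
The steps above can be sketched end to end on a small example. This is only an illustration under stated assumptions: the data is synthetic, the names (`X_demo`, `demo_scaler`, `demo_model`, `new_row`) are made up for the sketch, and the scaler is fitted on the training rows only, which swaps the order of steps 7 and 8 so the test rows do not influence the scaling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 3 features, continuous target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Split first, then fit the scaler on the training rows only.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=18
)
demo_scaler = StandardScaler().fit(X_tr)

# Train, then evaluate on rows the model has not seen.
demo_model = LinearRegression().fit(demo_scaler.transform(X_tr), y_tr)
demo_mae = mean_absolute_error(y_te, demo_model.predict(demo_scaler.transform(X_te)))
print(f"test MAE: {demo_mae:.3f}")

# Predict for one new row, scaled with the same fitted scaler.
new_row = np.array([[0.2, -0.1, 1.0]])
new_pred = demo_model.predict(demo_scaler.transform(new_row))
print(new_pred)
```
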


## 1. Import libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import svm
from sklearn import linear_model

import joblib
from sklearn.model_selection import cross_val_score
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import utils

import pickle

## 2. Load the dataset

In [2]:
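from scipy.io import loadmat 
# Load the MATLAB file; the data matrix is stored under the key 'mu_30'
xx = loadmat('./mu_30_new.mat')
df = xx['mu_30']
X = df[:, :8]    # first 8 columns are the features
y = df[:, 8:9]   # 9th column is the target, still shaped (n_rows, 1) here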

In [3]:
from sklearn.utils.validation import column_or_1d
y = column_or_1d(y, warn=True)



In [None]:
y[1200]
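
A short sketch of why the `column_or_1d` call is needed: an `8:9` column slice keeps a second axis, while scikit-learn estimators expect a 1-D target. The array `demo` below is a made-up stand-in for the loaded matrix, not the real data.

```python
import numpy as np
from sklearn.utils.validation import column_or_1d

# Hypothetical stand-in for the loaded matrix: 5 rows, 9 columns.
demo = np.arange(45, dtype=float).reshape(5, 9)

features = demo[:, :8]     # first 8 columns -> shape (5, 8)
target_2d = demo[:, 8:9]   # a slice keeps the column axis -> shape (5, 1)
target_1d = demo[:, 8]     # an integer index drops it -> shape (5,)

# column_or_1d flattens the (5, 1) column into the same 1-D array.
flattened = column_or_1d(target_2d, warn=False)
print(features.shape, target_2d.shape, target_1d.shape, flattened.shape)
```

Indexing the column with a plain integer would give the 1-D shape directly, so the flattening step is one of two equivalent options.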

## 3. Normalize the data 
- Scale the feature data using preprocessing.StandardScaler()
- StandardScaler standardizes each feature column to zero mean and unit variance, so values measured on different scales end up on a common scale

In [4]:
scaler = preprocessing.StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled

array([[-0.21403567,  0.28883223, -1.50244263, ..., -2.78638917,
         1.79544442,  1.67670329],
       [-0.16852877,  0.68935417, -1.50307563, ..., -2.78396704,
         1.7993944 ,  1.67670329],
       [ 0.38892436,  0.79743123, -1.50349763, ..., -2.78154491,
         1.80334447,  1.67670329],
       ...,
       [-0.02821673,  0.18711428,  1.11690548, ...,  0.69798513,
        -0.7957641 , -0.84383228],
       [-0.09647709, -0.10533061,  1.12302446, ...,  0.69798513,
        -0.7957641 , -0.84383228],
       [-0.03580122, -0.20704855,  1.12323546, ...,  0.69798513,
        -0.7957641 , -0.84383228]])
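
What the scaler does to each column can be checked by hand. A minimal sketch on toy data (the array `raw` is an assumption for illustration): subtracting the column mean and dividing by the column standard deviation reproduces the `StandardScaler` output.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): two features on very different scales.
raw = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

toy_scaler = StandardScaler()
scaled = toy_scaler.fit_transform(raw)

# Same result by hand: subtract the column mean, divide by the column
# standard deviation (population form, ddof=0).
manual = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled)
print(np.allclose(scaled, manual))
```
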

## 4. Split the features and target into training_inputs, testing_inputs, training_target, testing_target 
- Other tutorials often call these X_train, X_validation, Y_train, Y_validation

In [5]:
seed = 18
(training_inputs,
 testing_inputs,
 training_target,
 testing_target) = train_test_split(X_scaled, 
                                     y, 
                                     test_size=0.20, 
                                     train_size=0.80, 
                                     random_state=seed)
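
A small sketch of what the split returns, using placeholder arrays (`X_toy`, `y_toy` are assumptions, not the real data). It shows the 80/20 row counts, that feature rows stay paired with their targets, and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays (assumption): 50 rows, 8 features; row i starts with 8 * i.
X_toy = np.arange(400, dtype=float).reshape(50, 8)
y_toy = np.arange(50, dtype=float)

a_train, a_test, b_train, b_test = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=18
)
print(a_train.shape, a_test.shape)

# Rows and targets are shuffled together, so they stay paired.
print(np.array_equal(a_test[:, 0] / 8, b_test))

# Same seed, same split.
a_train2, a_test2, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=18
)
print(np.array_equal(a_test, a_test2))
```

Since `test_size=0.20` already implies an 80% training share, passing `train_size=0.80` as well is optional.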

## 5. Classifiers
- If the target you are trying to predict is continuous, you need regression ("best-fit-line") models rather than classifiers
- If the target is a YES/NO kind of thing (a class label), you can use classifiers
- In this case, it's the former, so we must use regressor algorithms (the list below is still named `classifiers`)

In [6]:
classifiers = [
    svm.SVR(gamma='auto'),
    linear_model.SGDRegressor(),
    linear_model.BayesianRidge(),
    linear_model.LassoLars(),
    linear_model.ARDRegression(),
    linear_model.PassiveAggressiveRegressor(),
    linear_model.TheilSenRegressor(),
    linear_model.LinearRegression()]

In [7]:
# Loop through each classifier in the array above 
for classifier in classifiers:
    print("======================================================================")
    print(classifier)
    clf = classifier
    cv_scores = cross_val_score(clf, X_scaled, y, cv=10, scoring='neg_mean_absolute_error')
    print("----- SCORE -----")
    print(cv_scores.mean())
    print("======================================================================")

SVR(gamma='auto')
----- SCORE -----
-0.48438505107155194
SGDRegressor()
----- SCORE -----
-0.5049590895516572
BayesianRidge()
----- SCORE -----
-0.5125107299053805
LassoLars()
----- SCORE -----
-0.5386993431094301
ARDRegression()
----- SCORE -----
-0.5143569039603432
PassiveAggressiveRegressor()
----- SCORE -----
-1.1409884546806486
TheilSenRegressor()
----- SCORE -----
-0.6336410240188096
LinearRegression()
----- SCORE -----
-0.5133297475758838
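
The scores printed above are negative because scikit-learn scorers follow a "higher is better" convention, so error metrics are reported with the sign flipped. A minimal sketch on synthetic data (all names and values here are assumptions) showing how to recover the plain mean absolute error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (assumption): 100 rows, 2 features.
rng = np.random.default_rng(0)
X_cv = rng.normal(size=(100, 2))
y_cv = 3.0 * X_cv[:, 0] - X_cv[:, 1] + rng.normal(scale=0.5, size=100)

# One negated MAE value per fold.
neg_mae = cross_val_score(
    LinearRegression(), X_cv, y_cv, cv=5, scoring="neg_mean_absolute_error"
)

# Flip the sign to read it as an ordinary error (lower is better).
mae = -neg_mae
print(mae.mean())
```
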


In [8]:
# Loop through each classifier in the array above 
for classifier in classifiers:
    print("======================================================================")
    print(classifier)
    clf = classifier
    cv_scores = cross_val_score(clf, X_scaled, y, cv=10, scoring='neg_root_mean_squared_error')
    print("----- SCORE -----")
    print(cv_scores.mean())
    print("======================================================================")

SVR(gamma='auto')
----- SCORE -----
-0.5935371833999096
SGDRegressor()
----- SCORE -----
-0.5863210865612182
BayesianRidge()
----- SCORE -----
-0.6043851102021379
LassoLars()
----- SCORE -----
-0.6835596910555135
ARDRegression()
----- SCORE -----
-0.6063455827443562
PassiveAggressiveRegressor()
----- SCORE -----
-1.1186780366872282
TheilSenRegressor()
----- SCORE -----
-0.7949818678074712
LinearRegression()
----- SCORE -----
-0.6052966762262091
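
Cells 7 and 8 differ only in the `scoring` argument, so both metrics could be collected in a single pass. The following is a sketch of that alternative, not what the notebook ran: it uses `cross_validate` on synthetic data with two of the models, and the names `X_m`, `y_m` and `summary` are assumptions.

```python
import numpy as np
from sklearn import linear_model, svm
from sklearn.model_selection import cross_validate

# Synthetic regression data (assumption): 120 rows, 3 features.
rng = np.random.default_rng(1)
X_m = rng.normal(size=(120, 3))
y_m = X_m @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.3, size=120)

metrics = ["neg_mean_absolute_error", "neg_root_mean_squared_error"]
models = [svm.SVR(gamma="auto"), linear_model.LinearRegression()]

summary = {}
for model in models:
    # cross_validate returns one "test_<metric>" array per requested metric.
    result = cross_validate(model, X_m, y_m, cv=5, scoring=metrics)
    summary[type(model).__name__] = {
        "mae": -result["test_neg_mean_absolute_error"].mean(),
        "rmse": -result["test_neg_root_mean_squared_error"].mean(),
    }
print(summary)
```

Because both metrics come from the same fitted folds, the two columns are directly comparable for each model.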


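## 6. Choose classifier with the highest score 
- The scores are negated errors, so the highest score (closest to zero) is the best
- svm.SVR has the best mean absolute error (-0.484); SGDRegressor is marginally ahead on root mean squared error (-0.586 vs -0.594 for svm.SVR). svm.SVR is the one carried forward below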
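
The selection can also be done programmatically. The sketch below copies the mean scores printed in the two outputs above (rounded to four decimals) into dictionaries and takes the maximum of each; the dictionary names are made up for the example.

```python
# Mean cross-validation scores copied from the outputs above (rounded).
mae_scores = {
    "SVR": -0.4844, "SGDRegressor": -0.5050, "BayesianRidge": -0.5125,
    "LassoLars": -0.5387, "ARDRegression": -0.5144,
    "PassiveAggressiveRegressor": -1.1410, "TheilSenRegressor": -0.6336,
    "LinearRegression": -0.5133,
}
rmse_scores = {
    "SVR": -0.5935, "SGDRegressor": -0.5863, "BayesianRidge": -0.6044,
    "LassoLars": -0.6836, "ARDRegression": -0.6063,
    "PassiveAggressiveRegressor": -1.1187, "TheilSenRegressor": -0.7950,
    "LinearRegression": -0.6053,
}

# Scores are negated errors, so the largest value is the best model.
best_by_mae = max(mae_scores, key=mae_scores.get)
best_by_rmse = max(rmse_scores, key=rmse_scores.get)
print(best_by_mae, best_by_rmse)
```
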

## 7. Do prediction with test data using the selected classifier from above
- Make predictions with testing data

In [None]:
clf = svm.SVR(gamma='auto')

clf.fit(training_inputs, training_target)
predictions = clf.predict(testing_inputs)
print(predictions)

# One prediction per test row, i.e. 20% of the dataset (test_size=0.20)
print(len(predictions)) 
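
Printing the predictions alone does not say how good they are. A minimal sketch, on synthetic data with assumed names (`X_e`, `y_e`, `reg`), of scoring test-set predictions against the held-out targets with the same two error metrics used in the cross-validation:

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption): 150 rows, 4 features.
rng = np.random.default_rng(2)
X_e = rng.normal(size=(150, 4))
y_e = X_e[:, 0] - 0.5 * X_e[:, 1] + rng.normal(scale=0.2, size=150)

Xe_train, Xe_test, ye_train, ye_test = train_test_split(
    X_e, y_e, test_size=0.20, random_state=18
)

reg = svm.SVR(gamma="auto").fit(Xe_train, ye_train)
test_pred = reg.predict(Xe_test)

# Compare predictions with the true held-out targets.
test_mae = mean_absolute_error(ye_test, test_pred)
test_rmse = np.sqrt(mean_squared_error(ye_test, test_pred))
print(len(test_pred), test_mae, test_rmse)
```
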

## 8. Create model 
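- Pickling the trained model saves it to disk so it can be loaded in other programs. The fitted `scaler` is needed there too, since new inputs must be scaled the same way as the training data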

In [None]:
with open('trained_sensor_model.pkl', 'wb') as f:
    pickle.dump(clf, f)

In [None]:
# Load the model 
with open('trained_sensor_model.pkl', 'rb') as f:
    clf_svm_model = pickle.load(f)
clf_svm_model
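
One possible way to keep the model and its scaler together is to pickle them as a single dictionary. This is a sketch under assumptions, not the notebook's method: the data is synthetic, the file name `model_and_scaler.pkl` is hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Synthetic training data (assumption): 60 rows, 2 features.
rng = np.random.default_rng(3)
X_p = rng.normal(loc=50.0, scale=10.0, size=(60, 2))
y_p = 0.1 * X_p[:, 0] - 0.05 * X_p[:, 1]

p_scaler = StandardScaler().fit(X_p)
p_model = svm.SVR(gamma="auto").fit(p_scaler.transform(X_p), y_p)

# Hypothetical file name; the temporary directory cleans itself up.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_and_scaler.pkl")
    with open(path, "wb") as f:
        pickle.dump({"scaler": p_scaler, "model": p_model}, f)
    with open(path, "rb") as f:
        bundle = pickle.load(f)

# The restored pair gives the same prediction as the original pair.
row = np.array([[55.0, 42.0]])
restored = bundle["model"].predict(bundle["scaler"].transform(row))
original = p_model.predict(p_scaler.transform(row))
print(np.allclose(restored, original))
```
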

## 9. Input own feature data and normalize it
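- Listed feature names: 'light_avg', 'humidity_avg', 'latitude', 'longitude', 'elevation'. Note that the model here was trained on 8 feature columns, so the input row must supply 8 values in the same column order as `X`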

In [None]:
X_scaled[1]

In [None]:
X[1]

In [None]:
y[1]

In [None]:
data = np.array([[-1.1430500e-01,  2.1028900e-01,  1.6401000e-02, -4.3494700e-01,
        8.3916729e+01,  8.1053279e+01,  3.1084237e+01,  3.0894988e+01]])
data_scaled = scaler.transform(data)
data_scaled

## 10. Run prediction 

In [None]:
# Make a prediction against prediction features
prediction = clf_svm_model.predict(data_scaled)

print("The predicted yaw rate:")
print(prediction)
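
The scale-then-predict steps above can be wrapped in one helper so the scaling is never forgotten. A minimal sketch: `predict_one` is a hypothetical function name, the training data is synthetic, and the feature-count check relies on the scaler's fitted `mean_` attribute.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler


def predict_one(model, fitted_scaler, feature_values):
    """Scale one raw feature row with the fitted scaler and return a float prediction."""
    row = np.asarray(feature_values, dtype=float).reshape(1, -1)
    expected = fitted_scaler.mean_.shape[0]
    if row.shape[1] != expected:
        raise ValueError(f"expected {expected} features, got {row.shape[1]}")
    return float(model.predict(fitted_scaler.transform(row))[0])


# Synthetic stand-in for the 8-feature training data (assumption).
rng = np.random.default_rng(4)
X_h = rng.normal(size=(80, 8))
y_h = X_h[:, 0] + X_h[:, 7]
h_scaler = StandardScaler().fit(X_h)
h_model = svm.SVR(gamma="auto").fit(h_scaler.transform(X_h), y_h)

value = predict_one(h_model, h_scaler, X_h[1])
print(value)
```

Passing a row with the wrong number of features raises a `ValueError` instead of producing a misleading prediction.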