<a href="https://colab.research.google.com/github/EthanSeungwonOh/Real_Time_Neural_Decoding/blob/master/neural_decoding_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ML Experiment Tips!
- If your model is underfitting the training data, adding more training examples will not help. You need to use a more complex model or come up with better features.

- One way to improve an overfitting model is to feed it more training data until the validation error reaches the training error.

- Bias-Variance Tradeoff: Increasing a model’s complexity will typically increase its variance and reduce its bias. Conversely, reducing a model’s complexity increases its bias and reduces its variance. This is why it is called a trade-off.
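
The tips above can be illustrated with a small sketch. It fits polynomials of increasing degree to a made-up noisy sine wave (the data, the sizes, and the degrees are assumptions chosen for illustration, not part of this project) and compares training and validation error. A low degree tends to underfit (both errors high), while a very high degree tends to drive the training error down faster than the validation error.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine wave (a made-up toy problem)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_tr, y_tr = make_data(30)     # small training set
x_va, y_va = make_data(200)    # held-out validation set

def mse(poly, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((poly(x) - y) ** 2))

errors = {}
for degree in (1, 4, 12):
    poly = Polynomial.fit(x_tr, y_tr, degree)   # least-squares fit
    errors[degree] = (mse(poly, x_tr, y_tr), mse(poly, x_va, y_va))
    print(f"degree={degree:2d}  train MSE={errors[degree][0]:.3f}  "
          f"val MSE={errors[degree][1]:.3f}")
```

The gap between the two columns is the quantity to watch: a large gap suggests overfitting, while two similar but high values suggest underfitting.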

In [1]:
from google.colab import drive
drive.mount('/content/drive')

# Array handling and file paths
import numpy as np
import os
from numpy import loadtxt

# Keras (TensorFlow) for the recurrent models, scikit-learn for utilities and metrics
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score

Mounted at /content/drive


In [15]:
# load the dataset
path = '/content/drive/My Drive/data/data_lda_20170828/1005'
isInjection = False

data_list = [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,2),(3,3),(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(3,1)] # list of (day, inj) tuples, one per recording
cellMouse = path.split('/')[-1]

# dirs = glob.glob(os.path.join(path, '1*'))
X_list = []
Y_list = []

# for each (day, inj) recording, load the traces and the behavior labels
for day, inj in data_list:
  datasetName = os.path.join(path, 'TRACES_'+ cellMouse +'_'+ str(day) +'_' + str(inj) +'.csv')
  labelName = os.path.join(path, 'BEHAVIOR_'+ cellMouse +'_'+ str(day) +'_' +str(inj)+'.csv')

  dataset = np.transpose(loadtxt(datasetName, delimiter=','))
  label = loadtxt(labelName, delimiter=',')[:,2]

  X_list.append(dataset)
  Y_list.append(label)

X = np.vstack(X_list)
Y = np.vstack(Y_list)
Y = Y.reshape(-1)

print(X.shape, Y.shape)

(45000, 140) (45000,)


In [16]:
# for LSTM/GRU
time_step = 3000
num_features = X.shape[1]

x = X.reshape(-1, time_step, num_features)
y = Y.reshape(-1, time_step, 1)
print(x.shape, y.shape)

(15, 3000, 140) (15, 3000, 1)


In [23]:
timestep = 100

# train_X, train_y, test_X, test_y = train_test_split()

train_X = x[:13]
train_y = y[:13]
X_train = train_X.reshape(-1, timestep, num_features)
y_train = train_y.reshape(-1, timestep, 1)
print(X_train.shape, y_train.shape)

val_X = x[13:14]
val_y = y[13:14]
X_val = val_X.reshape(-1, timestep, num_features)
y_val = val_y.reshape(-1, timestep, 1)
print(X_val.shape, y_val.shape)

test_X = x[14:15]
test_y = y[14:15]
X_test = test_X.reshape(-1, timestep, num_features)
y_test = test_y.reshape(-1, timestep, 1)
print(X_test.shape, y_test.shape)

(390, 100, 140) (390, 100, 1)
(30, 100, 140) (30, 100, 1)
(30, 100, 140) (30, 100, 1)
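
The two reshapes above rely on NumPy's row-major layout: consecutive rows stay together, so each recording is cut into non-overlapping windows of consecutive frames. This only holds cleanly because the window length (100) divides the recording length (3000); otherwise a window would straddle two recordings. A toy sketch with made-up sizes (2 recordings, 6 frames, 3 features) shows the behavior:

```python
import numpy as np

# Toy stand-in for the stacked recordings: 2 sessions x 6 frames x 3 features
# (the notebook itself uses 15 sessions x 3000 frames x 140 features).
n_sessions, frames_per_session, n_feat = 2, 6, 3
flat = np.arange(n_sessions * frames_per_session * n_feat).reshape(-1, n_feat)

sessions = flat.reshape(-1, frames_per_session, n_feat)   # (2, 6, 3)
window = 3                                                # must divide 6
windows = sessions.reshape(-1, window, n_feat)            # (4, 3, 3)

# Window 0 is frames 0..2 of session 0; window 2 is frames 0..2 of session 1.
print(windows.shape)
print(np.array_equal(windows[0], flat[0:3]))
print(np.array_equal(windows[2], sessions[1, :3]))
```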


## LSTM/GRU

In [24]:
model = keras.Sequential(
  [
   layers.LSTM(units=20, input_shape=(timestep, num_features), return_sequences=True),
   layers.Dropout(0.2),
   layers.LSTM(units=30, return_sequences=True),  # input shape is inferred from the previous layer
   layers.Dropout(0.2),
   layers.TimeDistributed(layers.Dense(10, activation='relu')),
   layers.TimeDistributed(layers.Dense(1, activation='sigmoid'))
  ]
)
 
# lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=1e-2, decay_steps=10000, decay_rate=0.9)
opt = keras.optimizers.Adam(learning_rate=0.01) 
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_4 (LSTM)                (None, 100, 20)           12880     
_________________________________________________________________
dropout_4 (Dropout)          (None, 100, 20)           0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 100, 30)           6120      
_________________________________________________________________
dropout_5 (Dropout)          (None, 100, 30)           0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, 100, 10)           310       
_________________________________________________________________
time_distributed_5 (TimeDist (None, 100, 1)            11        
Total params: 19,321
Trainable params: 19,321
Non-trainable params: 0
__________________________________________________
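
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the helper names are made up for illustration.

```python
def lstm_params(input_dim, units):
    """4 gates x (input weights + recurrent weights + bias) x units."""
    return 4 * (input_dim + units + 1) * units

def dense_params(input_dim, units):
    """Weights plus one bias per output unit."""
    return (input_dim + 1) * units

counts = [
    lstm_params(140, 20),   # first LSTM: 140 input features, 20 units
    lstm_params(20, 30),    # second LSTM: 20 inputs, 30 units
    dense_params(30, 10),   # TimeDistributed Dense(10)
    dense_params(10, 1),    # TimeDistributed Dense(1)
]
print(counts, sum(counts))
```

The four values match the `Param #` column above, and their sum matches the reported total of 19,321.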

In [26]:
# train model (# of examples / batch_size = # of batches)
print(X_train.shape, y_train.shape)
history = model.fit(X_train, y_train, epochs=20, batch_size=5, validation_data=(X_val, y_val))

# evaluate model on the test set
model.evaluate(X_test, y_test)

def cal_acc(model, X_test, y_test):
  y_pred = np.round(model.predict(X_test))
  
  return np.mean(np.equal(y_pred, y_test))

print(cal_acc(model, X_test, y_test))

(390, 100, 140) (390, 100, 1)
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
0.539
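
An accuracy figure like the one above is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent label; it assumes the labels are binary 0/1, and both the helper name and the toy labels are made up for illustration.

```python
import numpy as np

def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent binary label."""
    y_true = np.asarray(y_true).reshape(-1)
    positive_rate = float(np.mean(y_true == 1))
    return max(positive_rate, 1.0 - positive_rate)

toy_labels = np.array([1, 0, 0, 1, 0, 0, 0, 1])  # made-up labels
print(majority_baseline_accuracy(toy_labels))
```

Calling `majority_baseline_accuracy(y_test)` on the real test labels would show whether the model beats a constant prediction.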


### Visualize Train/Val Loss/Acc

In [None]:
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

In [6]:
# flatten the windows back to one row per frame for the scikit-learn models
X_train = X_train.reshape(-1, num_features)
y_train = y_train.reshape(-1)
print(X_train.shape, y_train.shape)

X_test = X_test.reshape(-1, num_features)
y_test = y_test.reshape(-1)
print(X_test.shape, y_test.shape)

(12000, 273) (12000,)
(3000, 273) (3000,)


## Decision Trees

In [7]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)

KeyboardInterrupt: ignored

In [None]:
# fit each classifier and report its test accuracy
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

## Random Forest

In [8]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)

y_pred_rf = rnd_clf.predict(X_test)
accuracy_score(y_test, y_pred_rf)

0.588

## Support Vector Machines


### Linear SVM Classification
If the two classes can be cleanly separated by a straight line (a linear decision boundary), start with a linear SVM.

However, if the decision boundary passes very close to the training instances, the model will probably not perform well on new instances.

Therefore, the aim is not only to separate the two classes but also to stay as far away from the closest training instances as possible. Think of an SVM as fitting the widest possible street between the classes; this is called large margin classification.

The decision boundary is fully determined by the instances located on the edge of the street; these instances are called **"SUPPORT VECTORS"**.
```
- SVMs are sensitive to the feature scales, be sure to use StandardScaler for feature scaling
- If your SVM model is overfitting, try regularizing it by reducing C
```

### Soft vs. Hard margin classification
If all instances must be off the street and on the correct side, this is called hard margin classification. It has two main issues: it only works when the data is linearly separable, and it is sensitive to outliers.

To avoid these issues, use a more flexible model called soft margin classification. The objective is to find a good balance between maximizing the margin and limiting the margin violations.
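
A minimal sketch of this balance, using two overlapping synthetic clusters (the data and the two `C` values are assumptions for illustration). A small `C` tolerates more margin violations and yields a wider street; a large `C` penalizes violations heavily and yields a narrower one. The street width is computed as 2 / ∥w∥.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping 2-D clusters (synthetic, for illustration only).
X_toy, y_toy = make_blobs(n_samples=200, centers=[[-1.0, -1.0], [1.0, 1.0]],
                          cluster_std=1.0, random_state=0)

margins = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X_toy, y_toy)
    width = 2.0 / np.linalg.norm(clf.coef_)   # street width = 2 / ||w||
    margins[C] = width
    print(f"C={C:<6}  margin width={width:.2f}  "
          f"support vectors={len(clf.support_)}")
```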



In [9]:
# use the LinearSVC class inside a pipeline that scales the features first
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# scales the features 
svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=1, loss="hinge")),
    ])

# train a linear SVM model (C=1 and hinge loss), then use it to make predictions
# (unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class)

svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.6023333333333334




Instead of using the LinearSVC class, we could use the SVC class with a linear kernel. When creating the SVC model, we would write SVC(kernel="linear", C=1).

```
The LinearSVC class regularizes the bias term, so you should center the training set first by subtracting its mean. This is automatic if you scale the data using the StandardScaler. Also make sure you set the loss hyperparameter to "hinge", as it is not the default value. Finally, for better performance, you should set the dual hyperparameter to False, unless there are more features than training instances. Note that scikit-learn's LinearSVC does not accept loss="hinge" together with dual=False, so dual=False requires the default squared hinge loss.
```


## Nonlinear SVM Classification

Many datasets are not even close to being linearly separable.

One approach is to add more features, such as polynomial features; in some cases this results in a linearly separable dataset.

Although this is simple to implement, it has a drawback: at a low polynomial degree it cannot deal with complex datasets, and at a high degree it creates a huge number of features, making training too slow.
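
To get a feel for how quickly the feature count grows, the sketch below counts the monomials of total degree at most `d` over `n` input features, which is the binomial coefficient C(n + d, d). This corresponds to `PolynomialFeatures` with its default settings (bias column included); the helper name is made up for illustration, and 140 is the feature count of the traces loaded earlier.

```python
from math import comb

def n_polynomial_features(n_features, degree):
    """Number of monomials of total degree <= `degree`, bias term included."""
    return comb(n_features + degree, degree)

for degree in (2, 3, 4):
    print(degree, n_polynomial_features(140, degree))
```

With 140 input features, degree 3 already produces 477,191 columns per frame, which explains why the cell below can be very slow or run out of memory.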

In [None]:
# Polynomial Features
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
 
polynomial_svm_clf = Pipeline([
        ("poly_features", PolynomialFeatures(degree=3)),
        ("scaler", StandardScaler()),
        ("svm_clf", LinearSVC(C=10, loss="hinge"))
    ])

polynomial_svm_clf.fit(X_train, y_train)
y_pred = polynomial_svm_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

### Kernel trick
The kernel trick makes it possible to get the same results as if you had added many polynomial features, even with very high-degree polynomials, without actually having to add them.
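
A small numerical check of this idea, for 2-D inputs and a degree-2 polynomial kernel of the form (a·b + 1)². The explicit feature map `phi` below is written out by hand for this one case (an illustrative assumption, not something scikit-learn exposes); the point is that the kernel value equals the dot product in the expanded feature space, without ever building that space.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector, matching (a.b + 1)**2."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Degree-2 polynomial kernel computed directly on the original vectors."""
    return (np.dot(a, b) + 1.0) ** 2

a = np.array([0.5, -1.0])
b = np.array([2.0, 3.0])
print(np.dot(phi(a), phi(b)), poly_kernel(a, b))
```

Both numbers agree, but the kernel only needs a 2-D dot product while the explicit map needs 6 features; the gap widens rapidly with more features and higher degrees.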

In [None]:
# Polynomial Kernel
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
    ])

poly_kernel_svm_clf.fit(X_train, y_train)
y_pred = poly_kernel_svm_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

**With so many kernels to choose from, how can you decide which one to use?**
```
As a rule of thumb, you should always try the linear kernel first (remember that LinearSVC is much faster than SVC(kernel="linear")), especially if the training set is very large or if it has plenty of features. If the training set is not too large, you should also try the Gaussian RBF kernel; it works well in most cases. Use cross-validation and grid search for searching a few other kernels. You’d want to experiment like that especially if there are kernels specialized for your training set’s data structure.
```
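
A minimal sketch of such a search, run on a small synthetic dataset so that it finishes quickly. The dataset, the grid values, and the number of folds are assumptions for illustration; on the real traces the grid would need to be chosen with training time in mind.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small nonlinear toy dataset (synthetic, for illustration only).
X_toy, y_toy = make_moons(n_samples=200, noise=0.2, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])

# Step-name prefixes route each hyperparameter to the SVC step.
param_grid = {
    "svm_clf__kernel": ["linear", "rbf", "poly"],
    "svm_clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_toy, y_toy)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so the validation folds do not leak into the scaling statistics.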


For online learning, we could use the SGDClassifier class, with SGDClassifier(loss="hinge", alpha=1/(m*C)). This applies regular Stochastic Gradient Descent to train a linear SVM classifier. It does not converge as fast as the LinearSVC class, but it can be useful to handle online classification tasks or huge datasets that do not fit in memory (out-of-core training).

In [None]:
# Online SVM : use Gradient Descent to minimize the cost function derived from the primal problem. 
from sklearn.linear_model import SGDClassifier  # SGDClassifier lives in linear_model, not svm

# Unfortunately, Gradient Descent converges much more slowly than the methods based on QP.
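
A minimal sketch of online training with `partial_fit`, assuming the data arrives in mini-batches. The synthetic clusters, the batch size of 100, and `C = 1` are illustrative assumptions. For simplicity the scaler is fit on all the data at once, which a true streaming setup could not do.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Two well-separated synthetic clusters (illustration only).
X_toy, y_toy = make_blobs(n_samples=600, centers=[[-3.0, -3.0], [3.0, 3.0]],
                          cluster_std=1.0, random_state=0)
X_toy = StandardScaler().fit_transform(X_toy)

m, C = len(X_toy), 1.0
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=0)

# Feed the data in mini-batches, as if it arrived as a stream.
# `classes` must list every label, since a batch may not contain them all.
for start in range(0, m, 100):
    batch = slice(start, start + 100)
    sgd_clf.partial_fit(X_toy[batch], y_toy[batch], classes=np.array([0, 1]))

print(sgd_clf.score(X_toy, y_toy))
```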

### Under the Hood
The linear SVM classifier model predicts the class of a new instance x by simply computing the decision function w⊺ x + b = w1 x1 + ⋯ + wn xn + b. If the result is positive, the predicted class ŷ is the positive class (1); otherwise it is the negative class (0).
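
The prediction rule can be sketched in a few lines of NumPy. The weights, bias, and instances below are made-up values for illustration; in scikit-learn the fitted values would come from `coef_` and `intercept_`.

```python
import numpy as np

def predict_linear_svm(w, b, X):
    """Return 1 where the decision function w.x + b is positive, else 0."""
    scores = X @ w + b
    return (scores > 0).astype(int)

w = np.array([2.0, -1.0])   # made-up weights
b = -0.5                    # made-up bias
X_new = np.array([[1.0, 0.0], [0.0, 1.0], [0.25, 0.0]])
print(predict_linear_svm(w, b, X_new))
```

The third instance lies exactly on the decision boundary (score 0), so under this rule it falls into the negative class.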


Consider the slope of the decision function: it is equal to the norm of the weight vector, ∥ w ∥. If we divide this slope by 2, the points where the decision function is equal to ±1 are going to be twice as far away from the decision boundary. 

```
Dividing the slope by 2 will multiply the margin by 2. The smaller the weight vector w, the larger the margin.
```
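
This relationship is easy to verify numerically. The distance between the lines where the decision function equals −1 and +1 is 2 / ∥w∥, so halving w doubles it. The weight vector below is a made-up example.

```python
import numpy as np

def margin_width(w):
    """Distance between the lines w.x + b = -1 and w.x + b = +1."""
    return 2.0 / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # made-up weights with norm 5
print(margin_width(w), margin_width(w / 2))
```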

Training a linear SVM classifier means finding the values of w and b that make this margin as wide as possible while avoiding margin violations (hard margin) or limiting them (soft margin).
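
Written out, these two objectives take the following standard form. The notation is an assumption added here: t⁽ⁱ⁾ is −1 for negative instances and +1 for positive ones, ζ⁽ⁱ⁾ is a slack variable measuring how much instance i is allowed to violate the margin, and m is the number of training instances.

```latex
% Hard margin: widest street, no violations allowed
\min_{\mathbf{w},\, b} \quad \tfrac{1}{2}\,\mathbf{w}^\top \mathbf{w}
\qquad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b\right) \ge 1,
\quad i = 1, \dots, m

% Soft margin: trade street width against total slack, weighted by C
\min_{\mathbf{w},\, b,\, \zeta} \quad
\tfrac{1}{2}\,\mathbf{w}^\top \mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)}
\qquad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b\right) \ge 1 - \zeta^{(i)},
\quad \zeta^{(i)} \ge 0
```

Minimizing ½ w⊺w is equivalent to minimizing ∥w∥, which maximizes the margin; the hyperparameter C sets how strongly margin violations are penalized.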