# Linear Regression using Tensorflow

In this notebook, we make a custom loss funtion for tensor flow, then do a linear regression on our features for various optimizers (mostly Adam, after preliminary testing) in tensor flow models.

Result: The best linear regression gives an mae of 1.210 on k-fold validation and uses the following features: 
-az_t_pred
-ze_t_pred
-low_cluster (cutoff of 2)
-high_cluster (cutoff of 9)
-mse_cat (cutoff of 721)
-x_skew (cutoff of .9)
-y_skew
-z_skew
It also uses the Adam optimizer and 9 epochs

In [1]:
# import modules
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
import math 
from tensorflow.keras import layers
from tensorflow import keras




# Loss Function

In [2]:
# Custom loss function
def get_mae(az_true, zen_true, az_pred, zen_pred): 
    """
    Given a predicted and true azimuth and zenith, compute the mae (mean angular error)
    """    
    
    # pre-compute all sine and cosine values
    sa1 = tf.math.sin(az_true)
    ca1 = tf.math.cos(az_true)
    sz1 = tf.math.sin(zen_true)
    cz1 = tf.math.cos(zen_true)
    
    sa2 = tf.math.sin(az_pred)
    ca2 = tf.math.cos(az_pred)
    sz2 = tf.math.sin(zen_pred)
    cz2 = tf.math.cos(zen_pred)
    
    # scalar product of the two cartesian vectors (x = sz*ca, y = sz*sa, z = cz)
    scalar_prod = sz1*sz2*(ca1*ca2 + sa1*sa2) + (cz1*cz2)
    
    # scalar product of two unit vectors is always between -1 and 1, this is against nummerical instability
    # that might otherwise occure from the finite precision of the sine and cosine functions
    scalar_prod = tf.clip_by_value(scalar_prod, -1.0, 1.0)
    
    # convert back to an angle (in radian)
    return tf.reduce_mean(tf.abs(tf.acos(scalar_prod)))

def mae(y_true, y_pred): 
    #return tf.reduce_mean(tf.abs(y_true - y_pred))
    #print(type(y_true))
    ta = tf.gather(y_true, 0)
    tz = tf.gather(y_true, 1)
    pa = tf.gather(y_pred, 0)
    pz = tf.gather(y_pred, 1)
    return get_mae(ta, tz, pa, pz)

# Set up train/test data

In [5]:
# Import data
event_data = pd.read_csv("C:/Users/k_vsl/Documents/Erdos/Boot Camp/ice-cube-katja/features-final.csv")

In [6]:
# X contains all categorical variables with cutoff of .5 for skew
# V contains a subset of categorical variables, where skew has a cutoff of .9 and we categorize into low, medium, and high clusters
X = event_data
X = X.set_index("event_id")
y = event_data[['event_id', 'az_true', 'ze_true']]
y = y.set_index("event_id")
X = X[['az_t_pred', 'ze_t_pred', 'cat_x', 'cat_y', 'cat_z', 'mse_cat', 'cat_1.0',
       'cat_2.0', 'cat_3.0', 'cat_4.0', 'cat_5.0', 'cat_6.0', 'cat_7.0',
       'cat_8.0', 'cat_9.0', 'cat_10.0']]
V = event_data[['az_t_pred', 'ze_t_pred', 'num_clusters','mse_cat','per_x', 'per_y', 'per_z']]
V['x_skew'] = [(val > .9) for val in V.per_x]
V['y_skew'] = [(val > .9) for val in V.per_y]
V['z_skew'] = [(val > .9) for val in V.per_z]
V['low_cluster'] = [(c < 2) for c in V.num_clusters]
V['high_cluster'] = [(c > 9) for c in V.num_clusters]
V.replace({False: 0, True: 1}, inplace=True)
w = y
V = V[['az_t_pred', 'ze_t_pred', 'mse_cat', 'x_skew', 'y_skew', 'z_skew', 'low_cluster', 'high_cluster']]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  V['x_skew'] = [(val > .9) for val in V.per_x]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  V['y_skew'] = [(val > .9) for val in V.per_y]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  V['z_skew'] = [(val > .9) for val in V.per_z]
A value is trying to be set on a copy of a slice from a DataFrame.


In [7]:
# Separate train and test

# Separate out a final training set
# random seed = 134
# test size = 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                             shuffle = True,
                                                             random_state = 134, 
                                                             test_size = .25)
V_train, V_test, w_train, w_test = train_test_split(V, w, 
                                                             shuffle = True,
                                                             random_state = 134, 
                                                             test_size = .25)

In [8]:
# k-fold cross validation
# this cell imitates the erdos lectures notes on kfold cross validation , k = 5
# random seed to all splits random_seed = 134
kfold = KFold(n_splits = 5,
             shuffle = True,
             random_state = 134)

# Helper training functions

In [11]:
# Function which runs through optimizers, various epochs, kfold validation
def tensor_train(X,  y, optimizers, shape, epochs): 
    n = len(optimizers)
    losses = np.zeros((n,10,5))
    i = 0
    for opt in optimizers: 
        print("Trying Optimizer " + str(opt))
        k = 0
        for train_index, test_index in kfold.split(X, y):
                    
            ## get the kfold training data
            X_train = X.iloc[train_index,:]
            y_train = y.iloc[train_index]

            ## get the holdout data
            X_holdout = X.iloc[test_index,:]
            y_holdout = y.iloc[test_index]
                    
            j = 0
            for e in epochs: 

                ## Fit the data
                model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(shape,))])
                model.compile(optimizer = opt, loss = mae)
                model.fit(X_train, y_train, epochs=e)
                loss = model.evaluate(X_holdout, y_holdout)

                losses[i][j][k] = loss
                j += 1
                    
            k += 1
        
        i += 1
            
    return losses

# Function which tests one optimizer with a given number of epochs
def tensor_train_spec(X,y, opt, shape, epoch):
    losses = np.zeros(5)
    i = 0
    for train_index, test_index in kfold.split(X, y):
        ## get the kfold training data
        X_train = X.iloc[train_index,:]
        y_train = y.iloc[train_index]

        ## get the holdout data
        X_holdout = X.iloc[test_index,:]
        y_holdout = y.iloc[test_index]
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(shape,))])
        model.compile(optimizer = opt, loss = mae)
        model.fit(X_train, y_train, epochs=epoch)
        loss = model.evaluate(X_holdout, y_holdout)
        losses[i] = loss
        i +=1
    return losses

# Training

In [None]:
# Look at X
epochs = [5,10,15]
optimizers = ['Adam', 'Adadelta','sgd']
losses = tensor_train(X_train, y_train, optimizers, 16, epochs)
n = len(optimizers)
m = len(epochs)
means = np.zeros((n,m))
for i in range(0,n): 
    for j in range(0,m): 
        means[i][j] = losses[i][j].mean()

Trying Optimizer Adam
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Trying Optimizer Adadelta
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Ep

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/10
 667/3750 [====>.........................] - ETA: 12s - loss: 1.4226

Conclusion: Adam is the best optimizer to use for our scenario. We may want to cut down on the number of categorical variables to prevent overfitting....

In [None]:
n = len(optimizers)
m = len(epochs)
means = np.zeros((n,m))
for i in range(0,n): 
    for j in range(0,m): 
        means[i][j] = losses[i][j].mean()

In [None]:
# Look at V
epochs = [6,9, 13,26]
optimizers = ['Adam']
mae = np.zeros(len(epochs))
i = 0
for e in epochs: 
    loss = tensor_train_spec(V_train,w_train, 'Adam', 8, e)
    print(loss)
    print(loss.mean())
    mae[i] = loss.mean()
    i += 1

In [None]:
Result: 9 epochs appears to be the best
    
6 epochs: 1.2728947401046753
9 epochs: 1.2104717016220092
13 epochs: 1.2120121955871581
26 epochs: 1.265834665298462

In [None]:
# Alternate learning rate
# Didn't end up doing this fully as preliminary results were not positive
decay_steps = 1000
initial_learning_rate = .0001
lr_schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate, decay_steps, warmup_target= None,
    warmup_steps=0
)
opt = keras.optimizers.Adam(learning_rate=lr_schedule)
loss = tensor_train_spec(V_train,w_train, opt, 8, 5)

# Test final model

# Conclusions