## Week 4: Supervised Learning III
### Neural Networks
This week we will use PyTorch, a widely used library for neural networks. Our goal:
* Apply the perceptron and multilayer perceptron from the Week 4 lecture to automatically detect the number of days of ground frost and snow from other weather variables.


To achieve the above goal, we first need to separate the dataset into training and testing sets. We'll apply a 70:30 split and then split the 30% portion 50:50 into validation and test sets, which gives roughly 70% training, 15% validation, and 15% test.

We'll first import the CSV file, read it, and then apply the split.

In [2]:
import numpy as np
import csv

file_path = "/Users/suli/Documents/source/repo/MachineLearning/Week 4/curated_data_1month_2010-2022_nonans.csv"

data_list = []
with open(file_path, newline='') as csvfile:
    reader = csv.reader(csvfile)
    row_count = 0
    for row in reader:
        if row_count > 0:
            data_list.append(row)
        row_count += 1
        
data = np.array(data_list)
print(data.shape)
print(data[0])

feat_col = [5, 6, 7, 8, 9, 10, 12]
ground_frost_col = 4
snow_col = 11  # column 11 is snow; column 12 is rain (used as a feature above)

feats = data[:, feat_col]
ground_frost_label = data[:, ground_frost_col]
snow_label = data[:, snow_col]

print("\n A peek at the dataset features: \n"+str(feats))
print("\n A peek at the ground frost labels: \n"+str(ground_frost_label))
print("\n A peek at the snow labels: \n"+str(snow_label))

(10296, 13)
['8' '4' '1' '1' '9.84928987' '89.36982749' '1022.665365' '64.51156417'
 '9.12500556' '6.45810733' '6.727447717' '0.697199793' '112.2352382']

 A peek at the dataset features: 
[['89.36982749' '1022.665365' '64.51156417' ... '6.45810733'
  '6.727447717' '112.2352382']
 ['89.44621093' '1022.708003' '57.48681167' ... '5.881910517'
  '6.230648281' '116.3547495']
 ['89.34354469' '1022.436839' '68.29351494' ... '4.628301269'
  '6.290806564' '57.53778808']
 ...
 ['87.83702931' '1006.457057' '13.8800195' ... '4.960640256'
  '1.856263012' '177.2424627']
 ['88.81163154' '1006.622483' '20.58531624' ... '4.936354968'
  '0.775835354' '135.4028786']
 ['82.76015165' '1005.938301' '10.53091935' ... '8.380819417'
  '3.545097582' '140.831213']]

 A peek at the ground frost labels: 
['9.84928987' '10.85267889' '12.97189949' ... '21.7275541' '23.77582838'
 '17.35386163']

 A peek at the snow labels: 
['0.697199793' '1.629525681' '1.172937726' ... '7.997095434' '8.468158997'
 '6.3599061']
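
Because `csv.reader` yields strings, the arrays printed above hold text (note the quotes around every value) rather than numbers. A minimal sketch of converting such an array to floats is shown here. The rows are a small made-up stand-in for the real file, and the names `text_data` and `numeric_data` are illustrative only:

```python
import numpy as np

# Made-up rows standing in for what csv.reader returns (all strings).
rows = [["8", "4", "9.84928987"], ["8", "5", "10.85267889"]]

text_data = np.array(rows)              # dtype is a unicode string type
numeric_data = text_data.astype(float)  # parse every entry as a float

print(text_data.dtype, numeric_data.dtype)
print(numeric_data)
```

The same `astype(float)` call could be applied to `data` before selecting the feature and label columns, assuming every column in the file is numeric.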


Next, we split the data into training, validation, and test sets.

In [3]:
from sklearn.model_selection import train_test_split

all_ids = np.arange(feats.shape[0])
print(all_ids)

random_seed = 1

train_set_id, rem_set_id = train_test_split(all_ids, test_size=0.3, train_size=0.7, random_state=random_seed, shuffle=True)

validate_set_id, test_set_id = train_test_split(rem_set_id, test_size=0.5, train_size=0.5, random_state=random_seed, shuffle=True)

train_set_data = feats[train_set_id,:]
train_ground_frost_labels = ground_frost_label[train_set_id]
train_snow_labels = snow_label[train_set_id]

validate_set_data = feats[validate_set_id,:]
validate_ground_frost_labels = ground_frost_label[validate_set_id]
validate_snow_labels = snow_label[validate_set_id]

test_set_data = feats[test_set_id,:]
test_ground_frost_labels = ground_frost_label[test_set_id]
test_snow_labels = snow_label[test_set_id]



[    0     1     2 ... 10293 10294 10295]
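
To make the 70:30 then 50:50 arithmetic concrete, the sketch below carves the same number of row ids into three disjoint groups. It deliberately uses a plain NumPy permutation instead of `train_test_split`, so the exact ids and rounding may differ from the cell above. The names `shuffled`, `n_train`, and `n_val` are illustrative assumptions:

```python
import numpy as np

n_samples = 10296                 # row count printed by the loading cell
rng = np.random.default_rng(1)
shuffled = rng.permutation(n_samples)

n_train = int(0.7 * n_samples)        # 70% for training
n_val = (n_samples - n_train) // 2    # half of the remaining 30%

train_ids = shuffled[:n_train]
val_ids = shuffled[n_train:n_train + n_val]
test_ids = shuffled[n_train + n_val:]

# The three groups cover every row exactly once.
print(len(train_ids), len(val_ids), len(test_ids))
```

The point is only that the three index sets are disjoint and together cover the whole dataset, at roughly 70%, 15%, and 15%.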


Perfect, now we have all of our data ready. Before moving to the next step, it's better to scale our features. `StandardScaler` standardizes each feature to zero mean and unit variance (it does not map values onto a 0-1 range, which is why the scaled values below include negatives). This usually helps a neural network train: if one feature has values in the range [0, 1] and another is in the range [1000, 5000], the model may assign more importance to the larger values.


In [4]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(feats)
scaled_feats = scaler.transform(feats)
print("\n A peek at the scaled dataset features: \n"+str(scaled_feats))


scaled_train_data = scaled_feats[train_set_id, :]
scaled_val_data = scaled_feats[validate_set_id, :]
scaled_test_data = scaled_feats[test_set_id, :]


 A peek at the scaled dataset features: 
[[ 1.32716655  1.57084147 -0.88330353 ...  1.51819974 -0.59716362
   0.21295927]
 [ 1.34196535  1.57808582 -0.99442844 ...  1.03005351 -0.70725389
   0.27324641]
 [ 1.32207443  1.53201409 -0.82347666 ... -0.03198748 -0.69392287
  -0.58751254]
 ...
 [ 1.03019683 -1.18300764 -1.68424647 ...  0.24956567 -1.67661336
   1.16430985]
 [ 1.2190197  -1.15490117 -1.57817505 ...  0.2289915  -1.91603506
   0.55200694]
 [ 0.04658465 -1.27114612 -1.73722605 ...  3.1470957  -1.30236928
   0.6314481 ]]
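
The cell above fits the scaler on every row, including the validation and test rows. A common alternative is to compute the mean and standard deviation from the training rows only and reuse them for the other splits, so that no information from held-out data reaches the model. A minimal sketch of that idea follows, written in plain NumPy so the arithmetic is visible. The data is synthetic and the `_demo` names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features: one small-scale column and one large-scale column.
feats_demo = rng.normal(loc=[0.5, 3000.0], scale=[0.2, 800.0], size=(10, 2))
train_ids_demo = np.arange(7)      # pretend the first 7 rows are training
test_ids_demo = np.arange(7, 10)   # and the last 3 rows are held out

# Statistics come from the training rows only.
train_mean = feats_demo[train_ids_demo].mean(axis=0)
train_std = feats_demo[train_ids_demo].std(axis=0)  # population std (ddof=0)

# The same statistics are applied to every split.
scaled_train_demo = (feats_demo[train_ids_demo] - train_mean) / train_std
scaled_test_demo = (feats_demo[test_ids_demo] - train_mean) / train_std

print(scaled_train_demo.mean(axis=0))  # approximately 0 for each column
print(scaled_train_demo.std(axis=0))   # approximately 1 for each column
```

With scikit-learn the equivalent would be to call `scaler.fit` on the training rows and `scaler.transform` on each split.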


Perfect, now we're ready to train our perceptron!

We start by multiplying x and w, where:
- x: input features (data points as a matrix)
- w: weight vector (learned parameters of the perceptron)
- the dot product xw gives a raw score that determines classification.

Next we apply the activation function, sign(xw):
- The perceptron uses a step function as its activation function.
- The function determines the class based on the sign of xw: scores that are zero or positive map to +1, and negative scores map to -1.


In [5]:
# Let's build a perceptron from scratch.
def predict(x, w):
    # Raw scores: dot product of each row of x with the weight vector w
    y_cap = np.matmul(x, w)
    # Activation sign(xw): scores >= 0 become +1, scores < 0 become -1
    np.place(y_cap, y_cap >= 0, 1)
    np.place(y_cap, y_cap < 0, -1)
    return y_cap
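
As a quick sanity check, the function can be run on a tiny hand-made input. The sketch below restates `predict` so it runs on its own. The values of `x_demo` and `w_demo` are made up for illustration and are not learned weights:

```python
import numpy as np

def predict(x, w):
    # Raw scores, then the sign activation (0 counts as +1).
    y_cap = np.matmul(x, w)
    np.place(y_cap, y_cap >= 0, 1)
    np.place(y_cap, y_cap < 0, -1)
    return y_cap

x_demo = np.array([[2.0, 0.5],    # score  2*0.5 - 0.5 =  0.5 -> +1
                   [-1.0, 0.5],   # score -0.5 - 0.5   = -1.0 -> -1
                   [0.0, 0.0]])   # score  0.0                -> +1
w_demo = np.array([0.5, -1.0])

print(predict(x_demo, w_demo))  # [ 1. -1.  1.]
```

Note that a score of exactly zero is assigned to the +1 class, which follows from the `>=` comparison in the function.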


    