# Basic Neural Net from scratch

The purpose of this notebook is to build a simple neural net from scratch, using only PyTorch.

To do this, I need: 
 - linear
 - Relu
 - sigmoid
 - softmax
 - a backward pass, i.e. the parameters and their gradients
 - some kind of loss function (MSE or RMSE were candidates; cross entropy loss it is)
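
A minimal sketch of the building blocks from the list above, written with plain tensor operations. The function names and the toy input are my own choices for illustration; the built-in versions (`torch.relu`, `torch.sigmoid`, `torch.softmax`) are only used to compare the results.

```python
import torch

def linear(x, w, b):
    # x: [rows, in_features], w: [in_features, out_features], b: [out_features]
    return x @ w + b

def relu(x):
    # replace every negative value by zero
    return torch.clamp(x, min=0.0)

def sigmoid(x):
    # squash every value into the range (0, 1)
    return 1 / (1 + torch.exp(-x))

def softmax(x, dim=-1):
    # subtract the maximum first, so that exp() cannot overflow
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)

x = torch.tensor([[-1.0, 0.0, 2.0]])  # toy input, not from the data set
print(relu(x))
print(sigmoid(x))
print(softmax(x))
```

Each row of the softmax output sums to one, which is why it is used to turn scores into class probabilities.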
 

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range
        
    def forward(self, x):
        users = self.user_factors[x[:,0]]
        movies = self.movie_factors[x[:,1]]
        res = (users*movies).sum(dim=1)
        res += self.user_bias[x[:,0]] + self.movie_bias[x[:,1]]
        return sigmoid_range(res, *self.y_range)



Before building the model, do not forget to clean the data and transform it so that most of the variation lies between 0 and 1.

For example, divide each column by its maximum, so that it becomes like a percentage, or apply a sigmoid.
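
A small sketch of the "divide by its maximum" idea on a toy frame. The column names mirror the data set used further down, but the frame `toy` and the variable `scaled` are only assumptions for illustration.

```python
import pandas as pd

# toy frame with one price column and one volume column
toy = pd.DataFrame({
    "close": [27096.9, 27047.0, 27054.9],
    "vol": [386.675, 408.68, 275.08],
})

# toy.max() holds one maximum per column, so every column is scaled on its own
scaled = toy / toy.max()
print(scaled)
```

After this step the largest value in every column is exactly 1, so prices and volumes live on a comparable scale.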

## import data and shift it

In [1]:
import pandas as pd

import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    df = pd.read_csv("/kaggle/input/btcusdt-2023-6-9/btcusdt-2023-6_9.csv", index_col=0).reset_index(drop=True)
else:
    df = pd.read_csv("lesson5-random-forests/btc-data/btcusdt-2023-6_9.csv", index_col=0).reset_index(drop=True)


print(df.shape)
df.head(3)

(11716, 6)


Unnamed: 0,time,open,high,low,close,vol
0,2023.06.01 00:00,27103.1,27108.1,27080.6,27096.9,386.675
1,2023.06.01 00:15,27096.9,27096.9,27036.7,27047.0,408.68
2,2023.06.01 00:30,27047.0,27077.4,27041.0,27054.9,275.08


In [2]:
# shift the data by 1, 2 and 3 rows, so that one row also holds the information of the last 3 candles
# therefore: the original (unshifted) candle data is the target data

df_s1 = df.shift(1).add_suffix("_s1")
df_s2 = df.shift(2).add_suffix("_s2")
df_s3 = df.shift(3).add_suffix("_s3")

print(df_s3.shape)
df_s3.head(3)

(11716, 6)


Unnamed: 0,time_s3,open_s3,high_s3,low_s3,close_s3,vol_s3
0,,,,,,
1,,,,,,
2,,,,,,
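
To see what the shifting does, here is a toy sketch with a single made-up column and two lags instead of three. The frame `toy` and the name `lagged` are assumptions for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# shift(k) moves every value k rows down, so row i holds the value of row i-k
lagged = pd.concat(
    [toy.shift(2).add_suffix("_s2"), toy.shift(1).add_suffix("_s1"), toy],
    axis=1,
).dropna()  # the first rows have no history and are dropped

print(lagged)
```

The first remaining row reads 10, 11, 12: the two previous candles followed by the current one, which is the target.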


In [3]:
df_merge = pd.concat([df_s3,df_s2, df_s1, df], axis=1)
print(df_merge.shape)
df_merge.head(5)

(11716, 24)


Unnamed: 0,time_s3,open_s3,high_s3,low_s3,close_s3,vol_s3,time_s2,open_s2,high_s2,low_s2,...,high_s1,low_s1,close_s1,vol_s1,time,open,high,low,close,vol
0,,,,,,,,,,,...,,,,,2023.06.01 00:00,27103.1,27108.1,27080.6,27096.9,386.675
1,,,,,,,,,,,...,27108.1,27080.6,27096.9,386.675,2023.06.01 00:15,27096.9,27096.9,27036.7,27047.0,408.68
2,,,,,,,2023.06.01 00:00,27103.1,27108.1,27080.6,...,27096.9,27036.7,27047.0,408.68,2023.06.01 00:30,27047.0,27077.4,27041.0,27054.9,275.08
3,2023.06.01 00:00,27103.1,27108.1,27080.6,27096.9,386.675,2023.06.01 00:15,27096.9,27096.9,27036.7,...,27077.4,27041.0,27054.9,275.08,2023.06.01 00:45,27054.9,27084.0,27054.8,27084.0,218.143
4,2023.06.01 00:15,27096.9,27096.9,27036.7,27047.0,408.68,2023.06.01 00:30,27047.0,27077.4,27041.0,...,27084.0,27054.8,27084.0,218.143,2023.06.01 01:00,27084.0,27113.9,27073.5,27100.0,329.412


In [4]:
df_merge.columns

Index(['time_s3', 'open_s3', 'high_s3', 'low_s3', 'close_s3', 'vol_s3',
       'time_s2', 'open_s2', 'high_s2', 'low_s2', 'close_s2', 'vol_s2',
       'time_s1', 'open_s1', 'high_s1', 'low_s1', 'close_s1', 'vol_s1', 'time',
       'open', 'high', 'low', 'close', 'vol'],
      dtype='object')

In [5]:
# do not use the time columns 

df_train = df_merge.filter(items = ['open_s3', 'high_s3', 'low_s3', 'close_s3', 'vol_s3',
       'open_s2', 'high_s2', 'low_s2', 'close_s2', 'vol_s2',
       'open_s1', 'high_s1', 'low_s1', 'close_s1', 'vol_s1', 
       'open', 'high', 'low', 'close', 'vol']).dropna()
print(df_train.shape)
df_train.head(3)

(11713, 20)


Unnamed: 0,open_s3,high_s3,low_s3,close_s3,vol_s3,open_s2,high_s2,low_s2,close_s2,vol_s2,open_s1,high_s1,low_s1,close_s1,vol_s1,open,high,low,close,vol
3,27103.1,27108.1,27080.6,27096.9,386.675,27096.9,27096.9,27036.7,27047.0,408.68,27047.0,27077.4,27041.0,27054.9,275.08,27054.9,27084.0,27054.8,27084.0,218.143
4,27096.9,27096.9,27036.7,27047.0,408.68,27047.0,27077.4,27041.0,27054.9,275.08,27054.9,27084.0,27054.8,27084.0,218.143,27084.0,27113.9,27073.5,27100.0,329.412
5,27047.0,27077.4,27041.0,27054.9,275.08,27054.9,27084.0,27054.8,27084.0,218.143,27084.0,27113.9,27073.5,27100.0,329.412,27100.0,27159.0,27100.0,27142.4,979.655


In [6]:
#df_train
#df_train.filter(items = ["open_s3"])
#df_train.loc[["open_s3"]]



# fastai neural net from scratch

https://www.kaggle.com/code/jhoward/linear-model-and-neural-net-from-scratch

In [7]:
import torch
from torch import tensor

from https://pytorch.org/tutorials/beginner/nn_tutorial.html

"PyTorch provides methods to create random or zero-filled tensors, which we will use to create our weights and bias for a simple linear model. These are just regular tensors, with one very special addition: we tell PyTorch that they require a gradient. **This causes PyTorch to record all of the operations done on the tensor, so that it can calculate the gradient during back-propagation automatically!**

For the weights, we set requires_grad after the initialization, since we don’t want that step included in the gradient. (Note that a trailing _ in PyTorch signifies that the operation is performed in-place.)"
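
A tiny sketch of what "recording the operations" means in practice. The tensor `w` and the toy loss are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# every operation on w is recorded, because w requires a gradient
loss = (w ** 2).sum()

# backward() walks the recorded operations and fills w.grad
loss.backward()
print(w.grad)  # derivative of sum(w^2) is 2*w
```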

## dep, indep & weights

In this setting we will just try to predict the high value based on the previous three candles. The results will likely not be good since point predictions are tricky.

In [8]:
t_dep = tensor(df_train['high'].values, dtype=torch.float)
print(t_dep.shape) # 1 D tensor
t_dep

torch.Size([11713])


tensor([27084.0000, 27113.9004, 27159.0000,  ..., 27058.1992, 27061.1992,
        27066.6992])

All columns except `high`, which is the dependent variable, are used as independent variables.

In [9]:
t_indep = tensor(df_train.loc[:, df_train.columns != "high"].values, dtype=torch.float)

t_indep.shape # 2 D tensor > rows with observations and columns with features

torch.Size([11713, 19])

The number of features needs to be the same as the length of the weights, because they will be multiplied with each other.

In [10]:
import math

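n,c = t_indep.shape

# "We are initializing the weights here with Xavier initialisation (by multiplying with 1/sqrt(n))."
# note: in the quoted tutorial n is the number of inputs (here c = 19);
# in this cell n is the number of rows, so the weights come out smaller than with that scaling
weights = torch.randn(c) / math.sqrt(n)
weights.requires_grad_()
# note: this creates one bias per row (length n); a linear model with one output would normally use a single bias
bias = torch.zeros(n, requires_grad=True)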
weights.shape

torch.Size([19])
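
For comparison, a sketch of the same initialisation scaled by the number of inputs, which is what the quoted tutorial divides by, together with a single bias. The names `n_features` and `x` and the random toy input are assumptions; this is not the cell that produced the outputs in this notebook.

```python
import math
import torch

torch.manual_seed(0)

n_features = 19  # same as t_indep.shape[1] in this notebook

# scale by the number of inputs, then switch on gradient tracking
weights = torch.randn(n_features) / math.sqrt(n_features)
weights.requires_grad_()

# one bias for the single output, broadcast over all rows
bias = torch.zeros(1, requires_grad=True)

x = torch.randn(5, n_features)  # toy input with 5 rows
preds = x @ weights + bias
print(preds.shape)
```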

In [11]:
t_indep*weights

tensor([[ 2.5784e+02, -4.4804e+01,  1.3625e+02,  ..., -9.5710e+01,
          3.6488e+02, -7.2410e-02],
        [ 2.5778e+02, -4.4785e+01,  1.3603e+02,  ..., -9.5776e+01,
          3.6509e+02, -1.0934e-01],
        [ 2.5730e+02, -4.4753e+01,  1.3605e+02,  ..., -9.5870e+01,
          3.6566e+02, -3.2518e-01],
        ...,
        [ 2.5709e+02, -4.4667e+01,  1.3589e+02,  ..., -9.5576e+01,
          3.6452e+02, -6.5783e-02],
        [ 2.5697e+02, -4.4688e+01,  1.3590e+02,  ..., -9.5682e+01,
          3.6448e+02, -6.0421e-02],
        [ 2.5711e+02, -4.4698e+01,  1.3592e+02,  ..., -9.5709e+01,
          3.6464e+02, -7.5299e-02]], grad_fn=<MulBackward0>)

### Sidebar: torch.randn
short check of the torch.randn function: https://pytorch.org/docs/stable/generated/torch.randn.html#torch.randn

In [12]:
torch.randn(19, 10)

tensor([[ 1.2629,  0.3949,  0.1772,  0.3931, -0.6135,  1.4620,  0.5037,  0.4781,
         -0.3800,  0.9131],
        [ 1.5714,  1.6454,  1.8434, -1.2316,  1.4499,  0.0425, -0.6158,  0.7689,
          0.5111,  1.7750],
        [-0.4827,  0.7573,  0.6069, -0.9836, -0.8243, -0.2080,  0.3320, -0.0388,
          0.7955, -0.6972],
        [-0.6047,  0.9427,  1.1139,  0.2379,  1.2735,  0.0436,  1.4785, -0.1974,
         -0.1995,  0.0701],
        [ 1.4937, -1.3698,  1.1514,  1.2670,  0.4267, -0.8906,  0.5688, -1.0516,
          1.2847,  0.3744],
        [-0.8127, -0.6426, -0.3871,  0.0048, -0.7318,  0.8114, -0.4377,  0.1703,
          0.9767, -0.6101],
        [ 0.4031, -1.1360, -0.7706, -1.0937, -1.2608,  1.0873, -0.2856,  0.5030,
          0.8613, -0.5208],
        [ 0.6159, -1.6401, -0.5317,  0.6502, -0.1975, -0.9155,  0.9679, -0.4870,
          1.0505, -0.7849],
        [ 0.3731,  0.5211,  1.5561,  0.1373, -0.1502, -0.1318,  0.5729,  0.3775,
          0.8752, -0.1949],
        [ 0.2111,  

In [13]:

# coefficients = weights
# draw them from a uniform distribution between -0.5 and 0.5 (torch.rand is uniform on [0, 1))
coeffs = torch.rand( t_indep.shape[1] )-0.5

print(coeffs.shape)
coeffs

torch.Size([19])


tensor([ 0.1709,  0.4184, -0.4363,  0.4141,  0.0068, -0.0100,  0.0282,  0.3260,
        -0.4098,  0.1290,  0.0705, -0.0140,  0.4278,  0.4106, -0.0672,  0.2847,
        -0.1994,  0.0847, -0.2743])

In [14]:
(t_indep*coeffs)[0]

tensor([ 4.6318e+03,  1.1342e+04, -1.1815e+04,  1.1220e+04,  2.6103e+00,
        -2.7026e+02,  7.6313e+02,  8.8127e+03, -1.1083e+04,  5.2717e+01,
         1.9077e+03, -3.7909e+02,  1.1568e+04,  1.1109e+04, -1.8496e+01,
         7.7032e+03, -5.3936e+03,  2.2949e+03, -5.9834e+01])

As you can see, the products vary widely in magnitude: the prices are in the tens of thousands (about 27,000 in the rows shown), while the volumes are in the hundreds.

Note to myself: if it does not interfere with cross entropy, I should take the log of all prices.
Cross entropy uses a log in its own calculation, and whether the two interfere depends on the implementation (my guess for now).

## making the first prediction

Since we are building a linear model, the prediction $\hat{y}$ is the sum of the products of coefficients and observations.

In other words:

$ y = x_0 \times b_0 + x_1 \times b_1 + \dots $

where $x_0$ is the first feature (a vector with one observation per row) and $b_0$ is the first coefficient.

$y$ is then the vector of predictions, with the same length as the data frame, so the formula can be read row by row.
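
A toy sketch of the formula with two features and three rows. The values of `x` and `b` are made up; the point is that the row-wise sum of products is the same as a matrix-vector product.

```python
import torch

x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])   # 3 rows, 2 features
b = torch.tensor([0.5, -1.0])   # one coefficient per feature

# b is broadcast over the rows, then every row is summed
rowwise = (x * b).sum(dim=1)

# the same calculation as a matrix-vector product
matmul = x @ b

print(rowwise)  # tensor([-1.5000, -2.5000, -3.5000])
print(matmul)
```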

In [15]:
preds = (t_indep*coeffs).sum(axis=1)
print(preds.shape) # 1d vector of predictions
preds[0]

torch.Size([11713])


tensor(42388.0156)

In [16]:
# quick check for first row: 

print(t_indep[0])
print(coeffs)
print("\n first entry of Indep * coeffs: \n",t_indep[0][0]*coeffs[0])
print("\n Indep * coeffs: \n",t_indep[0]*coeffs)
print("\n sum of Indep * coeffs: \n", (t_indep[0]*coeffs).sum() ) # a single row is 1-D, so sum() adds over its columns


tensor([27103.0996, 27108.0996, 27080.5996, 27096.9004,   386.6750, 27096.9004,
        27096.9004, 27036.6992, 27047.0000,   408.6800, 27047.0000, 27077.4004,
        27041.0000, 27054.9004,   275.0800, 27054.9004, 27054.8008, 27084.0000,
          218.1430])
tensor([ 0.1709,  0.4184, -0.4363,  0.4141,  0.0068, -0.0100,  0.0282,  0.3260,
        -0.4098,  0.1290,  0.0705, -0.0140,  0.4278,  0.4106, -0.0672,  0.2847,
        -0.1994,  0.0847, -0.2743])

 first entry of Indep * coeffs: 
 tensor(4631.8369)

 Indep * coeffs: 
 tensor([ 4.6318e+03,  1.1342e+04, -1.1815e+04,  1.1220e+04,  2.6103e+00,
        -2.7026e+02,  7.6313e+02,  8.8127e+03, -1.1083e+04,  5.2717e+01,
         1.9077e+03, -3.7909e+02,  1.1568e+04,  1.1109e+04, -1.8496e+01,
         7.7032e+03, -5.3936e+03,  2.2949e+03, -5.9834e+01])

 sum of Indep * coeffs: 
 tensor(42388.0156)


## loss: cross entropy 

In [17]:
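import torch.nn.functional as F
# note: cross entropy expects class scores and class targets; it is applied to the raw price predictions here only to try it out
loss_ce = F.cross_entropy(preds, t_dep)
loss_mae = F.l1_loss(preds, t_dep)
loss_mse = F.mse_loss(preds, t_dep)  # was assigned to loss_mae, which overwrote the MAE

# note: the output recorded below comes from the run before this fix, so both printed values are the MSE
print("mean absolute error:",loss_mae)
print("mean squared error:",loss_mse)
print("cross entropy:",loss_ce)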

mean absolute error: tensor(2.4529e+08)
mean squared error: tensor(2.4529e+08)
cross entropy: tensor(2.1783e+12)
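
For reference, a sketch of the input that `F.cross_entropy` is designed for: one row of scores (logits) per observation and one class index per observation. The toy logits and targets are made up and have nothing to do with the price data; the manual calculation is only there to show what the function computes.

```python
import torch
import torch.nn.functional as F

# 3 observations, 2 classes (for example "down" and "up")
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.5],
                       [1.0, 1.0]])
targets = torch.tensor([0, 1, 1])  # index of the correct class per row

loss = F.cross_entropy(logits, targets)

# the same by hand: log-softmax, pick the log-probability of the correct class, average
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()

print(loss, manual)
```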


## gradients & learning rate

In [18]:
coeffs

tensor([ 0.1709,  0.4184, -0.4363,  0.4141,  0.0068, -0.0100,  0.0282,  0.3260,
        -0.4098,  0.1290,  0.0705, -0.0140,  0.4278,  0.4106, -0.0672,  0.2847,
        -0.1994,  0.0847, -0.2743])

In [19]:
coeffs.requires_grad_()

tensor([ 0.1709,  0.4184, -0.4363,  0.4141,  0.0068, -0.0100,  0.0282,  0.3260,
        -0.4098,  0.1290,  0.0705, -0.0140,  0.4278,  0.4106, -0.0672,  0.2847,
        -0.1994,  0.0847, -0.2743], requires_grad=True)

In [20]:
# I am not sure if the above way works if I only use the loss functions in a functional style
# therefore, I document a way to use it instantiated as a class

from torch import nn

cel = nn.CrossEntropyLoss()

preds = (t_indep*coeffs.requires_grad_()).sum(axis=1)

loss = cel(preds,t_dep)
print(loss)
loss.backward(retain_graph=True)

gradients = coeffs.grad
print(gradients)

tensor(2.1783e+12, grad_fn=<DivBackward1>)
tensor([9.5942e+11, 9.6102e+11, 6.0714e+11, 7.1157e+11, 1.4144e+13, 7.1157e+11,
        7.0193e+11, 4.3312e+11, 6.5440e+11, 1.2616e+13, 6.5440e+11, 7.0848e+11,
        5.9808e+11, 7.0733e+11, 4.3423e+12, 7.0733e+11, 6.6725e+11, 6.6860e+11,
        1.6099e+12])


In [21]:
loss.backward(retain_graph=True)
print(coeffs.grad)

tensor([1.9188e+12, 1.9220e+12, 1.2143e+12, 1.4231e+12, 2.8289e+13, 1.4231e+12,
        1.4039e+12, 8.6624e+11, 1.3088e+12, 2.5231e+13, 1.3088e+12, 1.4170e+12,
        1.1962e+12, 1.4147e+12, 8.6847e+12, 1.4147e+12, 1.3345e+12, 1.3372e+12,
        3.2198e+12])
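
The gradients printed by the second `backward()` call are twice those of the first call, because PyTorch adds new gradients to `.grad` instead of overwriting it. A toy sketch of this behaviour (the tensors `w` and `x` are made up):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

loss = (w * x).sum()
loss.backward()
first = w.grad.clone()    # gradient after one backward pass

loss = (w * x).sum()
loss.backward()
second = w.grad.clone()   # the new gradient was added on top of the old one

w.grad.zero_()            # reset the accumulated gradient in place
loss = (w * x).sum()
loss.backward()
third = w.grad.clone()    # back to the value of a single pass

print(first, second, third)
```

This is why the gradients are usually set to zero after every update step.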


In [22]:
# initialise coeffs and define the independent variable 
t_indep*coeffs

# define it as a function 
def calc_preds(coeffs, indeps): return (indeps*coeffs).sum(axis=1)
    
preds = (t_indep*coeffs).sum(axis=1)

In [23]:
# define the loss function

# here it is the mean absolute error
loss = torch.abs(preds-t_dep).mean()

# define it in a function
def calc_loss(coeffs, indeps, deps): return torch.abs(calc_preds(coeffs, indeps)-deps).mean()

# next steps

 - figure out when and how to use backward() and requires_grad
 - put the pieces together: calc preds > calc loss > get gradients > subtract gradients*learning rate from the coeffs

Note on gradients:
My current understanding is that the tensor for which gradients are wanted has to be marked with requires_grad, so that they can be accessed explicitly through its .grad attribute.
backward() is called on the loss; the inputs, targets and weights enter through the computation that produced the loss.
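
A sketch of how the pieces could fit together, using the `calc_preds` and `calc_loss` functions from above. The synthetic data, the zero start values, the learning rate and the number of steps are assumptions chosen so that the example runs on its own; they are not tuned for the candle data.

```python
import torch

torch.manual_seed(0)

# synthetic data on a small scale: 200 rows, 3 features, known coefficients
indeps = torch.rand(200, 3)
true_coeffs = torch.tensor([0.5, -0.2, 0.1])
deps = indeps @ true_coeffs

def calc_preds(coeffs, indeps): return (indeps*coeffs).sum(axis=1)
def calc_loss(coeffs, indeps, deps): return torch.abs(calc_preds(coeffs, indeps)-deps).mean()

coeffs = torch.zeros(3, requires_grad=True)
lr = 0.05
losses = []

for step in range(200):
    loss = calc_loss(coeffs, indeps, deps)  # calc preds > calc loss
    loss.backward()                         # get gradients
    with torch.no_grad():                   # the update itself must not be recorded
        coeffs.sub_(coeffs.grad * lr)       # subtract gradients * learning rate from the coeffs
        coeffs.grad.zero_()                 # reset, otherwise the gradients accumulate
    losses.append(loss.item())

print(losses[0], losses[-1])
```

With unscaled prices in the tens of thousands the same loop would probably need a much smaller learning rate, which is one more reason to scale the data first.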

# pytorch beginner tutorial by jeremy howard

https://pytorch.org/tutorials/beginner/nn_tutorial.html

In [24]:
import torch
xb = torch.randn(64, 784)
weights = torch.randn(784, 10)

In [25]:
pred = xb@weights
pred.shape

torch.Size([64, 10])

In [26]:
print(xb.shape)
print(weights.shape)

torch.Size([64, 784])
torch.Size([784, 10])


# pytorch basics tutorial
see: https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

In [141]:
# define the model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [142]:
model = NeuralNetwork()
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
  )
)


In [None]:
X = torch.rand(1, 28, 28)  # the tutorial passes device=device; no device is defined in this notebook, so X stays on the CPU like the model
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

In [None]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)

### Sidebar: nn.Linear

In [115]:
input_image = torch.ones(3,2,2)
print(input_image.size())

input_image[1].add_(1)
input_image[2].add_(2)
input_image[0,1,1].add_(1)

input_image

torch.Size([3, 2, 2])


tensor([[[1., 1.],
         [1., 2.]],

        [[2., 2.],
         [2., 2.]],

        [[3., 3.],
         [3., 3.]]])

In [116]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
flat_image

torch.Size([3, 4])


tensor([[1., 1., 1., 2.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [124]:

layer1 = nn.Linear(in_features=2*2, out_features=2,bias=False)
import torch.nn.init as init
init.ones_(layer1.weight)

linear_layer = layer1(flat_image)

print(f"\nlinear layer shape:{linear_layer.size()}" )
w = layer1.weight # has the shape [out_features, in_features], here a 2x4 matrix
print(f"\nweight shape:{w.shape} = [out_features,in_features]")
print("\nlinear layer weights:\n",w)
#print("\noutput divided by input\n",linear_layer/flat_image)
print(f"\ny=output = \n{linear_layer}")
# output should be: 
flat_image@w.T


linear layer shape:torch.Size([3, 2])

weight shape:torch.Size([2, 4]) = [out_features,in_features]

linear layer weights:
 Parameter containing:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]], requires_grad=True)

y=output = 
tensor([[ 5.,  5.],
        [ 8.,  8.],
        [12., 12.]], grad_fn=<MmBackward0>)


tensor([[ 5.,  5.],
        [ 8.,  8.],
        [12., 12.]], grad_fn=<MmBackward0>)

Therefore the transformation is $ y = xW^T+b $, where x is the input (with in_features columns), y is the output (with out_features columns), W is the weight matrix and b is the bias.

 - nn.Linear creates a randomly initialised weight matrix of shape [out_features, in_features]
 - each output value is the dot product of one input row with one row of the weight matrix
 - optionally a bias term is added

Because all weights were set to one here, each output value is simply the sum of the corresponding input row (5, 8 and 12). In general the input needs in_features columns so that it can be multiplied with $W^T$.

see: https://ashwinhprasad.medium.com/pytorch-for-deep-learning-nn-linear-and-nn-relu-explained-77f3e1007dbb
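
A short check of the formula with the default random weights and a bias. The layer sizes match the example above; the names `layer` and `manual` are my own.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=2)  # random weights, bias included
x = torch.tensor([[1., 1., 1., 2.],
                  [2., 2., 2., 2.],
                  [3., 3., 3., 3.]])

# y = x W^T + b, written out by hand
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)  # torch.Size([2, 4]) = [out_features, in_features]
print(layer(x))
print(manual)
```

Both printed results agree, so nothing more than a matrix multiplication plus a bias happens inside the layer.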

## sidebar: tensors

One way to think about a tensor is as a nested list. The tensor below has the shape (3, 2, 4, 2): the outermost level has 3 elements, each of which contains 2 elements, each of which contains 4 elements, each of which is a vector of 2 numbers.

see also: 
 - https://stackoverflow.com/questions/52370008/understanding-pytorch-tensor-shape
 - https://towardsdatascience.com/understanding-dimensions-in-pytorch-6edf9972d3be
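
A sketch of how indexing peels off one dimension at a time. It uses `torch.arange` instead of random numbers, so that the values are easy to follow; the name `t` is my own.

```python
import torch

# 48 consecutive numbers arranged in the shape (3, 2, 4, 2)
t = torch.arange(3 * 2 * 4 * 2).reshape(3, 2, 4, 2)

print(t.shape)        # torch.Size([3, 2, 4, 2])
print(t[2].shape)     # torch.Size([2, 4, 2])  -> one index removes the first dimension
print(t[2, 1].shape)  # torch.Size([4, 2])
print(t[2, 1, 3])     # tensor([46, 47])       -> the innermost vector of length 2
```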

In [138]:
test = torch.randn(3,2,4,2)
test

tensor([[[[-1.5035,  0.2602],
          [-0.9302,  2.2516],
          [-0.4207,  0.4595],
          [-0.7512,  0.7634]],

         [[-0.5143, -0.5848],
          [-0.2461,  1.7794],
          [ 0.7803, -1.1677],
          [ 3.1528,  0.2449]]],


        [[[-0.3602, -0.3408],
          [ 0.1743, -0.6936],
          [ 0.0032,  1.9898],
          [ 0.1117, -1.2837]],

         [[-0.5208,  2.4331],
          [ 1.5974, -2.6103],
          [ 0.6077,  0.5067],
          [-0.5647, -1.1774]]],


        [[[-0.7844,  0.0668],
          [ 0.3072, -0.6778],
          [ 0.6296, -2.1064],
          [ 0.8055, -0.9905]],

         [[-1.4523,  0.6980],
          [ 1.5231, -0.2093],
          [-0.9006, -0.6028],
          [-1.0110, -0.5660]]]])

In [140]:
# the first dimension of the tensor has three elements, of which the third is printed
# this third element contains 2 elements, each of which contains 4 vectors of length 2
# these innermost vectors are the pairs printed in the innermost brackets
test[2] 

tensor([[[-0.7844,  0.0668],
         [ 0.3072, -0.6778],
         [ 0.6296, -2.1064],
         [ 0.8055, -0.9905]],

        [[-1.4523,  0.6980],
         [ 1.5231, -0.2093],
         [-0.9006, -0.6028],
         [-1.0110, -0.5660]]])

## next