In [28]:
# libraries used by the book
import numpy as np

![](https://images-na.ssl-images-amazon.com/images/I/61gAY7APCQL._SX397_BO1,204,203,200_.jpg) 

This Jupyter notebook contains my notes while reading [Grokking Deep Learning by Andrew Trask](https://www.manning.com/books/grokking-deep-learning). The book is sold as "a very gentle introduction to Deep Learning" and covers the intuition more than the theory.

# Chapter 1: Introducing Deep Learning

Deep Learning (DL) sits at the intersection of Machine Learning (ML) and Artificial Intelligence (AI). This book covers the science under the hood of the major DL frameworks so you can understand what's going on when you use popular ones like Torch, TensorFlow, Keras, etc.

The book covers everything past high school maths needed to grok DL. Tip from the book: find a personal problem I'm interested in to apply DL to. This could be anything where one dataset can be used to predict another. Trask (the author) used Twitter to predict the stock market, which led him from barely knowing programming to a job at a hedge fund in 18 months.

# Chapter 2: Fundamental Concepts

DL uses a subset of ML methods, primarily Artificial Neural Networks. 

## Supervised vs Unsupervised ML

Two main types of ML:

- Direct imitation, or formally **supervised ML**, is basically a computer looking at a dataset A that predicts a dataset B (say, weather sensor data predicting the probability of rain) and trying to figure out the pattern between the input (sensor data) and the output (actual weather), so that when given a new input it can apply the learned pattern and come up with a prediction.
- Indirect imitation, or formally **unsupervised ML**, looks at a not previously understood dataset A and tries to find patterns in it. For example, it sorts the data into a bunch of clusters. Clustering is the essence of unsupervised ML. The computer doesn't know what the clusters mean, but that's where the human comes in.
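To make the two styles above concrete for myself, here is a minimal sketch. All the numbers are made up, and the one-weight least-squares fit and the crude two-cluster loop are my own stand-ins, not methods taken from the book.

```python
import numpy as np

# Supervised: inputs A come with known answers B (made-up numbers).
sensor = np.array([1.0, 2.0, 3.0, 4.0])   # input dataset A
rain = np.array([0.2, 0.4, 0.6, 0.8])     # known answers B

# Fit a single weight w so that sensor * w approximates rain (least squares).
w = sensor.dot(rain) / sensor.dot(sensor)
new_prediction = 5.0 * w                   # apply the learned pattern to a new input

# Unsupervised: only A is available, so look for groups in it.
points = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
centers = np.array([points.min(), points.max()])  # crude starting guess
for _ in range(10):
    # assign each point to its nearest center, then move each center to its group mean
    labels = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([points[labels == k].mean() for k in range(2)])

print(w, new_prediction)
print(labels, centers)
```

The supervised half needs the answers `rain` to learn anything. The clustering half only ever sees `points`, and it is up to me to decide what group 0 and group 1 actually mean.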

## Parametric vs Non-Parametric Learning

A parametric model has a fixed number of parameters to change, while a non-parametric model has a number of parameters that is determined by the data (and so can, in principle, grow without bound).
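A rough way to picture the difference, assuming a counting-based model as the non-parametric example (the colour observations and the helper `count_based_params` are hypothetical, just for illustration):

```python
from collections import Counter

# Parametric: one weight, no matter how much data is seen.
parametric_params = {"weight": 0.1}

# Non-parametric (counting-based): one counter per distinct value observed,
# so the number of parameters depends on the data.
def count_based_params(observations):
    return Counter(observations)

small = count_based_params(["red", "green", "red"])
large = count_based_params(["red", "green", "red", "blue", "amber"])

print(len(parametric_params), len(small), len(large))  # prints: 1 2 4
```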

Supervised parametric DL models take in input data, process it based on a fixed number of adjustable parameters, and make a prediction. The model learns the optimum parameters by comparing its predictions to the actual truth, then going back and tinkering with the parameters.
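The predict, compare, tinker loop can be sketched with a single weight. This is a trial-and-error version I wrote to check my understanding, not necessarily how the book does it; the starting weight, input, goal, and step size are all assumed values.

```python
weight = 0.5        # starting guess (assumed)
input_value = 0.5   # a single made-up input
goal = 0.8          # the "actual truth" the prediction should match
step = 0.001        # how much to tinker with the weight each time

for _ in range(2000):
    pred = input_value * weight
    error = (pred - goal) ** 2

    # try nudging the weight up and down, and keep whichever lowers the error
    up_error = (input_value * (weight + step) - goal) ** 2
    down_error = (input_value * (weight - step) - goal) ** 2

    if up_error < error:
        weight += step
    elif down_error < error:
        weight -= step
    else:
        break  # neither direction helps, so stop tinkering

print(weight, input_value * weight)
```

The loop stops once neither nudge improves the error, at which point the prediction sits within one step of the goal.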

Unsupervised parametric DL models are similar to the supervised ones since they also use parameters, but they use them to cluster the data into groups, typically with a set of parameters per group.

DL algos can be either supervised or unsupervised, and either parametric or non-parametric.

# Chapter 3: Introduction to Neural Prediction

When using data to predict something, you need to give the neural net as many datapoints as you think it needs to be accurate. For example, when trying to predict whether something is in an image, you probably need to feed the neural net the entire image.

> Always present enough information to the network, where "enough information" is defined loosely as how much a human might need to make the same prediction.

## simplest possible neural net

In [8]:
# the network
weight = 0.1
def neural_network(input, weight):
    prediction = input * weight
    return prediction

# using the network to predict something
number_of_toes = [8.5, 9.5, 10, 9]
input = number_of_toes[0]
pred = neural_network(input,weight)
pred

0.8500000000000001

Being able to manipulate vectors is a cornerstone technique for Deep Learning. Some functions to do vector math:

In [27]:
def elementwise_multiplication(vec_a, vec_b):
    return [vec_a[i] * vec_b[i] for i in range(len(vec_a))]

def vector_sum(vec_a):
    return sum(vec_a)

def elementwise_addition(vec_a, vec_b):
    return [vec_a[i] + vec_b[i] for i in range(len(vec_a))]
    
def vector_average(vec_a):
    return sum(vec_a) / len(vec_a)

a = [2,2,4]
b = [3,3,9]

# to get the dot product of a and b
vector_sum(elementwise_multiplication(a,b))

48

Testing the [Wikipedia example of a dot product](https://en.wikipedia.org/wiki/Dot_product):

In [33]:
a = [1, 3, -5]
b = [4, -2, -1]
vector_sum(elementwise_multiplication(a,b))

3
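As a sanity check, the same example can be run through NumPy's `np.dot`, which should agree with the handwritten version. A small self-contained sketch:

```python
import numpy as np

a = [1, 3, -5]
b = [4, -2, -1]

# handwritten dot product: multiply elementwise, then sum
manual = sum(x * y for x, y in zip(a, b))

# NumPy's built-in dot product on the same lists
with_numpy = np.dot(a, b)

print(manual, with_numpy)  # prints: 3 3
```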

## neural net with 3 inputs and 1 output

In [53]:
weights = np.array([0.1, 0.2, 0])
def neural_network(input, weights):
    pred = input.dot(weights)
    return pred

toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

# input corresponds to every entry for the first game of the season
input = np.array([toes[0], wlrec[0], nfans[0]])
print(neural_network(input, weights))

# to go through all the inputs
for input in zip(toes,wlrec,nfans):
    print(neural_network(np.array(input), weights))

0.98
0.98
1.11
1.15
1.08
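Instead of looping over the games, the inputs can probably be stacked into one matrix (one row per game) so that a single `dot` call predicts every game at once. A sketch using the same numbers as the cell above; the name `games` is mine:

```python
import numpy as np

weights = np.array([0.1, 0.2, 0])

toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

# one row per game, one column per input: shape (4, 3)
games = np.column_stack([toes, wlrec, nfans])

# matrix-vector product: one weighted sum (prediction) per game, shape (4,)
preds = games.dot(weights)
print(preds)
```

This should give the same four values as the loop, up to floating point rounding.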


## neural net with 3 inputs and 3 outputs

In [64]:
weights = [[0.1, 0.1, -0.3], [0.1, 0.2, 0.0], [0.0, 1.3, 0.1]]

def neural_network(input, weights):
    pred = vect_mat_mul(input,weights)
    return pred

def vect_mat_mul(vect,matrix):
    # one dot product per row of the weight matrix, so one prediction per row
    out = []
    for m in matrix:
        out.append(np.dot(vect,m))
    return out
        
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# to go through all the inputs
for input in zip(toes,wlrec,nfans):
    print(neural_network(input, weights))

[0.55500000000000005, 0.98000000000000009, 0.96500000000000008]
[0.64000000000000001, 1.1100000000000001, 1.1699999999999999]
[0.92000000000000004, 1.1500000000000001, 1.0900000000000001]
[0.68999999999999995, 1.0800000000000001, 1.2700000000000002]
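The handwritten `vect_mat_mul` is a matrix-vector product, so NumPy should be able to do it in one call. A sketch for the first game only, reusing the weights from the cell above (the variable names `W` and `first_game` are mine):

```python
import numpy as np

# each row holds the weights for one output, each column belongs to one input
W = np.array([[0.1, 0.1, -0.3],
              [0.1, 0.2, 0.0],
              [0.0, 1.3, 0.1]])

# toes, wlrec, nfans for the first game of the season
first_game = np.array([8.5, 0.65, 1.2])

# one dot product per row of W, giving three predictions
preds = W.dot(first_game)
print(preds)
```

Up to rounding, this matches the first row of the output above.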


## a stacked neural network 

3 inputs and 3 outputs, with 2 layers
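A minimal sketch of what stacking means, assuming each layer is just another weight matrix applied to the previous layer's output. The weight values below are illustrative only (an assumption, not necessarily the ones this notebook ends up using), and `input_to_hidden` / `hidden_to_output` are names I picked.

```python
import numpy as np

# Illustrative weights (assumed values). Rows = outputs of the layer, columns = its inputs.
input_to_hidden = np.array([[0.1, 0.2, -0.1],
                            [-0.1, 0.1, 0.9],
                            [0.1, 0.4, 0.1]])
hidden_to_output = np.array([[0.3, 1.1, -0.3],
                             [0.1, 0.2, 0.0],
                             [0.0, 1.3, 0.1]])

def neural_network(inputs, layers):
    # feed the output of each layer into the next one
    activations = inputs
    for layer_weights in layers:
        activations = layer_weights.dot(activations)
    return activations

# toes, wlrec, nfans for the first game of the season
game = np.array([8.5, 0.65, 1.2])
pred = neural_network(game, [input_to_hidden, hidden_to_output])
print(pred)
```

The first layer turns the 3 inputs into 3 hidden values, and the second layer turns those into the 3 outputs.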