<a href="https://colab.research.google.com/github/siddrrsh/StartOnAI/blob/master/Neural_Net_From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Neural Network from Scratch Tutorial in Python**
###### Created by **(Karthik Bhargav, Keshav Shah, Sauman Das)** for [StartOnAI](https://startonai.com/)
---


# Overview

We will cover the following topics in this notebook.


*   Theory of How Perceptrons Work and Learn
*   Coded Walkthrough of a Neural Network
*   Model the Wisconsin Breast Cancer Dataset
*   Review Various Applications of Neural Networks

So stay tuned!



# What Are Neural Networks?


In simple terms, neural networks are computational models loosely inspired by the human brain, and they are specifically designed to recognize patterns. The patterns they detect are numerical, typically in the form of vectors, so real-world data such as images or text must first be converted into numbers.

Neural networks are extremely helpful for performing tasks involving clustering and classification. With suitable training methods, they can even recognize patterns in unlabeled data.

We will start off by investigating the most basic Neural Network: **The Perceptron**

## Perceptrons

<img src="https://tinyurl.com/ybcfd78e" alt="perceptron" width="400"/>

[1]



The Perceptron consists of two main components:
1.   Neurons ($x_i$)
2.   Weights ($w_i$)

Perceptrons represent the most basic form of a Neural Network, with only two layers: the input layer and the output layer. As shown in the diagram above, the two layers are joined by weights, represented by the arrows. Each individual neuron represents a number. For example, if there are three inputs, the input layer will consist of 3 neurons plus an additional bias neuron. The importance of the bias ($b$) will become clear later in this tutorial. In this scenario, the output layer consists of a single neuron, which represents the number we are attempting to predict. 




## Forward Propagation

The process of going from the input layer to the output is known as Forward Propagation. To simplify the computations, we will use vector notation to represent the input features and the weights.

  $\vec{x}=\begin{bmatrix}  x_1 & x_2 & ... & x_n\end{bmatrix}$


  $\vec{w}=\begin{bmatrix}  w_1 & w_2 & ... & w_n \end{bmatrix}$

  Finally, to get the value of the output neuron, we simply take the dot product of these two vectors and add the bias. 

  $z=\vec{x}\cdot\vec{w}+b=x_1\times w_1+x_2\times w_2+...+x_n\times w_n+b$






### The Bias Term

To get a better understanding of this output, let's analyze it with just one input neuron. In other words, our output neuron will store the following.

$z=x_1\times w_1+b$

If we visualize this in two-dimensional space, we know that this will represent a line with slope $w_1$ and intercept $b$. We can now easily see the role of the bias. Without it, our model would always go through the origin. With it, we can shift our model along the axes, giving us more flexibility while training. However, we are still only able to represent linear models. To add non-linearities to our model, we use an activation function.



### Activation Functions

Let's imagine that we are solving a binary classification problem. This means the range of our output $\hat{y}$ (predicted value) must be $(0, 1)$ since we are predicting a probability that the input belongs to a certain class. However, the range of a linear equation is $(-\infty, \infty)$. Therefore, we must apply some other function to satisfy this constraint. In binary classification problems, the most common activation function is called the sigmoid function. 

$\sigma(x)=\frac{1}{1+e^{-x}}$


<img src="https://tinyurl.com/ycggxehs" alt="sigmoid_graph" width="400"/>

As you can see in this graph, $\sigma(x)\in(0, 1)$. This activation function makes it possible to predict a probability for a binary output. As you go further into machine learning, you will see several other activation functions. The most common ones other than sigmoid are ReLU, tanh, and softmax.
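As a quick illustration, here is a minimal sketch of the sigmoid function in plain Python. The function name `sigmoid` and the sample inputs are our own choices for this example, not part of the model built later in the notebook.

```python
import math

def sigmoid(x):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle at 0.5.
for value in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"sigmoid({value}) = {sigmoid(value):.4f}")
```

Trying a few more inputs should confirm that the output never leaves the interval $(0, 1)$, no matter how large or small the input is.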


### The Output

Now that we know all the parts of the perceptron, let's see how to get the final output. After forward propagation, we saw the output was

  $z=\vec{x}\cdot\vec{w}+b=x_1\times w_1+x_2\times w_2+...+x_n\times w_n+b$

Finally, we must apply the activation function to get our final output.

$\hat{y}=\sigma(z)$

That is all there is to get the output from a perceptron! To sum it up in three simple steps:



1.   Get the dot product of the weights and the input features $(\vec{x}\cdot\vec{w})$.
2.   Add the bias $(\vec{x}\cdot\vec{w}+b)$.
3.   Apply the activation function and that is the predicted value $(\hat{y}=\sigma(\vec{x}\cdot\vec{w}+b))$!
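The three steps above can be sketched in a few lines of NumPy. The input, weight, and bias values here are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values chosen only for illustration
x = np.array([1.0, 2.0, 3.0])     # input features
w = np.array([0.5, -0.25, 0.1])   # weights
b = 0.2                           # bias

dot = np.dot(x, w)     # step 1: dot product = 0.5 - 0.5 + 0.3 = 0.3
z = dot + b            # step 2: add the bias = 0.5
y_hat = sigmoid(z)     # step 3: apply the activation function

print(f"z = {z:.2f}, y_hat = {y_hat:.4f}")
```

Changing `b` while leaving `x` and `w` alone is a simple way to see how the bias shifts the prediction.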

So far we know how to take the input values and return the corresponding output. However, we must adjust the weights to make the network fit the training data. The process of making these adjustments is known as **back propagation**.



In order to adjust our weights, first we must figure out a way to numerically signify the accuracy of our prediction. In other words, we need to figure out how close our predicted value is to the actual value. A simple way to do this is to use the **Sum of Squares Error**.

$\mathcal{L}(y, \hat{y})=(y-\hat{y})^{2}$

Although this function works, most real-life classification applications use a different error function. Next, we will discuss the cross-entropy family of loss functions. 


## Loss Function

Several functions exist for accomplishing this task; however, the most common loss function for binary problems is called **Binary Cross-Entropy**.

$\mathcal{L}(y, \hat{y})=-(y\log(\hat{y}) + (1-y)\log(1-\hat{y}))$

Here, $y$ is the actual value (0 or 1) and $\hat{y}$ is the predicted probability. Looking closer at this equation, we can see that the first term vanishes if $y=0$, and similarly the second term vanishes if $y=1$. Therefore, we can write the same equation as a piecewise function.

$\mathcal{L}(y, \hat{y})=\begin{cases}-\log(1-\hat{y}) & \text{if $y=0$} \\-\log(\hat{y}) & \text{if $y=1$}\end{cases}$

Keep in mind that $\hat{y}$ is a decimal value in the range $(0, 1)$. The $\log$ function returns a negative number for such values. As a result, we must take the negative of the log to return a positive value. 

To see why this function works as the error, try experimenting in the next code cell with different values of $y$ and $\hat{y}$ then analyze the corresponding loss function value.

In [None]:
import numpy as np
def binary_crossentropy(y, yhat):
  #code is derived from the piecewise function
  if y == 0:
    return -np.log(1.0-yhat)

  if y == 1:
    return -np.log(yhat)

y = 0 #@param [0, 1] {type:"raw"}
yhat = 0.05 #@param {type:"slider", min:0, max:1, step:0.01}

print(f'Loss: {binary_crossentropy(y, yhat)}')


Loss: 0.05129329438755058


## Back Propagation

To simplify this process, we will show back propagation with the Sum of Squares error as our loss function. 

$\mathcal{L}(y, \hat{y})=(y-\hat{y})^{2}$

Keep in mind that our goal is to find the global minimum of the loss with respect to our weights. To update our weights, we first need to find out how much a small change in the weight will affect our loss function. In other words, this is what we need to find:

$\frac{\partial \mathcal{L}(y, \hat{y})}{\partial w}$

However, we cannot find the derivative of $(y-\hat{y})^2$ with respect to $w$ if it does not exist in the expression. Fortunately, we can use the chain rule to overcome this obstacle. 

$\frac{\partial \mathcal{L}(y, \hat{y})}{\partial w} = \frac{\partial \mathcal{L}(y, \hat{y})}{\partial \hat{y}} * \frac{\partial \hat{y}}{\partial{z}} * \frac{\partial z}{\partial w}$

As a reminder, during forward propagation, we defined $z=w \cdot x+b$. The expanded expression can easily be simplified. 

$\frac{\partial \mathcal{L}(y, \hat{y})}{\partial \hat{y}} * \frac{\partial \hat{y}}{\partial{z}} * \frac{\partial z}{\partial w} = -2(y-\hat{y}) * \sigma(z)(1-\sigma(z)) * x$

The first term, $-2(y-\hat{y})$, and the last term, $x$, are pretty easy to derive. The middle term requires us to take the derivative of the sigmoid function. We will not derive it here, but the sigmoid derivative can be cleanly written in terms of the sigmoid function itself as:

$\sigma^\prime(x)=\sigma(x)(1-\sigma(x))$

The value of $\frac{\partial \mathcal{L}(y, \hat{y})}{\partial w}$ that we solved for is called the gradient. Next, we will look at its graphical interpretation. 
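One way to build confidence in the chain-rule result is to compare it against a numerical estimate of the slope. The sketch below does this for a single sample with one weight; the values of `w`, `x`, `b`, and `y` are hypothetical, and the helper names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    """Sum of Squares error for one sample with a single weight."""
    y_hat = sigmoid(w * x + b)
    return (y - y_hat) ** 2

def analytic_gradient(w, x, b, y):
    """Chain-rule result: -2(y - y_hat) * sigma(z)(1 - sigma(z)) * x."""
    y_hat = sigmoid(w * x + b)
    return -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat) * x

# Hypothetical sample values
w, x, b, y = 0.8, 1.5, -0.3, 1.0

# Central finite difference: nudge w slightly in both directions
eps = 1e-6
numeric_gradient = (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)

print(f"analytic: {analytic_gradient(w, x, b, y):.6f}")
print(f"numeric:  {numeric_gradient:.6f}")
```

The two printed values should agree to several decimal places, which suggests that each of the three factors in the chain rule was derived correctly.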



In the graph below, the $x$-axis represents the weight, and the $y$-axis represents $J$, an arbitrary loss function. The gradient we solved for above is, in simpler terms, the slope of the tangent line at a point. Our end goal is to reach the global cost minimum, since it is the point where the loss is smallest. Here is the update that we will repeat several times to achieve this task. 

$w = w-\frac{\partial \mathcal{L}(y, \hat{y})}{\partial w}$

Let's think through this by using the image below. To reach the minimum, the weight needs to decrease. The slope of the tangent line/gradient is a positive value in this case. As a result, subtracting this value will help us get closer to our desired weight. 

<img src="https://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization_files/ball.png" alt="gradient_descent" width="400"/>
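To make the update rule concrete, here is a minimal sketch that applies it to a one-dimensional toy loss. The function $J(w) = 0.25(w-3)^2$ is a hypothetical stand-in, chosen so that the plain update converges. In practice the gradient is usually multiplied by a small learning rate before it is subtracted.

```python
def J(w):
    """Toy loss with its minimum at w = 3."""
    return 0.25 * (w - 3.0) ** 2

def dJ_dw(w):
    """Slope of the tangent line to J at w."""
    return 0.5 * (w - 3.0)

w = 7.0            # start to the right of the minimum, where the slope is positive
history = [J(w)]   # loss before any update
for step in range(20):
    w = w - dJ_dw(w)       # the update rule: w = w - dJ/dw
    history.append(J(w))

print(f"final w = {w:.6f}, final loss = {history[-1]:.2e}")
```

Because the slope is positive at the starting point, every update decreases the weight and moves it toward the minimum at $w = 3$.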

Repeating this process several times is how a neural network trains itself. At this point, we know how to feed the inputs into a neural network and adjust the weights using back propagation! Now, it's time to transition from simple networks with just 2 layers (perceptrons) to networks with additional layers in the middle. All the concepts stay the same; the only difference is that there are more weights to train.


## Artificial Neural Networks

Artificial Neural Networks (ANN) are very similar to Perceptrons, except that they have one extra layer. The figure below shows an example of an ANN. The input and output layers do not change. The layer in the middle is called the hidden layer. Before, we only had one weight matrix, connecting the input to the output. Now, we have an extra set of connections. 

Here is what the two weight matrices would look like for the network in the figure below.

$W_1=
\begin{bmatrix} 
w_{1,1} & w_{1,2} & w_{1,3} & w_{1, 4} & w_{1, 5}\\
w_{2,1} & w_{2,2} & w_{2,3} & w_{2, 4} & w_{2, 5}\\
w_{3,1} & w_{3,2} & w_{3,3} & w_{3, 4} & w_{3, 5}      
\end{bmatrix}
$

Here $W_1$ represents the connections from the input layer to the hidden layer. Notice that the number of rows is the number of neurons in the input layer and the number of columns is the number of neurons in the hidden layer. 

$W_2=
\begin{bmatrix} 
\beta_{1,1} & \beta_{1,2}\\
\beta_{2,1} & \beta_{2,2}\\
\beta_{3,1} & \beta_{3,2}\\     
\beta_{4,1} & \beta_{4,2}\\ 
\beta_{5,1} & \beta_{5,2}\\  
\end{bmatrix}
$

$W_2$ is a matrix storing the weights ($\beta$) connecting the hidden layer to the output layer. There are two columns since the output layer in the image has 2 output neurons. 
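The shapes of $W_1$ and $W_2$ are easiest to check by pushing one sample through the network. The sketch below uses random weights, omits the bias terms for brevity, and assumes a sigmoid activation on both layers. None of these choices come from the figure itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

x = rng.random((1, 3))    # one sample with 3 input features
W1 = rng.random((3, 5))   # input layer (3 neurons) -> hidden layer (5 neurons)
W2 = rng.random((5, 2))   # hidden layer (5 neurons) -> output layer (2 neurons)

hidden = sigmoid(x @ W1)        # shape (1, 5)
output = sigmoid(hidden @ W2)   # shape (1, 2)

print(hidden.shape, output.shape)
```

The inner dimensions must match at each step: a `(1, 3)` input times a `(3, 5)` matrix gives a `(1, 5)` hidden layer, which times a `(5, 2)` matrix gives the `(1, 2)` output.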




![NN](https://drive.google.com/uc?export=view&id=1EHA2P4kLUQm_FkpYskyJ6QTSskjiaSeo)

[2]

Example of how a neural network can be visualized!


# Code

In the following code, we build a neural network from scratch and train it on the Wisconsin Breast Cancer dataset. 

## Imports

We begin by importing the libraries that allow us to load the data, build the network, and display what goes on internally as it trains.

What is the purpose of each library?

  - The sklearn library provides a loader for the breast cancer dataset and a helper for splitting data into training and test sets.
  - We import matplotlib and pandas, which help us organize and visualize the data and outputs.
  - We import numpy, which handles the vector and matrix math for the network itself.
  - We import tqdm, which can display progress bars for long-running loops.

In [None]:
# Loading in the data
import sklearn
from sklearn.datasets import load_breast_cancer 
from sklearn.model_selection import train_test_split
# Visualization
import matplotlib as mpl   
import matplotlib.pyplot as plt
import pandas as pd

# Building the network 
import numpy as np

# Progress Bar
from tqdm import tqdm

import warnings
warnings.filterwarnings("ignore") #suppresses warnings

## Loading Dataset, Preprocessing 

Adjust the slider to view different portions of the data.

In [None]:
full_df = pd.read_csv('https://raw.githubusercontent.com/karthikb19/data/master/breastcancer.csv') #preprocessed data
full_df.drop(['Unnamed: 0'], inplace=True, axis=1)
start_index = 150 #@param {type:"slider", min:0, max:564, step:1}
full_df[start_index:start_index+5]

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
150,-0.320167,0.346815,-0.348429,-0.385345,1.219756,-0.539188,-0.721149,-0.579569,2.659279,-0.273259,0.054239,0.190772,0.003436,-0.122265,-0.007993,-0.785703,-0.41127,-0.04317,1.085796,-0.855568,-0.436776,-0.255213,-0.489715,-0.463884,-0.11698,-0.914547,-0.916656,-0.786396,0.47764,-1.085918,1
151,-1.678039,0.328198,-1.594021,-1.282659,-0.164412,0.495752,0.543639,-0.702606,1.498279,2.808612,-0.763969,1.351952,-0.803464,-0.662847,1.796413,1.603016,1.513163,-0.255665,0.308472,3.020373,-1.486271,0.658341,-1.464904,-1.108862,1.342755,1.124281,1.275717,-0.545359,0.681481,3.582864,1
152,-1.248609,-0.91911,-1.161112,-1.008772,0.771413,1.052926,4.042709,0.764814,2.688487,4.275833,1.513443,2.625622,0.597473,0.209301,1.309727,3.93361,12.07268,6.649601,1.806213,9.851593,-1.087016,-1.007551,-1.078879,-0.879102,-0.138898,0.145898,2.635815,0.647036,0.335276,2.324925,1
153,-0.845593,-1.445027,-0.869073,-0.776409,0.083955,-1.008427,-0.866033,-0.801139,0.067109,-0.247742,-0.649918,-0.789881,-0.711388,-0.546898,0.659367,-0.921794,-0.660943,-0.578137,0.404124,-0.823039,-0.886145,-1.527023,-0.923695,-0.7731,0.075898,-1.0468,-0.964439,-0.906686,-0.067552,-0.899167,1
154,-0.277565,-0.91911,-0.274287,-0.329885,-0.179357,-0.366919,0.051861,-0.363415,0.037902,-0.103146,-0.484255,-0.76956,-0.518326,-0.386066,0.514361,-0.296669,-0.047206,-0.366616,0.865433,-0.119491,-0.310456,-0.843079,-0.285682,-0.357354,0.676449,-0.18235,0.137744,-0.264733,1.534051,0.132121,1


- Here we are loading our breast cancer data inside of a dataframe to better visualize the features and labels of our data. This is also important as it gives us easy access to our data in the form of an array so that we can extract whatever data is necessary.

In [None]:
X_train = full_df.drop('target', inplace=False, axis=1) #remove 'target' column from input features
y_train = full_df['target'] #stores target (1 or 0) in a separate array

#since we shuffled, the index numbers were messed up, this resets them
X_train = X_train.reset_index(drop=True) 
y_train = y_train.reset_index(drop=True)

#convert to numpy arrays with float values
X_train = np.array(X_train, dtype=float)
y_train = np.array(y_train, dtype=float)

#reshape y_train to make matrix multiplication possible
y_train = np.array(y_train).reshape(-1, 1)


Here the features and labels are separated and converted into numpy arrays so that the model can be trained to classify whether a tumor is benign or malignant. 

# Initializing Weights

In [None]:
class Perceptron:
  def __init__(self, x, y):
    
    self.input = np.array(x, dtype=float) 
    self.label = np.array(y, dtype=float)
    self.weights = np.random.rand(x.shape[1], y.shape[1]) #randomly initialize the weights
    self.z = self.input@self.weights #dot product of the vectors
    self.yhat = self.sigmoid(self.z) #apply activation function

    
  def sigmoid(self, x):
    return 1.0/(1.0+np.exp(-x))

  def sigmoid_deriv(self, x):
    s = self.sigmoid(x)
    return s*(1-s) #derivative of sigmoid written in terms of sigmoid itself

  def forward_prop(self):
    self.z = self.input @ self.weights #@ symbol represents matrix multiplication (also works for vectors)
    self.yhat = self.sigmoid(self.z)
    return self.yhat

  def back_prop(self):
    #chain rule: dL/dw = x * (-2(y - yhat)) * sigmoid'(z), summed over all training samples
    gradient = self.input.T @ (-2.0*(self.label - self.yhat)*self.sigmoid_deriv(self.z))  #self.input is the x value
    
    self.weights -= gradient #step against the gradient to reduce the loss

Initializing weights is an important first step because the network needs starting values that it can adjust during training. Each weight scales the value passed along one connection, and the weighted sum of a neuron's inputs is what gets fed into the activation function. The weights are tweaked with each passing epoch so that the predictions move closer to the labels.

Here we also define our activation function, which transforms each weighted sum before it is passed on as the layer's output. Activation functions are necessary because most data is not linear, so the network needs non-linear functions that can deal with more complicated relationships.


## Fitting the Data

In [None]:
simple_nn = Perceptron(X_train, y_train)
training_iterations = 1000


for i in range(training_iterations):
  simple_nn.forward_prop()
  simple_nn.back_prop()

yhat = simple_nn.forward_prop()

def mse(yhat, y):
  total = 0.0 #running sum of squared errors (avoids shadowing the built-in sum)
  for pred, label in zip(yhat, y):
    total += (pred-label)**2
  return total/len(yhat)


print(f'Mean Squared Error: {mse(yhat, simple_nn.label)}')

(30, 1)
Mean Squared Error: [0.01572726]


We highly recommend that you experiment with the ```training_iterations``` parameter to see how the number of iterations affects the mean squared error. You should generally see the error go down as you increase the number of iterations.
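For a self-contained version of this experiment, the sketch below records the mean squared error at every iteration. It makes several assumptions that differ from the cells above: it uses small synthetic data instead of the breast cancer dataset, averages the gradient over the samples, and scales each update by a hypothetical `learning_rate`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the breast cancer dataset):
# 200 samples, 2 features, label is 1 when 2*x1 - x2 > 0.
X = rng.normal(size=(200, 2))
y = (X @ np.array([[2.0], [-1.0]]) > 0).astype(float)   # shape (200, 1)

weights = np.zeros((2, 1))
learning_rate = 0.5   # assumption: not part of the Perceptron class above
errors = []

for i in range(1000):
    yhat = sigmoid(X @ weights)
    errors.append(float(np.mean((yhat - y) ** 2)))   # MSE before this update
    # Gradient of the squared error, averaged over the samples
    gradient = X.T @ (-2.0 * (y - yhat) * yhat * (1.0 - yhat)) / len(X)
    weights -= learning_rate * gradient

for i in (0, 10, 100, 999):
    print(f"iteration {i:4d}: MSE = {errors[i]:.4f}")
```

With all weights starting at zero, every prediction is 0.5, so the first recorded error is 0.25. The later values show how much training improves on that baseline.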

# Feature Reduction Techniques

In complex datasets with many variables, not every feature ends up contributing to the accuracy and usefulness of the model. For example, let us look at the logistic regression equation:

$
Y = \sigma(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ...)
$

- Y is the output of the logistic function
- $\sigma$ is the sigmoid activation function
  - Sigmoid is used here since we want to perform logistic regression, where the output is compressed to a value between 0 and 1.
- $\beta$ are our coefficients for each of the features $x$

After the model is trained, if any of the $\beta$'s equal 0, we know that the corresponding $x$ feature does not contribute to the model's predictions. To make this clear, realize that ***the number of features used is equal to the number of non-zero coefficients ($\beta$ values).***

The purpose of feature reduction, which gets rid of unnecessary $\beta$ and $x$ values, is to prevent something called overfitting. Overfitting occurs when a model fits the noise and quirks of its training data so closely that it performs poorly on new, unseen data. Now let's go over the different ways we can go about solving this problem.


## Lasso L1 Penalty

A primary method used to reduce the number of features for a machine learning model is to add a Lasso (L1) penalty to the loss function. Using the logistic regression equation that we identified earlier:

$
Y = \sigma(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ...)
$

We can add a Lasso penalty that pushes as many $\beta$ values as possible to 0. As a result, only the $x$ features with non-zero $\beta$ coefficients remain in the model. Removing unnecessary features in turn helps with our overfitting problem. 

![alt text](https://drive.google.com/uc?export=view&id=1OiScfT_C41xzSsTuS3CM-Xp8dKJDL4oJ)

***Mathematical Representation***

This penalty can be mathematically represented using the concepts of loss functions:

$$
Loss = Cost(X, Y, \beta) + \lambda \sum_{i}|\beta_i|
$$
Where the cost function is represented by:
$$Cost = \sum_{i=1}^{n}(Y_i-\sum_{j=1}^{p}X_{ij}\beta_j)^2$$

The value of $\lambda$ controls the strength of the penalty: larger values push more of the coefficients ($\beta$ values) to 0, while smaller values leave more of them non-zero.
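The penalized loss can be computed directly from the two formulas above. In this minimal sketch, the data and the two candidate coefficient vectors are hypothetical, and `lasso_loss` is our own helper name. The sketch only evaluates the loss; it does not fit the coefficients.

```python
import numpy as np

def lasso_loss(X, Y, beta, lam):
    """Squared-error cost plus the L1 penalty lam * sum(|beta_i|)."""
    residuals = Y - X @ beta
    cost = np.sum(residuals ** 2)
    penalty = lam * np.sum(np.abs(beta))
    return cost + penalty

# Hypothetical data: 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [0.0, 1.0]])
Y = np.array([1.1, 2.0, 0.05])

beta_sparse = np.array([1.0, 0.0])    # ignores the second feature
beta_dense = np.array([1.0, 0.05])    # fits the data slightly better

for lam in (0.1, 1.0):
    print(f"lambda={lam}: sparse={lasso_loss(X, Y, beta_sparse, lam):.4f}, "
          f"dense={lasso_loss(X, Y, beta_dense, lam):.4f}")
```

With the small $\lambda$ the dense coefficients give the lower loss, while with the larger $\lambda$ the sparse coefficients win. This is the mechanism by which a stronger penalty pushes coefficients to 0.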


## Cross Validation

Cross validation is a method used to determine the best value of $\lambda$. The data is split into $k$ parts (folds). For each candidate value of $\lambda$, the model is trained on $k-1$ folds and its accuracy is evaluated on the remaining held-out fold. This is repeated $k$ times so that every fold serves as the test set once, and the results are averaged. The $\lambda$ value with the best average accuracy on the held-out data is then used in the final model.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/K-fold_cross_validation_EN.svg/1280px-K-fold_cross_validation_EN.svg.png" alt="drawing" width="500"/>
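The splitting step of k-fold cross validation can be sketched with NumPy alone. The helper name `k_fold_indices` and the choice of 10 samples with 5 folds are assumptions made for this example. Training and scoring a model for each candidate $\lambda$ would happen inside the loop.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle before splitting
    folds = np.array_split(indices, k)     # k roughly equal parts
    for i in range(k):
        val_idx = folds[i]                 # fold i is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=5)):
    print(f"fold {fold}: train={np.sort(train_idx).tolist()}, "
          f"validation={np.sort(val_idx).tolist()}")
```

Every sample appears in exactly one validation fold, so each data point is used for evaluation once and for training in the other rounds.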


## Principal Component Analysis

Principal Component Analysis (PCA) is an unsupervised learning technique that offers yet another way to reduce the number of features in a dataset. In applications such as biology and genomics, there is an abundance of features, so this technique uses linear combinations of the original features to make a smaller set of new features. Reducing the number of dimensions helps limit overfitting and makes the model easier to interpret. 

<img src="http://phdthesis-bioinformatics-maxplanckinstitute-molecularplantphys.matthias-scholz.de/fig_pca_illu3d.png" alt="drawing" width="1000"/>

Here is a sample table that can help us visualize how features can be linearly combined. The table below represents some statistics about high school swimmers. This data set will have the features ```Height```, ```Weight```, ```Freestyle```, and ```Backstroke```.

|Name |Height |Weight |Freestyle | Backstroke|
|--|--|--|--|--|
|Harry|65 in|110 lb|30 sec|26 sec|
|Joyce|70 in |95 lb|28 sec|27 sec|
|Troy| 54 in|100 lb|32 sec|33 sec|
|Mary|57 in|105 lb|40 sec|42 sec|

PCA allows us to develop linear combinations of the features that spread the data as far apart as possible in as few dimensions as possible, which simplifies the data. You may be wondering why we need to do all this in the first place:
- Features correspond to dimensions. If we have 1000 features, our data can only be represented on a 1000-dimensional graph, which is practically impossible for humans to visualize or reason about when tuning the model.
- Excessive features also encourage overfitting, where the model memorizes noise in the training data and performs poorly on new data.

If we use PCA to reduce the space from 4 features to 2, as in our example, the result might look like the simplified illustration below. (Real PCA computes weighted combinations of all the features rather than plain sums.)

|Name |Height + Weight | Freestyle + Backstroke|
|--|--|--|
|Harry|175 in-lb|56 sec|
|Joyce|165 in-lb|55 sec|
|Troy| 154 in-lb|65 sec|
|Mary|162 in-lb|82 sec|


As we can see in this example, the 4 features were combined into 2 groups of 2 features each. Our graph was reduced from 4 dimensions to 2, which is very easy to visualize since it only involves an $x$ and a $y$ axis. 
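For comparison with the simplified table, here is a minimal sketch of PCA on the swimmer data, computed with NumPy's singular value decomposition. Using the SVD of the centered data is one common way to carry out PCA. The features are left unscaled here, although in practice they are often standardized first.

```python
import numpy as np

# Rows: Harry, Joyce, Troy, Mary
# Columns: Height (in), Weight (lb), Freestyle (sec), Backstroke (sec)
data = np.array([
    [65.0, 110.0, 30.0, 26.0],
    [70.0,  95.0, 28.0, 27.0],
    [54.0, 100.0, 32.0, 33.0],
    [57.0, 105.0, 40.0, 42.0],
])

# Center each feature so that it has zero mean
centered = data - data.mean(axis=0)

# The rows of Vt are the principal directions, ordered by importance
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

components = Vt[:2]                    # top 2 directions, shape (2, 4)
projected = centered @ components.T    # each swimmer described by 2 numbers

explained = S ** 2 / np.sum(S ** 2)    # share of variance per component
print(projected.shape)
print(np.round(explained[:2], 3))
```

Each row of `components` holds the weights that PCA assigns to the four original features, so the new features are weighted mixtures rather than the plain sums used in the table.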
 


# Applications

In this section, we will cover some applications of neural networks. These will be 


*   Medicine
*   Robotics
*   Finance
*   Understanding Natural Language




## Medicine 

There are many applications of ML/DL to medicine, and we will name and describe a few here. 

### Disease Identification

Disease identification is an important sub-field of AI in medicine. It works by collecting a set of images (generally by scraping the web, or from datasets such as this [one](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) on Kaggle) and training a more specialized kind of neural network known as a convolutional neural network (CNN). CNNs are designed for image recognition and identification tasks, where they can be highly accurate. 

The founder of StartOnAI, Sid Sharma, recently wrote a paper called ["DermaDetect A computer vision model for an accurate diagnosis of skin conditions and rashes"](https://www.dropbox.com/s/hc5yqap7spo44ip/DermaDetect.pdf?dl=0). The goal of DermaDetect was to detect skin rashes and diseases using computer vision, and the paper describes in detail what Sid went through to get a highly accurate model. Sid used many state-of-the-art techniques and eventually even beat the state-of-the-art model in skin detection (determined by using Google AutoML). 

### Clinical Trial Research
Identifying suitable candidates for clinical trials is often difficult, but if we start using machine learning for predictive analysis on which candidates to select, we could draw on more data than ever before. For instance, instead of just using genetic information and family history, we could start using records of doctors' visits and even social media!

### Health Records
Over the past few months, we have seen the effect of COVID-19 firsthand. Every day, thousands of researchers around the world are trying to find vaccines and treatments to stop the spread of this deadly virus. With the many initiatives under way, a plethora of research papers is being written, and researchers need to extract tangible information from them, which is where Natural Language Processing (NLP) comes into play. NLP allows computers to process and interpret the text they see, and combining it with the speed of computers lets us extract new information extremely efficiently. 

If you want to be part of this initiative, click this [link](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge).

 

## Robotics

Next, let us talk about some important instances of robotics in the field of ML/DL.

### Computer Vision
Robotics and computer vision go hand in hand. For instance, many of the companies in the self-driving car field, such as CommaAI, Tesla, and Uber, use various computer vision techniques to map out the environment in which the car is traveling. The large amount of data we take in when driving can be fed to a vision model to help various types of robots understand and visualize the world as well! Along with this, we can use robots equipped with computer vision to find anomalies in structures such as buildings. 


### Reinforcement Learning
Recently, reinforcement learning (RL) has been all the rage, with many companies, especially Google’s DeepMind, creating extremely successful RL bots for games such as Go, Chess, and Shogi. Now the time has come to apply the same skills used in those mentally challenging games to real life by creating robots that learn and understand the world through trial and error. And the way they perceive the world is, you guessed it, computer vision! Together, these techniques can help robots operate effectively in the real world! 


## Finance




### Stock Market Trading
Many people purchase stocks, often basing their decisions on recent trends and sentiment about a specific company to predict when to buy or sell. This is where recurrent neural networks (RNNs) could be used for sentiment analysis. We could search the internet for articles about a particular company by its name, take excerpts from the beginning, middle, and end of each article, assess their sentiment, and use the result to notify investors about when it may be a good time to buy or sell a particular stock.

# References


[1] https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6

[2] https://towardsdatascience.com/over-fitting-and-regularization-64d16100f45c

[3] https://emerj.com/ai-sector-overviews/machine-learning-in-robotics/

[4] https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

[5] https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c

[6] https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/

[7] https://www.nature.com/articles/s41563-019-0360-1

[8] https://medium.com/@ocktavia/titanic-prediction-with-artificial-neural-network-in-r-5dd20fb98dea

[9] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

[10] https://towardsdatascience.com/what-is-a-perceptron-210a50190c3b