# Neural Networks

### About this section

The __Fake News Detection__ part of the workshop uses a machine learning concept called a "Neural Network" (sometimes abbreviated to "NN") to classify information. If you are already familiar with the concept of neural networks and just want to see one in action, feel free to move ahead to the next part of the workshop. 

If you are unfamiliar with NNs and want some of the background and basics, you can start here. The theory and science behind neural networks is discussed (and is really cool!), but if you don't care about that and just want to understand what the code is really doing and what math goes into it, you can skip the "theory" and "comparison to the brain" sections and jump straight to "What do artificial neural networks look like?"

## What is a Neural Network?

### Theory

Simply put, a neural network is an algorithm that's designed to mimic the human brain. Neurophysiologist Warren McCulloch and mathematician Walter Pitts are credited with proposing the first neural network in a paper published in 1943 (around the time early computers were just starting to be developed). The goal of this paper was to explain how neurons might work, and they chose to represent these neurons using electrical circuits.

Donald Hebb built on this theory in 1949, proposing that neural pathways are strengthened each time they're used, which is critical to understand when trying to replicate how humans learn. He argued that if two neurons fire simultaneously, the connection between them becomes stronger.

Decades later, we have much more accurate and advanced knowledge of how the brain works, and have continued improving neural networks accordingly. 

More detail about these early theories is linked at the bottom of the page if you are curious.

### Comparison to the Brain

![NN.png](attachment:NN.png)

The average human brain has an estimated 86 billion neurons. Each one possesses a "dendritic network" (the branches on the left hand side of the neuron) that receives electrical impulses from nearby neurons. We call this the "input" in our algorithms. These signals are processed by the neuron in the nucleus, and if a certain threshold of information received is met, the axon passes this signal through to the axon terminal, where it is then sent to other nearby neurons. When all these neurons are taking in and processing information simultaneously and communicating it to each other, it creates a "network" that allows you to learn and make decisions. (Estimates suggest that we have somewhere around 100 trillion to 1000 trillion total connections.)

The "perceptron," the first type of *artificial* neuron, is pictured on the left, where it would be given several inputs, processed in the "nucleus" with equations we give it, and then a 0 or 1 would be output depending on whether the information threshold is met. More advanced artificial neurons are able to give much more complex outputs than just a 0 or 1. When we create neural networks today, we create many of these "neurons" to process information. We give them inputs, allow them to process these inputs based on our given criteria, create an output, send that output to other neurons, and eventually give us a final "output," the format of which varies significantly depending on the purpose of your network.
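To make the threshold idea concrete, here is a minimal sketch of a perceptron in plain Python. The function name, the weights, and the threshold are all made up for illustration; they are chosen so that the perceptron only "fires" when both inputs are on.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Hypothetical settings: both inputs must be 1 for the sum to reach the threshold.
example_weights = [1.0, 1.0]
example_threshold = 2.0

outputs = [
    perceptron([a, b], example_weights, example_threshold)
    for a in (0, 1)
    for b in (0, 1)
]
print(outputs)  # [0, 0, 0, 1]
```

The four results correspond to the inputs (0, 0), (0, 1), (1, 0), and (1, 1), in that order.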

Simple neural networks usually have anywhere from 10 to 1,000 neurons, which is obviously far fewer than a real brain, but we are able to process a significant amount with that number. Today, our ability to create more and more powerful networks is generally limited by how powerful our computers are rather than our understanding of how neural networks should work.

## What do artificial neural networks look like?

<div>
<img src="attachment:Neural-Network.jpg" width="400"/>
</div>

The above picture is generally how neural networks are visualized. To move away from the neuron comparison, we can think of neural networks as an interconnected group of nodes. These nodes are generally organized into three main categories: an __input layer__, an __output layer__, and a much more variable __hidden layer__. The input layer and output layer are exactly what they sound like: the nodes that take in input data and the nodes that output the "final" set of data. The hidden layers are a bit more complicated, and are only necessary when a neural network needs to make sense of something difficult to define, contextual, or non-obvious. __Deep Learning__ is a buzzword in computer science right now, and this is generally in reference to neural networks that have numerous hidden layers.

The lines connecting the nodes generally represent a "weight," or the "strength" of an input. For example, if you're trying to determine what is and isn't a football from a given set of sports ball images, an incredibly important factor in differentiating would be the shape, since it is so unusual in comparison to other objects in the same category. You would want to give this factor a higher "weight" than the other ones. Size would be an example of a factor that would not be heavily weighed, since a football has a comparable relative size to other similar objects. Trying to judge a football from size alone wouldn't be easy, although things like a tennis ball would be easily eliminated from the potential output set. 

So the neural network ultimately works by taking inputs from a dataset, multiplying them by "weights" that go into our hidden layers, adding the totals in each node, and then continuing to multiply them by the appropriate weights until we reach an output.

We go through two processes while training the network, called __forward propagation__ and __back propagation__. These work by applying sets of weights and calculating the output in forward propagation, and then measuring the error and adjusting the weights accordingly in back propagation.

For example, if we want to create a neural network that performs an XOR operation, here's how it would look.

XOR Operations:

(exclusive or, meaning a 1 is output __only__ if exactly one of the inputs is a 1)

(0, 0) = 0
<br>(0, 1) = 1
<br>(1, 0) = 1
<br>(1, 1) = 0

Let's try to get the last row of the table, where the inputs (1, 1) should give an output of 0. Here's what our neural network might look like:

<div>
<img src="attachment:XORNN1.png" width="500"/>
</div>

For the hidden parts, we have chosen only a single layer with three neurons, but this could easily be altered for different results.

Now we need to assign weights to the connections. These are all selected randomly. (There are different ways of selecting these distributions, but descriptions are very complicated, so if you are curious, more information is linked at the bottom of the page.)

<div>
<img src="attachment:XORNN2.png" width="500"/>
</div>

The neural network then calculates the sum of the product of the inputs with the corresponding weights.

<br>1 * 0.8 + 1 * 0.2 = 1
<br>1 * 0.4 + 1 * 0.9 = 1.3
<br>1 * 0.3 + 1 * 0.5 = 0.8
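The three sums above can be reproduced with a few lines of Python. This is only a sketch: the variable names are invented here, and the order of the two weights within each row is an assumption (it makes no difference for this input, since both inputs are 1).

```python
inputs = [1, 1]

# One row per hidden node; each row holds that node's two incoming weights.
hidden_weights = [
    [0.8, 0.2],
    [0.4, 0.9],
    [0.3, 0.5],
]

# For each hidden node: multiply every input by its weight, then add the products.
hidden_sums = [sum(x * w for x, w in zip(inputs, row)) for row in hidden_weights]
print([round(s, 2) for s in hidden_sums])  # [1.0, 1.3, 0.8]
```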

<div>
<img src="attachment:XORNN3.png" width="500"/>
</div>

Now, within these hidden nodes, we are applying something called an __activation function__. These will be described in more detail after the description of the neural network, but essentially, these transform the input signal into an output signal and serve the purpose of allowing NNs to model complex patterns that a linear model might miss.

For our purposes, we will be using a sigmoid function, which is represented by f(x) = 1/(1 + e^(-x))

When this is applied to our hidden sums, here are the values we get:

<div>
<img src="attachment:XORNN4.png" width="500"/>
</div>
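If you'd like to check the hidden-node values yourself, here is a small sketch that applies the sigmoid function to the three hidden sums from earlier (1, 1.3, and 0.8). The names `sigmoid`, `hidden_sums`, and `hidden_outputs` are just illustrative.

```python
import math

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)), which squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

hidden_sums = [1.0, 1.3, 0.8]
hidden_outputs = [sigmoid(s) for s in hidden_sums]
print([round(h, 2) for h in hidden_outputs])  # [0.73, 0.79, 0.69]
```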

Applying the same pattern as before, we multiply the values by the weights, add them up, and apply the activation function, to get a final result shown here.

<div>
<img src="attachment:XORNN5.png" width="500"/>
</div>

As you can imagine, it's not what we hoped for, since the target output for (1, 1) is 0. This tells us that the weights need to be adjusted (or that we could try another activation function).
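The whole forward pass can be sketched as one short function. The hidden weights are the ones from the worked sums above, but the hidden-to-output weights only appear in the figure, so the values used for `output_weights` below are placeholders picked for illustration and should not be read as the figure's actual numbers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass through a single hidden layer and a single output node."""
    hidden_outputs = [
        sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in hidden_weights
    ]
    output_sum = sum(h * w for h, w in zip(hidden_outputs, output_weights))
    return sigmoid(output_sum)

hidden_weights = [[0.8, 0.2], [0.4, 0.9], [0.3, 0.5]]  # from the worked example
output_weights = [0.3, 0.5, 0.9]  # placeholder values (an assumption, not from the text)

prediction = forward([1, 1], hidden_weights, output_weights)
print(round(prediction, 2))  # 0.77
```

With these placeholder weights the network answers roughly 0.77 for the inputs (1, 1), which is far from the target of 0.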

This is where __back propagation__ comes in. Back propagation is the process where we calculate how much each weight contributed to the error so we have a good guess as to how to adjust them. This is somewhat similar to forward propagation, except the math for back propagation is a bit more complicated.

When creating neural networks, you will usually be using a library (most commonly the open-source libraries __TensorFlow__ or __PyTorch__) that will take care of most of this for you. That being said, if you are curious to learn about the actual math that goes into back propagation, 3Blue1Brown on YouTube has a great 4-video playlist on neural networks, two of which are on back propagation, that explain the concept well.
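For the curious, here is a rough sketch of what a training loop with back propagation could look like for the XOR network, written in plain Python. Several things here are assumptions added for illustration and are not part of the walkthrough above: the bias terms, the learning rate of 0.5, the 5,000 passes, the squared-error measure, and the random starting weights. Real libraries do the same job far more efficiently.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR truth table as (inputs, target) pairs.
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

random.seed(0)  # fixed seed so repeated runs start from the same weights
n_hidden = 3
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden  # biases are an addition; the walkthrough has none
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0
learning_rate = 0.5

def forward(x):
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)) + b_out)
    return hidden, output

def mean_squared_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

initial_error = mean_squared_error()

for epoch in range(5000):
    # Add up the gradients over all four rows, then update the weights once.
    grad_w_hidden = [[0.0, 0.0] for _ in range(n_hidden)]
    grad_b_hidden = [0.0] * n_hidden
    grad_w_out = [0.0] * n_hidden
    grad_b_out = 0.0
    for x, target in DATA:
        hidden, output = forward(x)  # forward propagation
        # Error signal at the output node (chain rule through the sigmoid).
        delta_out = (output - target) * output * (1 - output)
        grad_b_out += delta_out
        for j in range(n_hidden):
            # Push the error back through the output weight to hidden node j.
            delta_hidden = delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
            grad_w_out[j] += delta_out * hidden[j]
            grad_b_hidden[j] += delta_hidden
            for i in range(2):
                grad_w_hidden[j][i] += delta_hidden * x[i]
    # Gradient descent: nudge every weight against its share of the error.
    for j in range(n_hidden):
        w_out[j] -= learning_rate * grad_w_out[j]
        b_hidden[j] -= learning_rate * grad_b_hidden[j]
        for i in range(2):
            w_hidden[j][i] -= learning_rate * grad_w_hidden[j][i]
    b_out -= learning_rate * grad_b_out

final_error = mean_squared_error()
print(f"error before training: {initial_error:.3f}, after: {final_error:.3f}")
```

The error after training should be lower than the error before it. How close the network gets to the exact XOR outputs depends on the starting weights and on how long you train.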

https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

It isn't necessary to do the weight math on your own since the libraries will usually take care of it for you, but it is especially helpful to see how it's done at least once. Hopefully this section of the workshop has helped you get an understanding of what actually goes on in a neural network so you can understand how one actually functions, and what's going on within the deceptively simple code you'll see in the next section of this workshop.

### Activation Functions

An __activation function__ is, in simple terms, a function applied to a node's "weighted sum" of inputs (plus an added bias). Its output determines whether or not our neuron/node should be "fired," and how strongly. Since our nodes don't have inherent bounds or an understanding of when they should fire, we need to give them one ourselves, to basically say "your potential output is high enough that we should consider this significant."

The most common ones are listed here:

#### Linear
f(x) = cx, where c is some constant
<br>A straight line function where activation is proportional to input.

#### Sigmoid
f(x) = 1/(1+e^(-x))
<br>Non-linear and looks like a "smoothed-out step function."

#### ReLU
f(x) = max(0,x)
<br>Gives an output of 0 if x is negative or zero; otherwise, outputs x.
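Here is a short sketch of the three functions above in Python, along with a helper showing where an activation function sits inside a node. The helper name `neuron` and its example numbers are invented for illustration.

```python
import math

def linear(x, c=1.0):
    """f(x) = cx"""
    return c * x

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)

def neuron(inputs, weights, bias, activation):
    """Compute the weighted sum plus bias, then pass it through the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Compare the three functions on a negative, a zero, and a positive input.
for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), round(sigmoid(x), 3), relu(x))

# Hypothetical node: the weighted sum is 0.5 + 0.5 - 1.0 = 0.0, so ReLU outputs 0.0.
print(neuron([1, 1], [0.5, 0.5], -1.0, relu))
```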

#### Which activation function do I use?

Unfortunately, there isn't always a simple answer for this. Sometimes you should use more than one, and sometimes more than one could work just as well. When beginning and writing simple neural networks, you can simply test multiple and see which one (or which combination) works best. However, once you create more and more complex networks, it makes much more sense to pick your activation functions based on what type of data you're using and how you want that data processed. Since these choices are often somewhat complicated, we won't go over them here, but if you'd like further explanation of these activation functions, here's a good place to start:

https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

## When and Why Should I Use a Neural Network?

Neural networks are incredibly good at finding patterns and trends in highly multivariate data. If you have a classification problem where you want a binary output from simple data, you might not want to use a neural network. For example, if you just want to know if something is mostly red or mostly blue and you're given the RGB values of pixels, a neural network would probably be up for the task, but it would likely be much more efficient to just write a program to find the average values of the pixels in question, sort them into "blue" and "red," and give you an answer "blue" or "red" based on the amounts of each.
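As a rough sketch of how simple that non-neural-network program could be, here is one way to do it by comparing average red and blue values. The function name and the sample pixels are made up for illustration.

```python
def mostly_red_or_blue(pixels):
    """pixels is a list of (red, green, blue) tuples with values from 0 to 255."""
    avg_red = sum(p[0] for p in pixels) / len(pixels)
    avg_blue = sum(p[2] for p in pixels) / len(pixels)
    # Ties are reported as "blue" here; that choice is arbitrary.
    return "red" if avg_red > avg_blue else "blue"

sample_pixels = [(200, 30, 40), (180, 20, 60), (90, 10, 120)]
print(mostly_red_or_blue(sample_pixels))  # red
```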

However, what if you want a computer to recognize from a given photo whether a dog or a cat is pictured? This is a very common example in machine learning (and you can likely find many neural networks online solving this exact problem if you are curious to see one try this). If you were to give a set of instructions for determining whether something was a cat or a dog to someone who had never seen or heard of either, what would you put in those instructions? You can say "cats are generally smaller than dogs," but what if the person is shown a picture of a corgi or a chihuahua? Would you tell them to look at the ears? Although cats' ears are generally pretty similar, there are definitely breeds of dogs with ears similar to cats.

The answer isn't simple, and is a great example of what a neural network is for. Our brains are able to make this distinction easily, likely because there are hundreds or thousands of tiny things that we are completely unaware of but look for as soon as we see one. It probably isn't just one or two things; it's likely a combination of many characteristics, as well as previous "data" that we've processed, having seen cats and dogs before. Our brains aren't perfect at it either. Although big cats, for example cheetahs or panthers, have an extremely large number of the same characteristics as smaller cats, most people don't make the connection that they could be in the same category (even if not the same subcategory) until they're told. The different size, color, and facial shape lead us to believe that they'd be in a different animal category altogether, although when it's pointed out to us that they belong in the same grouping, our brains are suddenly able to see the connections that we otherwise might not have noticed.

As a result of the complexity of a classification problem like this, using a neural network to come up with a result makes a lot of sense. 

Why not use another classifier? In some cases, it *does* make sense to use another classifier rather than a neural network. However, due to the complexity of certain data-sets, a classifier might have a tough time getting the job done. If the data isn't in an easy-to-map shape like a linear function would be, any simple classifier might struggle to find a pattern to classify the data-set by. Although it is a good rule of thumb that a neural network is good for data that has a complex pattern, this isn't the case for all "complex shapes," as some classifiers are designed specifically for this type of mapping. A neural network is only one of the tools at your disposal for this type of problem.

A major reason to use a neural network is for the flexibility in working with datasets. For example, what if you want your algorithm to learn from previous data and assume that the data is dependent on other datapoints in the set? If you want to predict the next word given the beginning half of a sentence, the algorithm not only needs data of complete sentences, but needs to act depending on previous data. Another classification model would struggle with this kind of interdependency.

Ultimately, there are advantages and disadvantages to using neural networks depending on the situation, and there isn't a definitive rule for when you should and shouldn't use one. However, it is accurate to say that neural networks can be used on an extremely wide variety of problems due to their flexibility, and neural networks will often still do the trick even if another type of algorithm might be easier.

## Types of Neural Networks

__This section isn't necessary for understanding the rest of the workshop, so feel free to skip ahead to the coding section if you feel you understand enough.__

There are many different types of neural networks that vary in types of layers used, proportions of neurons in each layer, and how each neuron is programmed to function. For this workshop, you will only see a basic neural network, but several different types of common NNs are described here, as well as when you would want to use each one.

### Convolutional Neural Network (CNN)

If you've heard of any type of NN, this is probably the one you heard about. 

<br>__Characteristics:__ They have convolution layers, which use convolution kernels: small matrices that are slid across a given data-set to find features in whatever patch of data is currently being looked at. They also often have "pooling layers," which are layers in the neural network that simplify the processed input data by reducing unnecessary features.

<br>__Uses:__ Most frequently used for image recognition
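To give a feel for kernels and pooling, here is a small sketch in plain Python. The tiny 5x5 "image," the 2x2 kernel, and the function names are all invented for illustration; real CNN libraries learn their kernel values during training rather than having them written by hand. Like most deep learning libraries, this sketch slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, step of 1) and sum the products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# A dark region (0s) on the left and a bright region (1s) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-written kernel that responds where brightness jumps from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, edge_kernel)
pooled = max_pool(feature_map)
print(feature_map)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)       # [[2, 0], [2, 0]]
```

The feature map lights up only along the column where the dark region meets the bright one, and pooling shrinks that map while keeping the strongest responses.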

### Recurrent Neural Network (RNN)

These NNs are used when you want to best learn from sequential information. The NNs we've looked at previously assume that inputs are independent of each other. When understanding dogs vs. cats, it doesn't matter whether your neural network was just shown a picture of a dog or a cat- a picture of one doesn't affect a picture of the other. But if you want to, for example, predict the next word given half a sentence, you need to have some previous data- the words that came before the word you're currently trying to guess.

<br>__Characteristics:__ These NNs use a different type of cell than the ones we've previously talked about, known as a "recurrent cell" (like the name implies). This type of cell receives its own output with a delay of one or more iterations. Simply put, previous outputs are used as inputs so the neural network has a "memory" and is able to learn from previous data to process further data.

<br>__Uses:__ Natural Language Processing and Speech Recognition
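The "memory" of a recurrent cell can be sketched with a single number that gets fed back in at every step. Everything below is an illustrative assumption: the function names, the use of tanh as the activation, and the weight values. A real RNN would use vectors and learned weights.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """Combine the current input with the cell's own previous output."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # the cell starts with an empty "memory"
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# The two sequences differ only in their first item.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print([round(h, 3) for h in states_a])
print([round(h, 3) for h in states_b])
```

Even though the last two inputs are identical, the first sequence still produces non-zero outputs at the end, because the early input is carried forward through the cell's previous outputs.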

### Long Short-Term Memory (LSTM)

These are a type of RNN, so they're built to "have a memory" the same way, but they're designed to remember things over a longer term than a basic RNN.

<br>__Characteristics:__ It uses a *memory cell* that is able to process data with large time gaps, so it can "remember" things that happened a long time prior to the current data being processed.

<br>__Uses:__ Natural Language Processing and Speech Recognition

### Generative Adversarial Network (GAN)

This is likely another term that you've frequently heard if you've done some reading on machine learning. These networks attempt to generate data and then test how convincing that generated data is.

<br>__Characteristics:__ This type of network consists of two networks referred to as a __generator__ and a __discriminator__. The generator, as the name implies, generates datasets for given parameters. The discriminator tries to differentiate between this generated data and true sample data. The idea is that as the generator gets better at creating datasets, the discriminator gets better at detecting them, so competing against each other hopefully leads to a very realistic dataset. These networks are constantly evolving, but a balance needs to be maintained between the two.

<br>__Uses:__ Frequently used for generating realistic images

### Support Vector Machine (SVM)

These are __not always considered neural networks__ but we'll give a brief description of these since they're a frequently used buzzword and are used frequently in conjunction with neural networks, if not on their own.

<br>__Characteristics:__ In its basic form, only used for binary classification, or simply, when you only want one of two possibilities: "yes" or "no," 0 or 1, etc. No matter how many dimensions (or inputs) you give your SVM, it will only give you a binary classification. (Problems with more than two classes are usually handled by combining several binary SVMs.)

<br>__Uses:__ Any type of classification problem. __Some people__ consider this a good first choice for any classification problem, since it's often one of the most flexible and effective choices (although this is a debated claim.)

All of these are obviously simplifications of what actually goes on behind the scenes of these neural networks, and there are also hundreds more not discussed in this workshop. If you would like to see examples of other types of neural networks, and gain a further understanding of what goes into them, here's a good link to start with:

https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464

## Different Learning Methods

On top of having numerous choices for *types* of neural networks, we also have some options for types of *learning* that these networks do. Here are descriptions of the three main types and when you would use each:

#### Supervised Learning

The algorithm is given a set of inputs and outputs as training data. It then tries to predict the outcomes of a new testing set based on how it processed the training data.

#### Unsupervised Learning

This learning occurs without human help. With supervised learning, we label or classify the datasets beforehand and tell the algorithm what results it should get. With unsupervised learning, the data given is neither labeled nor classified. The algorithm tries to learn from it without guidance.

#### Reinforcement Learning

This can be a mix of supervised and unsupervised learning. The algorithm processes data, and then learns and changes based on feedback you give it on its results.

## Conclusion

Hopefully you feel much more knowledgeable about what a neural network actually is and how it works. To really appreciate how much neural networks are capable of, you should try them out yourself! The next part of this workshop goes over an example of a __fake news detector__ so you can see a neural network in action!

## Fun Links

### Neural network theory:

#### Neuroscience behind the theory:

https://medium.com/@eraiitk/brain-and-artificial-neural-networks-differences-and-similarities-1d337fe50168

https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7

### Math Stuff

#### General Neural Networks (Where most of the pictures are from)

https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/#:~:text=A%20neural%20network%20is%20a,learning%20implying%20multiple%20hidden%20layers.

#### Weighing Distributions

https://medium.com/hal24k-techblog/a-guide-to-generating-probability-distributions-with-neural-networks-ffc4efacd6a4

#### Activation Functions

https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0