# Neurons and the Brain: Origins of Neural Networks and Deep Learning

## Origins
- Original idea: create software that could mimic how the human brain operates.  
- Today, many AI neural networks actually work very differently from how the biological brain works.  
- We still don’t fully understand how the human brain works.  

### Timeline
- **1950s** – Early beginnings of neural networks  
- **1980s/1990s** – Found popularity  
- **2005 onwards** – Resurgence with **deep learning**  

### Applications
- Speech recognition  
- Computer vision  
- Natural Language Processing (NLP)  
- Many more areas of Machine Learning  

---

## Biological Inspiration
- In the brain, **neurons** send electrical impulses to other neurons.  
- Each neuron aggregates inputs from other neurons.  

![bio](biological_comparison.png)

---

## Artificial Neural Networks
- Neural networks use a **simplified mathematical model** of a neuron.  
- A **circle** represents a neuron:  
  - Takes one or more inputs (numbers)  
  - Performs a computation  
  - Outputs another number (which can be input to another neuron)  

### Simulating Many Neurons
- Instead of building one neuron at a time, we simulate **many neurons simultaneously**.  
- All neurons take inputs, compute, and output numbers that feed into other neurons.  

## Big Data and Neural Network Scaling
- As the amount of available data grew (**Big Data**), traditional ML algorithms such as linear and logistic regression stopped improving much, even when given more data.  
- Neural networks scaled to handle this challenge:  
  - **Small networks** → modest performance improvements  
  - **Medium networks** → better performance with more capacity  
  - **Large networks** → significant performance gains with massive datasets  
- This scaling helped unlock the breakthroughs seen in **deep learning** today.  

# Demand Prediction: Neural Network Example

![demand_pred](demand_prediction.png)

- Goal: predict whether a t-shirt sold in a shop will be a top seller.

### Logistic Regression as a Neuron
- The logistic regression example above can be seen as a **single neuron**:  
  - Inputs: **x** (features)  
  - Output: **a** (activation, probability of a top-seller)  
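A minimal sketch of this single neuron in Python, assuming NumPy is available. The feature (price), the weight, and the bias are made-up values for illustration, not taken from the course data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """One logistic regression unit: a = sigmoid(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical values: one feature (price) and hand-picked parameters.
x = np.array([20.0])   # price of the t-shirt
w = np.array([-0.1])   # assumed weight: a higher price lowers the probability
b = 2.0                # assumed bias

a = neuron(x, w, b)    # activation = estimated probability of a top seller
print(a)               # w . x + b = 0 here, so a = 0.5
```

In a trained model, `w` and `b` would be learned from data rather than chosen by hand.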

### Activation Terminology
- We can rename the output of the function **f(x)** as the **activation (a)**.  
- **Activation (a)** is a term from neuroscience, referring to how strongly a neuron sends its output to other neurons.  

### Neuron as a Tiny Computer
- Another way to think of a neuron:  
  - A **tiny little computer** whose only job is:  
    - Take one or more numbers **x** as input  
    - Perform a computation  
    - Output one or more numbers **a**  

### Building Neural Networks
- A **neural network** is created by wiring together many such neurons.  
- Each neuron passes activations forward, allowing the network to learn complex patterns. 





### A More Complex Example
![layer_eg](demand_prediction_layers.png)

### Feature-Neuron Mapping
- Different contributing features affect the probability of being a top seller.  
- We categorize features into separate neurons, each an **individual logistic regression unit**:
  - **Affordability neuron**: function of price and shipping cost  
  - **Awareness neuron**: function of marketing  
  - **Perceived quality neuron**: function of price (as a signal of quality) and material  

- The outputs of these three neurons are wired into another neuron (another logistic regression unit).  
- This neuron takes the three values as input and outputs the **probability of being a top seller**.  
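The wiring described above can be sketched as follows. This is only an illustration: the feature values are assumed to be pre-scaled, and every weight and bias is a hand-picked placeholder rather than a learned parameter.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by every neuron in this sketch."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, already-scaled features for one t-shirt.
price, shipping, marketing, material = 0.4, 0.2, 0.9, 0.7

# Hidden layer: each neuron is a logistic regression unit on its own feature subset.
# All weights and biases are made up for illustration.
affordability = sigmoid(np.dot([-2.0, -1.5], [price, shipping]) + 1.0)
awareness = sigmoid(np.dot([3.0], [marketing]) - 1.0)
perceived_quality = sigmoid(np.dot([1.0, 2.0], [price, material]) - 1.0)

# Output neuron: another logistic regression unit on the three activations.
hidden = np.array([affordability, awareness, perceived_quality])
p_top_seller = sigmoid(np.dot([1.5, 1.0, 2.0], hidden) - 2.0)

print(hidden, p_top_seller)
```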

---

### Neural Network Terminology
- We group neurons into a **layer**:  
  - A group of neurons that take the same or similar features as input and together output a set of numbers.  
- Layers can contain:  
  - **Multiple neurons** (e.g., the first layer with affordability, awareness, perceived quality)  
  - **Single neuron** (e.g., the final output neuron)  

#### Key Layers
- **Input layer (Layer 0)** ($\bar{x}$) → original features (e.g., price, shipping, marketing, material)  
- **Hidden layers** ($\bar{a}$) → intermediate activations (not directly observed in training data)  
- **Output layer** → final probability of being a top seller  

By convention, the input layer is layer 0, and it is not counted when we say a network has $n$ layers. For example, a network with one hidden layer and one output layer is a 2-layer network.

---

### Why Hidden Layers are "Hidden"
- Training data includes:  
  - Input values ($\bar{x}$)  
  - Output values ($\bar{y}$)  
- But it does **not** include the intermediate outputs of hidden layers ($\bar{a}$).
- That’s why these layers are called **hidden**.
- The outputs of hidden layers are the **activations**, which can be thought of as higher-level features.

---

### Feature Access in Practice
- In practice, deciding by hand which inputs feed into which neuron would be time-consuming.  
- Instead:  
  - Each neuron has access to **all features from the previous layer**.  
  - Example: the affordability neuron can see all features but learns (through weights) to ignore marketing and material if they’re irrelevant.  
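A sketch of a fully connected layer, where every neuron receives all four features. The zero entries in `W` are set by hand to mimic what training might learn; real learned weights would more likely be small than exactly zero. The function name `dense` and all numbers are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_prev, W, b):
    """Fully connected layer: W has one column per neuron, so every
    neuron sees every value in a_prev."""
    return sigmoid(a_prev @ W + b)

# Hypothetical scaled features: price, shipping, marketing, material.
x = np.array([0.4, 0.2, 0.9, 0.7])

# Columns: affordability, awareness, perceived quality (made-up weights).
W = np.array([
    [-2.0, 0.0, 1.0],   # price
    [-1.5, 0.0, 0.0],   # shipping
    [ 0.0, 3.0, 0.0],   # marketing
    [ 0.0, 0.0, 2.0],   # material
])
b = np.array([1.0, -1.0, -1.0])

a1 = dense(x, W, b)

# Changing marketing and material leaves the affordability activation unchanged,
# because its weights on those two features are zero.
x_alt = np.array([0.4, 0.2, 0.1, 0.0])
print(a1[0], dense(x_alt, W, b)[0])
```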

---

### Neural Networks as Feature Learning
- Another way to view this neural network:  
  - A **logistic regression model** with inputs = improved features (affordability, awareness, perceived quality).  
  - These improved features are **learned** from the original inputs (price, shipping, marketing, material).  
- This is similar to **feature engineering**:  
  - Traditionally: we manually create features (e.g., $x_{1} \cdot x_{2}$).  
  - With neural networks: the model automatically **learns new features**.  
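For contrast, manual feature engineering might look like the sketch below, where the product feature $x_{1} \cdot x_{2}$ is computed by hand and appended to the data. The numbers are made up for illustration.

```python
import numpy as np

# Hypothetical data: two raw features (x1, x2) for two examples.
X = np.array([[2.0, 3.0],
              [4.0, 5.0]])

# Manually engineered feature: the product x1 * x2.
x1_times_x2 = X[:, 0] * X[:, 1]

# Append it as a third column alongside the raw features.
X_engineered = np.column_stack([X, x1_times_x2])
print(X_engineered)
```

A neural network skips this manual step: its hidden layers play the role of the extra columns, and their values are learned during training.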

👉 **Note**: You don’t have to manually create hidden-layer features; the model learns them during training.  
You can have multiple hidden layers, each with multiple neurons.  
How many hidden layers to use, and how many neurons to put in each, are decisions you make when choosing the **neural network architecture**.

### Real Example: Face Recognition
![face](facial_recog.png)
- Say you have a 1000 × 1000 pixel picture.
- In code, this can be represented as a 1000 × 1000 matrix, where each value from 0 to 255 gives the brightness of one pixel.
- The matrix can be flattened into an array of 1 million values ($1000 \cdot 1000$) and fed into the first layer.
- The first hidden layer may have neurons that each look for a different type of short line or edge.
- The next hidden layer may look for specific facial features, such as eyes, noses, and ears.
- The final hidden layer may combine these parts to detect larger, coarser face shapes.
- The output layer produces the probability that the picture shows a specific person.
- The neural network learns what each hidden layer detects from the data, without being told what to look for.
- Computing the activations layer by layer, from the input through to the output, is called **forward propagation**.
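The flattening step and the layer-by-layer computation can be sketched as below. This is an untrained toy: the image is a random 20 × 20 stand-in (to keep the weight matrices small), brightness is assumed to be 8-bit (0 to 255), the layer sizes are arbitrary, and the weights are random rather than learned.

```python
import numpy as np

def sigmoid(z):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Forward propagation: pass activations through each (W, b) pair in order."""
    a = x
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)

# Scaled-down stand-in for the picture, with assumed 8-bit brightness values.
image = rng.integers(0, 256, size=(20, 20))
x = image.reshape(-1) / 255.0   # flatten to length 400 and scale to [0, 1]

# Hypothetical architecture: three hidden layers and one output unit.
sizes = [x.size, 25, 15, 10, 1]
layers = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

p = forward(x, layers)          # output activation, a value in (0, 1)
print(x.shape, p.shape)         # (400,) (1,)
```

With random weights the output is meaningless as a prediction; it only shows how activations flow forward from the input to the output layer.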
