# **Neural Network (NN)**

Main architecture families:
- **ANN** (Artificial Neural Network)
  - Feedforward / MLP
- **CNN** (Convolutional Neural Network)
- **RNN** (Recurrent Neural Network) → LSTM, GRU
- **Transformer** → BERT, GPT
- **Autoencoder** → Variational Autoencoder (VAE)
- **GAN** (Generative Adversarial Network)

# **ANN → Artificial Neural Network**

## **FNN → Feedforward Neural Network**

- Each neuron takes its inputs - multiplies each input by its weight - sums the weighted inputs - adds a bias

    **`∑ ( 𝑤𝑖 ⋅ 𝑥𝑖 ) + 𝑏`**

- Apply activation function - get the output

    **`𝑦 = Activation ( ∑ ( 𝑤𝑖 ⋅ 𝑥𝑖 ) + 𝑏 )`**

- The result (y) is passed to the next layer, and so on until the output layer gives the final prediction.

- Calculate loss: compare the prediction with the actual value.

- Reduce the loss using backpropagation: the error is sent backward through the layers to get the gradient of the loss for each weight - each weight is adjusted against its gradient (𝛼 is the learning rate)

    **`𝑤 = 𝑤 − 𝛼 ⋅ ∂ Loss / ∂ 𝑤`**

- Repeat until convergence - train for multiple **epochs** until the loss stops improving and the model **makes accurate predictions**.
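
The steps above can be sketched for a single neuron in plain Python. This is a minimal illustration, not a full network: the sigmoid activation, the squared-error loss, the starting weights, and the learning rate of 0.5 are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(weights, bias, inputs):
    """y = Activation(sum(w_i * x_i) + b), with sigmoid as the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def loss(weights, bias, inputs, target):
    """Squared error between prediction and actual value."""
    return (neuron_forward(weights, bias, inputs) - target) ** 2

def train_step(weights, bias, inputs, target, lr):
    """One update w = w - lr * dLoss/dw (same rule for the bias)."""
    y = neuron_forward(weights, bias, inputs)
    # Chain rule: dLoss/dz = dLoss/dy * dy/dz = 2(y - t) * y(1 - y)
    dz = 2.0 * (y - target) * y * (1.0 - y)
    new_weights = [w - lr * dz * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * dz
    return new_weights, new_bias

# Hypothetical starting values and a single training example
weights, bias = [0.5, -0.3], 0.1
inputs, target = [1.0, 2.0], 1.0

initial_loss = loss(weights, bias, inputs, target)
for epoch in range(200):
    weights, bias = train_step(weights, bias, inputs, target, lr=0.5)
final_loss = loss(weights, bias, inputs, target)
```

After the loop, `final_loss` is smaller than `initial_loss`, which is the "repeat until convergence" idea on the smallest possible scale.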


# **CNN → Convolutional Neural Networks**

- Start with **input image (or data)** – the filters look at it one small patch at a time.

- Apply **filters (kernels)** – slide over input – perform element-wise multiplication – calculate features (edges, textures, etc.).

     **`Feature Map = ∑ (Filter × Image Patch) + Bias`**

- Apply activation function (usually ReLU) – sets negative values to zero, so only positive filter responses pass through.

    **`𝑦 = Activation ( ∑ (Filter × Patch) + 𝑏 )`**

- Apply **Pooling (e.g., Max Pooling)** – reduces size – keeps strongest features.

    **`Pooling = Pick maximum or average value from patch`**

- Repeat: Convolution → Activation → Pooling to learn deeper patterns.

- Flatten the final feature map – turn it into a vector.

- Pass to **Fully Connected Layers (like ANN)** to make predictions.

- Calculate Loss: Compare predicted output with actual value.

- Improve Accuracy using **Backpropagation**: Send error backward – update filter and dense layer weights.

    **`𝑤 = 𝑤 − 𝛼 ⋅ ∂ Loss / ∂ 𝑤`**

- Repeat until convergence – train for multiple epochs until the loss stops improving.
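
The forward part of this pipeline (Convolution → Activation → Pooling → Flatten) can be sketched in plain Python. The 5x5 image and the 2x2 vertical-edge filter are made-up values for illustration, and the convolution here assumes stride 1 and no padding.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding).

    Each output value is sum(Filter * Image Patch) + Bias.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Set negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    """Turn the 2D feature map into a vector for the dense layers."""
    return [v for row in fmap for v in row]

# Hypothetical 5x5 image: dark on the left, bright on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to dark-to-bright vertical edges
kernel = [[-1, 1], [-1, 1]]

features = flatten(max_pool(relu(conv2d(image, kernel))))
```

Here the filter only responds where the image goes from dark to bright, so the pooled features are nonzero only on the side of the map that contains the edge.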

# **RNN → Recurrent Neural Networks**

- Start with **input data (like sequence/text/time series)** – pass one value at a time (step-by-step).

- Each step: Multiply input by its weight, add the weighted previous hidden state (memory) and a bias, apply activation.

    **`ℎₜ = Activation ( 𝑤ₓ ⋅ 𝑥ₜ + 𝑤ₕ ⋅ ℎₜ₋₁ + 𝑏 )`**

- Pass the new hidden state ℎₜ to the next step – this helps the network "remember" past info.

- Final output is compared with actual value – calculate loss.

- Backpropagation Through Time (BPTT): Send error backward through all time steps – update weights.

    **`𝑤 = 𝑤 − 𝛼 ⋅ ∂ Loss / ∂ 𝑤`**

- Repeat until convergence – train over sequences multiple times.
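
The recurrence for ℎₜ can be sketched with scalar values. This is only the forward pass; tanh as the activation and the weight values are assumptions for the example.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) over a sequence."""
    h = h0
    states = []
    for x_t in xs:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# Only the first input is nonzero; later states still carry a trace of it
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

The second and third hidden states stay above zero even though their inputs are zero, which is the "memory" carried by ℎₜ₋₁. The trace shrinks at every step, which hints at why plain RNNs struggle with long sequences.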


**LSTM (Long Short-Term Memory)**

Same as RNN but with **extra gates** to **better handle long-term memory**.

- **Forget Gate**: Decides what to throw away from memory.
- **Input Gate**: Decides what new info to add.
- **Output Gate**: Decides what to show as output.

All gates work together to update the cell state (memory) and hidden state.

This reduces the **vanishing gradient** problem in long sequences.
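
One LSTM step can be sketched with scalar values, using the commonly used gate equations. The parameter layout (`params` as a dict of `(w_x, w_h, b)` tuples) and the numbers are hypothetical, picked so the forget gate is almost fully open and the input gate almost closed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # what to throw away from memory
    i = sigmoid(pre("input"))        # how much new info to add
    o = sigmoid(pre("output"))       # what to show as output
    g = math.tanh(pre("candidate"))  # the new info itself
    c_t = f * c_prev + i * g         # updated cell state (memory)
    h_t = o * math.tanh(c_t)         # updated hidden state
    return h_t, c_t

# Hypothetical parameters: keep the old memory, ignore the new input
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x_t=0.5, h_prev=0.0, c_prev=2.0, params=params)
```

With these settings the cell state stays very close to its previous value of 2.0, showing how the gates let information pass through a step almost unchanged.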

**GRU (Gated Recurrent Unit)**

Similar to LSTM but **simpler** – uses only **2 gates**:

- **Update Gate**: Controls how much of the old memory to keep versus how much new input to take in.
- **Reset Gate**: Controls how much past info to forget.

Faster and lighter than LSTM, often with similar performance.
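
A scalar GRU step, as a sketch. Libraries differ on whether the update gate weights the old state or the candidate; this example assumes the convention where a gate value near 1 takes the new candidate. The `params` layout and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, params):
    """One scalar GRU step; params maps a name to (w_x, w_h, b)."""
    w_x, w_h, b = params["update"]
    z = sigmoid(w_x * x_t + w_h * h_prev + b)  # update gate
    w_x, w_h, b = params["reset"]
    r = sigmoid(w_x * x_t + w_h * h_prev + b)  # reset gate
    w_x, w_h, b = params["candidate"]
    # The reset gate scales how much past info enters the candidate
    h_cand = math.tanh(w_x * x_t + w_h * (r * h_prev) + b)
    # Assumed convention: z near 0 keeps the old state, z near 1 takes the candidate
    return (1.0 - z) * h_prev + z * h_cand

base = {"reset": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
keep_old = gru_step(1.0, 0.8, {**base, "update": (0.0, 0.0, -10.0)})
take_new = gru_step(1.0, 0.8, {**base, "update": (0.0, 0.0, 10.0)})
```

`keep_old` stays near the previous state 0.8, while `take_new` moves to the candidate value tanh(1.0).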

# **Transformer**

- Take **input sequence** (e.g., sentence) and convert to **embedding vectors**
- Add **Positional Encoding** → adds info about word order
- Pass through **Self-Attention** layers:
  - Each word looks at every word in the sequence (including itself) and decides what to focus on
  - Calculates similarity scores between words, turns them into weights that sum to 1 = **"Attention"**
- Each layer follows:
  - Self-Attention → Add & Normalize
  - Feedforward Layer → Add & Normalize
- Output is passed to the next layer or final prediction layer
- Use **Loss Function** to compare prediction with actual
- Use **Backpropagation** to update weights and learn
- Repeat over many epochs
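
The self-attention step can be sketched as scaled dot-product attention in plain Python. As a simplification, the same embeddings are used directly as queries, keys, and values; a real Transformer first applies learned projection matrices. The three 2-dimensional embeddings are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) weighted sum of values."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings: tokens 0 and 2 are identical, token 1 differs
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(x, x, x)
```

In `weights[0]`, token 0 gives equal weight to itself and to the identical token 2, and less weight to token 1.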

**BERT (Bidirectional Encoder Representations from Transformers)**

- Uses only the **Encoder** part of Transformer
- Reads sentence **in both directions (left + right)** for full context
- Pretrained using **Masked Language Modeling** (hide random words and guess them)
- Good for **understanding tasks**: classification, question answering, etc.
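
The masking idea behind Masked Language Modeling can be sketched as below. This is a simplified version: every selected token is replaced by `[MASK]`, whereas the original BERT recipe also sometimes substitutes a random token or keeps the original. The function name, the seed, and the example sentence are hypothetical.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide random tokens; labels hold the hidden word, or None if not masked."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model must guess this word
        else:
            masked.append(tok)
            labels.append(None)  # no loss is computed at this position
    return masked, labels

tokens = "the cat sat on the mat and looked at the dog".split()
masked, labels = mask_tokens(tokens)
```

The model sees `masked` as input and is trained only on the positions where `labels` is not `None`.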

**GPT (Generative Pretrained Transformer)**

- Uses only the **Decoder** part of Transformer
- Reads text **left to right (one direction only)**
- Pretrained to **predict the next word** in sequence
- Great for **generating text**: writing, answering, summarizing, etc.
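
Two small pieces illustrate the left-to-right setup: the training pairs for next-word prediction, and the causal mask that stops a position from looking at later positions. Both helper names are hypothetical.

```python
def next_token_pairs(tokens):
    """Each prefix of the sequence is paired with the word that follows it."""
    return [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

def causal_mask(n):
    """mask[row][col] is True when position `row` may attend to position `col`."""
    return [[col <= row for col in range(n)] for row in range(n)]

pairs = next_token_pairs(["the", "cat", "sat"])
mask = causal_mask(3)
```

`pairs` holds `(["the"], "cat")` and `(["the", "cat"], "sat")`, and `mask` is lower-triangular, so each word only sees itself and the words before it.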

# **Autoencoder**

- Input data (e.g., image or vector) is passed to the model.
- **Encoder** compresses the input into a smaller representation (**latent vector**).
- **Latent Vector** = compressed version of important features.
- **Decoder** tries to **reconstruct the original input** from the latent vector.
- Output is compared with original input → calculate **Reconstruction Loss**.
- Backpropagation is used to adjust weights and **minimize reconstruction error**.
- Trained to **learn only the essential features** of input data.
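
A tiny linear autoencoder shows the encode → decode → reconstruction loss loop. It compresses 2 numbers into 1 latent value and back. The data (points on the line y = 2x), the starting weights, and the learning rate are assumptions for the sketch; real autoencoders use nonlinear layers and many more dimensions.

```python
def encode(x, enc_w):
    """Compress the input into a single latent value."""
    return sum(w * xi for w, xi in zip(enc_w, x))

def decode(z, dec_w):
    """Reconstruct the input from the latent value."""
    return [w * z for w in dec_w]

def reconstruction_loss(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))

def train_step(x, enc_w, dec_w, lr):
    """One gradient-descent step on the reconstruction loss of one example."""
    z = encode(x, enc_w)
    x_hat = decode(z, dec_w)
    err = [2.0 * (a - b) for a, b in zip(x_hat, x)]   # dLoss/dx_hat
    grad_z = sum(e * w for e, w in zip(err, dec_w))  # dLoss/dz
    new_dec = [w - lr * e * z for w, e in zip(dec_w, err)]
    new_enc = [w - lr * grad_z * xi for w, xi in zip(enc_w, x)]
    return new_enc, new_dec

def total_loss(data, enc_w, dec_w):
    return sum(
        reconstruction_loss(x, decode(encode(x, enc_w), dec_w)) for x in data
    )

# Hypothetical data: every point lies on the line y = 2x,
# so one latent number is enough to describe it
data = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0], [0.5, 1.0]]
enc_w, dec_w = [0.1, 0.1], [0.1, 0.1]

initial_loss = total_loss(data, enc_w, dec_w)
for epoch in range(300):
    for x in data:
        enc_w, dec_w = train_step(x, enc_w, dec_w, lr=0.01)
final_loss = total_loss(data, enc_w, dec_w)
```

Because the data has only one real degree of freedom, the single latent value is enough and the reconstruction loss drops during training.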

**Variational Autoencoder (VAE)**

- Works like autoencoder but **adds randomness** to latent space.
- Encoder doesn't give just one latent vector → it gives a **mean (μ)** and a **standard deviation (σ)** (variance = σ²).
- From μ and σ → sample a **random point (z)** in latent space: z = μ + σ ⋅ ε, where ε is random noise from a standard normal distribution.
- This lets the model **generate new, similar data** (good for generative tasks).
- Loss = Reconstruction Loss + KL Divergence (regularizes latent space)
- Trained to **generate new data** and learn **smooth, continuous features**.
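
The sampling step and the loss can be sketched as follows. The sketch assumes the encoder outputs the log of the variance (a common choice for numerical stability), a standard normal prior, and squared error as the reconstruction loss; the encoder and decoder networks themselves are left out.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]

def kl_divergence(mu, log_var):
    """KL between N(mu, sigma^2) and N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

def vae_loss(x, x_hat, mu, log_var):
    """Loss = Reconstruction Loss + KL Divergence."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + kl_divergence(mu, log_var)

rng = random.Random(0)
z = sample_latent([0.0, 1.0], [0.0, 0.0], rng)  # two latent dimensions
```

The KL term is zero when μ = 0 and σ = 1 and grows as the encoder's output moves away from that, which is what keeps the latent space regular.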

# **Generative Adversarial Network (GAN)**

- GAN has **two models**:  
  - **Generator**: creates fake data  
  - **Discriminator**: checks if data is real or fake  
- Start with random noise as input → **Generator** tries to create realistic data (e.g., fake images).
- **Discriminator** sees both real and fake data → predicts if it's real (1) or fake (0).
- Both models compete:
  - Generator improves to **fool the discriminator**
  - Discriminator improves to **catch fakes**
- Loss:
  - Generator tries to **maximize** the chance of fooling the discriminator
  - Discriminator tries to **minimize** its error in detecting fake vs real
- Use **backpropagation** to update both models, usually in alternating steps
- Trained until **Generator produces realistic data** that Discriminator can’t easily detect
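
The two losses can be written with binary cross-entropy. This sketch only shows the loss functions, given the discriminator's output probabilities; the networks are left out. The generator loss here is the commonly used "non-saturating" form, which is an assumption about which variant is meant.

```python
import math

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy for one prediction in [0, 1]."""
    return -(
        label * math.log(pred + eps)
        + (1.0 - label) * math.log(1.0 - pred + eps)
    )

def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(real) near 1 and D(fake) near 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Generator wants the discriminator to output 1 on its fakes."""
    return bce(d_fake, 1.0)

# d_real / d_fake stand for the discriminator's outputs on real and fake data
good_discriminator = discriminator_loss(d_real=0.9, d_fake=0.1)
confused_discriminator = discriminator_loss(d_real=0.5, d_fake=0.5)
```

The same value `d_fake` pulls the two losses in opposite directions: a high `d_fake` lowers the generator's loss and raises the discriminator's.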

# **Deep Reinforcement Learning (DRL)**

- An **agent** interacts with an **environment** (like a game or simulation).
- At each step:
  - Agent observes the **state** of the environment.
  - Chooses an **action** based on current state.
  - Environment returns a **reward** and new **state**.
- Goal: **Learn actions** that maximize total reward over time.
- Uses **Deep Neural Networks** to approximate either a **policy** (maps a state to an action) or a **Q-function** (estimates the total future reward of taking an action in a state).
- Learns by **trial and error**: good actions get **positive rewards**, bad ones get **penalties**.
- Uses **backpropagation** to update the neural network and improve decisions.
- Trained over many episodes until the agent learns the best strategy.
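
The agent-environment loop can be sketched with Q-learning on a toy problem. To keep it short, a lookup table stands in for the deep neural network, so this is tabular Q-learning rather than deep RL; in a DRL setup the table would be replaced by a network trained on the same target. The 5-state corridor environment and all hyperparameters are made up for the example.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment: a corridor with a reward at the right end."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: mostly the best known action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Target: reward now plus discounted best value of the next state
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, the greedy `policy` chooses "right" in every non-goal state, which is the strategy that collects the reward fastest.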