# Content

## Overview
This module explains **how machine learning and deep learning actually work**, and why modern AI systems
(including LLMs) behave the way they do.

It is **conceptual first**, with **light, real code** to ground ideas.

This module builds on:
- **Module 1:** Python fundamentals
- **Module 2:** Data & Pandas
- **Module 3:** LLM Fundamentals (inference, hallucinations, constraints)

By the end of this module, the behaviour of LLMs should feel *inevitable*, not mysterious.

## Learning Objectives
By the end of this module, you will be able to:
1. Explain AI vs ML vs DL vs Generative AI (GenAI)
2. Explain neural network anatomy (neurons, weights, biases, activations, layers)
3. Describe how models learn (loss, gradient descent, backpropagation, learning rate, epochs)
4. Explain overfitting vs underfitting and why data quality matters
5. Understand the “Big Two” DL frameworks: **PyTorch** and **TensorFlow/Keras**
6. Connect ML/DL concepts directly to **LLM behaviour**
7. Explain why **grounding (RAG)** is required for enterprise reliability

---
## Quick note on depth
You do **not** need to be a mathematician to be an effective AI engineer in a bank. But you **do** need a correct mental model.

# Group 1 — The Big Picture

## 4.1 AI vs ML vs DL vs GenAI
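AI is the umbrella term for systems that perform tasks we associate with intelligence. ML is the part of AI that learns patterns from data instead of following hand-written rules. DL is the part of ML that uses neural networks with many layers. GenAI covers models that generate new content such as text, images, or code. LLMs sit at the intersection: they are deep learning models used for generative AI.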

## 4.2 Where LLMs Fit
LLMs are deep neural networks trained for next-token prediction. They are not databases or truth engines.

## 4.3 Supervised vs Unsupervised vs Self-Supervised
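Supervised learning trains on human-labelled examples; unsupervised learning finds structure in unlabelled data; self-supervised learning derives the labels from the data itself. LLMs are trained mostly via self-supervised learning: the label for each position in a text is simply the token that comes next.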
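
To make "labels derived from data" concrete, here is a minimal sketch of how raw text can be turned into training pairs. The helper name `next_token_pairs` is hypothetical, and whitespace splitting is an assumption standing in for a real tokenizer.

```python
def next_token_pairs(text):
    """Turn raw text into (context, next_token) training pairs."""
    # Assumption: whitespace-separated words stand in for real tokens.
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("the bank approved the loan")
for context, label in pairs:
    print(context, "->", label)
```

No human wrote any of these labels; every one comes from the text itself.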

## 4.4 Regression vs Classification
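Regression predicts numbers (for example, a loan amount); classification predicts categories (for example, fraud or not fraud). At each step an LLM performs classification: it assigns a probability to every token in its vocabulary and chooses the next token from that distribution.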
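
Next-token choice as classification can be sketched with softmax, the standard function for turning raw scores into probabilities (the module does not name it, so treat it as an assumption here). The three-word vocabulary and the scores are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["approved", "declined", "banana"]  # hypothetical toy vocabulary
logits = [2.0, 1.0, -1.0]                   # made-up model scores

probs = softmax(logits)
best = vocab[probs.index(max(probs))]
print(dict(zip(vocab, (round(p, 3) for p in probs))), "->", best)
```

A real LLM does the same thing over a vocabulary of many thousands of tokens.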

## 4.5 Training vs Inference
Training adjusts weights to reduce loss; inference keeps the weights fixed and uses them to generate outputs.

# Group 2 — Anatomy of a Neural Network

## 4.6 What Is a Neuron?
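A neuron multiplies each input by its weight, sums the results, adds a bias, and passes the total through an activation function: activation(sum(inputs × weights) + bias).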

In [None]:
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

print(neuron([1.0, 0.5, -1.2], [0.8, -0.3, 0.1], 0.05))

## 4.7 Weights and Biases
A weight is the learned importance of an input; the bias shifts the neuron's baseline output. Training is the search for weights and biases that make the loss small.

## 4.8 Activation Functions
Activation functions (such as ReLU and sigmoid) add non-linearity. Without them, a stack of layers collapses into a single linear transformation, no matter how many layers it has.

In [None]:
import math

def relu(z):
    return max(0.0, z)  # zero for negative inputs, unchanged otherwise

def sigmoid(z):
    return 1 / (1 + math.exp(-z))  # squashes any input into (0, 1)

for z in [-3, -1, 0, 1, 3]:
    print(z, relu(z), round(sigmoid(z), 4))
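
The claim that networks without activations are linear can be checked directly. This is a minimal one-input sketch with hand-picked, hypothetical weights: two stacked linear layers give exactly the same outputs as one linear layer, and inserting a ReLU between them breaks that equivalence.

```python
def linear(x, w, b):
    """One-input linear layer: w * x + b, with no activation."""
    return w * x + b

def two_linear_layers(x):
    return linear(linear(x, 2.0, 1.0), 3.0, -4.0)

def one_linear_layer(x):
    # 3 * (2x + 1) - 4 simplifies to 6x - 1
    return linear(x, 6.0, -1.0)

def with_relu(x):
    # Same two layers, but with a ReLU between them
    return linear(max(0.0, linear(x, 2.0, 1.0)), 3.0, -4.0)

for x in [-2.0, 0.0, 1.5]:
    print(x, two_linear_layers(x), one_linear_layer(x), with_relu(x))
```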

## 4.9 Layers
The input layer receives the features, hidden layers transform them into intermediate representations, and the output layer produces the prediction.
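
As a rough illustration, the sketch below pushes three input features through one hidden layer and one output layer in plain Python. The helper `dense` and all weights and biases are hypothetical, chosen by hand rather than learned.

```python
def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

# Hypothetical parameters: 3 inputs -> 2 hidden neurons -> 1 output
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.2]

features = [1.0, 2.0, 3.0]
hidden = dense(features, hidden_w, hidden_b, relu)    # intermediate representation
output = dense(hidden, output_w, output_b, identity)  # final prediction
print(hidden, output)
```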

## 4.10 What 'Deep' Means
Deep means many layers, enabling hierarchical feature learning.

# Group 3 — How Models Learn (Training Mechanics)

## 4.11 Loss Functions
Loss measures error; training minimises loss.

In [None]:
def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

print(mse([10, 12, 13], [9, 11, 15]))  # (1 + 1 + 4) / 3 = 2.0

## 4.12 Gradient Descent (Foggy Mountain)
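Optimisation is like walking downhill in fog: you cannot see the valley, but you can feel the slope (the gradient) under your feet and take a step in the downhill direction.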
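
The walk downhill can be sketched on a one-dimensional "mountain". The loss `(w - 3)^2` is an assumption chosen because its minimum (at `w = 3`) is easy to verify by eye; real models have millions of weights, but the update rule is the same.

```python
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate, steps):
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step against the slope
    return w

w_final = gradient_descent(start=0.0, learning_rate=0.1, steps=50)
print(round(w_final, 4), round(loss(w_final), 6))
```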

## 4.13 Backpropagation (High Level)
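Backpropagation applies the chain rule from the output layer back towards the input layer to work out how much each weight contributed to the error. Gradient descent then uses those gradients to update the weights.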
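
For a single sigmoid neuron the backward pass fits in a few lines. This sketch assumes a squared-error loss and made-up values for `x`, `y`, `w`, and `b`; it computes the gradient with the chain rule and compares it against a finite-difference estimate as a sanity check.

```python
import math

def forward(x, w, b):
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
    return z, a

def loss_fn(a, y):
    return (a - y) ** 2

def backward(x, w, b, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    _, a = forward(x, w, b)
    dL_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)   # derivative of the sigmoid
    return dL_da * da_dz * x, dL_da * da_dz  # (dL/dw, dL/db)

x, y, w, b = 1.5, 1.0, 0.4, -0.2  # made-up example values
grad_w, grad_b = backward(x, w, b, y)

# Numerical check: nudge w slightly and measure how the loss changes
eps = 1e-6
loss_plus = loss_fn(forward(x, w + eps, b)[1], y)
loss_minus = loss_fn(forward(x, w - eps, b)[1], y)
numeric_w = (loss_plus - loss_minus) / (2 * eps)
print(grad_w, numeric_w)
```

Deep learning frameworks automate exactly this bookkeeping across every layer.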

## 4.14 Learning Rate
The learning rate is the step size of each update: too small and training is slow; too large and training becomes unstable or diverges.
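
The effect of step size shows up even on a toy problem. This sketch assumes the loss `(w - 3)^2` and three illustrative learning rates; it reports how far each run ends from the minimum after the same number of steps.

```python
def run(learning_rate, steps=20, start=0.0):
    """Gradient descent on (w - 3)^2; returns the final distance from the minimum."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w - 3.0)
    return abs(w - 3.0)

for lr in [0.001, 0.1, 1.1]:
    print(lr, run(lr))
```

With the smallest rate the run has barely moved, with the middle rate it is close to the minimum, and with the largest rate each step overshoots further than the last.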

## 4.15 Epochs and Convergence
An epoch is one full pass through the training data. Too few epochs can leave the model underfitted; too many can lead to overfitting.
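
One common way to decide when to stop is early stopping, a technique this module does not name, so read the sketch below as an assumption about practice. The function `best_epoch`, the `patience` parameter, and the validation losses are all hypothetical.

```python
def best_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best, best_at, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_at, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_at

val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]  # made-up numbers
print(best_epoch(val_losses))
```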

# Group 4 — Reality, Frameworks, and Why This Matters

## 4.16 Overfitting vs Underfitting
An overfitted model memorises the training data and generalises poorly; an underfitted model is too simple to capture the pattern. Use a validation set to detect both.
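
The sketch below contrasts three toy models on made-up data: a memoriser that copies the nearest training label, a constant model that always predicts the training mean, and a straight line fitted through the origin. All names and numbers are illustrative assumptions.

```python
def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Made-up data that roughly follows y = 2x, with noise in the training set
train = [(1, 2.3), (2, 3.6), (3, 6.4), (4, 7.7)]
valid = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]

def memoriser(x):
    """Overfitted: returns the label of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def constant(x):
    """Underfitted: always predicts the training mean."""
    return sum(y for _, y in train) / len(train)

# Least-squares slope for a line through the origin
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def line(x):
    return slope * x

def evaluate(model, data):
    return mse([y for _, y in data], [model(x) for x, _ in data])

for name, model in [("memoriser", memoriser), ("constant", constant), ("line", line)]:
    print(name, round(evaluate(model, train), 3), round(evaluate(model, valid), 3))
```

The memoriser has zero training error yet does worse than the line on the validation set, which is the signature of overfitting; the constant model is poor on both, which is the signature of underfitting.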

## 4.17 Data Quality and Bias
Models reflect training data. Poor quality and bias drive poor outcomes.

## 4.18 PyTorch — Mental Model
PyTorch is imperative and Pythonic: code runs line by line, which makes it easy to debug and popular in research.

In [None]:
try:
    import torch
    x = torch.tensor([1.0, 2.0, 3.0])
    w = torch.tensor([0.1, 0.2, 0.3])
    # Element-wise multiply, then sum: the weighted sum inside a neuron
    print((x * w).sum().item())
except Exception as e:
    print("PyTorch not available (ok):", type(e).__name__)

## 4.19 TensorFlow/Keras — Mental Model
Keras is a high-level, declarative API: you describe the layers and let the framework assemble the model, which makes models fast to build.

In [None]:
try:
    import tensorflow as tf
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),                   # 3 input features
        tf.keras.layers.Dense(4, activation="relu"),  # hidden layer, 4 neurons
        tf.keras.layers.Dense(1),                     # single output value
    ])
    model.summary()
except Exception as e:
    print("TensorFlow not available (ok):", type(e).__name__)

## 4.20 Why This Matters for LLMs (Bridge to RAG)
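LLMs generalise patterns, not truth, so hallucination is expected behaviour rather than a malfunction. Grounding (RAG) supplies retrieved evidence at inference time, acting as an external memory the model can draw on.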
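
The idea of grounding can be sketched without any model at all: retrieve the most relevant document, then place it in the prompt so the answer is tied to that evidence. Everything here is a hypothetical toy, including the word-overlap scoring, the function names, and the policy snippets; production systems typically use embedding-based search instead.

```python
def retrieve(question, documents):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question, documents):
    evidence = retrieve(question, documents)
    return f"Answer using only this evidence:\n{evidence}\n\nQuestion: {question}"

documents = [  # hypothetical policy snippets, not real data
    "The overdraft fee is 15 GBP per month.",
    "Branch opening hours are 9am to 5pm on weekdays.",
]

prompt = build_prompt("What is the overdraft fee?", documents)
print(prompt)
```

The model still predicts tokens, but now the most plausible continuation is one that matches the supplied evidence.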

## Practice Exercises (Ungraded)
1) Give one banking example of regression and one of classification.
2) Explain why a learning rate that is too large makes training unstable.
3) Explain how overfitting can show up in fraud models.
4) Explain why grounding reduces hallucination.

## Module Summary
ML/DL fundamentals explain LLM behaviour; grounding (RAG) is required for enterprise reliability.