# A Walk in the `AI` Park

- A quick appreciation of the AI domain
- Gain an understanding of some key concepts
- Understand the progression leading up to GenAI

# `AI > ML > DL > GenAI`

# Machine Learning

- Learning Patterns
- Supervised Learning: input + expected output
    - Classification: is this mail spam?, will it rain tomorrow?, object detection
    - Regression: temperature tomorrow, monsoon rainfall (mm), probability of hitting oil
- Unsupervised Learning: only input (!)
    - Clustering: "you might also like...", "often bought together..."
- Others: semi-supervised, self-supervised, human in the loop
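
The supervised case is covered by the classifier in the next cell; the unsupervised case can be sketched in the same spirit. The example below is only an illustration: the blob dataset, the choice of `KMeans`, and names such as `X_blobs` and `cluster_ids` are assumptions made for this sketch and are not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: three blobs of points. The true labels are discarded on purpose,
# because unsupervised learning only ever sees the inputs.
X_blobs, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# KMeans groups the points into 3 clusters using only their positions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_blobs)

print("Cluster sizes:", np.bincount(cluster_ids))
print("Cluster centres:\n", kmeans.cluster_centers_.round(2))
```

Note that the number of clusters is chosen by hand here; the algorithm is told how many groups to look for, not what they mean.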

In [None]:
# Requires: scikit-learn 1.7.2, torch 2.6+, numpy 2.3.3
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate a toy "moons" dataset (nonlinear for illustration)
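# random_state is fixed so the split and the reported accuracy are reproducible
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)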

# 1. Classical ML: Logistic Regression
clf = LogisticRegression()
clf.fit(X_train, y_train)
print(f"LogisticRegression test accuracy: {clf.score(X_test, y_test):.3f}")

In [None]:
import seaborn as sns

sns.scatterplot(x=X[:,0], y=X[:,1], hue=y)

In [None]:
# Same points coloured by the model's predictions: note the straight-line decision boundary
sns.scatterplot(x=X[:,0], y=X[:,1], hue=clf.predict(X))

# Deep Learning

- Uses Artificial Neural Networks
- Fundamental Unit: Perceptron

    > [Input] -> [Weights & Biases] -> [Output]

- Arranged in Layers
- Ability to automatically adjust weights & biases (W&B): this is the "learning"
- Highly scalable
- Terms & concepts: learning rate, epochs, gradient descent, loss (objective) function, optimizer, batches
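
The terms above can be made concrete with a single unit trained by hand. This is a minimal sketch, not a definitive implementation: it assumes a sigmoid activation (the classic perceptron uses a step function), a toy logical-OR dataset, and hypothetical names (`X_or`, `y_or`, `sigmoid`) that appear nowhere else in the notebook.

```python
import numpy as np

# Toy data (assumed for illustration): the logical OR of two binary inputs
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_or = np.array([0, 1, 1, 1], dtype=float)

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)   # weights, one per input
b = 0.0           # bias
lr = 0.5          # learning rate: size of each update step

for epoch in range(2000):                 # one epoch = one pass over the data
    p = sigmoid(X_or @ w + b)             # forward pass: [Input] -> [W&B] -> [Output]
    # Gradient of the mean binary cross-entropy loss w.r.t. the pre-activation
    grad_z = (p - y_or) / len(y_or)
    w -= lr * (X_or.T @ grad_z)           # gradient descent step on the weights
    b -= lr * grad_z.sum()                # ... and on the bias

preds = (sigmoid(X_or @ w + b) > 0.5).astype(int)
print("Learned weights:", w.round(2), "bias:", round(b, 2))
print("Predictions:", preds)
```

Frameworks such as PyTorch automate exactly these steps: the loss function measures the error, backpropagation computes the gradients, and the optimizer applies the update.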

In [None]:
import torch
import torch.nn as nn

# 2. Minimal Neural Net (PyTorch)
class TinyNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(),
            nn.Linear(16, 2)
        )
    def forward(self, x):
        return self.net(x)

In [None]:
torch.manual_seed(0)
model = TinyNN()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.long)

for epoch in range(160):
    optimizer.zero_grad()
    out = model(X_train_t)
    loss = loss_fn(out, y_train_t)
    loss.backward()
    optimizer.step()
# Evaluate without tracking gradients
with torch.no_grad():
    test_acc = (model(X_test_t).argmax(1) == y_test_t).float().mean().item()
print(f"NeuralNet test accuracy: {test_acc:.3f}")

# Both models predict classes, but the neural network can fit more complex, nonlinear boundaries.

# Others

- CNN (Convolutional Neural Network)
- RNN (Recurrent Neural Network)
- LSTM (Long Short-Term Memory)

# Transformers

- Evolved NN architectures
- Encoder+Decoder
- Positional Encoding
- **ATTENTION!**
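
Attention by itself has no notion of word order, which is why a positional signal is added to the token embeddings. The sketch below assumes the fixed sine/cosine scheme from the original Transformer paper and an even `d_model`; the function name `sinusoidal_positional_encoding` is hypothetical, and many models use learned or rotary position encodings instead.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) tensor of sine/cosine position signals.

    Assumes d_model is even.
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # One frequency per pair of dimensions, decaying geometrically
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
token_embeddings = torch.rand(1, 4, 8)   # (batch, seq_len, d_model)
with_position = token_embeddings + pe    # broadcast over the batch dimension
print("Positional encoding:", pe.shape, "Embeddings + positions:", with_position.shape)
```

Each position gets a distinct pattern, so two identical tokens at different positions no longer look identical to the attention layers.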

In [None]:
# Toy transformer encoder & decoder blocks (PyTorch).
# Residual connections, layer normalisation, and causal masking are omitted for brevity.
import torch
import torch.nn as nn

class MiniEncoderBlock(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
    def forward(self, x):
        attn_out, _ = self.self_attn(x, x, x)
        return self.ffn(attn_out)

class MiniDecoderBlock(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, 2, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, 2, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
    def forward(self, x, enc_out):
        x_attn, _ = self.self_attn(x, x, x)
        x_ca, _ = self.cross_attn(x_attn, enc_out, enc_out)
        return self.ffn(x_ca)


In [None]:
dummy = torch.rand(1, 4, 8)  # (batch, seq_len, d_model)
enc = MiniEncoderBlock()
dec = MiniDecoderBlock()
encoded = enc(dummy)
decoded = dec(dummy, encoded)
print("Encoder output:", encoded.shape, "Decoder output:", decoded.shape)


In [None]:
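# PyTorch's built-in encoder-decoder Transformer (5 layers each, 4 attention heads).
# Note: unlike the toy blocks above, it defaults to batch_first=False.
tr = nn.Transformer(
    d_model=512,
    num_encoder_layers=5,
    num_decoder_layers=5,
    nhead=4,
)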

In [None]:
print(tr)

# Attention

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary and embeddings
main_token = 'orange'
tokens = ["orange", "is", "my", "favourite", "fruit"]

embed_dim = 4  # Small, illustrative dimension
torch.manual_seed(38)
embeddings = torch.randn(len(tokens), embed_dim)

embeddings.shape

In [None]:
# Simplest self-attention ("orange" attends to every word)
def simple_attention(Q, K, V):
    attn_scores = (Q @ K.T) / (K.shape[-1] ** 0.5)
    attn_weights = F.softmax(attn_scores, dim=-1)
    out = attn_weights @ V
    return attn_weights, out

# No learned projections here: the raw embeddings serve as queries, keys, and values
Q = K = V = embeddings
attn_weights, attn_out = simple_attention(Q, K, V)

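# Row of the attention matrix that belongs to the query token
query_idx = tokens.index(main_token)
print(f"Attention weights for '{main_token}':")
for idx, token in enumerate(tokens):
    print(f"{token:10s}: {attn_weights[query_idx, idx].item():.3f}")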

# Shows how much "orange" (the first word) pays attention to each token (including itself).