# Week 7-8: Deep Learning and AI
Welcome to Week 7-8! We’re diving into Deep Learning and AI, a fascinating leap from traditional machine learning. This guide will take you through the essentials of deep learning, from basic neural networks to advanced architectures, with practical examples using Python. Let’s get started!
## 1. What is Deep Learning?
**Deep learning** is a subset of machine learning that uses neural networks with multiple layers (hence "deep") to model complex patterns in data. It’s incredibly powerful for tasks like:
**Image Recognition:** Identifying objects in photos.

**Natural Language Processing (NLP):** Understanding and generating text.

**Game Playing:** Mastering games like Go or Chess.

Unlike traditional machine learning, deep learning automatically learns features from raw data, reducing the need for manual feature engineering.



## 2. Neural Networks: The Foundation
### How Neural Networks Work
**Neural networks** are inspired by the human brain, consisting of layers of interconnected nodes (neurons). Each connection has a weight that’s adjusted during training to improve predictions.
#### Key Components
**Input Layer:** Where data enters (e.g., pixel values of an image).

**Hidden Layers:** Perform computations to extract features. More layers = deeper network.

**Output Layer:** Produces the final prediction (e.g., a class label).
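To make the three layer types concrete, here is a minimal NumPy sketch of one forward pass through a network shaped like the MNIST model used later in this section: 784 inputs, one hidden layer of 128 ReLU units, and 10 softmax outputs. The weights are random and untrained, so the predictions are meaningless. The point is the shapes and the parameter count, which should match what Keras reports for that model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 28x28 image flattened to 784 inputs, 128 hidden units, 10 classes
n_in, n_hidden, n_out = 784, 128, 10

# Randomly initialized weights and zero biases (untrained, for illustration only)
W1 = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.01, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(x):
    """Forward pass for a batch x of shape (batch, 784)."""
    h = np.maximum(0.0, x @ W1 + b1)              # hidden layer with ReLU
    logits = h @ W2 + b2                          # output layer (raw scores)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)   # softmax probabilities

x = rng.random((4, n_in))   # a batch of 4 fake "images" with pixels in [0, 1)
probs = forward(x)
print(probs.shape)          # (4, 10)

# Every weight and bias is a trainable parameter
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)             # 101770
```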

#### Training Process
**Forward Propagation:** Data passes through the network to generate predictions.

**Loss Function:** Measures the error between predictions and actual values (e.g., Cross-Entropy Loss for classification).

**Backpropagation:** Applies the chain rule backward through the network to compute the gradient of the loss with respect to every weight.

**Optimizer:** Uses those gradients to update the weights (e.g., Adam, SGD).
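The four steps map directly onto a few lines of NumPy. To keep backpropagation down to a single application of the chain rule, this sketch trains the smallest possible network, one sigmoid neuron (i.e., logistic regression), on made-up 2-D data using plain full-batch gradient descent instead of Adam. It illustrates the loop only and is not how Keras implements training internally.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy binary-classification data (made up): label is 1 when x0 + x1 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(500):
    # 1. Forward propagation: predicted probabilities
    p = sigmoid(X @ w + b)

    # 2. Loss: binary cross-entropy (clipped only to avoid log(0))
    p_clip = np.clip(p, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p_clip) + (1 - y) * np.log(1 - p_clip))
    losses.append(loss)

    # 3. Backpropagation: for sigmoid + cross-entropy the chain rule gives dL/dz = p - y
    dz = (p - y) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()

    # 4. Optimizer step: vanilla gradient descent
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

With all weights at zero the first prediction is 0.5 for every sample, so the initial loss is ln 2 ≈ 0.693; each update then lowers it.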

#### Activation Functions
These introduce non-linearity, enabling the network to learn complex patterns:
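**ReLU (Rectified Linear Unit):** f(x) = max(0, x). Cheap to compute, and because its gradient is 1 for positive inputs it suffers far less from vanishing gradients than sigmoid or tanh.

**Sigmoid:** f(x) = 1 / (1 + e^-x). Squashes any input into the range (0, 1), which makes it a natural output activation for binary classification.

**Softmax:** f(z)_i = e^(z_i) / Σ_j e^(z_j). Converts a vector of raw scores into probabilities that are non-negative and sum to 1; used in the output layer for multi-class classification.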
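Each of these activations is a one-liner in NumPy. This sketch is plain NumPy rather than the Keras implementations, and it also shows the usual trick of subtracting the largest score before exponentiating in softmax: the probabilities are unchanged, but `exp` can no longer overflow.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(logits):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))                              # [0. 0. 3.]
print(sigmoid(0.0))                         # 0.5
print(softmax(z).sum())                     # 1, up to floating-point rounding
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```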



**Example:** Neural Network for Digit Classification (MNIST)

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize to [0, 1]

# Build the model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),  # Flatten 28x28 images
    layers.Dense(128, activation='relu'),  # Hidden layer with 128 neurons
    layers.Dense(10, activation='softmax')  # Output layer for 10 digits
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```

### Key Steps Explained
1.	**Data Preprocessing:** Normalize pixel values to [0, 1] for faster convergence.
2.	**Model Architecture:** A simple feedforward network with one hidden layer.
3.	**Compilation:** Choose an optimizer, loss function, and metrics.
4.	**Training:** Fit the model to the training data over multiple epochs.
5.	**Evaluation:** Test the model on unseen data to assess generalization.
#### Tips
•	**Epochs:** Number of times the model sees the entire dataset. Too many can lead to overfitting.
•	**Batch Size:** Number of samples processed before updating weights (default is 32).
•	**Learning Rate:** Controls how much weights are adjusted; too high can cause instability.


This code builds a simple neural network to classify handwritten digits from the MNIST dataset. Here's what's happening:
#### 1. Data Loading and Preprocessing
•	We load the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits (0-9).
•	We scale the pixel values from [0, 255] to [0, 1].
#### 2. Model Building
•	We use a Sequential model, which is a linear stack of layers.
•	The Flatten layer converts each 28x28 image into a 1D array of 784 values.
•	The Dense layers are fully connected layers. The first has 128 neurons with ReLU activation; the second has 10 neurons (one per digit) with softmax activation.
#### 3. Model Compilation
•	We specify the optimizer (adam), the loss function (sparse_categorical_crossentropy, for multi-class classification with integer labels), and the metric to track (accuracy).
#### 4. Model Training
•	We train the model for 5 epochs (full passes over the training set).
#### 5. Model Evaluation
•	We evaluate the model on the test data and print the accuracy.
This is a basic example, but it illustrates the fundamental steps in building and training a neural network.
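The epoch and batch-size tips become concrete with a little arithmetic: the optimizer makes one weight update per mini-batch, so the number of updates per epoch is the training-set size divided by the batch size, rounded up. This small sketch applies that to the 60,000 MNIST training images; 32 is the batch size Keras uses when `fit()` is not given one, and 128 is just an alternative for comparison.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # One weight update per mini-batch; the final batch may be smaller
    return math.ceil(n_samples / batch_size)

n_train = 60_000   # size of the MNIST training set
epochs = 5         # as in the example above

for batch_size in (32, 128):
    steps = updates_per_epoch(n_train, batch_size)
    print(batch_size, steps, steps * epochs)
# 32 1875 9375
# 128 469 2345
```

This is why the Keras progress bar counts to 1875 in each MNIST epoch with the default batch size. Larger batches mean fewer, smoother updates per epoch; smaller batches mean more, noisier ones.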


## 3. Convolutional Neural Networks (CNNs): Image Processing
#### What are CNNs?
**CNNs** are specialized neural networks for processing grid-like data, such as images. They use convolutional layers to automatically learn spatial hierarchies of features, making them ideal for tasks like image classification and object detection.
#### Key Components
•	**Convolutional Layers:** Apply filters to detect patterns (e.g., edges, textures).
•	**Pooling Layers:** Downsample feature maps (e.g., max pooling keeps the largest value in each window), reducing computation and helping to control overfitting.
•	**Fully Connected Layers:** Perform classification based on features extracted by convolutional layers.
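The two core operations can be written in a few lines of NumPy. This is a teaching sketch, not how Keras implements them: a single-channel 6x6 image, one hand-picked 3x3 vertical-edge filter (a trained CNN learns its filters from data), stride 1 with no padding, then 2x2 max pooling.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ('valid').
    Like deep-learning libraries, this computes cross-correlation (no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-made vertical-edge filter
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = conv2d_valid(image, kernel)  # shape (4, 4)
pooled = max_pool2d(features)           # shape (2, 2)
print(features)
print(pooled)
```

The feature map is large only where the filter straddles the edge, and pooling keeps that strong response while halving each spatial dimension.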
**Example:** CNN for Image Classification
Here’s a CNN for classifying CIFAR-10 images (10 classes like cats, dogs, etc.):


```python
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10

# Load and preprocess CIFAR-10 (32x32 RGB images, integer labels 0-9)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize to [0, 1]

# Build the CNN
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train, holding out 10% of the training set for validation
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_split=0.1)

# Evaluate once on the untouched test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```
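It helps to trace how the feature maps shrink through the CIFAR-10 model above. Assuming the Keras defaults these layers use (`padding='valid'` and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`), this sketch reproduces the output sizes and the total parameter count that `model.summary()` should report.

```python
def conv_out(size, kernel=3, stride=1):
    # 'valid' convolution: no zero-padding, so the border is lost
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling; any remainder row/column is dropped
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    # One kernel x kernel x in_ch filter per output channel, plus one bias each
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

size = 32        # CIFAR-10 images are 32x32x3
params = 0

size = conv_out(size)             # Conv2D(32): 30x30x32
params += conv_params(3, 3, 32)
size = pool_out(size)             # MaxPooling2D: 15x15x32
size = conv_out(size)             # Conv2D(64): 13x13x64
params += conv_params(3, 32, 64)
size = pool_out(size)             # MaxPooling2D: 6x6x64
size = conv_out(size)             # Conv2D(64): 4x4x64
params += conv_params(3, 64, 64)

flat = size * size * 64           # Flatten: 1024 values
params += dense_params(flat, 64)  # Dense(64)
params += dense_params(64, 10)    # Dense(10)
print(size, flat, params)         # 4 1024 122570
```

Note that the first Dense layer alone holds more than half of the parameters, which is typical of CNNs that flatten before classifying.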


### Key Tips
•	**Filters:** Increase the number of filters in deeper layers to capture more complex features. More filters or more epochs can raise accuracy, but watch validation accuracy for signs of overfitting.
•	**Pooling:** Use max pooling to keep the strongest activation in each window while shrinking the feature maps.
•	**Data Augmentation:** Apply random transformations (e.g., rotation, flipping) to training images to increase dataset diversity and improve generalization.
•	**Use Case:** Image classification, object detection, facial recognition.


## 4. Recurrent Neural Networks (RNNs): Sequence Modeling
### What are RNNs?
**RNNs** are designed for sequential data (e.g., time series, text). They keep a "memory" of previous inputs in a hidden state that is fed back into the network at every time step, which makes them suited to tasks like language modeling or stock price prediction.
### Key Variants
•	**Long Short-Term Memory (LSTM):** Uses gates (input, forget, output) and a separate cell state to mitigate the vanishing gradient problem, allowing the network to learn long-term dependencies.
•	**Gated Recurrent Unit (GRU):** A simpler gated alternative to LSTM with fewer parameters and often comparable performance.
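The "memory" is simply a hidden-state vector that is fed back in at each step. This minimal NumPy sketch of a plain (Elman-style) RNN cell uses random illustrative weights and no training; LSTMs and GRUs replace the single `tanh` update with gated updates but keep the same loop over time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim, seq_len = 3, 5, 4

# The same weights are reused at every time step (random values, for illustration)
W_x = rng.normal(0.0, 0.5, size=(input_dim, hidden_dim))   # input -> hidden
W_h = rng.normal(0.0, 0.5, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the "loop")
b = np.zeros(hidden_dim)

def run_rnn(sequence):
    """Vanilla RNN: h_t = tanh(x_t W_x + h_{t-1} W_h + b); returns every hidden state."""
    h = np.zeros(hidden_dim)  # the memory starts empty
    states = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(seq_len, input_dim))
states = run_rnn(seq)
print(states.shape)  # (4, 5): one hidden state per time step

# Changing only the first input still changes the final state: early inputs persist in h
seq_changed = seq.copy()
seq_changed[0] += 1.0
print(np.abs(run_rnn(seq_changed)[-1] - states[-1]).max() > 0)  # True
```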
**Example:** LSTM for Text Classification
Here’s an LSTM for sentiment analysis on the IMDB movie review dataset:


```python
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Load and preprocess IMDB data (each review arrives as a list of word indices)
max_features = 10000  # Vocabulary size: keep only the 10,000 most frequent words
maxlen = 500  # Max sequence length: longer reviews are truncated, shorter ones padded
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Build the LSTM model
model = models.Sequential([
    layers.Embedding(max_features, 128),                   # word index -> 128-d vector
    layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2),  # reads the sequence, returns its last hidden state
    layers.Dense(1, activation='sigmoid')                  # probability that the review is positive
])

# Compile and train (labels: 0 = negative, 1 = positive)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Uncomment to train; an LSTM over 500-step sequences takes a while, especially on CPU
# model.fit(x_train, y_train, epochs=3, batch_size=32, validation_data=(x_test, y_test))
```


### Explanation:
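•	**Embedding:** Maps each word index to a dense, trainable 128-dimensional vector.
•	**Dropout:** Reduces overfitting by randomly zeroing a fraction of values during training; here `dropout=0.2` applies to the LSTM's inputs and `recurrent_dropout=0.2` to its recurrent state.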
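Dropout itself is simple. The sketch below is plain NumPy, not Keras's implementation, but it follows the same "inverted dropout" convention Keras documents for its `Dropout` layer: surviving values are scaled by 1 / (1 - rate) during training, so nothing needs to change at inference time.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of entries during training and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time dropout does nothing
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))
dropped = dropout(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())  # roughly 0.2 of the entries are zeroed
print(dropped.mean())         # roughly 1.0: the scaling preserves the mean
print(np.array_equal(dropout(activations, 0.2, rng, training=False), activations))  # True
```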
### Key Tips
•	**Padding:** Use pad_sequences to ensure all inputs have the same length.
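•	**Recurrent Dropout:** `recurrent_dropout` regularizes the LSTM's hidden-to-hidden connections, but any nonzero value stops TensorFlow from using the fast cuDNN LSTM kernel on GPU, so training is noticeably slower.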
•	**Use Case:** Text classification, language translation, time series forecasting.
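By default, Keras's `pad_sequences` both pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`, `value=0`). The pure-Python sketch below mimics only that default behaviour, on two reviews with made-up word indices, so you can see what the LSTM receives; the real function returns a NumPy array rather than nested lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimics pad_sequences defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # truncating='pre': drop tokens from the front
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre'
    return padded

reviews = [[12, 7, 93], [5, 8, 13, 21, 34, 55]]  # made-up word indices
print(pad_pre(reviews, maxlen=5))
# [[0, 0, 12, 7, 93], [8, 13, 21, 34, 55]]
```

Index 0 is safe to use as padding because the IMDB loader reserves it rather than assigning it to a word.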


## 5. Transfer Learning: Leveraging Pre-trained Models
#### What is Transfer Learning?
**Transfer learning** involves using a pre-trained model (trained on a large dataset like ImageNet) and fine-tuning it for a specific task. This is especially useful when you have limited data.
#### How It Works
•	**Feature Extraction:** Use the pre-trained model’s layers to extract features, then train a new classifier on top.
•	**Fine-Tuning:** Unfreeze some layers of the pre-trained model and train them on your data for better performance.
**Example:** Transfer Learning with VGG16
Here’s how to use VGG16 for image classification:


```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load pre-trained VGG16 model (without the top classification layers)
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model layers
base_model.trainable = False

# Add custom layers on top
model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax')  # Assuming 10 classes
])

# Compile and train.
# 'categorical_crossentropy' expects one-hot labels; use
# 'sparse_categorical_crossentropy' if your labels are integers (as in CIFAR-10).
# x_train / y_train must be your own images, resized to 224x224.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5, batch_size=32)
```


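### Key Tips
•	**Input Size and Preprocessing:** Resize your images to the `input_shape` passed to VGG16 (224x224 here, the resolution it was trained on) and apply `tf.keras.applications.vgg16.preprocess_input` to the raw 0-255 pixel values so they match what the pre-trained weights expect.
•	**Freeze Layers:** Prevent the pre-trained layers from updating during initial training.
•	**Fine-Tune:** Once the new head has trained, unfreeze some of the top layers and continue training with a low learning rate for better accuracy.
•	**Use Case:** Image classification with small datasets, NLP tasks with BERT.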


## 6. Generative Models: Creating New Data
#### What are Generative Models?
**Generative models** learn to create new data that resembles the training data. They’re used for tasks like image generation, style transfer, and data augmentation.
#### Key Types
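•	**Generative Adversarial Networks (GANs):** Consist of a generator (creates fake data) and a discriminator (tries to distinguish real from fake). The two are trained against each other: as the discriminator gets better at spotting fakes, the generator is pushed to produce more realistic samples.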
•	**Variational Autoencoders (VAEs):** Encode data into a latent space and decode it back, useful for generating new samples and anomaly detection.
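Two pieces make a VAE trainable: the reparameterization trick, which writes a random latent sample as a deterministic function of the encoder's outputs plus independent noise (so gradients can flow through it), and a KL-divergence term that keeps the latent distribution close to a standard normal. This NumPy sketch shows both for a diagonal Gaussian latent; the encoder outputs `mu` and `log_var` are made-up numbers, and the reconstruction loss is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Pretend an encoder produced these for one input with a 2-D latent space
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, np.log(0.25)])  # variances 1.0 and 0.25

z = reparameterize(mu, log_var, rng)
print(z.shape)                                          # (2,)
print(kl_to_standard_normal(mu, log_var))
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))  # 0.0
```

The KL term is zero only when the encoder outputs exactly a standard normal; any shift in `mu` or change in variance is penalized.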
**Example: Simple GAN (Conceptual)**


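```python
# Conceptual outline of GAN training (not runnable as-is).
# build_generator, build_discriminator, sample_real, sample_noise and epochs
# are placeholders for your own models, data pipeline and settings.
generator = build_generator()
discriminator = build_discriminator()

# Train the GAN
for epoch in range(epochs):
    real = sample_real()              # a batch of real training examples
    fake = generator(sample_noise())  # a batch of generated examples
    # 1. Train the discriminator to label `real` as 1 and `fake` as 0
    # 2. Train the generator so the discriminator labels its output as 1
```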
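The two training steps in the loop above correspond to two loss functions. Assuming the standard binary cross-entropy formulation with the widely used "non-saturating" generator loss, they can be sketched in NumPy as follows; `d_real` and `d_fake` stand for the discriminator's output probabilities on real and generated batches, and the values here are made up.

```python
import numpy as np

def bce(probs, target):
    """Binary cross-entropy between predicted probabilities and a constant label."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))

def discriminator_loss(d_real, d_fake):
    # Real samples should be scored 1, generated samples 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # Non-saturating generator loss: push the discriminator's score on fakes toward 1
    return bce(d_fake, 1.0)

# A discriminator that cannot tell real from fake outputs 0.5 everywhere
d_real = np.array([0.5, 0.5])
d_fake = np.array([0.5, 0.5])
print(discriminator_loss(d_real, d_fake))  # 2*ln(2), about 1.386
print(generator_loss(d_fake))              # ln(2), about 0.693
```

The losses pull in opposite directions: a discriminator that scores real data near 1 and fakes near 0 lowers its own loss and raises the generator's, which is the competition described above.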


### Key Tips
•	**Training Stability:** GANs can be tricky to train; use techniques like batch normalization and careful hyperparameter tuning.
•	**Use Case:** Image generation, data augmentation, art creation.
