# AI with Python - Key Learning Summary

This notebook provides a comprehensive summary of the key learning points from my AI course. It's organized chronologically to follow the learning journey from data analysis to advanced AI techniques, with detailed technical information about machine learning concepts.

## Table of Contents

### Data Fundamentals
1. [Data Analysis and Preparation](#data-analysis)
2. [Data Normalization Techniques](#data-normalization)

### Machine Learning Fundamentals
3. [Linear Regression](#linear-regression)
4. [Logistic Regression](#logistic-regression)
5. [Decision Trees](#decision-trees)
6. [Support Vector Machines](#svm)
7. [K-Means Clustering](#kmeans)

### Neural Networks
8. [Neural Network Fundamentals](#nn-fundamentals)
9. [Activation Functions](#activation-functions)
10. [Loss Functions](#loss-functions)
11. [Neural Networks for Regression](#nn-regression)
12. [Neural Networks for Classification](#nn-classification)
13. [Neural Network Layers](#nn-layers)

### Advanced Techniques
14. [Hyperparameter Tuning](#hyperparameter-tuning)
15. [Model Optimization Strategies](#model-optimization)
16. [Data Augmentation](#data-augmentation)
17. [Transfer Learning](#transfer-learning)
18. [Model Deployment](#model-deployment)

In [None]:
# Install necessary packages (uncomment if needed)
# !pip install pandas numpy matplotlib scikit-learn tensorflow keras seaborn

In [None]:
# Import common libraries used throughout this summary
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# For machine learning
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report, confusion_matrix

# Set plot styling
plt.style.use('ggplot')
sns.set()

# For reproducible results
np.random.seed(42)

<a id='data-analysis'></a>
## 1. Data Analysis and Preparation

### Key Learning Points:
- **Data visualization** is crucial for understanding your dataset
- **Handling missing values** is a critical preprocessing step
- **Categorical data** must be encoded for machine learning algorithms
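- **Feature correlation** shows which features move with the target (candidate predictors) and which features are redundant with each other (correlation measures linear relationships only)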

In [None]:
# Example: Load and explore a sample dataset
# Replace with your own dataset or use a built-in one
from sklearn.datasets import fetch_california_housing

# Load sample data
housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['PRICE'] = housing.target

# Display first few rows
print("Dataset shape:", df.shape)
df.head()

In [None]:
# Visualize data distributions
df.hist(figsize=(15, 10), bins=30)
plt.tight_layout()
plt.show()

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Feature Correlation Matrix')
plt.show()

### Handling Missing Values

Common strategies for dealing with missing values include:
1. **Removing rows** with missing values (good when few instances have NaNs)
2. **Filling with statistics** like mean, median, or mode
3. **Predicting missing values** using other features

In [None]:
# Example: Create a sample dataframe with missing values
sample_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, np.nan]
})

print("Original DataFrame with NaN values:")
print(sample_df)

# Strategy 1: Drop rows with any NaN
print("\nAfter dropping rows with NaN:")
print(sample_df.dropna())

# Strategy 2: Fill NaN with mean
print("\nAfter filling NaN with column means:")
print(sample_df.fillna(sample_df.mean()))
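Strategy 3 (predicting missing values from other features) is not demonstrated above. One option in scikit-learn is `KNNImputer`, which fills each gap with the mean of that feature over the `n_neighbors` most similar rows. A minimal sketch on the same toy frame, assuming `n_neighbors=2` (an arbitrary choice for a five-row example):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Same toy frame as above, with one NaN per column
sample_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, np.nan]
})

# Each NaN becomes the mean of that column over the 2 nearest rows; distances
# are computed only on the features both rows have observed
imputer = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(imputer.fit_transform(sample_df), columns=sample_df.columns)
print(knn_filled)
```

Unlike a column mean, each filled value here depends on the other features of that row, which is the idea behind strategy 3.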

### Handling Categorical Data

Most machine learning algorithms require numerical input. Four common methods to convert categorical data are:
1. **Label Encoding** - Assign an integer to each category (this implies an order, so it suits ordinal data or tree-based models best)
2. **One-Hot Encoding** - Create one binary column per category
3. **Target Encoding** - Replace each category with the mean target value for that category (compute the means on training data only to avoid target leakage)
4. **Binary Encoding** - Assign each category an integer, write it in binary, then use each bit as a feature
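A small sketch of the four encodings on a made-up table (`df_cat` and its columns and values are hypothetical). It uses scikit-learn's `OrdinalEncoder` for label encoding of a feature (scikit-learn's `LabelEncoder` is intended for target labels), `pd.get_dummies` for one-hot encoding, a `groupby` mean for target encoding, and plain bit operations for binary encoding rather than a dedicated encoding library:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: one ordinal feature, one nominal feature, one numeric target
df_cat = pd.DataFrame({
    'size': ['small', 'large', 'medium', 'small', 'large'],
    'color': ['red', 'blue', 'green', 'blue', 'red'],
    'price': [10.0, 30.0, 20.0, 12.0, 28.0],
})

# 1. Label (ordinal) encoding with an explicit category order
enc = OrdinalEncoder(categories=[['small', 'medium', 'large']])
df_cat['size_encoded'] = enc.fit_transform(df_cat[['size']]).ravel()

# 2. One-hot encoding: one 0/1 column per color
one_hot = pd.get_dummies(df_cat['color'], prefix='color', dtype=int)

# 3. Target encoding: mean price per color (in practice, compute on training data only)
color_means = df_cat.groupby('color')['price'].mean()
df_cat['color_target_enc'] = df_cat['color'].map(color_means)

# 4. Binary encoding: integer code per color (alphabetical order), then one column per bit
codes = df_cat['color'].astype('category').cat.codes.to_numpy()
n_bits = max(1, int(codes.max()).bit_length())
binary = pd.DataFrame({f'color_bit{i}': (codes >> i) & 1 for i in reversed(range(n_bits))})

print(df_cat)
print(one_hot)
print(binary)
```

Binary encoding needs only about log2(number of categories) columns, whereas one-hot needs one column per category, which matters for high-cardinality features.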

<a id='data-normalization'></a>
## 2. Data Normalization Techniques

### Key Learning Points:
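- Data normalization is crucial for many machine learning algorithms, especially distance-based methods (k-NN, K-means, SVM) and gradient-based methods (linear models, neural networks)
- It puts features on comparable scales, so no feature dominates simply because of its units (e.g., income in dollars vs. age in years)
- Different techniques have different use cases and properties
- For gradient-based models, normalization usually speeds up training and improves convergence; tree-based models are largely insensitive to feature scale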

In [None]:
# Example: Different normalization techniques
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, Normalizer
import numpy as np
import matplotlib.pyplot as plt

# Create sample data with outliers
np.random.seed(42)
data = np.random.normal(0, 1, 1000)
# Add outliers
data = np.append(data, [10, -10, 8, -9])
data = data.reshape(-1, 1)  # Reshape for sklearn

# Apply different scalers
scalers = {
    'Raw Data': None,
    'Min-Max Scaling': MinMaxScaler(),
    'Z-Score Normalization': StandardScaler(),
    'Robust Scaling': RobustScaler(),
    'L2 Normalization': Normalizer()  # Works per sample (row): with a single feature, every value becomes -1 or +1
}

# Create figure for visualization
plt.figure(figsize=(15, 10))

# Plot each scaling method
for i, (name, scaler) in enumerate(scalers.items()):
    plt.subplot(3, 2, i+1)
    
    if scaler:
        scaled_data = scaler.fit_transform(data)
    else:
        scaled_data = data
        
    plt.hist(scaled_data, bins=50)
    plt.title(name)
    plt.xlim((-12, 12) if name == 'Raw Data' else (-3, 3))
    
plt.tight_layout()
plt.show()

# Print statistics before and after scaling
print("Raw data statistics:")
print(f"  Min: {data.min():.2f}, Max: {data.max():.2f}")
print(f"  Mean: {data.mean():.2f}, Std: {data.std():.2f}")

scaled = StandardScaler().fit_transform(data)
print("\nStandardized data statistics:")
print(f"  Min: {scaled.min():.2f}, Max: {scaled.max():.2f}")
print(f"  Mean: {scaled.mean():.2f}, Std: {scaled.std():.2f}")

### Scaling Techniques Comparison

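| Technique | Formula | Use Case | Pros | Cons |
|-----------|---------|----------|------|------|
| **Min-Max Scaling** | $(x - \min) / (\max - \min)$ | Neural networks, when a bounded input range is needed | Preserves the shape of the distribution; maps to a fixed range [0, 1] | Sensitive to outliers (one extreme value compresses all other values) |
| **Z-Score Normalization** | $(x - \mu) / \sigma$ | SVM, Linear/Logistic Regression | Centers each feature at 0 with unit variance | Output is not bounded; mean and standard deviation are still pulled by outliers |
| **Robust Scaling** | $(x - \text{median}) / \text{IQR}$ | When outliers are present | Median and IQR are largely unaffected by outliers | Output is not bounded; less widely used than Z-score |
| **L2 Normalization** | $\mathbf{x} / \lVert \mathbf{x} \rVert_2 = \mathbf{x} / \sqrt{\sum_j x_j^2}$ (per sample, across features) | Text features such as TF-IDF vectors, when only direction matters | Every sample gets unit length, so dot products become cosine similarities | Discards each sample's overall magnitude |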

Choosing the right scaling technique depends on:
1. The nature of your data (outliers, distribution)
2. The algorithm you're using (some require specific scaling)
3. The importance of interpretability
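To connect the formulas in the table with scikit-learn, the sketch below applies each formula by hand to a small made-up matrix (`X_demo` is hypothetical) and compares the result with the corresponding scaler. Note that `StandardScaler` uses the population standard deviation, and that `Normalizer` works on rows (samples) while the other three work on columns (features):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, Normalizer

# Toy feature matrix: 4 samples x 2 features; the second feature has an outlier
X_demo = np.array([[1.0, 200.0],
                   [2.0, 300.0],
                   [3.0, 400.0],
                   [4.0, 10000.0]])

# Column-wise (per feature) formulas from the table
minmax = (X_demo - X_demo.min(axis=0)) / (X_demo.max(axis=0) - X_demo.min(axis=0))
zscore = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)  # np.std defaults to ddof=0
q1, median, q3 = np.percentile(X_demo, [25, 50, 75], axis=0)
robust = (X_demo - median) / (q3 - q1)

# Row-wise (per sample) L2 normalization
l2 = X_demo / np.linalg.norm(X_demo, axis=1, keepdims=True)

checks = {
    'Min-Max': np.allclose(minmax, MinMaxScaler().fit_transform(X_demo)),
    'Z-Score': np.allclose(zscore, StandardScaler().fit_transform(X_demo)),
    'Robust': np.allclose(robust, RobustScaler().fit_transform(X_demo)),
    'L2': np.allclose(l2, Normalizer().fit_transform(X_demo)),
}
print(checks)

# The outlier squeezes the other Min-Max values of feature 2 close to 0
print(minmax[:, 1].round(3))
```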

<a id='linear-regression'></a>
## 3. Linear Regression

In [None]:
# Example: Simple Linear Regression
from sklearn.linear_model import LinearRegression

# Create sample height-weight data
np.random.seed(42)
heights = np.random.normal(170, 10, 100)  # Mean 170cm, std 10cm
weights = heights * 0.6 + np.random.normal(0, 5, 100)  # Weight = height*0.6 + noise

# Reshape for scikit-learn (should be 2D)
X = heights.reshape(-1, 1)  # Independent variable (height)
y = weights  # Dependent variable (weight)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Model coefficient (slope): {model.coef_[0]:.4f}")
print(f"Model intercept: {model.intercept_:.4f}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

In [None]:
# Visualize the linear regression
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual data')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Linear model')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs Weight: Linear Regression')
plt.legend()
plt.show()
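For a single feature, the least-squares line has a closed form: $\hat\beta_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$. The sketch below regenerates the synthetic height-weight data with the same seed and formula as above (but fits on all 100 points, without the train-test split) and checks that the formula agrees with `LinearRegression`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated the same way as in the example above
rng = np.random.RandomState(42)
h_demo = rng.normal(170, 10, 100)
w_demo = h_demo * 0.6 + rng.normal(0, 5, 100)

# Closed-form ordinary least squares for one feature (population moments on both sides)
slope_ols = np.cov(h_demo, w_demo, ddof=0)[0, 1] / np.var(h_demo)
intercept_ols = w_demo.mean() - slope_ols * h_demo.mean()

sk_ols = LinearRegression().fit(h_demo.reshape(-1, 1), w_demo)
print(f"closed form: slope={slope_ols:.4f}, intercept={intercept_ols:.4f}")
print(f"sklearn:     slope={sk_ols.coef_[0]:.4f}, intercept={sk_ols.intercept_:.4f}")
```

The slope should land near the generating value 0.6, up to the noise in the data.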

### Multiple Linear Regression

In [None]:
# Example: Multiple Linear Regression
from sklearn.preprocessing import StandardScaler

# Use California housing dataset
X = df.drop('PRICE', axis=1)
y = df['PRICE']

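# Train-test split first, then scale. With plain least squares, scaling does not change
# the predictions, but it puts all coefficients on a comparable scale.
# Fit the scaler on the training split only so no test-set statistics leak into training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)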

# Create and train the model
multi_model = LinearRegression()
multi_model.fit(X_train, y_train)

# Make predictions
y_pred = multi_model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

# Rank features by the magnitude of their standardized coefficients (the sign gives the direction)
coef_df = pd.DataFrame({'Feature': X.columns, 'Coefficient': multi_model.coef_})
coef_df = coef_df.sort_values('Coefficient', key=lambda s: s.abs(), ascending=False)
print("\nFeatures ranked by |coefficient|:")
print(coef_df)

<a id='logistic-regression'></a>
## 4. Logistic Regression

### Key Learning Points:
- Used for **binary classification** problems
- Predicts probability using a **sigmoid function** (values between 0 and 1)
- Can be extended to multi-class classification
- Evaluation uses **accuracy, precision, recall, F1-score**
- Key parameters: `C` (inverse of regularization strength) and `max_iter` (increase it if the solver reports that it did not converge)
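The sigmoid $\sigma(z) = 1 / (1 + e^{-z})$ maps the model's linear score $z = \mathbf{w}^\top \mathbf{x} + b$ to a probability. For a binary `LogisticRegression`, `decision_function` returns $z$, so applying the sigmoid by hand should reproduce the positive-class column of `predict_proba`. A minimal check on the breast cancer data (variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc = StandardScaler().fit_transform(X_bc)
clf_demo = LogisticRegression(max_iter=1000).fit(X_bc, y_bc)

z_scores = clf_demo.decision_function(X_bc[:5])        # linear scores w·x + b
manual_probs = sigmoid(z_scores)                       # P(class 1) computed by hand
sklearn_probs = clf_demo.predict_proba(X_bc[:5])[:, 1]
print(np.round(manual_probs, 4))
print(np.round(sklearn_probs, 4))
```

A score of 0 maps to probability 0.5, which is why the default decision threshold of 0.5 corresponds to the hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$.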

In [None]:
# Example: Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

# Load breast cancer dataset (binary classification)
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

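# Train-test split first, then scale: fit the scaler on the training split only
# so that no test-set statistics leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)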

# Create and train the model
log_model = LogisticRegression(max_iter=1000)  # Increased for convergence
log_model.fit(X_train, y_train)

# Make predictions
y_pred = log_model.predict(X_test)
y_prob = log_model.predict_proba(X_test)  # Probability estimates

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

In [None]:
# Visualize confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Malignant', 'Benign'],
            yticklabels=['Malignant', 'Benign'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

### Making Predictions with Logistic Regression

In [None]:
# Example: Making a prediction for a new data point

# Use the first (unscaled) sample from the dataset as a stand-in for a new patient
new_sample = X[0].reshape(1, -1)

# Scale the new sample using the same scaler
new_sample_scaled = scaler.transform(new_sample)

# Get prediction
prediction = log_model.predict(new_sample_scaled)
probability = log_model.predict_proba(new_sample_scaled)

print(f"Predicted class: {prediction[0]} ({'Benign' if prediction[0] == 1 else 'Malignant'})")
print(f"Probability of malignant: {probability[0][0]:.4f}")
print(f"Probability of benign: {probability[0][1]:.4f}")

<a id='decision-trees'></a>
## 5. Decision Trees

### Key Learning Points:
- Decision trees make predictions by following a series of decisions
- They can handle both **classification and regression** problems
- Decision trees are easy to interpret and visualize
- Risk of overfitting if tree is too deep
- Hyperparameters like `max_depth` control model complexity
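By default, `DecisionTreeClassifier` chooses splits with the Gini impurity criterion (`criterion='gini'`): $G = 1 - \sum_k p_k^2$, where $p_k$ is the fraction of samples of class $k$ in a node. The sketch below scores every candidate threshold on one feature and keeps the one with the lowest size-weighted child impurity. It is a simplified illustration (one feature, exhaustive midpoints) on hypothetical data, not scikit-learn's optimized implementation:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels):
    """Try midpoints between sorted unique values; return (threshold, weighted impurity)."""
    values = np.unique(feature)
    best = (None, np.inf)
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2
        left, right = labels[feature <= threshold], labels[feature > threshold]
        # Impurity of the split = child impurities weighted by child size
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical one-feature data with binary class labels
size = np.array([1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 6.0])
label = np.array([1, 1, 1, 1, 0, 0, 0, 1])
print("Root impurity:", round(gini(label), 3))
print("Best split (threshold, impurity):", best_split(size, label))
```

A full tree repeats this search over all features at every node, recursing until a stopping rule such as `max_depth` is reached.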

In [None]:
# Example: Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Use the breast cancer dataset with unscaled features: trees split on thresholds, so scaling is unnecessary
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
dt_model = DecisionTreeClassifier(max_depth=3, random_state=42)  # Limiting depth to prevent overfitting
dt_model.fit(X_train, y_train)

# Make predictions
y_pred = dt_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

In [None]:
# Visualize the decision tree
plt.figure(figsize=(20, 10))
plot_tree(dt_model, feature_names=cancer.feature_names, 
          class_names=['Malignant', 'Benign'], filled=True, rounded=True)
plt.title('Decision Tree for Breast Cancer Classification')
plt.show()

### Feature Importance in Decision Trees

In [None]:
# Display feature importance from decision tree
importances = pd.DataFrame({'Feature': cancer.feature_names, 'Importance': dt_model.feature_importances_})
importances = importances.sort_values('Importance', ascending=False).head(10)

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importances)
plt.title('Top 10 Feature Importance in Decision Tree')
plt.show()

<a id='nn-regression'></a>
## 11. Neural Networks for Regression

### Key Learning Points:
- Neural networks can model complex non-linear relationships
- Architecture design is crucial (number of layers, neurons per layer)
- **Activation functions** add non-linearity
- Common activations for hidden layers: ReLU, tanh, sigmoid
- Output layer for regression: Linear activation (or none)

In [None]:
# Example: Neural Network for Regression
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Create a simple temperature conversion dataset (°F to °C)
fahrenheit = np.linspace(-200, 200, 100)
celsius = (fahrenheit - 32) * 5/9

# Reshape for neural network input
X = fahrenheit.reshape(-1, 1)  # Input: temperature in Fahrenheit
y = celsius  # Output: temperature in Celsius

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a neural network model
model = keras.Sequential([
    keras.Input(shape=(1,)),                # One input feature: temperature in °F
    layers.Dense(16, activation='relu'),    # Hidden layer with 16 neurons
    layers.Dense(8, activation='relu'),                     # Another hidden layer
    layers.Dense(1)                                         # Output layer (no activation for regression)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='mse',  # Mean squared error for regression
    metrics=['mae']  # Mean absolute error
)

# Display model summary
model.summary()

In [None]:
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=16,
    validation_split=0.2,
    verbose=0  # Set to 1 to see training progress
)

# Evaluate on test data
loss, mae = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Mean Absolute Error: {mae:.4f}")

In [None]:
# Visualize training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.title('Mean Absolute Error over Epochs')
plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
# Make predictions and visualize results
y_pred = model.predict(X_test)

plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('Temperature (°F)')
plt.ylabel('Temperature (°C)')
plt.title('Neural Network: Fahrenheit to Celsius Conversion')
plt.legend()
plt.show()
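Because the Fahrenheit-to-Celsius mapping is exactly linear, $C = \tfrac{5}{9}(F - 32) = \tfrac{5}{9}F - \tfrac{160}{9}$, a single neuron with no activation ($\hat{y} = wF + b$) can represent it with $w = 5/9 \approx 0.556$ and $b = -160/9 \approx -17.78$; the ReLU hidden layers above approximate the same line piecewise. The sketch below recovers those two parameters with NumPy least squares instead of training a Keras model:

```python
import numpy as np

# Same inputs as the Keras example above
f_demo = np.linspace(-200, 200, 100)
c_demo = (f_demo - 32) * 5 / 9

# Design matrix [F, 1] so that the least-squares solution is [w, b] for y = w*F + b
design = np.column_stack([f_demo, np.ones_like(f_demo)])
(w_fc, b_fc), *_ = np.linalg.lstsq(design, c_demo, rcond=None)
print(f"w = {w_fc:.4f} (5/9 = {5/9:.4f}), b = {b_fc:.4f} (-160/9 = {-160/9:.4f})")
```

This is a reminder that a deeper network is not automatically better: when the true relationship is simple, a smaller model can fit it exactly.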

<a id='nn-classification'></a>
## 12. Neural Networks for Classification

### Key Learning Points:
- Output layer uses **softmax activation** for multi-class classification
- Loss function is typically **categorical cross-entropy** (Keras' `sparse_categorical_crossentropy` when labels are integers, as in the example below)
- Convolutional Neural Networks (CNNs) are ideal for image data
- Data preprocessing: normalization, resizing, augmentation
- Transfer learning leverages pre-trained models

In [None]:
# Example: Neural Network for Classification (MNIST digits)
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Data preprocessing
# Normalize pixel values to range [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Display a sample image
plt.figure(figsize=(6, 6))
plt.imshow(x_train[0], cmap='gray')
plt.title(f'Label: {y_train[0]}')
plt.axis('off')
plt.show()

# Reshape images for the model (flatten)
x_train_flat = x_train.reshape(x_train.shape[0], 28*28)
x_test_flat = x_test.reshape(x_test.shape[0], 28*28)

In [None]:
# Build a simple neural network for digit classification
model = keras.Sequential([
    keras.Input(shape=(28*28,)),  # Flattened 28x28 image
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),  # Prevent overfitting
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')  # 10 output classes (digits 0-9)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',  # For integer labels
    metrics=['accuracy']
)

# Display model summary
model.summary()

In [None]:
# Train the model
history = model.fit(
    x_train_flat, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

In [None]:
# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test_flat, y_test)
print(f"Test accuracy: {test_acc:.4f}")

In [None]:
# Visualize predictions
def plot_prediction(i, model):
    # Make prediction
    prediction = model.predict(x_test_flat[i:i+1])[0]
    predicted_label = np.argmax(prediction)
    true_label = y_test[i]
    
    # Create plot
    plt.figure(figsize=(12, 4))
    
    # Display image
    plt.subplot(1, 2, 1)
    plt.imshow(x_test[i], cmap='gray')
    plt.title(f'True: {true_label}, Predicted: {predicted_label}')
    plt.axis('off')
    
    # Display probability distribution
    plt.subplot(1, 2, 2)
    bars = plt.bar(range(10), prediction)
    plt.xticks(range(10))
    plt.xlabel('Digit')
    plt.ylabel('Probability')
    plt.title('Prediction Probabilities')
    
    # Highlight correct and predicted
    bars[true_label].set_color('green')
    if predicted_label != true_label:
        bars[predicted_label].set_color('red')
    
    plt.tight_layout()
    plt.show()

# Show predictions for a few examples
for i in range(5):
    plot_prediction(i, model)

### Convolutional Neural Networks (CNNs)

In [None]:
# Example: CNN for MNIST
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten

# Reshape data for CNN (add channel dimension)
x_train_cnn = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test_cnn = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Build CNN model
cnn_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # 28x28 grayscale images with 1 channel
    # Convolutional layers
    Conv2D(32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    
    # Flatten and dense layers
    Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

# Compile the model
cnn_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display model summary
cnn_model.summary()

Note: Training the CNN model would require more computational resources and time. In a real setting, you would run:

```python
# Train the CNN model
cnn_history = cnn_model.fit(
    x_train_cnn, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.1
)

# Evaluate on test data
cnn_test_loss, cnn_test_acc = cnn_model.evaluate(x_test_cnn, y_test)
print(f"CNN Test accuracy: {cnn_test_acc:.4f}")
```

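A small CNN like this one typically reaches around 99% test accuracy on MNIST, higher than a fully connected network of similar size, because its convolutions exploit the spatial structure of the images.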
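The shapes and parameter counts reported by `cnn_model.summary()` follow from simple arithmetic: a 3×3 convolution with Keras' default `padding='valid'` and stride 1 shrinks each side by 2, a 2×2 max-pool (stride 2 by default) halves it, rounding down, and a layer's parameter count is its weights plus one bias per output unit. A small sketch of that arithmetic for the CNN above (`conv_out` is a hypothetical helper):

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

side = 28
side = conv_out(side, 3)               # Conv2D 3x3 -> 26
side = conv_out(side, 2, stride=2)     # MaxPooling2D 2x2 -> 13
side = conv_out(side, 3)               # Conv2D 3x3 -> 11
side = conv_out(side, 2, stride=2)     # MaxPooling2D 2x2 -> 5
flat = side * side * 64                # Flatten: 5 * 5 * 64 feature values

params = {
    'conv1': 3 * 3 * 1 * 32 + 32,      # kernel weights (3x3x1 per filter) + one bias per filter
    'conv2': 3 * 3 * 32 * 64 + 64,     # each filter spans all 32 input channels
    'dense1': flat * 128 + 128,
    'dense2': 128 * 10 + 10,           # pooling, flatten and dropout have no parameters
}
print(f"flattened size: {flat}")
print(params, "total:", sum(params.values()))
```

Most parameters sit in the first dense layer, which is one reason architectures often shrink the feature map further before flattening.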

<a id='svm'></a>
## 6. Support Vector Machines

### Key Learning Points:
- SVMs find the optimal hyperplane that separates classes
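- **Kernel functions** let the SVM behave as if the data were mapped into a higher-dimensional space, without computing that mapping explicitly (the "kernel trick")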
- Common kernels: linear, rbf (radial basis function), polynomial
- Key parameters: `C` (regularization) and `gamma` (kernel coefficient)
- Effective for high-dimensional data with relatively few samples
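The kernel trick works because an SVM only needs inner products between mapped points, and a kernel computes them from the original inputs. For 2-D inputs and the homogeneous degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$, the explicit feature map is $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below checks numerically that both routes give the same number. scikit-learn's `poly` kernel is the more general $(\gamma\,\mathbf{x}^\top\mathbf{z} + r)^d$ (`gamma`, `coef0`, `degree`); this is its simplest special case:

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit degree-2 feature map from 2-D into 3-D space."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x_a = np.array([1.0, 2.0])
z_a = np.array([3.0, -1.0])
print(poly2_kernel(x_a, z_a))        # kernel value from the 2-D inputs
print(np.dot(phi(x_a), phi(z_a)))    # same value via the explicit 3-D features
```

For an RBF kernel the corresponding feature space is infinite-dimensional, so computing the kernel directly is the only practical option.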

In [None]:
# Example: Support Vector Machine
from sklearn import svm

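# Reload the breast cancer data: X and y were overwritten by the neural network examples
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scale; fit the scaler on the training split only
svm_scaler = StandardScaler()
X_train = svm_scaler.fit_transform(X_train)
X_test = svm_scaler.transform(X_test)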

# Create and train models with different kernels
kernels = ['linear', 'rbf', 'poly']
models = {}
scores = {}

for kernel in kernels:
    # Create and train the model
    models[kernel] = svm.SVC(kernel=kernel, gamma='auto')  # gamma='auto' uses 1 / n_features; probability=True is not needed here
    models[kernel].fit(X_train, y_train)
    
    # Evaluate
    y_pred = models[kernel].predict(X_test)
    scores[kernel] = accuracy_score(y_test, y_pred)
    
    print(f"SVM with {kernel} kernel - Accuracy: {scores[kernel]:.4f}")
    print(classification_report(y_test, y_pred))
    print("---")

In [None]:
# Compare kernel performance
plt.figure(figsize=(10, 6))
plt.bar(list(scores.keys()), list(scores.values()))
plt.ylim(0.9, 1.0)  # Focus on the relevant range
plt.title('SVM Performance with Different Kernels')
plt.xlabel('Kernel')
plt.ylabel('Accuracy')
for kernel, score in scores.items():
    plt.text(kernel, score, f"{score:.4f}", ha='center', va='bottom')
plt.show()

### SVM Parameter Tuning

In [None]:
# Example: SVM Parameter Tuning
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

# Create the grid search model
grid = GridSearchCV(svm.SVC(), param_grid, refit=True, verbose=0, cv=5)
grid.fit(X_train, y_train)

# Best parameters and score
print(f"Best parameters: {grid.best_params_}")
print(f"Best cross-validation score: {grid.best_score_:.4f}")

# Evaluate with best model
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
print(f"Test accuracy with tuned model: {accuracy_score(y_test, y_pred):.4f}")

<a id='kmeans'></a>
## 7. K-Means Clustering

### Key Learning Points:
- **Unsupervised learning** algorithm for finding groups in data
- Groups data into `k` clusters based on similarity
- The algorithm iteratively assigns points to the nearest cluster centroid
- Determining optimal `k` can use the Elbow Method or Silhouette Score
- Preprocessing (scaling) is important for K-means
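The assign-and-update loop described above is Lloyd's algorithm. A minimal NumPy sketch follows, assuming random initial centroids picked from the data (`kmeans_lloyd` is a hypothetical helper); scikit-learn's `KMeans` defaults to the smarter k-means++ initialization, so this is for intuition only:

```python
import numpy as np

def kmeans_lloyd(points, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    inertia = np.sum((points - centroids[labels]) ** 2)  # sum of squared distances
    return labels, centroids, inertia

# Usage on two obvious groups of 2-D points
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels_demo, centroids_demo, inertia_demo = kmeans_lloyd(pts, k=2)
print(labels_demo, centroids_demo, round(inertia_demo, 3))
```

Each step can only lower (or keep) the inertia, so the loop always stops, but it may stop in a local minimum; that is why initialization matters.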

In [None]:
# Example: K-means Clustering
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate sample data with 4 clusters
X, y = make_blobs(n_samples=500, centers=4, random_state=42)

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title('Generated Dataset with 4 Clusters')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

In [None]:
# Find optimal number of clusters using the Elbow Method
inertia = []
k_range = range(1, 10)

for k in k_range:
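    # Separate name so the earlier `model` is not overwritten; n_init is set explicitly
    # because its default changed between scikit-learn versions
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertia.append(km.inertia_)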

# Plot Elbow Method
plt.figure(figsize=(10, 6))
plt.plot(k_range, inertia, 'o-')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia (Sum of Squared Distances)')
plt.title('Elbow Method for Optimal k')
plt.grid(True)
plt.show()

In [None]:
# Apply K-means with the optimal number of clusters
optimal_k = 4  # From the elbow method
kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)

# Get cluster centers
centers = kmeans.cluster_centers_

# Visualize the clusters
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis', alpha=0.7)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='Centroids')
plt.title(f'K-means Clustering (k={optimal_k})')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
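The key points also mention the Silhouette Score as an alternative to the elbow plot. `sklearn.metrics.silhouette_score` returns the mean silhouette coefficient, which ranges from -1 to 1 (higher means points sit closer to their own cluster than to the nearest other cluster); unlike inertia, it is only defined for $k \ge 2$. A sketch on the same synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data as above
X_blobs, _ = make_blobs(n_samples=500, centers=4, random_state=42)

sil_scores = {}
for k in range(2, 10):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_blobs)
    sil_scores[k] = silhouette_score(X_blobs, labels_k)

best_k = max(sil_scores, key=sil_scores.get)
print({k: round(s, 3) for k, s in sil_scores.items()})
print("k with the highest silhouette score:", best_k)
```

Unlike inertia, which keeps falling as k grows, the silhouette score usually peaks, so the best k can be read off directly instead of looking for an elbow.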

<a id='nn-fundamentals'></a>
## 8. Neural Network Fundamentals

### Key Learning Points:
- Neural networks are inspired by the human brain's structure
- They consist of interconnected layers of artificial neurons
- Each connection has a **weight** and each neuron has a **bias**
- Neural networks learn by updating weights and biases through backpropagation
- Deep Learning refers to neural networks with many hidden layers
- Neural networks excel at finding complex patterns in data

In [None]:
# Create a visual representation of a simple neural network
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle, FancyArrowPatch

def draw_neural_network(ax, layer_sizes, weights=None):
    """Draw a neural network diagram on a matplotlib axis."""
    v_spacing = 1
    h_spacing = 1.5
    
    # Draw nodes
    for n, layer_size in enumerate(layer_sizes):
        layer_name = "Input" if n == 0 else "Hidden" if n < len(layer_sizes)-1 else "Output"
        for m in range(layer_size):
            x = n * h_spacing
            y = (layer_size - m - 1) * v_spacing
            circle = Circle((x, y), 0.3, fill=False)
            ax.add_patch(circle)
            
            # Add labels inside the circles (braces keep multi-character subscripts together in mathtext)
            if n == 0:
                ax.text(x, y, f"$x_{{{m+1}}}$", ha='center', va='center')
            elif n == len(layer_sizes)-1:
                ax.text(x, y, f"$y_{{{m+1}}}$", ha='center', va='center')
            else:
                ax.text(x, y, f"$h_{{{n},{m+1}}}$", ha='center', va='center')
                
        # Add layer labels below
        ax.text(n * h_spacing, -1, f"{layer_name} Layer", ha='center')
    
    # Draw edges
    if weights is None:
        # Generate random weights for visualization
        weights = []
        for i in range(len(layer_sizes) - 1):
            weights.append(np.random.randn(layer_sizes[i], layer_sizes[i+1]))
    
    for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        for i in range(layer_size_a):
            for j in range(layer_size_b):
                angle = 0
                x = n * h_spacing
                y = (layer_size_a - i - 1) * v_spacing
                x_end = (n + 1) * h_spacing
                y_end = (layer_size_b - j - 1) * v_spacing
                
                # Set edge color based on weight value
                if n < len(weights):
                    color = 'red' if weights[n][i, j] < 0 else 'blue'
                    line_width = abs(weights[n][i, j]) + 0.5
                else:
                    color = 'black'
                    line_width = 1.0
                    
                ax.add_patch(FancyArrowPatch((x, y), (x_end, y_end), 
                                            arrowstyle='->', 
                                            color=color,
                                            linewidth=line_width,
                                            mutation_scale=10))
    
    # Set axis properties
    ax.set_xlim(-0.5, len(layer_sizes) * h_spacing + 0.5)
    ax.set_ylim(-1.5, max(layer_sizes) * v_spacing + 0.5)
    ax.axis('off')

# Create example neural network
fig, ax = plt.subplots(figsize=(12, 8))
# Input, Hidden, Output layer sizes
draw_neural_network(ax, [3, 4, 2])
plt.title('Simple Neural Network Architecture', fontsize=15)
plt.show()

# Create example for forward propagation
print("Neural Network Forward Propagation")
print("---------------------------------")
print("For each neuron in a layer:")
print("1. Multiply each input by its corresponding weight")
print("2. Sum all weighted inputs and add bias")
print("3. Apply activation function to the sum")
print("\nMathematically:")
print("z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b")
print("a = activation(z)")
print("\nwhere:")
print("z = weighted sum + bias")
print("a = activation (output of the neuron)")
print("w = weights")
print("x = inputs")
print("b = bias")

### Weights and Biases

Neural networks learn by adjusting two types of parameters:

1. **Weights** (w): Control the strength of the connection between neurons
   - Positive weights amplify signals
   - Negative weights inhibit signals
   - Larger absolute values indicate stronger influence

2. **Biases** (b): Allow the neuron to shift its activation function
   - Act like an intercept term in linear regression
   - Help the network learn the threshold of activation
   - Without a bias, a neuron's weighted sum is always 0 when all inputs are 0, so its output is fixed at activation(0) and its decision boundary must pass through the origin

### Backpropagation

The method neural networks use to learn from data:

1. **Forward Pass**: Calculate predictions using current weights
2. **Calculate Loss**: Measure error between predictions and actual values
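3. **Backward Pass**: Compute the gradient of the loss with respect to every weight and bias, applying the chain rule layer by layer from the output back to the input
4. **Update Weights**: Move each parameter a small step against its gradient to reduce the loss
   - $w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$, where $\eta$ is the learning rate

This process repeats over many iterations (typically one per batch; an epoch is one full pass over the training data) until the loss stops improving.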
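The four steps can be seen end to end on the smallest possible "network": a single linear neuron $\hat{y} = wx + b$ trained with mean squared error. Here the gradients have a closed form, $\partial L/\partial w = \tfrac{2}{n}\sum(\hat{y}-y)x$ and $\partial L/\partial b = \tfrac{2}{n}\sum(\hat{y}-y)$; in deeper networks, backpropagation obtains the same kind of quantities through the chain rule. The data ($y = 2x + 1$), learning rate and number of steps are arbitrary choices for illustration:

```python
import numpy as np

# Synthetic, noise-free data from y = 2x + 1
x_gd = np.linspace(-1, 1, 50)
y_gd = 2 * x_gd + 1

w_gd, b_gd = 0.0, 0.0   # initial parameters
learning_rate = 0.1

for step in range(500):
    y_hat = w_gd * x_gd + b_gd                       # 1. forward pass
    loss = np.mean((y_hat - y_gd) ** 2)              # 2. loss (MSE)
    grad_w = 2 * np.mean((y_hat - y_gd) * x_gd)      # 3. gradients of the loss
    grad_b = 2 * np.mean(y_hat - y_gd)
    w_gd -= learning_rate * grad_w                   # 4. update: w <- w - lr * dL/dw
    b_gd -= learning_rate * grad_b

print(f"w = {w_gd:.4f}, b = {b_gd:.4f}, loss before last update = {loss:.2e}")
```

Each iteration here uses the full dataset, so one iteration is also one epoch; with mini-batches, an epoch would contain several updates.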

<a id='activation-functions'></a>
## 9. Activation Functions

### Key Learning Points:
- Activation functions introduce non-linearity into neural networks
- Without them, neural networks would behave like linear models
- Each activation function has unique properties and use cases
- The choice of activation function affects training dynamics and model performance

In [None]:
# Visualize common activation functions
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Create input values
x = np.linspace(-5, 5, 1000)

# Define activation functions
activations = {
    'Linear': lambda x: x,
    'ReLU': lambda x: np.maximum(0, x),
    'Sigmoid': lambda x: 1 / (1 + np.exp(-x)),
    'Tanh': lambda x: np.tanh(x),
    'Leaky ReLU': lambda x: np.where(x > 0, x, x * 0.1),  # alpha = 0.1 so the negative slope is visible in the plot
    'ELU': lambda x: np.where(x > 0, x, np.exp(x) - 1),   # alpha = 1
    'Softmax': lambda x: np.exp(x) / np.sum(np.exp(x))  # Acts on a whole vector, so it is skipped in the plot below
}

# Plot activation functions
plt.figure(figsize=(18, 12))
for i, (name, activation) in enumerate(activations.items()):
    if name != 'Softmax':  # Skip softmax as it's only meaningful for vectors
        plt.subplot(3, 2, i+1)
        plt.plot(x, activation(x))
        plt.title(name)
        plt.grid(True)
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        
plt.tight_layout()
plt.show()

# Create a table explaining activation functions
from IPython.display import display, Markdown

activation_table = """
| Activation | Formula | Range | Use Case | Advantages | Disadvantages |
|------------|---------|-------|----------|------------|---------------|
| **ReLU** | max(0, x) | [0, ∞) | Hidden layers in CNNs, default for many networks | Fast to compute; gradient does not vanish for positive inputs | "Dying ReLU": neurons whose input stays negative always output 0 and stop learning |
| **Sigmoid** | 1/(1+e^(-x)) | (0, 1) | Output layer for binary classification | Smooth gradient; output can be read as a probability | Vanishing gradient (derivative is at most 0.25); not zero-centered |
| **Tanh** | (e^x - e^(-x))/(e^x + e^(-x)) | (-1, 1) | Hidden layers when zero-centered output is needed | Zero-centered output, bounded | Vanishing gradient for large inputs (less severe than sigmoid) |
| **Leaky ReLU** | max(αx, x), with a small α such as 0.01 | (-∞, ∞) | Hidden layers | Small nonzero gradient for negative inputs avoids dead neurons | α is an extra hyperparameter |
| **ELU** | x if x>0 else α(e^x-1) | (-α, ∞) | Hidden layers | Smooth; avoids dead neurons; negative outputs push mean activation toward 0 | More computationally expensive than ReLU (needs exp) |
| **Softmax** | e^(x_i) / Σ_j e^(x_j) | (0, 1), outputs sum to 1 | Multi-class classification output | Converts a vector of scores into a probability distribution | Each output depends on all inputs, so it is normally used only in the output layer |
"""

display(Markdown(activation_table))
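
The softmax entry in the plotting dictionary is only a placeholder. In practice softmax is applied to a whole score vector, and subtracting the maximum score first keeps `np.exp` from overflowing without changing the result, because the shift cancels in the ratio. A minimal NumPy sketch (the function name `softmax` is our own, not a library API):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    # Shifting by the max leaves the ratios unchanged but keeps exp() finite
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.round(3), probs.sum())

# The naive formula would overflow to inf / inf here; the shifted one does not
print(softmax([1000.0, 1000.0]))  # [0.5 0.5]
```

The largest score always gets the largest probability, and each row of a 2D input is normalized independently.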

<a id='loss-functions'></a>
## 10. Loss Functions

### Key Learning Points:
- A loss function measures how far a model's predictions are from the true targets
- It condenses the prediction errors into a single number that training tries to minimize
- Different problems (regression, binary or multi-class classification) call for different loss functions
- Optimization algorithms such as gradient descent adjust the weights to minimize the loss
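
A quick numerical check of how the regression losses react to a single outlier. The residuals below are made-up illustration values, and `huber` is a hand-written helper using the same piecewise definition as the plotting cell below:

```python
import numpy as np

def huber(errors, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it."""
    abs_e = np.abs(errors)
    quadratic = 0.5 * errors ** 2
    linear = delta * (abs_e - 0.5 * delta)
    return np.mean(np.where(abs_e <= delta, quadratic, linear))

# Residuals (y - y_hat): four small errors, then the same four plus one outlier
clean = np.array([0.5, -0.5, 0.5, -0.5])
with_outlier = np.append(clean, 10.0)

for name, e in [("clean", clean), ("with outlier", with_outlier)]:
    mse = np.mean(e ** 2)
    mae = np.mean(np.abs(e))
    print(f"{name:>12}: MSE={mse:.3f}  MAE={mae:.3f}  Huber={huber(e):.3f}")
```

With these numbers MSE jumps from 0.25 to 20.2, while MAE only goes from 0.5 to 2.4 and Huber (delta = 1) from 0.125 to 2.0: squaring lets one large error dominate MSE.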

In [None]:
# Visualize different loss functions
import numpy as np
import matplotlib.pyplot as plt
# The losses below are computed directly with NumPy; tf.keras.losses offers ready-made
# versions (MeanSquaredError, MeanAbsoluteError, BinaryCrossentropy, Huber, ...)

# Create true and predicted values
y_true = np.array([0, 0, 1, 1, 1])
y_pred_range = np.linspace(0, 1, 100)
losses = []

# Calculate loss values for different prediction values
for p in y_pred_range:
    # Binary predictions (for binary cross-entropy)
    y_pred_binary = np.array([p, p, p, p, p])
    
    # MSE
    mse = np.mean((y_true - y_pred_binary) ** 2)
    
    # MAE
    mae = np.mean(np.abs(y_true - y_pred_binary))
    
    # Binary Cross-Entropy
    # Adding small epsilon to avoid log(0)
    epsilon = 1e-15
    bce = -np.mean(y_true * np.log(y_pred_binary + epsilon) + 
                  (1 - y_true) * np.log(1 - y_pred_binary + epsilon))
    
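    # Huber Loss (delta=1.0). Every error in this example lies in [-1, 1], so with
    # delta=1.0 the loss stays in its quadratic branch and equals 0.5 * MSE here
    delta = 1.0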
    error = y_true - y_pred_binary
    huber = np.mean(np.where(np.abs(error) <= delta, 
                           0.5 * error ** 2, 
                           delta * (np.abs(error) - 0.5 * delta)))
    
    losses.append([mse, mae, bce, huber])

losses = np.array(losses)

# Plot loss functions
plt.figure(figsize=(15, 8))

loss_names = ['Mean Squared Error', 'Mean Absolute Error', 
              'Binary Cross-Entropy', 'Huber Loss']

for i, name in enumerate(loss_names):
    plt.subplot(2, 2, i+1)
    plt.plot(y_pred_range, losses[:, i])
    plt.title(name)
    plt.xlabel('Constant prediction p for all 5 samples (targets: two 0s, three 1s)')
    plt.ylabel('Loss')
    plt.grid(True)
    
plt.tight_layout()
plt.show()

# Display loss function table
from IPython.display import display, Markdown

loss_table = """
| Loss Function | Formula | Use Case | Pros | Cons |
|---------------|---------|----------|------|------|
| **Mean Squared Error (MSE)** | $\\frac{1}{n}\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2$ | Regression | Penalizes large errors more | Sensitive to outliers |
| **Mean Absolute Error (MAE)** | $\\frac{1}{n}\\sum_{i=1}^{n}\\lvert y_i - \\hat{y}_i\\rvert$ | Regression | More robust to outliers | Gradient magnitude stays constant and does not shrink near the optimum; not differentiable at 0 |
| **Binary Cross-Entropy** | $-\\frac{1}{n}\\sum_{i=1}^{n}[y_i\\log(\\hat{y}_i) + (1-y_i)\\log(1-\\hat{y}_i)]$ | Binary classification | Well suited to sigmoid probability outputs | $\\log(0)$ when a prediction is exactly 0 or 1; clip predictions or pass logits with `from_logits=True` in Keras |
| **Categorical Cross-Entropy** | $-\\frac{1}{n}\\sum_{i=1}^{n}\\sum_{j=1}^{m}y_{ij}\\log(\\hat{y}_{ij})$ | Multi-class classification | Pairs naturally with softmax outputs | Requires one-hot encoded labels |
| **Sparse Categorical CE** | Same as CCE, but with integer class labels instead of one-hot vectors | Multi-class with integer labels | Avoids building one-hot label arrays | Same numerical caveats as CCE |
| **Huber Loss** | $\\frac{1}{2}e^2$ if $\\lvert e\\rvert \\le \\delta$, else $\\delta(\\lvert e\\rvert - \\frac{1}{2}\\delta)$, with $e = y - \\hat{y}$ | Regression | Quadratic near 0 like MSE, linear for large errors like MAE | Threshold $\\delta$ is a hyperparameter to tune |
| **KL Divergence** | $\\sum_{i=1}^{n}p(x_i)\\log\\frac{p(x_i)}{q(x_i)}$ | Comparing distributions | Measures information lost when $q$ approximates $p$ | Asymmetric: $D_{KL}(p\\Vert q) \\neq D_{KL}(q\\Vert p)$ in general |
"""

display(Markdown(loss_table))
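
Categorical and sparse categorical cross-entropy compute the same quantity and differ only in how the labels are encoded. A small NumPy sketch makes that concrete; the predicted probabilities are invented, and both functions are hand-written illustrations rather than the Keras implementations (Keras provides its own `categorical_crossentropy` and `sparse_categorical_crossentropy` losses):

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean over samples of -sum_j y_ij * log(p_ij); expects one-hot labels."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    """Same loss, but labels are integer class indices."""
    probs = np.clip(probs, eps, 1.0 - eps)
    # Pick each sample's predicted probability for its true class
    picked = probs[np.arange(len(y_int)), y_int]
    return -np.mean(np.log(picked))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
y_int = np.array([0, 1, 2])
y_onehot = np.eye(3)[y_int]  # integer labels -> one-hot rows

print(categorical_crossentropy(y_onehot, probs))
print(sparse_categorical_crossentropy(y_int, probs))
```

Both calls return the same value (about 0.424): the one-hot vector simply zeroes out every term except the true class.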

<a id='nn-layers'></a>
## 13. Neural Network Layers

### Key Learning Points:
- Neural networks consist of different types of layers stacked together
- Each layer type serves a specific purpose and has unique properties
- Modern architectures combine multiple layer types for optimal performance
- Layer choice depends on the data type and task
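
The shapes reported by the layer visualizations in the cell below follow simple arithmetic. This sketch of the standard output-size rule for one spatial axis (the helper name is our own) reproduces them: a 3x3 `'same'` convolution keeps 28x28, 2x2 max pooling halves it to 14x14, and Flatten multiplies out height, width, and channels:

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for a conv or pooling window.

    'valid': no padding; the window must fit entirely inside the input.
    'same' : the input is zero-padded so that output = ceil(n / stride).
    """
    if padding == "valid":
        return (n - kernel) // stride + 1
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

# Conv2D(16, (3, 3), padding='same') on a 28x28 input keeps 28x28
print(conv_output_size(28, 3, padding="same"))  # 28
# The same convolution with padding='valid' trims the border
print(conv_output_size(28, 3))                  # 26
# MaxPooling2D(pool_size=(2, 2)): strides default to the pool size
print(conv_output_size(28, 2, stride=2))        # 14
# Flatten: height x width x channels
print(14 * 14 * 16)                             # 3136
```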

In [None]:
# Create visualizations for different neural network layers
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers

# Create a function to visualize layer transformations
def visualize_layer_transformation(layer, input_shape, title):
    """Visualize how a layer transforms input data."""
    # Create random input data (seeded; float32 matches Keras' default dtype)
    np.random.seed(42)
    if len(input_shape) == 3:  # For 2D layers (height, width, channels)
        input_data = np.random.rand(1, *input_shape).astype('float32')
        
        # Display the first channel of the input
        plt.figure(figsize=(12, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(input_data[0, :, :, 0], cmap='viridis')
        plt.title(f"Input (shape: {input_data.shape[1:]})")
        plt.colorbar()
        
        # Apply layer and display output
        output = layer(input_data).numpy()
        plt.subplot(1, 2, 2)
        if len(output.shape) == 4:  # Spatial output: show the first channel
            plt.imshow(output[0, :, :, 0], cmap='viridis')
            plt.colorbar()  # Only an image has a colormap; a bar chart cannot take a colorbar
        else:  # Flattened output: show at most the first 50 values
            n_show = min(50, output.shape[1])
            plt.bar(range(n_show), output[0, :n_show])
        plt.title(f"Output (shape: {output.shape[1:]})")
        
    else:  # For 1D layers
        input_data = np.random.rand(1, input_shape[0]).astype('float32')
        
        # Display input
        plt.figure(figsize=(12, 5))
        plt.subplot(1, 2, 1)
        plt.bar(range(input_shape[0]), input_data[0])
        plt.title(f"Input (shape: {input_data.shape[1:]})")
        
        # Apply layer and display output
        output = layer(input_data).numpy()
        plt.subplot(1, 2, 2)
        plt.bar(range(output.shape[1]), output[0])
        plt.title(f"Output (shape: {output.shape[1:]})")
    
    plt.suptitle(title)
    plt.tight_layout()
    plt.show()
    
    return output.shape[1:]

# Define various layer types to visualize
# 1. Dense Layer (Fully Connected)
dense_layer = layers.Dense(64, activation='relu')
dense_output_shape = visualize_layer_transformation(dense_layer, (100,), "Dense Layer (Fully Connected)")

# 2. Conv2D Layer (Convolutional)
conv_layer = layers.Conv2D(16, kernel_size=(3, 3), activation='relu', padding='same')
conv_output_shape = visualize_layer_transformation(conv_layer, (28, 28, 1), "Conv2D Layer (Convolutional)")

# 3. MaxPooling2D Layer
pool_layer = layers.MaxPooling2D(pool_size=(2, 2))
pool_output_shape = visualize_layer_transformation(pool_layer, (28, 28, 16), "MaxPooling2D Layer")

# 4. Flatten Layer
flatten_layer = layers.Flatten()
flatten_output_shape = visualize_layer_transformation(flatten_layer, (14, 14, 16), "Flatten Layer")

# 5. Dropout Layer (can't easily visualize the dropout effect)
# Instead, create a table of layer types
from IPython.display import display, Markdown

layer_table = """
| Layer Type | Purpose | Use Cases | Example Parameters |
|------------|---------|-----------|-------------------|
| **Dense (Fully Connected)** | Connects every input to every output unit through a weight matrix plus bias | General-purpose, final classification layers | `Dense(units=64, activation='relu')` |
| **Conv2D (Convolutional)** | Applies learnable filters to detect features in images | Image classification, object detection | `Conv2D(filters=32, kernel_size=(3,3), activation='relu')` |
| **MaxPooling2D** | Reduces spatial dimensions by taking the maximum of each window | Downsampling feature maps, shrinking later layers | `MaxPooling2D(pool_size=(2,2))` |
| **AveragePooling2D** | Reduces spatial dimensions by averaging each window | Smoother alternative to max pooling; every value in the window contributes | `AveragePooling2D(pool_size=(2,2))` |
| **Flatten** | Converts multi-dimensional data to 1D | Transitioning from convolutional to dense layers | `Flatten()` |
| **Dropout** | During training, randomly sets a fraction `rate` of inputs to 0 and scales the rest by 1/(1 - rate); inactive at inference | Preventing overfitting | `Dropout(rate=0.5)` |
| **BatchNormalization** | Normalizes layer inputs with mini-batch statistics during training (moving averages at inference) | Stabilizes and accelerates training | `BatchNormalization()` |
| **LSTM/GRU** | Processes sequential data with memory | Time series, text, speech processing | `LSTM(units=128, return_sequences=True)` |
| **Embedding** | Maps integer-encoded discrete entities (e.g., word indices) to dense vectors | Text processing, representing categorical data | `Embedding(input_dim=10000, output_dim=100)` |
| **Add/Concatenate** | Combines outputs from multiple layers (Add needs identical shapes; Concatenate joins along an axis) | Skip connections, multi-branch models | `Concatenate()` or `Add()` |
"""

display(Markdown(layer_table))
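
Dropout is hard to show as a picture, but its arithmetic is easy to check. Keras' `Dropout` layer zeroes inputs at random only when `training=True` and scales the surviving ones by 1/(1 - rate). The NumPy sketch below re-implements that idea (often called inverted dropout) as an illustration, not as the Keras implementation itself; `inverted_dropout` is our own name:

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Zero each unit with probability `rate` during training and scale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((1, 10_000))

train_out = inverted_dropout(x, rate=0.5, training=True, rng=rng)
infer_out = inverted_dropout(x, rate=0.5, training=False, rng=rng)

print("fraction zeroed:", np.mean(train_out == 0))         # close to 0.5
print("surviving value:", train_out[train_out > 0][0])     # 2.0
print("mean activation:", train_out.mean())                # close to 1.0
print("inference unchanged:", np.array_equal(infer_out, x))  # True
```

Scaling at training time is what lets inference skip dropout entirely while keeping activations on the same scale.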

<a id='hyperparameter-tuning'></a>
## 14. Hyperparameter Tuning

### Key Learning Points:
- Hyperparameters are model configuration settings set before training
- They control model complexity, learning process, and architecture
- Careful tuning can substantially improve model performance
- Tuning strategies trade off computational cost against how thoroughly they explore the search space
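
Grid search cost grows multiplicatively with the grid. Enumerating the random-forest grid from the cell below shows how many fits `GridSearchCV` would run with `cv=5`; the enumeration here uses `itertools.product` and is only a sketch of what the search iterates over:

```python
from itertools import product

# Same grid as the random-forest example in this section
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5

# Every combination of values is one candidate configuration
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

print(len(candidates))             # 27 = 3 * 3 * 3
print(len(candidates) * cv_folds)  # 135 model fits, before the final refit
print(candidates[0])
```

One more hyperparameter with three values would triple both numbers, which is why random search or Bayesian optimization become attractive for larger spaces.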

In [None]:
# Example: Hyperparameter tuning using grid search
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load example dataset
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid for random forest
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Create base model
rf = RandomForestClassifier(random_state=42)

# Create grid search
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
    scoring='accuracy',
    verbose=0
)

# Fit grid search (this would normally take some time)
# We'll simulate results for faster execution
print("Simulating grid search results (in practice you would run grid_search.fit(X_train, y_train))")

# Create simulated results
np.random.seed(42)
results = []
for n_estimators in param_grid['n_estimators']:
    for max_depth in param_grid['max_depth']:
        for min_samples_split in param_grid['min_samples_split']:
            # Create a plausible but *simulated* accuracy score (not a real CV result)
            base_score = 0.93
            n_estimator_bonus = n_estimators / 1000  # More trees help slightly
            depth_penalty = 0 if max_depth is None else (30 - max_depth) / 100  # Shallower trees are penalized (underfitting)
            split_effect = (5 - abs(min_samples_split - 5)) / 100  # Best around the middle value
            random_effect = np.random.normal(0, 0.01)  # Random variation
            
            score = base_score + n_estimator_bonus - depth_penalty + split_effect + random_effect
            score = min(0.99, max(0.85, score))  # Keep in a realistic range
            
            results.append({
                'n_estimators': n_estimators,
                'max_depth': str(max_depth),  # Strings avoid mixing None and ints in one column
                'min_samples_split': min_samples_split,
                'score': score
            })

# Convert to dataframe
results_df = pd.DataFrame(results)
best_params = results_df.loc[results_df['score'].idxmax()]

print("\nBest parameters (from simulated scores):")
for param, value in best_params.items():
    if param != 'score':
        print(f"  {param}: {value}")
print(f"Best simulated score: {best_params['score']:.4f}")

# Visualize results
plt.figure(figsize=(15, 10))

# Create heatmap for each n_estimators value
n_estimator_values = sorted(param_grid['n_estimators'])
for i, n_est in enumerate(n_estimator_values):
    plt.subplot(1, len(n_estimator_values), i+1)
    
    # Filter data for this n_estimators value
    subset = results_df[results_df['n_estimators'] == n_est].copy()
    
    # Create pivot table for heatmap
    heatmap_data = subset.pivot_table(
        index='min_samples_split', 
        columns='max_depth', 
        values='score'
    )
    
    # Plot heatmap
    sns.heatmap(heatmap_data, annot=True, cmap='viridis', fmt='.3f')
    plt.title(f'n_estimators = {n_est}')
    plt.xlabel('max_depth')
    plt.ylabel('min_samples_split')

plt.tight_layout()
plt.show()

# Display hyperparameter info
from IPython.display import display, Markdown

hyperparameter_table = """
### Common Hyperparameters by Algorithm

| Algorithm | Key Hyperparameters | Tuning Strategy |
|-----------|---------------------|-----------------|
| **Neural Networks** | Learning rate, batch size, epochs, number of layers, neurons per layer, dropout rate, activation functions | Start with defaults, then gradually adjust one at a time |
| **Random Forest** | Number of trees, max depth, min samples split, max features | More trees usually better (diminishing returns), control depth to prevent overfitting |
| **Gradient Boosting** | Learning rate, number of estimators, max depth, subsample | Small learning rate with many estimators often works well |
| **SVM** | Kernel type, C, gamma | Start with linear kernel, then RBF; search C and gamma on log scale |
| **k-NN** | Number of neighbors (k), distance metric, weights | Scale features first; for binary classification, odd values of k avoid tied votes |

### Tuning Approaches

1. **Manual Tuning**: Adjust parameters based on experience and repeated experiments
2. **Grid Search**: Exhaustively search through a specified parameter grid
3. **Random Search**: Sample random combinations from parameter distributions
4. **Bayesian Optimization**: Build a probabilistic model of the objective function
5. **Genetic Algorithms**: Evolve parameter combinations using principles of natural selection
6. **Automated ML (AutoML)**: Automate the search over models, preprocessing steps, and hyperparameters

For neural networks, consider these additional techniques:
- Learning rate schedulers
- Early stopping based on validation loss
- Cyclical learning rates
- Weight decay (L2 regularization)
"""

display(Markdown(hyperparameter_table))
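
Random search samples configurations instead of enumerating them, and the SVM row above suggests searching `C` and `gamma` on a log scale. A minimal sketch of log-uniform sampling with NumPy; the ranges, the trial count, and the helper name are assumptions for illustration, and scoring each trial (for example with cross-validation) is left out:

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose base-10 logarithm is uniform on [log10(low), log10(high))."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size=size)

rng = np.random.default_rng(42)
n_trials = 20

# Hypothetical SVM search space spanning several orders of magnitude
trials = [
    {"C": c, "gamma": g}
    for c, g in zip(sample_log_uniform(1e-2, 1e3, n_trials, rng),
                    sample_log_uniform(1e-4, 1e1, n_trials, rng))
]

for t in trials[:3]:
    print({k: f"{v:.3g}" for k, v in t.items()})
```

Sampling the exponent uniformly gives each order of magnitude the same share of trials, which a plain uniform draw over [0.01, 1000] would not. In scikit-learn the same idea is available through `RandomizedSearchCV` with distributions such as `scipy.stats.loguniform`.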

<a id='data-augmentation'></a>
## 16. Data Augmentation

### Key Learning Points:
- Data augmentation artificially expands the training dataset
- It creates new training examples by applying transformations to existing data
- Helps reduce overfitting and makes models more robust to variations seen at test time
- Common in image, audio, and text classification tasks
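
Cutout (random erasing) has no built-in option in `ImageDataGenerator`, so it usually needs a custom implementation. A minimal NumPy sketch, assuming images are `(height, width, channels)` float arrays like `sample_image` in the cell below; the function name and the fill value are our own choices:

```python
import numpy as np

def cutout(image, size, rng, fill=0.0):
    """Return a copy of `image` with one random size x size square set to `fill`.

    The square's center can fall anywhere in the image, so near the border
    the square is clipped and covers a smaller area.
    """
    out = image.copy()
    h, w = image.shape[:2]
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    # Clip the square's corners to the image bounds
    y0, y1 = max(0, cy - size // 2), min(h, cy - size // 2 + size)
    x0, x1 = max(0, cx - size // 2), min(w, cx - size // 2 + size)
    out[y0:y1, x0:x1] = fill
    return out

rng = np.random.default_rng(0)
image = np.ones((100, 100, 3))
erased = cutout(image, size=20, rng=rng)
print("pixels erased:", int((erased == 0).all(axis=-1).sum()))
print("original untouched:", bool((image == 1).all()))  # True
```

To apply it on the fly, a wrapper such as `lambda img: cutout(img, 20, rng)` could be passed as `ImageDataGenerator(preprocessing_function=...)`, which calls the function on each image and expects an array of the same shape back.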

In [None]:
# Example: Image data augmentation with Keras
# Note: ImageDataGenerator is deprecated in recent TensorFlow releases; new code
# typically uses preprocessing layers such as tf.keras.layers.RandomFlip and RandomRotation
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create a 100x100 checkerboard (5x5 squares of 20 px each) as a sample image
sample_image = np.zeros((100, 100, 3))
for i in range(0, 100, 20):
    for j in range(0, 100, 20):
        if (i // 20 + j // 20) % 2 == 0:
            sample_image[i:i+20, j:j+20, :] = 1.0

# Setup data generator with various augmentations
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Reshape for Keras (needs a batch dimension)
x = sample_image.reshape((1,) + sample_image.shape)

# Visualize augmentations
plt.figure(figsize=(15, 15))
plt.subplot(4, 4, 1)
plt.title("Original")
plt.imshow(sample_image)
plt.axis('off')

i = 1
for batch in datagen.flow(x, batch_size=1):
    plt.subplot(4, 4, i+1)
    plt.imshow(batch[0])
    plt.axis('off')
    plt.title(f"Augmentation {i}")
    i += 1
    if i >= 16:
        break
plt.tight_layout()
plt.show()

# Display data augmentation table
from IPython.display import display, Markdown

augmentation_table = """
### Common Data Augmentation Techniques

#### For Images:
| Technique | Description | Implementation | When to Use |
|-----------|-------------|----------------|------------|
| **Rotation** | Rotate image by random angle | `rotation_range` in ImageDataGenerator | Most image tasks |
| **Flipping** | Mirror image horizontally or vertically | `horizontal_flip`, `vertical_flip` | When orientation doesn't change class |
| **Scaling** | Zoom in or out randomly | `zoom_range` | Most image tasks |
| **Translation** | Shift image horizontally or vertically | `width_shift_range`, `height_shift_range` | Object recognition |
| **Color Jittering** | Alter brightness, contrast, saturation | `brightness_range`, `channel_shift_range` (contrast or saturation changes need a custom `preprocessing_function`) | When lighting varies |
| **Cutout/Random Erasing** | Blank out random rectangles | Custom implementation | Object detection, classification |

#### For Text:
| Technique | Description | When to Use |
|-----------|-------------|------------|
| **Synonym Replacement** | Replace words with synonyms | Sentiment analysis, classification |
| **Random Insertion/Deletion** | Insert or delete random words | Short text classification |
| **Backtranslation** | Translate to another language and back | When meaning preservation is important |
| **Word Swapping** | Randomly swap adjacent words | Tasks where exact word order matters little, such as topic or sentiment classification |

#### For Time Series:
| Technique | Description | When to Use |
|-----------|-------------|------------|
| **Time Warping** | Stretch or compress segments | Most time series tasks |
| **Magnitude Warping** | Adjust the magnitude | Sensor data, signals |
| **Frequency Warping** | Modify frequency components | Audio, vibration data |
| **Jittering** | Add random noise | Robust models for noisy data |

### Data Augmentation Best Practices:
1. **Choose domain-appropriate transformations** - Augmentations should preserve class information
2. **Apply multiple transformations** - Combine techniques for greater variety
3. **Set reasonable ranges** - Too extreme transformations may hurt training
4. **Test impact** - Measure performance with and without augmentation
5. **Consider computational cost** - On-the-fly augmentation can slow training
"""

display(Markdown(augmentation_table))
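
For the time-series rows in the table above, jittering and scaling are the simplest transforms to write by hand. A NumPy sketch on a synthetic sine wave; the noise levels are assumptions, and `magnitude_scale` uses one global factor, a simplified stand-in for magnitude warping (which applies a smooth, time-varying factor):

```python
import numpy as np

def jitter(series, sigma, rng):
    """Jittering: add independent Gaussian noise to every time step."""
    return series + rng.normal(0.0, sigma, size=series.shape)

def magnitude_scale(series, sigma, rng):
    """Scale the whole series by a single random factor drawn around 1.0."""
    return series * rng.normal(1.0, sigma)

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
signal = np.sin(t)  # synthetic stand-in for a sensor or audio signal

jittered = jitter(signal, sigma=0.05, rng=rng)
scaled = magnitude_scale(signal, sigma=0.1, rng=rng)

print(jittered.shape, scaled.shape)
print("max jitter deviation:", round(float(np.abs(jittered - signal).max()), 3))
print("scale factor:", round(float(scaled[50] / signal[50]), 3))
```

Both transforms keep the overall shape of the signal, which is what makes them label-preserving for most classification tasks.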

<a id='transfer-learning'></a>
## 17. Transfer Learning

### Key Learning Points:
- Transfer learning uses knowledge gained from one task to improve performance on another
- Pre-trained models act as feature extractors or starting points for fine-tuning
- Can drastically reduce training time and the amount of labeled data needed
- Particularly powerful for image, audio, and language tasks
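
Counting what is actually trained helps explain why feature extraction needs so little data. In the ResNet50 example in the cell below, the frozen backbone maps a 224x224 image to a 7x7x2048 feature map, and only the small head on top is learned. The NumPy sketch uses random numbers in place of real backbone features and randomly initialized head weights, so it illustrates shapes and parameter counts only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen backbone output: a batch of 4 feature maps of shape 7x7x2048
# (random values here, not real ResNet50 features)
feature_map = rng.random((4, 7, 7, 2048)).astype(np.float32)

# GlobalAveragePooling2D: average over the two spatial axes
pooled = feature_map.mean(axis=(1, 2))  # -> (4, 2048)

# The only trainable part in feature extraction: a Dense softmax head
num_classes = 10
W = rng.normal(0, 0.01, size=(2048, num_classes)).astype(np.float32)
b = np.zeros(num_classes, dtype=np.float32)

logits = pooled @ W + b
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

print(pooled.shape, probs.shape)                     # (4, 2048) (4, 10)
print("trainable head parameters:", W.size + b.size)  # 20490
```

The Dense(10) head has 2048 x 10 + 10 = 20,490 trainable parameters, compared with roughly 23.6 million frozen parameters in ResNet50 without its top.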

In [None]:
# Example: Transfer Learning with pre-trained models
# This cell only runs the diagram and the comparison table; the Keras code is printed as text below
import matplotlib.pyplot as plt

# Diagram to show transfer learning concept
def plot_transfer_learning_diagram():
    # Create a figure
    fig, ax = plt.subplots(figsize=(12, 8))
    
    # Hide axes
    ax.axis('off')
    
    # Draw the pre-trained model box
    pretrained = plt.Rectangle((0.1, 0.4), 0.35, 0.4, fill=True, color='lightblue', alpha=0.5)
    ax.add_patch(pretrained)
    ax.text(0.275, 0.65, 'Pre-trained Model\n(e.g., ResNet, VGG)', 
            ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(0.275, 0.5, 'Trained on ImageNet\n(1M+ images, 1000 classes)', 
            ha='center', va='center', fontsize=10)
    
    # Draw the custom model box
    custom = plt.Rectangle((0.55, 0.4), 0.35, 0.4, fill=True, color='lightgreen', alpha=0.5)
    ax.add_patch(custom)
    ax.text(0.725, 0.65, 'Custom Model\nfor Your Task', 
            ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(0.725, 0.5, 'Small dataset\n(e.g., 100s of images)', 
            ha='center', va='center', fontsize=10)
    
    # Add arrow connecting them
    ax.annotate('', xy=(0.55, 0.6), xytext=(0.45, 0.6),
                arrowprops=dict(arrowstyle='->', lw=2))
    
    # Add text explaining approaches
    ax.text(0.5, 0.85, 'Transfer Learning Approaches', 
            ha='center', va='center', fontsize=14, fontweight='bold')
    
    # Feature extraction approach
    ax.text(0.5, 0.3, 'Feature Extraction:', 
            ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(0.5, 0.25, 'Freeze pre-trained layers, replace and retrain only the classifier', 
            ha='center', va='center', fontsize=10)
    
    # Fine-tuning approach
    ax.text(0.5, 0.15, 'Fine-Tuning:', 
            ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(0.5, 0.1, 'Initialize with pre-trained weights, then retrain some or all layers with a small learning rate', 
            ha='center', va='center', fontsize=10)

    plt.show()

# Show the concept diagram
plot_transfer_learning_diagram()

# Compare popular pre-trained models (accuracy = approximate ImageNet top-1; reported values vary by source and implementation)
models_info = {
    "ResNet50": {
        "year": "2015",
        "parameters": "25M",
        "ImageNet Accuracy": "76.0%",
        "Special Features": "Residual connections to solve vanishing gradient",
        "Usage": "Balanced accuracy/size tradeoff"
    },
    "VGG16": {
        "year": "2014",
        "parameters": "138M",
        "ImageNet Accuracy": "71.3%",
        "Special Features": "Simple, uniform architecture",
        "Usage": "Feature extraction, simple to understand"
    },
    "MobileNetV2": {
        "year": "2018",
        "parameters": "3.5M",
        "ImageNet Accuracy": "71.8%",
        "Special Features": "Inverted residuals and linear bottlenecks",
        "Usage": "Mobile and edge applications"
    },
    "EfficientNetB0": {
        "year": "2019",
        "parameters": "5.3M",
        "ImageNet Accuracy": "77.1%",
        "Special Features": "Compound scaling method",
        "Usage": "Efficient use of parameters"
    },
    "BERT-base": {
        "year": "2018",
        "parameters": "110M",
        "Task": "NLP",
        "Special Features": "Bidirectional training of Transformer",
        "Usage": "Text classification, QA, sentiment analysis"
    }
}

# Create a pandas DataFrame
import pandas as pd
models_df = pd.DataFrame(models_info)
display(models_df.transpose())

# Example code for implementing transfer learning
print("\nExample code for transfer learning with ResNet50:")
code_example = """
# Load pre-trained model (without the top classifier)
base_model = tf.keras.applications.ResNet50(
    weights='imagenet',  # Load weights pre-trained on ImageNet
    include_top=False,   # Don't include the ImageNet classifier at the top
    input_shape=(224, 224, 3)
)

# Freeze the base model
base_model.trainable = False

# Create new model on top
inputs = tf.keras.Input(shape=(224, 224, 3))
# Apply ResNet50's own preprocessing (expects raw RGB pixel values in [0, 255])
x = tf.keras.applications.resnet50.preprocess_input(inputs)
# The base model contains multiple layers
x = base_model(x, training=False)
# Average the 7x7x2048 feature map into one 2048-element vector per image
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# Add a dropout layer for regularization
x = tf.keras.layers.Dropout(0.2)(x)
# Dense softmax classifier with one unit per class in your task (10 here)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

# Compile
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

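# Train only the new head (feature extraction). train_generator and validation_generator
# are assumed to exist, e.g. from ImageDataGenerator.flow_from_directory (one-hot labels)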
history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=10,
    validation_data=validation_generator
)

# Fine tuning (optional)
# Unfreeze some layers for fine-tuning
base_model.trainable = True
for layer in base_model.layers[:-10]:  # Freeze all except last 10 layers
    layer.trainable = False

# Compile with a lower learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),  # Very low learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training
history_fine = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=5,
    validation_data=validation_generator
)
"""
print(code_example)

# Display transfer learning best practices
from IPython.display import display, Markdown

transfer_learning_tips = """
### Transfer Learning Best Practices

#### When to Use Transfer Learning:
- When you have a **small dataset** (hundreds to thousands of examples)
- When your task is **similar** to the pre-training task
- When you need to **reduce training time**
- When you want to achieve **better performance** with limited data

#### Choosing a Pre-trained Model:
1. **Consider the source domain** - If available, choose a model pre-trained on data similar to yours
2. **Model size vs. performance tradeoff** - Larger models may perform better but require more resources
3. **Computational constraints** - Some models may be too large for your hardware
4. **Framework compatibility** - Ensure the model is available in your framework of choice

#### Implementation Strategies:

**1. Feature Extraction** (easier, faster):
- Freeze all pre-trained layers
- Replace and train only the classifier head
- Best when new dataset is small and similar to original dataset

**2. Fine-Tuning** (better results, more complex):
- Start with a pre-trained model
- Unfreeze some or all layers
- Continue training with a very small learning rate
- Best when new dataset is large enough and somewhat different from original

**3. Progressive Fine-Tuning**:
- Start by training only the top layers
- Gradually unfreeze deeper layers
- Use lower learning rates for earlier layers

#### Domain-Specific Considerations:

| Domain | Popular Pre-trained Models | Notes |
|--------|----------------------------|-------|
| **Computer Vision** | ResNet, VGG, EfficientNet | Most models pre-trained on ImageNet |
| **Natural Language** | BERT, GPT, RoBERTa | Pre-trained on large text corpora |
| **Audio** | Wav2Vec, Whisper | Speech recognition, audio classification |
| **Multi-modal** | CLIP, DALL-E | Connecting vision and language |
"""

display(Markdown(transfer_learning_tips))
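
Progressive fine-tuning suggests lower learning rates for earlier layers. One simple scheme divides the rate by a constant factor for each layer group going back from the head. The sketch below only computes the schedule; the factor 2.6 (the heuristic suggested in the ULMFiT paper), the four layer groups, and the function name are assumptions, and attaching the rates to an optimizer is framework-specific (PyTorch optimizers, for example, accept per-parameter-group learning rates):

```python
def layerwise_learning_rates(base_lr, num_groups, decay=2.6):
    """Learning rate per layer group, from the input side (index 0) to the head.

    The head (last group) gets base_lr; each earlier group is divided by
    `decay` once more, so early, generic layers change the least.
    """
    return [base_lr / decay ** (num_groups - 1 - i) for i in range(num_groups)]

lrs = layerwise_learning_rates(1e-4, num_groups=4)
for i, lr in enumerate(lrs):
    print(f"group {i}: {lr:.2e}")
```

Early layers tend to hold generic features such as edges and textures, so they receive the smallest updates, while the task-specific head adapts fastest.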

## Conclusion

This comprehensive notebook summarizes the key learning points from the AI with Python course, covering everything from basic data analysis to advanced machine learning techniques. Here's what we've covered:

### Data Fundamentals
- **Data Analysis** - Understanding your dataset through visualization and statistics
- **Data Normalization** - Transforming features for optimal algorithm performance
- **Handling Missing Values and Categorical Data** - Essential preprocessing techniques

### Machine Learning Algorithms
- **Linear & Logistic Regression** - The foundations of predictive modeling
- **Decision Trees** - Interpretable models for classification and regression
- **Support Vector Machines** - Powerful classifiers for complex datasets
- **K-Means Clustering** - Unsupervised learning for pattern discovery

### Neural Networks
- **Neural Network Fundamentals** - Understanding neurons, weights, and backpropagation
- **Activation Functions** - Adding non-linearity to model complex patterns
- **Loss Functions** - Measuring and optimizing model performance
- **Network Layers** - Building blocks for designing custom architectures
- **Regression & Classification** - Applications to various problem types

### Advanced Techniques
- **Hyperparameter Tuning** - Optimizing model configuration
- **Data Augmentation** - Expanding training datasets artificially
- **Transfer Learning** - Leveraging pre-trained models for new tasks
- **Model Optimization** - Strategies for faster and more efficient models

By mastering these techniques, you can apply AI to a wide range of problems across different domains - from image recognition to natural language processing to time series forecasting. The field continues to evolve rapidly, but these fundamental concepts provide a solid foundation for further exploration and practical applications.

### Next Steps

- **Explore Reinforcement Learning** - Agents that learn through interaction with environments
- **Dive Deeper into Transformers** - The architecture behind modern NLP models
- **Study Generative Models** - GANs, VAEs, and diffusion models for content creation
- **Practice with Real-World Projects** - Apply these techniques to solve meaningful problems
- **Keep Learning** - The field of AI evolves rapidly, stay curious!

### K-means on Real Data

In [None]:
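# Example: K-means on the California Housing dataset
# Assumes X_scaled (standardized California Housing features) was created in an earlier cell
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Apply PCA to reduce to 2 dimensions for visualization
pca = PCA(n_components=2)
housing_pca = pca.fit_transform(X_scaled)

# Cluster on all scaled features; PCA is only used to plot the result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
housing_clusters = kmeans.fit_predict(X_scaled)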

# Visualize clusters in 2D PCA space
plt.figure(figsize=(10, 6))
plt.scatter(housing_pca[:, 0], housing_pca[:, 1], c=housing_clusters, cmap='viridis', alpha=0.7)
plt.title('Housing Data Clustered with K-means (PCA Visualization)')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.show()
