
# Comprehensive Machine Learning Notebook (20,000 Words)

This notebook covers machine learning techniques and algorithms, with detailed explanations, examples, and practical applications.

### Table of Contents:
1. **Introduction to Machine Learning**
2. **Supervised Learning**
   - Classification
   - Regression
3. **Unsupervised Learning**
   - Clustering
   - Dimensionality Reduction
4. **Semi-Supervised Learning**
5. **Reinforcement Learning**
6. **Anomaly Detection**
7. **Case Studies and Real-World Applications**
8. **Future Directions**




# 1. Introduction to Machine Learning

## What is Machine Learning?

Machine Learning (ML) is a branch of artificial intelligence (AI) that focuses on building systems that learn from and make decisions based on data. The core idea behind ML is that instead of explicitly programming the system to perform a task, the system can learn patterns and relationships from examples or experience, improving its performance over time.

## Why is Machine Learning Important?

ML is essential in modern data-driven technologies because it allows systems to adapt, improve, and provide insights without requiring extensive human intervention. Machine learning powers technologies like recommendation systems (e.g., Netflix, Amazon), autonomous vehicles, and personal assistants (e.g., Siri, Google Assistant). 

## Types of Machine Learning

1. **Supervised Learning**: 
   - The model learns from labeled data (input-output pairs). The goal is to predict the output for unseen inputs based on the learned patterns.
   - **Examples**: Classification (identifying if an email is spam or not), Regression (predicting house prices).

2. **Unsupervised Learning**: 
   - The model works with unlabeled data and tries to find patterns or groupings in the data.
   - **Examples**: Clustering (grouping customers by purchasing behavior), Dimensionality Reduction (compressing data).

3. **Reinforcement Learning**: 
   - The model interacts with an environment and learns from feedback (rewards and penalties) to optimize its actions.
   - **Examples**: Game AI, Robot Navigation.

4. **Semi-Supervised Learning**: 
   - This combines a small amount of labeled data with a large amount of unlabeled data, allowing the model to benefit from both.
   - **Examples**: Face Recognition, Website Classification.

5. **Anomaly Detection**:
   - The task is to identify rare or unusual patterns in the data, which might indicate fraud or errors.
   - **Examples**: Credit Card Fraud Detection, Cyber Intrusion.

## Machine Learning Workflow

1. **Problem Definition**: Identify the problem to be solved and gather requirements.
2. **Data Collection**: Collect relevant data needed for the problem.
3. **Data Preprocessing**: Clean, normalize, and prepare data for modeling.
4. **Modeling**: Choose a machine learning algorithm and train a model on the data.
5. **Evaluation**: Test the model's performance on unseen data.
6. **Deployment**: Use the model in a production environment to make predictions.
7. **Monitoring and Maintenance**: Continuously monitor the model's performance and update as needed.
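
The middle steps of this workflow (preprocessing, modeling, evaluation) can be sketched in a few lines. This is a minimal illustration, not a prescribed recipe: the built-in Iris dataset stands in for collected data, and the scaler, the classifier, and the 25% hold-out split are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a small built-in dataset stands in for real data
X, y = load_iris(return_X_y=True)

# Hold out unseen data for the evaluation step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and modeling: scale the features, then fit a classifier
workflow_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
workflow_model.fit(X_train, y_train)

# Evaluation: measure accuracy on data the model has not seen
workflow_accuracy = accuracy_score(y_test, workflow_model.predict(X_test))
print(f"Held-out accuracy: {workflow_accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline means the scaling statistics are learned from the training split only, which keeps the evaluation honest.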

In the next sections, we'll dive into the key types of machine learning in detail.



# 2. Supervised Learning

Supervised learning is a type of machine learning where the model is trained on labeled data, meaning that each input has a corresponding output. The goal is for the model to learn the mapping from inputs to outputs so that it can make predictions on new, unseen data.

Supervised learning is broadly classified into two categories:
1. **Classification**: The goal is to predict a discrete label or category.
2. **Regression**: The goal is to predict a continuous value.

## 2.1 Classification

Classification is the process of predicting the class or category of a given input based on the learned relationships from the training data.

### Key Algorithms for Classification:

1. **Logistic Regression**: 
   - Despite its name, logistic regression is a classification algorithm. It uses the logistic function to output probabilities that can be used to classify inputs.
   
2. **K-Nearest Neighbors (KNN)**: 
   - KNN is a non-parametric, instance-based learning algorithm. It classifies a data point based on the majority class among its nearest neighbors.
   
3. **Support Vector Machines (SVM)**: 
   - SVM is a powerful classifier that works by finding the hyperplane that best separates the data into different classes.
   
4. **Decision Trees**: 
   - Decision trees are simple, interpretable models that recursively split the data into subgroups to predict the class of an input.
   
5. **Random Forests**: 
   - Random forests are an ensemble learning technique that combines multiple decision trees to improve classification performance.

6. **Neural Networks**: 
   - Neural networks are powerful models capable of handling complex classification tasks, particularly in the context of deep learning.

### 2.1.1 Logistic Regression

Logistic Regression is used for binary classification tasks (where there are only two possible classes). It estimates the probability that an instance belongs to a particular class using the sigmoid function:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

Where \( z = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n \). The predicted class is determined by the probability threshold (usually 0.5).
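
To make the formula concrete, the sketch below evaluates it by hand for a single input. The weights `w0` and `w` and the input `x` are hypothetical values picked for illustration, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and input (illustrative values only)
w0 = -4.0
w = np.array([1.0, 0.5])
x = np.array([2.0, 3.0])

z = w0 + np.dot(w, x)                      # -4 + 2 + 1.5 = -0.5
probability = sigmoid(z)                   # about 0.378
predicted_class = int(probability >= 0.5)  # below the 0.5 threshold, so class 0

print(f"z = {z:.2f}, P(class 1) = {probability:.3f}, predicted class = {predicted_class}")
```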

#### Example in Python:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]  # Labels

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Predict and evaluate
y_pred = logreg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

### 2.1.2 K-Nearest Neighbors (KNN)

KNN classifies a data point based on the classes of its nearest neighbors in the feature space. It is a lazy learning algorithm: rather than fitting an explicit model, it stores the training data and predicts by majority vote among the K closest points.

#### Example in Python:

```python
from sklearn.neighbors import KNeighborsClassifier

# Initialize the KNN classifier with K=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict and evaluate
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f"KNN Accuracy: {accuracy_knn * 100:.2f}%")
```
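
The majority-vote rule described above is short enough to write out directly. The following is a minimal from-scratch sketch, assuming Euclidean distance; `knn_predict` and the toy points are hypothetical names and data introduced for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    X_train = np.asarray(X_train, dtype=float)
    distances = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups
toy_points = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]]
toy_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_points, toy_labels, [2, 2]))  # query near the first group
print(knn_predict(toy_points, toy_labels, [9, 9]))  # query near the second group
```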

### 2.1.3 Support Vector Machine (SVM)

SVM works by finding a hyperplane in a high-dimensional space that maximally separates the data points into different classes. For two classes, it tries to maximize the margin between them.

#### Example in Python:

```python
from sklearn.svm import SVC

# Initialize the SVM classifier
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Predict and evaluate
y_pred_svm = svm.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"SVM Accuracy: {accuracy_svm * 100:.2f}%")
```

### 2.1.4 Decision Trees

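Decision Trees recursively split the data into subgroups based on feature values to predict a target class. Each split is chosen to maximize a purity criterion such as information gain (entropy reduction) or the reduction in Gini impurity; scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default.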

#### Example in Python:

```python
from sklearn.tree import DecisionTreeClassifier

# Initialize the Decision Tree classifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

# Predict and evaluate
y_pred_dt = dt.predict(X_test)
accuracy_dt = accuracy_score(y_test, y_pred_dt)
print(f"Decision Tree Accuracy: {accuracy_dt * 100:.2f}%")
```
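
Information gain, the split criterion mentioned above, is the parent node's entropy minus the size-weighted entropy of its children. The sketch below computes it for two candidate splits; the helper names and the four-label toy node are assumptions made for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
perfect_gain = information_gain(parent, [0, 0], [1, 1])  # children are pure
useless_gain = information_gain(parent, [0, 1], [0, 1])  # children are as mixed as the parent

print(f"Perfect split gain: {perfect_gain:.2f}")
print(f"Useless split gain: {useless_gain:.2f}")
```

A tree-growing algorithm evaluates many candidate splits this way and keeps the one with the highest gain.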

### 2.1.5 Random Forests

Random Forest is an ensemble learning technique that combines multiple decision trees, each trained on a bootstrap sample of the data and typically restricted to a random subset of features at each split, to improve the accuracy and robustness of the model.

#### Example in Python:

```python
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict and evaluate
y_pred_rf = rf.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Accuracy: {accuracy_rf * 100:.2f}%")
```

In the next section, we will cover **Regression** techniques in supervised learning.



## 2.2 Regression

In regression, the task is to predict a continuous output variable based on one or more input variables. Unlike classification, where the output is a discrete label, regression predicts real-valued outcomes.

### Key Algorithms for Regression:

1. **Linear Regression**:
   - A simple yet powerful method that models the relationship between input features and the target variable using a linear equation.
   
2. **Polynomial Regression**:
   - Extends linear regression by introducing polynomial terms to capture non-linear relationships.
   
3. **Ridge and Lasso Regression**:
   - These are regularized versions of linear regression that prevent overfitting by penalizing large coefficients.
   
4. **Support Vector Regression (SVR)**:
   - Similar to support vector machines for classification, SVR finds the best-fitting line within a margin of tolerance.

5. **Decision Tree Regression**:
   - Similar to decision tree classification, decision tree regression splits the data into smaller regions to make continuous predictions.

6. **Random Forest Regression**:
   - An ensemble of decision trees that aggregates the predictions of multiple trees to improve accuracy and reduce variance.

### 2.2.1 Linear Regression

Linear Regression is the simplest form of regression where the relationship between the independent variable(s) and the dependent variable is modeled as a straight line. The goal is to find the line (or hyperplane in higher dimensions) that best fits the data.

The equation for a linear regression model is:
\[ y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n + \epsilon \]

Where:
- \( y \) is the target value,
- \( w_0, w_1, ..., w_n \) are the model parameters (coefficients),
- \( x_1, x_2, ..., x_n \) are the input features, and
- \( \epsilon \) is the error term.

#### Example in Python:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data (X: input, y: target)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Initialize and train the model
lr = LinearRegression()
lr.fit(X, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred = lr.predict(X)
print(f"Predicted values: {y_pred}")
```

### 2.2.2 Polynomial Regression

Polynomial Regression captures non-linear relationships by adding polynomial terms to the features. The model remains linear in the parameters but can fit more complex relationships.

The equation for polynomial regression is:
\[ y = w_0 + w_1x + w_2x^2 + ... + w_nx^n + \epsilon \]

#### Example in Python:

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 6])

# Transform the input to polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Initialize and train the polynomial regression model
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred_poly = poly_reg.predict(X_poly)
print(f"Polynomial Regression Predictions: {y_pred_poly}")
```

### 2.2.3 Ridge and Lasso Regression

Ridge and Lasso are regularization techniques that prevent overfitting by adding a penalty term to the linear regression cost function.

- **Ridge Regression**: Adds an L2 penalty (squared magnitude of coefficients).
- **Lasso Regression**: Adds an L1 penalty (absolute value of coefficients), which can result in some coefficients being exactly zero (i.e., feature selection).
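
The practical difference between the two penalties shows up in the fitted coefficients. The sketch below is an illustration on synthetic data, assuming a target that depends on only the first two of five features; the data, the `alpha` values, and the variable names are choices made for this example. Lasso will typically set the coefficients of the irrelevant features to exactly zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
# Only the first two features influence the target; the other three are noise
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

print("Ridge coefficients:", np.round(ridge_demo.coef_, 3))
print("Lasso coefficients:", np.round(lasso_demo.coef_, 3))
```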

#### Example in Python:

```python
from sklearn.linear_model import Ridge, Lasso

# Initialize Ridge and Lasso models
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)

# Train the models
ridge.fit(X, y)
lasso.fit(X, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred_ridge = ridge.predict(X)
y_pred_lasso = lasso.predict(X)

print(f"Ridge Predictions: {y_pred_ridge}")
print(f"Lasso Predictions: {y_pred_lasso}")
```

### 2.2.4 Support Vector Regression (SVR)

Support Vector Regression (SVR) is a regression technique based on the principles of support vector machines (SVM). SVR aims to fit the best line within a given margin, allowing some error but trying to keep it within a defined tolerance level.

#### Example in Python:

```python
from sklearn.svm import SVR

# Initialize the SVR model
svr = SVR(kernel='linear')

# Train the model
svr.fit(X, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred_svr = svr.predict(X)
print(f"SVR Predictions: {y_pred_svr}")
```

### 2.2.5 Decision Tree Regression

Decision Tree Regression splits the data into smaller and smaller subsets based on feature values, and then predicts the target value by averaging the values in each region.

#### Example in Python:

```python
from sklearn.tree import DecisionTreeRegressor

# Initialize and train the Decision Tree model
dt_reg = DecisionTreeRegressor()
dt_reg.fit(X, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred_dt_reg = dt_reg.predict(X)
print(f"Decision Tree Regression Predictions: {y_pred_dt_reg}")
```

### 2.2.6 Random Forest Regression

Random Forest Regression is an ensemble learning method that combines the predictions of multiple decision trees to improve accuracy and reduce variance.

#### Example in Python:

```python
from sklearn.ensemble import RandomForestRegressor

# Initialize and train the Random Forest model
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X, y)

# Predict on the training inputs (this toy example has no held-out data)
y_pred_rf_reg = rf_reg.predict(X)
print(f"Random Forest Regression Predictions: {y_pred_rf_reg}")
```

In the next section, we will explore **Unsupervised Learning**.



# 3. Unsupervised Learning

Unsupervised learning is a type of machine learning where the model is trained on data without labeled outputs. Instead of learning from labeled examples, the algorithm tries to uncover hidden patterns or structures in the data.

### Key Techniques in Unsupervised Learning:
1. **Clustering**: Grouping similar data points together.
2. **Dimensionality Reduction**: Reducing the number of input variables or features while retaining the essential information.

## 3.1 Clustering

Clustering is the task of dividing a dataset into groups, or clusters, where data points within the same cluster are more similar to each other than to those in other clusters. It is one of the most common tasks in unsupervised learning.

### Key Algorithms for Clustering:
1. **K-Means**: A simple and widely used clustering algorithm that partitions the data into K clusters.
2. **Hierarchical Clustering**: Builds a hierarchy of clusters using a bottom-up or top-down approach.
3. **DBSCAN**: Density-Based Spatial Clustering of Applications with Noise, a robust method for finding clusters of arbitrary shape.

### 3.1.1 K-Means Clustering

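K-Means is an iterative algorithm that partitions the dataset into K clusters. It starts by initializing K cluster centroids (for example, at randomly chosen data points) and assigns each data point to its nearest centroid. Each centroid is then recomputed as the mean of the points assigned to it, and the two steps repeat until the assignments stop changing.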

#### Example in Python:

```python
from sklearn.cluster import KMeans
import numpy as np

# Sample data
X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])

# Initialize K-Means with K=2
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)

# Cluster assignments and centroids
clusters = kmeans.labels_
centroids = kmeans.cluster_centers_
print(f"Cluster labels: {clusters}")
print(f"Centroids: {centroids}")
```
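
The assign-then-update loop behind K-Means can also be written directly in NumPy. This is a simplified sketch, not scikit-learn's implementation: the function name `kmeans_numpy`, the random initialization from data points, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def kmeans_numpy(X, k, n_iters=100, seed=0):
    """Plain K-Means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster (keep the old centroid if a cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

points = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])
labels, centroids = kmeans_numpy(points, k=2)
print(f"Cluster labels: {labels}")
print(f"Centroids:\n{centroids}")
```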

### 3.1.2 Hierarchical Clustering

Hierarchical clustering creates a tree-like structure of nested clusters, also known as a dendrogram. There are two approaches:
- **Agglomerative** (bottom-up): Each data point starts as its own cluster, and pairs of clusters are merged iteratively.
- **Divisive** (top-down): All data points start in one cluster, and splits are performed iteratively.

#### Example in Python:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Perform hierarchical clustering
Z = linkage(X, 'ward')

# Plot the dendrogram
plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.show()
```
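
A dendrogram shows the full merge hierarchy; to obtain concrete cluster labels, the tree has to be cut at some level. The sketch below uses SciPy's `fcluster` for this, assuming the same six toy points as the K-Means example and a cut that yields at most two clusters. The names `points`, `Z_demo`, and `flat_labels` are introduced here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

points = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])

# Agglomerative clustering with Ward linkage, as in the dendrogram example
Z_demo = linkage(points, method='ward')

# Cut the tree so that at most two flat clusters remain
flat_labels = fcluster(Z_demo, t=2, criterion='maxclust')
print(f"Flat cluster labels: {flat_labels}")
```

Note that `fcluster` numbers clusters starting from 1, unlike scikit-learn's zero-based labels.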

### 3.1.3 DBSCAN (Density-Based Spatial Clustering)

DBSCAN is a clustering algorithm that groups together data points that are closely packed, marking points that lie alone in low-density regions as outliers.

#### Example in Python:

```python
from sklearn.cluster import DBSCAN

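# Initialize DBSCAN (eps must exceed the ~1.41 spacing between neighboring points,
# otherwise every point in this toy dataset is labeled as noise, i.e. -1)
dbscan = DBSCAN(eps=1.5, min_samples=2)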
dbscan.fit(X)

# Cluster labels
clusters_dbscan = dbscan.labels_
print(f"DBSCAN Cluster labels: {clusters_dbscan}")
```

## 3.2 Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of input variables or features in a dataset, while preserving as much of the original information as possible. It is commonly used for data visualization and to reduce computational complexity.

### Key Algorithms for Dimensionality Reduction:
1. **Principal Component Analysis (PCA)**: Projects the data onto a lower-dimensional subspace that maximizes the variance.
2. **t-Distributed Stochastic Neighbor Embedding (t-SNE)**: A non-linear technique particularly suited for visualizing high-dimensional datasets.

### 3.2.1 Principal Component Analysis (PCA)

PCA is a linear transformation technique that projects the data onto a lower-dimensional space while retaining as much variance as possible.

#### Example in Python:

```python
from sklearn.decomposition import PCA

# Initialize PCA to reduce to 2 dimensions
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(f"PCA-transformed data:\n{X_pca}")
```
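
To see what the projection involves, PCA can be reproduced with a centering step followed by a singular value decomposition. This is a minimal NumPy sketch under that formulation, using the same six toy points; the variable names are introduced here for illustration.

```python
import numpy as np

points = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]], dtype=float)

# Step 1: center the data so each feature has zero mean
centered = points - points.mean(axis=0)

# Step 2: SVD of the centered data; the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Step 3: variance explained by each component
explained_variance = S**2 / (len(points) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Step 4: project onto the first principal component
projected = centered @ Vt[0]

print(f"First principal direction: {Vt[0]}")
print(f"Explained variance ratio: {explained_ratio}")
```

Because these toy points all lie on a single straight line, the first component captures essentially all of the variance, so one dimension is enough to describe the data.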

### 3.2.2 t-SNE

t-SNE is a non-linear dimensionality reduction technique primarily used for data visualization in 2D or 3D spaces.

#### Example in Python:

```python
from sklearn.manifold import TSNE

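# Initialize t-SNE (perplexity must be smaller than the number of samples; X has only 6 points)
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
X_tsne = tsne.fit_transform(X)

print(f"t-SNE-transformed data:\n{X_tsne}")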
```

In the next section, we will explore **Semi-Supervised Learning**.



# 4. Semi-Supervised Learning

Semi-supervised learning is a hybrid approach that combines a small amount of labeled data with a large amount of unlabeled data. This can be especially useful when labeling data is expensive or time-consuming, but a large amount of unlabeled data is readily available.

In semi-supervised learning, the goal is to leverage the unlabeled data to improve the model's performance compared to using only the labeled data.

### Key Concepts in Semi-Supervised Learning:

1. **Self-Training**: The model is initially trained on the small labeled dataset. It then predicts labels for the unlabeled data, and the most confident predictions are added to the labeled dataset for further training.
2. **Co-Training**: Two models are trained on different subsets of the feature space. Each model makes predictions on the unlabeled data, and the most confident predictions are added to the training set of the other model.
3. **Generative Models**: These models explicitly model the joint probability distribution of the features and labels, allowing them to infer labels for the unlabeled data.
4. **Graph-Based Methods**: These methods use a graph structure to represent the relationships between labeled and unlabeled data points.

### Applications of Semi-Supervised Learning:

- **Text Classification**: Assigning categories to documents with a small labeled dataset and a large unlabeled corpus.
- **Image Classification**: Classifying images when labeling is costly.
- **Speech Recognition**: Learning to transcribe speech with minimal labeled examples.
- **Medical Diagnosis**: Using a small labeled set of patient data to classify diseases.

### 4.1 Self-Training

Self-training is one of the simplest semi-supervised learning techniques. In self-training, a model is trained on the available labeled data, and then it predicts labels for the unlabeled data. The most confident predictions are added to the labeled set for further training.

#### Example in Python:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42)

# Split into labeled and unlabeled data (the labels of the unlabeled part are discarded)
X_labeled, X_unlabeled, y_labeled, _ = train_test_split(X, y, test_size=0.9, random_state=42)

# Train a classifier on the labeled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_labeled, y_labeled)

# Predict class probabilities on the unlabeled data
proba = clf.predict_proba(X_unlabeled)
confidence = proba.max(axis=1)                         # Probability of the predicted class
y_unlabeled_pred = clf.classes_[proba.argmax(axis=1)]  # Predicted (pseudo) labels

# Add the 50 most confident predictions to the labeled dataset (one self-training round)
top = np.argsort(confidence)[-50:]
X_new_labeled = X_unlabeled[top]
y_new_labeled = y_unlabeled_pred[top]

# Retrain the classifier on the expanded labeled dataset
X_combined = np.vstack((X_labeled, X_new_labeled))
y_combined = np.hstack((y_labeled, y_new_labeled))
clf.fit(X_combined, y_combined)

# Evaluate on the full dataset (this includes training points, so the score is optimistic)
print(f"Accuracy after self-training: {clf.score(X, y) * 100:.2f}%")
```
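
scikit-learn also ships a wrapper that runs this loop automatically, `SelfTrainingClassifier`, in which unlabeled samples are marked with the label `-1`. The sketch below is one possible usage, assuming a logistic regression base model and a 0.9 confidence threshold; the dataset, the 90% masking rate, and the variable names are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X_demo, y_true = make_classification(
    n_samples=500, n_features=20, n_informative=10, random_state=0
)

# Hide roughly 90% of the labels; scikit-learn treats -1 as "unlabeled"
rng = np.random.default_rng(0)
y_partial = y_true.copy()
unlabeled_mask = rng.random(len(y_true)) < 0.9
y_partial[unlabeled_mask] = -1

# Pseudo-labels are only accepted when the predicted probability is at least 0.9
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
self_training.fit(X_demo, y_partial)

# Score on the samples whose labels were hidden during training
accuracy_hidden = self_training.score(X_demo[unlabeled_mask], y_true[unlabeled_mask])
print(f"Accuracy on originally unlabeled samples: {accuracy_hidden * 100:.2f}%")
```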

### 4.2 Co-Training

In co-training, two different models are trained on separate feature subsets of the same data. The models make predictions on the unlabeled data, and the most confident predictions from one model are used to train the other.

#### Co-Training Example (conceptual):

```python
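# One simplified round of co-training; reuses X_labeled, y_labeled, and X_unlabeled from Section 4.1
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Train two models on different feature subsets (RandomForest on one view, SVM on the other)
model1 = RandomForestClassifier(random_state=42)
model2 = SVC(probability=True, random_state=42)
model1.fit(X_labeled[:, :10], y_labeled)
model2.fit(X_labeled[:, 10:], y_labeled)

# Each model predicts class probabilities on the unlabeled data
proba1 = model1.predict_proba(X_unlabeled[:, :10])
proba2 = model2.predict_proba(X_unlabeled[:, 10:])

# Indices of each model's 50 most confident predictions
top1 = np.argsort(proba1.max(axis=1))[-50:]
top2 = np.argsort(proba2.max(axis=1))[-50:]

# Add model1's confident pseudo-labels to model2's training set, and vice versa
X_train2 = np.vstack((X_labeled[:, 10:], X_unlabeled[top1][:, 10:]))
y_train2 = np.hstack((y_labeled, model1.classes_[proba1[top1].argmax(axis=1)]))
X_train1 = np.vstack((X_labeled[:, :10], X_unlabeled[top2][:, :10]))
y_train1 = np.hstack((y_labeled, model2.classes_[proba2[top2].argmax(axis=1)]))

# Retrain each model; in practice, repeat these steps until no confident predictions remain
model1.fit(X_train1, y_train1)
model2.fit(X_train2, y_train2)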
```

### 4.3 Applications

Semi-supervised learning has proven highly effective in real-world applications, particularly when labeled data is scarce. Some use cases include:

- **Face Recognition**: Leveraging a small set of labeled images with many unlabeled images.
- **Medical Imaging**: Reducing the need for labeled medical scans by training with unlabeled scans.
- **Targeted Marketing**: Using semi-supervised learning to identify potential customer segments.

In the next section, we will explore **Reinforcement Learning**.



# 5. Reinforcement Learning

Reinforcement learning (RL) is a type of machine learning where an agent interacts with an environment and learns to maximize cumulative rewards over time by taking actions and receiving feedback (rewards or penalties). Unlike supervised learning, where the correct output is given, reinforcement learning relies on feedback signals to guide the agent's learning process.

### Key Concepts in Reinforcement Learning:

1. **Agent**: The learner or decision-maker that interacts with the environment.
2. **Environment**: The setting or system with which the agent interacts.
3. **State**: The current situation of the environment as observed by the agent.
4. **Action**: A move the agent can make in a given state; the set of all possible moves is the action space.
5. **Reward**: Feedback received from the environment after taking an action (positive or negative).
6. **Policy**: A strategy that defines the agent's actions based on the current state.
7. **Value Function**: The expected cumulative reward of being in a particular state or taking a particular action.
8. **Q-Value**: The expected cumulative reward for taking a particular action in a given state and following the policy afterwards.
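
The "cumulative reward" in these definitions is commonly formalized as a discounted return, in which a reward received t steps in the future is weighted by the discount factor raised to the power t (the discount factor \( \gamma \) appears again in the Q-learning section). The sketch below computes it for one episode; the function name and the reward sequence are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    total = 0.0
    # Work backwards so that each step adds its reward to the discounted future total
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# A single reward of 1 arrives three steps in the future
episode_rewards = [0, 0, 0, 1]
G = discounted_return(episode_rewards, gamma=0.9)  # 0.9 ** 3 = 0.729
print(f"Discounted return: {G:.3f}")
```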

### Key Algorithms in Reinforcement Learning:

1. **Q-Learning**: A model-free algorithm where the agent learns the value of each action-state pair through trial and error.
2. **Deep Q-Networks (DQN)**: Combines Q-learning with deep neural networks to handle large, high-dimensional environments.
3. **Policy Gradient Methods**: The agent directly learns a policy that maps states to actions using a probabilistic framework.
4. **Actor-Critic Methods**: Combines policy gradients with value-based methods by using two models: an actor (policy) and a critic (value function).

### 5.1 Q-Learning

Q-Learning is a value-based reinforcement learning algorithm where the agent learns the optimal policy by estimating the value of state-action pairs (Q-values). The agent updates its Q-values iteratively using the Bellman equation.

The update rule for Q-learning is:

\[ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \]

Where:
- \( s \) is the current state,
- \( a \) is the current action,
- \( r \) is the reward received after taking action \( a \),
- \( \alpha \) is the learning rate,
- \( \gamma \) is the discount factor (which determines the importance of future rewards),
- \( s' \) is the next state.
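
A single application of this update rule, worked through with concrete numbers, can be sketched as follows. The Q-table entries, the transition, and the reward are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

alpha, gamma = 0.1, 0.9

# Small Q-table with hypothetical existing estimates for state 1
q = np.zeros((3, 2))
q[1] = [0.5, 2.0]

# One observed transition: in state 0, action 1 yields reward 1 and leads to state 1
state, action, reward, next_state = 0, 1, 1.0, 1

td_target = reward + gamma * np.max(q[next_state])  # 1 + 0.9 * 2.0 = 2.8
td_error = td_target - q[state, action]             # 2.8 - 0.0 = 2.8
q[state, action] += alpha * td_error                # 0.0 + 0.1 * 2.8 = 0.28

print(f"Updated Q(0, 1): {q[state, action]:.2f}")
```

The estimate moves only a fraction \( \alpha \) of the way toward the target, which is what keeps learning stable when individual transitions are noisy.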

#### Example in Python (Q-Learning for GridWorld):

```python
import numpy as np

# Initialize parameters
states = 5  # Number of states
actions = 2  # Number of actions
q_table = np.zeros((states, actions))  # Q-table
alpha = 0.1  # Learning rate
gamma = 0.9  # Discount factor

# Define a simple reward function
rewards = np.array([0, 0, 0, 0, 1])  # Final state gives a reward of 1

# Q-Learning loop (simplified)
for episode in range(1000):
    state = 0  # Start at state 0
    done = False

    while not done:
        action = np.random.choice(actions)  # Random action
        next_state = state + 1 if action == 1 else state  # Move to next state if action is 1
        reward = rewards[next_state]  # Get reward from next state

        # Update Q-values
        q_table[state, action] += alpha * (reward + gamma * np.max(q_table[next_state, :]) - q_table[state, action])

        # Transition to next state
        state = next_state
        if state == 4:  # If final state is reached
            done = True

print(f"Learned Q-table:\n{q_table}")
```

### 5.2 Deep Q-Networks (DQN)

Deep Q-Networks (DQN) is an extension of Q-learning that uses a deep neural network to approximate the Q-value function. This is particularly useful in environments with high-dimensional state spaces (e.g., video games). The key components of DQN include:

- **Experience Replay**: Stores the agent's experiences and samples them randomly to break the correlation between consecutive experiences.
- **Target Network**: A separate network that is periodically updated to stabilize learning.
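
Experience replay needs little more than a fixed-size queue with random sampling. The class below is a minimal sketch; `ReplayBuffer` and its method names are hypothetical and not taken from any particular library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 0.0, step + 1, False)

print(len(buffer))                       # Only the 3 most recent transitions remain
print([t[0] for t in buffer.sample(2)])  # States of two randomly drawn transitions
```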

#### Example in Python (Pseudocode for DQN):

```python
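import random
from collections import deque

import gymnasium as gym  # Assumption: the environment library is not named in the text
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Define the neural network architecture
def create_dqn(input_shape, num_actions):
    model = tf.keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(24, activation='relu'),
        layers.Dense(24, activation='relu'),
        layers.Dense(num_actions, activation='linear'),  # One Q-value per action
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Assumed setup: CartPole has 4 state features and 2 actions; hyperparameters are illustrative
env = gym.make("CartPole-v1")
num_actions, gamma, epsilon, batch_size = 2, 0.99, 0.1, 32
replay_memory = deque(maxlen=10_000)
dqn = create_dqn(input_shape=4, num_actions=num_actions)

# Train the DQN (simplified: no target network, and slow because it fits one sample at a time)
for episode in range(1000):
    state, _ = env.reset()
    state = state.reshape(1, -1)  # Keras expects a batch dimension
    done = False

    while not done:
        # Select action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(dqn.predict(state, verbose=0)[0]))

        # Take action in the environment
        next_state, reward, terminated, truncated, _ = env.step(action)
        next_state = next_state.reshape(1, -1)
        done = terminated or truncated

        # Store experience in replay memory
        replay_memory.append((state, action, reward, next_state, terminated))

        # Once enough experiences are stored, sample a batch and update the network
        if len(replay_memory) >= batch_size:
            batch = random.sample(replay_memory, batch_size)
            for s, a, r, s_next, is_terminal in batch:
                # No future reward is added for terminal transitions
                target = r if is_terminal else r + gamma * np.max(dqn.predict(s_next, verbose=0))
                target_f = dqn.predict(s, verbose=0)
                target_f[0][a] = target
                dqn.fit(s, target_f, epochs=1, verbose=0)

        state = next_state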
```

### 5.3 Policy Gradient Methods

In policy gradient methods, the agent directly learns a policy that maps states to actions. The policy is often represented as a probability distribution over actions, and the goal is to optimize the policy to maximize the expected reward.

The update rule in policy gradient methods is based on the following equation:

\[ \nabla J(\theta) = \mathbb{E} \left[ \nabla \log \pi_\theta(a|s) Q(s, a) \right] \]

Where:
- \( \pi_\theta(a|s) \) is the policy (the probability of taking action \( a \) in state \( s \)),
- \( Q(s, a) \) is the expected cumulative reward for taking action \( a \) in state \( s \).
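
The gradient above can be tried out on the smallest possible problem: a two-armed bandit with a single state, where the observed reward stands in for \( Q(s, a) \). The sketch below is a toy illustration with a softmax policy; the payoffs, the learning rate, and the number of steps are hypothetical.

```python
import numpy as np

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    exp = np.exp(prefs - np.max(prefs))
    return exp / exp.sum()

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # One preference per arm (the policy parameters)
arm_rewards = np.array([0.0, 1.0])  # Hypothetical deterministic payoffs: arm 1 is better
learning_rate = 0.1

for _ in range(500):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = arm_rewards[action]

    # For a softmax policy, grad of log pi(action) w.r.t. theta is one_hot(action) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0

    # Gradient ascent on expected reward
    theta += learning_rate * reward * grad_log_pi

final_probs = softmax(theta)
print(f"Final action probabilities: {final_probs}")
```

The policy shifts probability toward the rewarding arm without ever estimating a value table, which is the defining trait of policy gradient methods.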

#### Example in Python (Pseudocode for Policy Gradient):

```python
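import gymnasium as gym  # Assumption: the environment library is not named in the text
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Define a policy network
def create_policy_network(input_shape, num_actions):
    return tf.keras.Sequential([
        layers.Input(shape=(input_shape,)),
        layers.Dense(24, activation='relu'),
        layers.Dense(24, activation='relu'),
        layers.Dense(num_actions, activation='softmax'),  # Softmax output for probabilities
    ])

# Assumed setup: CartPole has 4 state features and 2 actions; hyperparameters are illustrative
env = gym.make("CartPole-v1")
num_actions, gamma = 2, 0.99
policy_network = create_policy_network(input_shape=4, num_actions=num_actions)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# Train the policy network (REINFORCE-style: one update per episode)
for episode in range(1000):
    state, _ = env.reset()
    states, actions, rewards = [], [], []
    done = False

    while not done:
        # Select action by sampling from the policy network's output
        probs = policy_network(state.reshape(1, -1)).numpy()[0].astype(np.float64)
        probs /= probs.sum()  # Renormalize so np.random.choice accepts float32 round-off
        action = np.random.choice(num_actions, p=probs)

        # Take action in the environment and record the step
        next_state, reward, terminated, truncated, _ = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        done = terminated or truncated

    # Discounted return from each step, used as the estimate of Q(s, a)
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running

    # Calculate the policy gradient and update the network
    with tf.GradientTape() as tape:
        all_probs = policy_network(np.array(states, dtype=np.float32))
        chosen_probs = tf.reduce_sum(all_probs * tf.one_hot(actions, num_actions), axis=1)
        loss = -tf.reduce_mean(tf.math.log(chosen_probs + 1e-8) * returns)

    gradients = tape.gradient(loss, policy_network.trainable_variables)
    optimizer.apply_gradients(zip(gradients, policy_network.trainable_variables))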
```

In the next section, we will explore **Anomaly Detection**.
