# The Importance of Understanding Antiretroviral Therapy (ART) for HIV

Antiretroviral Therapy (ART) plays a critical role in managing HIV infection. By suppressing viral load and allowing the immune system to recover, ART improves patients' quality of life and, when regimens are tailored to individual needs, helps limit the emergence of drug resistance. In this way, ART has been instrumental in turning HIV from a life-threatening condition into a manageable chronic illness for millions of people worldwide.

However, optimising ART regimens is a complex challenge. Each patient’s response to treatment can vary depending on numerous factors, such as their baseline viral load, CD4 count, and the combination of medications they receive. This is where data science plays a crucial role: by analysing patient data, we can uncover insights that improve treatment outcomes, drive innovations in therapy design, and support public health policies.

In this assignment, you are expected to use **machine learning algorithms** to analyse and predict patient outcomes based on their ART regimens and health indicators. By doing so, you'll not only engage with real-world clinical data but also learn how to apply predictive modelling techniques that can ultimately enhance the way we understand and manage HIV.

As we work through this dataset, you'll apply various machine learning algorithms to identify patterns and trends that influence treatment success. By the end of this assignment, you'll have a stronger grasp of how machine learning can be used to make data-driven decisions in clinical settings, particularly in managing chronic diseases like HIV.


## Data Pre-processing Code: Important Instructions

In this assignment, you are provided with a pre-processing script that will prepare the dataset for analysis. This code is essential for ensuring that the data is cleaned and formatted correctly.

You will need to run the code provided below to ensure the data is correctly processed before starting your analysis.

**IMPORTANT: You SHOULD NOT modify the code.** It is crucial that this script remains unchanged, as any modifications could result in errors or inconsistencies in the dataset that could affect your subsequent analysis.

Please follow these instructions carefully:

1. Copy the code as it is.
2. Run the code before you begin your machine learning analysis.
3. Do not attempt to change or adjust the code in any way.

**CAUTION: DO NOT MODIFY, DO NOT MODIFY, DO NOT MODIFY.**


In [1]:
import numpy as np
import pandas as pd
import random
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)

    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Call the seed function at the beginning of your code
set_seed(42)

# Load dataset
All_Data = pd.read_csv("https://figshare.com/ndownloader/files/40584980")

# Drop unnecessary columns
All_Data.drop(['VL (M)', 'CD4 (M)'], axis=1, inplace=True)

# Replace categorical codes with meaningful labels
replace_dict = {
    "Gender": {1: "Male", 2: "Female"},
    "Ethnic": {1: "Asian", 2: "African", 3: "Caucasian", 4: "Other"},
    "Base Drug Combo": {0: "FTC + TDF", 1: "3TC + ABC", 2: "FTC + TAF", 3: "DRV + FTC + TDF", 4: "FTC + RTVB + TDF", 5: "Other"},
    "Comp. INI": {0: "DTG", 1: "RAL", 2: "EVG", 3: "Not Applied"},
    "Comp. NNRTI": {0: "NVP", 1: "EFV", 2: "RPV", 3: "Not Applied"},
    "Extra PI": {0: "DRV", 1: "RTVB", 2: "LPV", 3: "RTV", 4: "ATV", 5: "Not Applied"},
    "Extra pk-En": {0: "False", 1: "True"}
}
All_Data.replace(replace_dict, inplace=True)

# Set drug-related columns to NaN where 'Drug (M)' equals 0
Drug_Combo = ['Base Drug Combo', 'Comp. INI', 'Comp. NNRTI', 'Extra PI', 'Extra pk-En']
All_Data.loc[All_Data["Drug (M)"] == 0, Drug_Combo] = np.nan

# Drop 'Drug (M)' column
All_Data.drop(['Drug (M)'], axis=1, inplace=True)

# Display first few rows
All_Data.head()

Unnamed: 0,VL,CD4,Rel CD4,Gender,Ethnic,Base Drug Combo,Comp. INI,Comp. NNRTI,Extra PI,Extra pk-En,PatientID,Timestep
0,29.944271,793.4583,30.834505,Male,Caucasian,FTC + TDF,DTG,Not Applied,Not Applied,False,0,0
1,29.24198,467.4189,30.35598,Male,Caucasian,FTC + TDF,DTG,Not Applied,Not Applied,False,0,1
2,28.748991,465.12485,30.40532,Male,Caucasian,FTC + TDF,DTG,Not Applied,Not Applied,False,0,2
3,28.101835,692.0069,30.248816,Male,Caucasian,FTC + TDF,DTG,Not Applied,Not Applied,False,0,3
4,28.813837,641.75714,29.944712,Male,Caucasian,FTC + TDF,DTG,Not Applied,Not Applied,False,0,4


## Simplifying Categorical Data

To simplify the dataset and make it easier to work with, we will convert all categorical variables into numeric levels. This step is necessary because many machine learning algorithms require numeric input rather than categorical strings.

We will be using a **LabelEncoder** to transform the following categorical columns:

- Gender
- Ethnic group
- Base Drug Combo
- Companion INI
- Companion NNRTI
- Extra PI
- Extra pk-En

The **LabelEncoder** converts each unique category in these columns into an integer. It sorts the unique values and numbers them from 0, so a `Gender` column containing `['Male', 'Female', 'Other']` would be encoded as `[1, 0, 2]` (`Female` → 0, `Male` → 1, `Other` → 2). Each unique string is mapped to a unique integer, allowing us to replace the categorical values with numerical ones.

**Note**: This is *not* one-hot encoding. Each category is represented by a single integer rather than by multiple binary columns, which keeps the dataset's dimensionality unchanged. The trade-off is that the integers imply an ordering between categories that has no clinical meaning.

This transformation will allow us to proceed with the machine learning algorithms, as they will now be able to process the dataset efficiently.
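
As a small illustration of the mapping described above, the sketch below runs `LabelEncoder` on a made-up list of labels (the values are illustrative and not taken from the dataset). It shows that the integer codes follow the sorted order of the labels rather than their order of appearance.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; not read from the assignment dataset.
labels = ["Male", "Female", "Other", "Female"]

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ holds the sorted unique labels; a label's position is its code.
print(le.classes_.tolist())  # ['Female', 'Male', 'Other']
print(codes.tolist())        # [1, 0, 2, 0]

# inverse_transform recovers the original strings from the codes.
print(le.inverse_transform(codes).tolist())
```

This matches the encoded output shown for the dataset, where `Male` appears as `1` in the `Gender` column.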

**IMPORTANT: You SHOULD NOT modify the code.** It is crucial that this script remains unchanged, as any modifications could result in errors or inconsistencies in the dataset that could affect your subsequent analysis.

Please follow these instructions carefully:

1. Copy the code as it is.
2. Run the code before you begin your machine learning analysis.
3. Do not attempt to change or adjust the code in any way.

**CAUTION: DO NOT MODIFY, DO NOT MODIFY, DO NOT MODIFY.**


In [2]:
set_seed(42)

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Convert all categorical levels back into numeric levels using LabelEncoder
label_cols = ['Gender', 'Ethnic', 'Base Drug Combo', 'Comp. INI', 'Comp. NNRTI', 'Extra PI', 'Extra pk-En']
le = LabelEncoder()

# Apply LabelEncoder to each categorical column
for col in label_cols:
    All_Data[col] = le.fit_transform(All_Data[col].astype(str))

# Display first few rows
All_Data.head()

Unnamed: 0,VL,CD4,Rel CD4,Gender,Ethnic,Base Drug Combo,Comp. INI,Comp. NNRTI,Extra PI,Extra pk-En,PatientID,Timestep
0,29.944271,793.4583,30.834505,1,2,4,0,2,3,0,0,0
1,29.24198,467.4189,30.35598,1,2,4,0,2,3,0,0,1
2,28.748991,465.12485,30.40532,1,2,4,0,2,3,0,0,2
3,28.101835,692.0069,30.248816,1,2,4,0,2,3,0,0,3
4,28.813837,641.75714,29.944712,1,2,4,0,2,3,0,0,4


# Assignment: Code Reading Comprehension

## Predicting Patient's Future State for Illness Condition Management

In managing chronic illnesses like HIV, predicting a patient's future condition is essential for optimising treatment strategies and ensuring timely intervention. For **Antiretroviral Therapy (ART)** in HIV management, a critical question is whether, by analysing a patient's early information (data from the first 3 timesteps), we can accurately predict their health status at a future point (i.e., at the 59th timestep).

In this assignment, we aim to predict whether the viral load (VL) at the 59th timestep is at or below 200, using the early information of a patient. Viral load ≤ 200 is a critical clinical indicator of **viral suppression**, a key goal of ART in HIV treatment. Knowing whether viral load will be suppressed in the future helps inform treatment adjustments and patient care.

To achieve this, we will use a **3-layer Multi-Layer Perceptron (MLP)**. The input to the MLP will include all features across the initial 3 timesteps (0, 1, 2), and the target will be predicting whether the viral load at the 59th timestep is less than or equal to 200.
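
To make the input layout concrete, here is a minimal sketch of how long-format records (one row per patient per timestep) can be turned into one row per patient with a column for every feature at every early timestep. The two patients and their values are made up for illustration, and only two features are used.

```python
import pandas as pd

# Toy long-format data: 2 patients x 3 timesteps (values are invented).
long_df = pd.DataFrame({
    "PatientID": [0, 0, 0, 1, 1, 1],
    "Timestep":  [0, 1, 2, 0, 1, 2],
    "VL":        [29.9, 29.2, 28.7, 150.0, 140.0, 120.0],
    "CD4":       [793.0, 467.0, 465.0, 300.0, 320.0, 350.0],
})

# Pivot to wide format: the columns become (feature, timestep) pairs.
wide = long_df.pivot(index="PatientID", columns="Timestep")

# Flatten the two-level column index into names such as VL_T0, CD4_T2.
wide.columns = [f"{feature}_T{step}" for feature, step in wide.columns]
wide = wide.reset_index()

print(wide.shape)             # one row per patient
print(wide.columns.tolist())  # PatientID plus one column per feature-timestep pair
```

With the assignment data the same idea yields one input vector per patient, which is what a fully connected network expects.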

## Instructions

For this question, we do **not** expect you to code everything from scratch. Instead, we have provided a **working example** of the required steps. Your task is to read and understand the code.

Whenever you see the following:

```python
#-----------------------------------------
# [Your answer here:]
# (WRITE HERE)
#-----------------------------------------
```
We expect you to write your understanding or explanation in the space marked as **(WRITE HERE)**. Make sure to comment your answers with a `#` to ensure the code still runs correctly.

Your comprehension will be tested on the following topics:

1. **Data Preprocessing**: Understanding how the input data is prepared, reshaped, and standardised.
2. **MLP Backbone Network Definition**: Comprehending how the MLP layers are defined and what their roles are.
3. **Training the Model**: Recognising the steps in the training loop, including forward pass, loss calculation, and backpropagation.

## Why This is Important

Predicting future viral load is crucial in managing HIV. Clinicians rely on viral suppression (VL ≤ 200) to gauge the effectiveness of ART. Being able to predict this based on early patient information helps adjust treatments proactively, improving patient outcomes.

Now, proceed to the provided code example and fill in the explanation wherever prompted. Understanding each step in this process will enhance your ability to read, understand, and eventually build complex models for clinical data prediction tasks.

Good luck!


## Assignment Marks Breakdown (Total: 10 Marks)

This question is worth **10 marks** and is divided into the following sections:

1. **Data Preprocessing**: Understanding how the input data is prepared, reshaped, and standardised.
   - There are **6 specific places** where students are expected to comment on the code.
   - Each valid and correct comment is awarded **1 mark**, resulting in a total of **6 marks** for this section.

2. **MLP Backbone Network Definition**: Comprehending how the MLP layers are defined and what their roles are.
   - There are **2 specific places** where students are expected to comment on the code.
   - Each valid and correct comment is awarded **1 mark**, resulting in a total of **2 marks** for this section.

3. **Training the Model**: Recognising the steps in the training loop, including forward pass, loss calculation, and backpropagation.
   - There are **2 specific places** where students are expected to comment on the code.
   - Each valid and correct comment is awarded **1 mark**, resulting in a total of **2 marks** for this section.

Thus, the total marks breakdown is as follows:  
**6 marks** for Data Preprocessing + **2 marks** for MLP Backbone Network Definition + **2 marks** for Training the Model = **10 Marks** in total.

### Important Notice:

It is expected that some of the required comments may not be immediately straightforward or directly obvious to the students. However, it should be noted that most of the answers can either be determined by **searching for relevant information online** or through **practical test-and-trial** methods to understand the underlying meaning behind the code.

Students are **strongly encouraged** to take their time in reviewing the code and applying appropriate methods to arrive at the correct explanations. This assignment has been designed not only to test comprehension but also to develop problem-solving skills.

This is a **strict requirement** and no partial marks will be awarded unless the comments accurately reflect an understanding of the relevant section of the code.


In [3]:
# -----------------------------------------
# Setting a random seed (set_seed(42)) ensures reproducibility (that any random operations produce the same results each time the code is run)
# Libraries provide tools for data handling (NumPy, Pandas)
#                                      data preprocessing (LabelEncoder, StandardScaler)
#                                      machine learning/model building (PyTorch)
# -----------------------------------------
set_seed(42)

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim

# -----------------------------------------
# Selecting rows where 'Timestep' is 0, 1, or 2 keeps only the three earliest timesteps for each patient
# .pivot() reshapes the data from long to wide format, using PatientID as the index and creating one column per (feature, timestep) pair
# After the pivot, each patient has a single row with columns for every feature at every selected timestep
# Renaming the columns with suffixes such as _T0, _T1, _T2 flattens the two-level column index so that each feature-timestep pair has a unique name
# .reset_index() turns PatientID from the index back into a regular column and gives the DataFrame a new integer index
# -----------------------------------------
training_data = All_Data[All_Data['Timestep'].isin([0, 1, 2])]
data_reshaped = training_data.pivot(index='PatientID', columns='Timestep')
data_reshaped.columns = [f'{col[0]}_T{col[1]}' for col in data_reshaped.columns]
data_reshaped = data_reshaped.reset_index()

# -----------------------------------------
# Selects only the rows where 'Timestep' is 59, the final timestep for each patient
# Creates a new column 'VL_below_200' that is 1 when the viral load (VL) is at or below 200 (VL <= 200) and 0 when it is above 200
# This binary target variable turns the task into a classification problem
# ------------------------------------------
target_data = All_Data[All_Data['Timestep'] == 59].reset_index(drop=True)
target_data['VL_below_200'] = (target_data['VL'] <= 200).astype(int)

# -----------------------------------------
# 'X_data' is created by dropping the PatientID column from data_reshaped, leaving only the features for the model
# 'y_data' contains the target variable ('VL_below_200') for classification
# This separation means X_data is ready as input, while y_data serves as the expected output during model training
# Rows of X_data and y_data are matched by position, so this relies on both being ordered by PatientID
# -----------------------------------------
X_data = data_reshaped.drop(columns=['PatientID'])
y_data = target_data['VL_below_200']

# -----------------------------------------
# StandardScaler() standardizes the features in 'X_data' by removing the mean and scaling to unit variance
# This step puts features on similar scales, which helps model convergence and performance
# .fit_transform() on the scaler performs two operations:
#   First, it computes the mean and standard deviation of each feature in X_data (fitting)
#   Second, it uses these statistics to standardize X_data so that each feature has a mean of 0 and a standard deviation of 1 (transforming)
# Note: the scaler is fitted on all rows before the train/test split, so test-set statistics also influence the scaling
# -----------------------------------------
scaler = StandardScaler()
X_data = scaler.fit_transform(X_data)

# -----------------------------------------
# train_test_split() splits X_data and y_data into training (80%) and testing (20%) sets
# random_state=42 achieves reproducibility
# stratify=y_data maintains class distribution in both sets
# This step ensures balanced representation of target classes in training and testing, which aids generalization
# -----------------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.2, random_state=42, stratify=y_data
)

# -----------------------------------------
# Converts X_train, y_train, X_test, and y_test to PyTorch tensors for model training in PyTorch
# .view(-1, 1) reshapes y_train and y_test into column vectors of shape (n_samples, 1), matching the shape of the model's output
# -----------------------------------------
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)
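
The cell above fits the scaler on all patients before splitting. A common alternative, shown here only as a sketch on synthetic data and not as the method used in this assignment, is to split first and fit the scaler on the training portion alone, so that no test-set statistics leak into preprocessing. The array sizes and class ratio below are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 4 features, roughly 80% positives.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y = (rng.random(200) < 0.8).astype(int)

# Split first, keeping the class ratio the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(3))  # approximately 0 for every feature
print(X_te_scaled.mean(axis=0).round(3))  # close to 0, but not exactly
```

Applying this ordering to the assignment data would change the reported losses and metrics slightly, so the outputs shown in this notebook correspond to the original cell.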


In [6]:
# Check if Timestep == 59 is the final timestep for each patient
final_timesteps = All_Data.groupby('PatientID')['Timestep'].max()
if (final_timesteps == 59).all():
    print("Timestep 59 is the final timestep for all patients.")
else:
    print("Not all patients have Timestep 59 as the final timestep.")
    print(final_timesteps[final_timesteps != 59])  # Display which patients differ


Timestep 59 is the final timestep for all patients.


In [4]:
set_seed(42)

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()

        # -----------------------------------------
        # These lines define the layers of the MLP model
        #   self.layer1 is the input layer, which takes as input the number of features in X_train_tensor and outputs 64 units
        #   self.layer2 is a hidden layer with 64 input units and 32 output units, allowing for complex transformations
        #   self.layer3 is the output layer with 1 output unit, which will produce a single value suitable for binary classification
        # -----------------------------------------
        self.layer1 = nn.Linear(X_train_tensor.shape[1], 64)

        self.layer2 = nn.Linear(64, 32)

        self.layer3 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.relu(self.layer1(x))

        x = torch.relu(self.layer2(x))

        # -----------------------------------------
        # The sigmoid activation function is applied to the output of self.layer3 to produce a probability score between 0 and 1
        # This value represents the model’s estimated probability of the positive class (VL <= 200 at timestep 59)
        # Values close to 1 indicate a high likelihood of the positive class, and values close to 0 indicate a high likelihood of the negative class
        # -----------------------------------------
        x = torch.sigmoid(self.layer3(x))

        return x


In [5]:
set_seed(42)

model = MLP()

criterion = nn.BCELoss()

optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 500
for epoch in range(num_epochs):
    model.train()

    outputs = model(X_train_tensor)

    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

    model.eval()

    with torch.no_grad():
        test_outputs = model(X_test_tensor)

        # -----------------------------------------
        # The loss on the test set is calculated from the model's predictions (test_outputs) and the true labels (y_test_tensor)
        # This is done within the torch.no_grad() context to prevent gradient calculations
        # This saves memory and computation, since here we only evaluate the model and do not update its weights
        # 'test_loss' helps monitor model performance on held-out data during training
        # -----------------------------------------
        test_loss = criterion(test_outputs, y_test_tensor)

    # -----------------------------------------
    # Prints the training and test losses every 50 epochs
    # This monitors the model's progress during training and helps identify whether it is learning effectively or overfitting
    # If the training loss decreases while the test (or validation) loss stays high or starts to increase, the model is probably overfitting
    # In this case, both the training loss and the test loss decrease steadily throughout training, although the gap between them widens slightly, which suggests that the model is learning effectively without substantial overfitting and is likely to generalize well to unseen data
    # -----------------------------------------
    if (epoch+1) % 50 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {loss.item():.4f}, Test Loss: {test_loss.item():.4f}')

print("Training and validation complete!")


Epoch [50/500], Train Loss: 0.1450, Test Loss: 0.1433
Epoch [100/500], Train Loss: 0.1192, Test Loss: 0.1205
Epoch [150/500], Train Loss: 0.1131, Test Loss: 0.1157
Epoch [200/500], Train Loss: 0.1097, Test Loss: 0.1126
Epoch [250/500], Train Loss: 0.1071, Test Loss: 0.1103
Epoch [300/500], Train Loss: 0.1047, Test Loss: 0.1084
Epoch [350/500], Train Loss: 0.1026, Test Loss: 0.1069
Epoch [400/500], Train Loss: 0.1004, Test Loss: 0.1056
Epoch [450/500], Train Loss: 0.0982, Test Loss: 0.1043
Epoch [500/500], Train Loss: 0.0960, Test Loss: 0.1030
Training and validation complete!


### Outcome Interpretation

Our three-layer MLP, using a **decision threshold of 0.9**, achieved an overall accuracy of **93.72%** in predicting whether a patient’s viral load (VL) would be at or below 200 at timestep 59. The **Type I error (false positive rate)** was **62.00%**: **31 of the 50 test patients with VL > 200 were incorrectly classified as having VL ≤ 200**. Conversely, the **Type II error (false negative rate)** was **4.67%**, meaning that 81 of the 1,734 test patients with VL ≤ 200 were missed. The model yielded an **F1 score** of **0.9672**, although this figure is dominated by the positive class, which makes up about 97% of the test set.

Even with the high threshold, the model correctly identified **1,653 true positives**, a sensitivity of 95.33%; further optimisation should focus on reducing the high false positive rate.

---

**IMPORTANT NOTE**:

You **DO NOT** need to modify anything in this section. This analysis is provided only to demonstrate what a complete outcome interpretation after modelling might look like. It is intended to give you an understanding of how a machine learning model’s performance can be assessed in practice.

For your assignment, please do not alter anything below this point.

**DO NOT MODIFY, DO NOT MODIFY, DO NOT MODIFY.**


In [None]:
set_seed(42)

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

model.eval()
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    predictions = test_outputs >= 0.9
    predictions = predictions.numpy().astype(int).flatten()
    y_test_np = y_test_tensor.numpy().astype(int).flatten()

accuracy = accuracy_score(y_test_np, predictions)
tn, fp, fn, tp = confusion_matrix(y_test_np, predictions).ravel()

type_1_error = fp / (fp + tn)
type_2_error = fn / (fn + tp)

f1 = f1_score(y_test_np, predictions)

print(f'Accuracy: {accuracy * 100:.2f}%')
print(f'Type I Error (False Positive Rate): {type_1_error:.4f}')
print(f'Type II Error (False Negative Rate): {type_2_error:.4f}')
print(f'F1 Score: {f1:.4f}')

print(f'Confusion Matrix: \nTN: {tn}, FP: {fp}, FN: {fn}, TP: {tp}')

Accuracy: 93.72%
Type I Error (False Positive Rate): 0.6200
Type II Error (False Negative Rate): 0.0467
F1 Score: 0.9672
Confusion Matrix: 
TN: 19, FP: 31, FN: 81, TP: 1653
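
The reported metrics can be re-derived by hand from the confusion matrix counts printed above. The short sketch below does this arithmetic in plain Python and also computes a majority-class baseline (always predicting VL ≤ 200), which is an addition for context and not part of the original evaluation cell.

```python
# Counts copied from the confusion matrix printed above.
tn, fp, fn, tp = 19, 31, 81, 1653
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
false_positive_rate = fp / (fp + tn)   # Type I error
false_negative_rate = fn / (fn + tp)   # Type II error
precision = tp / (tp + fp)
recall = tp / (tp + fn)                # sensitivity
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a trivial classifier that labels every patient as positive.
majority_baseline = (tp + fn) / total

print(f"Accuracy:            {accuracy:.4f}")
print(f"False positive rate: {false_positive_rate:.4f}")
print(f"False negative rate: {false_negative_rate:.4f}")
print(f"Precision:           {precision:.4f}")
print(f"Recall:              {recall:.4f}")
print(f"F1 score:            {f1:.4f}")
print(f"Majority baseline:   {majority_baseline:.4f}")
```

Because about 97% of the test patients are in the positive class, accuracy and F1 on their own say little about how well the negative class is handled; the false positive rate is the more informative figure here.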


## Question 2: Transforming an MLP Setup into an RNN Setup (Total: 10 Marks)

This question is worth **10 marks** and must be answered in **exactly 4 paragraphs**. For each paragraph, students are required to address specific aspects of transforming the MLP setup into an RNN (Recurrent Neural Network) setup. Any deviation from the required number of paragraphs will result in **mark deductions**:

- **More than 4 paragraphs**: Students will have marks **deducted** for each additional paragraph.
- **Fewer than 4 paragraphs**: Students will have marks **deducted** for each missing paragraph.

### Paragraph Requirements and Marks Breakdown:

1. **Paragraph 1 (2 Marks)**:  
   Students must **philosophically explain** why using an RNN could be better than an MLP for this particular task. The explanation should focus on RNN's ability to handle sequential data and how it may capture temporal dependencies that an MLP may not be able to capture. This paragraph must be concise but insightful.

2. **Paragraph 2 (3 Marks)**:  
   Students must explain why the data should be presented as **three dimensions** when using an RNN: **batch size, feature size, and the time dimension**. The answer should discuss the importance of organising data in this manner for RNNs to properly process sequences over time, and how it differs from MLPs that work with fixed-size inputs.

3. **Paragraph 3 (3 Marks)**:  
   Students must provide an explanation of the **memory of an RNN**: how it is **initialised** and how the memory is **updated** during training. This includes describing the hidden state, how it carries information between time steps, and the role it plays in learning temporal patterns in sequential data.

4. **Paragraph 4 (2 Marks)**:  
   Students must explain how the experimental setup would change if **5 time steps** were used instead of 3. They should discuss how the input data shape would change, how the RNN would handle the additional temporal information, and what changes would be necessary in the model architecture to accommodate more time steps.

### Marks Breakdown:

- **Paragraph 1**: 2 Marks
- **Paragraph 2**: 3 Marks
- **Paragraph 3**: 3 Marks
- **Paragraph 4**: 2 Marks

**Total: 10 Marks**

### Important Notice:

This is **not an easy question** to answer. Students are expected to approach this task as if writing different subsections for a top-tier academic conference. This includes balancing **conciseness**, **clarity**, and **soundness of methodology**. Each paragraph should be well-structured and clearly address the specific point of the question.

Failure to adhere to these instructions, including the paragraph limit, will result in penalties.


# WRITE HERE
**Paragraph 1 (2 Marks):**  
RNNs are well suited to tasks in which temporal dependencies play a critical role. An MLP treats every input feature as an independent entry of one flat vector, with no built-in notion of sequence order, whereas an RNN is designed to retain information from previous time steps, allowing it to capture temporal patterns and dependencies between consecutive data points. This ability is valuable for medical time-series data, where earlier values may influence future outcomes, such as trends in a patient's viral load over time.

---

**Paragraph 2 (3 Marks):**  
To process sequential data, an RNN needs its input arranged in three dimensions: batch size, time steps, and feature size. The batch dimension lets the RNN process multiple sequences (here, patients) in parallel, which makes training efficient. The time dimension lets the RNN read each sequence in chronological order. The feature dimension holds the attributes recorded at each time step (e.g., viral load, CD4 count). Unlike an MLP, which takes one fixed-size input vector without sequential context, an RNN processes each sequence step by step, updating its hidden state at each time step to capture information as it evolves across the sequence. This organization makes RNNs a natural choice for time-dependent data, such as predicting future viral load from historical values.
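
As a rough illustration of the three-dimensional layout, the sketch below converts a flat, MLP-style matrix into `(batch, time, features)` form with NumPy. It assumes the flat columns are grouped by feature and then by timestep (for example `VL_T0, VL_T1, VL_T2, CD4_T0, ...`); the toy numbers are made up.

```python
import numpy as np

n_patients, n_features, n_steps = 2, 2, 3

# Flat input, one row per patient: [f0_T0, f0_T1, f0_T2, f1_T0, f1_T1, f1_T2]
flat = np.array([
    [10, 11, 12, 20, 21, 22],
    [30, 31, 32, 40, 41, 42],
], dtype=np.float32)

# First recover (batch, feature, time), then swap the last two axes
# to obtain the (batch, time, feature) layout an RNN usually expects.
sequences = flat.reshape(n_patients, n_features, n_steps).transpose(0, 2, 1)

print(sequences.shape)  # (2, 3, 2)
print(sequences[0])     # rows are timesteps, columns are features
```

A plain `reshape(n_patients, n_steps, n_features)` would silently mix features and timesteps under this column ordering, which is why the intermediate `(batch, feature, time)` step is used.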

---

**Paragraph 3 (3 Marks):**  
An RNN uses a hidden state as a form of short-term memory that retains and carries information across time steps. At the start of each sequence, the hidden state is initialized, typically to zeros (or, less commonly, to random or learned values). At each time step, the hidden state is updated from the current input and the previous hidden state, effectively “remembering” relevant information from earlier steps. This iterative updating allows the RNN to capture temporal dependencies and accumulate information throughout the sequence. During training, the weights that govern this update are shared across time steps and are learned by backpropagating the loss through the unrolled sequence.
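
The hidden-state update can be sketched in a few lines of NumPy. This is a simplified vanilla (Elman-style) recurrence with randomly chosen weights, intended only to show the mechanics; the sizes (11 features, 4 hidden units, 3 time steps) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, hidden_size, n_steps = 11, 4, 3  # assumed sizes

# Weights are shared across all time steps (random here, learned in practice).
W_xh = rng.normal(scale=0.1, size=(n_features, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

x_seq = rng.normal(size=(n_steps, n_features))  # one patient's sequence

h = np.zeros(hidden_size)  # memory initialised to zeros at sequence start
history = []
for x_t in x_seq:
    # New memory depends on the current input and the previous memory.
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    history.append(h.copy())

print(len(history))  # one hidden state per time step
print(h.shape)       # final hidden state, usable as a sequence summary
```

The final hidden state could then be passed to a linear layer with a sigmoid to produce the VL ≤ 200 probability, in the same way the MLP's last layer does.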

---

**Paragraph 4 (2 Marks):**  
Increasing the number of time steps from 3 to 5 changes the input shape from `[batch_size, 3, feature_size]` to `[batch_size, 5, feature_size]`, so the RNN processes two additional time steps per patient and can draw on a longer temporal context. This may improve performance where longer-range dependencies matter. Because the same recurrent weights are applied at every time step, the architecture itself does not need to change; only the data preparation (selecting timesteps 0 to 4) does. Hyperparameters such as the hidden size may still be worth re-tuning to handle the additional information and added complexity.
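
The following sketch feeds sequences of length 3 and length 5 through the same PyTorch `nn.RNN` layer to show how the shapes change. The batch size of 8, the 11 features per time step, and the hidden size of 16 are assumptions for illustration, and the inputs are random rather than patient data.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
n_features, hidden_size = 11, 16  # assumed sizes

# batch_first=True means inputs are laid out as (batch, time, features).
rnn = nn.RNN(input_size=n_features, hidden_size=hidden_size, batch_first=True)
n_params = sum(p.numel() for p in rnn.parameters())

x3 = torch.randn(8, 3, n_features)  # 8 patients, 3 time steps
x5 = torch.randn(8, 5, n_features)  # 8 patients, 5 time steps

# With no initial state passed in, PyTorch starts from a zero hidden state.
out3, h3 = rnn(x3)
out5, h5 = rnn(x5)

print(out3.shape, h3.shape)  # per-step outputs grow with sequence length
print(out5.shape, h5.shape)  # final hidden state keeps the same shape
print(n_params)              # parameter count does not depend on sequence length
```

Only the time dimension of the per-step output changes; the final hidden state, and therefore any classification layer attached to it, keeps the same size.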


---
