# 🎲 Random Walk Benchmark Model

This notebook implements the **Random Walk model**, the baseline for stock price forecasting described in **Section 3.1** of the paper:

**"The Application and Effectiveness of Machine Learning and Deep Learning Methods in Analyzing and Predicting the Shanghai Stock Index"**

---

## 🔧 Key Steps:
- Simulates random walk predictions for both training and test datasets
- Computes standard evaluation metrics: **RMSE**, **MAE**, **MAPE**, and **R²**
- Provides a naive benchmark to compare with advanced ML/DL models
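The simulation and the metrics computed in the code below can be written out as follows. This is a sketch read off the code rather than a formula quoted from the paper. The shocks use `np.random.normal()` with its defaults (mean 0, standard deviation 1). The starting value $\hat{y}_0$ is the first training price for the training walk and the last training price for the test walk. Matching the `[1:]` slices in the code, the anchor point $t = 0$ is excluded from every metric.

$$
\hat{y}_t = \hat{y}_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, 1), \qquad t = 1, \dots, n
$$

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, &
\mathrm{MAE} &= \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|, \\
\mathrm{MAPE} &= \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, &
R^2 &= 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2},
\end{aligned}
$$

Here $y_t$ is the actual adjusted close, and $\bar{y}$ is the mean of $y_1, \dots, y_n$.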

---

## 📊 Paper Context:
This notebook supports the empirical findings reported in **Table 1** of the article and highlights the limitations of the Random Walk assumption in capturing stock market dynamics.


In [1]:
!pip install yfinance
import yfinance as yf






In [2]:
# Import the necessary libraries
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
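The loop in the next cell builds the walk one row at a time with `.loc`. Because each step adds an independent Gaussian shock, the whole path equals the starting value plus a cumulative sum of shocks. A minimal vectorized sketch of that recursion is shown below. The function name `simulate_random_walk` and its `step_std` and `seed` parameters are illustrative assumptions, not part of the original notebook. `step_std=1.0` matches the notebook's `np.random.normal()` call. The seeded `np.random.default_rng` makes runs reproducible, but it draws from a different stream than the global `np.random` state, so its numbers will not reproduce the outputs reported below.

```python
import numpy as np


def simulate_random_walk(start_value, n_points, step_std=1.0, seed=None):
    """Hypothetical helper: simulate a Gaussian random walk of length n_points (n_points >= 1).

    Element 0 equals start_value; element t adds the sum of t independent
    N(0, step_std**2) shocks, which is what calling random_walk(last_value)
    repeatedly does when step_std is 1.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.normal(loc=0.0, scale=step_std, size=n_points - 1)
    # Prepend 0 so the walk starts exactly at start_value
    return start_value + np.concatenate(([0.0], np.cumsum(shocks)))


# Usage with synthetic numbers (not the Shanghai index data)
path = simulate_random_walk(start_value=3000.0, n_points=5, seed=42)
print(path)
```

Used in place of the loop, a call such as `simulate_random_walk(train_data['Adj Close'].iloc[0], len(train_data))` would fill the training column. For the test column, the call would be `simulate_random_walk(train_data['Adj Close'].iloc[-1], len(test_data))`. Passing a larger `step_std` (for example, the standard deviation of daily price changes in the training set) would define a different benchmark from the one the notebook uses.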

In [13]:
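# Download the data
start = '2010-01-04'
end = '2020-01-23'
# auto_adjust=False keeps the 'Adj Close' column; newer yfinance releases
# default to auto_adjust=True, which drops it
data = yf.download('000001.SS', start=start, end=end, auto_adjust=False)

# Newer yfinance releases may return (Price, Ticker) MultiIndex columns even for
# a single ticker; keep only the price level so data["Adj Close"] is a Series
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)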

# Reset the index
data = data.reset_index()

# Drop missing values
data = data.dropna()

# Determine the length of the training data (70%)
train_len = int(len(data["Adj Close"]) * 0.7)

# Set the training and test data
train_data = data.iloc[:train_len]
test_data = data.iloc[train_len:]

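# Define a function to implement a random walk:
# each step adds one draw from the standard normal distribution N(0, 1).
# No random seed is set, so the simulated path, and therefore every metric,
# changes from run to run.
def random_walk(last_value):
    return last_value + np.random.normal()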

# Create a new dataframe for the random walk
random_walk_df = train_data[['Adj Close']].copy()

# Add a new column for the random walk predictions
random_walk_df['Random Walk Prediction'] = np.nan

# Start the training walk at the first actual value in the training data
random_walk_df.loc[random_walk_df.index[0], 'Random Walk Prediction'] = random_walk_df.loc[random_walk_df.index[0], 'Adj Close']

# Generate the random walk predictions
for i in range(1, len(random_walk_df)):
    random_walk_df.loc[random_walk_df.index[i], 'Random Walk Prediction'] = random_walk(last_value=random_walk_df.loc[random_walk_df.index[i-1], 'Random Walk Prediction'])

# Function to calculate MAPE
def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Calculate the RMSE of the training data
train_rmse = np.sqrt(mean_squared_error(train_data['Adj Close'][1:], random_walk_df['Random Walk Prediction'][1:]))
print('Train RMSE: ', train_rmse)

# Calculate the MAE of the training data
train_mae = mean_absolute_error(train_data['Adj Close'][1:], random_walk_df['Random Walk Prediction'][1:])
print('Train MAE: ', train_mae)

# Calculate the MAPE of the training data
train_mape = mean_absolute_percentage_error(train_data['Adj Close'][1:].values, random_walk_df['Random Walk Prediction'][1:].values)
print('Train MAPE: ', train_mape)

# Calculate the R^2 of the training data
train_r2 = r2_score(train_data['Adj Close'][1:], random_walk_df['Random Walk Prediction'][1:])
print('Train R^2: ', train_r2)

# Create a new dataframe for the random walk on the test data
random_walk_test_df = test_data[['Adj Close']].copy()

# Add a new column for the random walk predictions
random_walk_test_df['Random Walk Prediction'] = np.nan

# Set the first prediction to be the last value in the training data
random_walk_test_df.loc[random_walk_test_df.index[0], 'Random Walk Prediction'] = train_data.loc[train_data.index[-1], 'Adj Close']

# Generate the random walk predictions
for i in range(1, len(random_walk_test_df)):
    random_walk_test_df.loc[random_walk_test_df.index[i], 'Random Walk Prediction'] = random_walk(last_value=random_walk_test_df.loc[random_walk_test_df.index[i-1], 'Random Walk Prediction'])

# Calculate the RMSE of the test data
test_rmse = np.sqrt(mean_squared_error(test_data['Adj Close'][1:], random_walk_test_df['Random Walk Prediction'][1:]))
print('Test RMSE: ', test_rmse)

# Calculate the MAE of the test data
test_mae = mean_absolute_error(test_data['Adj Close'][1:], random_walk_test_df['Random Walk Prediction'][1:])
print('Test MAE: ', test_mae)

# Calculate the MAPE of the test data
test_mape = mean_absolute_percentage_error(test_data['Adj Close'][1:].values, random_walk_test_df['Random Walk Prediction'][1:].values)
print('Test MAPE: ', test_mape)

# Calculate the R^2 of the test data
test_r2 = r2_score(test_data['Adj Close'][1:], random_walk_test_df['Random Walk Prediction'][1:])
print('Test R^2: ', test_r2)


Train RMSE:  780.8576476696128
Train MAE:  668.155967053741
Train MAPE:  27.699813632178998
Train R^2:  -0.7936127387422294
Test RMSE:  285.4437211155681
Test MAE:  233.46217305434774
Test MAPE:  8.08556034713153
Test R^2:  -0.29339460335349554
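The notebook computes the four metrics with near-identical code for the training and test walks. A minimal sketch of a single helper that returns all of them is shown below. The name `evaluate_forecast` is a hypothetical helper, not part of the notebook. It uses the same scikit-learn functions as the notebook and the same MAPE formula (in percent, assuming no zero prices). Like the notebook, it would be called on the series with the first (anchor) point dropped.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate_forecast(y_true, y_pred):
    """Hypothetical helper: RMSE, MAE, MAPE (in percent) and R^2 for two aligned 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        # Same formula as the notebook's mean_absolute_percentage_error; assumes no zero prices
        "MAPE": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100),
        "R2": float(r2_score(y_true, y_pred)),
    }


# Toy example with made-up numbers
metrics = evaluate_forecast([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Applied to the notebook's data, the training call would be `evaluate_forecast(train_data['Adj Close'].iloc[1:], random_walk_df['Random Walk Prediction'].iloc[1:])`. The test call would be the same with `test_data` and `random_walk_test_df`.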
