
# Case Study: Stock Price Prediction using Polynomial Regression

This notebook demonstrates the process of simulating stock price data and predicting future stock prices using polynomial regression. We will walk through the following steps:

1. **Data Generation and Preprocessing**: Simulate stock prices using random daily returns, handle missing values, and normalize the data.
2. **Feature Transformation**: Apply polynomial feature transformation to model non-linear relationships.
3. **Model Fitting**: Train a polynomial regression model and visualize its performance.
4. **Model Evaluation**: Evaluate the model's performance using metrics like RMSE and R².
5. **Final Model and Prediction**: Use the trained model to predict future stock prices.

Let's get started!


In [None]:
# Import Required Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt
from sklearn.preprocessing import StandardScaler


## Step 1 - Data Generation and Preprocessing

In this step, we simulate stock prices by drawing random daily changes from a normal distribution and accumulating them into a price path. We then preprocess the data by handling missing values and normalizing the features.


**Data Generation**

In [None]:

# Initial stock price and volatility
initial_price = 100  # Starting stock price
volatility = 1  # Standard deviation of the daily price change, in price units
n_days = 500  # Number of days to simulate

# Fix the seed so the simulated path is reproducible across runs
np.random.seed(42)

# Generate random daily price changes (normally distributed, mean 0)
daily_returns = np.random.normal(loc=0, scale=volatility, size=n_days)

# Simulate the stock price as an additive random walk
stock_prices = initial_price + np.cumsum(daily_returns)  # Cumulative sum gives the price path

# Create the DataFrame with the generated stock price data
df = pd.DataFrame({
    'Date': pd.date_range(start="2020-01-01", periods=n_days, freq='D'),  # 500 days
    'Stock Price': stock_prices
})
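
The cell above adds normally distributed changes directly to the price, so the values in `daily_returns` act as absolute price moves rather than percentage returns. If percentage returns are what you have in mind, a geometric random walk is a common alternative. The following is a minimal sketch under that assumption; `simulate_geometric_walk`, `daily_vol`, and the seed are illustrative names and values, not part of the notebook.

```python
import numpy as np

def simulate_geometric_walk(initial_price=100.0, daily_vol=0.01, n_days=500, seed=0):
    """Simulate prices whose daily *percentage* returns are normally distributed."""
    rng = np.random.default_rng(seed)
    # Each entry is a fractional return, e.g. 0.01 means +1% on that day
    returns = rng.normal(loc=0.0, scale=daily_vol, size=n_days)
    # Compound the returns: P_t = P_0 * (1 + r_1) * ... * (1 + r_t)
    return initial_price * np.cumprod(1.0 + returns)

geometric_prices = simulate_geometric_walk()
print(geometric_prices[:5])
```

Unlike the additive walk, this path stays positive as long as no single return falls below -100%, which is practically always the case with a small `daily_vol`.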

**Data Preprocessing**

In [None]:
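# Handle missing values (if any); assign the result back instead of using
# inplace=True on a column, which newer pandas versions warn about
df['Stock Price'] = df['Stock Price'].interpolate()

# Feature engineering: 5-day moving average
# (the first 4 rows are NaN, and this column is not used by the model below)
df['Moving Average'] = df['Stock Price'].rolling(window=5).mean()

# Normalize features
# Note: the scaler is fit on the full dataset, before the train/test split in Step 3
scaler = StandardScaler()
df['Date'] = (df['Date'] - df['Date'].min()).dt.days  # Convert dates to days since the first date
df[['Date', 'Stock Price']] = scaler.fit_transform(df[['Date', 'Stock Price']])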
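
Because the model is trained on standardized values, its predictions are also in standardized units. One way to map them back to prices is to keep a separate scaler for the target and call `inverse_transform` on it. The sketch below assumes this two-scaler setup (the notebook itself fits a single scaler on both columns); `x_scaler`, `y_scaler`, and the toy values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the day index and the simulated prices
days = np.arange(5, dtype=float).reshape(-1, 1)
prices = np.array([100.0, 101.5, 99.0, 102.0, 103.5]).reshape(-1, 1)

# Separate scalers so the target can be un-scaled on its own
x_scaler = StandardScaler()
y_scaler = StandardScaler()
days_scaled = x_scaler.fit_transform(days)
prices_scaled = y_scaler.fit_transform(prices)

# Standardized columns have mean 0 and unit variance
print(prices_scaled.mean(), prices_scaled.std())

# inverse_transform recovers the original price units
prices_restored = y_scaler.inverse_transform(prices_scaled)
print(prices_restored.ravel())
```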

**Visualize the Generated Data**

In [None]:
# Visualize the stock price data
plt.figure(figsize=(10, 5))
plt.plot(df['Date'], df['Stock Price'], label='Stock Price')
plt.xlabel('Days (normalized)')
plt.ylabel('Normalized Stock Price')
plt.title('Simulated Stock Prices over Time')
plt.legend()
plt.show()


## Step 2 - Feature Transformation

Here, we will experiment with polynomial transformations of the feature (Date) to capture non-linear relationships between the stock price and time.


In [None]:

# Polynomial Degree: Experiment with degree 2, 3, or 4
poly_degree = 3  # Change this to 2 or 4 to experiment with different complexity
poly = PolynomialFeatures(degree=poly_degree)

# Prepare data for polynomial regression
X = df[['Date']].values
y = df['Stock Price'].values

# Transform the feature into polynomial terms (degree 3 for example)
X_poly = poly.fit_transform(X)

**Visualize the Transformed Features**

In [None]:
# Visualize the transformed features
plt.figure(figsize=(12, 6))

# Original feature (Date) vs Stock Price
plt.subplot(1, 2, 1)
plt.scatter(X, y, color='blue', label='Original data')
plt.xlabel('Date (normalized)')
plt.ylabel('Normalized Stock Price')
plt.title('Original Feature vs Stock Price')
plt.legend()

# Transformed features: Date, Date^2, ..., Date^poly_degree vs Date
# Column 0 of X_poly is the constant bias term, so plotting starts at column 1.
# Looping over the powers keeps this cell working for any poly_degree.
plt.subplot(1, 2, 2)
for power in range(1, poly_degree + 1):
    label = '$x$ (Date)' if power == 1 else f'$x^{power}$ (Date^{power})'
    plt.plot(X, X_poly[:, power], label=label)
plt.xlabel('Date')
plt.ylabel('Transformed Features')
plt.title(f'Transformed Features (Polynomial Degree {poly_degree})')
plt.legend()

plt.tight_layout()
plt.show()


## Step 3 - Model Fitting

Now, we will split the data into training and testing sets, fit a polynomial regression model, and visualize the predicted stock prices.


In [None]:

# Splitting the Data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)

# Fit the polynomial regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the stock prices using the fitted model
y_pred = model.predict(X_poly)

plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, y_pred, color='red', label=f'Polynomial Fit (Degree {poly_degree})')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title(f'Polynomial Regression for Stock Price Prediction (Degree {poly_degree})')
plt.legend()
plt.show()
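
`train_test_split` shuffles rows by default, so the test set above contains days scattered among the training days. For a forecasting setting, a chronological split that holds out the most recent days is often a stricter check. A minimal sketch, assuming a stand-in random walk rather than the notebook's `X_poly` and `y`; all `_demo` names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data: a day index scaled to [0, 1] and a random-walk target
rng = np.random.default_rng(0)
days_demo = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
target_demo = np.cumsum(rng.normal(size=200))

X_demo = PolynomialFeatures(degree=3).fit_transform(days_demo)

# shuffle=False keeps the row order: first 80% for training, last 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, target_demo, test_size=0.2, shuffle=False
)

model_demo = LinearRegression().fit(X_tr, y_tr)
print("Held-out R^2 on the last 20% of days:", model_demo.score(X_te, y_te))
```

Polynomial fits tend to extrapolate poorly beyond the training range, so the held-out score from a split like this may be much lower than the score from a shuffled split.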



## Step 4 - Final Model Evaluation and Prediction

We now evaluate the fitted model on both the training and test sets using RMSE and R².



In [None]:
# Predict the stock prices on the training and test sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

rmse_train = sqrt(mean_squared_error(y_train, y_train_pred))  # Root Mean Squared Error for training data
rmse_test = sqrt(mean_squared_error(y_test, y_test_pred))  # Root Mean Squared Error for testing data
r2_train = r2_score(y_train, y_train_pred)  # R-squared for training data
r2_test = r2_score(y_test, y_test_pred)  # R-squared for testing data

print(f'Root Mean Squared Error (RMSE) for Training Data: {rmse_train:.4f}')
print(f'Root Mean Squared Error (RMSE) for Testing Data: {rmse_test:.4f}')
print(f'R-squared (R²) for Training Data: {r2_train:.4f}')
print(f'R-squared (R²) for Testing Data: {r2_test:.4f}')
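
For reference, both metrics can be computed by hand: RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks these formulas against scikit-learn on a few made-up values; `y_true` and `y_hat` are illustrative, not notebook outputs.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up example values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R^2: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
print(r2_manual, r2_score(y_true, y_hat))
```

Because the data here are standardized, RMSE is expressed in standard deviations of the price rather than in price units. R² can be negative on a test set when the model does worse than predicting the mean.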