# Stock Price Prediction Using Linear Regression

This notebook predicts a stock's closing price with a linear regression model. The workflow covers data loading, missing-value handling, feature engineering, model training, and evaluation.

## Features Used

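- **7DMA**: 7-day moving average of the closing price, computed over the preceding days (the current day's close is excluded)
- **30DMA**: 30-day moving average of the closing price, computed the same way
- **RSI**: Relative Strength Index (14-day), based on simple averages of the preceding gains and losses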
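
As a reading aid, the three features can be written out as formulas. This is a sketch derived from the `markers` code in this notebook, not a standard reference definition; $C_t$ denotes the close on row $t$, and the expressions assume a full window of earlier rows (on the first rows the moving averages use however many values are available).

$$
\begin{aligned}
\Delta C_t &= C_t - C_{t-1} \\
\text{7DMA}_t &= \frac{1}{7}\sum_{i=1}^{7} C_{t-i}, \qquad \text{30DMA}_t = \frac{1}{30}\sum_{i=1}^{30} C_{t-i} \\
G_t &= \frac{1}{14}\sum_{i=1}^{14}\max(\Delta C_{t-i},\, 0), \qquad L_t = \frac{1}{14}\sum_{i=1}^{14}\max(-\Delta C_{t-i},\, 0) \\
\text{RSI}_t &= 100 \cdot \frac{G_t}{G_t + L_t}
\end{aligned}
$$

When $L_t > 0$, the last line equals the more familiar form $100 - 100 / (1 + G_t / L_t)$. The averages here are simple rolling sums divided by 14, not the exponentially smoothed averages used in some RSI variants.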

## Functions

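- `dt(loc)`: Loads the CSV dataset from the given path.
- `fillin(loc)`: Fills missing `Close` values by linear interpolation, then backfills whatever remains.
- `markers(loc)`: Adds 7DMA, 30DMA, Return, and RSI to the dataset.
- `clean(loc)`: Fills missing values in the 7DMA, 30DMA, and RSI columns.
- `ins(loc)`: Prepares the feature columns for model input.
- `outs(loc)`: Prepares the target column for model output.
- `linreg(loc)`: Trains the model, evaluates it, and prints the performance metrics.

Each function takes the CSV path `loc` and calls the previous step itself, so calling `linreg(loc)` runs the whole pipeline.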

Required libraries are imported for data manipulation, model training, and evaluation.

In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

Data loading and preprocessing functions are defined. The dataset is loaded, missing values are handled, and technical indicators are computed.

In [None]:
def dt(loc):
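    # The CSV file is loaded. Note: parse_dates=True only attempts to parse the index,
    # and no index_col is set here, so a date column is left unparsed. Pass the
    # date column's name to parse_dates if parsed dates are needed.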
    stock = pd.read_csv(loc, parse_dates=True)
    return stock

def fillin(loc):
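    # Gaps of up to 3 consecutive missing 'Close' values are linearly interpolated;
    # longer gaps are left as NaN here and backfilled below.
    stock = dt(loc)
    # notna().cumsum() stays constant within a run of NaNs, so it labels each gap.
    # (isna().cumsum() would give every NaN its own label, making the > 3 test never true.)
    g = stock['Close'].notna().cumsum().where(stock['Close'].isna())
    stock['Close'] = stock['Close'].where(g.map(g.value_counts()) > 3, stock['Close'].interpolate('linear'))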
    stock['Close'] = stock['Close'].bfill()
    return stock

def markers(loc):
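    # Moving averages, percentage return, and RSI are computed and added.
    # shift(1) lags every indicator by one row, so a row's own close is not
    # used to compute the features that predict it.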
    stock = fillin(loc)
    stock['7DMA'] = stock['Close'].shift(1).rolling(window=7, min_periods=0).mean()
    stock['30DMA'] = stock['Close'].shift(1).rolling(window=30, min_periods=0).mean()
    stock['Return'] = stock['Close'].shift(1).pct_change() * 100

    change = stock['Close'].diff()
    avgain = change.where(change > 0, 0).shift(1).rolling(window=14, min_periods=1).sum() / 14
    avloss = abs(change.where(change < 0, 0).shift(1).rolling(window=14, min_periods=1).sum()) / 14
    stock['RSI'] = (avgain / (avgain + avloss)) * 100

    return stock
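
To make the indicator logic easier to check by hand, the same calculations can be run on a tiny series. This is a minimal sketch, not part of the pipeline: the helper names `lagged_mean` and `simple_rsi`, the five-value toy series, and the window of 3 are all illustrative assumptions.

```python
import pandas as pd


def lagged_mean(close: pd.Series, window: int) -> pd.Series:
    """Mean of up to `window` previous closes, excluding the current row."""
    return close.shift(1).rolling(window=window, min_periods=1).mean()


def simple_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple (unsmoothed) averages of lagged gains and losses."""
    change = close.diff()
    # Gains keep positive changes, losses keep negative ones; everything else is 0.
    gains = change.where(change > 0, 0).shift(1)
    losses = change.where(change < 0, 0).shift(1)
    avg_gain = gains.rolling(window=window, min_periods=1).sum() / window
    avg_loss = losses.rolling(window=window, min_periods=1).sum().abs() / window
    # 0/0 (no movement in the window) yields NaN rather than raising.
    return 100 * avg_gain / (avg_gain + avg_loss)


# Hypothetical toy data, not real prices.
toy = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])
ma3 = lagged_mean(toy, window=3)
rsi3 = simple_rsi(toy, window=3)
print(pd.DataFrame({"Close": toy, "MA3": ma3, "RSI3": rsi3}))
```

The first row has no earlier close, so both indicators are NaN there; the RSI is also NaN on the second row because no price change has been observed before it yet.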

Data cleaning is performed to handle any remaining missing values in the engineered features.

In [None]:
def clean(loc):
    # In 7DMA, 30DMA, and RSI, gaps of up to 3 consecutive missing values are
    # linearly interpolated; longer gaps (and leading NaNs) are backfilled.
    stock = markers(loc)
    for col in ['7DMA', '30DMA', 'RSI']:
        # notna().cumsum() stays constant within a run of NaNs, so it labels each gap.
        g = stock[col].notna().cumsum().where(stock[col].isna())
        stock[col] = stock[col].where(g.map(g.value_counts()) > 3, stock[col].interpolate('linear'))
        stock[col] = stock[col].bfill()
    return stock
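
The gap-handling idiom is compact enough to be hard to read, so here it is on a toy series. This is a minimal sketch under assumptions: the helper name `fill_gaps`, the `max_gap` parameter, and the sample values are illustrative. The key point is that the count of non-missing values (`notna().cumsum()`) is the same for every NaN in a run, which is what lets `value_counts` measure each gap's length.

```python
import numpy as np
import pandas as pd


def fill_gaps(series: pd.Series, max_gap: int = 3) -> pd.Series:
    """Interpolate NaN runs of at most `max_gap` values; backfill longer runs."""
    missing = series.isna()
    # Every NaN in a run shares the number of valid values seen so far,
    # so that running count serves as a run identifier.
    run_id = series.notna().cumsum().where(missing)
    run_length = run_id.map(run_id.value_counts())
    interpolated = series.interpolate(method="linear")
    # Long runs stay NaN for now; all other positions take the interpolated value.
    filled = series.where(run_length > max_gap, interpolated)
    return filled.bfill()


# One single-value gap and one four-value gap (hypothetical data).
toy = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0])
result = fill_gaps(toy)
print(result.tolist())
```

The single missing value is interpolated to 2.0, while the four-value gap exceeds `max_gap` and is backfilled with 8.0. Backfilling copies a later observation into earlier rows, which is worth keeping in mind when the filled column feeds a forecasting model.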

Feature and target extraction functions are defined: `ins` returns the three indicator columns (7DMA, 30DMA, RSI) and `outs` returns the closing price as the target.

In [None]:
def ins(loc):
    # Features are extracted from computed indicators.
    stock = clean(loc)
    inputs = pd.DataFrame()
    inputs['7DMA'] = stock['7DMA']
    inputs['30DMA'] = stock['30DMA']
    inputs['RSI'] = stock['RSI']
    return inputs

def outs(loc):
    # The closing price is extracted as the target.
    stock = clean(loc)
    outputs = pd.DataFrame()
    outputs['Close'] = stock['Close']
    return outputs
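
Since `ins` and `outs` each rerun the full pipeline from the CSV, one possible alternative is to prepare the frame once and split it into features and target in a single step. A minimal sketch, assuming a hypothetical helper `split_features_target` and a made-up three-row frame standing in for the output of `clean`:

```python
import pandas as pd

FEATURES = ["7DMA", "30DMA", "RSI"]  # same columns that `ins` selects


def split_features_target(stock: pd.DataFrame, target: str = "Close"):
    """Return (features, target) frames taken from one prepared DataFrame."""
    inputs = stock[FEATURES].copy()
    outputs = stock[[target]].copy()
    return inputs, outputs


# Hypothetical stand-in for a cleaned frame; the numbers are made up.
prepared = pd.DataFrame({
    "Close": [101.0, 102.5, 101.8],
    "7DMA": [100.2, 100.9, 101.3],
    "30DMA": [98.7, 98.9, 99.1],
    "RSI": [55.0, 61.2, 58.4],
    "Return": [0.4, 1.5, -0.7],
})
inputs, outputs = split_features_target(prepared)
print(inputs.columns.tolist(), outputs.columns.tolist())
```

Both frames keep the original row index, so features and target stay aligned row by row.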

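The linear regression workflow is defined. The data is split chronologically (the first 80% of rows for training, the last 20% for testing, with no shuffling), the model is trained, predictions are made on the test rows, and the coefficients, intercept, mean absolute error, and R² score are printed.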

In [None]:
def linreg(loc):
    # Data is split, the model is trained, and metrics are reported.
    inputs = ins(loc)
    outputs = outs(loc)
    intrain, intest, outtrain, outtest = train_test_split(inputs, outputs, test_size=0.2, shuffle=False)
    lr = LinearRegression()
    lr.fit(intrain, outtrain)
    pred = lr.predict(intest)
    print(f"Coefficients: {lr.coef_}")
    print(f"Intercept: {lr.intercept_}")
    print("Mean Absolute Error:", mean_absolute_error(outtest, pred))
    print("R^2 Score:", r2_score(outtest, pred))
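
The effect of `shuffle=False` is easiest to see on data where the row order is obvious. The following is a minimal sketch on synthetic data, not real prices: the single `feature` column, the linear relationship, and the noise level are all assumptions chosen only to illustrate the split and the two metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in: the target is a noisy linear function of one feature.
X = pd.DataFrame({"feature": np.arange(n, dtype=float)})
y = pd.Series(2.0 * X["feature"] + 5.0 + rng.normal(0.0, 1.0, n), name="target")

# shuffle=False keeps row order: the first 80% train, the last 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"Train rows: {X_train.index.min()}-{X_train.index.max()}")
print(f"Test rows:  {X_test.index.min()}-{X_test.index.max()}")
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```

Because every test row comes after every training row, the evaluation resembles forecasting later dates from earlier ones. A shuffled split would mix future rows into the training set.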

The linear regression model is executed on the dataset. The path to the CSV file is specified, and model performance is displayed.

In [None]:
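# Replace this path with your own.
# linreg prints its metrics and returns nothing, so it is called directly;
# wrapping it in print() would add a stray "None" to the output.
linreg(Path.home() / 'Downloads' / 'Amazon' / 'amzn.us.csv')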