# Stock Price Prediction Using Random Forest

This notebook predicts stock closing prices with a Random Forest regression model. The workflow covers missing-data handling, engineering of financial indicators, model training, evaluation, and trading-signal generation.

## Features Used

- **7DMA**: 7-day moving average of the closing price
- **30DMA**: 30-day moving average of the closing price
- **RSI**: Relative Strength Index (14-day)
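
As a rough illustration of how these indicators can be computed with pandas, the sketch below uses a short made-up price series (the values and the names `close`, `ma7`, `gain`, `loss`, and `rsi` are assumptions for this example, not the notebook's data). It uses the simple-average form of RSI and lags every input by one row so that a row's features depend only on earlier closes.

```python
import pandas as pd

# Hypothetical closing prices, used only for illustration.
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0, 15.0])

# shift(1) makes each row's average use only prior closes,
# so the feature never contains the value being predicted.
ma7 = close.shift(1).rolling(window=7, min_periods=1).mean()

# Simple-average RSI: average gain / (average gain + average loss) * 100.
change = close.diff()
gain = change.clip(lower=0).shift(1).rolling(window=14, min_periods=1).mean()
loss = (-change.clip(upper=0)).shift(1).rolling(window=14, min_periods=1).mean()
rsi = 100 * gain / (gain + loss)

print(pd.DataFrame({"Close": close, "7DMA": ma7, "RSI": rsi}))
```

The first row has no earlier close, so its moving average is NaN. The ratio form used here is algebraically the same as the more familiar `100 - 100 / (1 + RS)` with `RS = gain / loss`.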

## Functions

- `dt(loc)`: Loads the CSV dataset with date parsing enabled.
- `fillin(loc)`: Forward-fills missing 'Close' values.
- `markers(loc)`: Computes 7DMA, 30DMA, Return, and RSI and adds them to the dataset.
- `ins(loc)`: Selects the feature columns (7DMA, 30DMA, RSI) as model input.
- `outs(loc)`: Selects the closing price as the model's target.
- `randomf(loc)`: Trains the Random Forest model on a chronological split, evaluates it, and reports the Mean Absolute Error and R^2 score.
- `signal(loc)`: Labels each test-period row Buy or Sell by comparing the predicted close with the last closing price of the training period.
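
To make the forward-filling step concrete, here is a minimal sketch on a made-up series (the values are hypothetical). It shows only what `ffill()` does to gaps, not the notebook's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical closes with missing days.
close = pd.Series([10.0, np.nan, np.nan, 12.5, np.nan])

# ffill() copies the most recent valid value forward into each gap.
filled = close.ffill()
print(filled.tolist())
```

A missing value at the very start of the series would stay missing, because there is no earlier value to carry forward.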

In [1]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
import matplotlib.pyplot as plt

In [2]:
def dt(loc):
    # parse_dates=True only parses the index; name the 'Date' column so it is parsed
    stock = pd.read_csv(loc, parse_dates=['Date'])
    return stock

def fillin(loc):
    stock = dt(loc)
    stock['Close'] = stock['Close'].ffill()
    return stock

def markers(loc):
    stock = fillin(loc)
    # shift(1) restricts every indicator to closes from earlier rows (no look-ahead)
    stock['7DMA'] = stock['Close'].shift(1).rolling(window=7, min_periods=0).mean()
    stock['30DMA'] = stock['Close'].shift(1).rolling(window=30, min_periods=0).mean()
    stock['Return'] = stock['Close'].shift(1).pct_change() * 100
    change = stock['Close'].diff()
    # Simple-average RSI over 14 days: average gain / (average gain + average loss) * 100
    avgain = change.where(change > 0, 0).shift(1).rolling(window=14, min_periods=1).sum() / 14
    avloss = abs(change.where(change < 0, 0).shift(1).rolling(window=14, min_periods=1).sum()) / 14
    stock['RSI'] = (avgain / (avgain + avloss)) * 100
    return stock

In [3]:
def ins(loc):
    stock = markers(loc)
    inputs = pd.DataFrame()
    inputs['7DMA'] = stock['7DMA']
    inputs['30DMA'] = stock['30DMA']
    inputs['RSI'] = stock['RSI']
    return inputs

def outs(loc):
    stock = markers(loc)
    outputs = pd.DataFrame()
    outputs['Close'] = stock['Close']
    return outputs

In [4]:
def randomf(loc):
    inputs = ins(loc)
    outputs = outs(loc)
    stock = markers(loc)
    # shuffle=False keeps chronological order: the last 20% of rows form the test set
    intrain, intest, outtrain, outtest = train_test_split(inputs, outputs, test_size=0.2, shuffle=False)
    rf = RandomForestRegressor(n_estimators=100, random_state=42)
    rf.fit(intrain, outtrain.values.ravel())
    predout = rf.predict(intest)
    # Predictions exist only for test rows; training rows keep NaN in 'Predicted'
    stock['Predicted'] = np.nan
    stock.loc[intest.index, 'Predicted'] = predout
    print("Mean Absolute Error:", mean_absolute_error(outtest, predout))
    print("R^2 Score:", r2_score(outtest, predout))
    return stock
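
The effect of `shuffle=False` in the split above can be seen on a tiny stand-in dataset. This is a sketch with hypothetical data (`X`, `y`, and the column names are assumptions for illustration); it only demonstrates that the test set is the final block of rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: ten consecutive trading days.
X = pd.DataFrame({"feature": np.arange(10.0)})
y = pd.Series(np.arange(10.0) * 2, name="target")

# shuffle=False keeps row order, so the last 20% of rows form the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

print(list(X_train.index))
print(list(X_test.index))
```

Keeping the order means the model is always evaluated on dates later than any it was trained on, which is the usual choice for time series.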

In [5]:
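def signal(loc):
    stock = randomf(loc)
    # Rows without a prediction are the training period; the rest are the test period
    trainp = stock[stock['Predicted'].isna()]
    testp = stock[stock['Predicted'].notna()].copy()
    # Reference price: the last close of the training period
    lastknown = trainp['Close'].iloc[-1]
    testp['Signal'] = np.where(testp['Predicted'] > lastknown, 'Buy', 'Sell')
    # 'Profit' is the absolute gap between predicted and reference price, not a realized return
    testp['Profit'] = abs(testp['Predicted'] - lastknown)
    print('Maximum Possible Profit in Test Period:', testp['Predicted'].max() - lastknown)
    # Prints the full row for the date with the highest predicted close
    print('Best Date to Sell:', testp.loc[testp['Predicted'].idxmax()])
    return testp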
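
The signal rule can be checked by hand on a few numbers. The sketch below uses a hypothetical reference price and hypothetical predictions (`last_known`, `predicted`, `signals`, and `gap` are names chosen for this example) and applies the same comparison as `signal`.

```python
import numpy as np
import pandas as pd

last_known = 100.0  # hypothetical last training-period close
predicted = pd.Series([98.0, 101.5, 100.0, 104.0], name="Predicted")

# Same rule as in signal(): Buy only when the prediction exceeds the reference price.
signals = np.where(predicted > last_known, "Buy", "Sell")

# Absolute gap between prediction and reference, as in the 'Profit' column.
gap = (predicted - last_known).abs()

print(pd.DataFrame({"Predicted": predicted, "Signal": signals, "Profit": gap}))
```

Because the comparison is strict, a prediction equal to the reference price is labelled Sell.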

## Model Execution

The final cell runs the full pipeline on the Amazon dataset: it trains and evaluates the Random Forest model, prints the Mean Absolute Error and R^2 score, and displays the test-period trading signals and profit estimates.
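
When reading the reported R^2 score, note that a negative value means the predictions fit the test data worse than a constant prediction of the test-set mean would. One possible contributor, offered here as an illustration and not as a diagnosis of this run, is that a Random Forest averages training targets and so cannot predict above the highest value it was trained on. The sketch below shows this on a hypothetical, steadily rising series (all data and names are assumptions for the example).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Hypothetical steadily rising series: the target is the feature plus one.
x = np.arange(100, dtype=float).reshape(-1, 1)
y = x.ravel() + 1.0

# Chronological split: train on the first 80 points, test on the last 20.
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(x_train, y_train)
pred = rf.predict(x_test)
r2 = r2_score(y_test, pred)

# Tree predictions are averages of training targets, so they cannot
# exceed the largest target seen during training.
print("Highest prediction:", pred.max())
print("Highest training target:", y_train.max())
print("R^2 on the later period:", r2)
```

If the test period trades at price levels above the training range, features such as moving averages of the price itself place the model in this situation.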

In [6]:
# Replace this path with your own
print(signal(Path.home() / 'Downloads' / 'Amazon' / 'amzn.us.csv'))

Mean Absolute Error: 281.1305215324929
R^2 Score: -1.3661743871262022
Maximum Possible Profit in Test Period: 16.075300000000198
Best Date to Sell: Date         2013-12-26
Open             401.79
High             404.52
Low              396.81
Close            404.39
Volume          1869140
OpenInt               0
7DMA         396.012857
30DMA        381.412667
Return         -0.92326
RSI           64.717652
Predicted      314.3053
Signal              Buy
Profit          16.0753
Name: 4175, dtype: object
            Date     Open     High      Low    Close   Volume  OpenInt  \
4122  2013-10-10   304.65   306.70   302.59   305.17  2556190        0   
4123  2013-10-11   304.77   310.93   303.84   310.89  2162268        0   
4124  2013-10-14   309.22   311.64   307.00   310.70  1938900        0   
4125  2013-10-15   309.92   310.79   305.26   306.40  2261100        0   
4126  2013-10-16   308.38   310.80   305.55   310.49  2178262        0   
...          ...      ...      ...      ...   