<a href="https://colab.research.google.com/github/sppatel05/Stock-Direction-Prediction-ML-/blob/main/stock_direction_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Stock Direction Prediction Using ML
This project predicts next-day stock direction (up/down) from historical market data using machine learning, and evaluates whether the predictions are meaningful in practice.

In [31]:
!pip install yfinance



In [32]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report,roc_auc_score

#Data Loading
Load historical stock price data into a pandas DataFrame.

In [33]:
ticker = "RELIANCE.NS"
df = yf.download(ticker,start = "2010-01-01")
df.head()

[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,RELIANCE.NS,RELIANCE.NS,RELIANCE.NS,RELIANCE.NS,RELIANCE.NS
Date,,,,,
2010-01-04,218.593079,221.76375,207.688843,221.76375,76646086
2010-01-05,217.617523,233.714773,216.540313,233.714773,21392825
2010-01-06,221.133698,222.454812,217.607339,219.487382,23691760
2010-01-07,224.802307,226.621378,219.101204,221.133693,26197920
2010-01-08,224.212891,226.377487,222.759669,225.198642,15110149


In [34]:
df.columns=df.columns.get_level_values(0)

#Feature Engineering
We create technical indicators from closing prices to capture trend, momentum, and volatility.

In [35]:
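# Daily percentage return
df['Return'] = df['Close'].pct_change()
# 5-day and 10-day simple moving averages (trend)
df['SMA_5'] = df['Close'].rolling(5).mean()
df['SMA_10'] = df['Close'].rolling(10).mean()
# 5-day rolling standard deviation of returns (volatility)
df['Volatility'] = df['Return'].rolling(5).std()
# 5-day price change (momentum)
df['Momentum'] = df['Close'] - df['Close'].shift(5)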
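
As a quick sanity check, the same indicator formulas can be applied to a handful of made-up prices, where the values are easy to verify by hand. The sketch below is only an illustration: the `toy` frame and its prices are hypothetical and not part of the dataset. It also shows that rolling windows and shifts leave the first few rows undefined, which is why the notebook drops NaN rows later.

```python
import pandas as pd

# Hypothetical toy closing prices, chosen so the indicators can be checked by hand.
toy = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 112.0]})

toy["Return"] = toy["Close"].pct_change()
toy["SMA_5"] = toy["Close"].rolling(5).mean()
toy["Volatility"] = toy["Return"].rolling(5).std()
toy["Momentum"] = toy["Close"] - toy["Close"].shift(5)

# Number of undefined (NaN) rows each indicator produces at the start.
warmup = toy.isna().sum()
print(warmup)

# First defined SMA_5 is the mean of the first five closes.
print(toy["SMA_5"].iloc[4])
# First defined Momentum is close[5] - close[0].
print(toy["Momentum"].iloc[5])
```

On the real data the 10-day moving average has the longest warm-up, so it determines how many leading rows are lost.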


#Target Variable
Target = 1 if the next day's closing price is higher than today's, else 0.

In [36]:
df['Target']=(df['Close'].shift(-1)>df['Close']).astype(int)
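
One detail of this labeling is worth illustrating: the last row has no next-day close, so `shift(-1)` gives NaN there, the comparison evaluates to False, and the row is labeled 0 rather than left undefined. The sketch below uses a hypothetical four-price series to show this, along with one possible way to exclude that row. It is an optional adjustment, not what the notebook does.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 10.5, 12.0], name="Close")  # hypothetical prices

next_close = close.shift(-1)                 # NaN on the final row
target = (next_close > close).astype(int)    # NaN > x is False, so the final label is 0
print(target.tolist())                       # [1, 0, 1, 0]

# Possible adjustment (an assumption of this sketch): keep only rows with a known next close.
valid = next_close.notna()
target_valid = target[valid]
print(target_valid.tolist())                 # [1, 0, 1]
```

With several thousand rows, one mislabeled sample has a negligible effect on the metrics, but the adjustment keeps the labels strictly correct.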

In [37]:
df.dropna(inplace=True)

In [38]:
features =['Return','SMA_5','SMA_10','Volatility','Momentum']
x=df[features]
y=df['Target']

#Model Training and Evaluation
Train ML models to classify price direction and evaluate their performance on held-out test data.

In [39]:
split = int(0.8*len(df))
if split == 0:
  raise ValueError("Not enough data after feature engineering")
x_train,x_test =x.iloc[:split],x.iloc[split:]
y_train,y_test =y.iloc[:split],y.iloc[split:]
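
The split above keeps the rows in chronological order, so the models are always tested on data that comes after the training period. A single split gives only one estimate, though. One common extension, sketched here under the assumption that scikit-learn's `TimeSeriesSplit` is acceptable for this data, is walk-forward validation with several expanding training windows. The array `X_demo` is a stand-in for ordered feature rows, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 20 chronologically ordered feature rows (hypothetical).
X_demo = np.arange(20).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
folds = []
for train_idx, test_idx in tscv.split(X_demo):
    # Every training index precedes every test index, so no future data leaks in.
    folds.append((train_idx[-1], test_idx[0], test_idx[-1]))
    print(f"train rows 0..{train_idx[-1]}, test rows {test_idx[0]}..{test_idx[-1]}")
```

Averaging a metric such as ROC-AUC over the folds gives a steadier picture than a single 80/20 split.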

In [40]:
x_train.isna().sum()

Price,0
Return,0
SMA_5,0
SMA_10,0
Volatility,0
Momentum,0


In [41]:
print("Total rows:",len(df))
print("Train rows:",len(x_train))
print("Test rows:",len(x_test))

Total rows: 3934
Train rows: 3147
Test rows: 787


In [42]:
assert len(x_train) > 0,"Training set is empty!"

In [43]:
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
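
The scaler is fitted on the training rows only and then applied to the test rows, which keeps test-set statistics out of training. The same pattern can be bundled into a scikit-learn pipeline so the scaling step cannot be fitted on the wrong data by accident. This is a minimal sketch on synthetic data; the arrays and the name `model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 ordered rows, 5 features (hypothetical).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

split_demo = int(0.8 * len(X_demo))
X_tr, X_te = X_demo[:split_demo], X_demo[split_demo:]
y_tr, y_te = y_demo[:split_demo], y_demo[split_demo:]

# fit() fits the scaler on X_tr only; predict_proba() reuses those statistics on X_te.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

fitted_scaler = model.named_steps["standardscaler"]
print(np.allclose(fitted_scaler.mean_, X_tr.mean(axis=0)))
```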


#Method 1: Logistic Regression

In [44]:
lr = LogisticRegression()
lr.fit(x_train_scaled,y_train)
y_pred_lr=lr.predict(x_test_scaled)
y_prob_lr=lr.predict_proba(x_test_scaled)[:,1]


#Results: Logistic Regression

In [45]:
print(classification_report(y_test,y_pred_lr))
print("ROC-AUC:",roc_auc_score(y_test,y_prob_lr))



              precision    recall  f1-score   support

           0       0.00      0.00      0.00       385
           1       0.51      1.00      0.68       402

    accuracy                           0.51       787
   macro avg       0.26      0.50      0.34       787
weighted avg       0.26      0.51      0.35       787

ROC-AUC: 0.4859275053304904
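
The report above shows recall 1.00 for class 1 and 0.00 for class 0, meaning the model predicted "up" for every test sample. The short calculation below, using only the support counts from the report, confirms that the reported accuracy and F1 score are exactly what an always-up baseline would give. The variable names are illustrative.

```python
# Support counts taken from the classification report above.
n_down, n_up = 385, 402
n_total = n_down + n_up

# If every sample is predicted "up", the correct predictions are exactly the up days.
baseline_accuracy = n_up / n_total
precision_up = n_up / n_total   # all 787 predictions are "up", 402 of them correct
recall_up = 1.0                 # every true "up" day is predicted "up"
f1_up = 2 * precision_up * recall_up / (precision_up + recall_up)

print(f"accuracy: {baseline_accuracy:.2f}")
print(f"precision (up): {precision_up:.2f}")
print(f"f1 (up): {f1_up:.2f}")
```

The rounded values match the 0.51 accuracy, 0.51 precision, and 0.68 F1 score in the report.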




#Method 2: Random Forest Classifier

In [46]:
rf = RandomForestClassifier(n_estimators=300,max_depth=6,random_state=42)
rf.fit(x_train,y_train)
y_pred_rf=rf.predict(x_test)
y_prob_rf=rf.predict_proba(x_test)[:,1]

#Results: Random Forest


In [47]:
print(classification_report(y_test,y_pred_rf))
print("ROC-AUC:",roc_auc_score(y_test,y_prob_rf))

              precision    recall  f1-score   support

           0       0.49      0.74      0.59       385
           1       0.51      0.27      0.35       402

    accuracy                           0.50       787
   macro avg       0.50      0.50      0.47       787
weighted avg       0.50      0.50      0.47       787

ROC-AUC: 0.50995670995671


#Conclusion
Random Forest achieved a slightly higher ROC-AUC than Logistic
Regression: 0.510 versus 0.486. Both values sit within about 0.015 of
0.5, the score expected from random guessing.

Accuracy tells the same story. Logistic Regression predicted "up" for
every test sample (recall 1.00 for class 1, 0.00 for class 0), so its
0.51 accuracy only reflects the share of up days in the test set
(402 of 787). Random Forest reached 0.50 accuracy and leaned toward
predicting "down" (recall 0.74 for class 0, 0.27 for class 1).

Both models therefore perform close to random, highlighting the
difficulty of predicting short-term stock price direction using only
historical price-based technical indicators.
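
To put "close to random" on a rough numeric footing, the sketch below computes the range of accuracies that coin-flip guessing would plausibly produce on a test set of 787 samples. It relies on a normal approximation to the binomial distribution and a 95% band, both of which are assumptions of this sketch rather than part of the original analysis.

```python
import math

n_test = 787      # test-set size reported in the notebook
p_chance = 0.5    # accuracy expected from coin-flip guessing

# Standard error of accuracy under the chance hypothesis (normal approximation).
std_err = math.sqrt(p_chance * (1 - p_chance) / n_test)
low = p_chance - 1.96 * std_err
high = p_chance + 1.96 * std_err
print(f"95% chance band: {low:.3f} to {high:.3f}")

# Accuracies reported for the two models.
reported = {"Logistic Regression": 0.51, "Random Forest": 0.50}
inside_band = {name: low <= acc <= high for name, acc in reported.items()}
for name, inside in inside_band.items():
    print(f"{name}: inside chance band = {inside}")
```

The band spans roughly 0.465 to 0.535, so both reported accuracies fall inside the range expected from guessing.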