# Stock Market Trend Prediction with LSTM

This project uses an LSTM (Long Short-Term Memory) model to predict stock market trends. The model is trained on a dataset from Google Finance.

<div style="text-align: center;">
    <img src="Stock_Market.png" alt="Stock" width="600">
    <br>
    <em>Power of Machine Learning to Forecast Financial Trends</em>
</div>

## Project Description

This project aims to predict stock market trends using an LSTM model. An LSTM is a type of recurrent neural network capable of learning long-term dependencies, which makes it well suited to time series forecasting tasks such as stock market prediction.

The model is trained on a dataset obtained from Google Finance. The dataset contains historical prices (open, high, low, close) and trading volume for each trading day from January 2012 through December 2016, which the model uses to learn patterns in the stock's movements. Once trained, the model can generate predictions for new inputs.

## Model Architecture

The LSTM model used in this project consists of several layers:

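1. An LSTM layer with 50 units and `return_sequences` set to True.
2. A Dropout layer (rate 0.2) to reduce overfitting.
3. Three additional LSTM layers with 50 units each, each followed by a Dropout layer (rate 0.2). The last LSTM layer returns only its final output rather than the full sequence.
4. A Dense output layer with one unit.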

The model is compiled with the Adam optimizer and the mean squared error loss function, making it suitable for regression tasks.
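
To make the size of this stack concrete, the trainable parameters can be counted by hand. The following is a minimal sketch, assuming the standard LSTM formulation (four gates, each with an input kernel, a recurrent kernel, and a bias) and one input feature per time step. The helper names `lstm_params` and `dense_params` are hypothetical and not part of the project.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters in a standard LSTM layer.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


# Assumed configuration: 4 stacked LSTM layers of 50 units, 1 feature per step.
# Dropout layers have no trainable parameters, so they are not listed.
layer_counts = [
    lstm_params(input_dim=1, units=50),   # first LSTM layer
    lstm_params(input_dim=50, units=50),  # second LSTM layer
    lstm_params(input_dim=50, units=50),  # third LSTM layer
    lstm_params(input_dim=50, units=50),  # fourth LSTM layer
    dense_params(input_dim=50, units=1),  # output layer
]
total_params = sum(layer_counts)
print(layer_counts, total_params)
```

Under these assumptions the first LSTM layer is the smallest because it sees a single input value per step, while each later LSTM layer receives the 50 outputs of the layer before it.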

## Usage

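To use this project, install the required Python libraries: Keras (with TensorFlow) for building the LSTM model, pandas and NumPy for data manipulation, and scikit-learn for splitting and scaling the data. You'll also need the stock price data from Google Finance, saved as `Google_Stock_Price_Train.csv` and `Google_Stock_Price_Test.csv`.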

Once you have the data and the necessary libraries, you can train the model by running the provided Python script. After training, you can use the model to make predictions on new data.

## Future Work

Future work on this project could include tuning the model's hyperparameters for better performance, incorporating additional features into the model (such as volume or other technical indicators), or using different types of models for comparison.

In [20]:
import pandas as pd
data = pd.read_csv("Google_Stock_Price_Train.csv")
data

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,1/3/2012,325.25,332.83,324.97,663.59,7380500
1,1/4/2012,331.27,333.87,329.08,666.45,5749400
2,1/5/2012,329.83,330.75,326.89,657.21,6590300
3,1/6/2012,328.34,328.77,323.68,648.24,5405900
4,1/9/2012,322.04,322.29,309.46,620.76,11688800
...,...,...,...,...,...,...
1253,12/23/2016,790.90,792.74,787.28,789.91,623400
1254,12/27/2016,790.68,797.86,787.66,791.55,789100
1255,12/28/2016,793.70,794.23,783.20,785.05,1153800
1256,12/29/2016,783.33,785.93,778.92,782.79,744300


In [21]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1258 non-null   object 
 1   Open    1258 non-null   float64
 2   High    1258 non-null   float64
 3   Low     1258 non-null   float64
 4   Close   1258 non-null   object 
 5   Volume  1258 non-null   object 
dtypes: float64(3), object(3)
memory usage: 59.1+ KB


In [22]:
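# Strip the slashes and store the date as an integer (1/3/2012 becomes 132012).
# Caveat: this encoding is ambiguous (1/12/2012 and 11/2/2012 both become 1122012)
# and it does not preserve chronological order.
data["Date"] = data["Date"].str.replace("/", "").astype(int)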
data

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,132012,325.25,332.83,324.97,663.59,7380500
1,142012,331.27,333.87,329.08,666.45,5749400
2,152012,329.83,330.75,326.89,657.21,6590300
3,162012,328.34,328.77,323.68,648.24,5405900
4,192012,322.04,322.29,309.46,620.76,11688800
...,...,...,...,...,...,...
1253,12232016,790.90,792.74,787.28,789.91,623400
1254,12272016,790.68,797.86,787.66,791.55,789100
1255,12282016,793.70,794.23,783.20,785.05,1153800
1256,12292016,783.33,785.93,778.92,782.79,744300
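
The integer dates above lose information, which the following sketch illustrates. It assumes the dates are written as month/day/year (consistent with values such as 12/23/2016), and the helper names `strip_slashes` and `to_ordinal` are hypothetical. It compares the slash-stripping conversion with parsing the string into a real date and taking its day count.

```python
from datetime import datetime


def strip_slashes(date_text: str) -> int:
    """Mimic the slash-stripping conversion: '1/3/2012' -> 132012."""
    return int(date_text.replace("/", ""))


def to_ordinal(date_text: str) -> int:
    """Parse a month/day/year string into a day count that preserves order."""
    return datetime.strptime(date_text, "%m/%d/%Y").toordinal()


# Two different days (illustrative examples) collapse to the same integer.
jan_12 = "1/12/2012"
nov_2 = "11/2/2012"
print(strip_slashes(jan_12), strip_slashes(nov_2))
print(to_ordinal(jan_12) == to_ordinal(nov_2))

# The stripped integers also sort a 2016 date after a 2017 date.
print(strip_slashes("12/29/2016") > strip_slashes("1/3/2017"))
print(to_ordinal("12/29/2016") < to_ordinal("1/3/2017"))
```

A day count (or a proper datetime column) keeps every date distinct and in chronological order, which matters if the date is used as a model feature or to order the rows.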


In [23]:
# Remove the thousands separators so Volume can be stored as an integer
data["Volume"] = data["Volume"].str.replace(",", "").astype(int)
data

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,132012,325.25,332.83,324.97,663.59,7380500
1,142012,331.27,333.87,329.08,666.45,5749400
2,152012,329.83,330.75,326.89,657.21,6590300
3,162012,328.34,328.77,323.68,648.24,5405900
4,192012,322.04,322.29,309.46,620.76,11688800
...,...,...,...,...,...,...
1253,12232016,790.90,792.74,787.28,789.91,623400
1254,12272016,790.68,797.86,787.66,791.55,789100
1255,12282016,793.70,794.23,783.20,785.05,1153800
1256,12292016,783.33,785.93,778.92,782.79,744300


In [24]:
# Remove the thousands separators so Close can be stored as a float
data["Close"] = data["Close"].str.replace(",", "").astype(float)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1258 non-null   int32  
 1   Open    1258 non-null   float64
 2   High    1258 non-null   float64
 3   Low     1258 non-null   float64
 4   Close   1258 non-null   float64
 5   Volume  1258 non-null   int32  
dtypes: float64(4), int32(2)
memory usage: 49.3 KB
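
As an alternative to cleaning the Close and Volume columns after loading, pandas can drop the grouping commas while reading the file. This is a minimal sketch on an in-memory excerpt. The quoting of the comma-grouped numbers is an assumption about how the raw CSV stores them, and the Close value in the second row is made up so that the column contains a comma.

```python
import io

import pandas as pd

# Illustrative two-row excerpt, not the real file contents.
raw_csv = io.StringIO(
    'Date,Open,Close,Volume\n'
    '1/3/2012,325.25,663.59,"7,380,500"\n'
    '1/4/2012,331.27,"1,666.45","5,749,400"\n'
)

# thousands="," tells the parser to ignore grouping commas in numeric fields,
# so Close is read as a float and Volume as an integer.
sample = pd.read_csv(raw_csv, thousands=",")
print(sample.dtypes)
print(sample)
```

With this option the two separate `replace` steps would not be needed, and the Date column is left untouched as text.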


In [25]:
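# Features: every column except the last (Date, Open, High, Low, Close)
x = data.iloc[:, :-1].values
# Target: the last column, which is Volume in this dataset,
# so the network is fitted to trading volume rather than to a price
y = data.iloc[:, -1].values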

In [26]:
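from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing. Note: train_test_split shuffles the rows
# by default, so this split is random rather than chronological.
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)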
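
For time series, a common alternative to a random split is to hold out the most recent rows and to learn any scaling from the training part only, so that no information from the test period leaks into training. The sketch below is one way to do that in plain Python. The function names and the price values are hypothetical, and it is not the method used in this notebook.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so the test set is the most recent slice."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def fit_min_max(values):
    """Learn the scaling bounds from the training data only."""
    return min(values), max(values)


def apply_min_max(values, bounds):
    """Scale values with previously learned bounds."""
    low, high = bounds
    return [(v - low) / (high - low) for v in values]


# Illustrative, already time-ordered prices.
prices = [10.0, 12.0, 11.0, 14.0, 13.0, 15.0, 16.0, 18.0, 17.0, 20.0]
train, test = chronological_split(prices, test_fraction=0.2)

bounds = fit_min_max(train)               # bounds come from the training slice
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)  # may fall outside [0, 1]
print(train, test, bounds, test_scaled)
```

Scaled test values can exceed 1 when later prices are higher than anything seen in training. That is expected and reflects what the model would face on genuinely new data.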

In [27]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

In [28]:
# Initialising the model
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation.
# input_shape treats each feature column as one time step holding a single value.
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))


In [29]:
!pip install numpy

Defaulting to user installation because normal site-packages is not writeable


In [30]:
pip install tensorflow

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [31]:
# Adding the output layer
regressor.add(Dense(units = 1))

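# Compiling the RNN. Note: 'accuracy' is a classification metric and is not
# informative alongside a regression loss such as mean squared error.
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error' , metrics = ["accuracy"])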

# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs = 200, batch_size = 32)

Epoch 1/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.0000e+00 - loss: 14411806801920.0000
Epoch 2/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0000e+00 - loss: 14686584045568.0000
Epoch 3/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0000e+00 - loss: 13760777420800.0000
Epoch 4/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.0000e+00 - loss: 14749946347520.0000
Epoch 5/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0000e+00 - loss: 16036772446208.0000
Epoch 6/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.0000e+00 - loss: 13858244657152.0000
Epoch 7/200
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0000e+00 - loss: 14383064285184.0000
Epoch 8/200
32/32 ━━━━━━━━━━

<keras.src.callbacks.history.History at 0x2080c83b220>
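
The accuracy of 0 in the log above is expected: accuracy rewards predictions that match the label, which almost never happens with continuous targets. Error-based metrics are more informative for regression. The sketch below computes two common ones in plain Python. The function names and the numbers are illustrative, and the exact-match fraction is only a simplified stand-in for a classification-style accuracy.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the same unit as the target."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Illustrative values, not model output.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 109.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)

# Fraction of predictions that match the target exactly.
exact_match = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(mae, rmse, exact_match)
```

Both error metrics stay in the unit of the target, so they can be read directly as "how far off" a prediction typically is.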

In [34]:
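from sklearn.preprocessing import StandardScaler
import numpy as np

dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
# Actual prices for the test period (column 1, which is 'Open' in the training file's layout)
real_stock_price = dataset_test.iloc[:, 1:2].values
sc = StandardScaler()

# Preparing the inputs for predicting the stock price of 2017
dataset_total = pd.concat((data['Open'], dataset_test['Open']), axis = 0)
# Keep the test rows plus the 60 rows that precede them
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
# Note: the scaler is fitted on these inputs, not on the training data
inputs = sc.fit_transform(inputs)
X_test = []
# One 60-step window per test row
for i in range(60, 60 + len(dataset_test)):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
# Reshape to (samples, time steps, features)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))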
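
The loop above is an instance of sliding-window construction: each sample is a run of consecutive values, and the value that follows the run is what the model is asked to predict. The sketch below shows the idea on a short series with a window of 3 instead of 60. The function name `make_windows` and the series are hypothetical.

```python
def make_windows(series, window):
    """Return (inputs, targets): each input is `window` consecutive values
    and its target is the value that immediately follows."""
    inputs, targets = [], []
    for end in range(window, len(series)):
        inputs.append(series[end - window:end])
        targets.append(series[end])
    return inputs, targets


# Illustrative series.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
inputs, targets = make_windows(series, window=3)
print(inputs)
print(targets)
```

A series of length n yields n minus `window` samples, which is why the cell above prepends the 60 rows before the test period: every test row then has a full window behind it. For an LSTM, the list of windows would additionally be converted to an array and reshaped to (samples, time steps, 1).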


In [None]:
data

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,132012,325.25,332.83,324.97,663.59,7380500
1,142012,331.27,333.87,329.08,666.45,5749400
2,152012,329.83,330.75,326.89,657.21,6590300
3,162012,328.34,328.77,323.68,648.24,5405900
4,192012,322.04,322.29,309.46,620.76,11688800
...,...,...,...,...,...,...
1253,12232016,790.90,792.74,787.28,789.91,623400
1254,12272016,790.68,797.86,787.66,791.55,789100
1255,12282016,793.70,794.23,783.20,785.05,1153800
1256,12292016,783.33,785.93,778.92,782.79,744300


In [None]:
# Predict for a single hand-written row of features (Date, Open, High, Low, Close)
m = pd.DataFrame({"Date": [122015], "Open": [360], "High": [364], "Low": [359], "Close": [445]})
p = regressor.predict(m)
p



array([[332.09277]], dtype=float32)