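In this Colab, we use a Keras Long Short-Term Memory (LSTM) model to forecast a stock price. The text and the commented-out cells refer to Tata Global Beverages, but the data actually downloaded below is for the ticker SULA.NS.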


Here are some imports we need to make: numpy for scientific computation, matplotlib for graphing, and pandas for manipulating data.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 

In [2]:
import yfinance as yf 
df = yf.download('SULA.NS', start='2018-05-25', end='2023-08-14')

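The download above returns the daily price history for the ticker. We preview it here and use it as the training set. Only one price column, selected by position further below, is fed to the model.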

In [3]:
df.head()

In [4]:
#url = '/Users/nimishkapoor/Downloads/TCS.NS.csv'
#dataset_train = pd.read_csv(url)
dataset_train = df

In [5]:
len(dataset_train)

In [6]:
dataset_train.head()

In [7]:
dataset_train = dataset_train.dropna()

In [8]:
dataset_train.tail(12)

In [9]:
#dataset_train = dataset_train[-12:]

In [10]:
dataset_train.head(20)

In [11]:
training_set = dataset_train.iloc[:, 1:2].values
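
The slice 1:2, rather than the plain index 1, matters in the cell above: it keeps the result two-dimensional, which is the shape scikit-learn scalers expect. A minimal sketch on a made-up DataFrame (the column names and values are illustrative assumptions, not the downloaded data):

```python
import pandas as pd

# Hypothetical toy frame; the real data comes from yfinance.
toy = pd.DataFrame({"Open": [10.0, 11.0, 12.0], "High": [10.5, 11.5, 12.5]})

as_2d = toy.iloc[:, 1:2].values   # slice keeps the column axis
as_1d = toy.iloc[:, 1].values     # integer index drops it

print(as_2d.shape)  # (3, 1)
print(as_1d.shape)  # (3,)
```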

Let's take a look at the last five rows of our dataset.

In [12]:
dataset_train.tail()

Import MinMaxScaler from scikit-learn and scale the selected price column to the range 0 to 1.

In [13]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)
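
As a rough sketch of what the scaler computes for a single column: min-max scaling maps each value x to (x - min) / (max - min), and the inverse mapping recovers the original prices. The numbers below are made up for illustration.

```python
import numpy as np

prices = np.array([[100.0], [110.0], [105.0], [120.0]])  # illustrative values

lo, hi = prices.min(axis=0), prices.max(axis=0)
scaled = (prices - lo) / (hi - lo)      # forward transform into [0, 1]
restored = scaled * (hi - lo) + lo      # inverse transform back to prices

print(scaled.ravel().tolist())    # [0.0, 0.5, 0.25, 1.0]
print(restored.ravel().tolist())  # [100.0, 110.0, 105.0, 120.0]
```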

In [14]:
print(training_set_scaled)

In [15]:
print(len(training_set_scaled))

The LSTM expects a 3D input array. First, for each position i we take the previous 60 scaled values as one input window and the value at i as its target, then convert both lists to NumPy arrays. Finally, we reshape X_train to (number of samples, 60 timesteps, 1 feature per timestep).

In [16]:
X_train = []
y_train = []
for i in range(60, len(training_set_scaled)):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
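
To see the windowing on a small scale, the sketch below applies the same loop to a toy series of ten values with a window of 3 instead of 60 (both choices are assumptions made for readability). Each row of X holds the values that precede its target in y.

```python
import numpy as np

series = np.arange(10, dtype=float).reshape(-1, 1)  # toy stand-in for training_set_scaled
window = 3                                          # the notebook uses 60

X, y = [], []
for i in range(window, len(series)):
    X.append(series[i - window:i, 0])  # the `window` values before position i
    y.append(series[i, 0])             # the value to predict
X, y = np.array(X), np.array(y)
X = X.reshape(X.shape[0], X.shape[1], 1)  # (samples, timesteps, features)

print(X.shape, y.shape)               # (7, 3, 1) (7,)
print(X[0].ravel().tolist(), y[0])    # [0.0, 1.0, 2.0] 3.0
```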

In [17]:
print(X_train)

In [18]:
print(X_train.shape[1])

In [19]:
print(y_train)

Make the necessary imports from Keras.

In [20]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Dense 

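Stack four LSTM layers of 50 units each, with a dropout layer (rate 0.2) after each one to reduce overfitting. The first three LSTM layers set return_sequences=True so that the next LSTM layer receives the full sequence rather than only the final step. After that, a Dense layer produces a single output value. We compile the model with the adam optimizer and mean_squared_error as the loss, then train for 30 epochs with a batch size of 32.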

In [21]:
model = Sequential()

model.add(LSTM(units=50,return_sequences=True,input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))

model.add(LSTM(units=50,return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=50,return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=50))
model.add(Dropout(0.2))

model.add(Dense(units=1))

model.compile(optimizer='adam',loss='mean_squared_error')

model.fit(X_train,y_train,epochs=30,batch_size=32)
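
As a sanity check on the size of this network, the trainable parameter count can be worked out by hand. The sketch assumes the standard LSTM layout with bias terms (four gates, each with input weights, recurrent weights, and a bias), which is the Keras default. The helper names are hypothetical, and model.summary() in the notebook remains the authoritative figure.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with (input_dim + units) weights per unit plus one bias per unit
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # one weight per input per unit, plus one bias per unit
    return input_dim * units + units

layers = [
    lstm_params(1, 50),    # first LSTM sees 1 feature per timestep
    lstm_params(50, 50),   # later LSTMs see the 50 outputs of the previous layer
    lstm_params(50, 50),
    lstm_params(50, 50),
    dense_params(50, 1),   # Dropout layers add no parameters
]
total = sum(layers)
print(layers, total)  # [10400, 20200, 20200, 20200, 51] 71051
```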

The cells below, which load a separate test set and predict on it, are left commented out; the forecast further down uses only the downloaded data.

In [22]:
#url = 'https://raw.githubusercontent.com/mwitiderrick/stockprice/master/tatatest.csv'
#dataset_test = pd.read_csv(url)

In [23]:
#dataset_test.head()

In [24]:
#dataset_test = dataset_test[:6]

In [25]:
#dataset_test.head(10)

In [26]:
#real_stock_price = dataset_test.iloc[:, 1:2].values

In [27]:
#print(real_stock_price)

In [28]:
#dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)

In [29]:
#dataset_total.head(20)

In [30]:
#inputs = dataset_total[len(dataset_total) - len(dataset_test) - 3:].values

In [31]:
#print(inputs)

In [32]:
#inputs = inputs.reshape(-1,1)

In [33]:
#print(inputs)

In [34]:
#inputs = sc.transform(inputs)  # reuse the scaler fitted on the training data; do not refit on test inputs

In [35]:
#print(inputs)

In [36]:
#X_test = []
#for i in range(3, 9):
#    X_test.append(inputs[i-3:i, 0])

In [37]:
#print(X_test)

In [38]:
#X_test = np.array(X_test)

In [39]:
#print(X_test)

In [40]:
#X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

In [41]:
#print(X_test)

In [42]:
#predicted_stock_price = model.predict(X_test)

In [43]:
#print(predicted_stock_price)

In [44]:
#predicted_stock_price = sc.inverse_transform(predicted_stock_price)

In [45]:
#print(predicted_stock_price)

To predict on a test set (the commented-out cells above), we concatenate the training and test sets along axis 0, build input windows with the same time step used in training (60), scale them with the MinMaxScaler, and reshape them as before. After making predictions, we use inverse_transform to convert them back to actual prices. The cells below instead forecast forward from the last 60 scaled training values, feeding each prediction back in as the newest input.


In [46]:
#plt.plot(real_stock_price, color = 'black', label = 'TATA Stock Price')
#plt.plot(predicted_stock_price, color = 'green', label = 'Predicted TATA Stock Price')
#plt.title('TATA Stock Price Prediction')
#plt.xlabel('Time')
#plt.ylabel('TATA Stock Price')
#plt.legend()
#plt.show()

In [47]:
n_input = 60
n_features = 1

In [48]:
first_eval_batch = training_set_scaled[-n_input:]


In [49]:
print(first_eval_batch)

In [50]:
current_batch = first_eval_batch.reshape((1, n_input, n_features))

In [51]:
print(current_batch)

In [52]:
test_predictions = []


for i in range(60):
    
    # get the prediction for the current batch
    current_pred = model.predict(current_batch)[0]
    
    # append the prediction into the array
    test_predictions.append(current_pred) 
    
    # use the prediction to update the batch and remove the first value
    current_batch = np.append(current_batch[:,1:,:],[[current_pred]],axis=1)
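
The batch update in the loop above can be checked in isolation. The sketch below swaps the trained model for a hypothetical stand-in function (last value plus 0.1) and uses a window of 5 instead of 60, so the rolling behaviour is easy to follow. Both are assumptions made for illustration only.

```python
import numpy as np

n_input, n_features = 5, 1   # the notebook uses n_input = 60

def stand_in_predict(batch):
    # Hypothetical replacement for model.predict: last value plus 0.1,
    # returned with shape (1, 1) like a one-unit Dense output.
    return batch[:, -1, :] + 0.1

current_batch = (np.arange(n_input) / 10).reshape(1, n_input, n_features)
predictions = []
for _ in range(3):
    current_pred = stand_in_predict(current_batch)[0]      # shape (1,)
    predictions.append(current_pred)
    # drop the oldest timestep, append the new prediction as the newest
    current_batch = np.append(current_batch[:, 1:, :], [[current_pred]], axis=1)

print(current_batch.shape)                                   # (1, 5, 1)
print(np.round(np.array(predictions).ravel(), 2).tolist())   # [0.5, 0.6, 0.7]
```

Because each step consumes earlier predictions rather than observed prices, any error in one step carries into the following ones.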

In [53]:
true_predictions = sc.inverse_transform(test_predictions)


In [54]:
print(true_predictions)

In [55]:
plt.plot(true_predictions, color = 'green', label = 'Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()