**The AutoRegressive Integrated Moving Average (ARIMA) model**

The AutoRegressive Integrated Moving Average (ARIMA) model is a well-known and widely used method for time-series forecasting. ARIMA models can capture a range of standard temporal structures in time-series data.

**Terminology** : 
Let’s break down these terms:



*   **AR: < Auto Regressive >** means that the model uses the dependent relationship between an observation and a predefined number of lagged observations (also known as “time lags” or “lags”).
*   **I: < Integrated >** means that the model differences the raw observations (i.e. it subtracts the observation at the previous time step from the current observation) in order to make the time series stationary.
*   **MA: < Moving Average >** means that the model uses the dependency between an observation and the residual errors of a moving average model applied to lagged observations.
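
To make the "Integrated" step concrete, here is a minimal sketch of differencing with NumPy. The toy series is made up for illustration and is not the stock data used later.

```python
import numpy as np

# Toy series with an accelerating upward trend (made-up values): its mean
# drifts over time, so it is not stationary.
series = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

# First-order differencing (d = 1): each value minus the one before it.
diff1 = np.diff(series, n=1)

# Second-order differencing (d = 2): difference the differenced series again.
diff2 = np.diff(series, n=2)

# Differencing is invertible: the first value plus a cumulative sum of the
# differences recovers the original series.
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diff1)))

print(diff1)     # [2. 3. 4. 5. 6.]
print(diff2)     # [1. 1. 1. 1.]
print(restored)
```

In this toy case one round of differencing still leaves a trend in the differences, and a second round flattens it, which is the kind of situation where a larger degree of differencing would be considered.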

**Model parameters**

The standard ARIMA model takes three parameters: p, d and q.

* p is the number of lag observations included in the model (the lag order).
* d is the degree of differencing, i.e. the number of times the raw observations are differenced.
* q is the size (width) of the moving average window.
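
Written out, the three parameters combine as follows. This is the standard textbook form; the symbols $c$, $\phi_i$, $\theta_j$, $\varepsilon_t$ and the backshift operator $B$ (defined by $B y_t = y_{t-1}$) are notation introduced here for illustration.

$$
y'_t = (1 - B)^d \, y_t, \qquad
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

For the order (4, 1, 0) used in the fitting cell further down, the series is differenced once, four lagged values enter the autoregressive sum, and the moving average sum is empty.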

In [None]:
# Import the required libraries

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; use the current module path
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

In [None]:
# Read the data
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Time Series/TATAMOTORS.NS.csv")
df.head(5)

**About the Data**


*   Stock Name: **Tata Motors Limited (TATAMOTORS.NS)**
*   Period: 5 years (May 31, 2018 - May 31, 2023)
*   Attributes: Date, Open, High, Low, Close, Adj Close and Volume
*   Target: Our target variable will be the **Close** value





In [None]:
# Let’s see if there is some autocorrelation in our data.
plt.figure()
lag_plot(df['Close'], lag=3)  # use the target column ('Close') for consistency with the model
plt.title('Tata Motors Limited Stock - Autocorrelation plot with lag = 3')
plt.show()

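**There is autocorrelation in the data** (in a lag plot, points that fall along the diagonal indicate that each value is strongly related to the value 3 steps earlier).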
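
The visual impression from a lag plot can also be checked numerically. The sketch below computes the lag-k autocorrelation as a plain Pearson correlation between a series and its shifted copy. The helper name `lag_autocorrelation` and both synthetic series are assumptions made for illustration; they are not part of the notebook's data.

```python
import numpy as np

def lag_autocorrelation(values, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps (lag >= 1)."""
    values = np.asarray(values, dtype=float)
    return np.corrcoef(values[:-lag], values[lag:])[0, 1]

rng = np.random.default_rng(0)

# Synthetic stand-ins, not the Tata Motors data:
# a random walk with drift (price-like) and pure white noise.
random_walk = 100 + np.cumsum(rng.normal(loc=0.1, scale=1.0, size=1000))
white_noise = rng.normal(size=1000)

print(lag_autocorrelation(random_walk, lag=3))  # close to 1 for a price-like series
print(lag_autocorrelation(white_noise, lag=3))  # close to 0 for uncorrelated noise
```

With the notebook's data, the same helper could be called as `lag_autocorrelation(df['Close'].values, lag=3)`.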

In [None]:
# let’s plot the stock price evolution over time.
plt.figure(figsize = (15,5))
plt.plot(df["Date"], df["Close"])
plt.xticks(np.arange(0, len(df), 200), df['Date'].iloc[::200])  # label every 200th row, whatever the length
plt.title("Tata Motors Limited stock price over time")
plt.xlabel("time")
plt.ylabel("price")
plt.show()

In [None]:
# Preprocess the data
df['Date'] = pd.to_datetime(df['Date'])  # Convert 'Date' column to datetime format
df = df.dropna()  # Drop rows with missing values or use appropriate technique to handle them

In [None]:
# Fit the ARIMA model with walk-forward validation
import statsmodels.api as sm
# Chronological 70/30 split (time-series data must not be shuffled)
train_data, test_data = df[0:int(len(df)*0.7)], df[int(len(df)*0.7):]
training_data = train_data['Close'].values
test_data = test_data['Close'].values
history = [x for x in training_data]
model_predictions = []
N_test_observations = len(test_data)
for time_point in range(N_test_observations):
    # Refit on everything observed so far and forecast one step ahead
    model = sm.tsa.arima.ARIMA(history, order=(4,1,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    model_predictions.append(yhat)
    # Reveal the true value so the next forecast can use it
    true_test_value = test_data[time_point]
    history.append(true_test_value)
MSE_error = mean_squared_error(test_data, model_predictions)
print('Testing Mean Squared Error is {}'.format(MSE_error))

The MSE on the test set is quite large, which indicates that precise prediction is a hard problem. Keep in mind, however, that MSE is the average of the squared errors over all test-set predictions, so it is expressed in squared price units and is dominated by the largest errors.

Let’s visualize the predictions to better understand the performance of the model.
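
As a small worked example of how these error measures relate, the sketch below computes MSE, RMSE and MAE by hand. The four prices are made up for illustration.

```python
import numpy as np

# Made-up prices for illustration only.
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 100.0, 103.0, 104.0])

errors = actual - predicted          # [-1.  2. -2.  1.]
mse = np.mean(errors ** 2)           # (1 + 4 + 4 + 1) / 4 = 2.5, in squared price units
rmse = np.sqrt(mse)                  # about 1.58, back in price units
mae = np.mean(np.abs(errors))        # 1.5, less sensitive to large errors

print(mse, rmse, mae)
```

Taking the square root of the MSE gives a value in the same units as the price, which is usually easier to interpret.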

In [None]:
split_index = int(len(df)*0.7)
test_set_range = df[split_index:].index
plt.figure(figsize = (30,5))
plt.plot(test_set_range, model_predictions, color='black', marker='+', linestyle='dashed',label='Predicted Price')
plt.plot(test_set_range, test_data, color='red', label='Actual Price')
plt.title('Tata Motors Limited Prices Prediction')
plt.xlabel('Date')
plt.ylabel('Prices')
# Label every 50th test observation with its date instead of hard-coded row numbers
plt.xticks(test_set_range[::50], df['Date'].iloc[split_index::50].dt.strftime('%Y-%m-%d'))
plt.legend()
plt.show()
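
To put the ARIMA error in context, it can help to compare it with a naive persistence baseline that simply repeats the last observed price. The sketch below runs the same walk-forward procedure as the ARIMA loop above; the helper name `walk_forward_persistence` is hypothetical and the series is a toy example.

```python
import numpy as np

def walk_forward_persistence(values, train_fraction=0.7):
    """Walk-forward evaluation of a persistence forecast (hypothetical helper).

    At every test step the forecast is the most recent observed value;
    the true value is then appended to the history, as in the ARIMA loop.
    """
    values = np.asarray(values, dtype=float)
    split = int(len(values) * train_fraction)
    history = list(values[:split])
    test = values[split:]
    predictions = []
    for true_value in test:
        predictions.append(history[-1])   # forecast = last known value
        history.append(true_value)        # reveal the true value
    predictions = np.array(predictions)
    mse = np.mean((test - predictions) ** 2)
    return predictions, mse

# Toy series for illustration; with the notebook's data one would pass df['Close'].values.
preds, mse = walk_forward_persistence(np.arange(1.0, 11.0))
print(preds, mse)  # [7. 8. 9.] 1.0
```

If the ARIMA MSE is not clearly below the baseline's MSE on the same test set, the model adds little over repeating the previous day's price.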

**Deep Learning Model**

In [None]:
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

In [None]:
df.set_index('Date', inplace=True)  # Set 'Date' column as the index
data = df['Close'].values  # Extract the 'Close' column as the target variable

In [None]:
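# Normalize the data (fit the scaler on the training portion only, so no test-set information leaks into the scaling):
train_size = int(len(data) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(data[:train_size].reshape(-1, 1))
scaled_data = scaler.transform(data.reshape(-1, 1))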
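
To make explicit what `MinMaxScaler` computes with `feature_range=(0, 1)`, the sketch below applies the same formula by hand with NumPy and then inverts it. The prices are made up for illustration.

```python
import numpy as np

# Made-up closing prices for illustration.
prices = np.array([200.0, 250.0, 300.0, 400.0])

# Min-max scaling to [0, 1]: x_scaled = (x - min) / (max - min)
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)

# Inverse transform: x = x_scaled * (max - min) + min
restored = scaled * (hi - lo) + lo

print(scaled)    # [0.   0.25 0.5  1.  ]
print(restored)  # [200. 250. 300. 400.]
```

The inverse step is what allows the network's scaled predictions to be converted back into prices later on.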

In [None]:
# Split the data into training and testing sets:

train_size = int(len(scaled_data) * 0.8)
train_data = scaled_data[:train_size]
test_data = scaled_data[train_size:]

In [None]:
# Create input sequences:

def create_sequences(data, seq_length):
    X = []
    y = []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

seq_length = 10  # Number of previous time steps to use as input
X_train, y_train = create_sequences(train_data, seq_length)
X_test, y_test = create_sequences(test_data, seq_length)
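
To check the shapes that the windowing step produces, here is a loop-free sketch of the same idea using NumPy's `sliding_window_view`. The function name `create_sequences_vectorized` and the toy array are assumptions made for illustration; it is an alternative way to build the windows, not the notebook's own function.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_vectorized(data, seq_length):
    """Loop-free variant of create_sequences (hypothetical name) for an (n, 1) array."""
    data = np.asarray(data)
    # All windows of length seq_length over the time axis:
    # shape (n - seq_length + 1, seq_length)
    windows = sliding_window_view(data[:, 0], seq_length)
    X = windows[:-1, :, np.newaxis]   # drop the last window (it has no following target)
    y = data[seq_length:]             # the value right after each window
    return X, y

toy = np.arange(15, dtype=float).reshape(-1, 1)   # stand-in for scaled prices
X, y = create_sequences_vectorized(toy, seq_length=10)
print(X.shape, y.shape)   # (5, 10, 1) (5, 1)
print(X[0, :, 0], y[0])   # first window holds 0..9, its target is 10
```

Each sample therefore has shape `(seq_length, 1)`, which matches the `input_shape` expected by the first LSTM layer. Note that `X` here is a read-only view; `X.copy()` gives a writable array if one is needed.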

In [None]:
# Build the LSTM model:

model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(seq_length, 1)))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10, batch_size=32)

In [None]:
# Make predictions:

y_pred = model.predict(X_test)

In [None]:
# Inverse the scaling:

y_pred = scaler.inverse_transform(y_pred)
y_test = scaler.inverse_transform(y_test)

In [None]:
# Evaluate the model:

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse}")

In [None]:
import matplotlib.pyplot as plt

# Convert the dates to the appropriate format
test_dates = pd.to_datetime(df.iloc[train_size+seq_length:].index)

In [None]:
# Plot the actual and predicted prices
plt.figure(figsize=(15, 5))
plt.plot(test_dates, y_test, color='red', label='Actual Price')
plt.plot(test_dates, y_pred, color='black', linestyle='dashed', label='Predicted Price')
plt.title('Tata Motors Limited Prices Prediction')
plt.xlabel('Date')
plt.ylabel('Prices')
plt.xticks(rotation=45)
plt.legend()
plt.show()