# **DOCUMENTATION**



# **STOCK PRICE PREDICTION**

## **PROBLEM STATEMENT**

The goal of this project is to develop a model that can predict stock prices based on historical data and technical indicators. This prediction can be valuable for traders and investors looking to make informed decisions about buying or selling stocks.

## **DESIGN THINKING PROCESS**

**Project Inception:**

* Identified the need for a reliable stock price prediction model.
* Defined the scope and objectives of the project.

**Data Collection and Exploration:**

* Obtained historical stock price data for Microsoft (MSFT) from Kaggle.
* Explored the dataset to understand its structure, features, and potential challenges.

**Feature Engineering:**

* Created technical indicators like Moving Averages, RSI, and MACD to provide additional context for the model.

**Model Selection and Architecture:**

* Chose a combination of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers for the model architecture.

**Data Preprocessing and Scaling:**

* Preprocessed the data by filling missing values and applying Min-Max scaling to normalize the features.

**Model Training and Evaluation:**

* Split the data into training and testing sets.
* Trained the CNN-LSTM model on the training data.
* Evaluated the model's performance using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

**Model Interpretation:**

* Interpreted the model's predictions to gain insights into its behavior and performance.

**Visualization and Reporting:**

* Visualized the predicted vs. actual stock prices to assess the model's accuracy.
* Generated reports to communicate the findings.

## **DATASET SOURCE**

**Dataset used:**

MICROSOFT LIFETIME STOCKS DATASET

**Dataset Source:**

[kaggle.com](https://www.kaggle.com)

**Dataset Link:**

https://www.kaggle.com/datasets/prasoonkottarathil/microsoft-lifetime-stocks-dataset



## **PHASES OF DEVELOPMENT**

**Data Collection and Preprocessing:**

* Imported necessary libraries for data manipulation and visualization.
* Loaded and processed historical stock price data for Microsoft (MSFT).
* Applied feature engineering to create technical indicators.

**Model Development:**

* Designed a CNN-LSTM architecture for stock price prediction.
* Compiled and trained the model using the processed data.

**Model Evaluation:**

* Calculated Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) to assess model performance.

**Results and Visualization:**

* Plotted the true vs. predicted stock prices for visual inspection.
* Generated performance metrics for documentation.

## **DATA PREPROCESSING STEPS**

**Loading Data:**

* Imported necessary libraries including pandas, numpy, and matplotlib.
* Loaded historical stock price data from a CSV file.

**Date Formatting:**

* Converted the 'Date' column to datetime format and set it as the index.

**Handling Missing Values:**

* Filled missing values using the forward-fill method.
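To make the forward-fill step concrete, here is a minimal pure-Python sketch. The helper `forward_fill` and the sample values are hypothetical and exist only for illustration; the function mimics what a forward fill does on a single column, where each missing value takes the last observed value.

```python
import math

def forward_fill(values):
    """Replace each missing value (None or NaN) with the last observed value."""
    filled = []
    last_seen = None  # stays None until the first valid observation appears
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if not is_missing:
            last_seen = v
        filled.append(last_seen)
    return filled

# Hypothetical closing prices with gaps
closes = [100.0, None, None, 103.5, float("nan"), 104.0]
filled_closes = forward_fill(closes)
print(filled_closes)
```

A gap at the very start of the series has no earlier value to copy, so it stays missing.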

**Feature Engineering:**

* Created technical indicators: 50- and 200-period Moving Averages (MA), a 14-period Relative Strength Index (RSI), and Moving Average Convergence Divergence (MACD) computed from 12- and 26-period moving averages.
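As a worked example of the RSI arithmetic, the sketch below computes a single RSI value from simple averages of gains and losses, the same averaging style used in the code section. The function name `simple_rsi` and the toy prices are assumptions made for this illustration.

```python
def simple_rsi(closes, window):
    """RSI from simple averages of gains and losses over the last `window` changes."""
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 prices")
    # Price changes between consecutive closes
    deltas = [b - a for a, b in zip(closes[:-1], closes[1:])]
    recent = deltas[-window:]
    avg_gain = sum(d for d in recent if d > 0) / window
    avg_loss = sum(-d for d in recent if d < 0) / window
    if avg_loss == 0:
        return 100.0  # no losses in the window (a flat window also lands here)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Toy prices: changes are +1, -1, +2, -1
toy_closes = [10.0, 11.0, 10.0, 12.0, 11.0]
rsi_value = simple_rsi(toy_closes, window=4)
print(rsi_value)
```

With these toy prices the average gain is 0.75 and the average loss is 0.5, so RS is 1.5 and the RSI comes out at 60.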

## **MODEL TRAINING PROCESS**

**Scaling Data:**

* Applied Min-Max scaling to normalize the features.
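Min-Max scaling maps each feature column to the range [0, 1] via `(x - min) / (max - min)`. The NumPy sketch below shows the arithmetic on a made-up two-column array; `min_max_scale` is a hypothetical helper, not the scaler used in the project code.

```python
import numpy as np

def min_max_scale(features):
    """Scale each column to [0, 1] using that column's own min and max."""
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    # Use a span of 1 for constant columns to avoid dividing by zero
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (features - col_min) / span, col_min, col_max

# Hypothetical features: column 0 is a price, column 1 is a volume
prices = np.array([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])
scaled, col_min, col_max = min_max_scale(prices)
print(scaled)
```

Keeping `col_min` and `col_max` around matters because the same values are needed later to map predictions back to price units.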

**Sequence Generation:**

* Created sequences of data for input to the model with a sequence length of 14 time steps; each sequence is paired with the next step's closing price as the target.
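The sliding-window idea can be sketched on a small array to make the resulting shapes visible. The function name `make_sequences` and the toy data are assumptions for this example; the loop follows the same pattern as the code section.

```python
import numpy as np

def make_sequences(scaled, sequence_length, target_col):
    """Build (window, next-step target) pairs from a 2-D feature array."""
    X, y = [], []
    for i in range(sequence_length, len(scaled)):
        X.append(scaled[i - sequence_length:i])  # the previous `sequence_length` rows
        y.append(scaled[i, target_col])          # the value to predict at step i
    return np.array(X), np.array(y)

# Toy data: 10 time steps, 2 features; row i is [2*i, 2*i + 1]
toy = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_sequences(toy, sequence_length=3, target_col=1)
print(X.shape, y.shape)
```

Ten rows with a window of 3 yield 7 samples, so `X` has shape `(7, 3, 2)` (samples, time steps, features) and `y` has shape `(7,)`.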

**Train-Test Split:**

* Split the data into training and testing sets (80% training, 20% testing).
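For time series the split is chronological rather than shuffled, so the test set always lies after the training set in time. A minimal sketch, with the hypothetical helper `chronological_split`:

```python
def chronological_split(n_samples, train_fraction=0.8):
    """Return index ranges for an ordered (unshuffled) train/test split."""
    train_size = int(train_fraction * n_samples)
    return range(0, train_size), range(train_size, n_samples)

# Hypothetical sample count
train_idx, test_idx = chronological_split(1000)
print(len(train_idx), len(test_idx))
```

Every training index precedes every test index, which avoids training on observations that come after the ones being predicted.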

**Model Architecture:**

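* Designed a CNN-LSTM model with one Conv1D layer (64 filters, kernel size 3, ReLU activation), two LSTM layers (200 units each), and a single-unit Dense output layer.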
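It can help to check how the convolution changes the sequence length before it reaches the LSTM layers. The sketch below applies the standard output-length formula for a 1-D convolution, assuming the Keras `Conv1D` defaults of stride 1 and `valid` padding and ignoring dilation; `conv1d_output_length` is a hypothetical helper.

```python
def conv1d_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Number of time steps a 1-D convolution produces (dilation not modelled)."""
    if padding == "same":
        return -(-input_length // stride)  # ceiling division
    # 'valid' padding: the kernel must fit entirely inside the sequence
    return (input_length - kernel_size) // stride + 1

# Values from the code section: 14 time steps, kernel size 3
steps_after_conv = conv1d_output_length(14, 3)
print(steps_after_conv)
```

With 14 input steps and a kernel of size 3, the first LSTM layer receives 12 time steps, each carrying 64 filter outputs.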

**Model Compilation and Training:**

* Compiled the model with the Adam optimizer and a mean squared error loss function.
* Trained the model for 50 epochs with a batch size of 64.

## **MODEL EVALUATION**

**Prediction and Inverse Scaling:**

* Made predictions on the test set and applied inverse scaling to obtain real-world stock prices.
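Inverse scaling reverses the Min-Max formula: `x = x_scaled * (max - min) + min`. The important detail is that the min and max must come from the same column that was predicted (here, the closing price). The sketch below uses made-up bounds; with scikit-learn's `MinMaxScaler` the per-column values are available as `data_min_` and `data_max_`.

```python
import numpy as np

def inverse_scale_column(scaled_values, col_min, col_max):
    """Map values from [0, 1] back to the original units of one column."""
    return np.asarray(scaled_values, dtype=float) * (col_max - col_min) + col_min

# Hypothetical min and max of the closing price seen by the scaler
close_min, close_max = 50.0, 250.0
restored = inverse_scale_column([0.0, 0.5, 1.0], close_min, close_max)
print(restored)
```

Using another column's bounds would produce numbers in that column's units, which would make price errors meaningless.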

**Performance Metrics:**

* Calculated Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) to quantify model performance.
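Both metrics are expressed in the same units as the price. MAE averages the absolute errors, while RMSE squares the errors before averaging and so penalizes large misses more heavily. A small pure-Python sketch with made-up prices (the helpers `mae` and `rmse` are written for this example):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical true and predicted closing prices
true_prices = [100.0, 102.0, 105.0]
pred_prices = [101.0, 100.0, 105.0]
mae_value = mae(true_prices, pred_prices)
rmse_value = rmse(true_prices, pred_prices)
print(mae_value, rmse_value)
```

Here the errors are -1, 2, and 0, giving an MAE of 1.0 and an RMSE of sqrt(5/3), roughly 1.29. RMSE is never smaller than MAE, and a large gap between the two points to a few large errors.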

## **KEY FINDINGS AND INSIGHTS**

* The CNN-LSTM model demonstrates promising performance in predicting stock prices for Microsoft (MSFT).
* The inclusion of technical indicators like RSI and MACD contributes to the model's accuracy.
* The model is particularly effective in capturing short to medium-term trends.

## **RECOMMENDATIONS**

* Further fine-tuning of hyperparameters and exploration of alternative architectures may lead to improved performance.
* Consider incorporating additional features or external factors that could influence stock prices.
* Conduct thorough backtesting and validate the model's predictions against a wider range of historical data.
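One common way to structure backtesting is walk-forward validation with an expanding training window. The sketch below only generates the index ranges and is one possible scheme, not part of the project code; `walk_forward_splits` and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_range, test_range) pairs with an expanding training window."""
    test_size = (n_samples - min_train) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        # Train on everything before train_end, test on the block right after it
        yield range(0, train_end), range(train_end, test_end)

# Hypothetical setup: 100 samples, 4 folds, at least 60 training samples
splits = list(walk_forward_splits(100, n_folds=4, min_train=60))
for train_range, test_range in splits:
    print(len(train_range), test_range.start, test_range.stop)
```

Each fold would retrain the model on the training range and score it on the following block, giving several out-of-sample error estimates instead of one.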

## **CODE**

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Conv1D


#ADDING TECHNICAL INDICATORS

# Define a function to calculate technical indicators
def add_technical_indicators(data):
    # Moving Average (MA)
    data['MA_50'] = data['Close'].rolling(window=50).mean()
    data['MA_200'] = data['Close'].rolling(window=200).mean()

    # Relative Strength Index (RSI)
    delta = data['Close'].diff(1)
    gain = (delta.where(delta > 0, 0)).fillna(0)
    loss = (-delta.where(delta < 0, 0)).fillna(0)
    avg_gain = gain.rolling(window=14).mean()
    avg_loss = loss.rolling(window=14).mean()
    rs = avg_gain / avg_loss
    data['RSI'] = 100 - (100 / (1 + rs))

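    # Moving Average Convergence Divergence (MACD)
    # Note: computed here from 12- and 26-period simple moving averages;
    # the conventional MACD uses exponential moving averages instead.
    data['Short_MA'] = data['Close'].rolling(window=12).mean()
    data['Long_MA'] = data['Close'].rolling(window=26).mean()
    data['MACD'] = data['Short_MA'] - data['Long_MA']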

    return data.dropna()



#DATA COLLECTION

data = pd.read_csv('/content/MSFT.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)

data.info()
print(data.shape)
data.ffill(inplace=True)  # forward-fill missing values (fillna(method='ffill') is deprecated)



#DEFINE CNN-LSTM MODEL

# Apply feature engineering
data = add_technical_indicators(data)

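# Scale every column to [0, 1]. The scaler is fitted on the full dataset,
# so the test period's value range influences the scaling of the training data.
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)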

sequence_length = 14
X, y = [], []
for i in range(sequence_length, len(scaled_data)):
    X.append(scaled_data[i - sequence_length:i])
    y.append(scaled_data[i, 3])  # Using 'close price' as the target

X, y = np.array(X), np.array(y)

train_size = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:train_size], X[train_size:], y[:train_size], y[train_size:]

# X_train and X_test already have shape (samples, time steps, features), as Conv1D expects

model_cnn_lstm = Sequential()
model_cnn_lstm.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
model_cnn_lstm.add(LSTM(200, return_sequences=True))
model_cnn_lstm.add(LSTM(200))
model_cnn_lstm.add(Dense(1))
model_cnn_lstm.compile(optimizer='adam', loss='mean_squared_error')



#MODEL TRAINING AND PREDICTION

model_cnn_lstm.fit(X_train, y_train, batch_size=64, epochs=50, verbose=1)

y_pred_cnn_lstm = model_cnn_lstm.predict(X_test)



#MODEL EVALUATION

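# Inverse-scale via the 'Close' column so the scaler applies that column's min and max
# (placing the values in the last column would rescale them with MACD's range).
close_idx = data.columns.get_loc('Close')
padded_pred = np.zeros((len(y_pred_cnn_lstm), scaled_data.shape[1]))
padded_pred[:, close_idx] = y_pred_cnn_lstm.ravel()
padded_test = np.zeros((len(y_test), scaled_data.shape[1]))
padded_test[:, close_idx] = y_test

y_pred_close_cnn_lstm = scaler.inverse_transform(padded_pred)[:, close_idx]
y_test_close = scaler.inverse_transform(padded_test)[:, close_idx]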

mae_cnn_lstm = mean_absolute_error(y_test_close, y_pred_close_cnn_lstm)
rmse_cnn_lstm = np.sqrt(mean_squared_error(y_test_close, y_pred_close_cnn_lstm))

print("CNN-LSTM Model:")
print(f"Mean Absolute Error: {mae_cnn_lstm}")
print(f"Root Mean Squared Error: {rmse_cnn_lstm}")



#VISUALIZATION

plt.figure(figsize=(12, 6))
plt.plot(data.index[-len(y_test):], y_test_close, label='True Close Price', color='blue')
plt.plot(data.index[-len(y_test):], y_pred_close_cnn_lstm, label='Predicted Close Price (CNN-LSTM)', color='red')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.title('Stock Price Prediction using CNN-LSTM')
plt.legend()
plt.show()