Internship on "Data analyst" at Jobaaj
DS_09: Project - Stock Market Prediction


Name: Pranita GG

**Problem Statement:**

The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from the NSE (National Stock Exchange) of India. All datasets are at day level, with pricing and trading values split across .csv files for each stock, along with a metadata file containing some macro-information about the stocks themselves.

# TATAMOTORS Stock Prediction

![](https://imageio.forbes.com/specials-images/imageserve/62bcbde698c96e32f370f112/Digitally-enhanced-shot-of-a-graph-showing-the-ups-and-downs-shares-on-the-stock/0x0.jpg?format=jpg&crop=1854,1042,x0,y0,safe&width=960)
[Img Source](https://www.forbes.com/sites/sergeiklebnikov/2022/06/29/stocks-are-crashing-but-history-shows-this-bear-market-could-recover-faster-than-others/?sh=224302415cc2)

# Introduction
* The dataset contains NIFTY 50 stock prices from 1 January 2000 to 30 April 2021.
* Through EDA, we analyse and understand the stock market dataset using visualizations.

# Using ML Models for stock prediction

# Imports

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
# pandas and numpy are already imported in the previous cell
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

plt.style.use('fivethirtyeight')
color_pal = sns.color_palette()


# Collection of Dataset

In [None]:
df=pd.read_csv('/kaggle/input/nifty50-stock-market-data/TATAMOTORS.csv')
df.head()

# Data preparation

In [None]:
df.describe()

In [None]:
df.shape

In [None]:
df.dtypes

# Exploratory Data Analysis (EDA)

> Exploratory data analysis is a great way of understanding and analyzing the data sets. The EDA technique is extensively used by data scientists and data analysts to summarize the main characteristics of data sets and to visualize them through different graphs and plots. It helps data scientists to search for patterns, spot anomalies, or check assumptions.

>EDA ensures that results are valid and applicable as per the business goals. Once the EDA task is completed, its features can be used for efficient and better data analysis, modelling, and machine learning.

In [None]:
df.isnull().sum()

>The output above lists the number of null values in each column. Any column that contains nulls is dropped in the next cell.

In [None]:
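# Drop every column that contains at least one null value (axis=1 means columns, not rows)
df.dropna(axis=1, inplace=True)
# Confirm that no null values remain
df.isna().sum()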
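
The difference between dropping columns and dropping rows can be seen on a toy frame. This is a minimal sketch with made-up values; the column name `Trades` and the variable names are only illustrative and are not taken from the notebook's output.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one partially missing column.
toy = pd.DataFrame({
    "Close": [100.0, 101.5, 99.8],
    "Trades": [np.nan, 1200.0, 1350.0],
})

null_counts = toy.isnull().sum()      # per-column count of missing values
dropped_cols = toy.dropna(axis=1)     # removes the whole 'Trades' column
dropped_rows = toy.dropna(axis=0)     # removes only the row with the NaN

print(null_counts)
print(dropped_cols.columns.tolist())
print(dropped_rows.shape)
```

With `axis=1`, a single missing value is enough to discard an entire column, so it is worth checking the null counts before choosing between the two.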

In [None]:
plt.figure(figsize=(15,5))
plt.plot(df['Close'])
plt.title('TataMotors Close price.', fontsize=15)
plt.ylabel('Price')
plt.show()

In [None]:
fig,ax = plt.subplots(figsize=(15,5))
df.plot(ax=ax,x='Date',y='High',color=color_pal[2])
df.plot(ax=ax,x='Date',y='Close',color=color_pal[0])
plt.show()

In [None]:
plt.figure(figsize=(7,5))
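# numeric_only=True (pandas >= 1.5) skips text columns such as Date, Symbol and Series,
# which would otherwise raise an error on pandas 2.x
sns.heatmap(df.corr(numeric_only=True),cmap='Blues',annot=True)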

In [None]:
df['open-high'] = df['Open']-df['High']
df['open-low'] = df['Open'] - df['Low']
df['close-high'] = df['Close']-df['High']
df['close-low'] = df['Close'] - df['Low']
df['high-low'] = df['High'] - df['Low']
df['open-close'] = df['Open'] - df['Close']
df.head()

In [None]:
data2 = df.copy()
data2 = data2.drop(['Open','High','Low','Last', 'Close'],axis=1)
plt.figure(figsize=(8,6))
# numeric_only=True (pandas >= 1.5) skips the remaining text columns
sns.heatmap(data2.corr(numeric_only=True),cmap='Blues',annot=True)

In [None]:
from sklearn.preprocessing import LabelEncoder

# create a LabelEncoder object
le = LabelEncoder()

# fit and transform the column(s) to be encoded
df['Symbol'] = le.fit_transform(df['Symbol'])

In [None]:
# Encode the Series column from its own values (not from the already encoded Symbol column)
df['Series'] = le.fit_transform(df['Series'])
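
To see what `LabelEncoder` produces, here is a minimal sketch on a toy column. The symbol values are hypothetical stand-ins rather than values read from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical text labels standing in for a column such as 'Symbol'.
symbols = pd.Series(["TELCO", "TELCO", "TATAMOTORS"])

encoder = LabelEncoder()
codes = encoder.fit_transform(symbols)   # classes are sorted, then numbered from 0

print(list(encoder.classes_))
print(codes.tolist())
print(encoder.inverse_transform(codes).tolist())
```

The integer codes carry no numeric meaning beyond identifying the class, and a column with a single distinct value encodes to a constant that adds no information to a model.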

In [None]:
# Convert date column to datetime object
df['Date'] = pd.to_datetime(df['Date'])

# Extract year, month, and day into separate columns
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day

In [None]:
df.drop('Date', axis = 1, inplace = True)

In [None]:
df.head()

In [None]:
df.columns

In [None]:
columns = ['Prev Close', 'Open', 'High', 'Low', 'Last',
       'Close', 'VWAP', 'Volume', 'Turnover', 'open-high', 'open-low',
       'close-high', 'close-low', 'high-low', 'open-close', 'year', 'month',
       'day']

In [None]:
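for column in columns:
    plt.figure(figsize = (25,6))
    # histplot draws on the figure created above; displot is figure-level
    # and would open its own figure, leaving this one empty
    sns.histplot(df[column])
    plt.xticks(rotation=90)
    plt.show()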

In [None]:
for column in columns:
    plt.figure(figsize = (25,6))
    sns.boxplot(x = df[column])
    plt.xticks(rotation=90)
    plt.show()

In [None]:
sns.pairplot(df,height = 5 ,kind ='scatter',diag_kind='kde')

# Building & Evaluating Machine learning Model

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

In [None]:
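# Features: every column except the target.
# Note: X still contains columns derived from Close (for example 'close-low',
# 'close-high' and 'open-close'), so the target leaks into the features.
X = df.drop('Close', axis=1)
# Target: the closing price
y = df['Close']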

In [None]:
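# Split data into training and testing sets (random 80/20 split).
# Rows are shuffled, so the chronological order of the trading days is not preserved.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)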
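
Because each row is one trading day, a random split places future days in the training set. A chronological split is a common alternative. The sketch below uses a small synthetic series; the frame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the stock data.
dates = pd.date_range("2021-01-01", periods=10, freq="D")
toy = pd.DataFrame({"Date": dates, "Close": np.arange(10, dtype=float)})

# Sort by date, then cut once: the first 80% of days train, the rest test.
toy = toy.sort_values("Date").reset_index(drop=True)
split_idx = int(len(toy) * 0.8)
train, test = toy.iloc[:split_idx], toy.iloc[split_idx:]

# Every training date precedes every test date.
print(train["Date"].max() < test["Date"].min())
```

On data that is already sorted by date, `train_test_split(..., shuffle=False)` gives the same order-preserving behaviour.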

In [None]:
model = LinearRegression()

In [None]:
# Train the Linear Regression Model
model.fit(X_train, y_train)

# Predictions

In [None]:
preds = model.predict(X_test)

In [None]:
from sklearn.metrics import r2_score

# Evaluate the model with the R² score (coefficient of determination).
# This is a regression metric, not a classification accuracy.
r2 = r2_score(y_test, preds)
print("R2 score:", r2)

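**R² score: 1.0 (100%)**

>This value is the coefficient of determination, not a classification accuracy. It is perfect because the feature set still contains columns derived from the target: `Low` plus `close-low` equals `Close` exactly, so the model reconstructs the closing price rather than predicting it.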
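
One way to sanity-check a perfect score is to test for target leakage. The following sketch runs on synthetic prices, not the notebook's data. It fits one model with `Low` and `close-low` as features, whose sum equals `Close`, and a second model that sees only the previous day's values with a chronological split. All names other than the column labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Synthetic prices (assumption): Close is a random walk, Low sits slightly below it.
close = 100 + np.cumsum(rng.normal(0, 1, 300))
low = close - rng.uniform(0.1, 2.0, 300)
toy = pd.DataFrame({"Low": low, "Close": close})
toy["close-low"] = toy["Close"] - toy["Low"]

# Leaky setup: Low + close-low reproduces Close exactly.
X_leaky = toy[["Low", "close-low"]]
leaky_model = LinearRegression().fit(X_leaky, toy["Close"])
leaky_r2 = r2_score(toy["Close"], leaky_model.predict(X_leaky))

# Leak-free setup: predict today's Close from yesterday's values only.
lagged = toy.shift(1).add_prefix("prev_")
data = pd.concat([lagged, toy["Close"]], axis=1).dropna()
split = int(len(data) * 0.8)                      # chronological 80/20 split
X_tr = data.iloc[:split].drop(columns="Close")
X_te = data.iloc[split:].drop(columns="Close")
y_tr, y_te = data["Close"].iloc[:split], data["Close"].iloc[split:]
lagged_model = LinearRegression().fit(X_tr, y_tr)
lagged_r2 = r2_score(y_te, lagged_model.predict(X_te))

print(f"leaky R2:  {leaky_r2:.4f}")
print(f"lagged R2: {lagged_r2:.4f}")
```

A score that drops once same-day derived columns are removed is a more realistic estimate of how well the model predicts prices.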