<a href="https://colab.research.google.com/github/oceanx22/stock_prediction/blob/main/stock_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **GOOGLE Stock Price Prediction**

## Linear Regression Model

Importing Libraries

In [1]:
#Importing the necessary python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import datetime
# For preprocessing of data
from sklearn.preprocessing import MinMaxScaler,StandardScaler
# For building the Regression Model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# For Model Evaluation
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

**COLLECTING AND LOADING DATA**

In [5]:
# Importing the csv file
df = pd.read_csv('GOOGL.csv')


**UNDERSTANDING DATA**

In [None]:
# Viewing the file headers to derive a primary meaning of the data
df.head()

**EXPLORATORY DATA ANALYSIS**

In [6]:
df.shape


In [None]:
df.info()

In [None]:
df.rename(columns={'Close(t)':'Close'}, inplace=True)
df.head()

The Date column is of type 'object' (plain strings), so it needs to be converted to a datetime type before it can be used for time series work.

In [None]:
# Convert the 'Date' column from strings to datetime using pd.to_datetime
df['Date'] = pd.to_datetime(df['Date'])
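
To illustrate the cast, here is a minimal sketch on a made-up three-row frame. The `sample` frame and its values are hypothetical and not taken from GOOGL.csv. Once the column is a datetime type, chronological sorting and the `.dt` accessor become available.

```python
import pandas as pd

# Hypothetical stand-in for the real data; values are invented for illustration
sample = pd.DataFrame({
    "Date": ["2020-01-06", "2020-01-02", "2020-01-03"],
    "Close": [102.0, 100.0, 101.0],
})

# Convert the string column to a datetime column
sample["Date"] = pd.to_datetime(sample["Date"])

# Datetime columns sort chronologically and expose the .dt accessor
sample = sample.sort_values("Date").reset_index(drop=True)
years = sample["Date"].dt.year.tolist()
print(sample.dtypes)
```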

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.columns

## **PLOT TIME SERIES CHART FOR GOOGLE**
### **Based on the Close Price of the Stock**

In [None]:
#Visualizing Close Price data
plt.figure(figsize=(15, 7))
sns.lineplot(data=df, x='Date', y='Close')
plt.title("Google Stock Price", fontsize=20)
plt.ylabel('Close Price in USD($) ', fontsize=14)
plt.xlabel('Years', fontsize=14)
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

In [None]:
#Visualizing the Volume of trade data, with the dates on the x-axis
df.plot(x='Date', y='Volume', figsize=(20, 5), legend=False)
plt.title('Volume Traded', fontsize=20)
plt.ylabel('Volume', fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.show()

## **PREPROCESSING THE DATA**

In [None]:
df.dtypes

In [None]:
df.isna().sum()

In [None]:
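# numeric_only=True skips non-numeric columns, which recent pandas versions no longer drop silently
df.corr(numeric_only=True)['Close'].sort_values(ascending=False).head(10)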

#### **TRAIN TEST SPLIT**

Close_forcast is the column we are trying to predict here: the closing price for the next day.
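
If `Close_forcast` really is the next day's close, a column like it could be derived from `Close` with `shift(-1)`. The sketch below rests on that assumption about how the dataset was built. The `toy` frame and the `Next_close` name are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices on four consecutive trading days
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.5, 12.0]})

# shift(-1) moves each value up one row, so every row sees tomorrow's close
toy["Next_close"] = toy["Close"].shift(-1)

# The last row has no "tomorrow", so its target is NaN and is dropped
toy = toy.dropna(subset=["Next_close"])
print(toy)
```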

In [None]:
# Set the 'Date' column as the index
df.set_index('Date', inplace=True)
# Build the feature matrix X from the dates in the index:
# - np.array(df.index) converts the DatetimeIndex into a NumPy datetime64 array.
# - .astype(float) turns each date into a number (its offset from the Unix epoch,
#   in the index's time unit), because scikit-learn needs numeric features.
# - .reshape(-1, 1) gives one column; -1 lets NumPy infer the number of rows.
# X is the single independent variable: the model predicts the price from the date alone.
X = np.array(df.index).astype(float).reshape(-1, 1)
y = df['Close_forcast']
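
scikit-learn needs numeric features, so the dates have to be encoded as numbers at some point. One readable encoding, offered here as an alternative sketch, is the number of days since the first date, which keeps the values small. The `dates` index below is hypothetical.

```python
import pandas as pd

# Hypothetical trading dates (note the weekend gap before the last one)
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])

# Subtracting the first date gives a TimedeltaIndex; .days extracts whole days
days_since_start = (dates - dates[0]).days

# One column, as scikit-learn expects a 2-D feature matrix
X_days = days_since_start.to_numpy().reshape(-1, 1)
print(X_days)
```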

In [None]:
# Splitting the data into train and test set
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=5)
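
Because the rows are ordered in time, a random split places training dates on both sides of most test dates. A possible alternative, sketched here on hypothetical data, is a chronological split with `shuffle=False`, which keeps the last 30% of rows as the test set. This is a suggestion, not what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: 10 time steps and a matching target
X_demo = np.arange(10).reshape(-1, 1)
y_demo = X_demo.ravel() * 2.0

# shuffle=False preserves row order, so the test set is the most recent 30%
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=False
)
print(X_tr.ravel(), X_te.ravel())
```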

In [None]:
X.shape

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
y_test.shape

**Scaling the train set features**

In [None]:
mm = MinMaxScaler()
ss = StandardScaler()

#Feature Scaling - standardization rescales each feature to zero mean and unit variance (it does not make the values normally distributed)
ss.fit(X_train)
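
Fitting a scaler only learns the mean and standard deviation; the data changes only when `transform` is called. The sketch below, on hypothetical arrays, fits on the training set alone and then applies the same statistics to both sets.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature train and test sets
X_train_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test_demo = np.array([[5.0]])

# Learn mean and standard deviation from the training data only
scaler = StandardScaler().fit(X_train_demo)

# Apply the training statistics to both sets
X_train_scaled = scaler.transform(X_train_demo)
X_test_scaled = scaler.transform(X_test_demo)
print(X_train_scaled.mean(), X_train_scaled.std())
```

The test set is scaled with the training mean and standard deviation, so its values need not have zero mean themselves.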

**BUILDING LINEAR REGRESSION MODEL**

In [None]:
# Creating an instance of the LinearRegression class and training it
lr = LinearRegression()
lr.fit(X_train,y_train)
# y_pred = lr.predict(X_test)

In [None]:
type(X_test)

In [None]:
# Predict the stock prices
X_test = X_test.astype(float)
y_pred = lr.predict(X_test)

In [None]:
y_test

In [None]:
y_pred_df = pd.DataFrame(y_pred)
y_pred_df

In [None]:
y_test_df=pd.DataFrame(y_test)
y_test_df

In [None]:
# Create a new DataFrame for visualization
df_pred = pd.DataFrame({'Actual': y_test_df['Close_forcast'], 'Predicted': y_pred}, index=y_test.index)

In [None]:
df_pred = df_pred.reset_index()
df_pred

In [None]:
df_pred.dtypes

In [None]:
# Sorting the df_pred dataframe according to 'Date'
df_pred_sorted = df_pred.sort_values('Date')
df_pred_sorted

**PLOT PREDICTED vs ACTUAL PRICE ON TIME SERIES PLOT FOR GOOGLE**

In [None]:
# Plot the actual and predicted stock prices
plt.figure(figsize=(12, 6))
plt.plot(df_pred_sorted['Date'], df_pred_sorted['Actual'], label='Actual')
plt.plot(df_pred_sorted['Date'], df_pred_sorted['Predicted'], label='Predicted')
plt.title('Actual vs Predicted Stock Prices - GOOGLE')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend(['Actual', 'Predicted'])
plt.show()

**MODEL EVALUATION**

In [None]:
# Calculating the MAE, MSE and RMSE for the model
print(mean_absolute_error(y_test,y_pred))
print(mean_squared_error(y_test,y_pred))
np.sqrt(mean_squared_error(y_test,y_pred))
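
As a check on what the three numbers mean, the sketch below computes them by hand on a hypothetical three-point example and compares the result with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 10.0])

errors = y_hat - y_true               # [-1, 0, 3]
mae = np.mean(np.abs(errors))         # average absolute error
mse = np.mean(errors ** 2)            # average squared error
rmse = np.sqrt(mse)                   # back in the units of the target

print(mae, mse, rmse)
print(mean_absolute_error(y_true, y_hat), mean_squared_error(y_true, y_hat))
```

RMSE is expressed in the same units as the price, which makes it easier to interpret than MSE.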

**DECISION TREE REGRESSION**

In [None]:
from sklearn.model_selection import cross_val_score

from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor()
tree_reg.fit(X_train, y_train)

In [None]:
y_pred_tree = tree_reg.predict(X_test)

In [None]:
mean_squared_error(y_test, y_pred_tree)

In [None]:
#An unconstrained decision tree tends to overfit, so estimate its error with cross-validation
#cv=8 => dividing into 8 folds
rmses = np.sqrt(-cross_val_score(tree_reg, X_train, y_train, cv=8, scoring='neg_mean_squared_error'))

In [None]:
#rmse of DecisionTreeRegressor
rmses
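
The minus sign in the cross-validation cell is needed because scikit-learn scorers follow a "higher is better" convention, so `neg_mean_squared_error` returns the MSE with its sign flipped. A small sketch on synthetic data (all names and values here are hypothetical) makes that visible.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy straight line
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(40, 1))
y_demo = 3.0 * X_demo.ravel() + rng.normal(scale=1.0, size=40)

# One score per fold; each is the negated MSE on that fold
neg_mse = cross_val_score(
    DecisionTreeRegressor(random_state=0),
    X_demo, y_demo, cv=8, scoring="neg_mean_squared_error",
)

# Flip the sign, then take the square root to get the per-fold RMSE
fold_rmse = np.sqrt(-neg_mse)
print(fold_rmse.mean(), fold_rmse.std())
```

Reporting the mean and spread across folds gives a steadier estimate than a single train/test score.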

**RANDOM FOREST REGRESSOR**

In [None]:
from sklearn.ensemble import RandomForestRegressor
rand_reg = RandomForestRegressor()
rmses = np.sqrt(-cross_val_score(rand_reg, X_train, y_train, cv=8, scoring='neg_mean_squared_error'))

In [None]:
#rmse of RandomForestRegressor
rmses

In [None]:
rand_reg.fit(X_train, y_train)
y_pred_rf = rand_reg.predict(X_test)
mean_squared_error(y_test, y_pred_rf)

Of the three models, the RandomForestRegressor gives the lowest Root Mean Squared Error.
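
A comparison like this is easier to read when every model is scored the same way. The sketch below runs the same 8-fold cross-validation for all three estimators on synthetic data and prints the mean RMSE of each. The data, the `models` dictionary and the reduced `n_estimators` are assumptions made for illustration, so the ranking it prints says nothing about the GOOGL results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the real features and target
rng = np.random.default_rng(0)
X_demo = np.arange(80, dtype=float).reshape(-1, 1)
y_demo = 0.5 * X_demo.ravel() + rng.normal(scale=2.0, size=80)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=20, random_state=0),
}

# Mean cross-validated RMSE for each model, computed with identical folds
mean_rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=8,
                             scoring="neg_mean_squared_error")
    mean_rmse[name] = float(np.sqrt(-scores).mean())

# Print from lowest to highest error
for name, value in sorted(mean_rmse.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f}")
```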