## **Stock Market Prediction using Numerical and Textual Analysis**


* Objective: Create a hybrid model for stock price/performance prediction using numerical analysis of historical stock prices and sentiment analysis of news headlines.
* Stock to analyze and predict - SENSEX (S&P BSE SENSEX)

* Download historical stock prices from finance.yahoo.com
* Download textual (news) data from https://bit.ly/36fFPI6




# Author: Muhammet Varlı

# **Analysis of Stock Market Prices and Stock Price Prediction by RNN and LSTM**

### **Some information about financial data**

* The adjusted closing price amends a stock's closing price to reflect that stock's value after accounting for any corporate actions.
* It is often used when examining historical returns or doing a detailed analysis of past performance.
* Stock values are stated in terms of the closing price and the adjusted closing price.
* The closing price is the raw price, which is just the cash value of the last transacted price before the market closes.
* The adjusted closing price factors in anything that might affect the stock price after the market closes.

* The high is the highest price at which a stock traded during a period. 
* The low is the lowest price of the period. 

* Volume is the total number of shares traded in a security over a period. Every time buyers and sellers exchange shares, the amount gets added to the period’s total volume. Studying volume patterns is an essential aspect of technical analysis because it can show the significance of a stock’s price movement.

* A price change that occurs in high volume can carry more weight because it indicates that many traders were behind the move. Conversely, a lower volume price move can be perceived as less important.
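
To make the difference between the raw and the adjusted closing price concrete, here is a minimal sketch for a single corporate action, a 2-for-1 stock split. The prices, dates, and variable names are made up for illustration, and the sketch ignores dividends, which real adjusted prices also account for.

```python
import pandas as pd

# Hypothetical raw closing prices around a 2-for-1 split (illustrative numbers only)
close = pd.Series(
    [100.0, 102.0, 51.0, 52.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
    name="Close",
)
split_date = pd.Timestamp("2020-01-06")  # first trading day at the post-split price
split_ratio = 2.0

# Keep post-split prices as they are and divide pre-split prices by the split ratio
adj_close = close.where(close.index >= split_date, close / split_ratio).rename("Adj Close")

# The raw series shows an artificial -50% "return" on the split date; the adjusted one does not
print(pd.DataFrame({"Close": close, "Adj Close": adj_close}))
print(close.pct_change().loc[split_date], adj_close.pct_change().loc[split_date])
```

This is why the return calculations later in the notebook are based on 'Adj Close' rather than 'Close'.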

## Changes in stock price over time

* Here we will analyze the changes in the stocks of various technology companies with simple visualization methods.

In [None]:
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
# Some Libraries Imported
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# For reading stock data from yahoo
from pandas_datareader.data import DataReader
# For time stamps
from datetime import datetime

In [None]:
# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'TSLA', 'FB']

* Take two years of data

In [None]:
# Set up End and Start times for data grab (We will analyze for 2 years)
end = datetime.now()
start = datetime(end.year - 2, end.month, end.day)

In [None]:
# Loop over the tickers, fetch the Yahoo Finance data and store each result as a DataFrame
for stock in tech_list:
    # Bind each DataFrame to a global variable named after its ticker (AAPL, GOOG, ...)
    globals()[stock] = DataReader(stock, 'yahoo', start, end)

In [None]:
company_list = [AAPL, GOOG, TSLA, FB]
company_name = ["APPLE", "GOOGLE", "TESLA", "FACEBOOK"]

for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name
    
df = pd.concat(company_list, axis=0)

In [None]:
df.head()

In [None]:
df.tail()

## **Adj Close**

In [None]:
# Let's see a historical view of the closing price of companies

plt.figure(figsize=(12, 8))
plt.subplots_adjust(top=1.25, bottom=1.2)
colorlist=['Red','Blue','Green','Purple']
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot(color=colorlist[i-1])
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

## **Volume**

In [None]:
# Now let's plot the total volume of stock being traded each day
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)
colorlist=['Black','Blue','Green','Purple']
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot(color=colorlist[i-1])
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"{tech_list[i - 1]}")

# The moving average of the various stocks

In [None]:
# Set the Moving Average Day
ma_day = [10, 20, 50]

In [None]:
for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        company[column_name] = company['Adj Close'].rolling(ma).mean()
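
As a small illustration of what `rolling(ma).mean()` returns, the sketch below applies a 3-day window to a toy price series (the numbers are made up, not market data). The first `window - 1` entries are NaN because a full window is not yet available.

```python
import pandas as pd

# Toy prices, not market data
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day simple moving average: each value is the mean of the current and two previous prices
ma_3 = prices.rolling(3).mean()
print(ma_3)
```

The same effect shows up in the plots below: the 50-day moving average only starts 49 trading days after the first observation.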

In [None]:
print(AAPL.columns)

Before plotting the moving averages, let's look at the distribution of each column per company.

In [None]:
df.groupby("company_name").hist(figsize=(12, 12),color='seagreen');

### Visualization of companies changes over various MA days and 'Adj Close'

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(15)

AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')

GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')

TSLA[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('TESLA')

FB[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('FACEBOOK')

fig.tight_layout()

# The daily return of the stock on average.

In [None]:
# We'll use pct_change to find the percent change for each day
for company in company_list:
    company['Daily Return'] = company['Adj Close'].pct_change()

# Then we'll plot the daily return percentage
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(8)
fig.set_figwidth(15)

AAPL['Daily Return'].plot(ax=axes[0,0], legend=True, linestyle='--', marker='o',markerfacecolor='black')
axes[0,0].set_title('APPLE')

GOOG['Daily Return'].plot(ax=axes[0,1], legend=True, linestyle='--', marker='o',markerfacecolor='black')
axes[0,1].set_title('GOOGLE')

TSLA['Daily Return'].plot(ax=axes[1,0], legend=True, linestyle='--', marker='o',markerfacecolor='black')
axes[1,0].set_title('TESLA')

FB['Daily Return'].plot(ax=axes[1,1], legend=True, linestyle='--', marker='o',markerfacecolor='black')
axes[1,1].set_title('FACEBOOK')

fig.tight_layout()
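
For reference, `pct_change()` computes the simple return between consecutive rows, that is, today's price divided by yesterday's price minus one. A minimal sketch on toy prices (made-up numbers) shows that it matches the manual formula:

```python
import pandas as pd

# Toy prices, not market data
prices = pd.Series([100.0, 110.0, 99.0])

daily_return = prices.pct_change()          # built-in
manual_return = prices / prices.shift(1) - 1  # same formula written out

print(daily_return)
```

The first value is NaN because there is no previous price to compare with, which is why `dropna()` is used before plotting distributions.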

In [None]:
# Note the use of dropna() here, otherwise the NaN values can't be read by seaborn
plt.figure(figsize=(12, 12))

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    sns.distplot(company['Daily Return'].dropna(), bins=100, color='orange')
    plt.ylabel('Daily Return')
    plt.title(f'{company_name[i - 1]}')
    # Skewness and Kurtosis
    print(f'{company_name[i - 1]}'+" Skewness: %f" % company['Daily Return'].skew())
    print(f'{company_name[i - 1]}'+" Kurtosis: %f" % company['Daily Return'].kurt())
# Could have also done:
#AAPL['Daily Return'].hist()
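
To help read the skewness and kurtosis values printed above, here is a small sketch on two toy series (made-up numbers). Note that pandas' `kurt()` reports excess kurtosis, so a normal distribution would score around 0 rather than 3.

```python
import pandas as pd

symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])      # evenly spread around the mean
right_tailed = pd.Series([1.0, 1.0, 1.0, 1.0, 10.0])  # one large positive outlier

# Skewness is 0 for symmetric data and positive when the right tail is longer
print("symmetric skew:", symmetric.skew())
print("right-tailed skew:", right_tailed.skew())

# Excess kurtosis is negative for flat distributions like the evenly spaced series
print("symmetric kurtosis:", symmetric.kurt())
```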


# The correlation between different stocks' Adj Close prices.

First, we build a DataFrame that holds the 'Adj Close' column of every stock.

In [None]:
# Grab the adjusted closing prices for the tech stock list into one DataFrame
closing_df = DataReader(tech_list, 'yahoo', start, end)['Adj Close']

# Let's take a quick look
closing_df.head() 

Now that we have all the closing prices, let's go ahead and get the daily return for all the stocks, like we did for the Apple stock.

In [None]:
# Make a new tech returns DataFrame
tech_rets = closing_df.pct_change()
tech_rets.head()

Now we can compare the daily percentage returns of two stocks to check how correlated they are. First, let's see a stock compared to itself.

In [None]:
# Comparing Tesla to itself should show a perfectly linear relationship
sns.jointplot(x='TSLA', y='TSLA', data=tech_rets, kind='scatter', color='seagreen')

In [None]:
# We'll use jointplot to compare the daily returns of Google and Facebook
sns.jointplot(x='GOOG', y='FB', data=tech_rets, kind='scatter', color='darkblue')

If two stocks are perfectly (and positively) correlated with each other, their daily return values fall on a straight line. We will use sns.pairplot() to create this comparison automatically for every pair of stocks.

In [None]:
# We can simply call pairplot on our DataFrame for an automatic visual analysis 
# of all the comparisons

sns.pairplot(tech_rets, kind='kde')

In [None]:
# We can simply call pairplot on our DataFrame for an automatic visual analysis 
# of all the comparisons

sns.pairplot(tech_rets, kind='reg')

Above we can see the relationships between the daily returns of all the stocks. For example, the correlation between Google and Tesla does not seem to be very high.

In [None]:
# Set up our figure by naming it return_fig and calling PairGrid on the DataFrame
return_fig = sns.PairGrid(tech_rets.dropna())

# Using map_upper we can specify what the upper triangle will look like.
return_fig.map_upper(plt.scatter, color='seagreen')

# We can also define the lower triangle in the figure, including the plot type (kde)
# and the color map (Greens_r)
return_fig.map_lower(sns.kdeplot, cmap='Greens_r')

# Finally we'll define the diagonal as a series of histogram plots of the daily return
return_fig.map_diag(plt.hist, bins=30)

In [None]:
# Set up our figure by naming it returns_fig and calling PairGrid on the closing prices
returns_fig = sns.PairGrid(closing_df)

# Using map_upper we can specify what the upper triangle will look like.
returns_fig.map_upper(plt.scatter,color='darkblue')

# We can also define the lower triangle in the figure, including the plot type (kde) and the color map (Blues_r)
returns_fig.map_lower(sns.kdeplot,cmap='Blues_r')

# Finally we'll define the diagonal as a series of histogram plots of the closing prices
returns_fig.map_diag(plt.hist,bins=30)

### Correlation

In [None]:
# Let's use seaborn for a quick correlation plot of the daily returns
sns.heatmap(tech_rets.corr(), annot=True, cmap='summer')

In [None]:
sns.heatmap(closing_df.corr(), annot=True, cmap='summer')

Just as we suspected from the pair plots, we see here both numerically and visually that the daily returns of Google and Tesla are less strongly correlated than those of the other pairs. The strongest correlation is seen between Apple and Tesla. It's also interesting to see that all the tech companies are positively correlated.
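
The two heatmaps above use different inputs: one correlates daily returns, the other correlates price levels. The sketch below, built on synthetic data (the column names `A` and `B` and all numbers are made up), suggests why the two can disagree: prices that merely share an upward trend look strongly correlated even when their day-to-day moves are unrelated.

```python
import numpy as np
import pandas as pd

# Two synthetic return series with the same upward drift but, by construction,
# unrelated day-to-day fluctuations (toy data, not market data)
pattern_a = np.tile([1, -1, 1, -1], 50)
pattern_b = np.tile([1, 1, -1, -1], 50)
returns = pd.DataFrame({"A": 0.01 + 0.01 * pattern_a,
                        "B": 0.01 + 0.01 * pattern_b})

# Turn the returns into price paths that both start near 100
prices = 100 * (1 + returns).cumprod()

price_corr = prices.corr().loc["A", "B"]
return_corr = prices.pct_change().corr().loc["A", "B"]
print(f"price-level correlation:  {price_corr:.3f}")
print(f"daily-return correlation: {return_corr:.3f}")
```

For this reason the return-based heatmap is usually the more informative of the two when judging how stocks move together.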

# Risk Analysis

One of the most basic ways to measure risk is to compare the expected return with the standard deviation of daily returns.

In [None]:
# Let's start by defining a new DataFrame as a cleaned version of the original tech_rets DataFrame
rets = tech_rets.dropna()

area = np.pi*20

plt.figure(figsize=(12, 10))
plt.scatter(rets.mean(), rets.std(), s=area)
plt.xlabel('Expected return')
plt.ylabel('Risk')

for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom', 
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))

* **Selecting one company and concentrating the analysis on it**

In [None]:
# Summary Stats
AAPL.describe()

In [None]:
# General info
AAPL.info()

In [None]:
Column_List = ["High", "Low","Open","Close", "Volume","Adj Close"]

In [None]:
# Generate whisker plots to detect the presence of any outliers
fig, ax = plt.subplots (len(Column_List), figsize = (10, 20))

for i, col_list in enumerate(Column_List):
    sns.boxplot(AAPL[col_list], ax = ax[i], palette = "winter", orient = 'h')
    ax[i].set_title("Whisker Plot for Outlier Detection on Apple Data: " + col_list, fontsize = 10)
    ax[i].set_ylabel(col_list, fontsize = 8)
    fig.tight_layout(pad = 1.1)

In [None]:
# Visualize the spread and skewness through the distribution plot

# Use the Column_List : list initialized above in the following steps
fig, ax = plt.subplots(len(Column_List), figsize = (15, 10))

for i, col_list in enumerate(Column_List):
    sns.distplot(AAPL[col_list], hist = True, ax = ax[i])
    ax[i].set_title ("Apple Data: Frequency Distribution of" + " " + col_list, fontsize = 10)
    ax[i].set_xlabel (col_list, fontsize = 8)
    ax[i].set_ylabel ('Distribution Value', fontsize = 8)
    fig.tight_layout (pad = 1.1) # To provide space between plots
    ax[i].grid('on') # Enabled to view and make markings

In [None]:
AAPL[["Open","High","Low","Close"]].plot.area(figsize=(15,10),alpha=0.5);
plt.title('Apple Finance Stock Trend')
plt.show()

In [None]:
# A glimpse of how the market shares varied over the given time

# Create a list for numerical columns that are to be visualized
Column_List = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# Plot to view the same
AAPL.plot(y = Column_List, subplots = True, layout = (3, 3), figsize = (15, 15), sharex = False, title = "Apple Stock Value Trend from 2019-11 - 2020-11", rot = 45);

# Predicting the closing stock price of APPLE

In [None]:
#Get the stock quote
df = DataReader('AAPL', data_source='yahoo', start='2019-06-30', end='2020-06-30')
#Show the data
df

In [None]:
plt.figure(figsize=(16,8))
plt.title('Close Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()

In [None]:
#Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = int(np.ceil( len(dataset) * .8 ))

training_data_len

In [None]:
#Scale the data
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)

scaled_data[0:5]
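
For intuition, min-max scaling to the range (0, 1) maps the smallest value to 0, the largest to 1, and everything else linearly in between. The sketch below writes that formula out with NumPy on toy values (the numbers and variable names are illustrative); the inverse step corresponds to what `inverse_transform` is used for later when predictions are converted back to prices.

```python
import numpy as np

# Toy single-column "price" array, shaped like dataset above (n_samples, 1)
prices = np.array([[10.0], [15.0], [20.0]])

col_min = prices.min(axis=0)
col_max = prices.max(axis=0)

# Forward transform: (x - min) / (max - min)
scaled = (prices - col_min) / (col_max - col_min)

# Inverse transform: x_scaled * (max - min) + min
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())    # values between 0 and 1
print(restored.ravel())  # back on the original price scale
```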

* LSTMs expect their input in a specific format, usually a 3D array. We start by slicing the data into windows of 60 timesteps and converting them into a NumPy array. Next, we reshape that array into three dimensions: the number of training samples, 60 timesteps, and one feature per step.

In [None]:
#Create the scaled training data set
train_data = scaled_data[0:int(training_data_len), :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i<= 61:
        print(x_train)
        print(y_train)
        print()
        
# Convert the x_train and y_train to numpy arrays 
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# x_train.shape
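
The windowing loop above can also be wrapped in a small helper. The following is only a sketch of the same idea; the function name `make_windows` and the toy input are assumptions for illustration and do not appear elsewhere in this notebook.

```python
import numpy as np

def make_windows(series, window):
    """Return (X, y): X[k] holds `window` consecutive values and y[k] is the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - window:i] for i in range(window, len(series))])
    y = series[window:]
    # Reshape to (samples, timesteps, features), the layout recurrent layers expect
    return X.reshape(-1, window, 1), y

# Toy series 0..9 with a window of 3 instead of 60, to keep the output readable
X_demo, y_demo = make_windows(np.arange(10), window=3)
print(X_demo.shape, y_demo.shape)
print(X_demo[0].ravel(), "->", y_demo[0])
```

With ten values and a window of 3 there are seven samples; the first one uses the values 0, 1, 2 to predict 3.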

In [None]:
#Create the testing data set
#Create a new array containing the scaled values from index training_data_len - 60 to the end
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
    
# Convert the data to a numpy array
x_test = np.array(x_test)

# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))


# **Creating LSTM Model**

In [None]:
from keras.models import Sequential
from keras.layers import Dense, LSTM

#Build the LSTM model
model_lstm = Sequential()
model_lstm.add(LSTM(64, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model_lstm.add(LSTM(64, return_sequences= False))
model_lstm.add(Dense(32))
model_lstm.add(Dense(1))

# Compile the model
model_lstm.compile(optimizer='adam', loss='mean_squared_error')

#Train the model
model_lstm.fit(x_train, y_train, batch_size=20, epochs=20)

## **Creating RNN model**

In [None]:
# importing libraries
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.layers import Dropout

# initializing the RNN
model_rnn = Sequential()

# adding first RNN layer and dropout regularization
model_rnn.add(SimpleRNN(units = 50,activation = "tanh", return_sequences = True,input_shape = (x_train.shape[1],1)))
model_rnn.add(Dropout(0.2))
# adding second RNN layer and dropout regularization
model_rnn.add(SimpleRNN(units = 50, activation = "tanh", return_sequences = True))
model_rnn.add(Dropout(0.2))
# adding third RNN layer and dropout regularization
model_rnn.add(SimpleRNN(units = 50,activation = "tanh", return_sequences = False))
model_rnn.add(Dropout(0.2))
# adding the output layer
model_rnn.add(Dense(units = 1))
# compiling RNN
model_rnn.compile(optimizer = "adam", loss = "mean_squared_error")
# fitting the RNN
model_rnn.fit(x_train, y_train, epochs = 100, batch_size = 32)

In [None]:
# Get the LSTM model predicted price values 
predictions_lstm = model_lstm.predict(x_test)
predictions_lstm = scaler.inverse_transform(predictions_lstm)

In [None]:
# Get the RNN model's predicted price values 
predictions_rnn = model_rnn.predict(x_test)
predictions_rnn = scaler.inverse_transform(predictions_rnn)

## **Models Scores**

In [None]:
from sklearn import metrics

# Get the root mean squared error (RMSE)
mse_lstm = metrics.mean_squared_error(y_test, predictions_lstm)
rmse_lstm = np.sqrt(mse_lstm)

print("LSTM Model RMSE: ",rmse_lstm)
# Get r2 score
r2_lstm = metrics.r2_score(y_test, predictions_lstm)
print("LSTM Model r2: ",r2_lstm)

In [None]:
# Get the root mean squared error (RMSE)
mse_rnn = metrics.mean_squared_error(y_test, predictions_rnn)
rmse_rnn = np.sqrt(mse_rnn)

print("RNN Model RMSE: ",rmse_rnn)
# Get r2 score
r2_rnn = metrics.r2_score(y_test, predictions_rnn)
print("RNN Model r2: ",r2_rnn)
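
For reference, both scores can be written out directly. The sketch below computes RMSE and R² with NumPy on made-up values (`y_true_demo` and `y_pred_demo` are illustrative names, not the model outputs above), following the standard definitions.

```python
import numpy as np

# Toy targets and predictions, not model output
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 6.0])

# RMSE: square root of the mean squared error, in the same units as the target
rmse_demo = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))

# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_demo = 1 - ss_res / ss_tot

print("RMSE:", rmse_demo)
print("R2:", r2_demo)
```

A lower RMSE is better, while R² is better the closer it gets to 1; R² can turn negative when a model does worse than simply predicting the mean.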

## **Plotting Train Data, Validation Data and Predictions RNN Model**

In [None]:
# Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  # copy so the new column is not assigned on a slice
valid['Predictions_RNN'] = predictions_rnn
# Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions_RNN']])
plt.legend(['Train', 'Val', 'Predictions_RNN'], loc='lower right')
plt.show()

In [None]:
#Show the valid and predicted prices
valid[0:10]

## **Plotting Train Data, Validation Data and Predictions LSTM Model**

In [None]:
# Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  # copy so the new column is not assigned on a slice
valid['Predictions_LSTM'] = predictions_lstm
# Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions_LSTM']])
plt.legend(['Train', 'Val', 'Predictions_LSTM'], loc='lower right')
plt.show()

In [None]:
#Show the valid and predicted prices
valid[0:10]

# **Sentiment Analysis of Indian News Headlines**

### Loading Textual Data

In [None]:
# on_bad_lines='skip' (pandas >= 1.3) replaces the removed error_bad_lines=False
ndf = pd.read_csv('../input/tsf-datasets/india-news-headlines.csv', parse_dates=[0],
                  infer_datetime_format=True, on_bad_lines='skip',
                  usecols=["publish_date", "headline_text"])
ndf = ndf.rename(columns={"publish_date": "Date"})
ndf.head()

In [None]:
ndf.tail()

In [None]:
ndf.info()

In [None]:
start_date = pd.to_datetime('2019-06-30')
end_date = pd.to_datetime('2020-06-30')
ndf=ndf.loc[(ndf['Date'] > start_date) & (ndf['Date'] < end_date)]

In [None]:
ndf=ndf.reset_index()

In [None]:
ndf=ndf.drop("index",axis=1)

In [None]:
ndf.head()

In [None]:
ndf.tail()

In [None]:
# Combine all headlines published on the same date into one string, then drop the duplicated rows.
ndf['headline_text'] = ndf.groupby('Date')['headline_text'].transform(lambda x : ' '.join(x))
ndf = ndf.drop_duplicates() 
ndf.reset_index(inplace = True, drop = True)
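
The effect of this step is easier to see on a tiny example. The sketch below uses a made-up frame (`toy_news` and its headlines are illustrative) and a direct group-and-join, which should give the same one-row-per-date result as the transform-then-deduplicate approach above.

```python
import pandas as pd

# Toy headlines: two on the first date, one on the second
toy_news = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
    "headline_text": ["markets rally", "rupee steady", "budget announced"],
})

# Join all headlines of a date into a single string, one row per date
daily_news = toy_news.groupby("Date", as_index=False)["headline_text"].agg(" ".join)
print(daily_news)
```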

In [None]:
ndf.head()

### Data Pre-Processing

In [None]:
# uppercase-lowercase conversion
ndf['headline_text'] = ndf['headline_text'].apply(lambda x: " ".join(x.lower() for x in x.split()))

In [None]:
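# remove punctuation (regex=True is needed because pandas >= 2.0 treats the pattern as a literal by default)
ndf['headline_text'] = ndf['headline_text'].str.replace(r'[^\w\s]', '', regex=True)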


In [None]:
# remove digits
ndf['headline_text'] = ndf['headline_text'].str.replace(r'\d', '', regex=True)
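
The cleaning steps so far (lowercasing, stripping punctuation, stripping digits) can be checked on a single string with the standard `re` module. This is a sketch only; the helper name `clean_headline` and the sample headline are made up.

```python
import re

def clean_headline(text):
    """Lowercase the text, then remove punctuation and digits, and collapse extra spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop everything that is not a word character or whitespace
    text = re.sub(r"\d", "", text)       # drop digits
    return " ".join(text.split())        # collapse the gaps left behind

cleaned = clean_headline("Sensex gains 300 points; Nifty above 9,000!")
print(cleaned)  # sensex gains points nifty above
```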


In [None]:
#!pip install nltk

In [None]:
#stopwords
import nltk
nltk.download('stopwords')

In [None]:
from nltk.corpus import stopwords
sw = stopwords.words('english')
ndf['headline_text'] = ndf['headline_text'].apply(lambda x: " ".join(x for x in x.split() if x not in sw))


In [None]:
# Remove the 1,000 least frequent words
delete = pd.Series(' '.join(ndf['headline_text']).split()).value_counts()[-1000:]
ndf['headline_text'] = ndf['headline_text'].apply(lambda x: " ".join(x for x in x.split() if x not in delete))
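
As an illustration of rare-word removal, the sketch below uses a frequency threshold instead of a fixed number of words. This is a variation on the cell above, not the same rule, and the headlines and the `min_count` value are made up.

```python
from collections import Counter

# Toy headlines, already lowercased and cleaned
headlines = ["market up market strong", "market down rupee weak", "rupee up"]

# Count how often each word occurs across all headlines
word_counts = Counter(word for text in headlines for word in text.split())

min_count = 2  # hypothetical threshold: drop words seen fewer than 2 times
rare_words = {word for word, count in word_counts.items() if count < min_count}

filtered = [" ".join(w for w in text.split() if w not in rare_words) for text in headlines]
print(sorted(rare_words))
print(filtered)
```

Using a set for the rare words keeps the membership test fast, which matters when the filter runs over a year of joined headlines.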


In [None]:
nltk.download('wordnet')

In [None]:
#!pip install textblob

In [None]:
#lemmatisation
from textblob import Word

ndf['headline_text'] = ndf['headline_text'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()])) 

In [None]:
ndf.info()

In [None]:
ndf.head()

In [None]:
ndf.tail()

In [None]:
ndf['headline_text'][0:10]

In [None]:
ndf[ndf['headline_text'].duplicated(keep=False)].sort_values('headline_text').head(8)

## Sentiment Analysis

In [None]:
from textblob import TextBlob

In [None]:
#Functions to get the subjectivity and polarity
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
    return  TextBlob(text).sentiment.polarity

In [None]:
ndf['Subjectivity'] = ndf['headline_text'].apply(getSubjectivity)
ndf['Polarity'] = ndf['headline_text'].apply(getPolarity)
ndf.head()

In [None]:
plt.figure(figsize = (10,6))
ndf['Polarity'].hist(color = 'purple')

In [None]:
plt.figure(figsize = (10,6))
ndf['Subjectivity'].hist(color = 'blue')

In [None]:
nltk.download('vader_lexicon')

In [None]:
#Adding sentiment score to ndf by using SentimentIntensityAnalyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

ndf['Compound'] = [sia.polarity_scores(v)['compound'] for v in ndf['headline_text']]
ndf['Negative'] = [sia.polarity_scores(v)['neg'] for v in ndf['headline_text']]
ndf['Neutral'] = [sia.polarity_scores(v)['neu'] for v in ndf['headline_text']]
ndf['Positive'] = [sia.polarity_scores(v)['pos'] for v in ndf['headline_text']]
ndf[0:5]

In [None]:
df_merge = pd.merge(df, ndf, how='inner', on='Date')
df_merge.head()
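
An inner merge keeps only the dates that appear in both tables, so trading days without headlines and news days without trading drop out. A minimal sketch on made-up frames (`toy_prices` and `toy_sentiment` are illustrative names) shows the effect; here the date index is turned into a column first to make the join key explicit.

```python
import pandas as pd

# Toy price data indexed by date, like the stock DataFrame
toy_prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
toy_prices.index.name = "Date"

# Toy sentiment data with Date as a regular column, like ndf
toy_sentiment = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-04"]),
    "Polarity": [0.1, -0.2, 0.3],
})

# Only 2020-01-02 and 2020-01-03 exist in both frames
toy_merged = pd.merge(toy_prices.reset_index(), toy_sentiment, how="inner", on="Date")
print(toy_merged)
```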

In [None]:
df_merge.tail()

# **Hybrid model Analysis**

In [None]:
df_final = df_merge[['Close','Subjectivity', 'Polarity', 'Compound', 'Negative', 'Neutral' ,'Positive']]
df_final

### Feature Scaling using MinMaxScaler

In [None]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
df_scaled = pd.DataFrame(sc.fit_transform(df_final))
df_scaled.columns = df_final.columns
df_scaled.index = df_final.index
df_scaled.head()

## Create Hybrid Model

In [None]:
X=df_scaled.drop("Close",axis=1)
y=df_scaled["Close"]

### Train-Test Split

In [None]:
# shuffle and split training and test sets

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)


In [None]:
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn import metrics

### Random Forest

In [None]:
## Using Random Forest Regression
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
y_pred_rf=rf.predict(X_test)

mse_rf = metrics.mean_squared_error(y_test, y_pred_rf)
rmse_rf = np.sqrt(mse_rf)

print("Random Forest Model RMSE: ",rmse_rf)

### Xgboost

In [None]:
# !pip install xgboost

In [None]:
from xgboost import XGBRegressor
xgb = XGBRegressor(colsample_bytree = 0.6, 
                         learning_rate = 0.01, 
                         max_depth = 2, 
                         n_estimators = 1000) 
xgb.fit(X_train, y_train)
y_pred_xgb=xgb.predict(X_test)

mse_xgb = metrics.mean_squared_error(y_test, y_pred_xgb)
rmse_xgb = np.sqrt(mse_xgb)

print("XGB Model RMSE: ",rmse_xgb)

### Light GBM

In [None]:
# !pip install lightgbm

In [None]:
from lightgbm import LGBMRegressor
lgbm = LGBMRegressor(learning_rate = 0.1, 
                           max_depth = 8, 
                           n_estimators = 50,
                           colsample_bytree = 0.4,
                           num_leaves = 10)
lgbm.fit(X_train, y_train)
y_pred_lgbm=lgbm.predict(X_test)

mse_lgbm = metrics.mean_squared_error(y_test, y_pred_lgbm)
rmse_lgbm = np.sqrt(mse_lgbm)

print("Light GBM Model RMSE: ",rmse_lgbm)

### Catboost

In [None]:
# !pip install catboost

In [None]:
from catboost import CatBoostRegressor 
catb = CatBoostRegressor(iterations = 500, 
                               learning_rate = 0.1, 
                               depth = 5)
catb.fit(X_train, y_train)
y_pred_cat=catb.predict(X_test)


In [None]:
mse_cat = metrics.mean_squared_error(y_test, y_pred_cat)
rmse_cat = np.sqrt(mse_cat)

print("Catboost Model RMSE: ",rmse_cat)

### AdaBoost

In [None]:
## Using AdaBoostRegressor
adb = AdaBoostRegressor()
adb.fit(X_train, y_train)
y_pred_adb=adb.predict(X_test)

mse_adb = metrics.mean_squared_error(y_test, y_pred_adb)
rmse_adb = np.sqrt(mse_adb)

print("AdaBoost Model RMSE: ",rmse_adb)