<div style="margin-bottom:110px">
    <a href="https://www.ismt.edu.np/">
        <img src="./docs/ismt.png" alt="ismt College"  height="100" align="left">
    </a>
    <a href="https://www.sunderland.ac.uk/">
        <img src="./docs/sunderland.png" alt="University of Sunderland" align="right" height="100" >
    </a>
    <div align="center"><h3><b>Stock Price Prediction Using Machine Learning Algorithms</b></h3><p><b><a href="https://github.com/itSubeDibesh">Dibesh Raj Subedi</a></b></p></div>
</div>

# **Stock Price Prediction Using Machine Learning Algorithms**
- **Student Name:** [Dibesh Raj Subedi](https://github.com/itSubeDibesh) 
- **Student ID:** 219327253 
- **Module Name:** Artificial Intelligence 
- **Module Code:** CET313 
- **Module Leader / Module Tutor:** [Mr. Himalayan Kashyapati](https://www.youtube.com/channel/UCxOGD9bX_533jPWXfz8smlQ) 
- **Center:** [ISMT College](https://www.ismt.edu.np/)
- **Programme:** BSc (Hons) Computer Systems Engineering
- **Project:** Stock Price Prediction Using Machine Learning Algorithms 

[![wakatime](https://wakatime.com/badge/github/itSubeDibesh/StockPricePredection.svg)](https://wakatime.com/badge/github/itSubeDibesh/StockPricePredection)


## ***Table of Contents***

- Introduction
- Package Setups and Imports
    - Dependencies Installation
    - Import Libraries
- Data Extraction
    - Data Extraction Functions
        - From NEPSE
        - From SmartWealthPro
    - Data Mining
        - Dataframe head from NEPSE
        - Dataframe head from SmartWealthPro
            - Dataframe head of listed companies
            - Dataframe head of selected companies
- Stock Price Prediction 
    - Using Linear Regression
    - Using Random Forest
    - Using LSTM
- Comparison of Models
- Evaluation of Models
    - Accuracy
    - Confusion Matrix
    - Classification Report
    - ROC Curve
    - Precision-Recall Curve
    - ROC-AUC Curve
    - Precision-Recall-AUC Curve
- Conclusion


## Introduction

A **stock**, or **equity**, is a share of ownership in a company. Stocks play a **significant** role in the market: they reflect the value of a company and are a source of income for its investors. The price of a share is affected by many factors, such as market conditions, economic conditions, and government policies. Stock market and stock price prediction has been a **lucrative** subject of study for decades. Although many factors influence the price of a share, **several patterns** can be observed over long periods of time, and these create an **opportunity for investment**. As a passive investor myself, my objective in this project is to **predict** the stock price of a company using **Machine Learning** techniques and to explore the possibilities of different ML models.

## Package Setups and Imports

This section installs the dependencies and imports the libraries used for web scraping 🕸, data processing 📈, and data visualization 📊.


### Dependencies Installation 📦🛠

In [1]:
# Data Extraction
%pip install beautifulsoup4
%pip install requests
%pip install urllib3
%pip install html5lib
# Data Manipulation
%pip install numpy
%pip install pandas
# Data Visualization
%pip install matplotlib
# Machine Learning
%pip install scikit-learn

Note: you may need to restart the kernel to use updated packages.



### Import Libraries ⬇📦

In [2]:
# Data Extraction
from bs4 import BeautifulSoup as BS
import requests
import urllib3
import json
# Data Manipulation
import numpy as np
import pandas as pd
# File Handling
import os
# Data Visualization
import matplotlib.pyplot as plt
import datetime 
# Machine Learning
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

## Data Extraction

To train ML models we need a dataset, so I scrape data from the **Nepal Stock Exchange** ([NEPSE](http://www.nepalstock.com)) and [SmartWealthPro](https://app.smartwealthpro.com) websites.

The request types below are used for the web and API requests made during data extraction.

In [3]:
# Request List
REQUESTS = ["GET", "POST"]

# Request type dictionary
REQUEST_TYPE = {
    "GET": REQUESTS[0],
    "POST": REQUESTS[1]
}


### Data Extraction Functions 📃🕸

To extract the data, I defined functions that scrape pages and replay intercepted API requests from [NEPSE](http://www.nepalstock.com) and [SmartWealthPro](https://app.smartwealthpro.com).


#### From [NEPSE](http://www.nepalstock.com)
After reading [Quassarian Viper](https://q-viper.github.io/2020/11/21/deploying-nepse-data-visualizer-on-heroku/)'s post, I found a method for extracting data from the [NEPSE](http://www.nepalstock.com) website and wrote the following function based on it.

In [4]:

def nepse_company_names(save_to_csv: bool = False) -> pd.DataFrame:
    """Extracts all the company names from NEPSE website

    Args:
        save_to_csv: Save the data to csv file - Ex: True

    Return:
        DataFrame of the company names

    """
    http = urllib3.PoolManager()
    http.headers.update({'User-Agent': 'Mozilla/5.0'})
    web_page = http.request(
        method=REQUEST_TYPE["GET"],
        url="http://www.nepalstock.com/company?_limit=500"
    )
    soup = BS(web_page.data, 'html5lib')
    table = soup.find('table')
    company = []
    rows = [row.findAll('td') for row in table.findAll('tr')[1:-2]]
    col = 0
    notfirstrun = False
    for row in rows:
        companydata = []
        for data in row:
            if col == 5 and notfirstrun:
                companydata.append(data.a.get('href').split('/')[-1])
            else:
                companydata.append(data.text.strip())
            col += 1
        company.append(companydata)
        col = 0
        notfirstrun = True

    df = pd.DataFrame(company[1:], columns=company[0])
    df.rename(columns={'Operations': 'Symbol No'}, inplace=True)
    df.drop(columns='', inplace=True)
    df.drop(columns='S.N.', inplace=True)

    if save_to_csv:
        path = './NepseDataset'
        os.makedirs(path, exist_ok=True)
        df.to_csv(path+'/company_list.csv', index=False)

    return df
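
The row-walking logic in `nepse_company_names` can be tried offline on a small HTML snippet. The sketch below is only an illustration: the markup, the `/company/display/<id>` link pattern, and the helper name `parse_company_table` are assumptions made for this example, not the real structure of the NEPSE page. It also uses Python's built-in `html.parser` rather than `html5lib` so that it runs without an extra parser.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup for illustration only; the live NEPSE page may differ.
SAMPLE_HTML = """
<table>
  <tr><th>S.N.</th><th>Stock Name</th><th>Stock Symbol</th><th>Operations</th></tr>
  <tr><td>1</td><td>10% Nabil Debenture 2082</td><td>NBLD82</td>
      <td><a href="/company/display/2892">view</a></td></tr>
  <tr><td>2</td><td>10% Laxmi Bank Debenture 2086</td><td>LBLD86</td>
      <td><a href="/company/display/2879">view</a></td></tr>
</table>
"""


def parse_company_table(html: str) -> pd.DataFrame:
    """Turn the first <table> in `html` into a DataFrame of companies."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column names.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = row.find_all("td")
        values = [cell.get_text(strip=True) for cell in cells[:-1]]
        # The last cell holds a link; keep its trailing path segment as the symbol number.
        values.append(cells[-1].a.get("href").split("/")[-1])
        records.append(values)
    df = pd.DataFrame(records, columns=header)
    return df.rename(columns={"Operations": "Symbol No"}).drop(columns="S.N.")


companies = parse_company_table(SAMPLE_HTML)
print(companies)
```

Separating the parsing from the HTTP request in this way makes it possible to check the column handling without hitting the website.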


#### From [SmartWealthPro](https://app.smartwealthpro.com/)

Although [data extraction from the NEPSE website](#from-nepse) did not help much (it was useful only for cross-validating the SmartWealthPro `NepseId` against the NEPSE `Symbol No`), I found a way to extract data from a **paid application**. First, I registered for a **free trial** on [SmartWealthPro](https://app.smartwealthpro.com). I then used the [Postman](https://www.postman.com/) [Interceptor plugin](https://chrome.google.com/webstore/detail/postman-interceptor/aicmkgpgakddgnaphhhpliifpcfhicfo) to capture the requests and responses made by the browser, and extracted the useful URLs along with their cookies. Finally, I imported the request code from Postman into Python and wrapped it in functions that can also store the dataset on the local drive as a CSV file.
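
Because an intercepted cookie is a session credential that expires and should not be shared, one option is to read it from an environment variable instead of pasting it into the notebook. The following is a minimal sketch of that idea; the variable name `SMARTWEALTH_COOKIE` and the helpers `load_cookie` and `build_headers` are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable name chosen for this example.
COOKIE_ENV_VAR = "SMARTWEALTH_COOKIE"


def load_cookie(env_var: str = COOKIE_ENV_VAR) -> str:
    """Return the cookie string stored in `env_var`, or fail with a clear message."""
    cookie = os.environ.get(env_var, "").strip()
    if not cookie:
        raise RuntimeError(
            f"Set the {env_var} environment variable to the intercepted cookie string."
        )
    return cookie


def build_headers(cookie: str) -> dict:
    """Build a reduced set of request headers that carries the session cookie."""
    return {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "Cookie": cookie,
    }
```

With the variable set in the shell that starts Jupyter, `build_headers(load_cookie())` yields a dictionary that can be passed to `requests` as `headers`. Whether the server accepts this reduced header set is an assumption; the intercepted headers can be added back if it does not.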


In [5]:
COOKIE = ".AspNetCore.Antiforgery.c2MYB_mZqSE=CfDJ8PqVbBoQk65Ji_IQLqiq9W-fOAB5lXkNnLsOdVX_JuYMqnfZdsfQpxhsB_koNdcHbjTS7EKhCsR3Ba0H5eJRCxJYUuF2L2XjMEEsr_lEk0vHcjJbv49IaArGuxzUyEmYBJOaL7GPu4btgEc-6lC39M8; .AspNetCore.Identity.Application=CfDJ8PqVbBoQk65Ji_IQLqiq9W-MNL-VXNllti2wpkpsv9cA8u0kKTKvIGBYkW90A7ni7DpinfIjU2u4puqWHy6fidF0JFbXquKH6q_0wQDTCpB7htmSMnHIkEDijflIUm5tb7zwvETb5wxZ35xV6Q_EjusY0yFyLa-n6ogBJ0j_HDSr2NhyooqZ-dTVOSDyyAi327YiZ_decvG2QFdUOJKUY2MYGwTP3RDhoPDgJyBENNjxgi2QhpnIwGrcmKImFNK2TBpiewXgp_WUJqoHnQFJ5yHCjKW-D59ErSmauHSAXvMnidOAe9BUXVlY3FOgdoEESu4GZAYMMVyByQA8Kc5v8b-MEetPqhqBPTkezn20BaEAG9E5TfCowQmdWzlOYbiDdjlx8JKdHM-TE5gFBtTagEv21qMi83t8tXZP11HF2vZ12zap620XeyDuRS0R8KItA0Lkwc_iUhPPVIiUH9ve61VVNGZD98FCnbGx9DjFlgPX2ONOhz6DBpJp7BWxztF8NRaKzJmnpAMta-ej9ZYw60ONMW2_4r_BEeLvka6_2VOhlI7-O1CopnusQ4oJ11V1wFCX3EfJQ_7vkvKqcsFIV3M3EYWEVD-KY4s8yN9r3tY9RbWnaOpXWSLRr8bE9jRRiddjHmc5o1Ix3zBT66wkpq2Vj8saytK_oHmyM88WPMQmtwZpCuix6wT1BaHHDDaM3putb9FxUEMGrFC6I8A-5IS7p5TIEdUo_nmByI9XsgnKyzD_ACmqhm4MNrRR4r3zAU-8igzRhrT9PptJDe8vIZ-yz6BqqZXYWPyE8kjxTjR2gk7ul4zd0v309RSCLO5FilbVknSwliy1rPIXyVZhCZE"


def smart_wealth_company_list(save_to_csv=False) -> pd.DataFrame:
    """ Retrieves the list of companies from SmartWealthPro, including the CompanyId defined by SmartWealthPro, and optionally exports it as a CSV file.

    Args:
        save_to_csv: (bool) - Ex : True
    Returns:
        df: (pd.DataFrame)
    """
    url = "https://app.smartwealthpro.com/api/GetAutoCompleteCompanies?_=1648523415665"
    payload = {}
    headers = {
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Microsoft Edge";v="99"',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Content-Type': 'application/json; charset=utf-8',
        'X-Requested-With': 'XMLHttpRequest',
        'sec-ch-ua-mobile': '?0',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55',
        'sec-ch-ua-platform': '"Windows"',
        'Cookie': COOKIE
    }
    response = requests.request("GET", url, headers=headers, data=payload)
    print(
        f"Status Code: {response.status_code}, Mining ⛏ Status: {'Success' if response.status_code==200 else 'Failed'}")
    if response.status_code == 200:
        SMARTWEALTH_JSON = json.loads(response.text)
        df = pd.DataFrame(SMARTWEALTH_JSON["result"])
        df.drop(columns='type', inplace=True)
        df.rename(columns={'companyId': 'CompanyId', 'nepseCompanyId': 'NepseId',
                           'companyName': 'Company', 'stockSymbol': 'Symbol', 'sector': 'Sector'}, inplace=True)
        if save_to_csv:
            path = "./SmartWealthDataset"
            if not os.path.exists(path):
                os.makedirs(path)
            df.to_csv(path + '/smartwealthpro_company_list.csv', index=False)
        return df
    else:
        return None


def smart_wealth_company_history(companyId: str, startDate: str = "", endDate: str = "", save_to_csv=False) -> pd.DataFrame:
    """ Fetch Company History from SmartWealthPro.

    Args:
        companyId: (str) - Ex : "154"
        startDate: (str) - Ex : "2010-01-01" (YYYY-MM-DD)
        endDate: (str) - Ex : "2020-01-01"  (YYYY-MM-DD)
        save_to_csv: (bool) - Ex : True

    Returns:
        df: (pd.DataFrame)
    """
    url = "https://app.smartwealthpro.com/api/GetDailyHistoricalData?type=stock&id="+companyId+"&fromDate=" + \
        startDate+"&toDate="+endDate + \
        "&pageNo=1&itemsPerPage=9000000&pagePerDisplay=5&_=1648522274261"
    payload = {}
    headers = {
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Microsoft Edge";v="99"',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Content-Type': 'application/json; charset=utf-8',
        'X-Requested-With': 'XMLHttpRequest',
        'sec-ch-ua-mobile': '?0',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55',
        'sec-ch-ua-platform': '"Windows"',
        'Cookie': COOKIE
    }
    response = requests.request("GET", url, headers=headers, data=payload)
    print(
        f"Status Code: {response.status_code}, Mining ⛏ Status: {'Success' if response.status_code==200 else 'Failed'}")
    if response.status_code == 200:
        SMARTWEALTH_JSON = json.loads(response.text)
        df = pd.DataFrame(SMARTWEALTH_JSON["result"]['data'])
        df.drop(columns='sNo', inplace=True)
        df.rename(columns={'tradeDate': 'Date', 'open': 'Open',
                  'high': 'High', 'low': 'Low', 'close': 'Close'}, inplace=True)
        df.insert(
            0, 'Symbol', SMARTWEALTH_JSON["result"]['summary']['stockSymbol'])
        if save_to_csv:
            path = "./SmartWealthDataset/Company"
            if not os.path.exists(path):
                os.makedirs(path)
            df.to_csv(path+'/smartwealthpro_' +
                      SMARTWEALTH_JSON["result"]['summary']['stockSymbol']+'_history.csv', index=False)
        return df
    else:
        return None


def smart_wealth_company_code(symbol: str) -> str:
    """
    Returns CompanyCode as per SmartWealthPro using stock symbol.

    Args:
        symbol: (str) - Ex : "AHPC"
    Returns:
        companyCode: (str) - Ex : "154"
    """
    csv_file = pd.read_csv(
        './SmartWealthDataset/smartwealthpro_company_list.csv')
    return str(csv_file[csv_file['Symbol'] == symbol]['CompanyId'].values[0])
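
The history URL in `smart_wealth_company_history` is assembled by string concatenation, which is easy to get wrong when a value contains characters that need escaping. A small sketch of the same URL built with the standard library's `urlencode` is shown here. The helper name `build_history_url` is hypothetical, and the `_` timestamp parameter from the intercepted URL is left out on the assumption that it is only a cache-buster; add it back if the server requires it.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.smartwealthpro.com/api/GetDailyHistoricalData"


def build_history_url(company_id: str, start_date: str = "", end_date: str = "",
                      items_per_page: int = 9000000) -> str:
    """Build the historical-data URL from named query parameters."""
    params = {
        "type": "stock",
        "id": company_id,
        "fromDate": start_date,   # YYYY-MM-DD
        "toDate": end_date,       # YYYY-MM-DD
        "pageNo": 1,
        "itemsPerPage": items_per_page,
        "pagePerDisplay": 5,
    }
    # urlencode escapes each value and joins the pairs with "&".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_history_url("154", "2010-01-01", "2020-01-01"))
```

The same dictionary can also be handed to `requests.get(BASE_URL, params=params, headers=headers)`, which performs the encoding itself.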


### Data Mining ⛏

After defining the functions to extract data from the **NEPSE** and **SmartWealthPro** websites, I invoked them to extract the datasets and stored the results locally as CSV files for safekeeping and easy manipulation.

#### Dataframe head from [Nepse](http://www.nepalstock.com)

In [6]:
nepse_company_names(save_to_csv=True).head()

| | Stock Name | Stock Symbol | Sector | Symbol No |
|---|---|---|---|---|
| 0 | 10 % NMB DEBENTURE 2085 | NMBD2085 | Corporate Debenture | 2850 |
| 1 | 10% Himalayan Bank Debenture 2083 | HBLD83 | Corporate Debenture | 2873 |
| 2 | 10% Laxmi Bank Debenture 2086 | LBLD86 | Corporate Debenture | 2879 |
| 3 | 10% Nabil Debenture 2082 | NBLD82 | Corporate Debenture | 2892 |
| 4 | 10% Nepal SBI Bank Debenture 2086 | SBIBD86 | Corporate Debenture | 2890 |


#### Dataframe head from [SmartWealthPro](https://app.smartwealthpro.com)

In [7]:
# NABIL is one of the top banks of Nepal,
# Nepal Life Insurance Company (NLIC) is one of the leading life insurance companies, and
# Citizen Investment Trust (CIT) is a government company with a good market share.
selected_company_symbol = ["NABIL", "NLIC", "CIT"] 
starting_date = "2000-01-01"
ending_date = "2022-12-31"

##### Dataframe head of listed companies

In [8]:
smart_wealth_company_list(save_to_csv=True).head()

Status Code: 200, Mining ⛏ Status: Success


| | CompanyId | NepseId | Company | Symbol | Sector |
|---|---|---|---|---|---|
| 0 | 596 | 2790 | Aarambha Chautari Laghubitta Bittiya Sanstha L... | ACLBSL | Microfinance |
| 1 | 259 | 2845 | Adhikhola Laghubitta Bittiya Sanstha Limited | AKBSL | Microfinance |
| 2 | 2 | 397 | Agriculture Development Bank Limited | ADBL | Commercial Banks |
| 3 | 583 | 2893 | Ajod Insurance Limited | AIL | Non Life Insurance |
| 4 | 151 | 2788 | Ankhu Khola Jalvidhyut Company Ltd | AKJCL | Hydro Power |
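
The company list above is produced by loading the JSON response into a dataframe, dropping the `type` column, and renaming the remaining columns. The sketch below replays those steps on a hand-written payload so they can be checked without a network call. The payload shape `{"result": [...]}` is inferred from the function's code, the two rows are copied from the head shown above, and the `"type"` value and the helper names `to_company_frame` and `company_code` are placeholders invented for this example.

```python
import json
import pandas as pd

# Hand-written sample; the "type" value is a placeholder, not a real API value.
SAMPLE_RESPONSE = json.dumps({
    "result": [
        {"companyId": 2, "nepseCompanyId": 397,
         "companyName": "Agriculture Development Bank Limited",
         "stockSymbol": "ADBL", "sector": "Commercial Banks", "type": "stock"},
        {"companyId": 583, "nepseCompanyId": 2893,
         "companyName": "Ajod Insurance Limited",
         "stockSymbol": "AIL", "sector": "Non Life Insurance", "type": "stock"},
    ]
})

RENAMES = {"companyId": "CompanyId", "nepseCompanyId": "NepseId",
           "companyName": "Company", "stockSymbol": "Symbol", "sector": "Sector"}


def to_company_frame(response_text: str) -> pd.DataFrame:
    """Parse the JSON text and return the cleaned company list."""
    payload = json.loads(response_text)
    df = pd.DataFrame(payload["result"])
    return df.drop(columns="type").rename(columns=RENAMES)


def company_code(df: pd.DataFrame, symbol: str) -> str:
    """Look up the SmartWealthPro CompanyId for a stock symbol."""
    match = df.loc[df["Symbol"] == symbol, "CompanyId"]
    if match.empty:
        raise KeyError(f"Unknown symbol: {symbol}")
    return str(match.iloc[0])


companies = to_company_frame(SAMPLE_RESPONSE)
print(company_code(companies, "ADBL"))
```

Raising a `KeyError` for an unknown symbol is a design choice of this sketch; it gives a clearer message than the `IndexError` that an empty selection would otherwise produce.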


##### Dataframe head of selected companies

ℹ️ **NOTE:** Data fetched from the **SmartWealthPro** website is sorted by ***Date*** in descending order, so I reversed the dataframe to get ascending order.

In [9]:
for company in selected_company_symbol:
    smart_wealth_company_history(companyId=smart_wealth_company_code(company),startDate=starting_date,endDate=ending_date,save_to_csv=True)

Status Code: 200, Mining ⛏ Status: Success
Status Code: 200, Mining ⛏ Status: Success
Status Code: 200, Mining ⛏ Status: Success


In [10]:
CIT = pd.read_csv('./SmartWealthDataset/Company/smartwealthpro_CIT_history.csv')[::-1].reset_index(drop=True)
CIT.head()

| | Symbol | Date | Open | High | Low | Close |
|---|---|---|---|---|---|---|
| 0 | CIT | 2010-04-20 | 760.0 | 751.0 | 750.0 | 751.0 |
| 1 | CIT | 2010-04-21 | 751.0 | 736.0 | 736.0 | 736.0 |
| 2 | CIT | 2010-04-22 | 736.0 | 722.0 | 722.0 | 722.0 |
| 3 | CIT | 2010-04-26 | 722.0 | 708.0 | 655.0 | 655.0 |
| 4 | CIT | 2010-04-28 | 655.0 | 650.0 | 650.0 | 650.0 |


In [11]:
NABIL = pd.read_csv(
    './SmartWealthDataset/Company/smartwealthpro_NABIL_history.csv')[::-1].reset_index(drop=True)
NABIL.head()


| | Symbol | Date | Open | High | Low | Close |
|---|---|---|---|---|---|---|
| 0 | NABIL | 2010-04-15 | 2040.0 | 2040.0 | 2000.0 | 2000.0 |
| 1 | NABIL | 2010-04-15 | 2040.0 | 2040.0 | 2000.0 | 2000.0 |
| 2 | NABIL | 2010-04-15 | 2040.0 | 2040.0 | 2000.0 | 2000.0 |
| 3 | NABIL | 2010-04-19 | 2080.0 | 2039.0 | 2000.0 | 2001.0 |
| 4 | NABIL | 2010-04-20 | 2001.0 | 1975.0 | 1945.0 | 1945.0 |


In [12]:
NLIC = pd.read_csv(
    './SmartWealthDataset/Company/smartwealthpro_NLIC_history.csv')[::-1].reset_index(drop=True)
NLIC.head()


| | Symbol | Date | Open | High | Low | Close |
|---|---|---|---|---|---|---|
| 0 | NLIC | 2010-05-23 | 980.0 | 961.0 | 961.0 | 961.0 |
| 1 | NLIC | 2010-06-03 | 961.0 | 942.0 | 942.0 | 942.0 |
| 2 | NLIC | 2010-06-14 | 942.0 | 924.0 | 871.0 | 871.0 |
| 3 | NLIC | 2010-07-12 | 871.0 | 854.0 | 850.0 | 850.0 |
| 4 | NLIC | 2010-07-25 | 850.0 | 833.0 | 770.0 | 770.0 |
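
Reversing with `[::-1]` relies on the API always returning rows in descending date order, and the NABIL head above also shows the 2010-04-15 row three times. An alternative is to sort on `Date` explicitly and drop repeated rows. The following is a sketch of that approach using a few `Close` values copied from the NABIL head; the helper name `tidy_history` is hypothetical, and treating the repeated rows as true duplicates is an assumption.

```python
import pandas as pd

# A few rows from the NABIL head, arranged newest first as the API is assumed to return them.
raw = pd.DataFrame({
    "Symbol": ["NABIL"] * 4,
    "Date": ["2010-04-20", "2010-04-19", "2010-04-15", "2010-04-15"],
    "Close": [1945.0, 2001.0, 2000.0, 2000.0],
})


def tidy_history(df: pd.DataFrame) -> pd.DataFrame:
    """Return the history with parsed dates, no repeated days, oldest row first."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Keep one row per symbol and trading day.
    out = out.drop_duplicates(subset=["Symbol", "Date"], keep="last")
    return out.sort_values("Date").reset_index(drop=True)


history = tidy_history(raw)
print(history)
```

Sorting explicitly keeps the result correct even if the order of the API response changes.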


## Stock Price Prediction

### Using Linear Regression

In [13]:
# Supervised Learning Problem -> Regression (based on a continuous dataset) and Classification (based on characteristics)
# 

# Problem Statement -> I don't know whether the stock price will go up or down.
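
One possible way to frame the problem stated above as supervised learning is to predict each close from the previous few closes (regression) and to derive an up/down label from consecutive closes (classification). The sketch below is only an illustration under stated assumptions: the price series is synthetic rather than NEPSE data, and the number of lags, the 80/20 chronological split, and the helper name `make_supervised` are choices made for this example, not the project's method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic closing prices for illustration only (a random walk, not NEPSE data).
rng = np.random.default_rng(0)
close = pd.Series(1000 + np.cumsum(rng.normal(0, 5, size=200)), name="Close")


def make_supervised(close: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build lagged-close features, a regression target, and an up/down label."""
    frame = pd.DataFrame({f"lag_{i}": close.shift(i) for i in range(1, n_lags + 1)})
    frame["target"] = close                              # value to regress on
    frame["up"] = (close > close.shift(1)).astype(int)   # 1 if the close rose
    return frame.dropna().reset_index(drop=True)


data = make_supervised(close)
features = [c for c in data.columns if c.startswith("lag_")]

# Chronological split: train on the earlier rows, test on the later ones.
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

model = LinearRegression().fit(train[features], train["target"])
predictions = model.predict(test[features])
mae = float(np.mean(np.abs(predictions - test["target"].to_numpy())))
print(f"Test MAE: {mae:.2f}")
```

The split is not shuffled because shuffling a time series would let the model train on rows that come after the ones it is tested on.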

### Using Random Forest

### Using LSTM

## Comparison of Models

## Evaluation of Models

### Accuracy

### Confusion Matrix

### Classification Report

### ROC Curve

### Precision-Recall Curve

### ROC-AUC Curve

### Precision-Recall-AUC Curve

## Conclusion