# End of the Day Stock data (EODData) - Web Scraping Project 

Data Source : [End of the Day Stock data](http://eoddata.com/stocklist/TSX/A.htm)
![](https://imgur.com/ONhEMwe.jpg)

### About End of the Day Stock data

EODData provides historical market data with easy-to-use download facilities. Daily updates containing end-of-day quotes and intraday 1-minute bars can be downloaded automatically each day.

According to the website, it also runs servers dedicated to finding and correcting the errors that stock exchanges produce, and all of its historical data is screened and adjusted for splits.

![](https://imgur.com/Vxph50u.jpg)

### Project Idea

As part of this project, we will parse the EODData website to collect stock details for the Toronto Stock Exchange (TSX).

We will retrieve information from the page **’Toronto Stock Exchange’** using _web scraping_: a process of extracting information from a website programmatically. For this specific project we will scrape stocks whose symbols start with the letters A to H.

### Project Goal

The project goal is to build a web scraper that extracts stock information and assembles it into a single CSV file. The format of the output CSV file is shown below:

|#|Code|Name|High|Low|Close|Volume|Stock Page URL
|-|----------|-------|---------------|-----|------|-----------------|-----------
|1|AAB|Aberdeen International Inc|0.1400|0.1350|0.1400|13138|http://eoddata.com/stockquote/TSX/AAB.htm
|2|AAV|Advantage Oil & Gas Ltd|6.370|6.130|6.360|684302|http://eoddata.com/stockquote/TSX/AAV.htm

### Project steps
Here is an outline of the steps we'll follow:

1. Download the webpage using `requests`
2. Parse the HTML source code using the `BeautifulSoup` library and extract the desired information
3. Build the scraper components
4. Compile the extracted information into Python lists and dictionaries
5. Convert the Python dictionaries into `Pandas DataFrames`
6. Write the information to the final CSV file
7. Future work and references


### How to run the code

This tutorial is an executable [Jupyter notebook](https://jupyter.org) hosted on [Jovian](https://www.jovian.ai). You can _run_ this tutorial and experiment with the code examples in a couple of ways: *using free online resources* (recommended) or *on your computer*.

#### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


#### Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up [Python](https://www.python.org), download the notebook and install the required libraries. We recommend using the [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) distribution of Python. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.

>  **Jupyter Notebooks**: This tutorial is a [Jupyter notebook](https://jupyter.org) - a document made of _cells_. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files, etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess around with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top.

## Let's start scraping

>Note : We will use the `Jovian` library and its `commit()` function throughout the code to save our progress as we move along.

In [3]:
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
#jovian.commit(project="final-web-scraping-project")

## Use the requests library to download web pages



In [4]:
!pip install requests --quiet --upgrade
import requests

#### **requests.get()**

To **download a web page**, we use `requests.get()` to **send an HTTP request** to the **EODData server**. The function returns a **response object** containing **the HTTP response**.

In [6]:
home_url = 'http://eoddata.com/stocklist/TSX/A.htm'   #The URL Address of the webpage we will scrape, i.e. Stocks starting from A
response = requests.get(home_url)      #requests.get()


![](https://imgur.com/1o1s92P.jpg)

#### **Status code**

Next, we have to check whether the HTTP request succeeded and we received a valid HTTP response. Because we are not using a browser, we get no visual feedback when a request fails.

The usual way to check whether the server responded successfully is the **status code**. In the `requests` library, `requests.get` returns a response object containing the page contents and a status code that indicates whether the HTTP request was successful. Learn more about HTTP status codes here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status.


If the request was successful, `response.status_code` is set to a value between **200 and 299**.
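As a small illustration of the range check above, the sketch below classifies a status code using only Python's standard library. `describe_status` is a hypothetical helper name and is not part of `requests`. In a real scraper you could also call `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus

def describe_status(status_code):
    """Return (is_success, reason_phrase) for an HTTP status code."""
    # Success means the code falls in the 200-299 range
    is_success = 200 <= status_code <= 299
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Codes that the standard library does not define
        phrase = "Unknown"
    return is_success, phrase

print(describe_status(200))
print(describe_status(404))
```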

In [8]:
response.status_code  

200

The HTTP response contains HTML that is ready to be displayed in a browser. We can use `response.text` to retrieve the HTML document.

In [9]:
page_contents = response.text
len(page_contents) 

115987

In [28]:
page_contents[:1000] 

'\r\n\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r\n<html xmlns="http://www.w3.org/1999/xhtml">\r\n<head><link rel="stylesheet" href="../../styles/jquery-ui-1.10.0.custom.min.css" type="text/css" /><link rel="stylesheet" href="../../styles/main.css" type="text/css" /><link rel="stylesheet" href="../../styles/button.css" type="text/css" /><link rel="stylesheet" href="../../styles/nav.css" type="text/css" />\r\n  <script src="/scripts/jquery-1.9.0.min.js" type="text/javascript"></script>\r\n  <script src="/scripts/jquery-ui-1.10.0.custom.min.js" type="text/javascript"></script>\r\n\t<script type="text/javascript">\t\tvar _sf_startpt = (new Date()).getTime()</script>\r\n\t<script src="https://js.stripe.com/v3/" type="text/javascript"></script>\r\n\t\r\n\t<script type="text/javascript" src="scripts/jquery-1.4.2.min.js"></script>\r\n<meta name="keywords" content="list of symbols for Toronto Stock Exchange,list

- Above is the source code of the web page. It is written in a language called HTML.
- HTML defines the content and structure of the web page, which browsers such as Chrome then render.

In [10]:
#jovian.commit()

## Parse the HTML source code using Beautiful Soup library


>### What is Beautiful Soup?

>Beautiful Soup is **a Python package** for **parsing HTML and XML documents**. Beautiful Soup enables us to get data out of sequences of characters. It creates a parse tree for parsed pages that can be used to extract data from HTML. It's a handy tool when it comes to web scraping. You can read more on their documentation site. https://www.crummy.com/software/BeautifulSoup/bs4/doc/#getting-help


In [12]:
!pip install beautifulsoup4 --quiet --upgrade
from bs4 import BeautifulSoup
doc = BeautifulSoup(page_contents, 'html.parser')

In [13]:
type(doc)

bs4.BeautifulSoup

### Inspecting the HTML source code of a web page


>When creating the `BeautifulSoup` object, we pass `html.parser` so that the page is parsed into a tree of tags instead of being treated as one long string.

>### What is HTML?
>The HyperText Markup Language, or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets and scripting languages such as JavaScript.

![](https://imgur.com/CA9wgTE.jpg)

#### **An HTML tag comprises three parts:**

1. **Name**: (`html`, `head`, `body`, `div`, etc.) Indicates what the tag represents and how a browser should interpret the information inside it.
2. **Attributes**: (`href`, `target`, `class`, `id`, etc.) Properties of tag used by the browser to customize how a tag is displayed and decide what happens on user interactions.
3. **Children**: A tag can contain some text or other tags or both between the opening and closing segments, e.g., `<div>Some content</div>`.
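To make the three parts concrete, here is a minimal sketch that uses Python's built-in `html.parser` module (not Beautiful Soup, which the rest of this notebook uses) to list the name, attributes, and text of each tag in a small snippet. The class name `TagInspector` and the sample snippet are made up for illustration.

```python
from html.parser import HTMLParser

class TagInspector(HTMLParser):
    """Record the name and attributes of every opening tag, plus any text."""

    def __init__(self):
        super().__init__()
        self.tags = []   # list of (name, attributes) tuples
        self.text = []   # non-empty text found between tags

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

inspector = TagInspector()
inspector.feed('<td><a href="/stockquote/TSX/AAB.htm" title="Quote for AAB">AAB</a></td>')
print(inspector.tags)
print(inspector.text)
```

Here `<td>` has no attributes and one child tag, while `<a>` has two attributes and a text child.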


### Common tags and attributes

#### **Tags in HTML**

There are around 100 types of HTML tags, but only around 15 to 20 of them are in common day-to-day use, such as the `<div>`, `<p>`, `<section>`, `<img>`, and `<a>` tags.


Among these, I want to highlight the **`<a>` tag**, which can carry attributes such as `href` (hyperlink reference). Clicking an `<a>` element takes the user to the linked page, which is why it is called the **anchor** tag.

#### **Attributes**

Each tag supports several attributes. The following are some common attributes used to modify the behavior of tags:

* `id`
* `style`
* `class`
* `href` (used with `<a>`)
* `src` (used with `<img>`)

With **a BeautifulSoup object**, we can retrieve **a specific type of HTML tag** by passing the tag's name, as shown in the code cell below.

Here, we use the `find()` function of BeautifulSoup to find the first `<title>` tag in the HTML document and display its content

In [14]:
title = doc.find('title')
title

<title>
	List of Symbols for Toronto Stock Exchange [TSX] Starting with A
</title>

### Inspecting HTML in the Browser

>To view the **source code** of any webpage right within **your browser**, you can **right click** anywhere on a page and **select** the **"Inspect"** option. This opens the **"Developer Tools"**, where you can see the source code as **a tree**. You can expand and collapse various nodes and find the source code for a specific portion of the page.

![](https://imgur.com/J6fZBAU.png)


As shown in the screenshot above, I hovered over one of the stocks to see how its content is laid out in the HTML.
Each stock is contained in its own `<tr>` tag.

Since we have already downloaded the page and parsed it into a BeautifulSoup object, we can use functions from the Beautiful Soup library to extract the information we want.

#### Get the `<tr>` rows that hold each stock's information. Rows alternate between the classes `ro` and `re`, so we fetch both

In [19]:
tr_parent1 = doc.find_all('tr',{'class':'ro'}) 
tr_parent2 = doc.find_all('tr',{'class':'re'})

#### Looks like we have 127 records for stocks starting with 'A'

In [20]:
len(tr_parent1) + len(tr_parent2)


127
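As an alternative to two separate searches, Beautiful Soup accepts a list of class names and returns every matching row in the order it appears on the page. The sketch below runs on a small made-up HTML snippet that mirrors the structure of the listing page; `sample_html` and `sample_doc` are illustrative names.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the listing table: rows alternate between 'ro' and 're'
sample_html = """
<table>
  <tr class="ro"><td><a href="/stockquote/TSX/AAB.htm">AAB</a></td></tr>
  <tr class="re"><td><a href="/stockquote/TSX/AAV.htm">AAV</a></td></tr>
  <tr class="ro"><td><a href="/stockquote/TSX/ABCT.htm">ABCT</a></td></tr>
</table>
"""
sample_doc = BeautifulSoup(sample_html, 'html.parser')

# A list of class names matches rows carrying either class, in page order
rows = sample_doc.find_all('tr', class_=['ro', 're'])
symbols = [row.find('a').text.strip() for row in rows]
print(symbols)
```

With this approach there is no need to interleave two lists afterwards.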

#### Let's get the individual `<td>` cells for the first stock, which hold all the required information

In [21]:
td_child1 = tr_parent1[0].find_all('td')


In [22]:
td_child1


[<td><a href="/stockquote/TSX/AAB.htm" title="Display Quote &amp; Chart for TSX,AAB">AAB</a></td>,
 <td>Aberdeen International Inc</td>,
 <td align="right">0.0500</td>,
 <td align="right">0.0450</td>,
 <td align="right">0.0450</td>,
 <td align="right">177,000</td>,
 <td align="right">0.0000</td>,
 <td align="center"><img src="/images/nc.gif"/></td>,
 <td align="left">0.00</td>,
 <td align="right"><a href="/stockquote/TSX/AAB.htm" title="Download Data for TSX,AAB"><img height="14" src="/images/dl.gif" width="14"/></a> <a href="/stockquote/TSX/AAB.htm" title="View Quote and Chart for TSX,AAB"><img height="14" src="/images/chart.gif" width="14"/></a></td>]

### Getting Single Stocks Information

#### Symbol

In [23]:
symbol = td_child1[0].find('a').text.strip()

#### Name

In [24]:
name = td_child1[1].text.strip()

#### High Value

In [27]:
high = td_child1[2].text.strip()

#### Low Value 

In [28]:
low = td_child1[3].text.strip()

#### Closing Value

In [29]:
close = td_child1[4].text.strip()

#### Total Volume

In [30]:
volume = td_child1[5].text.strip().replace(',', '') # Here we remove the comma

#### URL Of Stocks

In [32]:
url = "http://eoddata.com/" + td_child1[0].find('a')['href'] # Here we append the base url

#### Printing all the values 

In [33]:
print("Symbol:", format(symbol))
print("Name:", format(name))
print("High:", format(high))
print("Low:", format(low))
print("Volume:", format(volume))
print("URL:", format(url))

Symbol: AAB
Name: Aberdeen International Inc
High: 0.0500
Low: 0.0450
Volume: 177000
URL: http://eoddata.com//stockquote/TSX/AAB.htm
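The printed URL contains a double slash because the base URL ends with `/` and the `href` value starts with `/`. One way to avoid this, shown here as a sketch rather than what the notebook does, is `urllib.parse.urljoin` from the standard library.

```python
from urllib.parse import urljoin

base_url = "http://eoddata.com/"
href = "/stockquote/TSX/AAB.htm"   # the href value taken from the <a> tag above

# Plain concatenation keeps both slashes
print(base_url + href)

# urljoin resolves the path against the base URL
print(urljoin(base_url, href))
```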


## Creating a function with all the information


In [45]:
def parse_document(tr_tag):
    
    td_tag = tr_tag.find_all('td')
    symbol = td_tag[0].find('a').text.strip()
    name = td_tag[1].text.strip()
    high = td_tag[2].text.strip()
    low = td_tag[3].text.strip()
    close = td_tag[4].text.strip()
    volume = td_tag[5].text.strip().replace(',', '')
    url = "http://eoddata.com/" + td_child1[0].find('a')['href']
    
    print("Symbol:", format(symbol))
    print("Name:", format(name))
    print("High:", format(high))
    print("Low:", format(low))
    print("Volume:", format(volume))
    print("URL:", format(url))
    

### Let's test the function on specific stocks

In [46]:
parse_document(tr_parent1[5])

Symbol: ACB.WS.U
Name: Aurora Cannabis Inc Ws USD
High: 0.1500
Low: 0.1500
Volume: 5000
URL: http://eoddata.com//stockquote/TSX/AAB.htm


In [47]:
parse_document(tr_parent1[11])

Symbol: ADCO.WT
Name: Adcore Inc Wts
High: 0.0050
Low: 0.0050
Volume: 10000
URL: http://eoddata.com//stockquote/TSX/AAB.htm


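### Now let's update the function to return a dictionary

Notice that both test calls above printed the URL for AAB. The first version of the function builds the URL from the global `td_child1` instead of the `td_tag` of the row passed in. The version below reads the link from `td_tag[0]` and returns the values as a dictionary instead of printing them.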

In [55]:
def parse_document(tr_tag):
    
    td_tag = tr_tag.find_all('td')
    symbol = td_tag[0].find('a').text.strip()
    name = td_tag[1].text.strip()
    high = td_tag[2].text.strip()
    low = td_tag[3].text.strip()
    close = td_tag[4].text.strip()
    volume = td_tag[5].text.strip().replace(',', '')
    url = "http://eoddata.com/" + td_tag[0].find('a')['href']
    
    # Return a dictionary
    return {
        'Symbol': symbol,
        'Name': name,        
        'High': high,
        'Low': low,
        'Close': close,
        'Volume': volume,
        'URL': url
    }   
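The dictionary above stores every value as a string. The notebook relies on `pandas.read_csv` later to infer numeric types, but if numbers are wanted straight away, a conversion step along these lines is one option. `to_float` and `to_int` are hypothetical helper names, and the sample record reuses the AAB values shown earlier.

```python
def to_float(text):
    """Convert a price string such as '0.0500' to a float."""
    return float(text.replace(',', '').strip())

def to_int(text):
    """Convert a volume string such as '177,000' to an int."""
    return int(text.replace(',', '').strip())

record = {'Symbol': 'AAB', 'High': '0.0500', 'Low': '0.0450',
          'Close': '0.0450', 'Volume': '177,000'}

# Build a copy of the record with numeric fields converted
typed = {
    **record,
    'High': to_float(record['High']),
    'Low': to_float(record['Low']),
    'Close': to_float(record['Close']),
    'Volume': to_int(record['Volume']),
}
print(typed)
```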

### Using function to get all the stock information of the given page

In [56]:
all_records_1 = [parse_document(tag) for tag in tr_parent1]
all_records_2 = [parse_document(tag) for tag in tr_parent2]

In [57]:
len(all_records_1) + len(all_records_2) # The number of parsed records matches the row count we found earlier.

127

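#### Combining both lists

Note that `zip` stops at the end of the shorter list. The two lists together hold 127 records, an odd number, so the last unpaired record is left out and the combined list has 126 entries.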

In [58]:
all_records = [item for sublist in zip(all_records_1, all_records_2) for item in sublist]

In [59]:
len(all_records)

126
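Because `zip` stops at the shorter input, one way to keep every record is `itertools.zip_longest`, which pads the shorter list with `None`. The sketch below uses short made-up lists in place of `all_records_1` and `all_records_2` to show both behaviours.

```python
from itertools import zip_longest

odd_rows = ['row1', 'row3', 'row5']   # stands in for the 'ro' records
even_rows = ['row2', 'row4']          # stands in for the 're' records

# zip drops 'row5' because even_rows runs out first
truncated = [item for pair in zip(odd_rows, even_rows) for item in pair]

# zip_longest pads with None, which is filtered out again
complete = [item
            for pair in zip_longest(odd_rows, even_rows)
            for item in pair
            if item is not None]

print(len(truncated), truncated)
print(len(complete), complete)
```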

## Writing information to CSV files

In [60]:
def write_csv(items, path):
    # Open the file in write mode
    with open(path, 'w') as f:
        # Return if there's nothing to write
        if len(items) == 0:
            return
        
        # Write the headers in the first line
        headers = list(items[0].keys())
        f.write(','.join(headers) + '\n')
        
        # Write one item per line
        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            f.write(','.join(values) + "\n")
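Joining values with commas by hand works as long as no value contains a comma itself. As a hedged alternative, Python's built-in `csv` module quotes such values automatically. In the sketch below, `write_csv_rows` is a hypothetical name, the sample record is made up, and an in-memory buffer stands in for a file (for a real file, use `open(path, 'w', newline='')`).

```python
import csv
import io

def write_csv_rows(items, file_obj):
    """Write a list of dictionaries to an open text file as CSV."""
    if not items:
        return
    writer = csv.DictWriter(file_obj, fieldnames=list(items[0].keys()))
    writer.writeheader()
    writer.writerows(items)

# A made-up record whose name contains a comma
sample = [{'Symbol': 'XYZ', 'Name': 'Example Holdings, Inc', 'Volume': '100'}]

buffer = io.StringIO()
write_csv_rows(sample, buffer)
print(buffer.getvalue())
```

The name is written inside double quotes, so it is read back as a single field.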

### Testing the function

In [61]:
write_csv(all_records,"A.csv")

In [62]:
import pandas as pd

In [63]:
pd.read_csv('A.csv')

,Symbol,Name,High,Low,Close,Volume,URL
0,AAB,Aberdeen International Inc,0.050,0.045,0.045,177000,http://eoddata.com//stockquote/TSX/AAB.htm
1,AAV,Advantage Oil & Gas Ltd,9.930,9.290,9.860,1289915,http://eoddata.com//stockquote/TSX/AAV.htm
2,ABCT,ABC Technologies Holdings Inc,5.020,4.780,4.780,5600,http://eoddata.com//stockquote/TSX/ABCT.htm
3,ABST,Absolute Software Corp,13.890,13.410,13.850,50324,http://eoddata.com//stockquote/TSX/ABST.htm
4,ABTC,Accelerate Carbon Negative Bitcoin ETF,1.870,1.860,1.870,4100,http://eoddata.com//stockquote/TSX/ABTC.htm
...,...,...,...,...,...,...,...
121,AX.PR.E,Artis REIT Pref Ser E,23.000,23.000,23.000,100,http://eoddata.com//stockquote/TSX/AX.PR.E.htm
122,AX.PR.I,Artis REIT Pref Series I,24.430,24.200,24.200,516,http://eoddata.com//stockquote/TSX/AX.PR.I.htm
123,AX.UN,Artis Real Estate Investment Trust Units,9.060,8.940,9.000,314994,http://eoddata.com//stockquote/TSX/AX.UN.htm
124,AXIS,Axis Auto Finance Inc,0.485,0.465,0.485,4500,http://eoddata.com//stockquote/TSX/AXIS.htm


![](https://imgur.com/rFBRgTz.jpg)

## Final function with all the information above 

In [64]:
def scrap_stockInfo(alpha_list):
    """Scrape the TSX listing page for each letter and write one CSV file per letter."""
    base_url = "http://eoddata.com/stocklist/TSX/"

    for letter in alpha_list:
        # Download and parse the listing page for this letter
        data_url = base_url + letter + ".htm"
        response = requests.get(data_url)
        doc = BeautifulSoup(response.text, 'html.parser')

        # Stock rows alternate between the classes 'ro' and 're'
        tr_tags1 = doc.find_all('tr', {'class': 'ro'})
        tr_tags2 = doc.find_all('tr', {'class': 're'})
        all_records_1 = [parse_document(tag) for tag in tr_tags1]
        all_records_2 = [parse_document(tag) for tag in tr_tags2]

        # Interleave the two lists to restore page order
        # (zip stops at the shorter list, so an unpaired last row is dropped)
        all_records = [item for sublist in zip(all_records_1, all_records_2) for item in sublist]

        # Write this letter's records to "<letter>.csv"
        write_csv(all_records, letter + ".csv")
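When requesting several pages in a row, it is considerate to pause between requests. The sketch below separates URL construction from downloading so that the loop can be tried without touching the network. `build_page_urls`, `fetch_all`, `fake_fetch`, and the one-second default delay are all assumptions made for illustration.

```python
import time

BASE_URL = "http://eoddata.com/stocklist/TSX/"

def build_page_urls(letters, base_url=BASE_URL):
    """Return the listing-page URL for each letter."""
    return [f"{base_url}{letter}.htm" for letter in letters]

def fetch_all(letters, fetch, delay_seconds=1.0):
    """Call fetch(url) for each page, pausing between consecutive requests."""
    pages = {}
    for index, url in enumerate(build_page_urls(letters)):
        if index > 0:
            time.sleep(delay_seconds)   # wait before every request after the first
        pages[url] = fetch(url)
    return pages

def fake_fetch(url):
    """Stand-in downloader so the example runs offline."""
    return f"<html><!-- contents of {url} --></html>"

pages = fetch_all(['A', 'B'], fake_fetch, delay_seconds=0)
print(list(pages))
```

In real use, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`.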

### Creating a separate CSV for each letter across multiple pages


In [65]:
alpha_list = ['A','B','D','E','F','G','H']  # Note: 'C' is not in this list, so symbols starting with C are not scraped

In [66]:
scrap_stockInfo(alpha_list)

![](https://imgur.com/18p9k7c.jpg)

In [67]:
pd.read_csv('H.csv')

,Symbol,Name,High,Low,Close,Volume,URL
0,H,Hydro One Ltd,36.71,36.16,36.64,832566,http://eoddata.com//stockquote/TSX/H.htm
1,HAB,Horizons Active Corporate Bond ETF,9.76,9.67,9.67,18401,http://eoddata.com//stockquote/TSX/HAB.htm
2,HAC,Horizons Seasonal Rotation ETF,25.02,24.92,24.92,533,http://eoddata.com//stockquote/TSX/HAC.htm
3,HAD,Horizons Active CDN Bond ETF,9.00,9.00,9.00,1800,http://eoddata.com//stockquote/TSX/HAD.htm
4,HAF,Horizons Active Global Fixed Income ETF,6.99,6.98,6.99,800,http://eoddata.com//stockquote/TSX/HAF.htm
...,...,...,...,...,...,...,...
161,HYLD,Hamilton Enhanced U.S. Covered Call ETF,12.12,11.90,12.03,177413,http://eoddata.com//stockquote/TSX/HYLD.htm
162,HYLD.U,Hamilton Enhanced US Coverd Call ETF USD,12.10,12.05,12.10,1307,http://eoddata.com//stockquote/TSX/HYLD.U.htm
163,HZD,Betapro Silver 2X Daily Bear ETF,15.05,14.85,15.04,14186,http://eoddata.com//stockquote/TSX/HZD.htm
164,HZM,Horizonte Minerals Plc,2.38,2.38,2.38,1000,http://eoddata.com//stockquote/TSX/HZM.htm


### We created one CSV per starting letter. Now we combine them into one file and remove the individual files

In [68]:
import os

# create empty list
final_list = []
 
# append individual csv into the list
for i in range(len(alpha_list)):
    temp_df = pd.read_csv(alpha_list[i]+".csv")
    os.remove(alpha_list[i]+".csv")
    final_list.append(temp_df)
    
# create new data frame with the combined list
merged_df = pd.concat(final_list,axis=0, ignore_index=True)

# export into final csv
merged_df.to_csv( "Toronto_Stocks.csv", index=None)
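The cell above goes through intermediate CSV files. As a sketch of a more direct route, a list of dictionaries can be passed straight to `pd.DataFrame`, and the per-letter frames combined with `pd.concat`. The two short record lists below reuse values shown earlier in this notebook; `records_a` and `records_h` are illustrative names.

```python
import pandas as pd

# Stand-ins for the per-letter lists returned by parse_document
records_a = [{'Symbol': 'AAB', 'Name': 'Aberdeen International Inc',
              'Close': '0.0450', 'Volume': '177000'}]
records_h = [{'Symbol': 'H', 'Name': 'Hydro One Ltd',
              'Close': '36.64', 'Volume': '832566'}]

# One DataFrame per letter, then a single combined frame
frames = [pd.DataFrame(records) for records in (records_a, records_h)]
combined = pd.concat(frames, ignore_index=True)

# The scraped values are strings, so convert the numeric columns explicitly
combined['Close'] = pd.to_numeric(combined['Close'])
combined['Volume'] = pd.to_numeric(combined['Volume'])

print(combined.dtypes)
```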

In [69]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 984 entries, 0 to 983
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Symbol  984 non-null    object 
 1   Name    984 non-null    object 
 2   High    984 non-null    float64
 3   Low     984 non-null    float64
 4   Close   984 non-null    float64
 5   Volume  984 non-null    int64  
 6   URL     984 non-null    object 
dtypes: float64(3), int64(1), object(3)
memory usage: 53.9+ KB


![](https://imgur.com/OKlUoJd.jpg)

In [73]:
merged_df.head(10)

,Symbol,Name,High,Low,Close,Volume,URL
0,AAB,Aberdeen International Inc,0.05,0.045,0.045,177000,http://eoddata.com//stockquote/TSX/AAB.htm
1,AAV,Advantage Oil & Gas Ltd,9.93,9.29,9.86,1289915,http://eoddata.com//stockquote/TSX/AAV.htm
2,ABCT,ABC Technologies Holdings Inc,5.02,4.78,4.78,5600,http://eoddata.com//stockquote/TSX/ABCT.htm
3,ABST,Absolute Software Corp,13.89,13.41,13.85,50324,http://eoddata.com//stockquote/TSX/ABST.htm
4,ABTC,Accelerate Carbon Negative Bitcoin ETF,1.87,1.86,1.87,4100,http://eoddata.com//stockquote/TSX/ABTC.htm
5,ABTC.U,Accelerate Carbon Neg Bitcoin ETF USD,1.33,1.33,1.33,2500,http://eoddata.com//stockquote/TSX/ABTC.U.htm
6,ABX,Barrick Gold Corp,24.06,23.6,23.79,6527468,http://eoddata.com//stockquote/TSX/ABX.htm
7,AC,Air Canada,19.83,19.5,19.58,2446800,http://eoddata.com//stockquote/TSX/AC.htm
8,ACAA,Arrow Canadian Advantage Alternative,20.31,20.31,20.31,100,http://eoddata.com//stockquote/TSX/ACAA.htm
9,ACB,Aurora Cannabis Inc,1.32,1.27,1.31,1188887,http://eoddata.com//stockquote/TSX/ACB.htm


In [74]:
merged_df.tail(10)

,Symbol,Name,High,Low,Close,Volume,URL
974,HXU,Betapro S&P TSX 60 2X Daily Bull ETF,17.94,17.72,17.88,52277,http://eoddata.com//stockquote/TSX/HXU.htm
975,HXX,Horizons Euro Stoxx 50 Index ETF,37.41,37.13,37.41,1731,http://eoddata.com//stockquote/TSX/HXX.htm
976,HYBR,Horizons Active Hybrd Bond Prf Share ETF,7.63,7.63,7.63,1000,http://eoddata.com//stockquote/TSX/HYBR.htm
977,HYDR,Horizons Global Hydrogen Index ETF,12.76,12.62,12.76,717,http://eoddata.com//stockquote/TSX/HYDR.htm
978,HYI,Horizons Active High Yield Bond ETF,7.68,7.62,7.65,7258,http://eoddata.com//stockquote/TSX/HYI.htm
979,HYLD,Hamilton Enhanced U.S. Covered Call ETF,12.12,11.9,12.03,177413,http://eoddata.com//stockquote/TSX/HYLD.htm
980,HYLD.U,Hamilton Enhanced US Coverd Call ETF USD,12.1,12.05,12.1,1307,http://eoddata.com//stockquote/TSX/HYLD.U.htm
981,HZD,Betapro Silver 2X Daily Bear ETF,15.05,14.85,15.04,14186,http://eoddata.com//stockquote/TSX/HZD.htm
982,HZM,Horizonte Minerals Plc,2.38,2.38,2.38,1000,http://eoddata.com//stockquote/TSX/HZM.htm
983,HZU,Betapro Silver 2X Daily Bull ETF,24.29,23.82,23.91,25968,http://eoddata.com//stockquote/TSX/HZU.htm


In [76]:
merged_df.loc[117:127]

,Symbol,Name,High,Low,Close,Volume,URL
117,AVL,Avalon Advanced Materials Inc,0.12,0.115,0.12,269234,http://eoddata.com//stockquote/TSX/AVL.htm
118,AVNT,Avant Brands Inc,0.185,0.175,0.18,97453,http://eoddata.com//stockquote/TSX/AVNT.htm
119,AVNT.WT,Avant Brands Inc WT,0.005,0.005,0.005,1000,http://eoddata.com//stockquote/TSX/AVNT.WT.htm
120,AW.UN,A&W Revenue Royalties Income Fund,36.75,35.89,36.0,4386,http://eoddata.com//stockquote/TSX/AW.UN.htm
121,AX.PR.E,Artis REIT Pref Ser E,23.0,23.0,23.0,100,http://eoddata.com//stockquote/TSX/AX.PR.E.htm
122,AX.PR.I,Artis REIT Pref Series I,24.43,24.2,24.2,516,http://eoddata.com//stockquote/TSX/AX.PR.I.htm
123,AX.UN,Artis Real Estate Investment Trust Units,9.06,8.94,9.0,314994,http://eoddata.com//stockquote/TSX/AX.UN.htm
124,AXIS,Axis Auto Finance Inc,0.485,0.465,0.485,4500,http://eoddata.com//stockquote/TSX/AXIS.htm
125,AYA,Aya Gold and Silver Inc,9.75,8.92,9.64,648010,http://eoddata.com//stockquote/TSX/AYA.htm
126,BABY,Else Nutrition Holdings Inc,0.58,0.54,0.55,34848,http://eoddata.com//stockquote/TSX/BABY.htm


## Summary

Finally, we have managed to parse the EODData website and collect end-of-day stock data for the Toronto Stock Exchange.  
We saved the extracted information in a `CSV` file, which we can use to answer further questions, e.g. `Which stock was the best buy on a given day`
![](https://imgur.com/uAGgHE3.jpg)

>### Packages Used:
>1. Requests: for downloading the HTML code from the EODData URL
>2. BeautifulSoup4: for parsing and extracting data from the HTML string
>3. Pandas: for gathering the data into a DataFrame for further processing



Let us look at the steps that we took from start to finish:

1. We downloaded the webpage using `requests`


2. We parsed the HTML source code using the `BeautifulSoup` library and located the table rows that hold each stock's data


3. We extracted detailed information for each stock, such as:
    * Stock symbol
    * Stock name
    * Highest price
    * Lowest price
    * Closing price
    * Total volume traded
    * URL of the stock's quote page


4. We then created a Python dictionary to hold these details for each stock


5. We wrote the records for each letter to a CSV file and loaded each file into a `Pandas DataFrame`


6. We then combined the per-letter DataFrames into a single DataFrame and removed the individual CSV files


7. Finally, we exported the combined DataFrame to a single `CSV` file, which was the goal of our project.

## Future Work

We can now work forward to explore this data more and more to fetch meaningful information out of it.  

With further analysis of the data, we can answer questions such as:
* Which stock performed best on a given day
* Which stock had the highest trading volume
* Details for an individual stock
* Gain/loss information for a stock

And the list goes on..
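As a first taste of such questions, the sketch below takes four rows from the output shown earlier and finds the most heavily traded symbol and the one with the widest intraday range. The scraped data has no opening or previous closing price, so the range between high and low is used here as a rough stand-in for gain/loss; `RangePct` is a column name made up for this example.

```python
import pandas as pd

# Four rows copied from merged_df.head(10) above
sample = pd.DataFrame({
    'Symbol': ['ABX', 'AC', 'ACAA', 'ACB'],
    'High': [24.06, 19.83, 20.31, 1.32],
    'Low': [23.60, 19.50, 20.31, 1.27],
    'Close': [23.79, 19.58, 20.31, 1.31],
    'Volume': [6527468, 2446800, 100, 1188887],
})

# Symbol with the highest traded volume
most_traded = sample.loc[sample['Volume'].idxmax(), 'Symbol']

# Intraday range as a percentage of the low price
sample['RangePct'] = (sample['High'] - sample['Low']) / sample['Low'] * 100
widest_range = sample.loc[sample['RangePct'].idxmax(), 'Symbol']

print(most_traded, widest_range)
```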

In the future, I would like to make this dataset richer by:

* Adding stock information for symbols starting with the remaining letters
* Scraping the individual stock detail pages to get more insight into specific stocks
* Scraping other exchanges such as NASDAQ
* Writing an automation script that scrapes the stock information daily, producing a dataset for exploratory data analysis across different exchanges

## References 


[1] Python official documentation. https://docs.python.org/3/


[2] Requests library. https://pypi.org/project/requests/


[3] Beautiful Soup documentation. https://www.crummy.com/software/BeautifulSoup/bs4/doc/


[4] Aakash N S, Introduction to Web Scraping, 2021. https://jovian.ai/aakashns/python-web-scraping-and-rest-api


[5] Pandas library documentation. https://pandas.pydata.org/docs/


[6] EODData website. http://eoddata.com/stocklist/TSX/A.htm


[7] Web Scraping Article. https://www.toptal.com/python/web-scraping-with-python


[8] Web Scraping Image. https://morioh.com/p/431153538ecb

[9] Working with Jupyter Notebook. https://towardsdatascience.com/write-markdown-latex-in-the-jupyter-notebook-10985edb91fd

In [None]:
jovian.commit(files=['Toronto_Stocks.csv'])