# Web Scraping Company Financial Data from MoneyControl using Python

![](https://imgur.com/kAFgDAT.png)

### About MoneyControl
https://www.moneycontrol.com/cdata/aboutus

## Web Scraping

Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning. The source is usually unstructured HTML, which is converted into structured data and stored in a spreadsheet or a database.

### The steps we'll follow:

- We're going to scrape https://www.moneycontrol.com/stocks/marketinfo/marketcap/bse
- We'll get a list of companies.
- For each company, we'll get the company name and the company page URL.
- For each company, we'll get the last traded stock price, the percentage change in stock price, and the market capitalisation.
- We'll save the data to a CSV file and read it back with the Pandas library.

The output will look like this:

Name, LTP, % Chg., Market Cap, URL

![](https://imgur.com/4P1ZLWF.png)

## Scrape the list of Companies from MoneyControl

 - We will use the Requests library to download the page.
 - We will use BeautifulSoup to parse the page and extract information.
 - We will load the result into a Pandas DataFrame.

### Install and import required libraries.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd


## Downloading the web page using requests

When you access a URL using a web browser, the browser downloads the contents of the web page the URL points to and displays them on the screen. Before we can extract information from a web page, we need to download the page using Python.

We'll use a library called requests to download web pages from the internet. We can download a web page using the requests.get function.

We'll use a library called BeautifulSoup to parse the HTML source code.



*   About requests library - https://requests.readthedocs.io/en/latest/
*   About BeautifulSoup library - https://beautiful-soup-4.readthedocs.io/en/latest/
*   About HTML - https://html.com/

In [None]:
def get_company_page():
    # URL of the BSE market capitalisation listing
    topics_url = 'https://www.moneycontrol.com/stocks/marketinfo/marketcap/bse/index.html'
    # Download the page
    response = requests.get(topics_url)
    # Stop early if the request did not succeed
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(topics_url))
    # Parse the HTML source into a BeautifulSoup document
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

In [None]:
doc = get_company_page()
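
The download step can be made a little more defensive. The following is only a sketch under stated assumptions, not the method used in this notebook: the function name `fetch_soup`, the `User-Agent` string, and the 10-second timeout are illustrative choices. It relies on `response.raise_for_status()` instead of a manual status check, and it accepts the download function as a parameter so it can be tried without network access.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical defaults, chosen only for illustration.
DEFAULT_HEADERS = {'User-Agent': 'Mozilla/5.0 (tutorial scraper)'}


def fetch_soup(url, get=requests.get, timeout=10):
    """Download `url` and return the parsed BeautifulSoup document.

    `get` defaults to requests.get; passing another callable makes the
    function easy to exercise without network access.
    """
    response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')
```

With network access, it could be called as `fetch_soup('https://www.moneycontrol.com/stocks/marketinfo/marketcap/bse/index.html')`.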

## Inspecting HTML in the Browser
To view the source code of any webpage right within your browser, right-click anywhere on the page and select the “Inspect” option. This opens the “Developer Tools”, where you can see the source code as a tree. You can expand and collapse various nodes and find the source code for a specific portion of the page.

![](https://imgur.com/JJf5pnb.png)

Now let’s get all the 'tr' (table row) tags and count them. The count includes a couple of rows that are not companies.

In [None]:
company = doc.find_all('tr')

In [None]:
len(company)

102

## Dropping unnecessary data

In [None]:
company.pop(0)

<tr><td><link href="https://www.moneycontrol.com/rss/latestnews.xml" rel="alternate" title="MoneyControl.com News" type="application/rss+xml"/></td></tr>

In [None]:
company.pop(0)

<tr class="bggry">
<th align="left" class="brdrgtgry" width="25%">Company Name</th></tr>
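
Removing rows by position works as long as the page layout never changes. As an alternative sketch, the rows can be filtered by content instead. The example below assumes that every company row contains a link with the class `bl_12`, as the first company row does; the inline HTML is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the real page: one stray row, one header row, one data row.
html = """
<table>
<tr><td><link href="news.xml"/></td></tr>
<tr class="bggry"><th>Company Name</th></tr>
<tr><td><a class="bl_12" href="/india/stockpricequote/refineries/relianceindustries/RI"><b>Reliance</b></a></td><td>2,642.40</td></tr>
</table>
"""
sample_doc = BeautifulSoup(html, 'html.parser')

# Keep only rows that contain a company link (an <a> with class "bl_12").
company_rows = [row for row in sample_doc.find_all('tr')
                if row.find('a', class_='bl_12') is not None]

print(len(company_rows))  # 1
```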

Now let’s get the individual details for the first company, which has all the information required

In [None]:
company[0]

<tr>
<td class="brdrgtgry" width="25%"><a class="bl_12" href="/india/stockpricequote/refineries/relianceindustries/RI"><b>Reliance</b></a>
<div class="addPrWhs">
<a class="mIcon" href="javascript:void(0);"></a>
<div class="ddlist">
<ul>
<li><a class="watch" href="javascript:;" onclick="javascript:chkbx_val('RI','1');">Add to Watchlist</a></li>
<li><a class="port" href="javascript:;" onclick="javascript:chkbx_val('RI','5');">Add to Portfolio</a></li>
</ul>
</div>
</div>
</td>
<td align="right" class="brdrgtgry" style="color:#16a903">2,642.40</td>
<td align="right" class="brdrgtgry" style="color:#16a903">0.37</td>
<td align="right" class="brdrgtgry">2,855.00</td>
<td align="right" class="brdrgtgry">2,110.15</td>
<td align="right" class="brdrgtgry">1,787,789.57</td>
</tr>

In [None]:
company[0].text.strip()

'Reliance\n\n\n\n\nAdd to Watchlist\nAdd to Portfolio\n\n\n\n\n2,642.40\n0.37\n2,855.00\n2,110.15\n1,787,789.57'
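
The raw text contains many newline characters. BeautifulSoup's `stripped_strings` generator offers another way to view the same content: it returns each piece of text separately with the whitespace removed. The sketch below runs on a shortened inline copy of the row rather than on the downloaded page.

```python
from bs4 import BeautifulSoup

# Shortened copy of the row shown above (inline, so no download is needed).
row_html = """
<tr>
<td><a class="bl_12" href="/india/stockpricequote/refineries/relianceindustries/RI"><b>Reliance</b></a>
<ul><li><a class="watch">Add to Watchlist</a></li><li><a class="port">Add to Portfolio</a></li></ul>
</td>
<td>2,642.40</td>
<td>0.37</td>
<td>2,855.00</td>
<td>2,110.15</td>
<td>1,787,789.57</td>
</tr>
"""
row = BeautifulSoup(row_html, 'html.parser').find('tr')

# stripped_strings yields each text fragment with surrounding whitespace
# removed and skips fragments that contain only whitespace.
fields = list(row.stripped_strings)
print(fields)
```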

Now let’s get the individual 'td' for the company, which has all the information required

In [None]:
com = company[0].find_all('td')
com

[<td class="brdrgtgry" width="25%"><a class="bl_12" href="/india/stockpricequote/refineries/relianceindustries/RI"><b>Reliance</b></a>
 <div class="addPrWhs">
 <a class="mIcon" href="javascript:void(0);"></a>
 <div class="ddlist">
 <ul>
 <li><a class="watch" href="javascript:;" onclick="javascript:chkbx_val('RI','1');">Add to Watchlist</a></li>
 <li><a class="port" href="javascript:;" onclick="javascript:chkbx_val('RI','5');">Add to Portfolio</a></li>
 </ul>
 </div>
 </div>
 </td>,
 <td align="right" class="brdrgtgry" style="color:#16a903">2,642.40</td>,
 <td align="right" class="brdrgtgry" style="color:#16a903">0.37</td>,
 <td align="right" class="brdrgtgry">2,855.00</td>,
 <td align="right" class="brdrgtgry">2,110.15</td>,
 <td align="right" class="brdrgtgry">1,787,789.57</td>]

Now let’s parse each td tag to get the required information for a specific stock and display it.

In [None]:
name = com[0].find('b').text.strip()

In [None]:
name

'Reliance'

In [None]:
L_T_price = com[1].text.strip()

In [None]:
L_T_price

'2,642.40'

In [None]:
Percent_change = com[2].text.strip()

In [None]:
Percent_change

'0.37'

In [None]:
Market_Cap = com[-1].text.strip()

In [None]:
Market_Cap

'1,787,789.57'

In [None]:
url = "https://www.moneycontrol.com/"  + com[0].find('a')['href']

In [None]:
url

'https://www.moneycontrol.com//india/stockpricequote/refineries/relianceindustries/RI'
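
The URL above contains a double slash after the domain because both the base URL and the `href` supply a `/`. One possible way to avoid this, shown here as a sketch rather than as part of the notebook's code, is `urllib.parse.urljoin` from the standard library. The names `BASE_URL` and `clean_url` are illustrative.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.moneycontrol.com/'
href = '/india/stockpricequote/refineries/relianceindustries/RI'

# urljoin resolves the href against the base, so a leading "/" on the href
# does not produce a doubled slash.
clean_url = urljoin(BASE_URL, href)
print(clean_url)
# https://www.moneycontrol.com/india/stockpricequote/refineries/relianceindustries/RI
```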

Print all the Values

In [None]:

print("Name:", name)
print("Last_Traded_Price:", L_T_price)
print("% Chg.:", Percent_change)
print("Market Cap:", Market_Cap)
print("URL:", url)

Name: Reliance
Last_Traded_Price: 2,642.40
% Chg.: 0.37
Market Cap: 1,787,789.57
URL: https://www.moneycontrol.com//india/stockpricequote/refineries/relianceindustries/RI


## Function to clean the string

In [None]:
def remove_coma(a):
    # Remove the thousands-separator commas, e.g. '2,642.40' -> '2642.40'
    b = a.replace(',', '')
    return b
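
Removing the commas still leaves the value as a string. If numeric values are needed later, for sorting or arithmetic, the cleaned string can be converted to a float. The helper name `to_number` below is hypothetical and is not used elsewhere in this notebook.

```python
def to_number(text):
    """Convert a string such as '1,787,789.57' to a float."""
    return float(text.replace(',', '').strip())


print(to_number('2,642.40'))      # 2642.4
print(to_number('1,787,789.57'))  # 1787789.57
```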


## Let us do the same thing in a function.

In [None]:
def parse_document(company):
    # Each 'td' in the row holds one column of the table
    com = company.find_all('td')
    name = com[0].find('b').text.strip()
    L_T_price = com[1].text.strip()
    Percent_change = com[2].text.strip()
    # Market capitalisation is the last column
    Market_Cap = com[-1].text.strip()
    url = "https://www.moneycontrol.com/" + com[0].find('a')['href']

    # Return a dictionary
    return {
        'Name': name,
        'LTP': remove_coma(L_T_price),
        '% Chg.': Percent_change,
        'Market Cap': remove_coma(Market_Cap),
        'URL': url
    }

In [None]:
all_records = [parse_document(tag) for tag in company]

In [None]:
len(all_records)

100

In [None]:
all_records[:5]

[{'Name': 'Reliance',
  'LTP': '2642.40',
  '% Chg.': '0.37',
  'Market Cap': '1787789.57',
  'URL': 'https://www.moneycontrol.com//india/stockpricequote/refineries/relianceindustries/RI'},
 {'Name': 'TCS',
  'LTP': '3401.05',
  '% Chg.': '0.04',
  'Market Cap': '1244461.67',
  'URL': 'https://www.moneycontrol.com//india/stockpricequote/computerssoftware/tataconsultancyservices/TCS'},
 {'Name': 'HDFC Bank',
  'LTP': '1503.45',
  '% Chg.': '1.26',
  'Market Cap': '835600.60',
  'URL': 'https://www.moneycontrol.com//india/stockpricequote/banksprivatesector/hdfcbank/HDF01'},
 {'Name': 'Infosys',
  'LTP': '1598.00',
  '% Chg.': '0.22',
  'Market Cap': '672393.34',
  'URL': 'https://www.moneycontrol.com//india/stockpricequote/computerssoftware/infosys/IT'},
 {'Name': 'HUL',
  'LTP': '2636.10',
  '% Chg.': '1.58',
  'Market Cap': '619375.75',
  'URL': 'https://www.moneycontrol.com//india/stockpricequote/personalcare/hindustanunilever/HU'}]

## Let us write a function to create a CSV file

The function accepts a list of dictionaries (one per record) and a path/filename as parameters.

In [None]:
def write_csv(items, path):
    # Open the file in write mode
    with open(path, 'w') as f:
        # Return if there's nothing to write
        if len(items) == 0:
            return
        
        # Write the headers in the first line
        headers = list(items[0].keys())
        f.write(','.join(headers) + '\n')
        
        # Write one item per line
        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            f.write(','.join(values) + "\n")

In [None]:
write_csv(all_records,"Company_details.csv")
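
Joining values with commas by hand works here because the commas were removed from the numbers earlier. If a field could contain a comma, for example in a company name, the standard library's `csv.DictWriter` handles the quoting. The sketch below writes to an in-memory buffer; `write_csv_safe` is a hypothetical name and the second record is made up to show the quoting.

```python
import csv
import io

records = [
    {'Name': 'Reliance', 'LTP': '2642.40', '% Chg.': '0.37'},
    {'Name': 'Example, Ltd', 'LTP': '10.00', '% Chg.': '-0.10'},  # made-up row
]


def write_csv_safe(items, fileobj):
    """Write a list of dictionaries as CSV, quoting fields when needed."""
    if not items:
        return
    writer = csv.DictWriter(fileobj, fieldnames=list(items[0].keys()))
    writer.writeheader()
    writer.writerows(items)


buffer = io.StringIO()
write_csv_safe(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, a file opened with `open(path, 'w', newline='', encoding='utf-8')` can be passed in place of the buffer.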

## Let us display the dataset we created.

In [None]:
pd.read_csv('Company_details.csv')

Unnamed: 0,Name,LTP,% Chg.,Market Cap,URL
0,Reliance,2642.40,0.37,1787789.57,https://www.moneycontrol.com//india/stockprice...
1,TCS,3401.05,0.04,1244461.67,https://www.moneycontrol.com//india/stockprice...
2,HDFC Bank,1503.45,1.26,835600.60,https://www.moneycontrol.com//india/stockprice...
3,Infosys,1598.00,0.22,672393.34,https://www.moneycontrol.com//india/stockprice...
4,HUL,2636.10,1.58,619375.75,https://www.moneycontrol.com//india/stockprice...
...,...,...,...,...,...
95,Macrotech Dev,1107.80,1.16,53353.14,https://www.moneycontrol.com//india/stockprice...
96,JSW Energy,322.85,0.97,53077.56,https://www.moneycontrol.com//india/stockprice...
97,INDUS TOWERS,195.80,-0.33,52766.87,https://www.moneycontrol.com//india/stockprice...
98,Bosch,17814.20,3.34,52540.56,https://www.moneycontrol.com//india/stockprice...
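
The list of records can also be turned into a DataFrame directly, without going through a file first. The sketch below assumes records in the same shape that `parse_document` returns (the URL field is left out for brevity) and converts the numeric columns from strings to floats; the variable names are illustrative.

```python
import pandas as pd

# Two records in the same shape that parse_document returns.
records = [
    {'Name': 'Reliance', 'LTP': '2642.40', '% Chg.': '0.37', 'Market Cap': '1787789.57'},
    {'Name': 'TCS', 'LTP': '3401.05', '% Chg.': '0.04', 'Market Cap': '1244461.67'},
]

df = pd.DataFrame(records)

# Convert the numeric columns from strings to floats.
numeric_columns = ['LTP', '% Chg.', 'Market Cap']
for column in numeric_columns:
    df[column] = pd.to_numeric(df[column])

# index=False keeps the row index out of the CSV text.
csv_text = df.to_csv(index=False)
print(df.dtypes)
```

Passing a filename as the first argument of `to_csv` writes the same content to disk.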


## Future Work
 * We may get the list of more companies listed on BSE and create a larger DataFrame for future analysis.
 * Explore other complex websites.
 * Explore how we might go about scraping data using Selenium.

## Summary

 * Installed and imported the libraries.
 * Downloaded and parsed the MoneyControl market capitalisation page using requests and BeautifulSoup.
 * Extracted the name, last traded price, percentage change, market capitalisation and URL for each company.
 * Parsed every company row into a record using a function.
 * Saved the records to a CSV file and loaded it into a Pandas DataFrame.

## References
* Python official documentation. - https://docs.python.org/3/

* Aakash N S, Introduction to Web Scraping, 2021. - https://jovian.ai/aakashns/python-web-scraping-and-rest-api

* requests library documentation. - https://requests.readthedocs.io/en/latest/
* BeautifulSoup library documentation. - https://beautiful-soup-4.readthedocs.io/en/latest/
* HTML Tutorial - https://html.com/

