# Web Scraping on Financial News

**Tags:** #BeautifulSoup #pandas #dataframe #moneycontrol.com

**Author:** [Kevin Leung](https://www.linkedin.com/in/kelvinleung421/)

**Last update:** 2025-01-14 (Created: 2023-01-14)

**Description:** This notebook scrapes a financial news article from moneycontrol.com, extracts key information (headline, description, date published, article body), and stores the extracted data in a pandas DataFrame.



**References:**
- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Financial news - Lloyds Metals stock soars to record high after Incred initiates 'add' call, sees 40% upside](https://www.moneycontrol.com/news/business/markets/lloyds-metals-stock-soars-to-record-high-after-incred-initiates-add-call-sees-40-upside-12885835.html)

### Import libraries

In [74]:
import bs4
import requests
import json
import re
import pandas as pd

### Web scraping using BeautifulSoup

In [77]:
url = "https://www.moneycontrol.com/news/business/markets/lloyds-metals-stock-soars-to-record-high-after-incred-initiates-add-call-sees-40-upside-12885835.html"

# Send a GET request to the URL and stop early on an HTTP error status
response = requests.get(url)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# Find all script tags of type 'application/ld+json'; on this page the
# article metadata is in the fourth one (index 3)
all_script = soup.find_all('script', attrs={'type': 'application/ld+json'})
article_script = all_script[3]
raw_article_str = article_script.get_text().replace('\r\n', ' ')
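
The hard-coded index `3` depends on the current page layout and will point at the wrong block if the site adds or removes a JSON-LD script. A minimal sketch of a more tolerant lookup follows. It assumes the article block carries an `@type` of `NewsArticle` or `Article`, which is a common JSON-LD convention but has not been checked against this page, and the helper name `find_article` is hypothetical. It also assumes each block parses as JSON as-is: blocks that fail to parse are skipped, so a block that needs trimming first would have to be cleaned before being passed in.

```python
import json


def find_article(script_texts, wanted_types=("NewsArticle", "Article")):
    """Return the first JSON-LD object whose @type is in wanted_types, else None."""
    for text in script_texts:
        try:
            # strict=False tolerates raw control characters inside JSON strings
            payload = json.loads(text, strict=False)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # A JSON-LD block may hold a single object or a list of objects
        candidates = payload if isinstance(payload, list) else [payload]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") in wanted_types:
                return item
    return None


# Made-up sample blocks standing in for the text of each script tag
sample_scripts = [
    '{"@type": "Organization", "name": "Example"}',
    'not json at all',
    '[{"@type": "NewsArticle", "headline": "Example headline"}]',
]
article = find_article(sample_scripts)
print(article["headline"])
```

With the objects from the cell above, the call would look like `find_article(tag.get_text() for tag in all_script)`.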

### Clean up the string by removing unnecessary characters

In [78]:
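# Split the string into quoted and unquoted segments, then rejoin them.
# No segment is modified in between, so this step reproduces the input unchanged.
parts = re.split(r"""("[^"]*"|'[^']*')""", raw_article_str)
article_str = "".join(parts)
# Drop the first and last characters of the string
article_str = article_str[1:-1]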

# Load the JSON data into a dictionary
article_dict = json.loads(article_str)
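
Splitting on quoted substrings is usually a first step toward cleaning only the text outside the quotes. The sketch below shows one way that could look, assuming the goal is to remove whitespace between JSON tokens while leaving string values untouched. The helper name `strip_outside_quotes` is hypothetical, and the pattern does not handle escaped quotes (`\"`) inside a string. Note that `json.loads` already ignores whitespace between tokens, so this is illustrative rather than required.

```python
import json
import re

# Same pattern as in the cell above: matches a double- or single-quoted run
QUOTED = re.compile(r"""("[^"]*"|'[^']*')""")


def strip_outside_quotes(text):
    """Remove all whitespace that is not inside a quoted substring."""
    parts = QUOTED.split(text)
    # With one capturing group, even indices hold the text between quoted
    # runs and odd indices hold the quoted runs themselves
    parts[::2] = ["".join(segment.split()) for segment in parts[::2]]
    return "".join(parts)


# Made-up sample with stray whitespace outside and inside the strings
sample = '[ {"headline" :  "Lloyds Metals  soars",\n "n": 1 } ]'
compact = strip_outside_quotes(sample)
print(compact)
print(json.loads(compact)[0]["headline"])
```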

### Output - Display the result as text

In [79]:
# Read the fields of interest from the first item of the parsed JSON-LD list
headline = article_dict[0]['headline']
description = article_dict[0]['description']
date_published = article_dict[0]['datePublished']
article_body = article_dict[0]['articleBody']

print("Headline:", headline)
print("Description:", description)
print("Date Published:", date_published)
print("Article Body:", article_body)

Headline: Lloyds Metals stock soars to record high after Incred initiates 'add' call, sees 40% upside
Description: Incred's optimism about Lloyds Metals is driven by India's surging steel demand and elevated iron ore prices, factors that are expected to push the company's growth trajectory higher.
Date Published: 2024-12-09T12:54:22+05:30
Article Body: Lloyds Metals and Energy shares jumped 4.5 percent and hit a record high of Rs&nbsp;1,099 on December 9, fuelled by brokerage firm Incred Equities' upbeat growth outlook and bullish price target.  Incred Equities initiated an 'add' call on Lloyds Metals, assigning it a price target of Rs 1,476 which translates into an upside potential of a whopping 40 percent from Friday's close. Incred's bullishness over Lloyds Metals stems from&nbsp; India's soaring steel demand and high iron ore prices, which are likely to benefit the company's growth trajectory.  At 12.45 pm, shares of Lloyds Metals and Energy were trading at Rs 1,088.50 on the NSE. 
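
The article body above still contains HTML entities such as `&nbsp;` and runs of double spaces. A minimal sketch of a cleanup step using the standard library's `html.unescape` follows. The helper name `clean_text` is hypothetical, and collapsing every whitespace run to a single space is an assumption about the desired output.

```python
import html


def clean_text(text):
    """Decode HTML entities and collapse whitespace runs to single spaces."""
    decoded = html.unescape(text)  # "&nbsp;" becomes a non-breaking space
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space produced above
    return " ".join(decoded.split())


sample_body = "hit a record high of Rs&nbsp;1,099 on December 9,  stems from&nbsp; India's"
print(clean_text(sample_body))
```

In the notebook this could be applied as `article_body = clean_text(article_body)` before printing or building the DataFrame.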

### Output - Display the result as a DataFrame

In [81]:
data = {
    "Headline": [headline],
    "Description": [description],
    "Date Published": [date_published],
    "Article Body": [article_body]
}

df = pd.DataFrame(data)
df

| | Headline | Description | Date Published | Article Body |
|---|---|---|---|---|
| 0 | Lloyds Metals stock soars to record high after... | Incred's optimism about Lloyds Metals is drive... | 2024-12-09T12:54:22+05:30 | Lloyds Metals and Energy shares jumped 4.5 per... |
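
The DataFrame above holds one article and keeps the publication date as a string. A sketch of how the same columns could be built for several articles follows, with a missing field stored as `None` and the date parsed into a timezone-aware `datetime`. The names `FIELDS` and `article_to_row` are hypothetical, and the sample dictionary is made up apart from the timestamp taken from the output above.

```python
from datetime import datetime, timedelta

# DataFrame column name -> key in the JSON-LD article object
FIELDS = {
    "Headline": "headline",
    "Description": "description",
    "Date Published": "datePublished",
    "Article Body": "articleBody",
}


def article_to_row(article):
    """Map one parsed article dict to a row dict, tolerating missing keys."""
    row = {column: article.get(key) for column, key in FIELDS.items()}
    published = row["Date Published"]
    if published:
        # ISO 8601 timestamps such as 2024-12-09T12:54:22+05:30 parse directly
        row["Date Published"] = datetime.fromisoformat(published)
    return row


sample_articles = [
    {"headline": "Example headline", "datePublished": "2024-12-09T12:54:22+05:30"},
]
rows = [article_to_row(article) for article in sample_articles]
print(rows[0]["Date Published"].isoformat())
```

A list of such rows can be passed straight to `pd.DataFrame(rows)`, which yields one row per article.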
