# Web Scraping of Justdial Hospital Data

# Web Scraping of Hospital Data from Justdial Using Python Requests and BeautifulSoup

This notebook demonstrates the process of web scraping data from the Justdial website, specifically focusing on extracting hospital-related information. The scraping is achieved using **Python's Requests** library to send HTTP requests and **BeautifulSoup** to parse and extract relevant data from the HTML content of the webpage.

This notebook walks through the steps to:
1. Send HTTP requests to fetch the Justdial page containing hospital information.
2. Use BeautifulSoup to parse the HTML and extract the data.
3. Clean and store the extracted data for further use or analysis.

This notebook takes a practical approach to web scraping, applied to a real-world task: gathering information about hospitals, including their names, ratings, contact details, and locations.


### Libraries

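In [1]:
!pip install beautifulsoup4 lxml

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup   # HTML parser used to navigate the page and extract content

`bs4` (Beautiful Soup) is used to parse the HTML content of the webpage and extract the relevant data. Its PyPI package is named `beautifulsoup4`; `lxml` is installed as well because it is the parser used below.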
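To make this concrete, here is a minimal, offline sketch of what Beautiful Soup does. It parses a small hand-written HTML snippet (hypothetical markup, not Justdial's real page) with Python's built-in `html.parser` backend, so it runs without lxml. The `demo_` prefixes keep it from overwriting the notebook's own `soup`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names used later in this notebook
demo_html = """
<div class="resultbox_info">
  <h3> Example Hospital </h3>
  <span class="resultbox_totalrate">4.3</span>
</div>
"""

demo_soup = BeautifulSoup(demo_html, "html.parser")

# find() returns the first matching tag; get_text(strip=True) trims surrounding whitespace
hospital = demo_soup.find("h3").get_text(strip=True)

# class_ (trailing underscore) filters by CSS class, because `class` is a Python keyword
rating_text = demo_soup.find(class_="resultbox_totalrate").get_text(strip=True)

print(hospital, rating_text)  # Example Hospital 4.3
```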

### Sending an HTTP Request to Justdial for Hospital Data

In [3]:
URL = 'https://www.justdial.com/Patna/Hospitals/nct-10253670?trkid=173-remotecity&term=Hospitals'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}

# Send the request with a browser-like User-Agent; fail fast on timeouts and HTTP errors
resp = requests.get(url=URL, headers=headers, timeout=30)
resp.raise_for_status()
response = resp.text   # raw HTML of the page
#print(response)
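The following is a minimal sketch that reuses the URL above. The query string can be passed through `params` instead of being hard-coded, and the request can be prepared without sending it, which lets you inspect the final URL offline. The names `BASE_URL`, `demo_headers`, and `fetch_html` are hypothetical. `fetch_html` adds a timeout and `raise_for_status()`; it is defined here but not called.

```python
import requests

BASE_URL = "https://www.justdial.com/Patna/Hospitals/nct-10253670"
params = {"trkid": "173-remotecity", "term": "Hospitals"}
# Shortened User-Agent for illustration; the notebook uses a full browser string
demo_headers = {"User-Agent": "Mozilla/5.0"}

# Build (but do not send) the request so the encoded URL can be checked offline
prepared = requests.Request("GET", BASE_URL, params=params, headers=demo_headers).prepare()
print(prepared.url)
# https://www.justdial.com/Patna/Hospitals/nct-10253670?trkid=173-remotecity&term=Hospitals

def fetch_html(timeout=30):
    """Hypothetical helper: send the prepared request and fail loudly on HTTP errors."""
    with requests.Session() as session:
        reply = session.send(prepared, timeout=timeout)
        reply.raise_for_status()
        return reply.text
```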


In [4]:
soup = BeautifulSoup(response, 'lxml')

We use **BeautifulSoup** to parse the HTML response obtained from the Justdial page. The `lxml` parser is fast and builds a tree that BeautifulSoup can then navigate and search by tag name and class.
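`lxml` is an optional third-party dependency, so a parse step can fall back to Python's built-in parser when lxml is missing. The sketch below shows one way to do that. `make_soup` is a hypothetical helper. `FeatureNotFound` is the exception BeautifulSoup raises when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html):
    """Parse with lxml when it is installed, otherwise fall back to Python's built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is a third-party dependency; html.parser ships with the standard library
        return BeautifulSoup(html, "html.parser")

demo_soup = make_soup("<h3>Heart Hospital Pvt Ltd</h3>")
print(demo_soup.h3.text)  # Heart Hospital Pvt Ltd
```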

In [5]:
#print(soup.prettify())   # prints the parsed HTML as an indented tree, useful for finding tags and class names

### Extracting Relevant Information: Hospital Name, Rating, Contact Info, Address, and Total Ratings

In this step, we focus on extracting specific and relevant information about the hospitals listed on the Justdial webpage. Using **BeautifulSoup**, we will precisely extract the following details for each hospital:

- **Hospital Name**: The name of the hospital.
- **Rating**: The overall rating score of the hospital.
- **Contact Information**: Phone numbers or any available contact details.
- **Address**: The physical location of the hospital.
- **Total Ratings**: The total number of ratings the hospital has received.

This process enables us to collect structured data that can be used for analysis, visualization, or other applications, such as creating a directory or a recommendation system based on hospital ratings and reviews.
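One way to keep the five fields in one place is a field-to-CSS-selector map, built from the tag and class names used in the cells below. The sketch applies that map to a single result box with `select_one`. A field that is missing becomes `None` instead of raising an error. It assumes all five fields sit inside one `resultbox_info` container, which has not been verified against the live page. `SELECTORS` and `extract_fields` are hypothetical names, and the sample box is made up.

```python
from bs4 import BeautifulSoup

# Field -> CSS selector, using the tag/class names from this notebook.
# Assumption: all five fields live inside a single result box on the real page.
SELECTORS = {
    "hospital_name": "h3",
    "Rating": ".resultbox_totalrate",
    "Total_Rating": ".resultbox_countrate",
    "Address": ".locatcity",
    "Contact": ".callcontent",
}

def extract_fields(box):
    """Return one dict per result box; missing fields become None instead of raising."""
    record = {}
    for field, selector in SELECTORS.items():
        tag = box.select_one(selector)
        record[field] = tag.get_text(strip=True) if tag is not None else None
    return record

# Hypothetical box without a phone number, to show the None fallback
box = BeautifulSoup(
    '<div class="resultbox_info"><h3>Example Hospital</h3>'
    '<span class="resultbox_totalrate">4.0</span>'
    '<span class="resultbox_countrate">12 Ratings</span>'
    '<span class="locatcity">Kankarbagh, Patna</span></div>',
    "html.parser",
).div
print(extract_fields(box))
```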


In [6]:
soup.find_all('h3')[0].text  # Extract the first hospital name from the <h3> tags

'Heart Hospital Pvt Ltd'

In [7]:
soup.find_all(class_='resultbox_totalrate')[0].text   # Extract the first rating by its class name

'4.0'

In [8]:
soup.find_all(class_='locatcity')[0].text       # Extract the first address (locality) by its class name

'Chandralay Kankarbagh, Patna'

In [9]:
soup.find_all(class_='callcontent')[5].text    # Extract a contact number (the sixth element with this class)

'08511636498'

In [10]:
soup.find(class_='resultbox_countrate').text    # find() returns only the first match: the first total-ratings count on the page

'786 Ratings'
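The difference between `find` and `find_all` matters in the extraction loop below. `find` returns only the first matching element, or `None` when nothing matches. `find_all` returns every match as a list. A small offline sketch using hypothetical markup with two listings:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two listings, each with its own ratings count
demo_html = (
    '<span class="resultbox_countrate">786 Ratings</span>'
    '<span class="resultbox_countrate">120 Ratings</span>'
)
demo_soup = BeautifulSoup(demo_html, "html.parser")

first = demo_soup.find(class_="resultbox_countrate").text                   # first match only
every = [t.text for t in demo_soup.find_all(class_="resultbox_countrate")]  # all matches
missing = demo_soup.find(class_="no_such_class")                            # no match -> None

print(first)    # 786 Ratings
print(every)    # ['786 Ratings', '120 Ratings']
print(missing)  # None
```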

In [11]:
information_box = soup.find_all(class_='resultbox_info')   # Collect the divs that hold each listing's information

In [12]:
len(information_box)     # number of result boxes found on the page

10
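The extraction loop pairs the `i`-th match of every selector with the `i`-th result box. That only works if each selector returns exactly one match per listing. The sketch below checks this before looping. `LOOP_SELECTORS` and `selector_counts` are hypothetical names, and the two-listing page is made up: the second listing has no phone number.

```python
from bs4 import BeautifulSoup

# Selectors used by the extraction loop in this notebook
LOOP_SELECTORS = {"name": "h3", "rating": ".resultbox_totalrate",
                  "address": ".locatcity", "contact": ".callcontent"}

def selector_counts(page):
    """Count matches per selector so misaligned lists can be spotted before pairing them."""
    return {field: len(page.select(css)) for field, css in LOOP_SELECTORS.items()}

# Hypothetical page: two listings, but only the first shows a phone number
demo_html = """
<div class="resultbox_info"><h3>A</h3><span class="resultbox_totalrate">4.0</span>
  <span class="locatcity">Patna</span><span class="callcontent">0000000000</span></div>
<div class="resultbox_info"><h3>B</h3><span class="resultbox_totalrate">4.5</span>
  <span class="locatcity">Patna</span></div>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")
counts = selector_counts(demo_soup)
print(counts)  # {'name': 2, 'rating': 2, 'address': 2, 'contact': 1}

if len(set(counts.values())) != 1:
    print("Warning: selector counts differ; index-based pairing may mix up listings")
```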

### Iterating Through the Data and Storing Information in Lists

In this step, we iterate through the parsed HTML content to extract relevant hospital information such as name, rating, contact info, address, and total ratings. We loop through the elements on the webpage and append each piece of information to a list.

The process involves:

- Extracting the **hospital name** from the appropriate HTML tag.
- Extracting the **rating** and **total ratings** for each hospital.
- Retrieving the **address** (or locality) of the hospital.
- Collecting the **contact information** (e.g., phone numbers).

The extracted data is then stored in separate lists for each category, making it easier to work with and analyze.
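Another option is to search inside each result box instead of across the whole page. This keeps each field tied to its own listing and records `None` when a listing lacks a field. The sketch below assumes every field is nested inside its `resultbox_info` div, which has not been checked against the live page. `text_or_none` is a hypothetical helper, and the two listings are made up. The lists are stored in a dict keyed like the DataFrame columns used later, so the notebook's own lists are not overwritten.

```python
from bs4 import BeautifulSoup

def text_or_none(box, **kwargs):
    """Search only inside one result box; return stripped text, or None if absent."""
    tag = box.find(**kwargs)
    return tag.get_text(strip=True) if tag is not None else None

# Hypothetical page: the second listing has no contact number
demo_html = """
<div class="resultbox_info"><h3>A</h3><span class="resultbox_totalrate">4.0</span>
  <span class="resultbox_countrate">786 Ratings</span><span class="locatcity">Patna</span>
  <span class="callcontent">0000000000</span></div>
<div class="resultbox_info"><h3>B</h3><span class="resultbox_totalrate">4.5</span>
  <span class="resultbox_countrate">120 Ratings</span><span class="locatcity">Patna</span></div>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

columns = {"hospital_name": [], "Address": [], "Contact": [], "Rating": [], "Total_Rating": []}
for box in demo_soup.find_all(class_="resultbox_info"):
    columns["hospital_name"].append(text_or_none(box, name="h3"))
    columns["Address"].append(text_or_none(box, class_="locatcity"))
    columns["Contact"].append(text_or_none(box, class_="callcontent"))
    columns["Rating"].append(text_or_none(box, class_="resultbox_totalrate"))
    columns["Total_Rating"].append(text_or_none(box, class_="resultbox_countrate"))

print(columns)
```

Because every list gets exactly one entry per box, the lists stay the same length even when a field is missing.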


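In [13]:
name = []
address = []
contact_info = []
rating = []
total_ratings = []

# Search the page once per field instead of once per iteration.
# Index-based pairing assumes the i-th match of each selector belongs to the i-th listing.
h3_tags = soup.find_all('h3')
rating_tags = soup.find_all(class_='resultbox_totalrate')
locality_tags = soup.find_all(class_='locatcity')
contact_tags = soup.find_all(class_='callcontent')

for i in range(len(information_box)):
    name.append(h3_tags[i].text)
    rating.append(rating_tags[i].text)
    # Note: soup.find() returns only the first match on the page, so every row receives the
    # same count (hence the repeated '786 Ratings' below). For per-listing counts, use
    # soup.find_all(class_='resultbox_countrate')[i].text instead.
    total_ratings.append(soup.find(class_='resultbox_countrate').text)
    address.append(locality_tags[i].text)
    contact_info.append(contact_tags[i].text)

print(name)
print(rating)
print(total_ratings)
print(address)
print(contact_info)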

['Heart Hospital Pvt Ltd', 'Jyotipunj Hospital', 'Dr. Prabhat Memorial Hiramati Hospital (AIIMS New Delhi Alumni Initiative)', 'Jay Prabha Medanta Super Specialty Hospital', 'Pancardia Hospital (Heart and Multi Super Speciality Hospital Pvt Ltd)', 'G.B HOSPITAL', 'Patna Womens Hospital', 'Shree Balaji Netralaya', 'Patna Vatsalya Child Hospital', 'Awantika Memorial Hospital']
['4.0', '4.6', '4.2', '4.5', '4.2', '4.6', '4.0', '4.9', '4.9', '3.5']
['786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings', '786 Ratings']
['Chandralay Kankarbagh, Patna', 'Jyotipunj Hospital Boring Road, Patna', 'Near Ravindra Balika Vidyalaya Rajendranagar, Patna', 'Pani Tanki Road Kankarbagh, Patna', 'Kankarbagh Main Road Kankarbagh, Patna', 'North Of Piller No -73 Sheikhpura, Patna', 'D.s.-12 Lohia Nagar Kankarbagh, Patna', 'Nala Road Saidpur, Patna', 'Kankarbagh Main Road Kankarbagh, Patna', '90 Feet Road Vijay Nagar, Patna']

### Converting the Lists into a Python Dictionary and Using it in a DataFrame

After extracting the relevant information and storing it in lists, we now convert these lists into a **Python dictionary**. This dictionary will serve as the foundation for creating a **Pandas DataFrame**, a powerful data structure that allows for easier manipulation, analysis, and visualization of the data.

In this step:
1. We map the lists (containing hospital names, addresses, contact info, ratings, and total ratings) into a dictionary with appropriate keys.
2. We use **Pandas** to create a DataFrame from the dictionary.
3. Finally, we display the first five rows of the DataFrame using `.head(5)` to verify the structure and ensure the data is correctly organized.
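The step above depends on every list having the same length. `pd.DataFrame` raises a `ValueError` when the lists in the dictionary differ in length. This can happen if one selector matches fewer elements than the others. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical lists where one field came back short (a listing without a phone number)
short = {"hospital_name": ["A", "B"], "Contact": ["0000000000"]}

error_message = None
try:
    pd.DataFrame(short)
except ValueError as err:
    error_message = str(err)  # pandas refuses lists of unequal length
print(error_message)

# Recording None for the missing field keeps the lists aligned
aligned = {"hospital_name": ["A", "B"], "Contact": ["0000000000", None]}
demo_df = pd.DataFrame(aligned)
print(demo_df.shape)  # (2, 2)
```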


In [14]:
data = {
    'hospital_name': name,
    'Address': address,
    'Contact': contact_info,
    'Rating': rating,
    'Total_Rating': total_ratings,
}
df = pd.DataFrame(data)
df.head(5)

| | hospital_name | Address | Contact | Rating | Total_Rating |
|---|---|---|---|---|---|
| 0 | Heart Hospital Pvt Ltd | Chandralay Kankarbagh, Patna | 7383519271 | 4.0 | 786 Ratings |
| 1 | Jyotipunj Hospital | Jyotipunj Hospital Boring Road, Patna | 7760601100 | 4.6 | 786 Ratings |
| 2 | Dr. Prabhat Memorial Hiramati Hospital (AIIMS ... | Near Ravindra Balika Vidyalaya Rajendranagar, ... | 8123776419 | 4.2 | 786 Ratings |
| 3 | Jay Prabha Medanta Super Specialty Hospital | Pani Tanki Road Kankarbagh, Patna | 7383033936 | 4.5 | 786 Ratings |
| 4 | Pancardia Hospital (Heart and Multi Super Spec... | Kankarbagh Main Road Kankarbagh, Patna | 8487858537 | 4.2 | 786 Ratings |
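The following is a minimal sketch of the "clean and store" step, applied to two rows copied from the output above. `Rating` is converted from text to floats, the digits in `Total_Rating` are extracted as numbers, and the frame is written out as CSV text. `Contact` is left out here. It is safest kept as a string so numbers with a leading zero, such as `'08511636498'`, are not altered. With the original loop every `Total_Rating` repeats the first match on the page, so that column only becomes meaningful once per-listing counts are extracted. The variable names `sample` and `csv_text` are hypothetical.

```python
import pandas as pd

# Two rows copied from the output above (Contact omitted for brevity)
sample = pd.DataFrame({
    "hospital_name": ["Heart Hospital Pvt Ltd", "Jyotipunj Hospital"],
    "Rating": ["4.0", "4.6"],
    "Total_Rating": ["786 Ratings", "786 Ratings"],
})

# Ratings were scraped as strings; convert to floats (unparseable values become NaN)
sample["Rating"] = pd.to_numeric(sample["Rating"], errors="coerce")

# Keep only the digits from text such as '786 Ratings' and convert them to numbers
sample["Total_Rating"] = pd.to_numeric(
    sample["Total_Rating"].str.extract(r"(\d+)", expand=False), errors="coerce"
)

# to_csv() with no path returns the CSV as a string; pass a filename to write a file instead
csv_text = sample.to_csv(index=False)
print(csv_text)
```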
