# Scraping a Web Table Using Python
---
This Jupyter Notebook demonstrates how to extract tabular data from a **web page** using the Python libraries `requests`, `BeautifulSoup`, and `pandas`.



In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd



### The three core libraries

The cell above imports three core Python libraries:

- `requests`: fetches web pages over HTTP.
- `BeautifulSoup`: parses and navigates HTML/XML content.
- `pandas`: reads structured tables and stores them as DataFrames.

Together, these libraries cover the complete workflow: fetch → parse → extract → analyze.

### Set the target URL
We specify the web page to scrape: the India-WRIS wiki page listing large dams in India. This URL is the source from which the script collects its data.

In [8]:
# URL of the dams page
url = "https://indiawris.gov.in/wiki/doku.php?id=large_dams_in_india"
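
To see what the URL is made of, it can be split into its components with the standard library. This is only an illustrative sketch: the names `parts` and `query` are not from the notebook, and reading `id` as the name of the wiki page is an assumption based on how the URL looks.

```python
from urllib.parse import urlparse, parse_qs

url = "https://indiawris.gov.in/wiki/doku.php?id=large_dams_in_india"

# Split the URL into scheme, host, path, and query string
parts = urlparse(url)

# Turn the query string into a dict mapping each key to a list of values
query = parse_qs(parts.query)

print(parts.netloc)  # indiawris.gov.in
print(parts.path)    # /wiki/doku.php
print(query["id"])   # ['large_dams_in_india']
```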


### Send the HTTP request to fetch page content
We use `requests.get()` to retrieve the HTML of the web page.

In [9]:
# Fetch the web page; the timeout avoids hanging on an unresponsive server
response = requests.get(url, timeout=10)
print(response)

# response.status_code holds the numeric status code on its own

<Response [200]>


The cell above sends an HTTP GET request to the URL and prints the response object, which shows whether the request succeeded.

A status code of 200 means success; codes in the 4xx and 5xx ranges (e.g., 404 Not Found) indicate errors.
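
As a small illustration of how status codes are grouped, the sketch below uses the standard library's `http.HTTPStatus`. The helper `describe_status` is hypothetical; it is not part of `requests` or of the original notebook.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard phrase, and its broad category."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know

    # HTTP groups status codes by their first digit
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"

    return f"{code} {status.phrase}: {kind}"


print(describe_status(200))  # 200 OK: success
print(describe_status(404))  # 404 Not Found: client error
```

In the notebook, the integer to pass in would be `response.status_code`. `requests` also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses instead of letting the script continue with an error page.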

### Parse the HTML with BeautifulSoup

We convert the HTML content of the response into a structured parse tree using BeautifulSoup.

You can then easily search for tags such as `<table>`, `<div>`, or `<p>`, which makes the parse tree well suited to extracting data.
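
Searching a parse tree can be tried out on a small inline snippet, without any network access. The sketch below is illustrative: the `html` string is a trimmed-down stand-in modeled on the dams table (three of its columns, two of its rows), and the names `headers` and `rows` are not from the notebook.

```python
from bs4 import BeautifulSoup

# A small stand-in for the real page
html = """
<table class="inline">
  <thead><tr><th>#</th><th>Name</th><th>River</th></tr></thead>
  <tr><th>1</th><td>Tehri Dam</td><td>Bhagirathi</td></tr>
  <tr><th>2</th><td>Lakhwar Dam</td><td>Yamuna</td></tr>
</table>
<p>Not part of the table.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if there is no match
table = soup.find("table")
missing = soup.find("div")

# find_all() returns every matching tag below the element it is called on
headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

print(headers)  # ['#', 'Name', 'River']
print(rows)     # [['1', 'Tehri Dam', 'Bhagirathi'], ['2', 'Lakhwar Dam', 'Yamuna']]
print(missing)  # None
```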


In [10]:
soup = BeautifulSoup(response.text, "html.parser")
print(soup)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html dir="ltr" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>large_dams_in_india - INDIA WRIS WIKI</title>
<meta content="DokuWiki" name="generator"/>
<meta content="index,follow" name="robots"/>
<meta content="large_dams_in_india" name="keywords"/>
<link href="/wiki/lib/exe/opensearch.php" rel="search" title="INDIA WRIS WIKI" type="application/opensearchdescription+xml"/>
<link href="/wiki/" rel="start"/>
<link href="/wiki/doku.php?id=large_dams_in_india&amp;do=index" rel="contents" title="Sitemap"/>
<link href="/wiki/lib/exe/manifest.php" rel="manifest"/>
<link href="/wiki/feed.php" rel="alternate" title="Recent Changes" type="application/rss+xml"/>
<link href="/wiki/feed.php?mode=list&amp;ns=" rel="alternate" title="Current namespace" type="application/rss+xml"/

In [11]:
# Find the first <table> element on the page (find returns None if there is none)
table = soup.find("table")
print(table)

<table class="inline">
<thead>
<tr class="row0">
<th class="col0">#</th><th class="col1">Name</th><th class="col2">Purpose</th><th class="col3">River</th><th class="col4">Nearest City</th><th class="col5">District</th><th class="col6">State</th><th class="col7">Basin</th><th class="col8">Status</th><th class="col9">Completion Year</th><th class="col10">Type</th><th class="col11">Length (m)</th><th class="col12">Max Height above Foundation (m)</th><th class="col13">Design Gross Storage Capacity (MCM)</th>
</tr>
</thead>
<tr class="row1">
<th class="col0">1</th><td class="col1"><a class="urlextern" href="http://59.179.19.250/wrpinfo/wiki1.php?show=D00742&amp;str2=http://59.179.19.250/wrpinfo/index.php?title=Tehri_Dam_D00742" rel="nofollow" title="http://59.179.19.250/wrpinfo/wiki1.php?show=D00742&amp;str2=http://59.179.19.250/wrpinfo/index.php?title=Tehri_Dam_D00742">Tehri Dam</a></td><td class="col2">Hydroelectric,Irrigation</td><td class="col3">Bhagirathi</td><td class="col4">Pratapnag

In [12]:
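from io import StringIO

# read_html returns a list of DataFrames, one per <table> it finds.
# Wrapping the HTML in StringIO avoids the pandas warning about literal strings.
df = pd.read_html(StringIO(str(table)))
df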



[     #                                Name  \
 0    1                           Tehri Dam   
 1    2                         Lakhwar Dam   
 2    3         Idukki (Eb)/Idukki Arch Dam   
 3    4                          Bhakra Dam   
 4    5                       Pakal Dul Dam   
 5    6          Sardar Sarover Gujarat Dam   
 6    7           Srisailam (N.S.R.S.P) Dam   
 7    8                    Ranjit Sagar Dam   
 8    9                        Baglihar Dam   
 9   10                       Chamera I Dam   
 10  11                 Cheruthoni (Eb) Dam   
 11  12                            Pong Dam   
 12  13                         Jamrani Dam   
 13  14       Subansiri Lower HE (Nhpc) Dam   
 14  15                        Ramganga Dam   
 15  16                 Nagarjuna Sagar Dam   
 16  17                      Kakki (Eb) Dam   
 17  18                            Nagi Dam   
 18  19  Salal (Rockfill And Concrete ) Dam   
 19  20                          Lakhya Dam   
 20  21      

In [13]:
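from io import StringIO

# Take the first (and only) DataFrame from the list returned by read_html
df2 = pd.read_html(StringIO(str(table)))[0]
df2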



Unnamed: 0,#,Name,Purpose,River,Nearest City,District,State,Basin,Status,Completion Year,Type,Length (m),Max Height above Foundation (m),Design Gross Storage Capacity (MCM)
0,1,Tehri Dam,"Hydroelectric,Irrigation",Bhagirathi,Pratapnagar,Tehri Garhwal,Uttarakhand,Ganga,Completed,2005.0,Earthen / Gravity & Masonry,575.0,260.5,3540.0
1,2,Lakhwar Dam,"Hydroelectric,Irrigation",Yamuna,Dehradun,Dehradun,Uttarakhand,Ganga,Proposed,,Earthen / Gravity & Masonry,451.0,204.0,587.84
2,3,Idukki (Eb)/Idukki Arch Dam,Hydroelectric,Periyar,Todupulai,Idukki,Kerala,West flowing rivers from Tadri to Kanyakumari,Completed,1974.0,Gravity & Masonry,366.0,169.0,1998.57
3,4,Bhakra Dam,"Hydroelectric,Irrigation,Recreation",Satluj,Bilaspur,Bilaspur,Himachal Pradesh,Indus up to International Border,Completed,1963.0,Earthen / Gravity & Masonry,518.16,167.64,9867.84
4,5,Pakal Dul Dam,Hydroelectric,Marusudar,Kishtwar,Kishtwar,Jammu & Kashmir,Indus up to International Border,Proposed,,Earthen / Gravity & Masonry,305.0,167.0,0.1254
5,6,Sardar Sarover Gujarat Dam,"Hydroelectric,Irrigation",Narmada,Rajpipla,Narmada,Gujarat,Narmada,Completed,,Gravity & Masonry,1210.0,163.0,9500.0
6,7,Srisailam (N.S.R.S.P) Dam,"Hydroelectric,Irrigation",Krishna,Nandikotkur,Kurnool,Telangana,Krishna,Completed,1984.0,Earthen,512.0,145.0,8724.88
7,8,Ranjit Sagar Dam,"Flood Control,Hydroelectric,Irrigation",Ravi,Pathankot,Kathua,Punjab,Indus up to International Border,Completed,1999.0,Earthen,617.0,145.0,3280.0
8,9,Baglihar Dam,Hydroelectric,CHENAB,Ramban,Ramban,Jammu & Kashmir,Indus up to International Border,Completed,2009.0,Gravity & Masonry,364.362,143.0,475.0
9,10,Chamera I Dam,Hydroelectric,Ravi,Bhattiyat,Chamba,Himachal Pradesh,Indus up to International Border,Completed,1994.0,Earthen / Gravity & Masonry,295.0,140.0,242.3
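
Once the table is a DataFrame, ordinary pandas operations apply. The sketch below rebuilds three rows and four columns of the output above by hand so that it runs without network access; the names `sample`, `completed`, and `tallest` are illustrative and not from the notebook. It also shows why the years appear as `2005.0`: a missing value forces the column to floating point, and pandas' nullable `Int64` type is one way to restore whole numbers.

```python
import pandas as pd

# Three rows copied from the scraped output; None marks the missing year
sample = pd.DataFrame(
    {
        "Name": ["Tehri Dam", "Lakhwar Dam", "Idukki (Eb)/Idukki Arch Dam"],
        "Status": ["Completed", "Proposed", "Completed"],
        "Completion Year": [2005.0, None, 1974.0],
        "Max Height above Foundation (m)": [260.5, 204.0, 169.0],
    }
)

# Nullable integers keep the missing value while dropping the ".0"
sample["Completion Year"] = sample["Completion Year"].astype("Int64")

# Keep only the dams whose status is "Completed"
completed = sample[sample["Status"] == "Completed"]

# Row with the greatest height above foundation
tallest = sample.sort_values(
    "Max Height above Foundation (m)", ascending=False
).iloc[0]

print(sample.dtypes)
print(completed["Name"].tolist())
print(tallest["Name"])
```

In the notebook itself, the same steps would be applied to `df2` rather than to a hand-built sample.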


### In summary, the process:

1. Fetches a web page (`requests`)
2. Parses the HTML (`BeautifulSoup`)
3. Extracts the table (`pandas`)
4. Converts it into a DataFrame ready for analysis