## Web Scraping using BeautifulSoup

1. Use the requests library and the link to fetch the page.

2. Use BeautifulSoup to parse the website's source code, then find a table in the parsed page.

3. After finding the table, extract the data from all available columns and store it in a dataframe.
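The three steps above can be sketched without any network access by parsing a small inline HTML string. This is only an illustration: `SAMPLE_HTML` and `table_to_dataframe` are hypothetical names that do not appear in the notebook, and the built-in `html.parser` is used here so the sketch runs without lxml.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Country</th><th>Population</th></tr>
  <tr><td>1</td><td>China</td><td>1,412,600,000</td></tr>
  <tr><td>2</td><td>India</td><td>1,375,019,925</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Parse the first <table> in html into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found in the document")
    rows = table.find_all("tr")
    # The first row holds the column names, the remaining rows hold the data.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    body = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)


sample_df = table_to_dataframe(SAMPLE_HTML)
```

Every value in `sample_df` is still a string at this point, so numeric columns such as Population would need converting before any arithmetic.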


#### If the library is not installed, use the pip command to install it

In [None]:
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py): started
  Building wheel for bs4 (setup.py): finished with status 'done'
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1277 sha256=f27a42d979fa6c134df5adffbe71a91eab2a98d2e9123fa194846cc2bbfcc4dc
  Stored in directory: c:\users\abhishek\appdata\local\pip\cache\wheels\75\78\21\68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


**Step-1:** Importing Libraries.

In [1]:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import dateutil

**Step-2:** Using the requests library, fetch the data from the given link. <br>
Call the get method of the requests library and pass the link as its parameter.

In [2]:
result = requests.get("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population")

In [3]:
assert result.status_code==200  
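A slightly more defensive version of the request step might set a timeout, send a descriptive User-Agent, and raise on HTTP errors instead of asserting on the status code. The sketch below is one possible shape, not the notebook's method: `fetch_html`, its `session` parameter, and the User-Agent string are all assumptions introduced for illustration.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the body of url as bytes, raising on HTTP error responses."""
    http = session if session is not None else requests.Session()
    # Placeholder User-Agent (an assumption); replace with your own details.
    headers = {"User-Agent": "population-table-tutorial/0.1 (example@example.com)"}
    response = http.get(url, headers=headers, timeout=timeout)
    # raise_for_status raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.content
```

Calling `fetch_html` with the Wikipedia link from the cell above would return the same bytes as `result.content`. The optional `session` argument also makes it possible to substitute a stub object when no network is available.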

**Step-3:** Parse the website's source code with BeautifulSoup

In [4]:
src = result.content
document = BeautifulSoup(src, 'lxml')
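The `'lxml'` parser is a separate package, and BeautifulSoup raises `FeatureNotFound` when it is not installed. A minimal sketch of a fallback to Python's built-in `html.parser` follows; the helper name `make_soup` is hypothetical and not part of the notebook.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if available, else with the built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed, so fall back to the standard-library parser.
        return BeautifulSoup(markup, "html.parser")


demo = make_soup("<table><tr><th>Rank</th></tr></table>")
```

The two parsers can build slightly different trees for malformed HTML, so results may differ between machines if the fallback is triggered.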

**Step-4:** Find the first 'table' tag in the parsed document

In [5]:
table = document.find("table")
table

<table class="wikitable sortable">
<tbody><tr>
<th>Rank</th>
<th><a href="/wiki/List_of_sovereign_states" title="List of sovereign states">Country</a> / <a href="/wiki/Dependent_territory" title="Dependent territory">Dependency</a></th>
<th><a href="/wiki/United_Nations_geoscheme" title="United Nations geoscheme">Region</a></th>
<th>Population</th>
<th>Percentage of the world</th>
<th>Date</th>
<th><span class="nowrap">Source (official or from</span> the <a href="/wiki/United_Nations" title="United Nations">United Nations</a>)</th>
<th>Notes
</th></tr>
<tr>
<td align="center"><b>–</b>
</td>
<td><b>World</b>
</td>
<td align="center"></td>
<td style="text-align:center"><b> 7,941,814,000</b></td>
<td style="text-align:right"><b>100%</b></td>
<td><b><span data-sort-value="000000002022-04-07-0000" style="white-space:nowrap">7 Apr 2022</span></b></td>
<td style="text-align:left"><b>UN projection<sup class="reference" id="cite_ref-unpop_1-1"><a href="#cite_note-unpop-1">[1]</a></sup></b></td>

In [6]:
assert table.find("th").get_text() == "Rank"
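Because `find("table")` returns only the first table on the page, the result depends on the page layout staying the same. One possible alternative, sketched below on hypothetical inline markup, is to search all tables for the one whose first header cell matches the expected text. The helper `find_table_by_header` is an assumption for illustration.

```python
from bs4 import BeautifulSoup


def find_table_by_header(soup, header_text):
    """Return the first table whose first <th> equals header_text, else None."""
    for candidate in soup.find_all("table"):
        first_header = candidate.find("th")
        if first_header is not None and first_header.get_text(strip=True) == header_text:
            return candidate
    return None


# Hypothetical page with an unrelated table placed before the one we want.
html = """
<table><tr><th>Legend</th></tr></table>
<table class="wikitable sortable"><tr><th>Rank</th><th>Population</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
ranked = find_table_by_header(soup, "Rank")
```

Here `ranked` is the second table, while `soup.find("table")` would have returned the unrelated first one.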

**Step-5:** Read the table into pandas and store the result in a dataframe.

In [7]:
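from io import StringIO
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation in newer pandas
df = pd.read_html(StringIO(str(table)))
df1 = df[0]
df1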

| | Rank | Country / Dependency | Region | Population | Percentage of the world | Date | Source (official or from the United Nations) | Notes |
|---|---|---|---|---|---|---|---|---|
| 0 | – | World | | 7941814000 | 100% | 7 Apr 2022 | UN projection[1] | |
| 1 | 1 | China | Asia | 1412600000 | | 31 Dec 2021 | National annual estimate[2] | The population figure refers to mainland China... |
| 2 | 2 | India | Asia | 1375019925 | | 7 Apr 2022 | National population clock[3] | The figure includes the population of Jammu an... |
| 3 | 3 | United States | Americas | 332607072 | | 7 Apr 2022 | National population clock[4] | The figure includes the 50 states and the Dist... |
| 4 | 4 | Indonesia | Asia[b] | 272248500 | | 1 Jul 2021 | National annual estimate[5] | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 237 | – | Niue (New Zealand) | Oceania | 1549 | | 1 Jul 2021 | National annual projection[92] | |
| 238 | – | Tokelau (New Zealand) | Oceania | 1501 | | 1 Jul 2021 | National annual projection[92] | |
| 239 | 195 | Vatican City | Europe | 825 | | 1 Feb 2019 | Monthly national estimate[196] | The total population of 825 consisted of 453 r... |
| 240 | – | Cocos (Keeling) Islands (Australia) | Oceania | 573 | | 30 Jun 2020 | National annual estimate[195] | |
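The extracted table still carries footnote markers such as `[2]` and `[b]`, and the dates are plain strings. A possible cleanup pass is sketched below on a hand-built frame that copies a few values from the output above. The names `raw` and `clean_population_table` are hypothetical, and the date format is assumed to match every row of the real table.

```python
import pandas as pd

# Small stand-in for df1, using values copied from the output above.
raw = pd.DataFrame({
    "Region": ["Asia", "Asia[b]"],
    "Population": [1412600000, 272248500],
    "Date": ["31 Dec 2021", "1 Jul 2021"],
    "Source": ["National annual estimate[2]", "National annual estimate[5]"],
})


def clean_population_table(frame):
    """Return a copy with footnote markers removed and dates parsed."""
    cleaned = frame.copy()
    footnote = r"\[\w+\]"  # matches markers like [2] or [b]
    for column in ("Region", "Source"):
        cleaned[column] = (
            cleaned[column].str.replace(footnote, "", regex=True).str.strip()
        )
    # Assumes day, abbreviated English month name, four-digit year.
    cleaned["Date"] = pd.to_datetime(cleaned["Date"], format="%d %b %Y")
    return cleaned


cleaned = clean_population_table(raw)
```

The function works on a copy, so the original frame keeps its footnote markers in case they are needed to trace a figure back to its source.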
