### Web Scrapping on Water Use by Country

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

##### Requests:
- With Python Requests, you can easily send HTTP requests to any web server or web service and receive responses which is perfect for web scraping.

##### Beautiful Soup:
- Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.

##### Pandas:
- Pandas can also help us easily scrape tables from HTML pages.

##### Excel file or csv file:
- pandas offers a lot of functionality for reading different data types into DataFrames. For example, you can read data from Excel files, text files (like CSV), SQL databases, web APIs stored in JSON data, and even directly from webpages.

In [2]:
url="https://www.worldometers.info/water/"

- The above is the url we are going to fetch for scrapping the data.

- with the help of requests if we get the response of 200 it means that we can scrap that page.

In [3]:
page=requests.get(url)
page

<Response [200]>

##### Beautiful Soup
- Beautiful Soup then parses the document using the best available parser.


##### prettify() method:
- The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string

In [4]:
soup=BeautifulSoup(page.text)
soup.prettify

<bound method Tag.prettify of <!DOCTYPE html>
<!--[if IE 8]> <html lang="en" class="ie8"> <![endif]--><!--[if IE 9]> <html lang="en" class="ie9"> <![endif]--><!--[if !IE]><!--><html lang="en"> <!--<![endif]--> <head> <meta charset="utf-8"/> <meta content="IE=edge" http-equiv="X-UA-Compatible"/> <meta content="width=device-width, initial-scale=1" name="viewport"/> <title>Water Use Statistics - Worldometer</title><meta content="Live statistics showing how much water is being used in the world. Global water use data by year and by country" name="description"/><!-- Favicon --><link href="/favicon/favicon.ico" rel="shortcut icon" type="image/x-icon"/><link href="/favicon/apple-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/><link href="/favicon/apple-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/><link href="/favicon/apple-icon-72x72.png" rel="apple-touch-icon" sizes="72x72"/><link href="/favicon/apple-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/><link href="/favicon/a

##### soup.find():
- soup.find will find the table tag and id related to required table and store it in table variable

In [5]:
table=soup.find("table",id="example2")

##### table body:
- Now,the content is present in table and we only required the table body that is in tbody tag. so we use find tbody.

In [6]:
table=table.find("tbody")

In [7]:
table

<tbody> <tr> <td style="font-weight: bold; font-size:17px; text-align:left; padding-left:5px; padding-top:10px; padding-bottom:10px"><a href="/water/afghanistan-water/">Afghanistan</a></td> <td style="font-weight: bold; text-align:right;"><a data-toggle="tooltip" href="/water/afghanistan-water/#water-use" title="Year:2000">20,280,000,000</a></td> <td style="font-weight: bold; text-align:right;">2,674</td> <td style="font-weight: bold; text-align:right;"> <a data-toggle="tooltip" href="/world-population/afghanistan-population/" title="Year:2000">20,779,953</a></td> </tr> <tr> <td style="font-weight: bold; font-size:17px; text-align:left; padding-left:5px; padding-top:10px; padding-bottom:10px"><a href="/water/albania-water/">Albania</a></td> <td style="font-weight: bold; text-align:right;"><a data-toggle="tooltip" href="/water/albania-water/#water-use" title="Year:2006">1,311,000,000</a></td> <td style="font-weight: bold; text-align:right;">1,173</td> <td style="font-weight: bold; text-

In [8]:
rows=table.find_all("tr")

In [9]:
rows

[<tr> <td style="font-weight: bold; font-size:17px; text-align:left; padding-left:5px; padding-top:10px; padding-bottom:10px"><a href="/water/afghanistan-water/">Afghanistan</a></td> <td style="font-weight: bold; text-align:right;"><a data-toggle="tooltip" href="/water/afghanistan-water/#water-use" title="Year:2000">20,280,000,000</a></td> <td style="font-weight: bold; text-align:right;">2,674</td> <td style="font-weight: bold; text-align:right;"> <a data-toggle="tooltip" href="/world-population/afghanistan-population/" title="Year:2000">20,779,953</a></td> </tr>,
 <tr> <td style="font-weight: bold; font-size:17px; text-align:left; padding-left:5px; padding-top:10px; padding-bottom:10px"><a href="/water/albania-water/">Albania</a></td> <td style="font-weight: bold; text-align:right;"><a data-toggle="tooltip" href="/water/albania-water/#water-use" title="Year:2006">1,311,000,000</a></td> <td style="font-weight: bold; text-align:right;">1,173</td> <td style="font-weight: bold; text-align

- Next,we find the table's all row which is present in tr tag and contains style tag

In [10]:
column_info=[]
for i in rows:
    cols=i.find_all("td")
    #print(cols)
    water_info=[c.text for c in cols]
    #print(water_info)
    column_info.append(water_info)
print(column_info)

[['Afghanistan', '20,280,000,000', '2,674', ' 20,779,953'], ['Albania', '1,311,000,000', '1,173', ' 3,063,021'], ['Algeria', '9,978,000,000', '674', ' 40,551,392'], ['Angola', '705,800,000', '100', ' 19,433,602'], ['Antigua and Barbuda', '11,500,000', '348', ' 90,409'], ['Argentina', '37,780,000,000', '2,505', ' 41,320,500'], ['Armenia', '2,847,000,000', '2,649', ' 2,944,791'], ['Australia', '16,130,000,000', '1,821', ' 24,262,712'], ['Austria', '3,492,000,000', '1,138', ' 8,409,949'], ['Azerbaijan', '12,780,000,000', '3,556', ' 9,845,320'], ['Bahrain', '434,400,000', '835', ' 1,425,792'], ['Bangladesh', '35,870,000,000', '681', ' 144,304,167'], ['Barbados', '81,000,000', '803', ' 276,323'], ['Belarus', '1,452,000,000', '421', ' 9,445,643'], ['Belgium', '6,005,000,000', '1,515', ' 10,859,940'], ['Belize', '101,000,000', '1,119', ' 247,315'], ['Benin', '130,000,000', '50', ' 7,076,733'], ['Bhutan', '338,000,000', '1,379', ' 671,613'], ['Bolivia', '2,088,000,000', '579', ' 9,884,781'], [

- Here,We take an empty list and then we take for loop in rows which contains all the tr tags, and for every tr tag we find all td tags also and then we append the list which we took as an empty list.

In [11]:
Country=[]
Yearly_Water_Used=[]
Daily_Water_Used=[]
Population=[]
for i in column_info:
    Country.append(i[0])
    Yearly_Water_Used.append(i[1])
    Daily_Water_Used.append(i[2])
    Population.append(i[3])

- Now for every column we take an empty list, the data is now present in pandem_info list in form of another list so we use for loop to iterate over every item of list and append each list item using indexing in the following column wise. And,the list which we took empty will now contain every iterated list items by indexing

In [12]:
data={'Country':Country,'Yearly_Water_Used':Yearly_Water_Used,'Daily_Water_Used':Daily_Water_Used,'Population':Population}

In [13]:
data

{'Country': ['Afghanistan',
  'Albania',
  'Algeria',
  'Angola',
  'Antigua and Barbuda',
  'Argentina',
  'Armenia',
  'Australia',
  'Austria',
  'Azerbaijan',
  'Bahrain',
  'Bangladesh',
  'Barbados',
  'Belarus',
  'Belgium',
  'Belize',
  'Benin',
  'Bhutan',
  'Bolivia',
  'Bosnia and Herzegovina',
  'Botswana',
  'Brazil',
  'Bulgaria',
  'Burkina Faso',
  'Burundi',
  "Côte d'Ivoire",
  'Cabo Verde',
  'Cambodia',
  'Cameroon',
  'Canada',
  'Central African Republic',
  'Chad',
  'Chile',
  'China',
  'Colombia',
  'Comoros',
  'Congo',
  'Costa Rica',
  'Croatia',
  'Cuba',
  'Cyprus',
  'Czech Republic (Czechia)',
  'Denmark',
  'Djibouti',
  'Dominica',
  'Dominican Republic',
  'DR Congo',
  'Ecuador',
  'Egypt',
  'El Salvador',
  'Equatorial Guinea',
  'Eritrea',
  'Estonia',
  'Eswatini',
  'Ethiopia',
  'Fiji',
  'Finland',
  'France',
  'Gabon',
  'Gambia',
  'Georgia',
  'Germany',
  'Ghana',
  'Greece',
  'Grenada',
  'Guatemala',
  'Guinea',
  'Guinea-Bissau',
  

In [14]:
Water_info=pd.DataFrame(data)
Water_info

Unnamed: 0,Country,Yearly_Water_Used,Daily_Water_Used,Population
0,Afghanistan,20280000000,2674,20779953
1,Albania,1311000000,1173,3063021
2,Algeria,9978000000,674,40551392
3,Angola,705800000,100,19433602
4,Antigua and Barbuda,11500000,348,90409
...,...,...,...,...
174,Venezuela,22630000000,2275,27247610
175,Vietnam,82030000000,2681,83832661
176,Yemen,3565000000,486,20107409
177,Zambia,1572000000,393,10971698
