# Web Scraping



Web scraping is the process of automatically extracting data from websites. It involves using programming techniques to navigate a website, locate and extract specific data, and store it for further analysis or use.

Web scraping is commonly used in a variety of applications, including data mining, market research, information retrieval, and automated testing.

Some common techniques used in web scraping include:

* Using libraries like BeautifulSoup or Scrapy to parse HTML and extract data
* Sending HTTP requests to retrieve web pages and extract data using regular expressions or XPath
* Handling dynamic, JavaScript-generated content
* Handling CAPTCHAs and other forms of website protection
* Handling pagination and navigating through multiple pages of data
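
As a rough illustration of two of the techniques above (regular-expression extraction and following pagination links), here is a minimal sketch. It makes no network calls: the `PAGES` dictionary, the `fetch` helper, and the HTML snippets are hypothetical stand-ins for real HTTP responses, and the patterns only fit this toy markup. For real pages an HTML parser is usually more robust than regular expressions.

```python
import re

# Hypothetical in-memory "site": maps a URL path to its HTML.
# In a real scraper these strings would come from HTTP responses.
PAGES = {
    "/items?page=1": (
        '<ul><li class="item">Alpha</li><li class="item">Beta</li></ul>'
        '<a class="next" href="/items?page=2">Next</a>'
    ),
    "/items?page=2": '<ul><li class="item">Gamma</li></ul>',
}

# Patterns written for the toy markup above (an assumption, not a general solution).
ITEM_RE = re.compile(r'<li class="item">(.*?)</li>')
NEXT_RE = re.compile(r'<a class="next" href="([^"]+)">')


def fetch(path):
    """Stand-in for an HTTP request; returns the HTML for a path."""
    return PAGES[path]


def scrape_all(start_path, max_pages=10):
    """Collect items from every page, following "next" links until none is left."""
    items = []
    path = start_path
    pages_seen = 0
    while path is not None and pages_seen < max_pages:
        html = fetch(path)
        items.extend(ITEM_RE.findall(html))
        match = NEXT_RE.search(html)
        path = match.group(1) if match else None  # stop when there is no next link
        pages_seen += 1
    return items


all_items = scrape_all("/items?page=1")
print(all_items)
```

The `max_pages` cap is a simple safeguard against following an endless chain of "next" links.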

Web scraping can be used for both legal and illegal purposes. It's important to respect the website's terms of service and to only scrape data that is publicly available and not protected by copyright or other legal restrictions.

If you're interested in learning more about web scraping, there are many resources available online, including tutorials, documentation, and forums.




In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Download the page and stop early on an HTTP error instead of parsing an error page
url = "https://www.worldometers.info/gdp/gdp-by-country/"
response = requests.get(url, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.content, "html.parser")
print(soup.title)
# print(soup.prettify())  # print the whole HTML document

tables = soup.find_all("table")
# print(tables)

# Build one DataFrame per <table> element found on the page
dataframe = []
for table in tables:
    data = []
    for row in table.find_all("tr"):
        # Only <td> cells are read, so a row without any (such as a header row
        # of <th> cells) becomes an empty row, like row 0 in the output
        cols = [col.text.strip() for col in row.find_all("td")]
        data.append(cols)
    dataframe.append(pd.DataFrame(data))
dataframe[0]


<title>GDP by Country - Worldometer</title>


Unnamed: 0,0,1,2,3,4,5,6,7
0,,,,,,,,
1,1,United States,"$25,462,700,000,000",$25.463 trillion,2.06%,341534046,"$74,554",25.32%
2,2,China,"$17,963,200,000,000",$17.963 trillion,2.99%,1425179569,"$12,604",17.86%
3,3,Japan,"$4,231,140,000,000",$4.231 trillion,1.03%,124997578,"$33,850",4.21%
4,4,Germany,"$4,072,190,000,000",$4.072 trillion,1.79%,84086227,"$48,429",4.05%
...,...,...,...,...,...,...,...,...
173,173,Sao Tome & Principe,"$546,680,342",$547 million,0.93%,226305,"$2,416",0.00%
174,174,Micronesia,"$427,094,119",$427 million,-0.62%,523477,$816,0.00%
175,175,Marshall Islands,"$279,667,900",$280 million,1.50%,40077,"$6,978",0.00%
176,176,Kiribati,"$223,352,943",$223 million,1.56%,130469,"$1,712",0.00%
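
Every cell in the scraped table is a string, so values such as `$25,462,700,000,000` or `2.06%` need converting before any arithmetic or plotting on a numeric scale. The sketch below shows one way to do that with plain string handling. The helper names `parse_number` and `parse_percent` are hypothetical, and the column meanings (GDP, growth, population) are an assumption, since the header row was not captured.

```python
def parse_number(text):
    """Convert strings such as '$25,462,700,000,000' or '341534046' to int."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return int(cleaned)


def parse_percent(text):
    """Convert strings such as '2.06%' or '-0.62%' to float (in percent)."""
    return float(text.replace("%", "").strip())


# First data row, copied from the output above
row = ["1", "United States", "$25,462,700,000,000", "$25.463 trillion",
       "2.06%", "341534046", "$74,554", "25.32%"]

# Assumed column meanings: 2 = GDP, 4 = growth, 5 = population
gdp = parse_number(row[2])
growth = parse_percent(row[4])
population = parse_number(row[5])
print(gdp, growth, population)
```

The same helpers could be applied to whole DataFrame columns, for example with `Series.map`, after dropping the empty first row.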


In [5]:
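# Plot the table scraped above using Plotly
import plotly.express as px

# Note: the scraped columns still hold strings, so the values are not
# plotted on a numeric scale until they are converted to numbers
fig = px.line(dataframe[0])
fig.show()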


In [3]:
# !pip install --upgrade nbformat
