# Project: GDP Data Extraction and Processing

## Introduction  
In this project, you will extract data from a website using web scraping and request APIs, then process it using the Pandas and NumPy libraries.

## Project Scenario:

Create a script that extracts the list of the top 10 largest economies of the world, in descending order of their GDPs in Billion USD, as logged by the International Monetary Fund (IMF).

The required data is available at the URL below:

URL: https://web.archive.org/web/20230902185326/https://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29

## Objectives

 - Use web scraping to extract the required information from a website.
 - Use Pandas to load and process the tabular data as a DataFrame.
 - Use NumPy to manipulate the information contained in the DataFrame.
 - Save the updated DataFrame to a CSV file.


In [1]:
import numpy as np
import pandas as pd
import warnings

# Suppress warnings
def warn(*args, **kwargs):
    pass
warnings.warn = warn
warnings.filterwarnings("ignore")



Extract the required GDP data from the given URL using Web Scraping.

In [2]:
URL="https://web.archive.org/web/20230902185326/https://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29"

You can use the Pandas library to extract the required table directly as a DataFrame.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/images/pandas_wbs_3.png">
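To see how `pd.read_html` behaves without touching the network, the following minimal sketch parses a small inline HTML snippet. The HTML string, its column names, and the `match` text are made up for illustration and do not reproduce the markup of the Wikipedia page. The sketch also assumes an HTML parser such as `lxml` is installed alongside Pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables; this is NOT the real Wikipedia markup.
HTML = """
<table>
  <tr><th>Note</th><th>Value</th></tr>
  <tr><td>placeholder</td><td>0</td></tr>
</table>
<table>
  <tr><th>Country</th><th>Region</th><th>IMF estimate</th></tr>
  <tr><td>United States</td><td>Americas</td><td>26854599</td></tr>
  <tr><td>China</td><td>Asia</td><td>19373586</td></tr>
  <tr><td>Japan</td><td>Asia</td><td>4409738</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found.
tables = pd.read_html(StringIO(HTML))
print(len(tables))

# Selecting by position mirrors the tables[3] lookup used in this project.
by_position = tables[1]

# The match argument keeps only tables whose text matches the given string,
# which avoids relying on a fixed position in the list.
by_text = pd.read_html(StringIO(HTML), match="IMF estimate")[0]
print(by_text)
```

Selecting by position is the simpler option, but it can break if the page gains or loses a table. Selecting by matching text is one possible way to make the lookup less fragile.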

In [3]:

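# Extract all tables from the webpage using Pandas
tables = pd.read_html(URL)
# The required GDP table is at index 3 in the list of parsed tables
df = tables[3]

# Replace the column headers with column numbers
df.columns = range(df.shape[1])

# Keep only columns 0 and 2 (country name and GDP value)
df = df[[0, 2]]

# Keep rows 1 to 10, the top 10 economies (row 0 is skipped);
# copy() avoids modifying a view of the original table
df = df.iloc[1:11, :].copy()

# Assign column names as "Country" and "GDP (Million USD)"
df.columns = ['Country', 'GDP (Million USD)']
df.reset_index(drop=True, inplace=True)
print(df)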

          Country GDP (Million USD)
0   United States          26854599
1           China          19373586
2           Japan           4409738
3         Germany           4308854
4           India           3736882
5  United Kingdom           3158938
6          France           2923489
7           Italy           2169745
8          Canada           2089672
9          Brazil           2081235
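
The cell above relies on the source table already being sorted by GDP. If that ordering cannot be taken for granted, one possible alternative is to rank the rows explicitly. The sketch below uses a small hand-typed sample (values copied from the output above and deliberately shuffled) together with `DataFrame.nlargest`. The names `sample` and `top3` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the printed output, listed out of order on purpose.
sample = pd.DataFrame({
    "Country": ["Japan", "United States", "Germany", "China"],
    "GDP (Million USD)": [4409738, 26854599, 4308854, 19373586],
})

# nlargest returns the n rows with the largest values, in descending order.
top3 = sample.nlargest(3, "GDP (Million USD)").reset_index(drop=True)
print(top3)

# Quick check that the result really is in descending order.
print(top3["GDP (Million USD)"].is_monotonic_decreasing)
```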


Modify the GDP column of the DataFrame, converting the values from Million USD to Billion USD. Use the 'round()' function of the NumPy library to round the values to 2 decimal places. Rename the column header to "GDP (Billion USD)".

In [4]:


# Convert GDP column to integer
df["GDP (Million USD)"] = df["GDP (Million USD)"].astype(int)

# Convert the GDP from Million to Billion
df["GDP (Million USD)"] = df["GDP (Million USD)"] / 1000

# Round to 2 decimal places
df["GDP (Million USD)"] = np.round(df["GDP (Million USD)"], 2)

# Rename column
df.rename(columns={"GDP (Million USD)": "GDP (Billion USD)"}, inplace=True )

df.reset_index(drop=True, inplace=True)
print(df)

          Country  GDP (Billion USD)
0   United States           26854.60
1           China           19373.59
2           Japan            4409.74
3         Germany            4308.85
4           India            3736.88
5  United Kingdom            3158.94
6          France            2923.49
7           Italy            2169.74
8          Canada            2089.67
9          Brazil            2081.24
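
The `astype(int)` call above assumes every GDP entry is a clean string of digits. As a hedged variant for messier input, the sketch below uses `pd.to_numeric` with `errors="coerce"` before applying the same conversion and rounding. The sample data is made up for illustration: two values are taken from the output above (one with commas added), and the third row is a hypothetical entry with no figure.

```python
import numpy as np
import pandas as pd

# Made-up sample: two values from the output above (one with commas added
# for illustration) plus a hypothetical entry with missing data.
raw = pd.DataFrame({
    "Country": ["United States", "China", "Hypothetical"],
    "GDP (Million USD)": ["26,854,599", "19373586", "n/a"],
})

# Strip thousands separators, then convert; unparseable entries become NaN.
cleaned = raw["GDP (Million USD)"].str.replace(",", "", regex=False)
millions = pd.to_numeric(cleaned, errors="coerce")

# Convert Million USD to Billion USD and round to 2 decimal places.
raw["GDP (Billion USD)"] = np.round(millions / 1000, 2)
result = raw.drop(columns="GDP (Million USD)")
print(result)
```

Coercing to `NaN` keeps the script running when a row has no usable figure. Whether to drop or keep such rows afterwards is a separate design choice.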


In [5]:
df.to_csv("./Largest_economies.csv", index=False)

df2 = pd.read_csv("./Largest_economies.csv")
print(df2)

          Country  GDP (Billion USD)
0   United States           26854.60
1           China           19373.59
2           Japan            4409.74
3         Germany            4308.85
4           India            3736.88
5  United Kingdom            3158.94
6          France            2923.49
7           Italy            2169.74
8          Canada            2089.67
9          Brazil            2081.24
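
The save-and-reload step can also be checked programmatically rather than by comparing the printed output by eye. The following minimal sketch writes a three-row sample (values copied from the output above) to a temporary directory and compares the reloaded frame with the original. The names `df_out` and `df_back` and the use of a temporary directory are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample rows copied from the output above.
df_out = pd.DataFrame({
    "Country": ["United States", "China", "Japan"],
    "GDP (Billion USD)": [26854.60, 19373.59, 4409.74],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "Largest_economies.csv"
    # index=False keeps the row index out of the file.
    df_out.to_csv(csv_path, index=False)
    df_back = pd.read_csv(csv_path)

# Raises AssertionError if the two frames differ beyond float tolerance.
pd.testing.assert_frame_equal(df_out, df_back)
print("round trip OK")
```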
