## INFO212 Assignment2 T22

### Data Collection  
Request:  
- Download JSON data files containing economic indicators from reputable sources (e.g. the World Bank or IMF).  
- Ensure the data spans at least 5 years for multiple countries. 

In [10]:
import requests
import json

# Define the countries and indicators to query
countries = ["CN", "BR", "JP", "CA", "NP"]
indicators = {
    "GDP": "NY.GDP.MKTP.CD",
    "Unemployment": "SL.UEM.TOTL.ZS",
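    # FP.CPI.TOTL is the consumer price index (2010 = 100);
    # the annual inflation rate (%) is FP.CPI.TOTL.ZG
    "Inflation": "FP.CPI.TOTL"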
}
years = range(2010, 2019 + 1)
base_url = "https://api.worldbank.org/v2/country/{}/indicator/{}?date={}:{}&format=json&per_page=1000"

# Fetch the data and merge it into a single JSON structure
data = {}
for indicator_name, indicator_code in indicators.items():
    data[indicator_name] = {}
    for country in countries:
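        response = requests.get(
            base_url.format(country, indicator_code, years[0], years[-1]),
            timeout=30,  # avoid hanging indefinitely on a stalled connection
        )
        if response.status_code == 200:
            result = response.json()
            # The API returns [metadata, records]; records may be missing or null
            if len(result) > 1 and result[1]:
                data[indicator_name][country] = result[1]
            else:
                print(f"No data returned for {country} - {indicator_name}")
        else:
            print(f"Error fetching data for {country} - {indicator_name}: {response.status_code}")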

# Save the data to a JSON file
with open("world_bank_data.json", "w") as f:
    json.dump(data, f, indent=4)

print("Data has been saved to world_bank_data.json")


Data has been saved to world_bank_data.json
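
The request asks for at least 5 years of data per country, which the cell above does not verify. A minimal sketch of such a check follows, assuming the same nested shape as `data` (indicator, then country code, then a list of records with `date` and `value` keys). The helper names `years_with_values` and `short_series` and the sample values are hypothetical, not part of the assignment.

```python
def years_with_values(entries):
    """Return the sorted, de-duplicated years that carry a non-null value."""
    return sorted({int(e["date"]) for e in entries if e.get("value") is not None})


def short_series(data, min_years=5):
    """List (indicator, country, n_years) for series with fewer than min_years values."""
    problems = []
    for indicator, by_country in data.items():
        for country, entries in by_country.items():
            n_years = len(years_with_values(entries))
            if n_years < min_years:
                problems.append((indicator, country, n_years))
    return problems


# Made-up sample in the same shape as `data` (values are illustrative only).
sample = {
    "GDP": {
        "CN": [{"date": str(y), "value": 1.0} for y in range(2010, 2020)],
        "NP": [{"date": "2018", "value": 1.0}, {"date": "2019", "value": None}],
    }
}
print(short_series(sample))
```

Calling `short_series(data)` on the downloaded data should return an empty list if every country has at least five years of non-null values for every indicator.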


### Data Preparation  
Request:  
- If required, load the JSON files into Pandas DataFrames.     
- Clean the data by handling missing values, duplicates, and incorrect data types. 
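
The three cleaning steps named in the request can be sketched on long-format records (one row per country and year, before any pivot). This is only an illustration: the `clean_long` helper and the sample rows are made up, and dropping rows with missing values is one possible choice; interpolating or filling them may suit the analysis better.

```python
import pandas as pd

# Made-up records with a duplicate row, a missing value, and string-typed columns.
raw = pd.DataFrame({
    "country": ["China", "China", "China", "Nepal", "Nepal"],
    "date": ["2010", "2011", "2011", "2010", "2011"],
    "GDP": ["6.09e12", "7.55e12", "7.55e12", None, "2.16e10"],
})


def clean_long(df, value_col):
    """Fix data types, drop duplicate country/year rows, and drop missing values."""
    out = df.copy()
    # Incorrect data types: convert year and value columns to numeric
    out["date"] = pd.to_numeric(out["date"], errors="coerce").astype("Int64")
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    # Duplicates: keep the first record for each country/year pair
    out = out.drop_duplicates(subset=["country", "date"])
    # Missing values: drop rows without a usable value
    out = out.dropna(subset=[value_col])
    return out.reset_index(drop=True)


cleaned = clean_long(raw, "GDP")
print(cleaned.dtypes)
print(cleaned)
```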

In [12]:
import pandas as pd

# Mapping from country codes to country names
country_names = {"CN": "China", "BR": "Brazil", "JP": "Japan", "CA": "Canada", "NP": "Nepal"}

def create_dataframe(data, indicator):
    """Build a year-by-country table for one indicator."""
    records = [
        (country_names[country_code], int(entry['date']), entry['value'])
        for country_code, values in data[indicator].items()
        for entry in values
    ]
    df = pd.DataFrame(records, columns=['country', 'date', indicator])
    return df.pivot(index='date', columns='country', values=indicator)

# Create the GDP, Unemployment, and Inflation DataFrames
gdp_df = create_dataframe(data, "GDP")
unemployment_df = create_dataframe(data, "Unemployment")
inflation_df = create_dataframe(data, "Inflation")

# Display the DataFrames
print("GDP DataFrame:")
print(gdp_df)
print("\nUnemployment DataFrame:")
print(unemployment_df)
print("\nInflation DataFrame:")
print(inflation_df)

GDP DataFrame:
country        Brazil        Canada         China         Japan         Nepal
date                                                                         
2010     2.208838e+12  1.617343e+12  6.087192e+12  5.759072e+12  1.600266e+10
2011     2.616156e+12  1.793327e+12  7.551546e+12  6.233147e+12  2.157387e+10
2012     2.465228e+12  1.828366e+12  8.532185e+12  6.272363e+12  2.170310e+10
2013     2.472820e+12  1.846597e+12  9.570471e+12  5.212328e+12  2.216221e+10
2014     2.456044e+12  1.805750e+12  1.047562e+13  4.896994e+12  2.273161e+10
2015     1.802212e+12  1.556509e+12  1.106157e+13  4.444931e+12  2.436080e+10
2016     1.795693e+12  1.527995e+12  1.123331e+13  5.003678e+12  2.452411e+10
2017     2.063515e+12  1.649266e+12  1.231049e+13  4.930837e+12  2.897159e+10
2018     1.916934e+12  1.725329e+12  1.389491e+13  5.040881e+12  3.311153e+10
2019     1.873288e+12  1.743725e+12  1.427997e+13  5.117994e+12  3.418618e+10

Unemployment DataFrame:
country  Brazil  Canada 