# Project - Process Data

Continue with the Wikipedia page [GDP wiki](https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita).

It contains GDP per capita estimates from the IMF, the World Bank, and the United Nations.

Your job is to get the data and compute the average of the three estimates for each country, but only if all three sources have a value.
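
As a minimal sketch of that rule, assuming each estimate arrives as a string and a missing value is shown as a dash (the helper name `average_if_complete` is hypothetical and not part of the notebook):

```python
def average_if_complete(imf, wb, un):
    """Return the rounded mean of the three estimates, or None if any is missing."""
    estimates = (imf, wb, un)
    # str.isdigit() is False for placeholders such as '—', so those rows are rejected.
    if not all(value.isdigit() for value in estimates):
        return None
    return round(sum(int(value) for value in estimates) / 3)


print(average_if_complete('127673', '133590', '133745'))  # Luxembourg: 131669
print(average_if_complete('—', '234316', '234317'))       # Monaco: None
```

The sample values are taken from the notebook outputs further down.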

Write the result to a CSV file.

To get started:
0. What libraries can you use?
1. Read the data and prepare it in a list.
2. Convert the data.
3. Process the data.
4. Write the data.

In [1]:
# Step 0: What libraries
import pandas as pd

In [2]:
# Step 1: Read the data and prepare in a list

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'

tables = pd.read_html(url)

In [3]:
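# tables[1] was the GDP table when this was written; the index can change if the page is edited.
data = tables[1]
# Drop the first row and convert the remaining rows to plain Python lists.
data_list = data.values.tolist()[1:]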

In [4]:
data_list[0]

['Monaco\u202f*', 'Europe', '—', '—', '234316', '2021', '234317', '2021']

In [7]:
# Step 2: Convert the data
def check_all_data(imf, wb, un):
    """Return True only if all three estimates are strings of digits."""
    # Missing values appear as '—', for which str.isdigit() is False.
    return all(value.isdigit() for value in (imf, wb, un))

data_items = []

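for item in data_list:
    # Judging by the sample row above, each row is laid out as:
    # [country, region, IMF value, IMF year, WB value, WB year, UN value, UN year]
    country = item[0]
    imf = item[2]
    wb = item[4]
    un = item[6]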

    if not check_all_data(imf, wb, un):
        print('skipping', country, imf, wb, un)
        continue
        
    if country.endswith('\u202f*'):
        country = country.replace('\u202f*', '')
        
    data_items.append(
        {
            'country': country,
            'imf': int(imf),
            'wb': int(wb),
            'un': int(un)
        }
    )

skipping Monaco * — 234316 234317
skipping Liechtenstein * — 157755 169260
skipping Bermuda * — 114090 112653
skipping Isle of Man * — 87158 —
skipping Cayman Islands * — 86569 85250
skipping Channel Islands * — 75153 —
skipping Faroe Islands * — 69010 —
skipping Greenland * — 54571 58185
skipping British Virgin Islands * — — 49444
skipping U.S. Virgin Islands * — 39552 —
skipping New Caledonia * — 37160 34994
skipping Guam * — 35905 —
skipping Taiwan * 35513 — 33011
skipping Sint Maarten (Dutch part) * — 28988 26199
skipping Northern Mariana Islands * — 23707 —
skipping Saint Martin (French part) * — 21921 —
skipping Turks and Caicos Islands * — 20909 20909
skipping French Polynesia * — 19915 19915
skipping Cook Islands * — — 19264
skipping Anguilla * — — 19216
skipping Curaçao * — 17718 14183
skipping Montserrat * — — 16199
skipping American Samoa * — 15743 —
skipping Cuba * — 9500 11255
skipping Zanzibar — — 1211
skipping Syria * — 533 925
skipping North Korea * — — 654
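
The check above relies on every cell being a string of plain digits. Depending on how `pd.read_html` parses the table, a cell might instead arrive as an `int`, a `float` (including `NaN`), or a string with thousands separators. The following is a hedged sketch of a more tolerant converter; `parse_estimate` is a hypothetical helper, and the set of formats it accepts is an assumption rather than something observed on the page.

```python
import math


def parse_estimate(value):
    """Convert a table cell to int, or return None when no usable number is present."""
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        # NaN is how pandas usually marks an empty cell.
        return int(value) if math.isfinite(value) else None
    if isinstance(value, str):
        cleaned = value.replace(',', '').strip()
        # isdecimal() accepts only characters that int() can parse.
        return int(cleaned) if cleaned.isdecimal() else None
    return None


print(parse_estimate('234316'))      # 234316
print(parse_estimate('234,316'))     # 234316
print(parse_estimate('—'))           # None
print(parse_estimate(float('nan')))  # None
```

With a helper like this, a row would be kept only when none of the three parsed values is `None`.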


In [8]:
data_items[0]

{'country': 'Luxembourg', 'imf': 127673, 'wb': 133590, 'un': 133745}

In [9]:
# Step 3: Process the data.

processed_data = []

for item in data_items:
    # round() with a single argument already returns an int
    avg = round((item['imf'] + item['wb'] + item['un']) / 3)
    
    processed_data.append(
        {
            'country': item['country'],
            'avg gdp pc': avg
        }
    )

In [10]:
processed_data[0]

{'country': 'Luxembourg', 'avg gdp pc': 131669}
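
Steps 2 and 3 can also be expressed with pandas directly instead of plain lists. The sketch below assumes a small hand-built DataFrame with hypothetical column names (`country`, `imf`, `wb`, `un`) and three rows copied from the outputs above. It is an alternative illustration, not the approach the notebook takes.

```python
import pandas as pd

# Three sample rows taken from the notebook outputs; column names are assumptions.
df = pd.DataFrame({
    'country': ['Monaco\u202f*', 'Luxembourg', 'Taiwan\u202f*'],
    'imf': ['—', '127673', '35513'],
    'wb': ['234316', '133590', '—'],
    'un': ['234317', '133745', '33011'],
})

sources = ['imf', 'wb', 'un']
# Non-numeric cells such as '—' become NaN.
numeric = df[sources].apply(pd.to_numeric, errors='coerce')
# Keep only rows where all three sources have a value.
complete = numeric.dropna()

result = pd.DataFrame({
    # Strip the footnote marker from country names.
    'country': df.loc[complete.index, 'country'].str.replace('\u202f*', '', regex=False),
    'avg gdp pc': complete.mean(axis=1).round().astype(int),
})
print(result.to_dict('records'))
```

Only Luxembourg survives the filter here, because Monaco and Taiwan each lack one estimate.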

In [11]:
# Step 4: Write the data.

import csv

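# newline='' stops the csv module from writing blank rows on Windows;
# utf-8 keeps names such as Curaçao intact on every platform.
with open('records_v4.csv', 'w', newline='', encoding='utf-8') as f: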
    csv_writer = csv.DictWriter(f, fieldnames=['country', 'avg gdp pc'])
    csv_writer.writeheader()
    csv_writer.writerows(processed_data)
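
To check the output, the file can be read back with `csv.DictReader`. The sketch below is self-contained: it writes one sample row (the Luxembourg result from above) to a temporary file with the same field names and reads it back. The file name `records_check.csv` is a hypothetical placeholder.

```python
import csv
import os
import tempfile

rows = [{'country': 'Luxembourg', 'avg gdp pc': 131669}]

# Write to a temporary directory so the real output file is left untouched.
path = os.path.join(tempfile.mkdtemp(), 'records_check.csv')

with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['country', 'avg gdp pc'])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline='', encoding='utf-8') as f:
    loaded = list(csv.DictReader(f))

print(loaded)
```

Note that `DictReader` returns every value as a string, so the average comes back as `'131669'` and needs `int()` before further arithmetic.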