****
## JSON exercise

Using the data in the file 'data/world_bank_projects.json' and the techniques demonstrated above:
1. Find the 10 countries with the most projects.
2. Find the top 10 major project themes (using column 'mjtheme_namecode').
3. In step 2 you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.

### 1. Sort countries according to the number of projects they have been awarded 

In [58]:
import pandas as pd
df = pd.read_json('data/world_bank_projects.json')

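1. Replace NaNs in the 'mjtheme' column with an empty list
2. Create a new column, 'num_of_projects', holding the number of major themes listed in each row
3. Sort the rows by that column in descending order

Note that this ranks individual project records by how many themes they list. Ranking countries by project count would instead require counting rows per country.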

In [59]:
df.loc[df['mjtheme'].isnull(), 'mjtheme'] = df.loc[df['mjtheme'].isnull(), 'mjtheme'].apply(lambda x: [])

In [60]:
df = df.assign(num_of_projects = df.mjtheme.apply(len)) 

In [61]:
df.sort_values(by='num_of_projects', ascending=False)[['countryname', 'themecode']].head(10)

| | countryname | themecode |
|---|---|---|
| 219 | Republic of Niger | 7827512849 |
| 159 | Democratic Republic of Sao Tome and Prin | 2279274021 |
| 75 | Kyrgyz Republic | 7023214527 |
| 388 | Republic of Tajikistan | 7879857754 |
| 390 | Hashemite Kingdom of Jordan | 7983767780 |
| 391 | Republic of Tunisia | 6640512967 |
| 397 | Republic of Indonesia | 2753542857 |
| 155 | Mongolia | 7586415977 |
| 64 | Islamic State of Afghanistan | 2786252356 |
| 290 | Republic of Rwanda | 5354555259 |
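
The exercise asks for the countries with the most projects. Assuming each row of the file describes one project, a minimal sketch of that count is to tally rows per country with `value_counts()`. The small frame below is made-up sample data standing in for the real file, and the names `sample` and `top_countries` are illustrative only.

```python
import pandas as pd

# Made-up sample rows standing in for the real file; one row per project.
sample = pd.DataFrame({
    "countryname": [
        "Country A", "Country B", "Country A",
        "Country C", "Country A", "Country B",
    ],
})

# value_counts() counts rows per country and sorts in descending order,
# so head(10) keeps (up to) the ten countries with the most projects.
top_countries = sample["countryname"].value_counts().head(10)
print(top_countries)
```

On the real data the same call would be applied to `df['countryname']`.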


### 2. Top 10 major project themes

1. Make a long list from the major theme codes of all projects
2. Use a counter to obtain their frequencies
3. Sort in descending order

In [63]:
all_codes = []
for item in df.mjthemecode.dropna():
    # Each entry is a comma-separated string of theme codes
    all_codes.append(item.split(','))

In [64]:
all_codes_flat = [item for sublist in all_codes for item in sublist]

In [65]:
from collections import Counter
top = Counter(all_codes_flat).most_common(10)
print(top)

[('11', 250), ('10', 216), ('8', 210), ('2', 199), ('6', 168), ('4', 146), ('7', 130), ('5', 77), ('9', 50), ('1', 38)]
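
The flatten-and-count steps used above can also be written as a single pass. The following is a sketch on made-up code strings, not the real column. It uses `None` to stand in for a missing entry, whereas the real column would go through `dropna()` first.

```python
from collections import Counter
from itertools import chain

# Made-up 'mjthemecode'-style values: comma-separated code strings,
# with None standing in for a missing entry.
sample_codes = ["8,11", "1,6", None, "11,6,6"]

# Split each non-missing string and chain the pieces into one flat stream.
flat_codes = chain.from_iterable(
    s.split(",") for s in sample_codes if s is not None
)

# Counter tallies the codes; most_common() returns them by descending frequency.
code_counts = Counter(flat_codes)
print(code_counts.most_common(2))
```

Feeding a generator to `Counter` avoids building the intermediate nested list.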


Make a dictionary from major theme codes and names

In [66]:
all_namecode = []
for item in df.mjtheme_namecode:
    all_namecode.extend(item)

In [67]:
mjtheme_dict = {}
for item in all_namecode:
    # Access by key rather than relying on the order of item.values()
    code, name = item['code'], item['name']
    if not name:
        continue  # skip entries whose name is missing
    # Keep a list of the distinct names seen for each code
    names = mjtheme_dict.setdefault(code, [])
    if name not in names:
        names.append(name)
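
Because the dictionary stores a list of names per code, it may be worth confirming that every code maps to exactly one name before relying on the first element. The helper below is a hypothetical sketch (`find_ambiguous_codes` is not part of the notebook), shown on small example dictionaries of the same shape.

```python
def find_ambiguous_codes(code_to_names):
    """Return the codes that do not map to exactly one name."""
    return {
        code: names
        for code, names in code_to_names.items()
        if len(names) != 1
    }


# Hypothetical dictionaries in the same shape as mjtheme_dict.
clean = {
    "8": ["Human development"],
    "11": ["Environment and natural resources management"],
}
conflicting = {"8": ["Human development", "Some other name"]}

print(find_ambiguous_codes(clean))        # no ambiguous codes expected
print(find_ambiguous_codes(conflicting))  # code "8" has two names
```

An empty result for `mjtheme_dict` would indicate that taking index `[0]` is safe.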

Top 10 major themes:

In [68]:
for item in top:
    code, freq = item
    print(mjtheme_dict[code], 'awarded', freq, 'times')

['Environment and natural resources management'] awarded 250 times
['Rural development'] awarded 216 times
['Human development'] awarded 210 times
['Public sector governance'] awarded 199 times
['Social protection and risk management'] awarded 168 times
['Financial and private sector development'] awarded 146 times
['Social dev/gender/inclusion'] awarded 130 times
['Trade and integration'] awarded 77 times
['Urban development'] awarded 50 times
['Economic management'] awarded 38 times


### 3. Filling the missing names

Verify that some theme names are missing:

In [70]:
all_namecode # From above

[{'code': '8', 'name': 'Human development'},
 {'code': '11', 'name': ''},
 {'code': '1', 'name': 'Economic management'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '5', 'name': 'Trade and integration'},
 {'code': '2', 'name': 'Public sector governance'},
 {'code': '11', 'name': 'Environment and natural resources management'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '7', 'name': 'Social dev/gender/inclusion'},
 {'code': '7', 'name': 'Social dev/gender/inclusion'},
 {'code': '5', 'name': 'Trade and integration'},
 {'code': '4', 'name': 'Financial and private sector development'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '6', 'name': ''},
 {'code': '2', 'name': 'Public sector governance'},
 {'code': '4', 'name': 'Financial and private sector development'},
 {'code': '11', 'name': 'Environment and natural resources management'},
 {'code': '8', 'name': ''},
 {'code': '10', 'name': 'Rural dev
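
Rather than scanning the printed list by eye, the missing names can be counted programmatically. This is a small sketch using the first few entries from the listing above. On the real data, `entries` would be replaced by `all_namecode`.

```python
from collections import Counter

# First few entries copied from the listing above.
entries = [
    {"code": "8", "name": "Human development"},
    {"code": "11", "name": ""},
    {"code": "1", "name": "Economic management"},
    {"code": "6", "name": "Social protection and risk management"},
]

# Tally, per code, how many entries have an empty name.
missing_by_code = Counter(d["code"] for d in entries if not d["name"])
print(sum(missing_by_code.values()), "entries lack a name")
print(dict(missing_by_code))
```

Running the same tally after the fill step should give an empty counter.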

Create a new 'mjtheme_namecode' column with the missing names filled in:

In [71]:
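new_column = []
for item in df.mjtheme_namecode:
    temp = []
    for d in item:
        # Keep the existing name if present; otherwise look it up by code
        name = d['name'] or mjtheme_dict[d['code']][0]
        # Build a new dict so the original entries are not modified in place
        temp.append({'code': d['code'], 'name': name})
    new_column.append(temp)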

Assign the new column back to the dataframe:

In [72]:
df['mjtheme_namecode'] = new_column
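
As an alternative, the nested code/name records can be flattened into a long-format dataframe and the names filled with a lookup. This is a hedged sketch using `pd.json_normalize` on a few hand-built records whose entries are taken from the listings in this section. The names `records`, `themes` and `lookup` are illustrative, and the real input would be the parsed JSON file rather than this list.

```python
import pandas as pd

# Hand-built records mimicking the structure of the 'mjtheme_namecode' field.
records = [
    {"mjtheme_namecode": [
        {"code": "8", "name": "Human development"},
        {"code": "11", "name": ""},
    ]},
    {"mjtheme_namecode": [
        {"code": "11", "name": "Environment and natural resources management"},
    ]},
    {"mjtheme_namecode": [{"code": "8", "name": ""}]},
]

# One row per (project, theme) pair, with columns 'code' and 'name'.
themes = pd.json_normalize(records, record_path="mjtheme_namecode")

# Build a code -> name lookup from the rows that do have a name.
lookup = (
    themes[themes["name"] != ""]
    .drop_duplicates("code")
    .set_index("code")["name"]
)

# Replace every name with the looked-up value for its code.
themes["name"] = themes["code"].map(lookup)
print(themes)
```

This sketch assumes every code appears at least once with a non-empty name. A code without one would end up as NaN after `map`.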

Verify that missing values are filled:

In [73]:
all_namecode = []
for item in df.mjtheme_namecode:
    all_namecode.extend(item)
all_namecode

[{'code': '8', 'name': 'Human development'},
 {'code': '11', 'name': 'Environment and natural resources management'},
 {'code': '1', 'name': 'Economic management'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '5', 'name': 'Trade and integration'},
 {'code': '2', 'name': 'Public sector governance'},
 {'code': '11', 'name': 'Environment and natural resources management'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '7', 'name': 'Social dev/gender/inclusion'},
 {'code': '7', 'name': 'Social dev/gender/inclusion'},
 {'code': '5', 'name': 'Trade and integration'},
 {'code': '4', 'name': 'Financial and private sector development'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '6', 'name': 'Social protection and risk management'},
 {'code': '2', 'name': 'Public sector governance'},
 {'code': '4', 'name': 'Financial and private sector development'},
 {'code': '11', 'name': 'Environment and natural resou