# JSON examples and exercise
****
+ get familiar with packages for dealing with JSON
+ study examples with JSON strings and files 
+ work on exercise to be completed and submitted 
****
+ reference: http://pandas.pydata.org/pandas-docs/stable/io.html#io-json-reader
+ data source: http://jsonstudio.com/resources/
****

## JSON exercise

Using data in file 'data/world_bank_projects.json' and the techniques demonstrated above,
1. Find the 10 countries with the most projects.
2. Find the top 10 major project themes (using column 'mjtheme_namecode').
3. In step 2 you will notice that some entries have only the code, with the name missing. Create a dataframe with the missing names filled in.

### Importing necessary libraries

In [1]:
import pandas as pd
from collections import defaultdict

### Reading World Bank JSON File as a Pandas DataFrame

In [2]:
path = 'data/world_bank_projects.json'

df = pd.read_json(path)
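To see how `pd.read_json` handles nested records without needing the data file, the sketch below parses a small JSON string instead. The two records are invented stand-ins for the file's layout (only the column names `countryname` and `mjtheme_namecode` come from this notebook), and `raw` and `df_small` are illustrative names.

```python
import io

import pandas as pd

# Invented sample records that imitate the shape of the World Bank file
raw = """
[
  {"countryname": "Nepal",
   "mjtheme_namecode": [{"code": "8", "name": "Human development"},
                        {"code": "11", "name": ""}]},
  {"countryname": "Africa",
   "mjtheme_namecode": [{"code": "8", "name": ""}]}
]
"""

# Wrapping the string in StringIO lets read_json treat it like a file
df_small = pd.read_json(io.StringIO(raw))

# The nested theme list is kept as a plain Python list of dicts in each cell
first_themes = df_small.loc[0, "mjtheme_namecode"]
print(type(first_themes), first_themes[0])
```

Because each cell holds an ordinary list of dicts, the later cells can loop over the column with plain Python.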

### Displaying the top 10 countries with World Bank Projects

In [3]:
df.countryname.value_counts()[:10]

Republic of Indonesia              19
People's Republic of China         19
Socialist Republic of Vietnam      17
Republic of India                  16
Republic of Yemen                  13
Kingdom of Morocco                 12
People's Republic of Bangladesh    12
Nepal                              12
Republic of Mozambique             11
Africa                             11
Name: countryname, dtype: int64
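As a small self-contained illustration of what `value_counts` returns, the sketch below uses an invented series of labels rather than the real country names. `names`, `counts` and `top_two` are illustrative names only.

```python
import pandas as pd

# Invented labels, chosen so that no two counts tie
names = pd.Series(["A", "A", "A", "B", "B", "C"], name="countryname")

# value_counts sorts by frequency, highest first
counts = names.value_counts()

# head(2) keeps the same rows as the slice counts[:2]
top_two = counts.head(2)
print(top_two)
```

Using `head(n)` instead of `[:n]` is a matter of taste; both take the first `n` rows of the already sorted result.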

### Creating a dictionary that maps each major project theme code to its name

In [4]:
# Map each theme code to its name, skipping entries whose name is blank
project_code_names = defaultdict(str)

for x in df.mjtheme_namecode:
    for y in x:
        if y['name'] != "":
            project_code_names[y["code"]] = y["name"]

### Filling in the missing values for project theme names using the dictionary created above

In [5]:
# Each y is the same dict object stored in the DataFrame,
# so assigning to it updates df in place
for x in df.mjtheme_namecode:
    for y in x:
        if y['name'] == "":
            y["name"] = project_code_names[y["code"]]
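Exercise item 3 asks for a dataframe with the missing names filled in. One possible way to get a flat table, sketched here on invented sample data rather than the real file, is to put one theme per row and fill blank names with `Series.map`. The names `sample`, `themes` and `code_to_name` are illustrative, and the code-to-name pairs in the sample are assumptions made for the example.

```python
import pandas as pd

# Invented stand-in for the 'mjtheme_namecode' column of the real data
sample = pd.DataFrame({
    "mjtheme_namecode": [
        [{"code": "8", "name": "Human development"}, {"code": "11", "name": ""}],
        [{"code": "11", "name": "Environment and natural resources management"}],
        [{"code": "8", "name": ""}],
    ]
})

# One row per (project, theme) pair, with columns 'code' and 'name'
records = [theme for theme_list in sample["mjtheme_namecode"] for theme in theme_list]
themes = pd.DataFrame(records)

# Lookup table built from the rows that do carry a name
code_to_name = (
    themes[themes["name"] != ""]
    .drop_duplicates("code")
    .set_index("code")["name"]
)

# Replace every name, blank or not, with the name registered for its code
themes["name"] = themes["code"].map(code_to_name)
print(themes)
```

Unlike the in-place loop, this leaves the original column untouched and produces a separate two-column dataframe. A code that never appears with a name would come out as `NaN` here.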

### Counting the major themes in the DataFrame and storing the result in a Dictionary

In [6]:
count = defaultdict(int)

# Tally how often each theme name appears across all projects
for x in df.mjtheme_namecode:
    for y in x:
        count[y["name"]] += 1

### Making a copy of the dictionary sorted by count, highest first

In [7]:
# Dicts preserve insertion order (Python 3.7+), so this copy stays sorted by count
sorted_themes = {k: v for k, v in sorted(count.items(), key=lambda item: item[1], reverse=True)}

### Displaying the top 10 major Project Themes

In [8]:
# Print the first 10 (theme, count) pairs
for theme in list(sorted_themes.items())[:10]:
    print(theme)

('Environment and natural resources management', 250)
('Rural development', 216)
('Human development', 210)
('Public sector governance', 199)
('Social protection and risk management', 168)
('Financial and private sector development', 146)
('Social dev/gender/inclusion', 130)
('Trade and integration', 77)
('Urban development', 50)
('Economic management', 38)
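
The count, sort and print steps above can also be combined with `collections.Counter` from the standard library. The following is a minimal sketch on invented sample data, not the notebook's actual method; `sample_themes`, `theme_counts` and `top_themes` are illustrative names.

```python
from collections import Counter

# Invented stand-in for the 'mjtheme_namecode' column, names already filled in
sample_themes = [
    [{"code": "8", "name": "Human development"},
     {"code": "11", "name": "Environment and natural resources management"}],
    [{"code": "11", "name": "Environment and natural resources management"}],
    [{"code": "8", "name": "Human development"},
     {"code": "11", "name": "Environment and natural resources management"}],
]

# Count every theme name across all projects in one pass
theme_counts = Counter(
    theme["name"] for theme_list in sample_themes for theme in theme_list
)

# most_common(n) returns the n highest counts as (name, count) pairs
top_themes = theme_counts.most_common(2)
for theme in top_themes:
    print(theme)
```

On the real column, `most_common(10)` would play the role of the sorting and printing cells.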
