# JSON examples and exercise
****
+ get familiar with packages for dealing with JSON
+ study examples with JSON strings and files 
+ work on exercise to be completed and submitted 
****
+ reference: http://pandas.pydata.org/pandas-docs/stable/io.html#io-json-reader
+ data source: http://jsonstudio.com/resources/
****

In [1]:
import pandas as pd

## imports for Python, Pandas

In [2]:
import json
from pandas import json_normalize  # pandas >= 1.0; the old pandas.io.json import path is deprecated

****
## JSON exercise

Using the data in file 'data/world_bank_projects.json' and the techniques demonstrated above,
1. Find the 10 countries with the most projects
2. Find the top 10 major project themes (using column 'mjtheme_namecode')
3. In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.

Question 1: Find top 10 countries with most projects

In [3]:
json_df = pd.read_json('world_bank_projects.json')     # read the JSON file into a DataFrame (adjust the path if the file is under data/)
countprojs = json_df.countryshortname.value_counts()   # Series of project counts per country, sorted from most to fewest
countprojs.head(10)                                    # keep only the 10 countries with the most projects



Indonesia             19
China                 19
Vietnam               17
India                 16
Yemen, Republic of    13
Bangladesh            12
Morocco               12
Nepal                 12
Mozambique            11
Africa                11
Name: countryshortname, dtype: int64

Question 2: Top 10 major project themes 

In [4]:
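# Display Theme names and Theme codes
# Each row of mjtheme_namecode is a list of {code, name} records; flatten each
# list and stack the pieces (DataFrame.append was removed in pandas 2.0)
themes = pd.concat([pd.json_normalize(row) for row in json_df.mjtheme_namecode],
                   ignore_index=True)
themes = themes[['code', 'name']]  # fix the column order used by the later cells

themes.head()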

  code                                   name
0    8                      Human development
1   11
2    1                    Economic management
3    6  Social protection and risk management
4    5                  Trade and integration
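
The flattening step above can also be written as a single call, since `pd.json_normalize` accepts a `record_path` pointing at the nested list. The following is a minimal sketch, assuming pandas 1.0 or later; the `records` list and the country labels in it are hypothetical sample data with the same nested shape as the file, not its actual contents.

```python
import pandas as pd

# Hypothetical sample records (NOT the real file contents), shaped like
# the entries in world_bank_projects.json
records = [
    {"countryshortname": "Country A",
     "mjtheme_namecode": [{"code": "8", "name": "Human development"},
                          {"code": "11", "name": ""}]},
    {"countryshortname": "Country B",
     "mjtheme_namecode": [{"code": "1", "name": "Economic management"},
                          {"code": "6", "name": "Social protection and risk management"}]},
]

# record_path selects the nested list to expand into rows;
# meta copies a parent-level field onto every expanded row
themes_flat = pd.json_normalize(
    records,
    record_path="mjtheme_namecode",
    meta=["countryshortname"],
)
print(themes_flat)
```

With the real data, `records` would presumably come from `json.load` on the opened file rather than from a literal list.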


In [5]:
# Count the project themes and list the top 10
theme_counts = themes.name.value_counts()
print('Top 10 project themes:')
theme_counts.head(10)

Top 10 project themes:


Environment and natural resources management    223
Rural development                               202
Human development                               197
Public sector governance                        184
Social protection and risk management           158
Financial and private sector development        130
                                                122
Social dev/gender/inclusion                     119
Trade and integration                            72
Urban development                                47
Name: name, dtype: int64
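
The blank entry in the counts above comes from rows whose name is missing, so counting by name splits each theme between its real name and the empty string. One way to sidestep this, sketched below on a small hypothetical `themes` frame (not the real data), is to count by `code`, which is always present.

```python
import pandas as pd

# Hypothetical miniature of the themes table; some names are blank
themes = pd.DataFrame({
    "code": ["8", "11", "1", "11", "8", "11"],
    "name": ["Human development", "", "Economic management",
             "Environment and natural resources management", "", ""],
})

# Counting by name lumps every blank name into one '' bucket
by_name = themes["name"].value_counts()

# Counting by code attributes every row to its theme
by_code = themes["code"].value_counts()

print(by_name)
print(by_code)
```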

Question 3: Fill in missing theme names

In [6]:
# Create a themes dictionary to map theme codes to theme names and then display it
themes_dict = {}

for row in themes.itertuples():
    if row[2] != '':
        themes_dict[row[1]] = row[2]
        
themes_dict

{'1': 'Economic management',
 '10': 'Rural development',
 '11': 'Environment and natural resources management',
 '2': 'Public sector governance',
 '3': 'Rule of law',
 '4': 'Financial and private sector development',
 '5': 'Trade and integration',
 '6': 'Social protection and risk management',
 '7': 'Social dev/gender/inclusion',
 '8': 'Human development',
 '9': 'Urban development'}

In [7]:
# Fill in missing theme names using the themes dictionary
for row in themes.itertuples():
    if row[2] == '':
        themes.at[row[0], 'name'] = themes_dict[row[1]]  # set_value was removed in pandas 1.0
        
# Are there any more missing themes?
print('Missing themes:', len(themes[themes['name'] == '']))

Missing themes: 0
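
The dictionary-and-loop approach above can also be expressed without explicit iteration. The following is a minimal sketch on a small hypothetical `themes` frame (not the real data), assuming every code appears at least once with a non-empty name: build a code-to-name lookup from the named rows, then map each code through it.

```python
import pandas as pd

# Hypothetical miniature of the themes table; some names are blank
themes = pd.DataFrame({
    "code": ["8", "11", "1", "11", "8"],
    "name": ["Human development", "", "Economic management",
             "Environment and natural resources management", ""],
})

# Lookup Series: one named row per code, indexed by code
named = themes[themes["name"] != ""].drop_duplicates("code")
code_to_name = named.set_index("code")["name"]

# Overwrite every name with the lookup result for its code
themes["name"] = themes["code"].map(code_to_name)

missing = int((themes["name"] == "").sum())
print("Missing themes:", missing)
```

A code that never appears with a name would map to `NaN` here, so checking `themes["name"].isna().sum()` as well is a reasonable precaution.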


In [8]:
# Display the top 10 major project themes now that the missing names are filled in
print('Missing themes added:')
themes.name.value_counts().head(10)

Missing themes added:


Environment and natural resources management    250
Rural development                               216
Human development                               210
Public sector governance                        199
Social protection and risk management           168
Financial and private sector development        146
Social dev/gender/inclusion                     130
Trade and integration                            77
Urban development                                50
Economic management                              38
Name: name, dtype: int64