# <span style='font-family:sans-serif'> **World Bank Data Analysis**

  <span style='font-family:sans-serif'>This project uses a multi-nested JSON file of World Bank projects ('world_bank_projects.json') to run a few analyses on the projects funded by the World Bank. Every country that has projects is listed, and a country can have more than one project.

The analyses requested are:

1. Top 10 countries with the most projects
2. 10 most frequent projects
3. Fill empty values for project names

### <span style='font-family:sans-serif'> Top 10 countries with the most projects
This code loads the World Bank projects JSON file into a dataframe, groups the rows by country name, counts the projects in each group, sorts the counts in descending order, and displays the first 10.

In [4]:
# Load the JSON file into a dataframe, 'sample_json_df'
import pandas as pd

sample_json_df = pd.read_json('world_bank_projects.json')

# Rename the columns to match the labels shown in the output.
# 'status' is reused as the count column: count() tallies its non-null values per group.
sample_json_df.rename(columns={'countryshortname': 'Country', 'status': 'Number of Projects'}, inplace=True)

# Aggregate by 'Country' and sort by number of projects, highest first
json_df = sample_json_df.groupby('Country').count().sort_values('Number of Projects', ascending=False).reset_index()

# Display the top 10 only
print(json_df[['Country', 'Number of Projects']].head(10))

              Country  Number of Projects
0               China                  19
1           Indonesia                  19
2             Vietnam                  17
3               India                  16
4  Yemen, Republic of                  13
5               Nepal                  12
6          Bangladesh                  12
7             Morocco                  12
8          Mozambique                  11
9              Africa                  11
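
As a cross-check, the same ranking can be sketched with `value_counts`, which counts rows per country directly instead of relying on the non-null count of a renamed column. This is a minimal sketch, not the notebook's method: the sample records and the helper `top_countries` are made up for illustration, and only the `countryshortname` field name comes from the data file.

```python
import pandas as pd

# Hypothetical sample records; only the 'countryshortname' key mirrors the real file.
records = [
    {"countryshortname": "China", "status": "Active"},
    {"countryshortname": "India", "status": "Closed"},
    {"countryshortname": "China", "status": "Active"},
    {"countryshortname": "Nepal", "status": "Active"},
    {"countryshortname": "China", "status": "Closed"},
    {"countryshortname": "India", "status": "Active"},
]
sample_df = pd.DataFrame(records)


def top_countries(frame, n=10):
    """Return the n countries with the most rows (one row per project)."""
    counts = frame["countryshortname"].value_counts().head(n)
    # Name the index and the counts so the result matches the table layout above.
    return counts.rename_axis("Country").reset_index(name="Number of Projects")


top = top_countries(sample_df, n=2)
print(top)
```

Because `value_counts` counts every row, it does not depend on any particular column being free of missing values.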





### <span style='font-family:sans-serif'> 10 most frequent projects
    
    
Here we load the JSON into a normalized dataframe and use the 'mjtheme_namecode' column, which holds a code and name pair for each entry, to count the 10 most frequent projects.
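
To make the normalization step concrete, here is a minimal sketch of what `record_path` does on a tiny, made-up input. The two project records and the `id` key are hypothetical; the nested `mjtheme_namecode` list of `code`/`name` entries is assumed to have the same shape as in the real file.

```python
import pandas as pd

# Hypothetical records: each project carries a list of {'code', 'name'}
# entries under 'mjtheme_namecode'.
projects = [
    {"id": "P1", "mjtheme_namecode": [
        {"code": "8", "name": "Human development"},
        {"code": "11", "name": ""},
    ]},
    {"id": "P2", "mjtheme_namecode": [
        {"code": "8", "name": "Human development"},
    ]},
]

# record_path names the nested list to unpack: one output row per list entry.
themes = pd.json_normalize(projects, record_path=["mjtheme_namecode"])
print(themes)

# size() counts rows per group, so no helper 'count' column is needed.
per_code = themes.groupby("code").size().sort_values(ascending=False)
print(per_code)
```

Two projects with three nested entries produce three rows, which is why the counts in this section can exceed the number of projects.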

In [5]:
# Load the JSON data and read it into a normalized dataframe
import json
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

with open('world_bank_projects.json') as f:
    data = json.load(f)
df = json_normalize(data, record_path=['mjtheme_namecode'])

# Add a helper column to aggregate on
df = df.assign(count=1)

# Aggregate and print the code and name of every project
print(df.groupby(['code', 'name']).agg('count').sort_values('name', ascending=True).reset_index())

# Rows with an empty 'name' form separate groups, so the counts per name are split.
# A quick workaround is to count by 'code' alone: the counts are accurate,
# but each code must then be matched to its name by hand. The next step fixes this.
print(df.groupby(['code']).agg('count').sort_values('count', ascending=False).reset_index())

   code                                          name  count
0     1                                                    5
1     8                                                   13
2     7                                                   11
3     6                                                   10
4     5                                                    5
5     9                                                    3
6     3                                                    3
7     4                                                   16
8     2                                                   15
9    11                                                   27
10   10                                                   14
11    1                           Economic management     33
12   11  Environment and natural resources management    223
13    4      Financial and private sector development    130
14    8                             Human development    197
15    2                 

### <span style='font-family:sans-serif'> Fill empty values for project names
Here we reuse the normalized dataframe. A deep copy of the 'mjtheme_namecode' rows, with blank names and duplicates removed, serves as a code-to-name lookup. Mapping each row's code through that lookup fills in the correct name for every row, including the blank ones.

In [6]:
# The counts from the previous section are rerun below, using the dataframe with all names filled

# Load a new instance of the dataframe
with open('world_bank_projects.json') as f:
    data = json.load(f)
df = pd.json_normalize(data, record_path=['mjtheme_namecode'])

# Build a code-to-name lookup from the rows that have a name, then map (overwrite) the 'name' values
dfNew = df.copy(deep=True)
dfNew = dfNew[dfNew['name'] != ""]
dfNew = dfNew.drop_duplicates()
code_to_name = dfNew.set_index('code')['name']
df['name'] = df['code'].map(code_to_name)

# Count projects and display them with their codes and names
df = df.assign(count=1)
print(df.groupby(['code', 'name']).agg('count').sort_values('count', ascending=True).reset_index())

   code                                          name  count
0     3                                   Rule of law     15
1     1                           Economic management     38
2     9                             Urban development     50
3     5                         Trade and integration     77
4     7                   Social dev/gender/inclusion    130
5     4      Financial and private sector development    146
6     6         Social protection and risk management    168
7     2                      Public sector governance    199
8     8                             Human development    210
9    10                             Rural development    216
10   11  Environment and natural resources management    250
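
The fill step can be checked on a small example. The sketch below uses made-up rows (the names are taken from the output above, the row mix is hypothetical) and assumes each code appears with exactly one non-blank name.

```python
import pandas as pd

# Hypothetical rows; blank names stand in for the missing values.
themes = pd.DataFrame({
    "code": ["8", "11", "8", "11", "1"],
    "name": [
        "Human development",
        "",
        "",
        "Environment and natural resources management",
        "Economic management",
    ],
})

# Build a code -> name lookup from the rows that do have a name.
named = themes[themes["name"] != ""].drop_duplicates()
code_to_name = named.set_index("code")["name"]

# Overwrite every name with the lookup value for its code.
themes["name"] = themes["code"].map(code_to_name)

# Count rows per (code, name) pair; blanks no longer split the groups.
counts = themes.groupby(["code", "name"]).size().reset_index(name="count")
print(counts)
```

One caveat of this approach: a code that never appears with a name has no lookup entry, so `map` would leave it as `NaN` rather than a blank string.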
