<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#World-Bank-Project" data-toc-modified-id="World-Bank-Project-1"><span class="toc-item-num">1&nbsp;&nbsp;</span><span style="font-family: sans-serif">World Bank Project</span></a></span><ul class="toc-item"><li><span><a href="#-Top-10-countries-with-the-most-projects" data-toc-modified-id="-Top-10-countries-with-the-most-projects-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><span style="font-family: sans-serif"> Top 10 countries with the most projects</span></a></span></li><li><span><a href="#-10-most-frequent-projects" data-toc-modified-id="-10-most-frequent-projects-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span><span style="font-family: sans-serif"> 10 most frequent projects</span></a></span></li><li><span><a href="#-Fill-empty-values-for-project-names" data-toc-modified-id="-Fill-empty-values-for-project-names-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span><span style="font-family: sans-serif"> Fill empty values for project names</span></a></span></li></ul></li></ul></div>

# <span style='font-family:sans-serif'> **JSON Project**</span>

## <span style='font-family:sans-serif'>World Bank Project</span>
    This project uses a single, deeply nested JSON file of World Bank project records ('world_bank_projects.json').  
    Below are a few short examples of reading, manipulating, and analyzing the data using pandas dataframes.  
    
    The requested analysis covers:  
    1.1 Top 10 countries with the most projects  
    1.2 10 most frequent projects  
    1.3 Fill empty values for project names

### <span style='font-family:sans-serif'> Top 10 countries with the most projects</span>
    This code loads the World Bank projects JSON file into a dataframe, groups the rows by country name, counts the projects in each group, sorts the counts in descending order, and displays the top 10.  

In [48]:
  #Load the JSON file from the local directory into a dataframe

import pandas as pd
sample_json_df = pd.read_json('world_bank_projects.json')

  #rename columns; count() tallies non-null values per column, so the renamed
  #'status' column ends up holding the number of projects per country
sample_json_df.rename(columns = {'countryshortname':'Country','status':'Number of Projects'}, inplace=True)

  #group by country, count, sort by count in descending order
json_df = sample_json_df.groupby('Country').count().sort_values('Number of Projects', ascending = False).reset_index()

  #display top 10 only
print(json_df[['Country','Number of Projects']].head(10))

              Country  Number of Projects
0               China                  19
1           Indonesia                  19
2             Vietnam                  17
3               India                  16
4  Yemen, Republic of                  13
5               Nepal                  12
6          Bangladesh                  12
7             Morocco                  12
8          Mozambique                  11
9              Africa                  11
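
The same ranking can be sketched without repurposing the 'status' column as a counter. The snippet below is a minimal sketch on a small made-up sample: the records are hypothetical, and the only thing assumed about the real file is that each record has a `countryshortname` field, as in the cell above.

```python
import pandas as pd

# Hypothetical sample records; only the 'countryshortname' field mirrors the real file.
sample = pd.DataFrame({
    "countryshortname": ["China", "India", "China", "Nepal", "India", "China"],
})

# value_counts() counts rows per country and sorts the counts in descending order.
top_countries = (
    sample["countryshortname"]
    .value_counts()
    .head(10)
    .rename_axis("Country")
    .reset_index(name="Number of Projects")
)
print(top_countries)
```

Counting rows directly avoids depending on whether 'status' happens to be non-null for every project.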


### <span style='font-family:sans-serif'> 10 most frequent projects</span>
    Here we load the JSON into a normalized dataframe built from the 'mjtheme_namecode' column and count the 10 most frequent project themes (the names listed under 'mjtheme_namecode').

In [49]:
  #Load JSON data
  #Read JSON data into a normalized dataframe, one row per 'mjtheme_namecode' entry
  #pd.json_normalize (pandas 1.0+) replaces the deprecated pandas.io.json.json_normalize

import json, pandas as pd
with open('world_bank_projects.json') as f:
    data = json.load(f)
df = pd.json_normalize(data, record_path=['mjtheme_namecode'])

  #use the 'code' column to record 'count'
df.rename(columns = {'code':'count'}, inplace = True)

  #display the name and count of the 10 most frequent projects
print(df.groupby("name").count().sort_values('count', ascending = False).head(10).reset_index())

                                           name  count
0  Environment and natural resources management    223
1                             Rural development    202
2                             Human development    197
3                      Public sector governance    184
4         Social protection and risk management    158
5      Financial and private sector development    130
6                                                  122
7                   Social dev/gender/inclusion    119
8                         Trade and integration     72
9                             Urban development     47
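
To make the flattening step concrete, here is a minimal sketch on two hypothetical records shaped like the real file. The `id` field and the sample values are assumptions for illustration; the `code` and `name` keys under `mjtheme_namecode` follow the columns used in the cell above.

```python
import pandas as pd

# Hypothetical records: each project carries a list of {code, name} theme
# entries under 'mjtheme_namecode'. The 'id' field is made up for this example.
projects = [
    {"id": "P1", "mjtheme_namecode": [{"code": "8", "name": "Human development"},
                                      {"code": "11", "name": ""}]},
    {"id": "P2", "mjtheme_namecode": [{"code": "8", "name": "Human development"}]},
]

# record_path flattens each nested list into one row per theme entry;
# meta carries the parent project's id alongside each row.
themes = pd.json_normalize(projects, record_path="mjtheme_namecode", meta=["id"])
print(themes)

# size() counts rows per name, including the blank name.
theme_counts = themes.groupby("name").size().sort_values(ascending=False)
print(theme_counts)
```

The blank name is counted as its own group, which is why an unnamed row shows up in the top 10 above.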


### <span style='font-family:sans-serif'> Fill empty values for project names</span>
    Here we take the normalized dataframe, deep-copy it, and build a code-to-name lookup from the 'mjtheme_namecode' rows that have a name. Mapping each row's code through that lookup fills in the blank names.

In [58]:
  #import libraries, load JSON data from local directory and read into a flattened dataframe

import json
import pandas as pd
with open('world_bank_projects.json') as f:
    data = json.load(f)
df = pd.json_normalize(data, record_path=['mjtheme_namecode'])

  #deep copy the dataframe, remove blanks, drop duplicates, index by code to get a code-to-name lookup
dfNew = df.copy(deep = True)
dfNew = dfNew[dfNew['name'] != ""]
dfNew = dfNew.drop_duplicates()
code_to_name = dfNew.set_index('code')['name']

  #map each code to its name, filling the blanks
df['name'] = df['code'].map(code_to_name)

  #count rows per name, sorted by count in ascending order
df2 = df.groupby('name').count()
df3 = df2.sort_values('code').reset_index(level = 0)
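
The fill step can be checked on a small example. This is a minimal sketch with hypothetical rows, assuming (as the cell above does) that every named row for a given code carries the same name.

```python
import pandas as pd

# Hypothetical theme rows; blank names stand in for the gaps in the real data.
themes = pd.DataFrame({
    "code": ["8", "11", "8", "11", "1"],
    "name": ["Human development", "", "",
             "Environment and natural resources management", ""],
})

# Build a code -> name lookup from the rows that do have a name.
lookup = (
    themes[themes["name"] != ""]
    .drop_duplicates()
    .set_index("code")["name"]
)

# Map every code through the lookup; a code with no named row becomes NaN.
themes["name"] = themes["code"].map(lookup)
print(themes)
```

A code that never appears with a name (code "1" in this sample) maps to NaN rather than a blank string, so counting `themes["name"].isna()` afterwards is a quick way to confirm nothing was left unfilled.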