Name: Kemin Wang

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import geopandas
import os
import pycountry
import datetime
import pandas_datareader.data as web
from pandas_datareader import wb
import us
from ipywidgets import interact, interact_manual

Here I define some global variables and API dictionaries: the shapefile path and the mappings from variable names to FRED and World Bank series codes. Users can add more variables of interest later to retrieve additional data.

In [2]:
shp_path = r'C:\Users\Kemin\Documents\GitHub\homework-4-kemin98-1\ne_110m_admin_1_states_provinces.shp'
fred_api_dict = {'population':'POP', 'unemployment':'URN'}
wb_api_dict = {'population':'SP.POP.TOTL', 'unemployment':'SL.UEM.TOTL.NE.ZS', 'SP.POP.TOTL': 'population', 'SL.UEM.TOTL.NE.ZS':'unemployment'}
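
Because wb_api_dict stores every pair in both directions (name to indicator code for the download, code back to name for renaming the columns), adding a variable means adding two entries. A minimal sketch of one way to keep the two directions in sync is below. The names `wb_indicators`, `make_two_way` and `wb_two_way` are hypothetical and not part of this notebook, and `NY.GDP.MKTP.CD` (GDP in current US$) is only an illustrative extra indicator.

```python
# One-directional mapping: readable name -> World Bank indicator code.
wb_indicators = {
    'population': 'SP.POP.TOTL',
    'unemployment': 'SL.UEM.TOTL.NE.ZS',
    'gdp': 'NY.GDP.MKTP.CD',  # illustrative addition, not used elsewhere in this notebook
}

def make_two_way(mapping):
    """Hypothetical helper: return a dict holding name -> code and code -> name."""
    two_way = dict(mapping)
    two_way.update({code: name for name, code in mapping.items()})
    return two_way

wb_two_way = make_two_way(wb_indicators)
print(wb_two_way['gdp'])             # NY.GDP.MKTP.CD
print(wb_two_way['NY.GDP.MKTP.CD'])  # gdp
```

With this approach only the one-directional dictionary needs editing when a new variable is added.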

This function loads the low-resolution world map and returns only the countries in the given continent.

In [3]:
def get_continent_shp(continent):
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    world = world.drop(columns=['pop_est', 'gdp_md_est'])
    continent_df = world[world['continent'] == continent]
    return continent_df

This function returns the names of all continents, for filtering later.

In [4]:
def get_all_continents():
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    return world['continent'].unique()
    

I adapted this code from https://stackoverflow.com/questions/16253060/how-to-convert-country-names-to-iso-3166-1-alpha-2-values-using-python. It uses the pycountry package to convert ISO alpha-3 country codes to alpha-2 codes.


In [5]:
def get_country_codes(continent_df):
    country_list = continent_df['iso_a3']
    
    input_countries = country_list

    countries = {}
    for country in pycountry.countries:
        countries[country.alpha_3] = country.alpha_2

    codes = [countries.get(country, 'Unknown code') for country in input_countries]
    
    return codes
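
The function above returns the string 'Unknown code' for any iso_a3 value that pycountry does not recognise, and that string is then passed on with the real codes. As a rough sketch, the unmatched values could instead be separated out before the download. Here `split_known_codes` is a hypothetical helper, the small dictionary is a stand-in for the one built from pycountry, and '-99' stands for any value with no match.

```python
# Tiny stand-in for the alpha-3 -> alpha-2 dict built from pycountry.countries.
alpha3_to_alpha2 = {'USA': 'US', 'CAN': 'CA', 'MEX': 'MX'}

def split_known_codes(iso_a3_values, lookup):
    """Hypothetical helper: return (alpha-2 codes found, alpha-3 values with no match)."""
    known, unmatched = [], []
    for a3 in iso_a3_values:
        if a3 in lookup:
            known.append(lookup[a3])
        else:
            unmatched.append(a3)
    return known, unmatched

known, unmatched = split_known_codes(['USA', 'CAN', '-99'], alpha3_to_alpha2)
print(known)      # ['US', 'CA']
print(unmatched)  # ['-99']
```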
    

This function retrieves data from the World Bank and then normalizes the country names. It can retrieve as many variables as we want.

In [6]:
def get_wb(continent_df, variables, year, codes):
    
    indicator = [wb_api_dict[variable] for variable in variables]

    df_wb = wb.download(indicator=indicator, 
                     country=codes, 
                     start=year, end=year)

    df_wb = df_wb.rename(columns=wb_api_dict)
    df_wb = df_wb.reset_index()
    df_wb['country'] = df_wb['country'].map(cname_converter)  
    return df_wb
    

This function normalizes the names in the country column so that the World Bank names match the names used in the shapefile. It can be extended to more countries if we need to in the future.

In [7]:
def cname_converter(cname):
    
    if cname == 'Bahamas, The':
        return 'Bahamas'
    elif cname == 'Dominican Republic':
        return 'Dominican Rep.'
    elif cname == 'United States':
        return 'United States of America'
    else:
        return cname
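
As the list of special cases grows, a dictionary lookup may be easier to extend than an if/elif chain. The sketch below is one possible variant with the same three name pairs; `NAME_FIXES` and `cname_converter_dict` are hypothetical names, not part of the notebook.

```python
# World Bank name -> name used in the shapefile's 'name' column.
NAME_FIXES = {
    'Bahamas, The': 'Bahamas',
    'Dominican Republic': 'Dominican Rep.',
    'United States': 'United States of America',
}

def cname_converter_dict(cname):
    """Hypothetical dict-based variant: unknown names are returned unchanged."""
    return NAME_FIXES.get(cname, cname)

print(cname_converter_dict('Bahamas, The'))  # Bahamas
print(cname_converter_dict('Canada'))        # Canada
```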

This function merges the continent dataframe with the data from the World Bank.

In [8]:
def merge_wb_data(continent_df, variables, year, codes):
    df_wb = get_wb(continent_df, variables, year, codes)
    continent_df = continent_df.merge(df_wb, left_on='name', right_on='country', how='left')
    return continent_df
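
Since this is a left merge on country names, any name that differs between the two sources silently ends up with missing values. A small sketch of how such rows could be spotted with the `indicator` option of pandas `merge` follows. The two dataframes are stand-ins and the numbers in them are placeholders, not real World Bank figures.

```python
import pandas as pd

# Stand-ins for the shapefile names and the (already downloaded) World Bank data.
shapes = pd.DataFrame({'name': ['Bahamas', 'Canada', 'Dominican Rep.']})
wb_data = pd.DataFrame({
    'country': ['Bahamas', 'Canada', 'Dominican Republic'],  # last name not normalized
    'population': [1, 2, 3],  # placeholder values
})

# indicator=True adds a '_merge' column saying where each row was found.
merged = shapes.merge(wb_data, left_on='name', right_on='country',
                      how='left', indicator=True)
unmatched_names = merged.loc[merged['_merge'] == 'left_only', 'name'].tolist()
print(unmatched_names)  # ['Dominican Rep.']
```

Any name printed here is a candidate for a new rule in cname_converter.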

This function reads the states and provinces shapefile.

In [9]:
def get_sap_shp():
    sap = geopandas.read_file(shp_path)
    return sap

This function retrieves state-level data from FRED. It can retrieve as many variables as we want for a given year.

In [10]:
def get_fred_data(variables):
    start = f'{year}-01'
    end = f'{year}-01'
    us_multi_series = []
    for variable in variables:
        series_name = [st.abbr+fred_api_dict[variable] for st in us.STATES]
        us_data = web.DataReader(series_name, 'fred', start, end)
        us_data.columns = [s[:2] for s in us_data]
        us_data = us_data.T
        us_data = us_data.reset_index().rename(columns={'index':'state', datetime.datetime(year, 1, 1):variable})
        us_multi_series.append(us_data)
    
    fred_data = us_multi_series[0]
    
    for series in us_multi_series[1:]: 
        fred_data = fred_data.merge(series, on='state')

    fred_data['state'] = fred_data['state'].map(lambda s: 'US-'+s)
    
    return fred_data
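
The function relies on a naming pattern: each series ID is the two-letter state abbreviation followed by a suffix from fred_api_dict, and the first two characters are later used to recover the state. The sketch below shows only that string handling, without any download. `build_series_ids` is a hypothetical helper and the three abbreviations are a stand-in for `[st.abbr for st in us.STATES]`.

```python
fred_suffixes = {'population': 'POP', 'unemployment': 'URN'}  # same suffixes as fred_api_dict
state_abbrs = ['CA', 'NY', 'TX']  # stand-in for the full list from the us package

def build_series_ids(abbrs, suffix):
    """Hypothetical helper: state abbreviation + suffix gives the series ID."""
    return [abbr + suffix for abbr in abbrs]

ids = build_series_ids(state_abbrs, fred_suffixes['unemployment'])
states = [s[:2] for s in ids]                 # recover the abbreviation
iso_codes = ['US-' + st for st in states]     # format used by the shapefile's iso_3166_2 column

print(ids)        # ['CAURN', 'NYURN', 'TXURN']
print(iso_codes)  # ['US-CA', 'US-NY', 'US-TX']
```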
        

This function narrows the states and provinces shapefile down to the US and merges it with the FRED data.

In [11]:
def merge_states_and_series(sap, fred_data):
    country_df = sap[sap['iso_a2'] == 'US']
    country_df = country_df.merge(fred_data, left_on='iso_3166_2', right_on='state', how='left')
        
    return country_df
    

This function concatenates the merged continent dataframe and the merged states and provinces dataframe.

In [12]:
def concat_continent_and_country(continent_df, country_df):
    final_df = pd.concat([continent_df, country_df])
    return final_df

This function selects which part of the dataframe is used during interactive plotting: the rows for a continent, or the rows for the states of a country.

In [13]:
def continent_or_states (df_final, areas):
    if areas in continents:
        df_final = df_final[df_final['continent']==areas]
        return df_final
    else:
        df_final = df_final[df_final['state'].str.startswith(areas, na=False)]
        return df_final
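
After the concatenation, country rows have no value in the state column and state rows have no value in the continent column. That is why the filter passes `na=False`: without it the mask would contain missing values, which boolean indexing may reject. A small sketch with a stand-in dataframe (not the real df_final):

```python
import pandas as pd

# Stand-in for df_final: two country rows (no 'state') and two state rows (no 'continent').
df = pd.DataFrame({
    'name': ['Canada', 'Mexico', 'California', 'Texas'],
    'continent': ['North America', 'North America', None, None],
    'state': [None, None, 'US-CA', 'US-TX'],
})

# Continent branch: plain equality already treats missing values as non-matching.
country_rows = df[df['continent'] == 'North America']

# State branch: na=False turns missing 'state' values into False in the mask.
state_rows = df[df['state'].str.startswith('US', na=False)]

print(country_rows['name'].tolist())  # ['Canada', 'Mexico']
print(state_rows['name'].tolist())    # ['California', 'Texas']
```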

This function prepares the final dataframe used in interactive plotting. The part starting with 'if 'US' in areas' handles the US data when US is one of the countries in 'areas' that we want to investigate. We can also add other 'if xx in areas' parts for different countries in this section if we want to see more countries on the plot.

In [14]:
def prepare_for_interactive(continent='North America'):
    continent_df = get_continent_shp(continent=continent)
    codes = get_country_codes(continent_df)
    # merge_wb_data downloads the World Bank data itself, so no separate get_wb call is needed
    continent_df = merge_wb_data(continent_df, variables, year, codes)
    if 'US' not in areas:
        # No state-level data requested: return the continent-level data only
        return continent_df
    sap = get_sap_shp()
    fred_data = get_fred_data(variables)
    country_df = merge_states_and_series(sap, fred_data)
    df_final = concat_continent_and_country(continent_df, country_df)
    return df_final
   

Define the inputs, which users can change freely. Since my plot shows the data for a single given year, the year variable is an integer, not a list.

In [15]:
variables = ['population', 'unemployment']
year = 2012
areas = ['North America', 'US']
continents = get_all_continents()

Make df_final a global variable so that it is easy to use in interactive plotting.

In [16]:
df_final = prepare_for_interactive(continent='North America')

Perform interactive plotting, using the plasma colormap.

In [17]:
@interact(areas=areas, variables=variables)
def plot(areas=areas[0], variables=variables[0]):
    global df_final
    df_interactive = continent_or_states(df_final, areas)
    fig, ax = plt.subplots(figsize=(12,12))

    from mpl_toolkits.axes_grid1 import make_axes_locatable
    divider = make_axes_locatable(ax)
    cax = divider.append_axes('right', size='5%', pad=0.1)
    ax = df_interactive.plot(ax=ax, column=variables, legend=True, cax=cax, cmap='plasma')
    ax.axis('off')
    ax.set_title(f'{variables} in {areas} during {year}')

interactive(children=(Dropdown(description='areas', options=('North America', 'US'), value='North America'), D…

In this project I reused a small portion of my code from HW3, mostly in the interactive plotting part that relies on ipywidgets. I will very likely use this code for my final project, since it is generalizable. Users can plot state-level data for other countries by importing those countries' shapefiles and data series, with only small changes to the structure of the code. For continent-level plots, the code is highly generalizable: users can change the continent directly through the inputs and retrieve the relevant data from the World Bank. My final project involves plotting the population of each US state between 2009 and 2011, so this code will be highly useful, and I can make an interactive plot with respect to different years.