# Lecture 20 – part I               
                                            
## Basic spatial data visualization
   - Visualize a world map from polygon data
   - Life expectancy on a map
     - Raw values
     - Deviation from the value 'expected' given income (regression residuals)

Case-studies:

   - CH08B How is life expectancy related to the average income of a country?     
                                             
Data used:

    worldbank-lifeexpectancy                  

___

### Part I: World map and life expectancy visualization

In [None]:
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from plotnine import *

%matplotlib inline
warnings.filterwarnings("ignore")

Import world map polygons

In [None]:
world_map = pd.read_csv("data_map/worldmap.csv")

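What we need is polygon data: every country (or island) is drawn as a closed shape by connecting a sequence of points.

Note: the table contains longitude (`long`) and latitude (`lat`) coordinates together with `group` and `order` columns, and these are what we need to draw the map: `group` identifies each separate polygon, and `order` gives the sequence in which its points are connected. The `region` and `subregion` columns are only labels that let us relate the polygons to countries.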
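To see why `group` and `order` matter, here is a minimal sketch with a made-up polygon table in the same long format (the coordinates and the region name `"Testland"` are hypothetical, not taken from `worldmap.csv`). One region can consist of several polygons, e.g. a mainland and an island; each gets its own `group`, so the points of separate polygons are not joined into one shape, and within a group the rows are sorted by `order` before drawing:

```python
import pandas as pd

# Hypothetical polygon table in the long format of worldmap.csv:
# one row per vertex, with long, lat, group, order and region.
toy_map = pd.DataFrame(
    {
        "long": [0.0, 2.0, 2.0, 0.0, 3.0, 4.0, 3.5],
        "lat": [0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0],
        "group": [1, 1, 1, 1, 2, 2, 2],
        "order": [1, 2, 3, 4, 5, 6, 7],
        "region": ["Testland"] * 7,
    }
)

# Shuffle the rows to mimic a table that is not stored in drawing order.
shuffled = toy_map.sample(frac=1, random_state=0)

# Restore the drawing order: polygon by polygon, vertex by vertex.
ordered = shuffled.sort_values(["group", "order"]).reset_index(drop=True)

# One region, but two separate polygons (a "mainland" square and an "island" triangle).
print(ordered["region"].nunique(), ordered["group"].nunique())  # 1 2

# Number of vertices in each polygon.
vertex_counts = ordered.groupby("group").size()
```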

A first map of the world:
   - unscaled, with the default guides and axis labels

In [None]:
wm = (
    ggplot(world_map, aes(x="long", y="lat", group="group"))
    + geom_polygon(fill="white", color="black")
)
wm

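Set equal scales for the two coordinates with `coord_equal()` (one degree of longitude takes up as much space as one degree of latitude), and use a more appropriate theme: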

In [None]:
(
    wm
    + coord_equal()
    + theme_minimal()
    + theme(
        axis_title_x=element_blank(),
        axis_title_y=element_blank(),
        panel_grid_minor=element_blank(),
        panel_grid=element_blank(),
        axis_text=element_blank(),
    )
)

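We can create our own `theme_map`, since plotnine does not implement one. A list of plot components can be added to a plot just like a single component, so we store the theme elements in a list: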

In [None]:
theme_map = [
    theme_minimal(),
    theme(
        axis_title_x=element_blank(),
        axis_title_y=element_blank(),
        panel_grid_minor=element_blank(),
        panel_grid=element_blank(),
        axis_text=element_blank(),
    ),
]

In [None]:
wm + coord_equal() + theme_map

Fill each country with its own color (`fill="region"`):

Note: it is important to remove the legend, otherwise it would list every country!

In [None]:
(
    ggplot(world_map, aes(x="long", y="lat", group="group", fill="region"))
    + geom_polygon()
    + coord_equal()
    + theme_map
    + theme(legend_position="none")  # drop the legend, which would list every country
)

Next, we want to show life expectancy on this map. Load the World Bank life-expectancy data:

In [None]:
life = pd.read_csv("https://osf.io/sh9mu/download")

Take year 2017 only

In [None]:
life = life.loc[lambda x: x["year"] == 2017]

We need to match the `region` variable from `world_map` with `countryname` from `life`.

Some country names do not match between the two datasets; we replace them in the `life` dataset:
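One way to find the nonmatching names is a set difference between the two name columns. A minimal sketch with tiny hypothetical stand-ins for `life` and `world_map` (only the name columns, made up for illustration); on the real data, the same expression lists the candidates for the rename dictionary (some of them may simply be missing from the map):

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (name columns only).
life_toy = pd.DataFrame(
    {"countryname": ["Hungary", "Russian Federation", "United States", "Slovak Republic"]}
)
map_toy = pd.DataFrame({"region": ["Hungary", "Russia", "USA", "Slovakia", "Hungary"]})

# Country names in the life-expectancy data with no polygon of the same name.
unmatched = sorted(set(life_toy["countryname"]) - set(map_toy["region"]))
print(unmatched)  # ['Russian Federation', 'Slovak Republic', 'United States']
```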

In [None]:
rename_dict = {
    "Bahamas, The": "Bahamas",
    "Brunei Darussalam": "Brunei",
    "Congo, Dem. Rep.": "Democratic Republic of the Congo",
    "Congo, Rep.": "Republic of Congo",
    "Cote d'Ivoire": "Ivory Coast",
    "Egypt, Arab Rep.": "Egypt",
    "Gambia, The": "Gambia",
    "Iran, Islamic Rep.": "Iran",
    "Kyrgyz Republic": "Kyrgyzstan",
    "Lao PDR": "Laos",
    "Micronesia, Fed. Sts.": "Micronesia",
    "Russian Federation": "Russia",
    "Slovak Republic": "Slovakia",
    "St. Lucia": "Saint Lucia",
    "St. Vincent and the Grenadines": "Saint Vincent",
    "Trinidad and Tobago": "Trinidad",
    "United Kingdom": "UK",
    "United States": "USA",
    "Yemen, Rep.": "Yemen",
}

In [None]:
# Replace World Bank country names with the names used in the map data
life["countryname"] = life["countryname"].replace(rename_dict)

Now we can match the `life` data to `world_map` (the left join keeps every map polygon, even those without data):

In [None]:
world_map_exp = world_map.merge(
    life, left_on="region", right_on="countryname", how="left"
)
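A quick way to check such a join is the `indicator` and `validate` options of `pandas.DataFrame.merge`. A minimal sketch with hypothetical miniature tables (the region `"Atlantis"` and the value `76.0` are made up): `indicator=True` flags polygons that found no matching country, and `validate="many_to_one"` raises an error if a country appeared more than once in the data, which would otherwise duplicate map vertices:

```python
import pandas as pd

# Hypothetical miniature map (one row per vertex) and data (one row per country).
map_toy = pd.DataFrame({"region": ["Hungary"] * 3 + ["Atlantis"] * 3})
life_toy = pd.DataFrame({"countryname": ["Hungary"], "lifeexp": [76.0]})

merged = map_toy.merge(
    life_toy,
    left_on="region",
    right_on="countryname",
    how="left",
    validate="many_to_one",  # raises MergeError if a country appears twice in the data
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Regions whose polygons found no matching row (their lifeexp is missing).
no_data = merged.loc[merged["_merge"] == "left_only", "region"].unique().tolist()
print(no_data)  # ['Atlantis']
print(len(merged) == len(map_toy))  # True: the left join keeps every vertex
```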

Show life expectancy on the map:

In [None]:
lifeexp_map = (
    ggplot(world_map_exp, aes(x="long", y="lat", group="group", fill="lifeexp"))
    + geom_polygon()
    + coord_equal()
    + theme_map
)
lifeexp_map

Change the coloring of life expectancy to a scale from red (low) to green (high), and add a title:

In [None]:
(
    lifeexp_map
    + scale_fill_gradient(low="red", high="lightgreen", name="")
    + ggtitle("Life expectancy at birth in years (2017)")
)

#### Task:
  - Instead of the raw life expectancy, plot the residuals of the following model:
        lifeexp ~ log(gdp/capita)

Note: use `life` to compute the residuals, then re-join it with `world_map`.
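In formula terms, with $i$ indexing countries, the model and its estimated residuals are:

$$
\text{lifeexp}_i = \beta_0 + \beta_1 \ln(\text{gdppc}_i) + e_i,
\qquad
\hat{e}_i = \text{lifeexp}_i - \left(\hat{\beta}_0 + \hat{\beta}_1 \ln(\text{gdppc}_i)\right)
$$

A positive residual means a country's life expectancy is higher than predicted from its income alone; a negative residual means it is lower. This is the deviation from 'expected' that the final map shows.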
  

Create log GDP per capita:


In [None]:
life["ln_gdppc"] = np.log(life["gdppc"])

In [None]:
reg = smf.ols("lifeexp ~ ln_gdppc", data=life).fit()
reg.summary()
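The residuals that `smf.ols` produces can be reproduced by hand. A minimal sketch on made-up numbers (five hypothetical countries, not the World Bank data): `np.polyfit` with degree 1 fits the same straight line by least squares, and the residuals are actual minus fitted values. With an intercept in the model, OLS residuals sum to (numerically) zero and are uncorrelated with the regressor:

```python
import numpy as np

# Hypothetical GDP per capita (USD) and life expectancy (years) for five countries.
gdppc = np.array([800.0, 2500.0, 9000.0, 20000.0, 55000.0])
lifeexp = np.array([61.0, 67.5, 73.0, 78.5, 81.0])

ln_gdppc = np.log(gdppc)

# Least-squares line lifeexp = b0 + b1 * ln_gdppc; polyfit returns [b1, b0].
b1, b0 = np.polyfit(ln_gdppc, lifeexp, deg=1)

fitted = b0 + b1 * ln_gdppc
resid = lifeexp - fitted  # actual minus fitted, as in reg.resid

print(round(b1, 2), round(b0, 2))
print(np.isclose(resid.sum(), 0.0))  # True: residuals sum to zero with an intercept
```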

Scatter plot with the fitted regression line:

In [None]:
(
    ggplot(life, aes(x="ln_gdppc", y="lifeexp"))
    + geom_point(color="blue")
    + geom_smooth(method="ols", color="red", formula="y ~ x")
    + labs(x="Log of GDP per Capita", y="Life Expectancy at birth")
    + theme_bw()
)

Save the residuals (actual minus predicted life expectancy):

In [None]:
life["lfe_res"] = reg.resid
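Assigning `reg.resid` directly works because pandas aligns on the index: with the formula API, `smf.ols` drops rows with missing values (e.g. a country without GDP data), so `reg.resid` covers only the rows used in the estimation, and the remaining rows of `life` get `NaN`. A minimal sketch of that alignment, using a hypothetical residual series in place of `reg.resid`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: country "B" has no GDP figure.
toy = pd.DataFrame(
    {"countryname": ["A", "B", "C"], "gdppc": [1000.0, np.nan, 30000.0]},
    index=[10, 11, 12],
)

# Stand-in for reg.resid: residuals exist only for the rows used in the fit,
# and they carry the original row labels (10 and 12).
resid = pd.Series([-1.5, 1.5], index=[10, 12])

toy["lfe_res"] = resid  # aligned by index label, not by position

print(toy["lfe_res"].tolist())  # [-1.5, nan, 1.5]
```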

In [None]:
world_map_exp = world_map.merge(
    life, left_on="region", right_on="countryname", how="left"
)

In [None]:
lifeexp_map = (
    ggplot(world_map_exp, aes(x="long", y="lat", group="group", fill="lfe_res"))
    + geom_polygon()
    + coord_equal()
    + scale_fill_gradient(low="red", high="lightgreen", name="")
    + ggtitle("Deviation from expected life expectancy (2017)")
    + theme_map
)
lifeexp_map