<a href="https://colab.research.google.com/github/fengyuqi621/intothedata/blob/master/Section_7_Solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Jupyter Notebook

This is a Jupyter Notebook, which is basically just a super fancy Python shell.

A notebook is made up of "cells" that contain either text (like this one) or executable Python code. Notebooks are really nice because they let you develop Python code rapidly: you write a small bit of code, test its output, and move on to the next bit. This interactivity is a huge plus for professional Python developers.

It's also nice, because it's really easy to share your code with others and surround it with text to tell a story! 

# Colaboratory
Colaboratory is a service provided by Google that takes a Jupyter Notebook (a `.ipynb` file, the standard notebook format) and lets users edit and run the code in it for free!

This notebook is write-protected, so you can't edit the notebook that the whole class will look at, but you can open it in "playground mode", which lets you make edits to a temporary copy. If you want to save your changes, follow the instructions that appear when you try to save, which copy the notebook to your Google Drive.

# Setup
Make sure you run the following cell(s) before trying to run any of the cells after them. You do not need to understand what they are doing; they just make sure the libraries and data files we want to use are available on the computer running this notebook.


In [None]:
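# This takes a little while to run, unfortunately.
# Download and build libspatialindex, the C library that the rtree package wraps.
!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz 2>/dev/null | tar xz
!apt-get install -qq g++ make
import os
# Note: this changes the working directory, so the later cells download
# and read their data files relative to this folder.
os.chdir('/content/spatialindex-src-1.8.5')
!./configure 2>&1 >/dev/null && make 2>&1 >/dev/null && make install && ldconfig
# rtree provides the spatial index used by some geopandas operations.
!pip install -q rtree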

In [None]:
import requests
import zipfile


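def save_file(url, file_name):
  """Download the file at url and save it locally as file_name."""
  r = requests.get(url)
  r.raise_for_status()  # stop early if the download failed
  with open(file_name, 'wb') as f:
    f.write(r.content)

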
save_file('https://courses.cs.washington.edu/courses/cse163/19sp/' +
          'files/lectures/05-13/data.zip', 'data.zip')
save_file('https://courses.cs.washington.edu/courses/cse163/19sp/' +
          'files/lectures/05-13/gz_2010_us_040_00_5m.json', 
          'gz_2010_us_040_00_5m.json')
save_file('https://courses.cs.washington.edu/courses/cse163/19sp/' +
          'files/lectures/05-13/stormhistory.csv', 'stormhistory.csv')

# Extract the shapefiles used below into the current directory.
with zipfile.ZipFile('data.zip', 'r') as zip_ref:
  zip_ref.extractall()

!pip install --upgrade geopandas
!pip install --upgrade pyshp
!pip install --upgrade shapely
!pip install --upgrade descartes

In [None]:
%matplotlib inline
import pandas as pd
import geopandas
import matplotlib.pyplot as plt

# Dissolve
The first two cells are review from Wednesday: they load the `countries` dataset and plot each country's estimated population (`POP_EST`).

In [None]:
countries = geopandas.read_file('data/ne_110m_admin_0_countries.shp')

In [None]:
countries.plot(column='POP_EST', figsize=(15, 7), legend=True)

When we first learned `pandas`, we learned about the `groupby` operation. `geopandas` provides a similar function called `dissolve`. For the non-geometry columns it works basically the same as `groupby`, and it also combines the geometries in each group into a single shape (their union). One annoying thing about `dissolve` is that it applies the aggregation to every column, so we first have to make a smaller `GeoDataFrame` that only has the columns we want to group by or aggregate (plus `geometry`).
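To see the `groupby` analogy on something small, here is a minimal sketch on made-up toy data (the squares, names, and numbers are invented for illustration, and `shapely.geometry.box` is used only to build rectangular polygons). It assumes `geopandas` and `shapely` are installed, as the setup cells arrange. The population column is summed exactly as a pandas `groupby` would, while the two touching squares in continent "A" are merged into one shape.

```python
import geopandas
from shapely.geometry import box

# Made-up toy data: two touching unit squares in continent "A" and one
# separate square in continent "B". NAME is a text column we do NOT want
# dissolve to aggregate.
gdf = geopandas.GeoDataFrame(
    {
        "NAME": ["a1", "a2", "b1"],
        "CONTINENT": ["A", "A", "B"],
        "POP_EST": [10, 20, 5],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# Keep only the grouping column, the column to aggregate, and the geometry.
subset = gdf[["CONTINENT", "POP_EST", "geometry"]]

# Attribute side only: a plain pandas groupby-sum.
pop_by_group = subset.groupby("CONTINENT")["POP_EST"].sum()

# dissolve: the same grouping and sum, plus the geometries in each group
# are unioned into a single shape per continent.
dissolved = subset.dissolve(by="CONTINENT", aggfunc="sum")

print(pop_by_group.tolist())                # [30, 5]
print(dissolved["POP_EST"].tolist())        # [30, 5]
print(dissolved.loc["A", "geometry"].area)  # 2.0
```

The merged shape for "A" has area 2.0 because the two unit squares share an edge and become one polygon.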

In [None]:
populations = countries[['CONTINENT', 'POP_EST', 'geometry']]

In [None]:
population_by_continent = populations.dissolve(by='CONTINENT', aggfunc='sum')
population_by_continent.plot(column='POP_EST', legend=True, figsize=(10, 5))

# Section Problems

## Problem 1) `highlight_population`
Write a function named `highlight_population` that takes the countries `GeoDataFrame` and the name of a continent and makes a plot, like the ones from lecture, that colors each country in that continent by its population. Instead of plotting the raw population numbers, plot the fraction (a ratio from 0 to 1) of the continent's population that lives in each country. To do this, you are allowed to add a new column to the dataset called `pop_ratio`.

The plot should show all countries outside of the continent in grey (fill color #EEEEEE and edgecolor #FFFFFF). The plot should include a legend. The legend should be scaled so the minimum value is 0 (`vmin=0`) and the maximum value is 1 (`vmax=1`).
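The ratio here is each country's population divided by its continent's total. One way to compute it, sketched below on made-up data, is to do every continent at once with pandas `groupby(...).transform('sum')`, which returns each row's group total aligned to that row. This is an alternative to filtering to one continent first, not the official solution, and `add_continent_pop_ratio` is a hypothetical helper name.

```python
import pandas as pd


def add_continent_pop_ratio(df):
    """Hypothetical helper: return a copy of df with a pop_ratio column
    holding each country's share of its own continent's population."""
    result = df.copy()  # avoid modifying the caller's DataFrame
    # transform('sum') gives one value per row: that row's continent total
    continent_totals = result.groupby("CONTINENT")["POP_EST"].transform("sum")
    result["pop_ratio"] = result["POP_EST"] / continent_totals
    return result


# Made-up toy data, not taken from the Natural Earth dataset
toy = pd.DataFrame({
    "NAME": ["A1", "A2", "E1", "E2", "E3"],
    "CONTINENT": ["Africa", "Africa", "Europe", "Europe", "Europe"],
    "POP_EST": [30, 10, 5, 5, 10],
})

ratios = add_continent_pop_ratio(toy)
print(ratios["pop_ratio"].tolist())  # [0.75, 0.25, 0.25, 0.25, 0.5]
```

Within each continent the ratios sum to 1, so `vmin=0` and `vmax=1` cover every possible value; to plot one continent you would still filter the rows afterwards.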

In [None]:
# Solution
def highlight_population(countries, continent):
  # .copy() so adding a column doesn't modify (or warn about) a slice of countries
  countries_in_continent = countries[countries['CONTINENT'] == continent].copy()
  fig, ax = plt.subplots(1, figsize=(15, 10))

  # Fraction of the continent's total population living in each country
  total_pop = countries_in_continent['POP_EST'].sum()
  countries_in_continent['pop_ratio'] = (countries_in_continent['POP_EST'] /
                                         total_pop)

  # Grey background layer with every country, then the continent on top
  countries.plot(ax=ax, color='#EEEEEE', edgecolor='#FFFFFF')
  countries_in_continent.plot(ax=ax, column='pop_ratio', legend=True,
                              vmin=0, vmax=1)
  plt.show()


In [None]:
highlight_population(countries, 'Africa')

## Problem 2) `gdp_and_population_ratio`
Write a function named `gdp_and_population_ratio` that takes the countries `GeoDataFrame` and makes a plot with two subplots. The first subplot should color each country by the fraction of the world population that lives there. The second subplot should color each country by its fraction of the world GDP. To do this, you are allowed to add new columns to the dataset called `pop_ratio` and `gdp_ratio`.

The plot should include a legend. The legend should be scaled so the minimum value is 0 (`vmin=0`) and the maximum value is 1 (`vmax=1`).
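Both ratios use the same formula: a country's value divided by the world total for that column. A minimal sketch on made-up data, assuming the `POP_EST` and `GDP_MD_EST` column names used in the solution; `add_world_ratios` is a hypothetical helper that computes both shares in one step by dividing a DataFrame by its column sums, and it returns a copy instead of modifying its input.

```python
import pandas as pd


def add_world_ratios(df):
    """Hypothetical helper: return a copy of df with pop_ratio and gdp_ratio,
    each country's share of that column's world-wide total."""
    cols = ["POP_EST", "GDP_MD_EST"]
    # Dividing a DataFrame by a Series aligns the Series index with the
    # columns, so each column is divided by its own total.
    shares = df[cols] / df[cols].sum()
    shares = shares.rename(columns={"POP_EST": "pop_ratio",
                                    "GDP_MD_EST": "gdp_ratio"})
    # join on the shared row index; df itself is left unchanged
    return df.join(shares)


# Made-up toy data
toy = pd.DataFrame({
    "NAME": ["X", "Y", "Z"],
    "POP_EST": [50, 25, 25],
    "GDP_MD_EST": [10, 30, 40],
})

world = add_world_ratios(toy)
print(world["pop_ratio"].tolist())  # [0.5, 0.25, 0.25]
print(world["gdp_ratio"].tolist())  # [0.125, 0.375, 0.5]
```

Because every share lies between 0 and 1 and each column of shares sums to 1, `vmin=0, vmax=1` puts both subplots on the same color scale.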

In [None]:
# Solution
def gdp_and_population_ratio(countries):
  # Each country's share of the world population and of world GDP
  total_pop = countries['POP_EST'].sum()
  total_gdp = countries['GDP_MD_EST'].sum()
  countries['pop_ratio'] = countries['POP_EST'] / total_pop
  countries['gdp_ratio'] = countries['GDP_MD_EST'] / total_gdp

  # Two stacked subplots: population share on top, GDP share below
  fig, [ax1, ax2] = plt.subplots(2, figsize=(15, 10))

  countries.plot(ax=ax1, column='pop_ratio', legend=True, vmin=0, vmax=1)
  countries.plot(ax=ax2, column='gdp_ratio', legend=True, vmin=0, vmax=1)

  plt.show()

In [None]:
gdp_and_population_ratio(countries)