# Class 12: Maps



In [None]:
import YData

# YData.download.download_class_code(14)   # get class code    
# YData.download.download_class_code(14, True)  # get the code with the answers 

YData.download.download_data("dennys.csv")

YData.download.download_data("States_shapefile.geojson")
YData.download.download_data("state_demographics.csv")
YData.download.download_data("ne_110m_graticules_10.prj")
YData.download.download_data("ne_110m_graticules_10.shp")
YData.download.download_data("ne_110m_graticules_10.shx")
YData.download.download_data("ne_110m_graticules_10.dbf")


If you are using Colab, install the YData package by uncommenting and running the code below, which also mounts your Google Drive.

In [None]:
# !pip install https://github.com/lederman/YData_package/tarball/master
# from google.colab import drive
# drive.mount('/content/drive')

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

## Review of seaborn!

[Seaborn](https://seaborn.pydata.org/index.html) is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. 

I.e., it is built on top of matplotlib but produces better looking plots that are easier to create. 

Let's start by examining different themes which can produce better looking plots. We can do this using the `sns.set_theme()` method. 


In [None]:
# Import seaborn
import seaborn as sns

# Apply the default theme
sns.set_theme()   # default style is 'darkgrid'
#sns.set_theme(style='whitegrid')

# Side note: Matplotlib also has themes
# plt.style.available
# plt.style.use('fivethirtyeight')


### Penguins!  

Let's get a little more practice with seaborn by continuing to explore the penguins data set. 



In [None]:
# Let's look at some penguins
penguins = sns.load_dataset("penguins")

penguins.head()

### Plotting a single quantitative variable using sns.displot()

We can plot a single quantitative variable using the `sns.displot()` function.

Properties we can set include
- `x`: The name of the data column you want to plot
- `hue`: The name of the column that colors each point
- `kind`: The type of plot

Different options for `kind` are: “hist”, “kde”, “ecdf”


#### Warm-up exercise

Please use `sns.displot()` to create a visualization of *flipper length*, where each *species* is in a different color (i.e., a different hue). Also, experiment with the "kind" of visualization and choose the kind you think creates the best visualization. 


In [None]:
# plot the flipper length
...


### Pairs plots

One of the most useful visualizations for exploring the relationships between several quantitative variables is to create a "pairs plot" which creates a series of scatter plots between all quantitative variables in the data.  We can do this in seaborn using the `sns.pairplot(data)` function!


Use the `pairplot()` function to visualize the relationships between all columns in the `penguins` DataFrame. Also, make each species have a different color. 



In [None]:
# Create pair plots for the different variables in the penguins data set

...


<img src = "https://i.imgflip.com/1ezfdq.jpg">

## 1. Spatial mapping with geopandas

Visualizing spatial data through maps is another powerful way to see trends in data. There are several mapping packages in Python. Here we will use the geopandas package to create maps. 

The geopandas package defines a geopandas DataFrame, which is the same as a pandas DataFrame but has an additional column called `geometry` which specifies geographic information. 
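
To make the `geometry` column concrete, here is a minimal sketch that builds a tiny geopandas DataFrame by hand. The two squares and their names are made up for illustration; real maps are usually read from files instead.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square regions, described by their corner coordinates
west = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
east = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])

regions = gpd.GeoDataFrame(
    {"name": ["west square", "east square"], "geometry": [west, east]},
    crs="EPSG:4326",
)

print(type(regions))                         # a GeoDataFrame
print(regions.geometry.geom_type.tolist())   # the kind of shape in each row

# Ordinary pandas operations still work, and .plot() draws the shapes
ax = regions.plot(edgecolor="black", facecolor="lightblue")
```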

Let's explore this now!


### Visualizing boundaries

Let's start by looking at some geopandas DataFrames and visualizing some geometric boundaries.



In [None]:
import geopandas as gpd

# see which maps come with geopandas
# gpd.datasets.available   # the datasets module was removed in recent geopandas versions, so this line now raises an error
# Oh no! They don't provide datasets anymore!

Let's get a geopandas DataFrame that has the countries in the world...

In [None]:
# View the world geopandas DataFrame

# read data into a geodataframe
url = "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
world = gpd.read_file(url)
# In previous versions of geopandas, the dataset was included with the package
#world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# print the data type
print(type(world))

# look at the first few rows of the data
world.head()

In [None]:
# Plot a world map with particular properties

...


In [None]:
world.query("ADMIN == 'United States of America'")

In [None]:
world.query("SOVEREIGNT == 'United States of America'")

In [None]:
# Plot just the United States

...


### Coordinate reference systems and projections

A coordinate reference system (CRS) is a framework used to precisely measure locations on the surface of the Earth as coordinates. The goal of any spatial reference system is to create a common reference frame in which locations can be measured precisely and consistently as coordinates, which can then be shared unambiguously, so that any recipient can identify the same location that was originally intended by the originator.

There are two different types of coordinate reference systems: Geographic Coordinate Systems and Projected Coordinate Systems. [Projected coordinate systems](https://en.wikipedia.org/wiki/List_of_map_projections) map 3D coordinates into a 2D plane so they can be plotted. Different projected coordinate systems preserve different properties, such as keeping all angles intact, which is useful for navigation (e.g., the Mercator projection), or keeping the size of land areas intact (e.g., the Eckert IV projection). 

A detailed discussion of CRS is beyond the scope of the class. For the purposes of this class, it is just important that all layers in a map use the same projection (otherwise, for example, data points representing cities and the underlying spatial map won't line up). 
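
As a rough sketch of what "using the same projection" means in code, the example below puts a made-up outline and a made-up city point into the same CRS with `.to_crs()`. The shapes, the city name, and the choice of EPSG:3857 as the target are assumptions for illustration only.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up layers, both starting in longitude/latitude (EPSG:4326)
outline = gpd.GeoDataFrame(
    {"geometry": [Polygon([(-10, -10), (10, -10), (10, 10), (-10, 10)])]},
    crs="EPSG:4326",
)
cities = gpd.GeoDataFrame(
    {"name": ["hypothetical city"], "geometry": [Point(5, 5)]},
    crs="EPSG:4326",
)

# Reproject the outline; the two layers no longer share a CRS
outline_merc = outline.to_crs("EPSG:3857")
print(outline_merc.crs == cities.crs)

# Reproject the points to match the outline before plotting them together
cities_merc = cities.to_crs(outline_merc.crs)
print(outline_merc.crs == cities_merc.crs)

ax = outline_merc.plot(color="lightgray", edgecolor="black")
cities_merc.plot(ax=ax, color="red")
```

Note that `.to_crs()` recomputes the coordinates, so the numbers stored in the `geometry` column change.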

Let's very briefly explore different map projections... 


In [None]:
# Read Graticules (lines on a map)
graticules = gpd.read_file("ne_110m_graticules_10.shp")
print(graticules.crs)
graticules.head(3)

In [None]:
# Mercator projection - preserves angles
# (Note: Web Mercator is EPSG:3857; EPSG:4326 is the unprojected longitude/latitude CRS)

print(world.crs) # print the default CRS


# plot the map
...

In [None]:
# Eckert IV is an equal-area projection  ("ESRI:54012")

...

In [None]:
# Robinson projection - neither equal-area nor conformal ("ESRI:54030") 

...

To learn more about "What your favorite map projection says about you" see: https://xkcd.com/977/

### Maps with layers and markers

We can also plot points on a map. When doing so, it's important that the points and the underlying map use the same coordinate reference system (CRS).

Let's add Denny's locations to the map of the United States!


In [None]:
# Let's start by getting a map of just the United States

...

In [None]:
# visualize just the United States

...

In [None]:
# Get the coordinate reference system (CRS) for our map

...

Let's now load our Denny's data!

In [None]:
# Let's load our Denny's data
dennys = pd.read_csv("dennys.csv")
dennys.head(3)

To convert longitude and latitude coordinates into geometric objects (specifically, Shapely objects), we can use the `gpd.points_from_xy(long, lat)` function. 
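
Here is a minimal sketch of this conversion on a made-up table (the `places` DataFrame, its column names, and its coordinates are assumptions for illustration; they are not the Denny's data). Note that longitude is the x coordinate and latitude is the y coordinate.

```python
import pandas as pd
import geopandas as gpd

# Made-up locations with approximate coordinates
places = pd.DataFrame({
    "name": ["site A", "site B"],
    "longitude": [-72.93, -71.06],
    "latitude": [41.31, 42.36],
})

# x = longitude, y = latitude
points = gpd.points_from_xy(places["longitude"], places["latitude"])
print(points[0])

# The points can then serve as the geometry column of a geopandas DataFrame
places_gdf = gpd.GeoDataFrame(places, geometry=points)
print(places_gdf.crs)   # no CRS has been assigned at this stage
```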

In [None]:
# Let's convert our longitude and latitude coordinates into geometric (Shapely) objects 

...

In [None]:
# Let's now convert our data into a geopandas DataFrame

...

In [None]:
# We can plot the location of the Denny's using the plot function

...


In [None]:
# Let's check the CRS

...

Before plotting data, we should set the appropriate coordinate reference system (CRS). This is particularly important when we are combining different layers on a map, such as putting city locations on a map that has the outlines of regional borders. 

The CRS that uses longitude and latitude coordinates is the [World Geodetic System 1984 (WGS84)](https://epsg.io/4326). This system is often referred to by its EPSG Geodetic Parameter Dataset code which is `4326`. 

Thus, we should set the coordinate system to be EPSG 4326. We can do this using the method `.set_crs(4326)`. Let's set this on our `dennys_gpd` DataFrame. 
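
As a small sketch on made-up data (the `pts_no_crs` DataFrame and its single point are assumptions for illustration), `.set_crs()` attaches a CRS label to data that has none. It does not change the coordinates themselves, and it returns a new DataFrame unless `inplace=True` is passed.

```python
import geopandas as gpd

# A made-up point given in longitude/latitude, with no CRS attached yet
pts_no_crs = gpd.GeoDataFrame(
    {"name": ["site A"]},
    geometry=gpd.points_from_xy([-72.93], [41.31]),
)
print(pts_no_crs.crs)

# Declare that these coordinates are longitude/latitude (EPSG 4326)
pts = pts_no_crs.set_crs(4326)
print(pts.crs)

# The stored coordinates are unchanged; only the label was added
print(pts.geometry.iloc[0])
```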


In [None]:
# Let's set the CRS to match the CRS of our map (which is EPSG 4326)

...

Now that we have our Denny's location in the same coordinate system as our map, we can add the points to the map. 
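
The general layering pattern can be sketched as follows, using a made-up rectangle and made-up points rather than the class data. The key idea is to create one matplotlib axis and pass it as `ax` to each layer's `.plot()` call.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Made-up base layer and point layer, both in EPSG:4326
base = gpd.GeoDataFrame(
    {"region": ["hypothetical region"],
     "geometry": [Polygon([(-110, 30), (-80, 30), (-80, 45), (-110, 45)])]},
    crs="EPSG:4326",
)
pts = gpd.GeoDataFrame(
    {"name": ["p1", "p2"]},
    geometry=gpd.points_from_xy([-100, -90], [35, 40]),
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(6, 4))
base.plot(ax=ax, color="lightgray", edgecolor="black")   # bottom layer
pts.plot(ax=ax, color="red", markersize=20)              # drawn on top
ax.set_title("Points layered on a base map")
```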

In [None]:

...

### Choropleth maps

In choropleth maps, predefined regions are filled in with colors based on values of interest. 

Typically to create a choropleth map we join data of interest onto a map. 
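
As a minimal sketch of this join-then-plot pattern, the example below uses three made-up rectangular regions and made-up scores (all names and values are assumptions for illustration). The data are merged onto the map by a shared key column, and `column=` tells `.plot()` which values to color by.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Made-up map: three side-by-side rectangles
regions = gpd.GeoDataFrame(
    {"region": ["A", "B", "C"],
     "geometry": [box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)]},
    crs="EPSG:4326",
)

# Made-up values of interest, keyed by the same region names
values = pd.DataFrame({"region": ["A", "B", "C"], "score": [1.0, 2.5, 4.0]})

# Keep the map on the left so the result is still a geopandas DataFrame
regions_with_values = regions.merge(values, on="region", how="left")

# Fill each region with a color based on its score
ax = regions_with_values.plot(column="score", cmap="viridis",
                              legend=True, edgecolor="black")
```

The key columns must match exactly (including capitalization) for the rows to line up.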

Let's explore this now...


In [None]:
import plotly.express as px

gapminder_2007 = px.data.gapminder().query("year == 2007")   # the plotly package (which we skipped for now) comes with the gapminder data

gapminder_2007.head()


In [None]:
# Join the gapminder data onto our world map

...

In [None]:
# Plot a choropleth map of life expectancy



In [None]:
# Change the color scale



In [None]:
# We can plot quantiles



### Another choropleth map example

Let's create a choropleth map examining which states in the USA are growing in terms of people having lots of children. 

Any thoughts on which state this might be? 

To start, let's load a map with the outlines of the states in the USA, and load demographic data.

In [None]:
state_map = gpd.read_file("States_shapefile.geojson")

print(state_map.crs)

state_map.head(3)

In [None]:
# load demographic data on the states

state_demographics = pd.read_csv("state_demographics.csv")
state_demographics.head(3)

In [None]:
# In order to join the DataFrames, we need to make sure the states have the same capitalization

...

In [None]:
# Join the demographic information onto the USA map

...

In [None]:
# Let's plot the map 

...

Is there anything [wrong with this map](https://xkcd.com/1138/)? 

In [None]:
# Let's look at the proportion of people under the age of 5

...

In [None]:
# Let's plot the new map

...

In [None]:
# But what does it mean?
...