# Class 12: Mapping

Plan for today:
- Creating maps


## Notes on the class Jupyter setup

If you have the *ydata123_2024a* environment set up correctly, you can get the class code using the code below (which presumably you've already done given that you are seeing this notebook). 

In [2]:
import YData

# YData.download.download_class_code(11)   # get class code    
# YData.download.download_class_code(11, True)  # get the code with the answers 

YData.download.download_data("dennys.csv")

YData.download.download_data("States_shapefile.geojson")
YData.download.download_data("state_demographics.csv")
YData.download.download_data("ne_110m_graticules_10.prj")
YData.download.download_data("ne_110m_graticules_10.shp")
YData.download.download_data("ne_110m_graticules_10.shx")
YData.download.download_data("ne_110m_graticules_10.dbf")


There are also similar functions to download the homework:

In [3]:
#YData.download.download_homework(5)  # downloads the homework 

If you are using Colab, you should install the YData package and mount your Google Drive by uncommenting and running the code below.

In [4]:
# !pip install https://github.com/emeyers/YData_package/tarball/master
# from google.colab import drive
# drive.mount('/content/drive')

In [5]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

## 1. Spatial mapping with geopandas

Visualizing spatial data through maps is another powerful way to see trends in data. There are several mapping packages in Python. Here we will use the geopandas package to create maps. 

The geopandas package defines a `GeoDataFrame`, which behaves like a pandas DataFrame but has an additional column called `geometry` that specifies geographic information.
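As a minimal sketch of what this looks like, the code below builds a tiny GeoDataFrame by hand. The city names and coordinates are made up for illustration and are not part of the class data.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two illustrative rows; each Point is given as (longitude, latitude)
cities = gpd.GeoDataFrame(
    {"name": ["New Haven", "Anchorage"]},
    geometry=[Point(-72.93, 41.31), Point(-149.90, 61.22)],
    crs="EPSG:4326",
)

print(type(cities))           # a GeoDataFrame, which subclasses pandas.DataFrame
print(cities.geometry.name)   # name of the active geometry column
print(cities.head())
```

Because a GeoDataFrame is a DataFrame subclass, the usual pandas methods such as `.head()` and `.query()` should keep working on it.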

Let's explore this now!


### Visualizing boundaries

Let's start by looking at some geopandas DataFrames and visualizing some geometric boundaries.

Below we load the gapminder data again and keep only the rows from 2007. We also show which maps come with geopandas. 


In [2]:
import geopandas as gpd
import plotly.express as px

gapminder_2007 = px.data.gapminder().query("year == 2007")   # the plotly package comes with the gapminder data


# see which maps come with geopandas
gpd.datasets.available


['naturalearth_cities', 'naturalearth_lowres', 'nybb']

Let's get a geopandas DataFrame that has the countries in the world...

In [3]:
# View the world geopandas DataFrame

# turn off deprecation warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# read data into a GeoDataFrame (gpd.datasets was removed in geopandas 1.0, so this line needs an older version)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# print the data type


# look at the first few rows of the data



In [4]:
# Plot a world map with particular properties






In [5]:
# Plot just the United States




### Coordinate reference systems and projections

A coordinate reference system (CRS) is a framework used to precisely measure locations on the surface of the Earth as coordinates. The goal of any spatial reference system is to create a common reference frame in which locations can be measured precisely and consistently as coordinates, which can then be shared unambiguously, so that any recipient can identify the same location that was originally intended by the originator.

There are two different types of coordinate reference systems: Geographic Coordinate Systems and Projected Coordinate Systems. [Projected coordinate systems](https://en.wikipedia.org/wiki/List_of_map_projections) map 3D coordinates onto a 2D plane so they can be plotted. Different projected coordinate systems preserve different properties, such as keeping all angles intact, which is useful for navigation (e.g., the Mercator projection), or keeping the size of land areas intact (e.g., the Eckert IV projection). 

A detailed discussion of CRS is beyond the scope of the class. For the purposes of this class, the important point is that all layers in a map use the same projection (otherwise, for example, data points representing cities and the underlying spatial map won't line up). 
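To get a feel for what "preserving angles but not areas" means, here is a rough sketch of the spherical Mercator formula (the one behind Web Mercator, EPSG:3857) written with plain Python math. The function names `web_mercator` and `projected_height` are made up for this illustration; this is not how geopandas implements projections internally.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator


def web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def projected_height(lat_south, lat_north):
    """Height on the map (in meters) of the band between two latitudes."""
    return web_mercator(0, lat_north)[1] - web_mercator(0, lat_south)[1]


# Two bands that are both 10 degrees tall on the globe
equator_band = projected_height(0, 10)
northern_band = projected_height(60, 70)
print(equator_band, northern_band)
```

The northern band comes out more than twice as tall as the equatorial one, even though both span 10 degrees of latitude. This stretching is why regions far from the equator look too large on a Mercator map.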

Let's very briefly explore different map projections... 


In [7]:
# Read Graticules (lines on a map)
graticules = gpd.read_file("ne_110m_graticules_10.shp")


# print the CRS
print(graticules.crs)


# View the GeoDataFrame
graticules.head(3)


EPSG:4326


geopandas.geodataframe.GeoDataFrame

In [9]:
# Mercator projection - preserves angles (Web Mercator is EPSG:3857; EPSG:4326 is unprojected longitude/latitude)

# print the default CRS


# plot the map




In [10]:
# Eckert IV is an equal-area projection  ("ESRI:54012")





In [11]:
# Robinson projection - neither equal-area nor conformal ("ESRI:54030") 




To learn more about "What your favorite map projection says about you" see: https://xkcd.com/977/

### Maps with layers and markers

We can also plot points on a map. When doing so, it's important that the points and the underlying map use the same coordinate reference system (CRS).

Let's add Denny's locations to the map of the United States!


In [12]:
# Let's start by getting a map of just the United States





In [13]:
# visualize just the United States




In [14]:
# Get the coordinate reference system (CRS) for our map



Let's now load our Denny's data!

In [17]:
# Let's load our Denny's data
dennys = pd.read_csv("dennys.csv")
dennys.head(3)

Unnamed: 0.1,Unnamed: 0,address,city,state,zip,longitude,latitude
0,1,2900 Denali,Anchorage,AK,99503,-149.8767,61.1953
1,2,3850 Debarr Road,Anchorage,AK,99508,-149.809,61.2097
2,3,1929 Airport Way,Fairbanks,AK,99701,-147.76,64.8366


To plot these locations, we need to convert the longitude and latitude coordinates into geometric objects; i.e., Shapely `Point` objects. We can do this using the `gpd.points_from_xy(long, lat)` function. 
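As a small sketch of this conversion, the code below uses three rows copied from the `dennys.csv` preview above rather than the full file. The names `dennys_sample` and `dennys_sample_gdf` are made up for this example.

```python
import pandas as pd
import geopandas as gpd

# Three rows copied from the dennys.csv preview
dennys_sample = pd.DataFrame({
    "city": ["Anchorage", "Anchorage", "Fairbanks"],
    "longitude": [-149.8767, -149.809, -147.76],
    "latitude": [61.1953, 61.2097, 64.8366],
})

# x is longitude and y is latitude, in that order
points = gpd.points_from_xy(dennys_sample["longitude"], dennys_sample["latitude"])

# Attach the points as the geometry column of a GeoDataFrame
dennys_sample_gdf = gpd.GeoDataFrame(dennys_sample, geometry=points)

print(dennys_sample_gdf.geometry.iloc[0])
print(dennys_sample_gdf.crs)   # no CRS has been assigned yet
```

Note that the argument order is longitude first, then latitude. Swapping them is an easy mistake that puts points in the wrong part of the world.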

In [17]:
# Let's convert our longitude and latitude coordinates into geometric (Shapely) objects 




In [15]:
# Let's now convert our data into a geopandas DataFrame




In [16]:
# We can plot the location of the Denny's using the plot function




In [18]:
# Let's check the CRS




Before plotting data, we should set the appropriate coordinate reference system (CRS). This is particularly important when we are combining different layers on a map, such as putting city locations on a map that has the outlines of regional borders. 

The CRS that uses longitude and latitude coordinates is the [World Geodetic System 1984 (WGS84)](https://epsg.io/4326). This system is often referred to by its EPSG Geodetic Parameter Dataset code which is `4326`. 

Thus, we should set the coordinate system to be EPSG 4326. We can do this using the method `.set_crs(4326)`. Let's set this on our `dennys_gpd` DataFrame. 
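The sketch below shows the difference between labeling coordinates with a CRS (`.set_crs()`) and actually reprojecting them (`.to_crs()`). It uses a single made-up one-row GeoDataFrame called `fairbanks`, not the class's `dennys_gpd` object.

```python
import geopandas as gpd

# A GeoDataFrame built from raw coordinates starts with no CRS
fairbanks = gpd.GeoDataFrame(
    {"city": ["Fairbanks"]},
    geometry=gpd.points_from_xy([-147.76], [64.8366]),
)
print(fairbanks.crs)

# set_crs only labels the existing coordinates; it returns a new GeoDataFrame
fairbanks = fairbanks.set_crs(4326)
print(fairbanks.crs)

# to_crs reprojects the coordinates into a different CRS (here Web Mercator)
fairbanks_mercator = fairbanks.to_crs(3857)
print(fairbanks_mercator.geometry.iloc[0])
```

After `.set_crs(4326)` the coordinates are unchanged, while after `.to_crs(3857)` they are in meters rather than degrees. Since `.set_crs()` returns a new object by default, remember to assign the result back to a name.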


In [19]:
# Let's set the CRS to match the CRS of our map (which is EPSG 4326)






Now that we have our Denny's location in the same coordinate system as our map, we can add the points to the map. 
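One common way to layer plots in geopandas is to draw the base map first and then pass its matplotlib axes to the second `.plot()` call through the `ax` argument. The sketch below uses a rough rectangle as a stand-in for a real border (it is not an actual outline of Alaska) and two coordinate pairs from the Denny's preview.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# A stand-in "map": one rectangle roughly covering Alaska (illustrative only)
base = gpd.GeoDataFrame(
    {"name": ["Alaska (rough box)"]},
    geometry=[box(-170, 54, -130, 72)],
    crs="EPSG:4326",
)

points = gpd.GeoDataFrame(
    {"city": ["Anchorage", "Fairbanks"]},
    geometry=gpd.points_from_xy([-149.8767, -147.76], [61.1953, 64.8366]),
    crs="EPSG:4326",
)

# Both layers must share a CRS before they are drawn together
assert base.crs == points.crs

# Draw the base layer first, then reuse its axes for the points
ax = base.plot(color="lightgray", edgecolor="black", figsize=(6, 4))
points.plot(ax=ax, color="red", markersize=20)
ax.set_title("Points layered on a base map")
```

The same pattern should work with a real map of the states as the base layer in place of the rectangle.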

In [20]:
#state_map = gpd.read_file("States_shapefile.geojson")






### Choropleth maps

In choropleth maps, predefined regions are filled in with colors based on values of interest. 

Typically to create a choropleth map we join data of interest onto a map. 
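Here is a minimal sketch of that join-then-plot pattern, using two made-up square "regions" and made-up values rather than the real world map and gapminder data. All of the names (`regions`, `values`, `life_exp`) are placeholders for this example.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Two made-up square "regions" standing in for real borders
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Made-up values of interest, keyed by the same region names
values = pd.DataFrame({"region": ["A", "B"], "life_exp": [70.0, 80.0]})

# Merging with the GeoDataFrame on the left keeps the result a GeoDataFrame
regions_with_values = regions.merge(values, on="region", how="left")

# column= picks the variable that drives the fill color
ax = regions_with_values.plot(column="life_exp", cmap="viridis", legend=True)
```

Keeping the GeoDataFrame on the left side of the merge matters: starting from the plain pandas DataFrame instead would give back an ordinary DataFrame that has no map-aware `.plot()` method.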

Let's explore this now...


In [21]:
# Join the gapminder data onto our world map




In [22]:
# Plot a choropleth map of life expectancy





In [23]:
# Change the color scale




In [24]:
# We can plot quantiles




### Another choropleth map example

Let's create a choropleth map examining which states in the USA are growing because people there are having lots of children. 

Any thoughts on which state this might be? 

To start, let's load a map with the outlines of the states in the USA, and load demographic data.

In [25]:
state_map = gpd.read_file("States_shapefile.geojson")





In [29]:
# load demographic data on the states

state_demographics = pd.read_csv("state_demographics.csv")
state_demographics.head(3)

Unnamed: 0,State,under_5,over_64,bachelors_degree,total
0,Alabama,295811.997,741954.681,1095959.202,4849377
1,Alaska,54518.168,69252.808,202601.3,736732
2,Arizona,430814.976,1070305.956,1810769.196,6731484


In [26]:
# In order to join the DataFrames, we need to make sure the states have the same capitalization





In [27]:
# Join the demographic information onto the USA map





In [28]:
# Let's plot the map 




Is there anything [wrong with this map](https://xkcd.com/1138/)? 
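One likely issue is that raw counts mostly track how many people live in each state. A quick sketch of the usual fix, dividing by total population, is shown below using the three rows from the `state_demographics.csv` preview above. The column name `prop_under_5` is made up for this example.

```python
import pandas as pd

# Three rows copied from the state_demographics.csv preview
demo = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "under_5": [295811.997, 54518.168, 430814.976],
    "total": [4849377, 736732, 6731484],
})

# Dividing by total population removes the effect of state size
demo["prop_under_5"] = demo["under_5"] / demo["total"]

print(demo.sort_values("prop_under_5", ascending=False))
```

Among these three rows, Alaska has the fewest children under 5 in absolute terms but the highest proportion, so the two versions of the map can tell quite different stories.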

In [29]:
# Let's look at the proportion of people under the age of 5






In [30]:
# Let's plot the new map

