## Lecture 13 Map making and analysis in Python

Map making in Python relies on a handful of libraries for geospatial analysis and visualization. The central one is GeoPandas, which extends pandas to support geospatial data. GeoPandas reads, manipulates, analyzes, and plots geospatial datasets such as shapefiles and GeoJSON files, and supports operations like overlaying multiple layers, drawing choropleth maps, and adding custom markers. Combined with Matplotlib for static maps and Folium for interactive ones, it can communicate spatial patterns and trends for academic research, urban planning, environmental monitoring, or business intelligence.

In this lecture, we will cover:
- the format of geospatial data
- reading and processing geospatial data
- creating choropleth maps
- customizing the color and legend
- adding multiple layers to a map
- including geographic information in an analysis

In [2]:
#pip install geopandas geodatasets mapclassify

### Geospatial data resource and format

A .shp file, short for Shapefile, is a commonly used file format in geospatial data analysis and mapping. It stores vector data, including points, lines, and polygons, along with associated attributes. Shapefiles are widely supported by GIS (Geographic Information Systems) software and libraries, making them a popular choice for storing and sharing geospatial datasets. They consist of multiple files, including a main .shp file containing the geometric data, a .shx file containing the shape index, and a .dbf file containing attribute data. 

In this lecture, we will use the geospatial data from

https://www.naturalearthdata.com/downloads/

https://geodacenter.github.io/data-and-lab/

#### GeoDa

GeoDa, short for Geographic Data Analysis, is a powerful open-source software tool designed for exploratory spatial data analysis (ESDA). 

In [1]:
import geopandas as gpd
import geodatasets



#### Create the geospatial data from columns

In [3]:
from shapely.geometry import Point
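As a rough sketch of this step, a plain table with longitude and latitude columns can be turned into a GeoDataFrame by building one `Point` per row. The column names `lon` and `lat` and the approximate city coordinates are assumptions made up for this example.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Plain table with coordinate columns (approximate coordinates, for illustration).
df = pd.DataFrame(
    {
        "city": ["Boston", "Chicago", "Seattle"],
        "lon": [-71.06, -87.63, -122.33],
        "lat": [42.36, 41.88, 47.61],
    }
)

# Point takes x (longitude) first, then y (latitude).
geometry = [Point(xy) for xy in zip(df["lon"], df["lat"])]

cities = gpd.GeoDataFrame(
    df,
    geometry=geometry,
    crs="EPSG:4326",  # WGS84 longitude/latitude
)
print(cities)
```

`gpd.points_from_xy(df["lon"], df["lat"])` builds the same geometry column without the explicit loop.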

#### Merge the geospatial data with attributes

In [4]:
import pandas as pd

In [5]:
# FIPS: Federal Information Processing Standards
# https://www.bls.gov/respondents/mwr/electronic-data-interchange/appendix-d-usps-state-abbreviations-and-fips-codes.htm
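One way such a merge might look is sketched below on toy data. The column names `STATEFP`, `fips`, and `value`, the placeholder square geometries, and the numbers are all assumptions for illustration. The sketch also shows a common pitfall: FIPS codes read as integers lose their leading zero and must be padded back to two digits before joining.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Toy "states" layer; STATEFP holds zero-padded FIPS codes as strings.
states = gpd.GeoDataFrame(
    {
        "STATEFP": ["06", "25", "36"],
        "NAME": ["California", "Massachusetts", "New York"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],  # placeholder squares
    crs="EPSG:4326",
)

# Attribute table where FIPS was read as an integer, so the leading zero is lost.
attrs = pd.DataFrame({"fips": [6, 25], "value": [1.5, 2.5]})  # made-up values

# Restore the two-digit string form before joining.
attrs["STATEFP"] = attrs["fips"].astype(str).str.zfill(2)

# Calling merge on the GeoDataFrame (left side) keeps the result a GeoDataFrame.
merged = states.merge(attrs[["STATEFP", "value"]], on="STATEFP", how="left")
print(merged[["STATEFP", "NAME", "value"]])
```

With `how="left"`, every geometry is kept and regions without a matching attribute row get a missing value.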


### Processing geospatial data and making a map

#### Exercise 1

- Plot the map of the lower 48 US states (excluding Alaska and Hawaii). Plot only the boundaries.
- Extract the Massachusetts (MA) counties from the county data and plot only those counties.

In [6]:
# The FIPS for MA is 25
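The filtering pattern needed for this kind of exercise can be sketched on a toy layer. This is not the exercise solution on the real data: the column name `STATEFP` and the placeholder squares are assumptions, and the actual state and county files may use a different column name or store the codes as integers.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy layer standing in for a states or counties table (placeholder squares).
# FIPS codes: 02 = Alaska, 15 = Hawaii, 25 = Massachusetts, 36 = New York.
layer = gpd.GeoDataFrame(
    {"STATEFP": ["02", "15", "25", "36"]},
    geometry=[box(i, 0, i + 1, 1) for i in range(4)],
    crs="EPSG:4326",
)

# Drop rows by FIPS code with a boolean mask.
lower = layer[~layer["STATEFP"].isin(["02", "15"])]

# Keep a single state by FIPS code.
ma = layer[layer["STATEFP"] == "25"]

# .boundary converts polygons to their outlines, so only borders are drawn.
ax = lower.boundary.plot(color="black", linewidth=0.8)
ax.set_axis_off()
```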


### Creating choropleth maps and customization

Choropleth maps are a type of thematic map that represent spatial data through color gradients or shading. They use color-coded areas, such as countries, states, or administrative regions, to visualize quantitative data, with darker or lighter shades indicating higher or lower values respectively. 

In [7]:
# https://matplotlib.org/stable/users/explain/colors/colormaps.html
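A minimal choropleth sketch on toy data follows. The four squares and their `value` column are made up for illustration; `cmap` accepts any Matplotlib colormap name, and `legend_kwds` is forwarded to the colorbar for a continuous variable.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Four placeholder regions with a made-up numeric attribute.
regions = gpd.GeoDataFrame(
    {"value": [10.0, 25.0, 40.0, 55.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

fig, ax = plt.subplots(figsize=(5, 4))
regions.plot(
    column="value",       # attribute that drives the color
    cmap="OrRd",          # sequential colormap: darker means higher
    edgecolor="black",
    linewidth=0.5,
    legend=True,          # continuous data gets a colorbar
    legend_kwds={"label": "Value (made-up units)", "shrink": 0.7},
    ax=ax,
)
ax.set_title("Toy choropleth")
ax.set_axis_off()
```

Passing `scheme="quantiles"` (which requires mapclassify) would instead bin the values into classes and draw a categorical legend.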


### Missing values

In [8]:
import numpy as np
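As a sketch of how missing values might be handled, the `missing_kwds` argument of `plot` styles the regions whose value is `NaN`. The data below are made up, and the grey hatched style is just one possible choice.

```python
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Four placeholder regions; one has no recorded value.
regions = gpd.GeoDataFrame(
    {"value": [10.0, np.nan, 40.0, 55.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

fig, ax = plt.subplots(figsize=(5, 4))
regions.plot(
    column="value",
    cmap="OrRd",
    legend=True,
    # Style applied only to rows where "value" is NaN.
    missing_kwds={"color": "lightgrey", "hatch": "///"},
    ax=ax,
)
ax.set_axis_off()
```

Without `missing_kwds`, regions with missing values are simply not drawn, which can be mistaken for a hole in the map.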



### Adding multiple layers

To add multiple layers to a map, we need to make sure that all layers use the same coordinate reference system (CRS).

In [10]:
#Marker choices: https://matplotlib.org/stable/api/markers_api.html
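The layering idea can be sketched with two toy layers that start in different coordinate reference systems. The polygons, point locations, and names are assumptions for illustration; the key steps are `to_crs` to align the layers and passing the same `ax` to each `plot` call.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, box

# Base layer: placeholder polygons in longitude/latitude (EPSG:4326).
areas = gpd.GeoDataFrame(
    {"name": ["west", "east"]},
    geometry=[box(-72.0, 42.0, -71.5, 42.5), box(-71.5, 42.0, -71.0, 42.5)],
    crs="EPSG:4326",
)

# Point layer stored in a different CRS (Web Mercator, EPSG:3857).
sites = gpd.GeoDataFrame(
    {"site": ["s1", "s2"]},
    geometry=[Point(-71.8, 42.2), Point(-71.2, 42.3)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")

# Reproject the points to the CRS of the base layer before overlaying.
sites_aligned = sites.to_crs(areas.crs)

fig, ax = plt.subplots(figsize=(5, 4))
areas.plot(ax=ax, color="whitesmoke", edgecolor="black")           # first layer
sites_aligned.plot(ax=ax, marker="*", color="red", markersize=80)  # second layer, same axes
ax.set_axis_off()
```

If the points were plotted without reprojection, their coordinates (in meters) would land far outside the polygon extent (in degrees) and the layers would not line up.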


## Fitting models with geospatial data

In most cases, you can simply extract the coordinates and add them to your model as features (linear regression, random forest, etc.).

However, some models are better suited to geospatial data.

- The Generalized Additive Model (GAM) is a flexible modeling approach that extends linear models by allowing non-linear relationships between the predictors and the response variable. It does this by fitting a smooth function for each predictor, which can capture complex patterns in the data without assuming a strictly linear relationship. In pyGAM, a smooth term is added with the `s` function, which fits a spline to the given feature.
- Spatial autoregressive models take into account the spatial dependence between observations. These models are particularly useful when the assumption of independence between observations is violated due to spatial autocorrelation.
    - Spatial Lag Model (SLM): Incorporates the spatial dependence of the dependent variable, so the value at one location depends on the values at neighboring locations.
    - Queen Contiguity: A rule for defining which observations are neighbors. Two observations are neighbors if they share any boundary point, whether an edge or a corner (like a chess queen moving on a board).

In [27]:
#pip install pygam libpysal spreg

In [28]:
import geodatasets
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from pygam import LinearGAM, s
from libpysal.weights import Queen
from spreg import ML_Lag
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

# Load geospatial data
chicago = gpd.read_file(geodatasets.get_path("geoda.chicago_commpop"))

# Display the first few rows of the dataset
print(chicago.head())

         community  NID  POP2010  POP2000  POPCH   POPPERCH  popplus  popneg  \
0          DOUGLAS   35    18238    26470  -8232 -31.099358        0       1   
1          OAKLAND   36     5918     6110   -192  -3.142390        0       1   
2      FULLER PARK   37     2876     3420   -544 -15.906433        0       1   
3  GRAND BOULEVARD   38    21929    28006  -6077 -21.698922        0       1   
4          KENWOOD   39    17841    18363   -522  -2.842673        0       1   

                                            geometry  
0  MULTIPOLYGON (((-87.60914 41.84469, -87.60915 ...  
1  MULTIPOLYGON (((-87.59215 41.81693, -87.59231 ...  
2  MULTIPOLYGON (((-87.62880 41.80189, -87.62879 ...  
3  MULTIPOLYGON (((-87.60671 41.81681, -87.60670 ...  
4  MULTIPOLYGON (((-87.59215 41.81693, -87.59215 ...  
