<a href="https://colab.research.google.com/github/noahcreany/EcologyCenter_SpatialPy/blob/main/1_EC_PythonIntro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Welcome to an Intro to GeoSpatial Analysis in Python

This is a general introduction to using open-source Python packages (the Python counterpart of R libraries) for mapping and spatial statistics.

Outline for Workshop:
1.   A brief intro to python
2.   Wrangling geometries with GeoPandas
3.   Make a Map
4.   Spatial Statistics

*Note: Many of these modules were inspired by Python "courses" on Kaggle.com (A great source to get started in Python - https://www.kaggle.com/learn).*

Let's get started!




#A Brief Intro to Python

I'm going to assume some of you might have some programming experience in R - perhaps that's what brought you here. Nevertheless, I'll cover some basic aspects of Python as it is experienced here in Google Colab (Jupyter Notebook).

In [1]:
# Comments are made with #
a_var = 3

print(a_var)

3


In [2]:
# Assigning objects uses =, instead of <- (like in R)
a_list = ['item 1', 'item 2','item 3'] # or [1,2,3] if numeric list

# In Python 3, print() is a function; this prints the list
print(a_list)
# or, as the last line of a notebook cell, the bare name displays its value
a_list

['item 1', 'item 2', 'item 3']


['item 1', 'item 2', 'item 3']

In [3]:
#Sometimes a dictionary is helpful for loops:
a_dict = {'item 1':1, 'item 2':2,'item 3':3}
print('Get Value of Item 1: ', a_dict['item 1'])
print('Get Value of Item 2: ', a_dict['item 2'])

Get Value of Item 1:  1
Get Value of Item 2:  2
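
Since the comment above mentions loops, here is a minimal sketch of looping over a dictionary's key-value pairs with .items(). The dictionary is re-created so the block runs on its own, and `total` is a name introduced only for illustration.

```python
# Re-create the dictionary from the cell above so this block is self-contained
a_dict = {'item 1': 1, 'item 2': 2, 'item 3': 3}

# .items() yields (key, value) pairs in insertion order
total = 0  # illustrative accumulator, not part of the original notebook
for key, value in a_dict.items():
    print(key, '->', value)
    total += value

print('Sum of values:', total)
```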


In [4]:
#For loops in Python
for i in range(a_dict['item 2']):
  print(i)

0
1


You'll notice that counting in Python starts at 0, whereas in R it starts at 1. This is important to keep in mind with loops and ranges of data.
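
As a rough illustration of zero-based counting, the sketch below indexes and slices the list from earlier and compares two ways of calling range(). The variable names are chosen here for the example only.

```python
a_list = ['item 1', 'item 2', 'item 3']

first = a_list[0]        # index 0 is the first element
last = a_list[-1]        # negative indices count from the end
first_two = a_list[0:2]  # slices include the start and exclude the stop

# range(n) yields 0, 1, ..., n-1; range(1, n + 1) gives R-style 1..n
zero_based = list(range(3))
one_based = list(range(1, 4))
print(first, last, first_two, zero_based, one_based)
```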

In [5]:
def a_function(num,mul):  #def defines a function, then the name of the function, and then (parameters of the function):
  product = num*mul       #next you define what the function does, these are local variables so you can use any variable names you'd like
  return product          #Finally, you need to return a value, otherwise it won't appear to do anything

a_function(7,9) #Function, product of 7*9

63
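
The comments in the function above can be extended with a small sketch showing a default parameter value, a call by keyword, and what happens when return is left out. The functions `scale` and `no_return` are hypothetical names used only for this example.

```python
def scale(num, mul=2):
    """Return num multiplied by mul (mul defaults to 2)."""
    return num * mul


def no_return(num, mul):
    product = num * mul  # computed, but never returned


doubled = scale(7)                # uses the default mul=2
by_keyword = scale(num=7, mul=9)  # arguments can be passed by name
nothing = no_return(7, 9)         # a function without return gives None

print(doubled, by_keyword, nothing)
```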

#Pandas, the 'Tidyverse' of Python
Pandas is a "dataframe" package in Python

Pandas has a few conventions of its own, and the way you subset and manipulate variables carries over directly to GeoPandas, which simply adds a geometry column to a Pandas DataFrame.

Pandas Cheatsheet:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

GeoPandas Cheatsheet:
https://github.com/prasunkgupta/python-cheat-sheets/blob/master/geopandas-shapely-geopy.ipynb

In [8]:
#import packages in python, often abbreviated like this:
import pandas as pd
import numpy as np
import geopandas as gpd
import requests

In [7]:
#If geopandas is not found
!pip install geopandas

Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1


In [10]:
df = pd.DataFrame(a_dict, index = [1])
df.head()

Unnamed: 0,item 1,item 2,item 3
1,1,2,3


In [11]:
#print columns
for i in df.columns: print(i)

#Alternatively
df.columns.to_list()

item 1
item 2
item 3


['item 1', 'item 2', 'item 3']

###Pandas often chains methods onto the DataFrame with dot notation, so many operations need very little code.

Let's say we want to remove spaces from all of our columns:

In [12]:
#As above to call on the columns we use df.columns:
df.columns = df.columns.str.replace(' ','')
df.head()

Unnamed: 0,item1,item2,item3
1,1,2,3
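
To make the idea of appending routines with dots a little more concrete, here is a minimal sketch that chains two string methods on the column names and then two DataFrame methods. The `TOTAL` column and the uppercase names are assumptions made for this example and are not used elsewhere in the workshop.

```python
import pandas as pd

# Same one-row DataFrame as above, rebuilt so the block runs on its own
df = pd.DataFrame({'item 1': 1, 'item 2': 2, 'item 3': 3}, index=[1])

# Each .str method returns a new Index, so calls can be chained
df.columns = df.columns.str.replace(' ', '').str.upper()

# DataFrame methods chain the same way: add a column, then round
result = df.assign(TOTAL=lambda d: d.ITEM1 + d.ITEM2 + d.ITEM3).round(3)
print(result)
```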


Without spaces in the names, each column can be accessed as an attribute of the DataFrame (for example, df.item1). We can also use built-in Pandas and NumPy functions to change the values.

In [13]:
df.item1 = df.item1.mul(1000)
df.item2 = np.log10(df.item2)
df.item3 = np.log2(df.item3)
df.head()

Unnamed: 0,item1,item2,item3
1,1000,0.30103,1.584963


In [14]:
df= df.round(3)
df.head()

Unnamed: 0,item1,item2,item3
1,1000,0.301,1.585


##Let's move on to something more interesting - Loading data from the web and Pandas data wrangling
Let's grab some COVID-19 data from The New York Times' GitHub repository. Pandas can read a CSV file directly from a URL, so importing data from the web is fairly easy.

In [78]:
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'
df = pd.read_csv(url)
df.head()

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0
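
By default, read_csv leaves the date column as plain strings. The sketch below shows one way to parse it into real dates with the parse_dates argument. To keep the block runnable offline it reads a small in-memory sample (three rows copied from the output above) instead of the URL; `sample_csv` and `sample` are names introduced for this example.

```python
import io
import pandas as pd

# Stand-in for the remote file: same columns, three rows from the output above
sample_csv = io.StringIO(
    "date,state,fips,cases,deaths\n"
    "2020-01-21,Washington,53,1,0\n"
    "2020-01-22,Washington,53,1,0\n"
    "2020-01-24,Illinois,17,1,0\n"
)

# parse_dates converts the date column from strings to datetime64
sample = pd.read_csv(sample_csv, parse_dates=['date'])

print(sample.dtypes)
# With real dates, subtraction gives a time span rather than an error
span = sample['date'].max() - sample['date'].min()
print(span)
```

The same parse_dates argument should work when the first argument is the URL instead of the in-memory sample.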


In [79]:
#How many states in DF? (The count also includes D.C. and U.S. territories, so it is more than 50)
print('Number of States: ',len(df.state.unique()))

Number of States:  56


In [80]:
#Date Range
print('DateRange:')
print('First: ', df.date.min(), '| Last:',df.date.max())

DateRange:
First:  2020-01-21 | Last: 2022-04-17


In [196]:
#Which State has the most cumulative cases?
statesorted = df.groupby('state')['cases'].max().sort_values(ascending = False)
statesorted.head()

state
California    9158448
Texas         6715756
Florida       5878404
New York      5058582
Illinois      3100157
Name: cases, dtype: int64

In [159]:
statesorted.shape

(56,)
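
The groupby line above relies on the cases column being a running total, so the largest value per state is that state's cumulative count. A minimal sketch of the same pattern on a toy table, where the states 'A' and 'B' and all numbers are invented purely for illustration:

```python
import pandas as pd

# Toy data: 'cases' is a running total per state (values are made up)
toy = pd.DataFrame({
    'state': ['A', 'A', 'A', 'B', 'B'],
    'cases': [1, 3, 7, 2, 4],
})

# Split by state, take the largest running total, then sort high to low
per_state = toy.groupby('state')['cases'].max().sort_values(ascending=False)
print(per_state)

# idxmax() returns the index label (here, the state) of the largest value
top_state = per_state.idxmax()
print(top_state)
```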

Subsetting in Pandas is similar to R: you can slice rows by position (e.g., df[1:5]) or select rows by matching specific values within columns.


In [19]:
#.loc and .iloc are the main subsetting tools: .loc selects by label, .iloc selects by integer position
df.iloc[5:10]

Unnamed: 0,date,state,fips,cases,deaths
5,2020-01-25,California,6,1,0
6,2020-01-25,Illinois,17,1,0
7,2020-01-25,Washington,53,1,0
8,2020-01-26,Arizona,4,1,0
9,2020-01-26,California,6,2,0
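
The difference between the two is easiest to see when the index labels are not 0, 1, 2. Below is a small sketch on a toy table; the table, its index labels, and the variable names are all assumptions made for the example.

```python
import pandas as pd

# Toy table with index labels that differ from the row positions
toy = pd.DataFrame(
    {'state': ['Utah', 'Ohio', 'Utah'], 'cases': [1, 2, 5]},
    index=[10, 20, 30],
)

by_position = toy.iloc[0:2]  # positions 0 and 1 (stop is excluded)
by_label = toy.loc[10:20]    # labels 10 through 20 (stop is included)

# .loc also accepts a boolean mask for rows plus a column name
utah_cases = toy.loc[toy['state'] == 'Utah', 'cases']

print(by_position, by_label, utah_cases, sep='\n\n')
```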



When was the first COVID-19 case reported in Utah? (The query below returns the earliest date on which Utah appears in the data.)

In [86]:
df.loc[df['state']=='Utah','date'].min()

'2020-02-25'
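
Note that the query above filters only on the state, so it returns the earliest date with any row for Utah. To ask specifically about deaths, one option is to add a second condition on the deaths column. The sketch below shows the pattern on a toy table; the states 'A' and 'B', the dates, and the counts are invented for illustration and say nothing about the real data.

```python
import pandas as pd

# Hypothetical rows shaped like the NYT table; all values are made up
toy = pd.DataFrame({
    'date': ['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-02'],
    'state': ['A', 'A', 'A', 'B'],
    'cases': [1, 2, 4, 1],
    'deaths': [0, 0, 1, 1],
})

# Earliest date with any row for state A (same pattern as the cell above)
first_row = toy.loc[toy['state'] == 'A', 'date'].min()

# Combine conditions with & and wrap each one in parentheses
mask = (toy['state'] == 'A') & (toy['deaths'] > 0)
first_death = toy.loc[mask, 'date'].min()

print(first_row, first_death)
```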

How many cumulative cases in Utah?

In [94]:
print(f"{statesorted['Utah']:,}")

929,587
