<a href="https://colab.research.google.com/github/noahcreany/EcologyCenter_SpatialPy/blob/main/1_EC_PythonIntro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Welcome to an Intro to GeoSpatial Analysis in Python

This will be a general introduction to using open-source Python packages (the equivalent of R libraries) for mapping and spatial statistics.

Outline for Workshop:
1.   A brief intro to python
2.   Wrangling geometries with GeoPandas
3.   Make a Map
4.   Spatial Statistics

*Note: Many of these modules were inspired by Python "courses" on Kaggle.com (A great source to get started in Python - https://www.kaggle.com/learn).*

Let's get started!




#A Brief Intro to Python

I'm going to assume some of you might have some programming experience in R - perhaps that's what brought you here. Nevertheless, I'll cover some basic aspects of Python as it is experienced here in Google Colab (Jupyter Notebook).

In [None]:
# Comments are made with #
a_var = 3

print(a_var)

In [None]:
# Assigning objects uses =, instead of <- (like in R)
a_list = ['item 1', 'item 2','item 3'] # or [1,2,3] if numeric list

#In Python 3, print() is a function, so this prints the list
print(a_list)
#or, in a notebook, end the cell with the variable name to display it
a_list

In [None]:
#Sometimes a dictionary is helpful for loops:
a_dict = {'item 1':1, 'item 2':2,'item 3':3}
print('Get Value of Item 1: ', a_dict['item 1'])
print('Get Value of Item 2: ', a_dict['item 2'])
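As a small sketch of why dictionaries pair well with loops: the `.items()` method yields each key together with its value. The `doubled` dictionary below is a hypothetical name used only for illustration.

```python
# Example dictionary, mirroring a_dict from the cell above
a_dict = {'item 1': 1, 'item 2': 2, 'item 3': 3}

# .items() yields (key, value) pairs, so the loop can use both at once
doubled = {}
for key, value in a_dict.items():
    doubled[key] = value * 2
    print(key, '->', value)

print(doubled)
```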

In [None]:
#For loops in Python
for i in range(a_dict['item 2']):
  print(i)

You'll notice that in Python counting starts at 0, rather than at 1 as in R. This is important to keep in mind with loops and ranges of data.
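A minimal sketch of zero-based counting, using a list like the one defined earlier. The variable names (`first`, `last`, `first_two`, `counts`) are just for illustration.

```python
a_list = ['item 1', 'item 2', 'item 3']

first = a_list[0]         # the first element lives at index 0, not 1
last = a_list[-1]         # negative indices count back from the end
first_two = a_list[0:2]   # slices include the start but exclude the stop

counts = list(range(3))   # range(3) produces 0, 1, 2 (three values, ending before 3)

print(first, last, first_two, counts)
```

In R, `a_list[1]` would be the first element; in Python it is the second.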

In [None]:
def a_function(num,mul):  #def defines a function, then the name of the function, and then (parameters of the function):
  product = num*mul       #next you define what the function does, these are local variables so you can use any variable names you'd like
  return product          #Finally, you need to return a value, otherwise it won't appear to do anything

a_function(7,9) #Function, product of 7*9
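To illustrate the point about `return`, here is a small sketch comparing two hypothetical functions (`with_return` and `without_return` are made-up names). A function without a `return` statement still runs, but it hands back `None`.

```python
def with_return(num, mul):
    # The product is handed back to the caller
    return num * mul

def without_return(num, mul):
    # The product is computed, then discarded when the function ends
    product = num * mul

result_a = with_return(7, 9)
result_b = without_return(7, 9)

print(result_a)  # 63
print(result_b)  # None
```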

#Pandas, the 'Tidyverse' of Python
Pandas is a "dataframe" package in Python.

It has a few conventions of its own, but subsetting and manipulating variables works the same way in GeoPandas, which simply adds geometries to Pandas.

Pandas Cheatsheet:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

GeoPandas Cheatsheet:
https://github.com/prasunkgupta/python-cheat-sheets/blob/master/geopandas-shapely-geopy.ipynb

In [None]:
#import packages in python, often abbreviated like this:
import pandas as pd
import numpy as np
# import geopandas as gpd
import requests

In [None]:
df = pd.DataFrame(a_dict, index = [1])
df.head()

In [None]:
#print columns
for i in df.columns: print(i)

#Alternatively
df.columns.to_list()

###Pandas often works by appending methods to the DataFrame (or its columns) with dot notation, and as a result often needs minimal code.

Let's say we want to remove spaces from all of our columns:

In [None]:
#As above to call on the columns we use df.columns:
df.columns = df.columns.str.replace(' ','')
df.head()

Without spaces in their names, columns can be accessed as attributes of the DataFrame using dot notation (e.g. df.item1). We can also use built-in methods and NumPy functions to change their values.
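As a quick sketch of the two ways to reach a column, the hypothetical `df_demo` below has one name without a space and one with. Dot notation and bracket notation return the same data, but only brackets work for names containing spaces or for creating new columns.

```python
import pandas as pd

# Hypothetical one-row DataFrame, used only for this illustration
df_demo = pd.DataFrame({'item1': [1], 'item 2': [2]})

# Dot notation and bracket notation select the same column
same = df_demo.item1.equals(df_demo['item1'])

# A column name containing a space can only be reached with brackets
with_space = df_demo['item 2']

# Creating a new column also requires bracket notation
df_demo['item3'] = df_demo['item1'] * 3

print(same)
print(df_demo)
```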

In [None]:
df.item1 = df.item1.mul(1000)
df.item2 = np.log10(df.item2)
df.item3 = np.log2(df.item3)
df.head()

In [None]:
df= df.round(3)
df.head()

##Let's move on to something more interesting - Webscraping data and Pandas data wrangling
Let's grab some ramen ratings from a website, theramenrater.com. It's fairly easy to import data into Pandas from the web.

In [None]:
url = 'https://www.theramenrater.com/wp-content/uploads/2022/10/4300The-Big-List.xlsx'
df = pd.read_excel(url)
df

In [None]:
#How many unique brands have been reviewed?
print('Number of unique brands: ',len(df.Brand.unique()))

In [None]:
#How many reviews per Brand?
df.Brand.value_counts()

In [None]:
#Reviews as % of whole
df.Brand.value_counts(normalize = True).mul(100).round(2)

Subsetting in Pandas is similar to R: you can slice rows by position (i.e. df[1:5]) or filter on specific values within columns.
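Both kinds of subsetting can be sketched on a small made-up table. The `demo` DataFrame and its values are hypothetical and are not taken from the ramen data.

```python
import pandas as pd

# Hypothetical data, used only to illustrate subsetting
demo = pd.DataFrame({
    'Brand': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Stars': [5, 3, 4, 2, 5, 1],
})

# Slice rows by position: start is included, stop is excluded
rows_1_to_4 = demo[1:5]

# Filter on a column value with a boolean condition
brand_a = demo[demo['Brand'] == 'A']

# Combine conditions with & (and) or | (or); each needs its own parentheses
top_a = demo[(demo['Brand'] == 'A') & (demo['Stars'] >= 4)]

print(rows_1_to_4)
print(brand_a)
print(top_a)
```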

In [None]:
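# Which brand has the highest ratings?
# Keep only rows whose Stars value is a whole number from 0 to 5;
# any other value (fractional or non-numeric, if present) is dropped
df = df[df.Stars.isin([0,1,2,3,4,5])]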
ratings = df.groupby('Brand')['Stars'].mean().sort_values(ascending = False).to_frame()
ratings

In [None]:
# What is the Best Ramen in US
usramen = df[df['Country']=='United States']
usramen.groupby('Brand')['Stars'].mean().sort_values(ascending = False).to_frame()

Looks like there are a lot of restaurants in there... let's remove them.
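One way to drop rows by category is to build a boolean mask with `.isin()` and flip it with `~` (logical "not"). The sketch below uses a hypothetical `demo` table with made-up values to show the pattern.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the pattern
demo = pd.DataFrame({
    'Brand': ['A', 'B', 'C', 'D'],
    'Style': ['Pack', 'Restaurant', 'Cup', 'Bar'],
})

# True where Style is one of the unwanted categories
mask = demo['Style'].isin(['Restaurant', 'Bar'])

# ~ inverts the mask, keeping only the rows that are NOT in the list
kept = demo[~mask]

print(kept)
```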

In [None]:
usramen = usramen[~usramen.Style.isin(['Restaurant', 'Bar', 'Bottle'])]

In [None]:
usramen.groupby(['Brand','Style'])['Stars'].mean().sort_values(ascending = False).to_frame()