# Quick introduction to GeoPandas
From the [GeoPandas documentation](http://geopandas.org/):
> GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by [pandas](http://pandas.pydata.org) to allow spatial operations on geometric types. Geometric operations are performed by [shapely](http://toblerity.github.io/shapely). Geopandas further depends on [fiona](http://toblerity.github.io/fiona) for file access and [descartes](https://pypi.python.org/pypi/descartes) and [matplotlib](http://matplotlib.org) for plotting.

## Creating a GeoDataFrame from a DataFrame with coordinates

In [None]:
import pandas as pd
import geopandas
from shapely.geometry import Point
import matplotlib.pyplot as plt

The variable below named *car* is an example of a Python **dictionary**. Dictionaries are wrapped with curly braces "{}" and use keys to reference values. Numbers or strings can be used as keys.
```Python
car = {
    "make": "toyota",  # key: value
    "color": "silver",  # key: value
    "miles": 50  # key: value
}
```
As you can see above, *car* has three keys: make, color, and miles. The values do not have to be the same type - in this case two are strings and one an integer.

In [None]:
car = {"make": "toyota", "color": "silver", "miles": 50}

# Dictionaries have a built-in method called "keys" that returns the key names
print(car.keys())

# Values can be looked up by using keys
print(f'make: {car["make"]}')

# New keys can be added to an existing dictionary
car["model"] = "prius"
print(car.keys())

# An existing key's value can be updated
print(f'color: {car["color"]}')
car["color"] = "red"
print(f'color: {car["color"]}')
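
Looking up a key that is not in a dictionary with square brackets raises a `KeyError`. As a small sketch (re-creating the original *car* dictionary so it runs on its own), the `in` operator and the `get` method let you check for a key without an error. The fallback string "unknown" is just an illustrative choice.

```Python
car = {"make": "toyota", "color": "silver", "miles": 50}

# "in" tests whether a key exists
has_model = "model" in car
print(has_model)  # False

# "get" returns a fallback value instead of raising KeyError
model = car.get("model", "unknown")
print(model)  # unknown
```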

In [None]:
# Dictionary containing information on five cities located in South America
city_data = {
    'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
    'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
    'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
    'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]
}

New items can be added to the end of an existing list by using the list's *append* method.
```Python
my_shopping_list = ['eggs', 'milk', 'fruit']
my_shopping_list.append('bread')
```

In [None]:
my_shopping_list = ['eggs', 'milk', 'fruit']
print(my_shopping_list)
my_shopping_list.append('bread')
print(my_shopping_list)

# YOUR TURN: Add two additional items to the shopping list and then print the updated list.
# YOUR CODE HERE


# The "pop" method removes the last item from a list and returns it.
# Uncomment the following four lines and re-run this cell to see what "pop" does.
#my_shopping_list.pop()
#print(f'Updated list: {my_shopping_list}')
#removed_item = my_shopping_list.pop()
#print(f'Updated list: {my_shopping_list}. Removed item: {removed_item}')

**YOUR TURN**
In the cell below add the following two cities to the *city_data* dictionary.


City | Country | Latitude | Longitude
--- | --- | --- | --- 
Lima | Peru | -12.05 | -77.04
Georgetown | Guyana | 6.80 | -58.16

In [None]:
# Add Lima and Georgetown to "city_data"
# Remember, you will need to use city_data's keys (City, Country, etc.) to access each
# key's respective list

city_data["City"].append("Lima")
city_data["City"].append("Georgetown")
city_data["Country"].append("Peru")
city_data["Country"].append("Guyana")
city_data["Latitude"].append(-12.05)
city_data["Latitude"].append(6.80)
city_data["Longitude"].append(-77.04)
city_data["Longitude"].append(-58.16)
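
Appending to each list separately works, but it is easy to forget one list and end up with lists of different lengths. One possible way to guard against that, sketched below, is a small helper that appends a whole row at once. The function name `add_city` and the stand-in dictionary `cities` are hypothetical and only used for illustration.

```Python
def add_city(data, city, country, latitude, longitude):
    """Append one city to a dictionary of parallel lists (hypothetical helper)."""
    new_values = {
        "City": city,
        "Country": country,
        "Latitude": latitude,
        "Longitude": longitude,
    }
    # Append to every list in one place so the lists stay the same length
    for key, value in new_values.items():
        data[key].append(value)


# Small stand-in dictionary with the same keys as city_data
cities = {
    "City": ["Bogota"],
    "Country": ["Colombia"],
    "Latitude": [4.60],
    "Longitude": [-74.08],
}
add_city(cities, "Lima", "Peru", -12.05, -77.04)
print(cities["City"])  # ['Bogota', 'Lima']
```

Keeping the lists the same length matters because `pd.DataFrame` raises an error when the lists in the dictionary have different lengths.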

In [None]:
# Pandas DataFrame created from the dictionary
df = pd.DataFrame(city_data)
print(df.head())  # head() returns the first five rows by default
print("")
print(df.tail(2))  # Tail can be used to look at the end of a dataframe

In [None]:
# GeoDataFrames require spatial information to be in a specific format
# The first two lines below create shapely Point objects from the lon/lat pairs;
# the third line builds the GeoDataFrame and marks that column as its geometry
df['Coordinates'] = list(zip(df.Longitude, df.Latitude))
df['Coordinates'] = df['Coordinates'].apply(Point)
gdf = geopandas.GeoDataFrame(df, geometry='Coordinates')
print(gdf.head(1))
print(type(gdf['Coordinates'].values[0]))
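
Notice that longitude comes before latitude in the `zip` call above. Shapely points are built from (x, y) pairs, and for geographic coordinates x is the longitude and y is the latitude. The plain-Python sketch below uses two of the cities to show what `zip` produces, without needing pandas or shapely.

```Python
longitudes = [-58.66, -47.91]
latitudes = [-34.58, -15.78]

# zip pairs the first longitude with the first latitude, and so on
coordinates = list(zip(longitudes, latitudes))
print(coordinates)  # [(-58.66, -34.58), (-47.91, -15.78)]

# Each tuple is (x, y): x is the longitude, y is the latitude
x, y = coordinates[0]
print(f"x (longitude): {x}, y (latitude): {y}")
```

Swapping the two columns would still run without an error, but the points would be plotted in the wrong place.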

In [None]:
# GeoPandas has a few built-in datasets you can access at any time
# (note: geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Restricting the world map extent to the continent of South America and
# styling the fill and stroke (edge) colors
ax = world[world.continent == 'South America'].plot(
    color='white', edgecolor='black')

# Plotting the city data with blue points
gdf.plot(ax=ax, color='blue')

plt.show()

## Calculate new fields and create Choropleth maps

Columns in dataframes are examples of pandas series. New columns (series) can be generated by adding, subtracting, dividing, etc. columns.

In [None]:
d = {'colA': [1, 2], 'colB': [3, 4]}
df = pd.DataFrame(data=d)
print(df.describe())  # Describe generates statistics for the dataframe
print("")
print(f'Columns are pandas series: {type(df["colA"])}')

In [None]:
# Adding column A and column B creates a new series with the sums
df['colA'] + df['colB']

In [None]:
# Permanently add the new series to the dataframe by assigning the series to a new column name
df['colC'] = df['colA'] + df['colB']
df.head()

In [None]:
# Back to geopandas...

# Load example data (provided by geopandas)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# This dataset has five attributes (columns) plus geometry (polygons)
world.head()

In [None]:
# Basic plot of the dataset with the default styling
world.plot()

*geopandas* makes it easy to create Choropleth maps (maps where the color of each shape is based on the value of an associated variable). Simply use the plot command with the `column` argument set to the column whose values you want used to assign colors.

In [None]:
# We can create a subset of the world dataset that only contains countries in the continent of Africa
africa = world[(world.pop_est > 0) & (world.continent == "Africa")].copy()  # Countries with no population are excluded as well
africa.head()

In [None]:
# Find the coordinate system for the polygons in our geopandas dataframe using "crs"
print(africa.crs)  # Returns epsg:4326 also known as WGS84 (degrees) - a geographic coordinate system

# You will soon be asked to calculate population density, so the
# dataframe should be reprojected to a projected coordinate system
africa = africa.to_crs(epsg=3857)  # epsg:3857 - WGS84 Web Mercator (meters)
print(africa.crs)
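
Keep in mind that Web Mercator is not an equal-area projection, so areas measured in epsg:3857 are only approximations that get worse away from the equator. As a rough sketch, treating the Earth as a sphere, a Mercator map inflates small areas by about 1 / cos²(latitude). The function name below is hypothetical, and the latitudes are only sample values.

```Python
import math


def mercator_area_scale(latitude_deg):
    """Approximate factor by which a Mercator map inflates small areas at a latitude."""
    return 1 / math.cos(math.radians(latitude_deg)) ** 2


# Sample latitudes from the equator outward
for lat in (0, 15, 35):
    scale = mercator_area_scale(lat)
    print(f"latitude {lat}: areas appear about {scale:.2f}x larger")
```

For a tutorial this is acceptable, but an equal-area projection would be a better choice if the density values need to be accurate.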

In [None]:
print(africa.loc[13]['name'])
print(africa.loc[13]['geometry'].area / 10**6)  # Convert from square meters to square kilometers
africa.loc[13]['geometry']  # Preview the geometry for an individual country by referencing its index label
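
Population density is a population divided by an area. The sketch below walks through the arithmetic with plain numbers, including the conversion from square meters to square kilometers used above. The population and area are made-up values, not taken from the dataset.

```Python
# Hypothetical values, not taken from the dataset
population = 12_000_000
area_m2 = 250_000_000_000  # area in square meters, as reported in a meter-based CRS

area_km2 = area_m2 / 10**6  # 1 square kilometer = 1,000,000 square meters
density = population / area_km2  # people per square kilometer

print(area_km2)  # 250000.0
print(density)   # 48.0
```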

**YOUR TURN** In the cell below, create a new column named *pop_den* for the dataframe *africa*. *pop_den* should contain the population density for each country.

In [None]:
africa['pop_den'] = # YOUR CODE HERE

africa.head()

In [None]:
# Choropleth map for population
africa.plot(column='pop_est')

In [None]:
# YOUR TURN - Create a Choropleth map for the newly calculated population density
africa.plot(column='YOUR_CODE_HERE')

In [None]:
# You can create a more useful map by specifying a color map and classification scheme
# 'equal_interval' and 'percentiles' are other scheme choices
africa.plot(column="YOUR_CODE_HERE", cmap='OrRd', scheme='quantiles')
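
To get a feel for what a quantile scheme does, the sketch below classifies a short list of values using only the Python standard library. It splits the values into classes that each hold roughly the same number of items, and each class would then get its own color on the map. The values and the choice of five classes are assumptions for illustration, and the library that GeoPandas uses for classification may compute its breaks slightly differently.

```Python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical density values, not taken from the dataset
values = [3, 8, 15, 22, 40, 65, 90, 120, 300, 450]

# Four cut points split the values into five roughly equal-sized classes
breaks = quantiles(values, n=5, method="inclusive")

# Class index 0..4 for each value; a value equal to a break goes in the lower class
classes = [bisect_left(breaks, v) for v in values]

print(breaks)
print(classes)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Because each class holds the same number of countries, a few very large values do not wash out the colors of everything else, which is often why quantiles give a more readable map than equal intervals for skewed data.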