## Introduction to GeoPandas

### Part 1: Working with GeoDataFrames


This workshop introduces some of the concepts and basic features of GeoPandas, an extension of the popular data science library Pandas. GeoPandas works with many vector data formats, such as shapefiles, GeoJSON files, and GeoPackages.

GeoPandas Documentation: https://geopandas.org/en/stable/docs.html

Begin by importing the following libraries.

```import pandas as pd```

```import geopandas as gpd```

```import matplotlib.pyplot as plt```

```%matplotlib inline```

### GeoDataFrame

A GeoDataFrame is a two-dimensional data structure organized into rows and columns. It is a Pandas DataFrame with an additional column that holds the geometry of each row.

Create a Pandas DataFrame named ```df``` with data for 4 cities.

```
df = pd.DataFrame(
    {'City': ['San Francisco', 'San Jose', 'Palo Alto', 'Gilroy'],
     'Latitude': [37.78, 37.32, 37.44, 37.00],
     'Longitude': [-122.39, -121.87, -122.14, -121.56]})
df
```

|   | City | Latitude | Longitude |
|---|---|---|---|
| 0 | San Francisco | 37.78 | -122.39 |
| 1 | San Jose | 37.32 | -121.87 |
| 2 | Palo Alto | 37.44 | -122.14 |
| 3 | Gilroy | 37.00 | -121.56 |


```type(df)```

To make this a GeoDataFrame, specify a coordinate reference system with the ```crs``` argument and use the ```points_from_xy()``` function to create the geometry data from the **Longitude** and **Latitude** columns.

```cities = gpd.GeoDataFrame(df, crs=4326, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))```

```type(cities)```
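
As a quick check on the step above, the sketch below rebuilds the same four-city table and confirms what the conversion produced. It only uses values already shown in this section. Note the argument order: ```points_from_xy()``` takes x (longitude) first and y (latitude) second.

```python
import pandas as pd
import geopandas as gpd

df = pd.DataFrame({
    "City": ["San Francisco", "San Jose", "Palo Alto", "Gilroy"],
    "Latitude": [37.78, 37.32, 37.44, 37.00],
    "Longitude": [-122.39, -121.87, -122.14, -121.56],
})

# points_from_xy expects x (longitude) first, then y (latitude)
cities = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

print(type(cities).__name__)  # class name of the new object
print(cities.crs.to_epsg())   # EPSG code of the assigned CRS
print(cities.geometry.head()) # the new geometry column
```

Swapping the two arguments is a common mistake; it would place these cities far from California without raising an error.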

### Reading files

The ```read_file()``` function creates a GeoDataFrame from an input file or URL. 

Create a GeoDataFrame named ```counties``` from a shapefile of California county boundaries.

```counties = gpd.read_file('data/tl_2020_06_county20.shp')```

```counties```

### Inspecting the GeoDataFrame

View the Data Types

```counties.dtypes```

View geometries and types

```counties.geometry```

```counties.geom_type```
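
The same inspection calls can be tried without any data files. The sketch below uses a small made-up GeoDataFrame (the names and shapes are purely illustrative) with three different geometry types, so the output of ```geom_type``` is easier to read than it is for a layer that contains only polygons.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Illustrative stand-in data, not part of the workshop files
gdf = gpd.GeoDataFrame(
    {"name": ["a point", "a line", "a polygon"]},
    geometry=[
        Point(0, 0),
        LineString([(0, 0), (1, 1)]),
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    ],
)

print(gdf.dtypes)     # the geometry column reports the dtype "geometry"
print(gdf.geometry)   # the GeoSeries holding the shapes
print(gdf.geom_type)  # one geometry type name per row
```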

View the GeoDataFrame on a map using the ```plot()``` and ```explore()``` functions.

Preview the first five rows of the attribute table with ```head()```

```counties.head()```

Create a new GeoDataFrame named ```railways``` from a GeoJSON file of railway lines in the San Francisco Bay Area.

```railways = gpd.read_file('data/Passenger_Railways_2019.geojson')```

Create a new GeoDataFrame named ```stations``` from ```data/Passenger_Rail_Stations_2019.geojson```

### Projections and Coordinate Reference Systems 

To view the projection or Coordinate Reference System (CRS) of a GeoDataFrame, use the ```crs``` property.

```railways.crs```

```stations.crs```

```counties.crs```

Transform the ```counties``` GeoDataFrame into WGS84 (EPSG:4326) using the ```to_crs()``` function, then view the CRS again.

```counties = counties.to_crs(4326)```

```counties.crs```
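
To see what reprojection does to the coordinates themselves, here is a minimal sketch on a single point. The target CRS, EPSG:26910 (NAD83 / UTM zone 10N, in meters), is an assumption chosen for illustration because it covers the Bay Area; the workshop itself only reprojects to EPSG:4326.

```python
import geopandas as gpd
from shapely.geometry import Point

sf = gpd.GeoDataFrame(
    {"City": ["San Francisco"]},
    geometry=[Point(-122.39, 37.78)],
    crs="EPSG:4326",
)

# to_crs() returns a new GeoDataFrame; the original keeps its CRS
sf_utm = sf.to_crs(epsg=26910)

print(sf.crs.to_epsg(), sf.geometry.iloc[0])          # degrees
print(sf_utm.crs.to_epsg(), sf_utm.geometry.iloc[0])  # meters
```

Because ```to_crs()``` does not modify the GeoDataFrame in place, the result has to be assigned to a variable, as in ```counties = counties.to_crs(4326)``` above.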

### Finding and Selecting Data

This section covers searching and filtering the GeoDataFrame by rows, columns, or cell values.

View the first 7 rows of the ```counties``` GeoDataFrame

```counties[:7]```

View the **GEOID20** column

```counties[['GEOID20']]```


View the **GEOID20** and **NAME20** columns

```counties[['GEOID20','NAME20']]```

View the first 18 rows of the **GEOID20** and **NAME20** columns (the slice ```[0:18]``` excludes its end position)

```counties[['GEOID20','NAME20']][0:18]```

View the first row of ```counties```
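
One possible way to do this is sketched below on a small stand-in GeoDataFrame (the ```gdf``` name and its values are made up for illustration); the same indexing should apply to ```counties```. The sketch also shows why a slice such as ```[0:2]``` returns two rows rather than three.

```python
import geopandas as gpd
from shapely.geometry import Point

# Stand-in data so the example runs without the shapefile
gdf = gpd.GeoDataFrame(
    {"name": ["A", "B", "C", "D"]},
    geometry=[Point(i, i) for i in range(4)],
)

first_row = gdf.iloc[[0]]   # double brackets: a one-row GeoDataFrame
first_record = gdf.iloc[0]  # single brackets: a pandas Series
first_two = gdf[0:2]        # positions 0 and 1; the end position is excluded

print(first_row)
print(first_record["name"])
print(len(first_two))
```

```gdf.head(1)``` is an equivalent way to get the one-row GeoDataFrame.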

### Selecting Data

Selecting data by columns

Subset the GeoDataFrame to contain only the **GEOID20**, **NAME20**, and **geometry** columns.

```counties = counties[['GEOID20', 'NAME20','geometry']]```

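Reload the shapefile and reproject it to WGS84 so that the following steps start again from the full dataset.

```
counties = gpd.read_file('data/tl_2020_06_county20.shp')
counties = counties.to_crs(4326)
```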

Use the ```rename()``` function and provide a dict of column names to update

```counties.rename(columns={'NAME20':'County','GEOID20':'FIPS'}, inplace=True)```
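
The effect of ```inplace=True``` can be seen in a small sketch. The two-row table below is a stand-in with placeholder point geometries rather than real county boundaries. Without ```inplace=True```, ```rename()``` returns a renamed copy and leaves the original untouched.

```python
import geopandas as gpd
from shapely.geometry import Point

# Stand-in for the counties layer; geometries are placeholders
counties = gpd.GeoDataFrame(
    {"GEOID20": ["06075", "06085"], "NAME20": ["San Francisco", "Santa Clara"]},
    geometry=[Point(-122.4, 37.8), Point(-121.9, 37.3)],
    crs="EPSG:4326",
)

# Returns a new GeoDataFrame; counties itself is unchanged
renamed = counties.rename(columns={"NAME20": "County", "GEOID20": "FIPS"})

print(list(counties.columns))
print(list(renamed.columns))
```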

Plot the data using ```explore```

Selecting data by cell values

From ```railways```, create a new GeoDataFrame ```caltrainRail``` containing only data where the **operator** column equals Caltrain.

```caltrainRail = railways.loc[railways['operator']=='Caltrain']```

From ```stations```, create a new GeoDataFrame ```caltrainStations``` containing only data where the **agency_nm** column equals CALTRAIN.

```caltrainStations = stations.loc[stations['agency_nm']=='CALTRAIN']```

You can search for a specified string within a cell value with ```str.contains()```

```stations.loc[stations['station_na'].str.contains('Oakland')]```

Use the ```isin()``` function to select only rows matching San Francisco, San Mateo, or Santa Clara from the **County** column.

```SFSMSC = counties.loc[counties['County'].isin(['San Mateo','Santa Clara','San Francisco'])]```
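
The three selection techniques in this section (exact match, substring match, and membership in a list) can be compared side by side. The sketch below uses a made-up four-row stations table with approximate coordinates; only the column names **station_na** and **agency_nm** come from this workshop.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative rows only; not the contents of the workshop GeoJSON
stations = gpd.GeoDataFrame(
    {
        "station_na": ["Oakland Coliseum", "Palo Alto", "West Oakland", "Gilroy"],
        "agency_nm": ["BART", "CALTRAIN", "BART", "CALTRAIN"],
    },
    geometry=[
        Point(-122.20, 37.75),
        Point(-122.16, 37.44),
        Point(-122.29, 37.80),
        Point(-121.57, 37.00),
    ],
    crs="EPSG:4326",
)

exact = stations.loc[stations["agency_nm"] == "CALTRAIN"]                  # whole value must match
partial = stations.loc[stations["station_na"].str.contains("Oakland")]     # substring match
listed = stations.loc[stations["station_na"].isin(["Gilroy", "Palo Alto"])]  # any value in the list

print(exact["station_na"].tolist())
print(partial["station_na"].tolist())
print(listed["station_na"].tolist())
```

All three comparisons are case-sensitive, so ```'Caltrain'``` and ```'CALTRAIN'``` are different values.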

Plot the data. Use ```edgecolor='black'``` to view the boundary lines

Notice this map shows the Farallon Islands off the coast of San Francisco. We can find and remove them from the GeoDataFrame by first using the ```explode()``` function, which splits each multi-part geometry into separate rows, one per part.

```SFSMSC = SFSMSC.explode(index_parts=False)```

Extract only the first 3 rows of the GeoDataFrame

```SFSMSC = SFSMSC[0:3]```
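
Taking the first rows works here because of how the parts happen to be ordered. As a hedged alternative, the sketch below shows what ```explode()``` does to a multi-part geometry and then keeps the largest part by area instead of relying on row order. The shapes are made-up squares, and selecting by area is an assumption for illustration, not the approach used in this workshop.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# A made-up "county" consisting of a large mainland and a small island
mainland = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
island = Polygon([(6, 6), (7, 6), (7, 7), (6, 7)])
county = gpd.GeoDataFrame(
    {"County": ["Example"]},
    geometry=[MultiPolygon([mainland, island])],
)

# One row per polygon; the attribute values are repeated on each row.
# reset_index gives every part a unique label so it can be selected on its own.
parts = county.explode(index_parts=False).reset_index(drop=True)

# Keep only the part with the largest area
largest = parts.loc[[parts.geometry.area.idxmax()]]

print(len(county), len(parts), len(largest))
print(largest.geometry.area.iloc[0])
```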

Plot the data

```SFSMSC.plot(edgecolor='black', color='gray', alpha=0.3)```

### Plotting Data

Plot the boundaries, railways, and stations.

```SFSMSC.plot(color='gray', alpha=0.3)```

```caltrainStations.plot(color='red')```

```caltrainRail.plot(color='black')```

Plot all of the data in the same figure

```fig, ax = plt.subplots(figsize=(10, 10))```

```SFSMSC.plot(ax=ax, alpha=0.3, color="gray")```

```caltrainRail.plot(ax=ax, color='black')```

```caltrainStations.plot(ax=ax, alpha=0.5, color='red')```

Add labels from the ```cities``` GeoDataFrame

```fig, ax = plt.subplots(figsize=(10, 10))```

```SFSMSC.plot(ax=ax, alpha=0.3, color="gray", edgecolor='white')```

```caltrainRail.plot(ax=ax, color='black')```

```caltrainStations.plot(ax=ax, alpha=0.5, color='red')```

```
for x, y, label in zip(cities.geometry.x, cities.geometry.y, cities['City']):
    ax.annotate(label, xy=(x, y), xytext=(3, 3), textcoords="offset points")
ax.set_axis_off()
```

### Writing files

Use the ```to_file()``` function to write a GeoDataFrame to a file.

Create a shapefile from ```caltrainStations```

```caltrainStations.to_file('data/caltrainStations.shp')```

Save the map as a png file


```fig.savefig("data/caltrain.png")```

Save the ```SFSMSC```, ```caltrainStations```, and ```caltrainRail``` layers to a GeoPackage

```SFSMSC.to_file("data/caltrain.gpkg", layer='counties')```

```caltrainStations.to_file("data/caltrain.gpkg", layer='stations')```

```caltrainRail.to_file("data/caltrain.gpkg", layer='railways')```

### Reading Data from a CSV file

First, create a Pandas DataFrame using ```read_csv()```

This CSV file contains information (including latitude and longitude points) about public libraries in San Francisco.

```df = pd.read_csv('data/libraries.csv')```

```type(df)```

Use the ```points_from_xy()``` function to assign geometries from the **longitude** and **latitude** columns.

```
libraries = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['longitude'], df['latitude']),
    crs='EPSG:4326')
```

Write the data to a new file

```libraries.to_file('data/libraries.geojson')```
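
Real CSV files often contain rows with missing coordinates. The sketch below shows one way to handle that before building the geometry. It reads a small made-up CSV from a string; the rows are illustrative, and dropping incomplete rows is an assumption about how you might want to treat them, not a step from this workshop.

```python
import io

import geopandas as gpd
import pandas as pd

# Made-up CSV content standing in for data/libraries.csv
csv_text = """name,latitude,longitude
Example Branch A,37.78,-122.42
Example Branch B,37.76,-122.43
Branch With No Coordinates,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Drop rows that lack either coordinate before creating points
df = df.dropna(subset=["latitude", "longitude"])

libraries = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

print(len(libraries))
print(libraries[["name", "geometry"]])
```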