# Intro to `geopandas`

The beauty of `geopandas` is that it lets us manage spatial data using `pandas`, the Python Data Analysis Library: https://pandas.pydata.org

Let's start by installing the package:

In [None]:
# Example
# import sys
# !conda install --yes --prefix {sys.prefix} numpy=1.22
# !conda install --yes --prefix {sys.prefix} geopandas
import geopandas as gpd

## A quick recap on `pandas`

Pandas is a Python package "providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive". 

It provides us with a range of capabilities:

- DataFrame object for data manipulation with integrated indexing.
- Tools for reading and writing data between in-memory data structures and different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, fancy indexing, and subsetting of large data sets.
- Data structure column insertion and deletion.
- Group by engine allowing split-apply-combine operations on data sets.
- Data set merging and joining.
- Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
- Time-series functionality: date range generation and frequency conversions, moving window statistics, moving window linear regressions, date shifting and lagging.
- Data filtration.
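
A few of the capabilities listed above can be sketched in a handful of lines. This is only a minimal illustration: the `buildings` table, its column names and its values are made up for the example and do not come from the tutorial data.

```python
import numpy as np
import pandas as pd

# Hypothetical campus buildings; the names and numbers are invented for illustration.
buildings = pd.DataFrame(
    {
        "name": ["Hall A", "Hall B", "Library", "Gym"],
        "use": ["teaching", "teaching", "study", "sport"],
        "floor_area_m2": [1200.0, np.nan, 3400.0, 2100.0],
    }
)

# Integrated handling of missing data: fill the gap with the column mean.
mean_area = buildings["floor_area_m2"].mean()
buildings["floor_area_m2"] = buildings["floor_area_m2"].fillna(mean_area)

# Column insertion: derive a new column from an existing one.
buildings["floor_area_km2"] = buildings["floor_area_m2"] / 1e6

# Label-based selection: rows where `use` equals "teaching".
teaching = buildings.loc[buildings["use"] == "teaching", ["name", "floor_area_m2"]]

# Split-apply-combine: total floor area per use category.
area_by_use = buildings.groupby("use")["floor_area_m2"].sum()

print(teaching)
print(area_by_use)
```

The same column-based style carries over directly to `geopandas`, which is why the recap is useful here.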




## So what is special about `geopandas`?

"GeoPandas is a project to add support for geographic data to pandas objects. It currently implements GeoSeries and GeoDataFrame types which are subclasses of pandas.Series and pandas.DataFrame respectively. GeoPandas objects can act on shapely geometry objects and perform geometric operations."

See the GitHub repository for more information: https://github.com/geopandas/geopandas

The GeoPandas dataframe holds a geometry column, which enables Cartesian geometry operations (meaning it can interpret pairs of numerical coordinates in space).

The coordinate reference system (CRS) can be stored as an attribute on an object, and is automatically set when loading from a file. Objects may be transformed to new coordinate systems with the `to_crs()` method.
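
As a rough sketch of how the geometry column and the CRS attribute fit together, the example below builds a tiny GeoDataFrame by hand instead of loading a file. The two points and their names are hypothetical, chosen only so the block runs on its own.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical points given as (longitude, latitude) in decimal degrees.
points = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(-77.3, 38.8), Point(-77.0, 38.9)],
    crs="EPSG:4326",
)

# The CRS is stored as an attribute on the object.
print(points.crs)

# to_crs() returns a reprojected copy; here the target is EPSG:3857 (units in meters).
points_m = points.to_crs("EPSG:3857")
print(points_m.geometry)
```

Note that `to_crs()` leaves `points` untouched and returns a new object, which is why the result is assigned to a new name.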

Here we will cover the following basic operations:

- Reading data to a geopandas dataframe
- Manipulating column data 
- Creating a new column
- Changing coordinate reference systems
- Writing data from a geopandas dataframe to a file


### Reading vector shapefile data to a `geopandas` dataframe

Let's read in the shapefile for GMU. 

To load this in, we first need to find the current folder. We can do this with the `getcwd` function from the `os` package, which we used previously:

In [None]:
import os

## getcwd stands for 'get current working directory'
current_dir = os.getcwd()

print(current_dir)    

The `current_dir` variable is merely a string of the directory path which we can manipulate.

In [None]:
## getcwd stands for 'get current working directory'
current_dir = os.getcwd()

path = current_dir + '/../'+  'data'+ '/gmu.shp'

print(path)    
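
Joining path pieces with `+` works, but it hard-codes the `/` separator. A more portable sketch, using only the standard library, is shown below; it assumes the same relative layout as above (a `data` folder one level up containing `gmu.shp`).

```python
import os

current_dir = os.getcwd()

# os.path.join inserts the correct separator for the operating system.
path = os.path.join(current_dir, "..", "data", "gmu.shp")

# normpath collapses the ".." so the path is easier to read.
path = os.path.normpath(path)
print(path)

# Check that the file exists before trying to read it.
print(os.path.exists(path))
```

Checking `os.path.exists(path)` first gives a clearer signal than a read error if the notebook is run from a different folder.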

Now we're ready to read in the data using the path we've specified.

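We already imported `geopandas` as `gpd` at the top of this notebook, so it is ready to use.

Then we can use the GeoPandas function `read_file` and provide the `path` to the shapefile we want to load.

We don't need to state the coordinate reference system here: `read_file` picks it up from the file itself (for a shapefile, the accompanying `.prj` file), and we can check it afterwards via the `crs` attribute. This tutorial assumes the file is stored in `epsg:4326`.


In [None]:
#load the file as the variable named data
data = gpd.read_file(path)

#check which crs was read from the file
print(data.crs)
print(data)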

## Basic `geopandas` functions

`geopandas` provides us with some great functionality. For example, we can change the CRS as follows:

In [None]:
# The previous crs was in decimal degrees (epsg:4326), so let's change to meters ('epsg:3857')
data = data.to_crs('epsg:3857')
print(data)

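Now that we are working with a CRS in meters, we can take the area of this shape as follows. Bear in mind that `epsg:3857` (Web Mercator) stretches areas away from the equator, so the result is approximate; an equal-area or local projected CRS would give a more accurate figure.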

In [None]:
# Due to our current CRS, the area will be in square meters
data['area'] = data['geometry'].area 
print(data)
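
To get a feel for how much the choice of CRS affects an area calculation, here is a small comparison on a made-up rectangle. The rectangle's coordinates are hypothetical, and `EPSG:6933` (a global equal-area projection) is used purely as a point of comparison; it is not part of the original workflow.

```python
import geopandas as gpd
from shapely.geometry import box

# A hypothetical 0.01 x 0.01 degree rectangle at roughly 38.8 degrees north.
shape = gpd.GeoSeries([box(-77.31, 38.82, -77.30, 38.83)], crs="EPSG:4326")

# Web Mercator (EPSG:3857): units are meters, but areas are stretched with latitude.
area_web_mercator = shape.to_crs("EPSG:3857").area.iloc[0]

# EPSG:6933 is a global equal-area projection, used here only for comparison.
area_equal_area = shape.to_crs("EPSG:6933").area.iloc[0]

ratio = area_web_mercator / area_equal_area

print(area_web_mercator / 1e6, "square km in EPSG:3857")
print(area_equal_area / 1e6, "square km in EPSG:6933")
print(ratio)  # greater than 1: Web Mercator overstates the area at this latitude
```

For a quick tutorial the Web Mercator figure is fine, but for reporting real areas a CRS suited to the region is the safer choice.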

The beauty is we can manipulate this as a normal pandas dataframe.

So let's, for example, convert our square meters into square kilometers (which requires us to divide by 1e6).

Remember, we can select a column by using square brackets to index (e.g. `data['area']` gets the area column), and then create a new column this way too (e.g. `data['area_km2']` is the new column we wish to make).

In [None]:
data['area_km2'] = data['area'] / 1e6
print(data['area_km2'])

We can see the whole dataframe structure with our new column, as follows:

In [None]:
print(data)

We are able to loop over the rows of a GeoDataFrame the same way we would a normal DataFrame, by using the `iterrows()` method, as follows:

In [None]:
for row in data.iterrows():
    print(row)

This means we can access and print specific parts of each row. 

The important thing to remember is that each item produced by `iterrows()` is a pair: first the row index (here it's a zero) and then the actual row information.

For example, we can break out the row index here using `[0]`, and the row information using `[1]`:

In [None]:
for row in data.iterrows():
    
    ##this will print our row index
    print(row[0]) 

    ##this will print our row information
    print(row[1])
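
Because each item is an (index, row) pair, it can also be unpacked straight into two names in the `for` statement, which avoids the `[0]` and `[1]` lookups. The sketch below uses a small made-up GeoDataFrame called `example` in place of `data`, so that it runs without the GMU file.

```python
import geopandas as gpd
from shapely.geometry import box

# A small, made-up GeoDataFrame standing in for `data`.
example = gpd.GeoDataFrame(
    {"name": ["square", "rectangle"]},
    geometry=[box(0, 0, 1000, 1000), box(0, 0, 2000, 500)],
    crs="EPSG:3857",
)

areas = {}

# Each item from iterrows() is an (index, row) tuple, so it can be unpacked directly.
for index, row in example.iterrows():
    areas[row["name"]] = row["geometry"].area
    print(index, row["name"], row["geometry"].area)
```

Here `index` plays the role of `row[0]` and `row` plays the role of `row[1]` from the cell above.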

We can then access just the geometry as follows:

In [None]:
for row in data.iterrows():
    
    ##this will print our row geometry
    print(row[1]['geometry'])

And we can carry out any manipulations we want in this loop, such as taking the area (let's reuse this calculation, as we used it before and it will be familiar):

In [None]:
for row in data.iterrows():
    
    ##this will calculate the area of our row geometry in square kilometers
    area_km2 = (row[1]['geometry'].area / 1e6)
    
    ##this will round our area to 1 decimal place
    area_km2 = round(area_km2, 1)
    
    print("The area of GMU campus is {} square kilometers".format(area_km2))
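
Looping is handy for printing or for row-by-row logic, but for a plain calculation the same result can usually be obtained on the whole column at once. The comparison below is a sketch on a made-up GeoDataFrame (`example`, with invented shapes), not on the GMU data.

```python
import geopandas as gpd
from shapely.geometry import box

# A small, made-up GeoDataFrame standing in for `data`.
example = gpd.GeoDataFrame(
    {"name": ["square", "rectangle"]},
    geometry=[box(0, 0, 1000, 1000), box(0, 0, 3000, 500)],
    crs="EPSG:3857",
)

# Vectorised: compute and round the area for all rows at once.
example["area_km2"] = (example.geometry.area / 1e6).round(1)

# Loop version, for comparison with the column above.
looped = [round(row["geometry"].area / 1e6, 1) for _, row in example.iterrows()]

print(example[["name", "area_km2"]])
print(looped)
```

Both approaches give the same numbers; the column version is shorter and tends to scale better as the number of rows grows.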