# Reading data into GeoPandas

## Review

The basic data structure in GeoPandas is the GeoDataFrame.

A GeoDataFrame is simply a regular Pandas DataFrame with the addition of a GeoSeries.

A Series in Pandas is analogous to a field in a database, a column in a table, or an attribute in a feature dataset.

A GeoSeries is a special type of column that contains vector geometry.

The geometries are handled internally as Shapely objects.
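As a minimal sketch of the points above, the following builds a tiny GeoDataFrame by hand and inspects the types involved. The attribute values and coordinates are made up for illustration, and the CRS is assumed to be EPSG:4326.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical attribute values and coordinates, for illustration only
nests = gpd.GeoDataFrame(
    {"nest_id": [1, 2], "status": ["ACTIVE NEST", "INACTIVE NEST"]},
    geometry=[Point(-104.9, 40.1), Point(-104.7, 40.3)],
    crs="EPSG:4326",  # assumed CRS for this sketch
)

status_column = nests["status"]          # an ordinary attribute column
geometry_column = nests.geometry         # the special geometry column
first_geometry = nests.geometry.iloc[0]  # a single geometry value

print(type(status_column).__name__)    # a pandas Series
print(type(geometry_column).__name__)  # a GeoSeries
print(type(first_geometry).__name__)   # a Shapely Point
```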

## Reading from single layer files

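GeoPandas uses Fiona under the hood to read geospatial data into GeoDataFrames. (Since GeoPandas 1.0 the default I/O engine is pyogrio; Fiona is still supported via `engine="fiona"`.)

GeoPandas can also read directly from a PostGIS database.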

### Reading Shapefiles


In [None]:
%matplotlib inline
import geopandas as gpd

In [None]:
raptors = gpd.read_file("data/Raptor_Nests.shp")

View the GeoDataFrame as an attribute table

In [None]:
raptors.tail(10)

View the GeoDataFrame as a map

In [None]:
raptors.plot(figsize=(10,10))

View the map with colors determined by a column value

In [None]:
raptors.plot(column="recentstat", legend=True, figsize=(10, 10))

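Get basic information about a GeoDataFrame using the `info` method.

Remember that GeoPandas is based on Pandas, Pandas stores most columns in NumPy arrays behind the scenes, and NumPy is written in C and uses fixed-width, C-style data types.

Because of this, the data type (Dtype) column displays types such as `int64` and `float64`. Strings have no fixed-width equivalent, so text columns are typically stored as references to Python objects and reported as `object`. Depending on the driver and library versions, date columns may appear as `object` or as `datetime64`.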

In [None]:
raptors.info()

You can also get information about the GeoDataFrame's Coordinate Reference System

In [None]:
raptors.crs

We can, of course, view the same data in the UTM Zone 13 CRS if we have a copy of the data in that CRS.

In [None]:
raptors_utm = gpd.read_file("data/Raptor_UTM.shp")

In [None]:
raptors_utm.crs

If we plot it, the map looks the same, but the values on the X and Y axes are different because the coordinates are now in UTM (meters rather than degrees).

In [None]:
raptors_utm.plot(column="recentstat", legend=True, figsize=(10, 10))

With GeoPandas, however, it is very easy to convert to a different coordinate reference system using the GeoDataFrame's `to_crs` method.

In [None]:
raptors_utm2 = raptors.to_crs(epsg=26913)

In [None]:
raptors_utm2.crs

In [None]:
raptors_utm2.plot(column="recentstat", legend=True, figsize=(10, 10))
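The EPSG code passed to `to_crs` above is not arbitrary: 26913 is NAD83 / UTM zone 13N. As a rough sketch, the zone number can be derived from a longitude and the EPSG code from the zone. The helper names and the sample longitude below are assumptions made for illustration, not part of GeoPandas.

```python
import math


def utm_zone(lon_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide and start at 180 degrees west
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def nad83_utm_epsg(zone):
    """EPSG code for NAD83 / UTM zone <zone>N (defined for zones 1-23)."""
    if not 1 <= zone <= 23:
        raise ValueError("NAD83 UTM EPSG codes cover zones 1 to 23 only")
    return 26900 + zone


def wgs84_utm_epsg(zone, northern=True):
    """EPSG code for WGS 84 / UTM zone <zone>, north or south."""
    return (32600 if northern else 32700) + zone


lon = -105.0  # hypothetical longitude, assumed to lie within the study area
zone = utm_zone(lon)
print(zone, nad83_utm_epsg(zone), wgs84_utm_epsg(zone))
```

The resulting code could then be passed along as `raptors.to_crs(epsg=nad83_utm_epsg(zone))`.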

### Reading data from a GeoJSON file

Reading a GeoJSON file is not really different from reading a shapefile.

In [None]:
raptors_jsn = gpd.read_file("data/wildlife_raptor.geojson")

In [None]:
raptors_jsn.crs

In [None]:
raptors_jsn.plot(figsize=(10,10))

In [None]:
raptors_jsn.info()

By now you may be thinking: this looks great for points, but what about polygons?

In [None]:
buowl = gpd.read_file("data/wildlife_buowl.geojson")

In [None]:
buowl.crs

In [None]:
buowl.info()

In [None]:
buowl.plot(column="recentstatus", legend=True, figsize=(10,10))

In [None]:
buowl
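To see why points and polygons read the same way, it can help to look at the file format itself. The sketch below parses a small hand-written GeoJSON FeatureCollection with the standard library: each feature's `properties` become attribute columns and its `geometry` becomes the geometry value. The property names, values, and coordinates are invented for illustration and are not taken from the files used above.

```python
import json

# A hypothetical two-feature collection: one point and one polygon
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"nest_id": 1, "recentstatus": "ACTIVE NEST"},
     "geometry": {"type": "Point", "coordinates": [-104.9, 40.1]}},
    {"type": "Feature",
     "properties": {"nest_id": 2, "recentstatus": "INACTIVE NEST"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-104.8, 40.2], [-104.7, 40.2],
                                   [-104.7, 40.3], [-104.8, 40.2]]]}}
  ]
}
"""

collection = json.loads(geojson_text)

# Property keys correspond to the attribute columns of a GeoDataFrame
columns = sorted({key for f in collection["features"] for key in f["properties"]})

# Geometry types found in the file
geometry_types = {f["geometry"]["type"] for f in collection["features"]}

print(columns)
print(sorted(geometry_types))
```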

## Reading layers from a multi-layer file

Reading from a multi-layer file is not fundamentally different from reading a single-layer file; you just need to specify the name of the layer.

### Reading data from a geopackage

In [None]:
eagles = gpd.read_file("data/DJBasin.gpkg")

In [None]:
eagles.info()

With no `layer` argument, `read_file` reads the first layer in the GeoPackage.

But what if you want to read a specific layer?
What if you don't know which layers the file contains?

In [None]:
import fiona
fiona.listlayers("data/DJBasin.gpkg")
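A GeoPackage is an SQLite database, and the GeoPackage specification records its layers in a table named `gpkg_contents`. As a rough sketch of what a layer listing reads, the code below queries that table with the standard library. To stay self-contained it builds an in-memory stand-in with only two of the table's columns; the `example_tiles` row and the helper name `list_feature_layers` are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for a GeoPackage; a real file has more columns and tables
gpkg = sqlite3.connect(":memory:")
gpkg.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
gpkg.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("BAEA_Nests", "features"), ("example_tiles", "tiles")],  # second row is hypothetical
)


def list_feature_layers(connection):
    """Return the names of the vector (feature) layers in a GeoPackage."""
    rows = connection.execute(
        "SELECT table_name FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    )
    return [name for (name,) in rows]


feature_layers = list_feature_layers(gpkg)
print(feature_layers)
gpkg.close()
```

With a real file, the same helper would be called on `sqlite3.connect("data/DJBasin.gpkg")`.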

In [None]:
eagles = gpd.read_file("data/DJBasin.gpkg", layer="BAEA_Nests")

In [None]:
eagles.info()

In [None]:
eagles.plot(column="status", legend=True, figsize=(10,10))

If you want, you can read all of the layers at once into a dictionary that has the layer name as the key

In [None]:
layers = {}  # create an empty dictionary named layers
for lyr in fiona.listlayers("data/DJBasin.gpkg"):  # loop through the layers in the file
    # add an entry with the layer name as the key and the GeoDataFrame as the value
    layers[lyr] = gpd.read_file("data/DJBasin.gpkg", layer=lyr)

In [None]:
layers["BAEA_Nests"].plot()

### Read data from a file geodatabase

First, let's list the layers in the database.

In [None]:
fiona.listlayers("data/DJBasin.gdb")

Now we can read the layers just like we read the geopackage layer

In [None]:
gbh = gpd.read_file("data/DJBasin.gdb", layer="Heron_Rookeries")

In [None]:
gbh.crs

In [None]:
gbh.info()

In [None]:
gbh.plot(figsize=(10,10))

## Read data from a PostGIS database

To me this is the ultimate combination. PostGIS offers strong performance and the richest feature set of the storage options covered here. In addition, if you know basic spatial SQL, you can do a lot of data manipulation on the fly before the data even comes into GeoPandas, which can save you many data-cleaning steps.

Reading data from a PostGIS database requires a database connection object, which is easily created with the psycopg2 module that we installed along with GeoPandas.

First we import psycopg2

In [None]:
import psycopg2

Next we create a database connection object

In [None]:
try:
    conn = psycopg2.connect("dbname='korimage_enterprise' user='korimage_reader' host='www.korimageo.com' password='reader'")
except psycopg2.OperationalError as err:
    # Catch only connection failures and report the reason
    print("I am unable to connect to the database:", err)

Finally, we submit a SQL query to the database using the GeoPandas `read_postgis` function.

In [None]:
eagles = gpd.read_postgis("SELECT gid, lat_y_dd AS latitude, long_x_dd AS longitude, status, nest_id, geom FROM dj_eagle WHERE gid <=70 AND status = 'ACTIVE NEST'", conn)

In [None]:
eagles.info()

In [None]:
eagles.plot()
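When the filter values change from run to run, it is usually safer to pass them as query parameters than to paste them into the SQL string. The sketch below builds the same query with psycopg2-style named placeholders; `read_postgis` accepts them through its `params` argument. The helper name `build_eagle_query` is hypothetical, and the final call is left as a comment because it needs a live connection.

```python
def build_eagle_query(max_gid, status):
    """Return (sql, params) for the eagle nest query with named placeholders."""
    sql = (
        "SELECT gid, lat_y_dd AS latitude, long_x_dd AS longitude, "
        "status, nest_id, geom "
        "FROM dj_eagle "
        "WHERE gid <= %(max_gid)s AND status = %(status)s"
    )
    # The database driver substitutes these values safely at execution time
    params = {"max_gid": max_gid, "status": status}
    return sql, params


sql, params = build_eagle_query(70, "ACTIVE NEST")
print(sql)
print(params)

# With an open connection object named conn:
# eagles = gpd.read_postgis(sql, conn, geom_col="geom", params=params)
```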

In [None]:
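import pandas as pd
# For comparison, run a similar query with plain Pandas: the result is an
# ordinary DataFrame, so the geom column is not parsed into geometry objects
eagle_table = pd.read_sql_query("SELECT gid, lat_y_dd AS latitude, long_x_dd AS longitude, status, nest_id, geom FROM dj_eagle WHERE gid <=70", conn)

In [None]:
# Without geometry, plot() draws the numeric columns as a line chart, not a map
eagle_table.plot()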

In [None]:
%who_ls GeoDataFrame