# (OPTIONAL) Inspect data & load to PostGIS using Python and GeoPandas

It is also possible to use the Python ecosystem to inspect spatial data and load it into PostGIS. This section is optional, as the result is the same as when using `ogr2ogr`; it is simply a different set of tools for the same goal. Depending on your preferences and your existing technology stack, you might prefer one way over the other. In the following, Python is used to explore the districts dataset and load it into PostGIS.

We will use [GeoPandas](https://geopandas.org/en/stable/), which is built on the widely used Python package pandas. GeoPandas interfaces with many other specialized packages of the Python geo-ecosystem, so reading, inspecting, and plotting spatial data takes only a few lines of code.

# Explore districts data
Run the following cells to read and visualize the data.

In [None]:
import geopandas

# Reading data is straightforward with GeoPandas. Nice to know: under the hood GeoPandas
# delegates file reading and writing to a specialized package (Fiona, or pyogrio in newer releases).
districts_data = geopandas.read_file("./data/20220405_statistischeQuartiereZurich/stzh.adm_statzonen_v.shp")

In [None]:
# Using .head(N) we can display the first N rows of data.
districts_data.head(3)

In [None]:
# GeoPandas makes it easy to obtain all kinds of information about the data we loaded.
print(f'Nr of features: {len(districts_data)}')
print(f'Coordinate reference system: {districts_data.crs}')
print(f'Nr of attribute columns: {len(districts_data.columns)}')
print(40*'-')
print('Column names:')
for column in districts_data.columns:
    print(column)

In [None]:
# Using .plot() generates static visualizations. 
# It uses the famous matplotlib package under the hood. 
districts_data.plot()

In [None]:
# There is even the possibility to visualize data in an interactive way using .explore().
# This is possible thanks to Geopandas making use of the folium python package.
districts_data.explore(column='stzname', legend=False)

# Load the data into PostGIS
GeoPandas relies on packages such as SQLAlchemy, which specialize in interacting with databases. The first step is to create a connection string: a plain text value that follows a fixed convention and contains all the information needed to connect to the database. This connection string is then used to create an engine (the object that manages the database connection in the code below), which GeoPandas uses to load the data into PostGIS.
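
The convention behind the connection string is `postgresql://user:password@host:port/database`. Because characters such as `@` or `/` act as separators in this format, a password that contains them has to be percent-encoded first. The following is a minimal sketch of that idea using only the Python standard library. The helper name `build_connection_string` and all example values are hypothetical and not part of the sandbox setup.

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, database_name):
    """Return a PostgreSQL connection string with user and password percent-encoded.

    Hypothetical helper for illustration; safe='' makes quote() also encode '/'.
    """
    encoded_user = quote(user, safe='')
    encoded_password = quote(password, safe='')
    return f'postgresql://{encoded_user}:{encoded_password}@{host}:{port}/{database_name}'


# Made-up example values: the password contains '@' and '/', which would
# otherwise be mistaken for separators in the connection string.
example_connection_string = build_connection_string(
    'gis_user', 'p@ss/word', 'localhost', '5432', 'sandbox'
)
print(example_connection_string)
```

If the password consists only of letters and digits, the encoded string is identical to the one built by plain string formatting, so the extra step only matters for passwords with special characters.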

**Your turn:**
- Replace USER, PASSWORD, HOST, PORT and DATABASE_NAME in the cell below with the connection information of the PostGIS sandbox component. Make sure to keep the quotes (') so that Python reads the connection information as strings (text).
- Run both cells below to load the data into the database.
- Once again you can use pgAdmin to check the newly created table in the database.

In [None]:
user = 'USER'
password = 'PASSWORD'
host = 'HOST'
port = 'PORT'
database_name = 'DATABASE_NAME'

connection_string = f'postgresql://{user}:{password}@{host}:{port}/{database_name}'
print(f'{connection_string=}')

In [None]:
from sqlalchemy import create_engine

table_name = 'zh_districts_from_geopandas' 
print(f'Start loading to PostGIS table with name {table_name}...')
engine = create_engine(connection_string)
districts_data.to_postgis(table_name, engine, if_exists='replace', index=False)
print('Successfully loaded')