# Explore data & load to PostGIS using ogr2ogr

In this section we will use the GDAL command line utility [ogrinfo](https://gdal.org/programs/ogrinfo.html) to explore the datasets and make sure they match our expectations. The following commands are run on the command line. This JupyterLab setup provides you with a Linux bash shell with the necessary commands configured.

Let's open the terminal and navigate to the folder of this story (the same folder as this Jupyter notebook). Use the commands `pwd` (shows where you currently are), `ls` (lists folder and file names) and `cd` (changes directory). Good to know: when using `cd`, you can start typing and press Tab for autocompletion.
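As a minimal sketch of such a navigation session: the `data` folder name is taken from the paths used later in this story, and the check around it is only there so the snippet also runs from a folder that has no `data` subfolder.

```shell
# Print the current working directory
pwd

# List the files and folders it contains
ls

# If a data folder exists here, change into it, list its content
# and go back up one level
if [ -d "./data" ]; then
  cd ./data
  ls
  cd ..
fi
```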

# Explore road network data

Let's now use ogrinfo to explore the road network shapefile in read-only mode (`-ro`), printing only summary information (`-so`). Run the following command in the terminal:

```shell
ogrinfo -ro -so "./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"
```

![open terminal](./story_images/open_terminal.gif)

The output lists all layers in the data source. Not surprisingly, the shapefile contains only a single layer:
```
INFO: Open of `./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp'
     using driver `ESRI Shapefile' successful.
1: taz_mm.tbl_routennetz (Line String)
```

When a layer is specified, ogrinfo provides useful information about this specific layer. Run the following in the terminal:

```shell
ogrinfo -ro -so "./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp" "taz_mm.tbl_routennetz"
```

The terminal should show the following output:
```
INFO: Open of `./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp'
      using driver `ESRI Shapefile' successful.

Layer name: taz_mm.tbl_routennetz
Metadata:
  DBF_DATE_LAST_UPDATE=2022-04-05
Geometry: Line String
Feature Count: 40065
Extent: (2676247.120400, 1241239.066500) - (2689662.340100, 1254306.994900)
Layer SRS WKT:
PROJCRS["CH1903+ / LV95",
    BASEGEOGCRS["CH1903+",
        DATUM["CH1903+",
            ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4150]],
    CONVERSION["Swiss Oblique Mercator 1995",
        METHOD["Hotine Oblique Mercator (variant B)",
            ID["EPSG",9815]],
        PARAMETER["Latitude of projection centre",46.9524055555556,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8811]],
        PARAMETER["Longitude of projection centre",7.43958333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8812]],
        PARAMETER["Azimuth of initial line",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8813]],
        PARAMETER["Angle from Rectified to Skew Grid",90,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8814]],
        PARAMETER["Scale factor on initial line",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8815]],
        PARAMETER["Easting at projection centre",2600000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8816]],
        PARAMETER["Northing at projection centre",1200000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8817]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping (large and medium scale)."],
        AREA["Liechtenstein; Switzerland."],
        BBOX[45.82,5.96,47.81,10.49]],
    ID["EPSG",2056]]
Data axis to CRS axis mapping: 1,2
id1: Real (20.0)
velo: Integer (6.0)
velostreif: String (5.0)
veloweg: Integer (6.0)
einbahn: String (5.0)
fuss: Integer (6.0)
name: String (150.0)
map_velo: Integer (6.0)
map_fuss: Integer (6.0)
se_anno_ca: String (254.0)
objectid: Real (38.0)
```

**How cool is that?** With this simple command we get a summary of the number of features (around 40k), the coordinate reference system (CH1903+ / LV95) and the attribute data (columns of the attribute table). The dataset also comes with a metadata document (metadaten.pdf), which contains valuable additional information about how to interpret the attributes.
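If the long coordinate reference system block gets in the way, the summary can be filtered with `grep`. The following is only a sketch: the variable names `SHP`, `LAYER` and `PATTERN` are made up for this example, and the check around the command simply skips it when `ogrinfo` or the file is not available.

```shell
# Path and layer name of the road network shapefile used above
SHP="./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"
LAYER="taz_mm.tbl_routennetz"

# Keep only the geometry type, feature count and extent lines
PATTERN='^(Geometry|Feature Count|Extent):'

if command -v ogrinfo >/dev/null 2>&1 && [ -f "$SHP" ]; then
  ogrinfo -ro -so "$SHP" "$LAYER" | grep -E "$PATTERN"
else
  echo "ogrinfo or $SHP not found, skipping"
fi
```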

**Your turn:**
- Using the metadata document, which attribute do you think is suitable for our bike indicator to distinguish which roads can be used by bikes (velos)?

# Let's learn about the district data
Let's now explore the district data with the same two-step approach: first find the name of the data layer, then use that name to get information about the layer. The data can be found in the data folder under `20220405_statistischeQuartiereZurich`.

**Your turn:**
- What is the geometry type of the features?
- How many features are there?
- What is the coordinate reference system?
- What columns does the attribute table have?
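The two steps could be sketched as follows. The file path is the one used in the load command further down. The sketch assumes that, as with the road network, the shapefile's layer name equals the file name without the `.shp` extension; the variable names are made up for this example.

```shell
DISTRICTS_SHP="./data/20220405_statistischeQuartiereZurich/stzh.adm_statzonen_v.shp"

# Assumption: the layer name is the file name without ".shp"
DISTRICTS_LAYER="$(basename "$DISTRICTS_SHP" .shp)"

if command -v ogrinfo >/dev/null 2>&1 && [ -f "$DISTRICTS_SHP" ]; then
  # Step 1: list the layers in the data source
  ogrinfo -ro -so "$DISTRICTS_SHP"
  # Step 2: print the summary of the layer
  ogrinfo -ro -so "$DISTRICTS_SHP" "$DISTRICTS_LAYER"
else
  echo "ogrinfo or $DISTRICTS_SHP not found, skipping"
fi
```

If the layer name printed in step 1 differs from the assumed one, use the printed name in step 2.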

# Conclusion data exploration
Upon exploration, you saw that both datasets are in the new Swiss coordinate reference system (CH1903+ / LV95), which is suitable for our use case at the city level. You also found that the attribute `velo` seems to be a good indicator of whether a road is suitable for a bike (1) or not (0). The data looks good, and you feel ready to load it into PostGIS.
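One way to double-check which values the `velo` attribute actually takes is to run a small SQL query through `ogrinfo`. This is a sketch that assumes the default OGR SQL dialect; the variable names are made up for this example.

```shell
SHP="./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"

# The layer name contains a dot, so it is double-quoted inside the statement
SQL='SELECT DISTINCT velo FROM "taz_mm.tbl_routennetz"'

if command -v ogrinfo >/dev/null 2>&1 && [ -f "$SHP" ]; then
  # -q reduces the header output, -sql runs the statement on the data source
  ogrinfo -ro -q -sql "$SQL" "$SHP"
else
  echo "ogrinfo or $SHP not found, skipping"
fi
```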

# Load the data into PostGIS
`ogr2ogr` is an extremely powerful command line tool that converts between almost all data formats, for example from a shapefile to a new PostGIS table. The following commands might look intimidating at first due to the many parameters, but we explain them step by step below.

**Your turn:**
- Replace DATABASE_NAME, HOST, PORT, USERNAME and PASSWORD in the commands below with the connection information of the PostGIS sandbox component.
- Run both commands to load the data into the database.

Load the road network data:
```shell
ogr2ogr \
-f "PostgreSQL" \
-progress \
-nln "zh_roads" \
-nlt PROMOTE_TO_MULTI \
-lco FID=fid \
-lco GEOMETRY_NAME=geom \
--config OGR_TRUNCATE YES \
PG:"dbname='DATABASE_NAME' host='HOST' port='PORT' user='USERNAME' password='PASSWORD'" \
"./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"
```

Load the districts data:

```shell
ogr2ogr \
-f "PostgreSQL" \
-progress \
-nln "zh_districts" \
-nlt PROMOTE_TO_MULTI \
-lco FID=fid \
-lco GEOMETRY_NAME=geom \
--config OGR_TRUNCATE YES \
PG:"dbname='DATABASE_NAME' host='HOST' port='PORT' user='USERNAME' password='PASSWORD'" \
"./data/20220405_statistischeQuartiereZurich/stzh.adm_statzonen_v.shp"
```

Let's have a look at the parameters:
- `-f "PostgreSQL"` - Specify the target format to be a PostgreSQL (PostGIS) table.
- `-progress` - Display a progress bar when loading the data.
- `-nln "zh_roads"` - Name the new database table `zh_roads` (`zh_districts` in the second command).
- `-nlt PROMOTE_TO_MULTI` - If single and multi geometries are mixed, promote all to multi to have uniform geometries.
- `-lco FID=fid` - Create a feature id column named fid.
- `-lco GEOMETRY_NAME=geom` - Name the geometry column geom.
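- `--config OGR_TRUNCATE YES` - Remove all rows (SQL `TRUNCATE`) before loading data if a table with that name already exists. This allows overwriting existing data without destroying views on the data. The GDAL documentation shows this option together with `-append`, which may be needed when the table already exists.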
- `PG:"dbname='DATABASE_NAME' host='HOST' port='PORT' user='USERNAME' password='PASSWORD'"` - A connection string holds all necessary data to establish a connection to the database. Replace DATABASE_NAME, HOST, PORT, USERNAME and PASSWORD with the connection information of the PostGIS sandbox component. 
- `"./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"` - Path to the file to load (the districts shapefile in the second command).
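Since both commands share every parameter except the table name and the file path, they could be wrapped in a small helper. This is only a sketch: the `DB_*` variables, `PG_CONN` and the function name `load_shapefile` are made up for this example, and the placeholder values still have to be replaced with the connection information of the PostGIS sandbox component.

```shell
# Placeholder values: replace with the sandbox connection information
DB_NAME="DATABASE_NAME"
DB_HOST="HOST"
DB_PORT="PORT"
DB_USER="USERNAME"
DB_PASSWORD="PASSWORD"

# Build the connection string once and reuse it for both loads
PG_CONN="PG:dbname='${DB_NAME}' host='${DB_HOST}' port='${DB_PORT}' user='${DB_USER}' password='${DB_PASSWORD}'"

# Hypothetical helper: $1 = target table name, $2 = path to the shapefile
load_shapefile() {
  ogr2ogr \
    -f "PostgreSQL" \
    -progress \
    -nln "$1" \
    -nlt PROMOTE_TO_MULTI \
    -lco FID=fid \
    -lco GEOMETRY_NAME=geom \
    --config OGR_TRUNCATE YES \
    "$PG_CONN" \
    "$2"
}

# Uncomment once the placeholders above are replaced:
# load_shapefile "zh_roads" "./data/20220405_veloFusswegnetzZurich/taz_mm.tbl_routennetz.shp"
# load_shapefile "zh_districts" "./data/20220405_statistischeQuartiereZurich/stzh.adm_statzonen_v.shp"
```

Keeping the connection details in one place means they only need to be edited once, and the two loads cannot drift apart in their options.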


# Use pgAdmin to check the new tables in the database

**Your turn:**
- Use pgAdmin to connect to the database and check whether you see the new tables.

![check tables](./story_images/check_tables.gif)
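As an alternative to pgAdmin, the same `ogrinfo` summary used for the shapefiles should also work against the database, because `ogrinfo` accepts a `PG:` connection string as data source. The following is a sketch: the names `PG_CONN`, `TABLES` and `check_tables` are made up for this example, and the placeholders must be replaced with the real connection information first.

```shell
# Placeholder values: replace with the sandbox connection information
PG_CONN="PG:dbname='DATABASE_NAME' host='HOST' port='PORT' user='USERNAME' password='PASSWORD'"

# Tables created by the two ogr2ogr commands above
TABLES="zh_roads zh_districts"

# Hypothetical helper: print the summary of each loaded table
check_tables() {
  for TABLE in $TABLES; do
    ogrinfo -ro -so "$PG_CONN" "$TABLE" || echo "Could not read table $TABLE"
  done
}

# Uncomment once PG_CONN holds the real connection information:
# check_tables
```

The feature counts reported here can then be compared with the ones reported for the shapefiles.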