### Basic spatial data handling in Python via JupyterHub

This is a Jupyter Notebook. It consists of *Markdown* cells for formatted text and *Code* cells, which tell the server to actually **do computations** when you click 'Run this cell' (or press Shift+Enter).

In the first *Code* cell below, we load a module called 'os' and use it to inspect the file system of our account on the server.

In [None]:
import os
os.getcwd() # get current working directory

In [None]:
os.listdir("./data") # list all files in the "./data" folder
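
When a folder holds many files, it can help to filter the listing by file extension. The sketch below uses only the standard library; the helper name `list_rasters` and the file names are made up for the demonstration, and a temporary folder stands in for "./data".

```python
import os
import tempfile

def list_rasters(folder, extensions=(".tif", ".tiff")):
    """Return the sorted names of files in `folder` that have a raster extension."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)            # case-insensitive match
        and os.path.isfile(os.path.join(folder, name))  # skip sub-folders
    )

# Demonstration on a throwaway folder with hypothetical file names
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("heat.tif", "berlin.gpkg", "notes.txt", "dem.TIFF"):
        open(os.path.join(demo_folder, name), "w").close()
    found = list_rasters(demo_folder)

print(found)
```

On the server, the same helper could be called as `list_rasters("./data")`.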

There is a .tif file, a spatial raster, in the data folder. How can we load this type of data in Python? There are several options. In plain Python, without any additional modules, this would be very cumbersome. GDAL could be used, which is very fast but also quite old-school, with difficult syntax. Rasterio is a so-called wrapper around GDAL: it uses GDAL functionality under the hood but exposes it through somewhat easier code.

In [None]:
import rasterio

my_file = "./data/T2M_daily_mean_max_topography_2011_2020_present_30.tif"

with rasterio.open(my_file) as src:
    data = src.read()
    transform = src.transform
    crs = src.crs
    height, width = data.shape[1], data.shape[2]

# an f-string lets us mix text with expressions, which are evaluated inside {}
print(f"height and width of the provided data are {height, width}")
print(f"The CRS is {crs}, and the transformation parameters:")
print(transform)
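
The printed transformation parameters are the six coefficients of an affine transform that maps a (column, row) position in the raster to an (x, y) position in the CRS. As a rough sketch of that arithmetic in plain Python (the numbers are invented for illustration and are not the values of the file above):

```python
def pixel_to_map(col, row, a, b, c, d, e, f):
    """Apply an affine geotransform to a (col, row) position.

    a, e: cell width and (negative) cell height for a north-up raster
    b, d: rotation terms, 0 for a north-up raster
    c, f: x and y coordinate of the upper-left corner of the raster
    """
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Hypothetical parameters: 30 m cells, upper-left corner at (370000, 5840000)
params = (30.0, 0.0, 370000.0, 0.0, -30.0, 5840000.0)

upper_left = pixel_to_map(0, 0, *params)         # corner of the first cell
first_centre = pixel_to_map(0.5, 0.5, *params)   # centre of the first cell
print(upper_left, first_centre)
```

With rasterio, the expression `transform * (col, row)` should give the same result for the real file.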


In [None]:
print(data)

In [None]:
print(f"The shape of 'data' is {data.shape}")
print(f"The shape of 'data.squeeze()' is {data.squeeze().shape}")
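
The difference between the two shapes comes from the band dimension: the array read from the file has the layout (bands, rows, columns), and `squeeze()` drops every dimension of length 1. A small sketch with dummy arrays (the sizes are arbitrary, not those of the raster above):

```python
import numpy as np

# One band of 3 rows x 4 columns, laid out as (bands, rows, columns)
single_band = np.zeros((1, 3, 4))
squeezed = single_band.squeeze()   # drops the length-1 band dimension
first_band = single_band[0]        # explicit alternative: pick band 0

# With more than one band, squeeze() has nothing to drop
multi_band = np.zeros((3, 3, 4))
unchanged = multi_band.squeeze()

print(single_band.shape, squeezed.shape, first_band.shape, unchanged.shape)
```

Because `squeeze()` silently leaves multi-band arrays three-dimensional, indexing a band explicitly (or reading a single band with `src.read(1)`) is the more predictable choice when a 2-dimensional array is required.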

A basic module for visualisation in Python is matplotlib.pyplot, which, by convention, is imported as 'plt'. Its 'imshow' function displays 2-dimensional arrays as images.

In [None]:
import matplotlib.pyplot as plt
plt.imshow(data.squeeze())

As we can see, the image looks like Berlin, with the Müggelsee and Havel visible. The axis ticks seem to correspond to height and width (2007 raster cells), so they show array indices rather than spatial coordinates. There is a simpler way to code this, using a Python module with a higher level of abstraction.

In [None]:
import rioxarray

heat_raster = rioxarray.open_rasterio(my_file)
heat_raster.plot()

Not only was the code above shorter and easier to understand, but we also get a color bar and spatial coordinates along the axes.

Now, how about vector data? The module 'geopandas', a geospatial version of the tabular data library 'pandas', is likely the first choice for handling vector data in Python.

In [None]:
import geopandas as gpd

berlin = gpd.read_file("./data/berlin.gpkg")
berlin.explore()  # optionally pass tiles="CartoDB positron" for a lighter basemap

Interactive visualization, powered by Leaflet, is very convenient. However, combining raster and vector data in one map is trickier in Python.
For that we need to install an additional module that is not part of the HU JupyterHub setup. This can be done with "%pip install {module name}".

In [None]:
%pip install cartopy

In [None]:
print("CRS of the raster file:", heat_raster.rio.crs)
print("CRS of the vector file:", berlin.crs)

In [None]:
berlin = berlin.to_crs(32633)  # EPSG:32633 = WGS 84 / UTM zone 33N
print("CRS of the raster file:", heat_raster.rio.crs)
print("CRS of the vector file:", berlin.crs)
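
The EPSG code 32633 used above stands for WGS 84 / UTM zone 33N. As a sketch of where that number comes from, the zone can be derived from the longitude, and the EPSG code from the zone and hemisphere. The function name `utm_epsg` and the approximate coordinates of central Berlin are assumptions for illustration; the formula ignores the special zones around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """Return (UTM zone, EPSG code) for a WGS 84 longitude/latitude in degrees.

    Zones are 6 degrees wide, starting with zone 1 at 180 degrees West.
    EPSG codes are 326xx on the northern and 327xx on the southern hemisphere.
    """
    zone = math.floor((lon + 180) / 6) + 1
    base = 32600 if lat >= 0 else 32700
    return zone, base + zone

# Approximate coordinates of central Berlin
zone, epsg = utm_epsg(13.405, 52.52)
print(zone, epsg)
```

geopandas also offers `GeoDataFrame.estimate_utm_crs()`, which could be used instead of hard-coding the EPSG code.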

In [None]:
import cartopy.crs as ccrs

fig = plt.figure(figsize=[12, 8])
# UTM zone 33 matches EPSG:32633, the CRS the vector data was reprojected to
ax = fig.add_axes([0, 0, 1, 1], projection=ccrs.UTM(33))
raster_image = heat_raster.plot(ax=ax, cmap="magma")
berlin.plot(ax=ax, color="none", edgecolor="white", linewidth=2)
plt.show()

In [None]:
from rasterio.features import rasterize
help(rasterize)

In [None]:
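rasterized = rasterize(
    [(berlin.geometry.iloc[0], 1)],          # list of (geometry, value) pairs
    out_shape=heat_raster.shape[1:],         # (rows, columns) of the raster
    transform=heat_raster.rio.transform(),   # same grid as the raster
    fill=0,                                  # value for cells outside the geometry
    all_touched=True                         # burn every cell the geometry touches
)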

In [None]:
plt.imshow(rasterized)
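
Conceptually, rasterizing a polygon means deciding for every grid cell whether it belongs to the geometry. The pure-Python sketch below uses the cell-centre rule, which corresponds to `all_touched=False`; with `all_touched=True`, as used above, every cell touched by the geometry is burned in. This only illustrates the idea and is not how rasterio or GDAL implement it. The function names and the small triangle are invented for the example.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: is the point (x, y) inside the polygon `vertices`?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that cross the horizontal line through y can matter
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:          # crossing lies to the right of the point
                inside = not inside
    return inside

def burn(vertices, n_rows, n_cols, cell_size, x_min, y_max, value=1, fill=0):
    """Burn `value` into every cell whose centre lies inside the polygon."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell_size          # row 0 is the top row
        grid.append([
            value if point_in_polygon(x_min + (col + 0.5) * cell_size, y, vertices)
            else fill
            for col in range(n_cols)
        ])
    return grid

# A small right triangle on a 4 x 4 grid of unit cells
triangle = [(0.0, 0.0), (3.5, 0.0), (0.0, 3.5)]
mask = burn(triangle, n_rows=4, n_cols=4, cell_size=1.0, x_min=0.0, y_max=4.0)
for line in mask:
    print(line)
```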

In [None]:
with rasterio.open(
    "./data/berlin_rasterized.tif", # filename
    'w',                            # open in 'write' mode
    driver='GTiff',                 # file type
    height=height,                  # number of rows
    width=width,                    # number of columns
    count=1,                        # number of bands (layers)
    dtype=rasterized.dtype,         # data type of the array being written
    crs=crs,                        # coordinate reference system
    transform=transform             # geotransformation parameters
) as dst:
    dst.write(rasterized, 1)        # data to write into that file

Now we know how to do a few basic things. Code that is likely to be used often can be moved into functions (as in R). The provided script "customFunctions.py" contains some functions to be used in the remainder of this exercise. If the script is in the same folder as *this* notebook, the functions can be imported like any official module.

In [None]:
from customFunctions import writeRaster
help(writeRaster)
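
Importing from a script in the notebook's folder works because Python searches the folders listed in `sys.path`, which normally includes the folder the notebook runs in. The sketch below reproduces the mechanism with a throwaway module; the module name `demo_helpers` and its function `describe` are invented for the demonstration and are unrelated to the contents of "customFunctions.py".

```python
import importlib
import os
import sys
import tempfile

# Source code of a tiny, hypothetical helper module
module_source = '''
def describe(shape):
    """Return a short description of a (rows, columns) raster shape."""
    rows, cols = shape
    return f"{rows} rows x {cols} columns"
'''

with tempfile.TemporaryDirectory() as folder:
    # Write the module into a folder and make that folder importable
    with open(os.path.join(folder, "demo_helpers.py"), "w") as f:
        f.write(module_source)
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        demo_helpers = importlib.import_module("demo_helpers")
        message = demo_helpers.describe((4, 5))
    finally:
        sys.path.remove(folder)   # leave the search path as we found it

print(message)
```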