# Looking at channel profiles in the Killmade Burn

*This lesson has been written by Simon M. Mudd at the University of Edinburgh*

*Last update 10/11/2021*

This notebook looks at some data from the Killmade Burn, a tributary to the Whiteadder Water, a small catchment in the Scottish Borders.

This is a study catchment for our course *Eroding Landscapes*. 

## Load the data

We are going to load the data using `geopandas`. 
`pandas` is a Python package for working with tabular datasets, and it is very good at handling csv data. `geopandas` builds on `pandas` so that spatial information is recognised by the package. Before we do anything with the data we need to `import` these two Python packages. 

In [None]:
import pandas as pd
import geopandas as gpd

Data that goes into `pandas` is called a *dataframe*. The dataframe holds the data but also the data column names.
To get a `geopandas` dataframe we first load a csv file into a `pandas` dataframe. I'll use the `head` command to show you what the first few rows of data look like:

In [None]:
# Read some csv data into a pandas dataframe. 
df = pd.read_csv("el_study_chi_data_map.csv")
df.head()

Now we read this into a `geopandas` dataframe.
`geopandas` is very similar to `pandas` except that it knows where in space the data is. That is, each row of your dataframe has some "geometry" information. 

You need to tell `geopandas` which columns from your pandas dataframe hold the x and y locations. 
In this example the x and y locations are longitude and latitude. 

You do this with the function `gpd.points_from_xy`, and the way you tell `geopandas` to use the correct columns is like this:
`gpd.points_from_xy(df.longitude, df.latitude)`. 

See the code below for the full example:

In [None]:
# Create a geopandas dataframe by telling it where the x and y columns are in the pandas dataframe
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.longitude, df.latitude))
gdf.head()

It is really easy to get data out of `pandas` and `geopandas` dataframes. Observe:

In [None]:
# You can get data using 
# either the name of the column after a full stop 
# or the name of the column in quotes within square brackets
print(gdf.flow_distance)
print(gdf["flow_distance"])
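
The two access styles are not quite interchangeable. As a small sketch on a made-up table (the column names `flow distance` and `count` are invented here just to make the point, and none of the numbers come from the Killmade Burn data), access after a full stop only works when the column name is a valid Python name that does not clash with something the dataframe already has, whereas brackets always work:

```python
import pandas as pd

# A tiny made-up table; these values are not from the Killmade Burn data
toy = pd.DataFrame({
    "elevation": [120.0, 118.5, 117.0],
    "flow distance": [300.0, 200.0, 100.0],  # note the space in the name
    "count": [1, 2, 3],                      # same name as the DataFrame.count method
})

# For a simple column name both styles return the same data
same = toy.elevation.equals(toy["elevation"])
print(same)

# A name containing a space can only be reached with brackets
print(toy["flow distance"].tolist())

# toy.count is the built-in method rather than the column,
# so brackets are needed here too
print(callable(toy.count))
print(toy["count"].tolist())
```

If in doubt, the bracket form is the safer habit.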

## Plot some points

`geopandas` has some basic plotting routines. Let's plot these points:

In [None]:
gdf.plot(marker='o', color="k", markersize=5);

Let's add some styling to this plot. I need to import `matplotlib.pyplot` for this first.

In [None]:
import matplotlib.pyplot as plt
gdf.plot(marker='o', color="k", markersize=5);
plt.xlabel("longitude")
plt.ylabel("latitude")

Those funny channels to the right are in the lake. We are only interested in the channels to the left. 
They actually have a different `basin_key` than the channels to the left. We can plot the basin numbers:

In [None]:
gdf.plot(marker='o', c=gdf.basin_key, markersize=5)
plt.xlabel("longitude")
plt.ylabel("latitude")

If you want to select only one basin, you can make a new dataframe. 
In the below example I have selected `basin_key == 4` which happens to be the Killmade Burn, our study site. 

In [None]:
# Keep only the rows in basin 4 and colour the points by tributary (source_key)
gdf_b4 = gdf[(gdf['basin_key'] == 4)]
gdf_b4.plot(marker='o', c=gdf_b4.source_key, markersize=5)
plt.xlabel("longitude")
plt.ylabel("latitude")

## Plot some profiles

Killmade Burn is basin 4. Basin 0 is also interesting (it is the one on the opposite side of the valley). 
We can plot these valleys by selecting the correct data:

In [None]:
basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0)]

fig = plt.figure()
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.chi,gdf_b0.elevation,c=gdf_b0.flow_distance)
plt.xlabel(r"$\chi$ (m)")
plt.ylabel("elevation (m)")
ax.text(0.1,0.9,"Basin "+str(basin_0),transform=ax.transAxes)
fig.show()


We could add a second basin:

In [None]:
import numpy as np

basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0)]
basin_4 = 4
gdf_b4 = gdf[(gdf['basin_key'] == basin_4)]

fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.chi,gdf_b0.elevation,c=gdf_b0.flow_distance)

# I am adding an offset of 5 to the chi coordinate so the two basins plot side by side
plt.scatter(np.add(gdf_b4.chi,5),gdf_b4.elevation,c=gdf_b4.flow_distance)
plt.xlabel(r"$\chi$ (m)")
plt.ylabel("elevation (m)")

plt.text(0.5, 150, "Snails Cleugh", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

plt.text(8, 150, "Killmade Burn", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

ax.set_ylim(100,450)
fig.show()


We can also do this with the channel long profile, plotting elevation against flow distance instead of $\chi$. 

In [None]:
basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0)]
basin_4 = 4
gdf_b4 = gdf[(gdf['basin_key'] == basin_4)]

fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.flow_distance,gdf_b0.elevation,c=gdf_b0.flow_distance)

# I am adding an offset of 3000 m to the flow distance so the two basins plot side by side
plt.scatter(np.add(gdf_b4.flow_distance,3000),gdf_b4.elevation,c=gdf_b4.flow_distance)
plt.xlabel("Flow distance (m)")
plt.ylabel("elevation (m)")

plt.text(1000, 150, "Snails Cleugh", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

plt.text(4000, 150, "Killmade Burn", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

ax.set_ylim(100,450)
fig.show()


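We can also isolate individual tributaries using their source keys. First look at the data for basin 4, then select on `source_key`: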

In [None]:
print(gdf_b4.head())

In [None]:
import numpy as np

basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0) & ((gdf['source_key'] == 0) | (gdf['source_key'] == 2)) ]
basin_4 = 4
gdf_b4 = gdf[(gdf['basin_key'] == basin_4) & ((gdf['source_key'] == 35) | (gdf['source_key'] == 43)) ]

fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.chi,gdf_b0.elevation,c=gdf_b0.flow_distance)

# I am adding an offset of 5 to the chi coordinate so the two basins plot side by side
plt.scatter(np.add(gdf_b4.chi,5),gdf_b4.elevation,c=gdf_b4.flow_distance)
plt.xlabel(r"$\chi$ (m)")
plt.ylabel("elevation (m)")

plt.text(0.5, 150, "Snails Cleugh", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

plt.text(8, 150, "Killmade Burn", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

ax.set_ylim(100,450)
fig.show()
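
If you want to play with different tributaries, it helps to list which source keys a basin actually contains, and chaining lots of `|` conditions gets unwieldy. A minimal sketch on a made-up table (the key values are invented for illustration) using the `pandas` methods `unique` and `isin`:

```python
import pandas as pd

# Made-up table with the same column names as the study data
toy = pd.DataFrame({
    "basin_key":  [0, 0, 0, 4, 4, 4, 4],
    "source_key": [0, 1, 2, 35, 38, 43, 35],
    "elevation":  [210.0, 250.0, 230.0, 180.0, 260.0, 240.0, 175.0],
})

# Which source keys are present in basin 4?
keys_in_basin_4 = sorted(toy.loc[toy["basin_key"] == 4, "source_key"].unique())
print(keys_in_basin_4)

# Selection written with chained "or" conditions, as in the cell above
chained = toy[(toy["basin_key"] == 4)
              & ((toy["source_key"] == 35) | (toy["source_key"] == 43))]

# The same selection written with isin, which scales better to long lists of keys
with_isin = toy[(toy["basin_key"] == 4) & toy["source_key"].isin([35, 43])]

print(chained.equals(with_isin))
```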


## Calculate the chi gradient

In the `channel_incision` directory you will find some notebooks about channel steepness. I'll summarize here:

* Channels tend to become gentler as you gain drainage area. If you look at landscapes we think are steadily eroding, the relationship between channel slope $S$ and drainage area $A$ can be described with
$S = k_s A^{-\theta}$, where $k_s$ is the channel steepness index and $\theta$ is the concavity index.

* A number of studies have suggested $k_s$ correlates with erosion rates (measured with cosmogenics). 

* So we want to measure $k_s$ in a landscape to use as a proxy for erosion rate. In the past, many authors extracted this from slope-area data, but this is very noisy. 

* To reduce noise, we do a clever coordinate transformation that makes a coordinate, $\chi$, that incorporates drainage area. 

* This transformation is specifically designed so that the local slope in $\chi$-elevation space corresponds to $k_s$. 

* So, we are going to take the slope of $\chi$-elevation and see if there are any patterns.
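
To see why the slope in $\chi$-elevation space gives $k_s$, here is a small numerical sketch on a made-up channel. It assumes the usual definition $\chi = \int (A_0/A)^{\theta} dx$, integrated upstream from the outlet, with a reference drainage area $A_0$ of 1 m$^2$. The drainage area function, the value of $\theta$ and the value of $k_s$ are all invented for illustration; they are not taken from the study data.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative integral of y with respect to x, starting at zero."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

# Made-up steady-state channel: every number here is an assumption
k_s = 20.0       # steepness index we hope to recover
theta = 0.45     # concavity index
A_0 = 1.0        # reference drainage area (m^2)
x = np.linspace(0.0, 5000.0, 501)   # distance upstream from the outlet (m)
A = 1.0e7 * np.exp(-x / 2000.0)     # drainage area shrinking upstream (m^2)

# Elevation from integrating S = k_s * A^(-theta) upstream from a 100 m outlet
slope = k_s * A ** (-theta)
elevation = 100.0 + cumulative_trapezoid(slope, x)

# chi from integrating (A_0 / A)^theta along the same path
chi = cumulative_trapezoid((A_0 / A) ** theta, x)

# The gradient of elevation with respect to chi gives back k_s everywhere
k_s_recovered = np.gradient(elevation, chi)
print(k_s_recovered.min(), k_s_recovered.max())
```

The slope along the channel changes by a factor of about three from the outlet to the headwaters in this example, but the gradient in $\chi$-elevation space stays constant, which is what makes $\chi$ useful for comparing channels with different drainage areas.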

First, we use the `gradient` function to get the gradient between pixels.

* **WARNING**: *The way the data is organised, there are artificial jumps between source keys. So the data at the ends of each tributary is not correct.*

* **Explanation**: *When the data has finished a tributary, it jumps up to the headwaters of the next tributary, but the gradient function is not clever enough to realise this, so it just calculates a gradient between the bottom of one tributary and the top of the next one. This is nonsense, so the gradient data at the edges should be ignored.*

In [None]:
# Gradient of elevation with respect to chi, calculated over the whole table
gdf["k_sn"] = np.gradient(gdf.elevation,gdf.chi)
gdf.head()
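
One possible way around the artificial jumps described in the warning is to take the gradient one tributary at a time. This is not what the cell above does; it is a sketch of an alternative, shown on a made-up table, and it assumes that the rows of each `source_key` are stored in order along the channel. The column names `k_sn_naive` and `k_sn_by_source` are invented for this example.

```python
import numpy as np
import pandas as pd

# Made-up table with two tributaries stored one after the other,
# each listed from its headwaters down to its lowest pixel
toy = pd.DataFrame({
    "source_key": [0, 0, 0, 0, 1, 1, 1, 1],
    "chi":        [3.0, 2.0, 1.0, 0.0, 2.5, 2.0, 1.5, 1.0],
    "elevation":  [160.0, 140.0, 120.0, 100.0, 190.0, 170.0, 150.0, 130.0],
})

# Gradient over the whole table: rows 3 and 4 straddle the jump between tributaries
toy["k_sn_naive"] = np.gradient(toy["elevation"].to_numpy(), toy["chi"].to_numpy())

# Gradient calculated separately within each source_key
toy["k_sn_by_source"] = np.nan
for key, group in toy.groupby("source_key"):
    if len(group) >= 2:  # np.gradient needs at least two points
        toy.loc[group.index, "k_sn_by_source"] = np.gradient(
            group["elevation"].to_numpy(), group["chi"].to_numpy()
        )

print(toy)
```

In this toy table the first tributary has a true gradient of 20 and the second a gradient of 40. The per-tributary version returns those values on every row, whereas the whole-table version is wrong on the rows either side of the jump.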

Now plot these data. 

In [None]:
basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0)]
basin_4 = 4
gdf_b4 = gdf[(gdf['basin_key'] == basin_4)]

fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.chi,gdf_b0.k_sn,c=gdf_b0.source_key)

# I am adding an offset of 10 to the chi coordinate so the two basins plot side by side
plt.scatter(np.add(gdf_b4.chi,10),gdf_b4.k_sn,c=gdf_b4.source_key)
plt.xlabel(r"$\chi$ (m)")
plt.ylabel(r"$k_{sn}$")

plt.text(0.5, 150, "Snails Cleugh", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

plt.text(8, 150, "Killmade Burn", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

ax.set_ylim(0,300)
fig.show()

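We can also smooth this data. `pandas` has lots of tools for smoothing data. You can use the `rolling` function to smooth the $k_{sn}$ values. Below I use a 25 pixel window. You can use different filters; in this case I use something called a `hamming` window, which weights pixels in the middle of the window more heavily than those at its edges. It is too tedious to explain in full, but if you really want to know about it there is always Google. Note that the `win_type` options in `pandas` need the `scipy` package to be installed. 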

In [None]:
gdf['k_sn_smoothed'] = gdf['k_sn'].rolling(25,win_type='hamming').mean()
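
To get a feel for what a Hamming-weighted moving average does, here is a small sketch that builds one by hand with `numpy`. The function `hamming_smooth` is a hypothetical helper written for this example, not part of `pandas`, and it differs from the cell above in one respect: it centres the window on each pixel, whereas `rolling` by default places each result at the last row of its window. It also assumes an odd window length.

```python
import numpy as np

def hamming_smooth(values, window=5):
    """Centred moving average with Hamming weights (odd window assumed).

    Positions where the window does not fit entirely are returned as NaN.
    """
    values = np.asarray(values, dtype=float)
    weights = np.hamming(window)
    weights = weights / weights.sum()   # weights now sum to one
    smoothed = np.full(values.shape, np.nan)
    if values.size < window:
        return smoothed
    half = window // 2
    # mode="valid" keeps only positions where the whole window overlaps the data
    valid = np.convolve(values, weights, mode="valid")
    smoothed[half:half + valid.size] = valid
    return smoothed

# A flat made-up signal with one spike in the middle
noisy = np.array([10.0, 10.0, 10.0, 10.0, 60.0, 10.0, 10.0, 10.0, 10.0])
smoothed = hamming_smooth(noisy, window=5)
print(np.hamming(5))
print(smoothed)
```

The spike is lowered and spread over the neighbouring pixels, with the central pixel keeping the largest share. The same spreading is what happens to any bad value that sits inside a smoothing window.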

Okay, we will plot both the smoothed and the raw data. We can also isolate the sources (you can play with the `source_key` numbers). 

In [None]:
basin_0 = 0
gdf_b0 = gdf[(gdf['basin_key'] == basin_0)  & ((gdf['source_key'] == 0) | (gdf['source_key'] == 2)) ]
basin_4 = 4
gdf_b4 = gdf[(gdf['basin_key'] == basin_4)  & ((gdf['source_key'] == 35) | (gdf['source_key'] == 38)) ]

fig = plt.figure()
fig.set_size_inches(18.5, 10.5)
ax = fig.add_subplot(1, 1,1)

plt.scatter(gdf_b0.chi,gdf_b0.k_sn,c=gdf_b0.source_key,alpha = 0.4,s=0.2)
plt.scatter(gdf_b0.chi,gdf_b0.k_sn_smoothed,c=gdf_b0.source_key,s=1)

# I am adding an offset of 10 to the chi coordinate so the two basins plot side by side
plt.scatter(np.add(gdf_b4.chi,10),gdf_b4.k_sn,c=gdf_b4.source_key,alpha = 0.4,s=0.2)
plt.scatter(np.add(gdf_b4.chi,10),gdf_b4.k_sn_smoothed,c=gdf_b4.source_key,s=1)
plt.xlabel(r"$\chi$ (m)")
plt.ylabel(r"$k_{sn}$")

plt.text(0.5, 80, "Snails Cleugh", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

plt.text(8, 80, "Killmade Burn", size=12,
         ha="left", va="top",
         bbox=dict(boxstyle="square",
                   ec=(0., 0., 0.),
                   fc=(1., 1.0, 1.0),
                   )
        )

ax.set_ylim(0,100)
fig.show()

Okay, remember how I said the data can't be trusted at the ends of the tributaries? Well, when we smooth over 25 pixels this messed up data gets smeared over 25 pixels. So in the tributaries you need to ignore the ends and only look at the middle pixels to see the $k_{sn}$ on that tributary. 
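
If you would rather drop the untrustworthy ends than ignore them by eye, one option is to trim a fixed number of pixels from both ends of every tributary. The sketch below uses a hypothetical helper, `trim_tributary_ends`, on a made-up table; how many pixels to trim (`n_trim`) is a judgement call that depends on the smoothing window, and the function assumes the rows of each `source_key` are stored in order along the channel.

```python
import pandas as pd

def trim_tributary_ends(df, n_trim, key="source_key"):
    """Return df without the first and last n_trim rows of each tributary."""
    position = df.groupby(key).cumcount()           # 0, 1, 2, ... within each tributary
    size = df.groupby(key)[key].transform("size")   # number of rows in each tributary
    keep = (position >= n_trim) & (position < size - n_trim)
    return df[keep]

# Made-up table: one tributary with six pixels and one with four
toy = pd.DataFrame({
    "source_key": [0] * 6 + [1] * 4,
    "k_sn": [99.0, 20.0, 21.0, 19.0, 20.0, 5.0, 80.0, 40.0, 41.0, 3.0],
})

trimmed = trim_tributary_ends(toy, n_trim=1)
print(trimmed)
```

Tributaries shorter than twice `n_trim` disappear entirely, which is usually what you want, since they have no trustworthy middle section.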