# Looking at the `_chi_data_map.csv` file

In a few of the notebooks we have used `lsdtopotools` to extract a channel profile. The software has a few different ways to extract profiles, but one of them produces a file with `chi_data_map.csv` in the filename. This file contains a selection of useful data. Let's load the file and have a look.

In [None]:
import pandas as pd
import geopandas as gpd
import cartopy as cp
import cartopy.crs as ccrs
import rasterio as rio
import matplotlib.pyplot as plt
import numpy as np

We are going to load the CSV data that we generated in the previous notebook. 
We first read it with a package called `pandas` and then pass the result to `geopandas`. 
`pandas` is for working with tabular data, the kind you might otherwise open in Excel, and it has many powerful data processing options. It was originally developed by a "quant" financial analyst to look at stock trends, but it is a very useful Python package for many applications. 

In [None]:
df = pd.read_csv("Sorbas_SRTM30_UTM_chi_data_map.csv")
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.longitude, df.latitude))

# We have to tell geopandas which coordinate reference system the data are in, using something called an EPSG code.
# All major geographic projection and transformation systems have such a code.
# EPSG:4326 is WGS84 latitude and longitude.
gdf.crs = "EPSG:4326" 

# The head command shows you what is in the file.
print(gdf.head())

Okay, I am not going to explain "chi" yet. Most of the other columns should be self-explanatory. The exceptions are `source_key` and `basin_key`:

* Each basin is tagged with a `basin_key`, so selecting rows with a given `basin_key` isolates that basin. 
* Each channel has a "source" at the tip of the channel, identified by a `source_key`. Selecting rows with a given `source_key` gives you the flow path from that source to the outlet. 
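
To make the two keys concrete, here is a minimal sketch on a made-up table. The values in `toy` are hypothetical and only mimic the column names described above; the real file has many more rows and columns.

```python
import pandas as pd

# Hypothetical miniature table: two basins, three channels.
toy = pd.DataFrame({
    "basin_key":     [0, 0, 0, 0, 1, 1],
    "source_key":    [0, 0, 1, 1, 2, 2],
    "flow_distance": [200.0, 100.0, 250.0, 150.0, 120.0, 60.0],
    "elevation":     [310.0, 300.0, 330.0, 315.0, 420.0, 410.0],
})

# Selecting on basin_key keeps every channel in that basin.
basin0 = toy[toy["basin_key"] == 0]

# Selecting on source_key keeps a single channel.
channel1 = toy[toy["source_key"] == 1]

# Count how many distinct channels each basin contains.
sources_per_basin = toy.groupby("basin_key")["source_key"].nunique()

print(basin0)
print(channel1)
print(sources_per_basin)
```

The same boolean-mask pattern is what the later cells use on `gdf`.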


In [None]:
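# Channel gradient: the derivative of elevation with respect to flow distance.
# Note this runs over the whole table at once, so values at rows where one channel
# ends and the next begins mix nodes from two different channels.
z = gdf.elevation
x = gdf.flow_distance
S = np.gradient(np.asarray(z), np.asarray(x))
gdf["slope"] = S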
gdf.head()
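
The gradient above is taken across every row of the table, so rows that sit next to each other but belong to different channels influence each other's slope. One possible alternative, sketched here on a hypothetical `toy` table rather than the real data, is to compute the gradient separately for each `source_key`. The helper `channel_slope` is an illustrative name, not part of any library, and it assumes every channel has at least two nodes with distinct flow distances.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two channels stored one after the other.
toy = pd.DataFrame({
    "source_key":    [0, 0, 0, 1, 1, 1],
    "flow_distance": [300.0, 200.0, 100.0, 250.0, 150.0, 50.0],
    "elevation":     [130.0, 120.0, 110.0, 145.0, 125.0, 105.0],
})

def channel_slope(group):
    """Gradient of elevation along flow distance for a single channel."""
    s = np.gradient(group["elevation"].to_numpy(),
                    group["flow_distance"].to_numpy())
    # Keep the original row labels so the result lines up with the table.
    return pd.Series(s, index=group.index)

# One gradient calculation per channel, then stitch the pieces back together.
pieces = [channel_slope(g) for _, g in toy.groupby("source_key")]
toy["slope"] = pd.concat(pieces)

# For comparison: the whole-table gradient used in the cell above.
naive = np.gradient(toy["elevation"].to_numpy(),
                    toy["flow_distance"].to_numpy())

print(toy)
print(naive)
```

In this toy case the per-channel slopes are uniform along each channel, while the whole-table version differs at the rows where the two channels meet in the table.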

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

# First let's isolate just one of these basins. There are only two: basin 0 and basin 1
gdf_b1 = gdf[(gdf['basin_key'] == 0)]
gdf_b1.head()

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

# First let's isolate just one of these basins. There are only two: basin 0 and basin 1
gdf_b1 = gdf[(gdf['basin_key'] == 0)]

# The main stem channel is the one with the minimum source key in this basin
min_source = np.amin(gdf_b1.source_key)
gdf_b2 = gdf_b1[(gdf_b1['source_key'] == min_source)]
#gdf_b2 = gdf_b1

# Now make channel profile plots
z = gdf_b2.elevation
x_locs = gdf_b2.flow_distance
S = gdf_b2.slope

# Create two subplots and unpack the output array immediately
f, (ax1, ax2) = plt.subplots(2, 1)
ax1.scatter(x_locs, z, s=0.2)
ax2.scatter(x_locs, S, s=1)


ax1.set_xlabel("Distance from outlet ($m$)")
ax1.set_ylabel("Elevation (m)")

ax2.set_xlabel("Distance from outlet ($m$)")
ax2.set_ylabel("Slope (m/m)")

plt.tight_layout()

In [None]:
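# Smooth the slope with a 10-row rolling mean weighted by a Hamming window (win_type needs scipy installed)
gdf['slope_rolling'] = gdf.slope.rolling(10, win_type='hamming').mean()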
gdf.head()
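
To show what a window-weighted rolling mean does, here is a small NumPy-only sketch. It is an illustration of the idea, not a reproduction of the pandas internals: `hamming_rolling_mean` is a hypothetical helper, and the weights pandas obtains from SciPy may not match `np.hamming` exactly.

```python
import numpy as np

def hamming_rolling_mean(values, window):
    """Trailing rolling mean with Hamming weights; NaN until the window is full."""
    values = np.asarray(values, dtype=float)
    weights = np.hamming(window)
    out = np.full(values.shape, np.nan)
    for i in range(window - 1, len(values)):
        # The window covers the current row and the (window - 1) rows before it.
        chunk = values[i - window + 1 : i + 1]
        out[i] = np.sum(weights * chunk) / np.sum(weights)
    return out

# Hypothetical noisy slope values that alternate between two levels.
noisy = np.array([0.10, 0.30, 0.10, 0.30, 0.10, 0.30, 0.10, 0.30])
smooth = hamming_rolling_mean(noisy, 4)

print(smooth)
```

As with the gradient, the rolling mean in the cell above runs over the whole table, so the first rows of each window-length stretch after a change of channel blend values from two channels.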

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

# First let's isolate just one of these basins. There are only two: basin 0 and basin 1
gdf_b1 = gdf[(gdf['basin_key'] == 0)]

# The main stem channel is the one with the minimum source key in this basin
min_source = np.amin(gdf_b1.source_key)
gdf_b2 = gdf_b1[(gdf_b1['source_key'] == min_source)]
#gdf_b2 = gdf_b1

# Now make channel profile plots
z = gdf_b2.elevation
x_locs = gdf_b2.flow_distance
S = gdf_b2.slope
SR = gdf_b2.slope_rolling

# Create two subplots and unpack the output array immediately
f, (ax1, ax2) = plt.subplots(2, 1)
ax1.scatter(x_locs, S,s = 1)
ax2.scatter(x_locs, SR,s = 1)


ax1.set_xlabel("Distance from outlet ($m$)")
ax1.set_ylabel("Slope (m/m)")

ax2.set_xlabel("Distance from outlet ($m$)")
ax2.set_ylabel("Rolling Slope (m/m)")

plt.tight_layout()

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

# First let's isolate just one of these basins. There are only two: basin 0 and basin 1
gdf_b1 = gdf[(gdf['basin_key'] == 0)]

# The main stem channel is the one with the minimum source key in this basin
min_source = np.amin(gdf_b1.source_key)
gdf_b2 = gdf_b1[(gdf_b1['source_key'] == min_source)]
#gdf_b2 = gdf_b1

# Now make channel profile plots
z = gdf_b2.elevation
x_locs = gdf_b2.flow_distance
S = gdf_b2.slope
SR = gdf_b2.slope_rolling
A = gdf_b2.drainage_area

# Create two subplots and unpack the output array immediately
f, (ax1, ax2) = plt.subplots(2, 1)
ax1.scatter(x_locs, SR,s = 1)
ax2.scatter(A, SR,s = 1)


ax1.set_xlabel("Distance from outlet ($m$)")
ax1.set_ylabel("Rolling slope (m/m)")

ax2.set_xlabel("Drainage area ($m^2$)")
ax2.set_ylabel("Rolling slope (m/m)")
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_ylim(0.001,0.2)

plt.tight_layout()