# Tutorial 1: Formatting data

### Formatting data to feed into riserfit
This script gives an example of how riser profiles can be extracted from a DEM. The files needed for this tutorial are located in the Tutorials\Data folder. riserfit provides one big wrapper function to build riser profiles from DEMs: `rf.construct_z_profiles_from_centerpoints()`, which should work for most applications. It requires an input DEM file and **a .csv file containing x and y centerpoints at the desired riser locations**. Two examples of such files are supplied in the Data\Risers\Midpoints directory. Running `rf.construct_z_profiles_from_centerpoints()` generates a list of pandas dataframes, one for each profile, and a list of names to identify the dataframes.
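
As a rough illustration of what such a centerpoint file might look like, the sketch below writes a tiny table of x and y coordinates in CSV form. The header names `x` and `y` and the coordinate values are assumptions made for this example; compare them with the supplied `midpoints_T7.csv` before building your own file.

```python
import csv
import io

# Hypothetical centerpoint coordinates (easting, northing in metres).
# The header names "x" and "y" are an assumption, not confirmed by riserfit.
centerpoints = [
    (500010.0, 4100020.0),
    (500035.5, 4100042.0),
    (500061.0, 4100066.5),
]

# Write to an in-memory buffer; swap in open("midpoints_demo.csv", "w", newline="")
# to create a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["x", "y"])
writer.writerows(centerpoints)

midpoints_csv = buffer.getvalue()
print(midpoints_csv)
```

Each row corresponds to one riser location, so three rows would yield three profiles.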

The next step is to convert this somewhat unwieldy list of dataframes into a `rf.Riser` instance. This is the main class of the riserfit package. It has a plethora of methods that work on arbitrarily long lists of riser profiles and thus take a lot of `for`-looping overhead away from the user. For ease of use, this `rf.Riser` instance can be saved to a compressed file and loaded directly, avoiding the need to repeatedly extract profiles from the DEM. It also saves all your progress: if you have calculated linear diffusion ages once and saved the `rf.Riser` instance, you can load the instance and access the diffusion ages directly!

To make use of this fact, we add a `try`-`except` block to our script: if the script can find the saved `Riser` instances, there's no need to run `rf.construct_z_profiles_from_centerpoints()`. **Just remember: If you change parameters but keep the instance names the same, the script will simply load the old instances and ignore your new parameters. You need to delete the old instances first - or rename them!**
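
The load-or-build idea, including the stale-cache caveat, can be sketched independently of riserfit. The example below is a generic illustration using `gzip` and `pickle` from the standard library; the helper `load_or_build`, its `force` flag, and the file name are hypothetical and are not part of riserfit, and nothing here is meant to describe how riserfit stores its instances internally.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def load_or_build(path, build, force=False):
    """Return the cached object at `path`, or build it and cache it.

    Hypothetical helper for illustration only; not a riserfit function.
    """
    path = Path(path)
    if path.exists() and not force:
        with gzip.open(path, "rb") as fh:
            return pickle.load(fh), "loaded"
    obj = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj, "built"


def expensive_build():
    # Stand-in for the slow profile extraction from the DEM.
    return {"spacing": 0.5, "n_points": 80}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "T3_demo_instance.gz"
    _, first = load_or_build(cache, expensive_build)               # no file yet
    _, second = load_or_build(cache, expensive_build)              # file found
    _, third = load_or_build(cache, expensive_build, force=True)   # rebuild

print(first, second, third)  # built loaded built
```

An explicit `force` switch is one way to avoid silently reusing results computed with old parameters; deleting or renaming the saved files achieves the same thing.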

In [None]:
# Some imports
import riserfit as rf
import os
import matplotlib.pyplot as plt

In [None]:
# set working directory and parameters
os.chdir(r"C:\\Users\\Lennart\\lennartGit\\personal\\riserfit\\Tutorials")

RASTERNAME = r"\\Data\\DEM\\terraces.tif"
SPACING_DX = 0.5 # in m
N_POINTS = 80 # points projected out from the centerpoint. Total n of points is 2*N_POINTS + 1
SWATH_NUMBER = 4 # number of parallel lines used to average each profile
SWATH_SPACING = 1 # in m

# Relative paths to the .csv files containing your midpoint x, y data
terraces = ["T7", "T3"]
fnames = [f"\\Data\\Risers\\Midpoints\\midpoints_{t}.csv" for t in terraces]

instance_list = []
try: # try to find the riser instances!
    
    for t in terraces:
        
        instance_list.append(
            rf.load_instance(f"\\Data\\Risers\\Instances\\{t}_Riser_instance.gz") # this is the default instance name
        )
    print("Found existing Riser instances!")
    
except Exception: # loading failed, so do the heavy lifting
    
    for t, fn in zip(terraces, fnames):
    
        # create a list of pandas dataframes, one for each riser profile
        dfs, names = rf.construct_z_profiles_from_centerpoints(
            rasterpath=RASTERNAME, # relative path from current wd
            pointfilepath=fn, # relative path from current wd
            n_points=N_POINTS,
            spacing=SPACING_DX,
            swath_number=SWATH_NUMBER,
            swath_spacing=SWATH_SPACING,
            smooth_first=True, # this is only used to calculate the steepest gradient, it doesn't affect the profiles!
            method="linear", # interpolation method to extract elevation values
            savedir=f"\\Data\\Risers\\Profiles\\{t}\\" # where to save the created csv files
        )
        
        # create Riser instance from the list of dataframes
        riser = rf.initialize_riser_class(
            dfs, 
            names, 
            "x", # the column name in the df containing x data (easting for UTM)
            "y", # the column name in the df containing y data (northing for UTM)
            identifier=t # the "name" of the Riser instance. Used for saving to file
        )
        # save to .gz file, note that this is now an instance method!
        riser.save_instance(
            r"\\Data\\Risers\\Instances\\"
        )
        instance_list.append(riser)

If you execute the cell above twice, the profiles are only extracted (and any processing messages displayed) the first time. The second time, it should display the "Found existing Riser instances!" message instead. This setup drastically reduces execution time for datasets with many riser profiles and for large DEMs.
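
To get a feel for the sampling parameters set in the cell above, the short calculation below works out how many points each profile contains and how long it is. It assumes that distances are measured from the centerpoint (so that d = 0 there); how riserfit arranges the parallel swath lines is not covered here.

```python
SPACING_DX = 0.5  # in m, as in the cell above
N_POINTS = 80     # points projected out on each side of the centerpoint

# Total number of points: N_POINTS on each side plus the centerpoint itself.
n_total = 2 * N_POINTS + 1

# Assumption: distance along the profile is zero at the centerpoint.
d = [(i - N_POINTS) * SPACING_DX for i in range(n_total)]
profile_length = d[-1] - d[0]

print(n_total)         # 161
print(d[0], d[-1])     # -40.0 40.0
print(profile_length)  # 80.0
```

With these settings each profile spans 80 m, so the riser and enough of the adjacent surfaces on either side should fit within 40 m of the centerpoint.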

### Data formatting inside riserfit
Within riserfit, all data is attached to a Riser instance in the form of parameters. Every parameter is a list with one entry per profile, so each filled list is as long as the number of profiles (parameters that have not been calculated yet are empty lists). The entries can be of very different types: `float`, `str`, or nested `list`s and `np.ndarray`s.
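
The layout can be pictured with a toy object. The sketch below uses `types.SimpleNamespace` as a stand-in for a Riser instance and plain lists in place of `np.ndarray`s; the values and the helper `profile_record` are made up for illustration and are not part of riserfit.

```python
from types import SimpleNamespace

# Toy stand-in for a Riser instance holding three profiles.
toy = SimpleNamespace(
    name=["P0", "P1", "P2"],        # one string per profile
    d=[[-1.0, 0.0, 1.0],            # one distance array per profile
       [-1.0, 0.0, 1.0],
       [-0.5, 0.0, 0.5]],
    z=[[10.0, 10.5, 11.0],          # one elevation array per profile
       [12.0, 12.4, 12.8],
       [9.0, 9.2, 9.4]],
    best_kt=[],                     # pre-allocated, nothing calculated yet
)


def profile_record(obj, index):
    """Collect entry `index` from every parameter list that has one."""
    return {
        key: values[index]
        for key, values in vars(obj).items()
        if len(values) > index
    }


record = profile_record(toy, 1)
print(record)
```

The same index always refers to the same profile across all parameters, which is what makes looping over profiles straightforward.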

In [None]:
# Have we actually created Riser instances? Let's check the output from the cell above.
# It is a list that contains two Riser instances...
print(instance_list)

# For now, let's just look at the T3 instance. The T7 instance is structured in the same way.
riser = instance_list[1]
# It has many properties, such as the profile names, the riser height, diffusion age, etc.
print(riser.name) # Profile names
print(riser.best_a) # "Best-fit" riser height (from midpoint to crest or toe of riser)
print(riser.best_kt) # "Best-fit" linear diffusion age
# You will notice that the last two parameters are just empty lists. That is because we haven't actually
# done the linear diffusion fitting that would calculate both a and kt. The parameters are just pre-allocated.

# The only real data that we have at the moment are x and z data from each profile. In riserfit these are referred to
# as d (the distance, or x) and z (the elevation). 
print(riser.d[0]) # we just look at the first entry of the d-list. It is an entire np.ndarray!
print(riser.z[0]) # this is also a np.ndarray!

# Again, each entry in any of the parameters of the Riser instance refers to a single profile.
# If we want to plot a profile, we can do it this way:

# Let's look at a good and a bad example...
nice_id = 4 # nice profile
not_nice_id = 0 # ugly profile
plt.scatter(riser.d[nice_id], riser.z[nice_id], label="nice profile", s=4)
plt.scatter(riser.d[not_nice_id], riser.z[not_nice_id], label="not so nice profile", s=4)
plt.legend()
plt.show()

### Exporting data from riserfit
Once you have finished your calculations (riser heights, far-field slopes, diffusion ages), you may want to export all data into a nice .csv file. riserfit also has a solution for this: `rf.Riser.build_Riser_instance_dataframe()`. This function generates a pandas dataframe that can then be exported or otherwise manipulated using the pandas package.

Not all data is formatted in a way that allows for extraction into a dataframe. For example, the profile names are simple strings that fit into a csv cell, but the $d$ and $z$ data for each profile are arrays! `build_Riser_instance_dataframe()` automatically excludes data that is not in a sensible format, e.g. lists, np.ndarrays, or dicts. As a result, our exported dataframe is quite empty. We haven't generated any data after all!

To show a better example of how this function works, we can add some dummy data! This also showcases a useful functionality: `rf.Riser.add_parameter()`, which creates a new attribute of the desired name.

In [None]:
df = riser.build_Riser_instance_dataframe()
print(df.head())

# add some dummy data
important_data = list(range(0, len(riser.name)))
riser.add_parameter("important_parameter", important_data)

df = riser.build_Riser_instance_dataframe()
print(df.head())
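
The exclusion rule described above (only values that fit into a single csv cell are exported) can be imitated with a small filter. This is a minimal sketch of the idea and not riserfit's implementation; the function `tabular_parameters` and the example values are hypothetical.

```python
import csv
import io

SCALAR_TYPES = (int, float, str)


def tabular_parameters(params):
    """Keep only parameters with one scalar entry per profile."""
    n_profiles = len(params["name"])
    keep = {}
    for key, values in params.items():
        same_length = len(values) == n_profiles
        if same_length and all(isinstance(v, SCALAR_TYPES) for v in values):
            keep[key] = values
    return keep


params = {
    "name": ["P0", "P1", "P2"],
    "important_parameter": [0, 1, 2],
    "d": [[-1.0, 0.0, 1.0]] * 3,   # arrays do not fit into a csv cell
    "best_kt": [],                 # still empty, so it is skipped as well
}

table = tabular_parameters(params)

# Write the surviving columns as csv text, one row per profile.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(table.keys())
writer.writerows(zip(*table.values()))
csv_text = buffer.getvalue()
print(csv_text)
```

Because `build_Riser_instance_dataframe()` returns a pandas dataframe, the real export can be written to disk with the dataframe's `to_csv()` method.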
