### This notebook has been modified by Nancy Williams to run inside a Binder. This involved changing some of the file structures and omitting verification runs because those have already been tested.

# Introduction to the FluxEngine tutorials
These tutorials are written using Jupyter, which allows the creation of notebooks that combine text with embedded Python and/or command line code in a format that can be viewed in a web browser. This allows code to appear alongside its explanation, and you can run code and modify examples live from within the web browser.

Embedded Python code looks like this:

In [5]:
#this is a python example
print("Hello Jupyter!")

Hello Jupyter!


To run this example, click in the cell (box) containing the example code and then click the "Run" button on the tool bar at the top.

![run_button.png](attachment:run_button.png)

Any code in the selected cell will run and any output will appear below it - in this case the words "Hello Jupyter!" These code cells are editable allowing you to modify parts of the code and test your changes by re-running them. Try modifying the code above to print a different message and run it again.

FluxEngine has several command line tools that provide different functionality. We will be running these tools through Jupyter notebooks in order to keep everything together in a single document, but usually you would use Command Prompt (Windows) or Terminal (MacOS and Linux) to run them. In these notebooks, command line commands are distinguished from Python code by an exclamation mark (`!`) prefix before the command.

# Importing toolboxes
This imports the matplotlib.pyplot toolbox and nicknames it plt. We will need this later.

In [None]:
import matplotlib.pyplot as plt; #Used later for plotting

# Getting set up with FluxEngine
Hopefully you have already downloaded and installed FluxEngine prior to this workshop. If you have not, you should do this now. To download the latest version go to the GitHub repository [here](https://github.com/oceanflux-ghg/FluxEngine/).

FluxEngine uses several third party libraries and tools. A script is included in the download to automatically install these dependencies for MacOS and Linux, and installation instructions are included for Windows in the instructions (see section 3 of [the instructions](https://github.com/oceanflux-ghg/FluxEngine/blob/master/FluxEngineV3_instructions.pdf)). If you have not done this already you should do this now.

# Verifying FluxEngine has been installed correctly
It is important to verify that FluxEngine, and its dependencies, have been installed correctly. This will prevent problems arising later, and gives us confidence that the flux calculation is being performed correctly by running a known scenario and comparing it to previously published data. This is all handled automatically using a verification script. Do this by running the command below. It will take about ten minutes to run and on most systems it will provide real-time output showing the script's progress. On some systems this real-time output will not appear and you will instead get all the output displayed once the script finishes. You can continue reading the rest of the tutorial while you wait but do not run any code until the verification has finished running. Once it has finished check to make sure a message saying "Validation successful! All values are within threshold limits" is displayed.

Note: Even if you have already run the verification script, you will need to run it again if you want to create the plot at the end of this tutorial. This is because the tutorial assumes there is a copy of the verification data in the current working directory. If you do not want to wait for the verification script to run again (about ten minutes), that is no problem, but the final plot will not display.

In [None]:
#Run the verification script.
!fe_verify_socatv4.py

<div class="alert alert-block alert-info">
<b>Command line help</b> - If you have never used the command line to run something before, do not worry, it is easy. Command Prompt (on Windows, Terminal on MacOS and Linux) provides a way to type commands to your operating system instead of using the graphical user interface. This means you can run tools which do not have graphical user interfaces and also gives you a way to automate and document the analysis steps you have done (e.g. by saving the commands to a file). Lots of tools (especially free/open source scientific ones) do not have graphical user interfaces (including FluxEngine, for now at least) simply because they can be very time consuming to build and maintain.
    
The anatomy of a command line command looks like this:
```
program_to_run options
```
Where `program_to_run` is the name or path to the program or tool you want to run, and `options` is a list of options (sometimes prefixed with a dash `-` or double-dash `--`) containing the parameters you are using with the command. In the case of the verification command, written in its longer form as `!python fe_verify_socatv4.py`, the `!` character tells the Jupyter notebook that what follows is a command line instruction. `python` is the name of the program to run (the Python interpreter), and we specify a single option `fe_verify_socatv4.py`. Here, the option is the path of the script we want to run. The path is relative to our working directory, which we have already set to be the folder we made to keep all our tutorial files together. The whole command therefore runs FluxEngine's verification script using the Python interpreter. All the command line tools which come with FluxEngine are prefixed with `fe_` to help identify them.
</div> 

# Creating a custom configuration file
For the remainder of this tutorial, we will create a new run configuration based on the verification run but with a number of changes and then visualise the output. To do this, you will first need to understand FluxEngine configuration files.

FluxEngine uses plain text configuration files to specify input data and to select options that change how the flux calculation is performed. We will be using the configuration file from the verification run as the basis for our new FluxEngine run, but it is important not to modify the original file otherwise future verification runs may not work. You could copy the configuration file with your file browser, but here is some Python code to do that for you:

In [None]:
from fluxengine.core.fe_setup_tools import get_fluxengine_root  # Gets the file path to the FluxEngine root directory
from os import path, getcwd  # Cross-platform file path manipulation and access to the current working directory
import shutil

try:
    # Copy the verification configuration file into the working directory under a new name
    source = path.join(get_fluxengine_root(), "configs", "socatv4_sst_salinity_gradients-N00.conf")
    destination = path.join(getcwd(), "custom_config.conf")
    shutil.copy(source, destination)
    print("Configuration file successfully copied to:", destination)
except Exception as e:
    print("There was a problem copying the configuration file. Check your working directory is correct.")
    print(e)

## Anatomy of a configuration file
Now that we have made a new configuration file based on the verification run, let's open it. If you kept the default directory, you can find it at `FluxEngineTutorials/tutorial_01/custom_config.conf` in your home directory. Use a text editor such as Notepad++ to open this file and view its contents.

<div class="alert alert-block alert-warning">
<b>Opening config files: </b> Configuration files are plain text files and can be opened in software such as Notepad, Notepad++ or TextEdit. You should not use Microsoft Word or other word processing software to edit configuration files because they sometimes insert invisible formatting characters into files, which can prevent FluxEngine from interpreting the file correctly. On some older Windows systems configuration files may display all on one line. The easiest way to avoid this is by [installing Notepad++](https://notepad-plus-plus.org/download/v7.7.html) (a free lightweight text editor), and using this to edit configuration files.
</div>

Configuration files contain a list of option/setting names assigned to values using the format `option = value`. The order in which options are defined does not matter, but it is often useful to group related options together. Any text preceded by a `#` symbol is a comment. Comments provide helpful information about what a particular setting, or group of settings, does. It is not necessary to understand all of the settings at this point, but see if you can identify where the following settings are defined by reading the comments:
 - Where is input data specified? What types of input data are supplied (e.g. salinity data)?
 - Where is the flux calculation selected? Which flux equation (e.g. 'rapid' or 'bulk') does the verification run use?
 - Where is the gas transfer velocity parameterisation specified?
 - Where is the output directory set? Optional: Locate the output directory and open one of the files (see the information box below if you need help opening netCDF files).

If you want to find out more about a particular setting, you can look at the description in [the instructions](https://github.com/oceanflux-ghg/FluxEngine/blob/master/FluxEngineV3_instructions.pdf) (see section 7.2).
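
To make the `option = value` format concrete, here is a minimal sketch of how such a file could be read in Python. This is not FluxEngine's own parser: `parse_config` is a hypothetical helper, the sample lines are example values, and it assumes that everything after a `#` on a line is a comment.

```python
def parse_config(text):
    """Parse 'option = value' lines into a dictionary (illustrative only)."""
    options = {}
    for line in text.splitlines():
        # Assumption: everything after '#' is a comment
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank lines and comment-only lines
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options

# Example lines in the same format as a configuration file
sample_text = """
# Wind speed input (example values)
windu10_prod = wind_speed_cor_mean
output_dir = tutorial_output  # where results are written
output_structure =
"""

options = parse_config(sample_text)
for name, value in options.items():
    print(f"{name!r} is set to {value!r}")
```

Note that an option with nothing after the `=` is stored as an empty string rather than being dropped.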

<div class="alert alert-block alert-info">
<b>Opening netCDF files:</b> FluxEngine uses netCDF formatted files for input and creates new netCDF formatted files to store output. The easiest way to open netCDF files is to use Panoply (or similar software) which provides a graphical user interface to open, view and plot data in netCDF format. You can download Panoply for Mac, Windows or Linux [here](https://www.giss.nasa.gov/tools/panoply/). Alternatively, you can use a programming language such as Python, R or Matlab to read netCDF files and plot them. This is beyond the scope of these tutorials, and it is recommended that you use Panoply unless you already have experience reading and plotting netCDF files.
</div> 

## Modifying the config file

### Specifying input data
FluxEngine requires a minimum of six types of input data to perform the flux calculation. These are:
- Sea surface temperature (skin, sub-skin or both)
- Salinity
- Air pressure
- Wind speed
- Aqueous gas (partial pressure, fugacity or concentration)
- Atmospheric gas (partial pressure, fugacity or concentration)

Each input dataset is described as a 'data layer' within FluxEngine. These are supplied as netCDF (.nc) files. NetCDF files can store several geospatial variables together in one file along with metadata (description, units, expected range, etc.) and information about the dimensions (spatial and temporal coordinates). The figure below shows a netCDF file opened in Panoply with the list of variables circled on the left and the dimensions that the data uses circled on the right.

![netcdf_anatomy.png](attachment:netcdf_anatomy.png)

If you click on a variable you can see metadata about that variable, such as the units and dimensions. In the picture below we have selected the `sst_skin_mean` variable and we can read the metadata about the variable on the left hand side (units and dimensions are highlighted).

![netcdf_anatomy_pt2.png](attachment:netcdf_anatomy_pt2.png)


To define an input data layer in the FluxEngine configuration file we need to specify a file path and a 'product' name. The product name is just the name of the variable within the netCDF file (e.g. `sst_skin_mean` in the above example). An example from the current configuration file that specifies the wind speed input data is:
```
windu10_path = <FEROOT>/data/verification_data/globwave/<YYYY>/<YYYY><MM>_OCF-WSP-GLO-1M-100-MGD-GW-v2.nc
windu10_prod = wind_speed_cor_mean
```

We are defining the file path to our wind speed netCDF file using `windu10_path` and the product name using `windu10_prod`. Notice that tokens are used to represent the year and month (`<YYYY>` and `<MM>`). These are substituted for the numerical representation of the current year and month when FluxEngine runs, allowing different input files to be selected for different time steps. The `<FEROOT>` token is substituted for the installation directory of the FluxEngine package, and is used by internal configuration files to access data that comes with FluxEngine (in this case the data used to run the verification).
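
The token substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not FluxEngine's internal code: `expand_tokens` is a hypothetical helper, `/opt/fluxengine` is a made-up installation directory, and the zero-padded two-digit month is an assumption.

```python
from datetime import date

def expand_tokens(template, when, fe_root):
    """Replace <FEROOT>, <YYYY> and <MM> in a path template (illustrative only)."""
    replacements = {
        "<FEROOT>": fe_root,
        "<YYYY>": f"{when.year:04d}",
        "<MM>": f"{when.month:02d}",  # assumption: months are zero-padded to two digits
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template

template = "<FEROOT>/data/verification_data/globwave/<YYYY>/<YYYY><MM>_OCF-WSP-GLO-1M-100-MGD-GW-v2.nc"

# "/opt/fluxengine" is a made-up installation directory used only for this example
expanded = expand_tokens(template, date(2010, 1, 1), "/opt/fluxengine")
print(expanded)
```

Calling the helper with a different date selects a different file, which is how one path template can cover every time step in a run.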

For our custom flux calculation we will use a salinity dataset from the [World Ocean Atlas (WOA)](https://www.nodc.noaa.gov/SatelliteData/sss/). To do this we need to specify the location of the files containing the new salinity data, and the product name of the salinity variable within those files. Find the section of the configuration file which defines the salinity input and modify it to look like this:

```
salinity_path = copied_data/WOA_salinity/surface_woa18_A5B7_s<MM>_01.nc
salinity_prod = salinity_mean
```

Now FluxEngine will look for salinity data in the relative path (i.e. from your current working directory). It will expect the data to be in a folder called `copied_data`. We need to create this folder and put the salinity data in there. The salinity data is included with FluxEngine, so we will just use a Python script to copy it into our working directory (alternatively you could do this using your file browser).


In [None]:
from fluxengine.core.fe_setup_tools import get_fluxengine_root  # Gets the file path to the FluxEngine root directory
from os import path, getcwd  # Cross-platform file path manipulation and access to the current working directory
import shutil

try:
    # Copy the tutorial's data folder into a new 'copied_data' folder in the working directory
    source = path.join(get_fluxengine_root(), "tutorials", "01_introduction", "data")
    destination = path.join(getcwd(), "copied_data")
    shutil.copytree(source, destination)
    print("WOA salinity data successfully copied to:", destination)
except Exception as e:
    print("There was a problem copying the WOA salinity data. The folder 'copied_data' may already exist. If so, try deleting it before rerunning this cell.")
    print(e)

Now there will be a new directory named `copied_data` in the tutorial's working directory. Inside will be a copy of the WOA salinity data. You will also notice that there is a separate file for each month. So that FluxEngine uses the file for the correct month, we used the `<MM>` token in the path to stand for the numerical representation of the month.

If you open one of the WOA salinity data netCDF files using Panoply, there are three variables: `lon`, `lat` and `salinity_mean`. The first two define the lon-lat grid and the last contains the mean monthly sea surface salinity. We want to use the sea surface salinity, which is why we set `salinity_prod = salinity_mean`.

<div class="alert alert-block alert-info">
<b>Extra info:</b> There is a general pattern to specifying data layers, which is `datalayername_suffix`. You have already seen the `path` and `prod` suffixes used to define file paths and product names. Others can be used to define other properties of a data layer. For example, minimum and maximum allowed ranges for the input data, preprocessing such as unit conversions, or whether the input file has a temporal dimension. [The manual](https://github.com/oceanflux-ghg/FluxEngine/blob/master/documentation_FluxEngineV4.pdf) (section 7.6.1) provides more information on using suffixes to change the way specific input data layers are handled.
</div>
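
As a rough illustration of the `datalayername_suffix` pattern, the sketch below groups a handful of options by data layer. `group_by_layer` is a hypothetical helper, not part of FluxEngine, and it assumes the suffix is whatever follows the last underscore, so it would not handle a suffix that itself contains an underscore.

```python
def group_by_layer(options):
    """Group 'datalayername_suffix' options by data layer name (illustrative only)."""
    layers = {}
    for name, value in options.items():
        # Assumption: the suffix is the part after the last underscore
        layer, _, suffix = name.rpartition("_")
        layers.setdefault(layer, {})[suffix] = value
    return layers

# Data layer options that appear in this tutorial
options = {
    "windu10_path": "<FEROOT>/data/verification_data/globwave/<YYYY>/<YYYY><MM>_OCF-WSP-GLO-1M-100-MGD-GW-v2.nc",
    "windu10_prod": "wind_speed_cor_mean",
    "salinity_path": "copied_data/WOA_salinity/surface_woa18_A5B7_s<MM>_01.nc",
    "salinity_prod": "salinity_mean",
}

layers = group_by_layer(options)
for layer, properties in layers.items():
    print(layer, "has properties:", sorted(properties))
```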

### Selecting a gas transfer velocity parameterisation
We will use a more recent wind-based gas transfer velocity (k) parameterisation too. Find the line that starts `k_parameterisation`. It is currently set to use the parameterisation described by [Nightingale et al., 2000](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/1999GB900091). There are lots of different gas transfer velocity parameterisations built into FluxEngine and you can use Python to create new ones. The most common parameterisations are listed in the comments of the configuration file and we will stick to using one of these for now. [Wanninkhof (2014)](https://aslopubs.onlinelibrary.wiley.com/doi/pdf/10.4319/lom.2014.12.351) is a more recent parameterisation which is applicable to a wider range of ocean conditions. To configure FluxEngine to use this parameterisation set `k_parameterisation = k_Wanninkhof2014`.

We will also update the Schmidt number parameterisation to use the relationship described in the [same paper](https://aslopubs.onlinelibrary.wiley.com/doi/pdf/10.4319/lom.2014.12.351). Currently the configuration file does not define a Schmidt number parameterisation and so it will use the default version described by Wanninkhof (1992), which is a little outdated compared to the 2014 version. Add a new line that reads `schmidt_parameterisation = schmidt_Wanninkhof2014`. Now our gas transfer parameterisation and Schmidt number calculation will be consistent. Note that it does not matter where you add this line, but it might help you find it again if you add it below the k parameterisation line.

### Setting the output directory
Finally we need to change the output directory, otherwise we will overwrite the output produced by the verification run. Set the `output_dir` setting in the configuration file to `tutorial_output`. When FluxEngine runs our custom configuration file it will create a new subdirectory in the working directory called `tutorial_output` and store all of the output files there. By default, FluxEngine will sort these output files into further subdirectories based on year and month. We are only going to run our new configuration file for a single month so we do not really want all these extra directories. We can override the output directory structure by adding the following line:

`output_structure =`

We purposefully leave the value empty to tell FluxEngine that we do not need any additional directories to be created. Now any output files will be put in the root output directory (i.e. the one defined in `output_dir`).

Make sure you save the configuration file. For more information about each configuration file option you can look in section 7.2 of [the manual](https://github.com/oceanflux-ghg/FluxEngine/blob/master/FluxEngineV4_instructions.pdf).
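
If you would rather script these edits than make them in a text editor, the sketch below shows one way to do it on the text of a configuration file. `set_option` is a hypothetical helper, not a FluxEngine function, and the sample text uses placeholder values rather than the real contents of `custom_config.conf`. It assumes each option sits on its own line and leaves commented-out lines untouched.

```python
import re

def set_option(config_text, name, value):
    """Set 'name = value', replacing an existing line or appending a new one."""
    new_line = f"{name} = {value}".rstrip()
    # Match an uncommented line that assigns this option
    pattern = re.compile(rf"^[ \t]*{re.escape(name)}[ \t]*=.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub(lambda match: new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"

# Placeholder text standing in for the real configuration file
sample_text = (
    "salinity_path = placeholder_old_value\n"
    "salinity_prod = placeholder_old_value\n"
    "k_parameterisation = placeholder_old_value\n"
    "output_dir = placeholder_old_value\n"
)

# The changes made in this section of the tutorial
changes = {
    "salinity_path": "copied_data/WOA_salinity/surface_woa18_A5B7_s<MM>_01.nc",
    "salinity_prod": "salinity_mean",
    "k_parameterisation": "k_Wanninkhof2014",
    "schmidt_parameterisation": "schmidt_Wanninkhof2014",
    "output_dir": "tutorial_output",
    "output_structure": "",
}

updated_text = sample_text
for name, value in changes.items():
    updated_text = set_option(updated_text, name, value)
print(updated_text)
```

To apply this to the real file you would read `custom_config.conf` into a string, pass it through the same loop and write the result back, ideally after keeping a backup copy.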

## Running FluxEngine using our custom configuration file
All that is left to do now is to run FluxEngine with the new configuration file. There are two ways to run FluxEngine. The first is using the command line tool and specifying a configuration file, a start date and a stop date as options. This is the approach we will use because it does not assume knowledge of Python and because it is flexible enough to be used for most analyses.

The second method is to import FluxEngine as a Python module and write a custom script to drive it, i.e. by using the `run_fluxengine` function from the `fluxengine.core.fe_setup_tools` module. The main advantage of using the Python module is that it allows FluxEngine to be embedded in other software, e.g. as part of a larger model or analysis pipeline. It is also possible to access lower-level functions and gain a lot of flexibility over how FluxEngine runs. For example, you can hook into the way FluxEngine reads configuration files and make small runtime modifications to the parameters so that a single configuration file can be used as a template to perform a series of similar runs (e.g. as part of a sensitivity analysis).

The command line tool can be run by entering `python fe_run.py path/to/configuration/file.config -s startdate -e enddate` in your Command Prompt (Windows) or Terminal (MacOS/Linux) window. The first part of the command, `python fe_run.py`, tells the computer to use Python to run the command line tool, which will in turn set up and run FluxEngine. The next part of the command specifies the path of the configuration file to use. Next, the `-s` and `-e` options specify the start and end dates, which must be provided in one of the following formats: `YYYY`, `YYYY-MM-DD` or `YYYY-MM-DD hh:mm`. For more information on using the `fe_run.py` tool you can run it with the `-h` option ('h' for help) or see [the instructions](https://github.com/oceanflux-ghg/FluxEngine/blob/master/FluxEngineV4_instructions.pdf) (section 5.1).
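
To make the three accepted date formats concrete, here is a small sketch that checks a date string against each of them using Python's standard `datetime` module. `parse_run_date` is a hypothetical helper and is not how `fe_run.py` itself reads dates. In particular, how FluxEngine interprets a bare year is not covered here.

```python
from datetime import datetime

# The three formats listed above, written as strptime patterns
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%Y")

def parse_run_date(text):
    """Return a datetime if text matches one of the accepted formats (illustrative only)."""
    for date_format in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, date_format)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognised date format: {text!r}")

for example in ["2010", "2010-01-31", "2010-01-31 12:30"]:
    print(example, "->", parse_run_date(example))
```

A string in any other layout, such as `31/01/2010`, raises a `ValueError`, which is a quick way to catch a mistyped date before starting a long run.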

<div class="alert alert-block alert-info">
<b>The `!` prefix:</b> Remember that code which starts with `!` is interpreted as a command line instruction, and is equivalent to entering the command (without the `!`) into your command prompt (Windows) or terminal (Mac/Linux) window.
</div>

Run the command below to perform our new flux calculation. Notice that, to reduce execution time, we have set the end date to the end of January so that we only calculate the fluxes for a single month.

In [None]:
!fe_run.py "custom_config.conf" -s "2010-01-01" -e "2010-01-31"

<div class="alert alert-block alert-info">
<b>Command line help (part 2)</b> - In the above command <tt>!fe_run.py "custom_config.conf" -s "2010-01-01" -e "2010-01-31"</tt> we are running a Python script (<tt>fe_run.py</tt>) that runs FluxEngine. The script takes a number of options to tell it what to run and how to run it. These are specified as a list of parameters. The first (and only mandatory) one is the path to the FluxEngine configuration file (in this case it is in the root of the tutorial's working directory, so we just specify the file name: <tt>custom_config.conf</tt>). Next are two optional parameters which specify the start and end dates. Optional parameters start with a <tt>-</tt> or <tt>--</tt> and often appear as name-value pairs (e.g. the name <tt>-s</tt>, indicating start date, appears before the value <tt>2010-01-01</tt>). They are optional because they have default values, so if you do not explicitly specify them FluxEngine will use the defaults. The default values can be found in the help (by running <tt>python fe_run.py -h</tt>).
</div> 
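
If you later want to repeat the run for several months, the command can be assembled in Python rather than typed out each time. The sketch below only builds the argument lists and does not run anything. `monthly_commands` is a hypothetical helper, and it uses only the `-s` and `-e` options described above.

```python
import calendar

def monthly_commands(config_path, year, months):
    """Build one fe_run.py argument list per month (illustrative only)."""
    commands = []
    for month in months:
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        start = f"{year:04d}-{month:02d}-01"
        end = f"{year:04d}-{month:02d}-{last_day:02d}"
        commands.append(["fe_run.py", config_path, "-s", start, "-e", end])
    return commands

commands = monthly_commands("custom_config.conf", 2010, [1, 2])
for command in commands:
    print(" ".join(command))
```

Each list could then be passed to `subprocess.run(command, check=True)` from the standard library, which runs the command in the same way as typing it into a terminal.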

## Visualising output
If all has gone well you will see a log of the FluxEngine run and a message at the end saying 'completed successfully'. If you look at the output directory `tutorial_output` you should see a file called `OceanFluxGHG-month01-jan-2010-v0.nc`. If you have Panoply installed, you can open this file directly and look at its contents.

![Panoply.png](attachment:Panoply.png)

If you double click a variable name (e.g. 'OF' - the air-sea gas flux) Panoply will give you the option to plot it. Spend some time exploring the FluxEngine outputs. Outputs that may be of interest are the air-sea gas flux (OF), the gas transfer velocity (OK3) and the interface and aqueous gas concentrations (OIC1 and OSFC, respectively). You will also notice that a copy of each input dataset is provided in the output file. This makes it convenient to see what values were used to make the calculation.

Plotting in Panoply is fine, but it is not very flexible. For example, if we want to compare the calculated CO<sub>2</sub> fluxes between the verification run and our custom run, we would not be able to do this directly in Panoply. You can export data to a .csv file by right-clicking on a variable and choosing one of the export options. Then you could import the data into your preferred data visualisation or analysis software and perform any operations you like. However, since we are already running a Python interpreter in this Jupyter notebook, we might as well do this directly with Python! Let's compare the air-sea CO<sub>2</sub> fluxes between the verification and custom runs now. Do not worry if you do not follow the Python code: R and Matlab both provide their own libraries for importing data stored in netCDF files.

In [None]:
from netCDF4 import Dataset #allows reading of netCDF files
import matplotlib.pyplot as plt #for plotting data
import numpy as np #matrix manipulation

#read in the FluxEngine output files
verificationNetCDF = Dataset("verification_output/verification_socatv4_sst_salinity_N00/2010/01/OceanFluxGHG-month01-jan-2010-v0.nc", 'r') #Read the January data from the verification run
verificationFlux = verificationNetCDF.variables["OF"][:] #Extract the flux data from the 'OF' variable in the netCDF file
customNetCDF = Dataset("tutorial_output/OceanFluxGHG-month01-jan-2010-v0.nc", 'r') #Read the January data from the custom run
customFlux = customNetCDF.variables["OF"][:] #Extract the flux data from the 'OF' variable in the netCDF file

#calculate difference between the new run and the verification
fluxDifference = customFlux - verificationFlux #Calculate the change in calculated flux
fluxDifference = np.squeeze(fluxDifference) #remove any dimensions of size 1 (in this case, the time dimension)

#plot the differences in flux
plt.figure(figsize=(8, 8))
plt.imshow(fluxDifference)
cbar = plt.colorbar(orientation="horizontal")
cbar.set_label(r"Difference in flux (gC m$^{-2}$ s$^{-1}$)", fontsize=16)
plt.show()


Not the nicest looking figure, but good enough for an exploratory look at the results! You can see that the difference between the calculated fluxes is largest around regions where there are typically high wind speeds (open the `WS1_mean` variable in the FluxEngine output using Panoply if you want to visualise the wind speed data). The change in salinity data is unlikely to have a big effect, so it is likely that these differences are due to the updated gas transfer velocity parameterisation.
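
One simple way to make a difference map easier to read is to centre the colour scale on zero, so that increases and decreases in flux get opposite colours. The sketch below computes a symmetric colour limit with NumPy. `symmetric_limit` is a hypothetical helper and the small array is made-up example data, not FluxEngine output.

```python
import numpy as np

def symmetric_limit(difference):
    """Return the largest absolute value, ignoring masked and non-finite cells."""
    data = np.ma.getdata(difference).astype(float)
    valid = np.isfinite(data) & ~np.ma.getmaskarray(difference)
    if not valid.any():
        return 0.0  # nothing to plot
    return float(np.max(np.abs(data[valid])))

# Made-up example: one NaN cell and one masked cell (e.g. land)
example = np.ma.masked_array(
    [[0.5, -2.0], [np.nan, 1.0]],
    mask=[[False, False], [False, True]],
)

limit = symmetric_limit(example)
print("Colour scale would run from", -limit, "to", limit)
```

With the real data you could pass the result to the plotting call, for example `plt.imshow(fluxDifference, cmap="RdBu_r", vmin=-limit, vmax=limit)`, where `limit = symmetric_limit(fluxDifference)`.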

# Next tutorial
Next, we'll look at how to use FluxEngine to calculate gas fluxes using in situ data. Return to the Jupyter dashboard and open the notebook stored in [Tutorials/02_using_insitu_data/02_using_insitu_data.ipynb](../02_using_insitu_data/02_using_insitu_data.ipynb).