# Notebook 1: Get started

In this notebook, we will:

1. import basic Python libraries/packages/modules
2. set up the working directory
3. copy and unzip the D&B model and data into our working directory
4. create some folders for saving D&B outputs and analysis
5. do some trial runs [optional]

# Library import and directory setup

We start in the directory `/home/<session number>/notebooks` and will change to the working directory `/home/<session number>/tccas_r10043`.

**tccas_r10043.tgz** is the tarball that contains the D&B model source code and the necessary data; `tccas` is the project name and `r10043` is the model version code.

As a first step, make sure we are in the user directory, i.e. ``/home/\<session-token\>``.

Use Python's `os.chdir` to change the directory for the whole notebook, rather than only within a single cell.
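
A minimal sketch of why `os.chdir` is preferred here: a `cd` issued in a child shell (which is what a `!cd` line starts) does not move the notebook's own Python process, whereas `os.chdir` does. The temporary directory is only an illustration and is not part of the D&B setup.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    # A `cd` in a child shell only affects that shell, not this Python process.
    subprocess.run(f'cd "{tmp}"', shell=True, check=True)
    after_shell_cd = os.getcwd()

    # os.chdir changes the working directory of the Python process itself.
    os.chdir(tmp)
    moved = os.path.realpath(os.getcwd()) == os.path.realpath(tmp)

    # Go back before the temporary directory is deleted.
    os.chdir(start_dir)

print("unchanged after shell cd:", after_shell_cd == start_dir)
print("changed after os.chdir:", moved)
```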

---

Tips:

- In Python, `#` starts a comment, i.e. anything after it on the line is not executed.
- We use the keyword `import` to load functions and variables from libraries or other Python scripts.
- There are two ways of importing: 1) `import xxx`, or 2) `from yyy import xxx` if you only need part of a library.
- You can rename what you import, e.g. `import appleInc as app` or `from appleInc import iphone15Pro as iphone`.
- Start a line with `!` to run *ONE* line of Linux bash command.
- Or put `%%bash` on the first line of a cell to run the whole cell as bash.
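
The import forms from the tips can be tried with standard-library modules. This is only an illustrative sketch; the modules chosen here are arbitrary and not specific to D&B.

```python
# 1) Import a whole module and use its contents through the module name.
import math

# 2) Import only selected names from a module.
from pathlib import Path

# 3) Rename on import with `as`.
import os.path as osp
from collections import OrderedDict as ODict

print(math.sqrt(16.0))                # function reached through the module name
print(Path("a").joinpath("b"))        # name imported directly from its module
print(osp.basename("/tmp/file.txt"))  # module used under its alias
print(ODict(a=1))                     # class used under its alias
```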


## Configure NetCDF


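NetCDF is essential for running D&B. Unfortunately, the NetCDF Fortran library is not pre-installed on Google Colab (the C library may already be present, as in the log below), so we install both.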

In [1]:
# Install NetCDF C library
!apt-get update -qq
!apt-get install -y libnetcdf-dev

# Install NetCDF Fortran library
!apt-get install -y libnetcdff-dev

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnetcdf-dev is already the newest version (1:4.8.1-1).
libnetcdf-dev set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 44 not upgraded.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libnetcdff7
Suggested packages:
  netcdf-bin netcdf-doc
The following NEW packages will be installed:
  libnetcdff-dev libnetcdff7
0 upgraded, 2 newly installed, 0 to remove and 44 not upgraded.
Need to get 460 kB of archives.
After this operation, 1,801 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libnetcdff7 amd64 4.5.4+ds-1 [134 kB]
Get:2

Check that NetCDF is installed:

In [2]:
!which nf-config
!nf-config --all

/usr/bin/nf-config

This netCDF-Fortran 4.5.4 has been built with the following features: 

  --cc        -> gcc
  --cflags    -> -I/usr/include -Wdate-time -D_FORTIFY_SOURCE=2

  --fc        -> gfortran
  --fflags    -> -I/usr/include -I/usr/include
  --flibs     -> -L/usr/lib/x86_64-linux-gnu -lnetcdff -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now -lnetcdf -lnetcdf -lm 
  --has-f90   -> 
  --has-f03   -> yes

  --has-nc2   -> yes
  --has-nc4   -> yes

  --prefix    -> /usr
  --includedir-> /usr/include
  --version   -> netCDF-Fortran 4.5.4



We also need the corresponding Python package, `netCDF4`.
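
To see whether the package is already importable (before or after running the install cell), a small check like the following can help. It is a sketch; `has_module` is a hypothetical helper name, not part of the notebook's setup code.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


print("netCDF4 available:", has_module("netCDF4"))
```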

In [3]:
!pip install netCDF4 --quiet

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.3/9.3 MB 61.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 51.0 MB/s eta 0:00:00

# Working directory configuration

We will first access Google Drive.
The cell below uses absolute paths, so you can run it as many times as you like
and still end up in the same directory.

In [4]:
import os
from pathlib import Path
from google.colab import drive

drive.mount('/content/drive', force_remount = False)
# root = Path('/content/drive/My Drive')
home_dir = Path('/content')

os.chdir(home_dir)

!pwd

Mounted at /content/drive
/content


# Copy and unzip D&B

Brilliant! The first step is done.

Now, let's start modelling from scratch.
The `%%bash` cell below runs four lines that do the following in turn:
1. Remove any earlier copy of the tarball from `/content`
2. Copy the shared tarball from Google Drive into `/content`
3. Unpack it
4. Link `config/mk.compile-gfortran` to `mk.compile` inside `tccas_r10043`

---

Note:

Line 1 just makes sure we start everything from the beginning.

If a cell prints an overwhelming amount of output, right-click the output and select `Clear Outputs` to make it disappear.

In [6]:
%%bash

rm -rf '/content/tccas_r10043.tgz'
cp '/content/drive/My Drive/tccas/tccas_r10043.tgz' '/content/tccas_r10043.tgz'
tar -xvzf tccas_r10043.tgz
cd /content/tccas_r10043/; ln -fs config/mk.compile-gfortran mk.compile

tccas_r10043/LICENSE
tccas_r10043/Makefile
tccas_r10043/README
tccas_r10043/adstack/
tccas_r10043/adstack/adStack.c
tccas_r10043/adstack/LICENSE.html
tccas_r10043/adstack/Makefile
tccas_r10043/adstack/adStack.h
tccas_r10043/adstack/adComplex.h
tccas_r10043/config/
tccas_r10043/config/mk.compile-gfortran
tccas_r10043/diagout/
tccas_r10043/driver/
tccas_r10043/driver/runassi.f90
tccas_r10043/driver/runmodel.f90
tccas_r10043/driver/runcost.f90
tccas_r10043/driver/woptimum.f90
tccas_r10043/driver/setconfig.f90
tccas_r10043/forcing/
tccas_r10043/forcing/FI-Sod_staticforcing.nc
tccas_r10043/forcing/ES-LM1_dynforcing-era5_20090101-20211231_with-lwdown.nc
tccas_r10043/forcing/FI-Sod_dynforcing-era5_20090101-20211231_with-lwdown.nc
tccas_r10043/forcing/ES-LM1_staticforcing.nc
tccas_r10043/forcing/FI-Sod_dynforcing-insitu_20090101-20211231_with-insitu-lwdown.nc
tccas_r10043/forcing/ES-LM1_dynforcing-insitu_20140401-20220930_with-insitu-lwdown.nc
tccas_r10043/mini/
tccas_r10043/mini/lbfgsb.f
tcca

We now have a folder called `tccas_r10043`.

For the other notebooks, you won't need to run the above cells again.
I have modularised the setup and configuration to simplify everything.
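
For reference, a clean unpack can also be sketched in pure Python. This version removes a previously extracted folder before unpacking, so the result is always a fresh copy. The function name `fresh_unpack` is hypothetical, and `extractall` should only be used on a tarball you trust.

```python
import shutil
import tarfile
from pathlib import Path


def fresh_unpack(tarball: Path, dest_dir: Path, folder_name: str) -> Path:
    """Delete dest_dir/folder_name if present, then unpack the gzip tarball into dest_dir."""
    target = dest_dir / folder_name
    # Remove any earlier extraction so we start from a clean copy.
    shutil.rmtree(target, ignore_errors=True)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest_dir)
    return target


# Possible use in this notebook (paths taken from the cells above):
# fresh_unpack(Path("/content/tccas_r10043.tgz"), Path("/content"), "tccas_r10043")
```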

# Create folders for saving our modelling outputs

Awesome! Now let's create two folders, `resources` and `analysis`, to hold the model output data and the figures/tables of our analysis, respectively.

We 1) move into `tccas_r10043`, which is our **working directory**,
and 2) create the new folders inside it.


In [7]:
os.chdir(home_dir.joinpath("tccas_r10043"))

home_dir.joinpath("tccas_r10043/resources").mkdir(exist_ok=True)
home_dir.joinpath("tccas_r10043/analysis").mkdir(exist_ok=True)

If you don't see the two new folders, refresh the file browser by clicking the button in the toolbar at the top-left corner.
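
The folders can also be checked programmatically, independent of the file browser. A small sketch, assuming the current directory is the working directory; `missing_folders` is a hypothetical helper name.

```python
from pathlib import Path


def missing_folders(workdir: Path, names=("resources", "analysis")) -> list:
    """Return the names from `names` that are not directories under workdir."""
    return [name for name in names if not (workdir / name).is_dir()]


# An empty list means both folders exist in the current directory.
print(missing_folders(Path.cwd()))
```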

Now you are ready to go!

# Trial run [Optional]

This section includes a D&B model *forward run* and a *data assimilation* trial. \
Feel free to try them out and play. \
The *data assimilation* process is a little time-consuming; click the square button on the toolbar above to stop it whenever you want. Enjoy!


## Forward run

This subsection includes two lines:
1) forward run with specific site name
2) copy the diagnostic outputs into the *resources* folder we created above.

---

Note:

The `scratch` target makes sure that the previously used configuration is removed. `ES-LM1` is the selected site; the other option is `FI-Sod`.\
The system has also stored a vector of synthetic observations in binary format in the file `obs.b`, which it placed in the main directory.
We use the `-s` option of the `make` command to suppress the repeated output from the compilation.

In [8]:
%%bash

make scratch xmodel -s DOMAIN=ES-LM1
cp -a diagout/. resources/ES-LM1

INFO::  prepending +++/content/tccas_r10043/util/pylcc+++ to module search path
2025-09-22 04:48:37,003 INFO::<module>:: model_setup.py::PROGRAM START::2025-09-22T04:48:37.003140
2025-09-22 04:48:37,003 INFO::<module>:: command-line: util/model_setup.py dimensions_setup --outname src/dimensions.f90 --calendar=gregorian --dates 20150101 20211231 input/staticforcing.nc input/dynforcing.nc
2025-09-22 04:48:37,021 INFO::main:: START requested subcommand +++dimensions_setup+++...
2025-09-22 04:48:37,022 INFO::subcmd_dimensions_setup:: determine temporal/spatial settings from forcing file ***input/staticforcing.nc***
2025-09-22 04:48:37,054 INFO::subcmd_dimensions_setup:: static forcing file contains grid-cells with 2 different PFTs (---[3 9]---)
2025-09-22 04:48:37,055 INFO::subcmd_dimensions_setup:: checking consistency of selected period with dynamic forcing provided (input/dynforcing.nc)
2025-09-22 04:48:37,059 INFO::subcmd_dimensions_setup:: simulation period forseen 20150101T01--202201
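
To switch sites without retyping the command, the `make` call can be assembled in Python. This is a sketch that only builds the argument list; `make_command` and `SITES` are hypothetical names, and the two site codes are the ones mentioned in the note above.

```python
SITES = ("ES-LM1", "FI-Sod")  # the two sites named in this notebook


def make_command(site: str, targets=("scratch", "xmodel"), silent: bool = True) -> list:
    """Build the argument list for `make <targets> [-s] DOMAIN=<site>`."""
    if site not in SITES:
        raise ValueError(f"unknown site {site!r}; expected one of {SITES}")
    cmd = ["make", *targets]
    if silent:
        cmd.append("-s")  # suppress the repeated compilation output
    cmd.append(f"DOMAIN={site}")
    return cmd


print(" ".join(make_command("FI-Sod")))
```

From the working directory, the list could then be passed to `subprocess.run(make_command("FI-Sod"), check=True)`.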

## Assimilation run

This subsection includes four lines:
1) assimilate the observations, adjusting parameters to match them
2) replace the prior parameters with the posterior parameters
3) run the D&B model forward with the posteriors
4) copy the diagnostic outputs into the *resources* folder we created above.

---

Note:

The `clean` target removes object files produced by the previous run.\
The final parameter set, identified as the cost function minimum, is written in binary format to `xopt.b`.\
Run `rm -rf tccas_r10043` if you want to delete the training folder later on (Warning: be careful).
The minimum is searched for iteratively. For this training, we have limited the number of iterations to 10 in the formal assimilation notebook.

In [9]:
%%bash

make scratch xassi -s DOMAIN=ES-LM1
cp xopt.b x.b
make clean xmodel -s DOMAIN=ES-LM1
cp -a diagout/. resources/ES-LM1_posterior

INFO::  prepending +++/content/tccas_r10043/util/pylcc+++ to module search path
2025-09-22 04:49:03,215 INFO::<module>:: model_setup.py::PROGRAM START::2025-09-22T04:49:03.215304
2025-09-22 04:49:03,215 INFO::<module>:: command-line: util/model_setup.py dimensions_setup --outname src/dimensions.f90 --calendar=gregorian --dates 20150101 20211231 input/staticforcing.nc input/dynforcing.nc
2025-09-22 04:49:03,220 INFO::main:: START requested subcommand +++dimensions_setup+++...
2025-09-22 04:49:03,220 INFO::subcmd_dimensions_setup:: determine temporal/spatial settings from forcing file ***input/staticforcing.nc***
2025-09-22 04:49:03,226 INFO::subcmd_dimensions_setup:: static forcing file contains grid-cells with 2 different PFTs (---[3 9]---)
2025-09-22 04:49:03,227 INFO::subcmd_dimensions_setup:: checking consistency of selected period with dynamic forcing provided (input/dynforcing.nc)
2025-09-22 04:49:03,228 INFO::subcmd_dimensions_setup:: simulation period forseen 20150101T01--202201

adStack.c: In function ‘adStack_showPeakSize’:
 1065 |   printf("Peak stack size (%1li blocks): %1llu bytes\n",
      |                                          ~~~~^
      |                                              |
      |                                              long long unsigned int
      |                                          %1lu
 1066 |          maxBlocks, maxBlocks*((long int)BLOCK_SIZE)) ;
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                              |
      |                              long unsigned int
ar: creating ../libadstack-gfortran.a
ar: creating ../libmini-gfortran.a
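
The `cp xopt.b x.b` step overwrites the prior parameters. A slightly more defensive variant is sketched below: it refuses to continue if `xopt.b` is missing (for example after stopping the assimilation early, which is an assumption about when that can happen) and keeps a copy of the prior first. The function name `promote_posterior` and the backup file name `x_prior.b` are hypothetical, not part of D&B.

```python
import shutil
from pathlib import Path


def promote_posterior(workdir: Path, backup_name: str = "x_prior.b") -> Path:
    """Copy xopt.b over x.b in workdir, backing up an existing x.b first."""
    xopt = workdir / "xopt.b"
    x = workdir / "x.b"
    if not xopt.is_file():
        raise FileNotFoundError(f"{xopt} not found; has the assimilation finished?")
    if x.is_file():
        # Keep the prior parameters under a separate (hypothetical) name.
        shutil.copy2(x, workdir / backup_name)
    shutil.copy2(xopt, x)
    return x


# Possible use from the working directory:
# promote_posterior(Path.cwd())
```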
