# Combine multiple NetCDF outputs into one file

A condensed version of the [NorESM Land Sites Platform](https://github.com/NorESMhub/noresm-land-sites-platform)'s notebook "combine_nc_files.ipynb", adjusted to work for a long simulation (3000 years).

In [3]:
# Name of case folder
case_id = "8cb1b8bb0571a2d7d64a72242a71294b_alp4-3000-gswp3-surfdatmod"
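# Folder containing the original monthly history files, relative to the notebook folder
output_path_str = f"../cases/{case_id}/archive/lnd/hist/"
# Folder for the combined files. Recommended to leave unchanged: one level above the
# original output files, to avoid long loading times and conflicts.
# No leading ".." here because the shell commands further down prepend the parent folder.
save_path_str = f"/cases/{case_id}/archive/lnd/"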

In [4]:
import os
cwd = os.getcwd()
print(cwd)

/home/ubuntu/notebooks


There are more files than the regular notebook can handle, so split them into three 1000-year chunks and move each chunk into its own subdirectory.

In [5]:
import shutil

num_files = 36000
num_dest_dirs = 3

# Calculate number of files to move to each directory
files_per_dir = num_files // num_dest_dirs
print("files per dir to be moved:", files_per_dir)


files per dir to be moved: 12000


From the Jupyter notebook welcome page, start a terminal and create one subdirectory per chunk. The case folder in the example differs from `case_id` above, so substitute your own:
```
ubuntu@12f068803d65:~$ mkdir cases/5623552dd1a8bad16ab1c12e3fc92076_default-alp4-3000/archive/lnd/hist/1
ubuntu@12f068803d65:~$ mkdir cases/5623552dd1a8bad16ab1c12e3fc92076_default-alp4-3000/archive/lnd/hist/2
ubuntu@12f068803d65:~$ mkdir cases/5623552dd1a8bad16ab1c12e3fc92076_default-alp4-3000/archive/lnd/hist/3
```
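
The same directories could also be created from within the notebook instead of a terminal. A minimal sketch, assuming `output_path_str` and `num_dest_dirs` as defined in the earlier cells; the helper name `make_chunk_dirs` is hypothetical and not part of the original notebook:

```python
import os

def make_chunk_dirs(hist_dir, num_dirs):
    """Create subdirectories named 1..num_dirs inside hist_dir and return their paths."""
    created = []
    for i in range(1, num_dirs + 1):
        chunk_dir = os.path.join(hist_dir, str(i))
        # exist_ok=True makes the call safe to repeat if the directory already exists
        os.makedirs(chunk_dir, exist_ok=True)
        created.append(chunk_dir)
    return created

# Example call in the notebook (not run here):
# make_chunk_dirs(output_path_str, num_dest_dirs)
```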

In [6]:
# Move files to the new destination directories
for i in range(num_dest_dirs):
    src_dir = output_path_str
    dest_dir = os.path.join(output_path_str, str(i+1))
    start_idx = i * files_per_dir
    print("starting at ", start_idx)
    end_idx = start_idx + files_per_dir
    print("ending at", end_idx)

    # Move files to destination directory
    for j in range(start_idx, end_idx):
        year_month_str = f"{1901 + j // 12}-{j % 12 + 1:02d}"
        if j == start_idx:
            print("year_month_str: ", year_month_str)
        src_file = os.path.join(src_dir, f"{case_id}.clm2.h0.{year_month_str}.nc")
        dest_file = os.path.join(dest_dir, f"{case_id}.clm2.h0.{year_month_str}.nc")
        shutil.move(src_file, dest_file)

print("Done!")


starting at  0
ending at 12000
year_month_str:  1901-01
starting at  12000
ending at 24000
year_month_str:  2901-01
starting at  24000
ending at 36000
year_month_str:  3901-01
Done!
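
The printed start months follow from the index arithmetic in the loop: file index `j` counts months from January 1901. A small sketch of that mapping, useful for checking which years end up in which subdirectory (the function name `index_to_year_month` is introduced here for illustration; the start year 1901 is taken from the loop):

```python
def index_to_year_month(j, start_year=1901):
    """Map a zero-based monthly file index to a 'YYYY-MM' string."""
    year = start_year + j // 12   # 12 files per simulated year
    month = j % 12 + 1            # months run from 1 to 12
    return f"{year}-{month:02d}"

files_per_dir = 36000 // 3
for chunk in range(3):
    first = chunk * files_per_dir
    last = first + files_per_dir - 1
    # Print the first and last month stored in each subdirectory
    print(chunk + 1, index_to_year_month(first), index_to_year_month(last))
```

Each subdirectory therefore holds exactly 1000 model years: 1901 to 2900, 2901 to 3900, and 3901 to 4900.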


Next, specify the new output paths and run the rest of the notebook as normal. Once the file locations are set, the files can be combined. The `*` is a wildcard, so this example combines **all** files for history tape 0 (`h0`) contained in `cases/[case_id]/archive/lnd/hist/1`, and likewise for `2` and `3`.

Set `NC_OUT_NAME_1` to `NC_OUT_NAME_3` to descriptive names for the three resulting combined files.

In [7]:
output_path_str_1 = f"/cases/{case_id}/archive/lnd/hist/1/"
output_path_str_2 = f"/cases/{case_id}/archive/lnd/hist/2/"
output_path_str_3 = f"/cases/{case_id}/archive/lnd/hist/3/"


In [8]:
hist_tape = "h0" # Name of the history tape to combine into a single file

os.environ['NCFILES_TO_COMBINE'] = f"*{hist_tape}*.nc" # Name of output .nc files to combine
os.environ['NC_OUT_NAME_1'] = f"{case_id}.{hist_tape}.0000-1000.nc" # Descriptive name for the resulting combined file
os.environ['NC_OUT_NAME_2'] = f"{case_id}.{hist_tape}.1001-2000.nc" 
os.environ['NC_OUT_NAME_3'] = f"{case_id}.{hist_tape}.2001-3000.nc" 

In [9]:
os.environ['CASE_ID'] = case_id
os.environ['CASE_HIST_PATH_1'] = output_path_str_1
os.environ['CASE_HIST_PATH_2'] = output_path_str_2
os.environ['CASE_HIST_PATH_3'] = output_path_str_3
os.environ['SAVE_PATH'] = save_path_str
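
These variables are concatenated in the shell as `$PWD/..$CASE_HIST_PATH_1$NCFILES_TO_COMBINE`, which is why the chunk paths start with `/cases` and end with a slash. A sketch of how the pieces join, assuming the notebook runs from `/home/ubuntu/notebooks` as printed earlier; the helper `shell_style_join` is hypothetical and the case name `my_case` is a placeholder:

```python
import posixpath

def shell_style_join(cwd, hist_path, pattern):
    """Mimic the shell string $PWD/..<hist_path><pattern> and resolve the '..'."""
    raw = cwd + "/.." + hist_path + pattern
    return posixpath.normpath(raw)

example = shell_style_join(
    "/home/ubuntu/notebooks",               # value of $PWD in this notebook
    "/cases/my_case/archive/lnd/hist/1/",   # placeholder for CASE_HIST_PATH_1
    "*h0*.nc",                              # NCFILES_TO_COMBINE
)
print(example)
```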

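Because of an "Argument list too long" error, shorten the filenames before concatenating. Remove the first 35 characters of each name with the `rename` command, which needs to be installed first. The count applies to the name as `find -execdir` passes it, which starts with `./`, so it covers `./`, the 32-character hash, and the underscore at the start of the case ID. Then list the first 10 files in the folder to check that the result looks right. This needs to be done in a terminal outside the container.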

`sudo apt-get install rename`

`cd ../cases/5623552dd1a8bad16ab1c12e3fc92076_default-alp4-3000/archive/lnd/hist/1` (repeat the steps for folders 2 and 3)

`find . -type f -execdir rename 's/^.{35}//' {} \;`

`ls | head -10`
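
If installing `rename` is not an option, the prefix could also be stripped with Python. This is a sketch rather than the method used here: the helper `strip_prefix` is hypothetical, and it removes a literal prefix string (for example the 32-character hash plus underscore) instead of a fixed character count, so a file that does not carry the prefix is left untouched:

```python
import os

def strip_prefix(directory, prefix):
    """Rename every regular file in `directory` whose name starts with `prefix`."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        src = os.path.join(directory, name)
        if not os.path.isfile(src) or not name.startswith(prefix):
            continue  # skip subdirectories and files without the prefix
        new_name = name[len(prefix):]
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            raise FileExistsError(f"refusing to overwrite {dst}")
        os.rename(src, dst)
        renamed.append(new_name)
    return renamed

# Example call (not run here); the prefix is an assumption based on case_id:
# strip_prefix(os.path.join(output_path_str, "1"), "8cb1b8bb0571a2d7d64a72242a71294b_")
```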

Use the `!` shell escape to concatenate the files with `ncrcat`. This can take several minutes when many files are combined. 3000 years * 12 months = 36 000 files is too many to handle in one go, which is why the files were split into three 1000-year chunks, each concatenated separately.
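
The "Argument list too long" error arises because the shell expands the wildcard into one argument per file, and the operating system limits the total size of all arguments. A rough sketch for estimating that size, under the assumption that each argument costs its byte length plus one terminating byte; the helper `estimate_arg_bytes` and the example file names are hypothetical, and the real limit also counts environment variables:

```python
import os

def estimate_arg_bytes(paths):
    """Approximate the bytes needed to pass `paths` as command-line arguments."""
    return sum(len(p.encode("utf-8")) + 1 for p in paths)

# Made-up names: same folder, with and without a long filename prefix
folder = "/home/ubuntu/cases/" + "x" * 60 + "/archive/lnd/hist/1/"
long_names = [f"{folder}{'x' * 33}case.clm2.h0.{i:05d}.nc" for i in range(12000)]
short_names = [f"{folder}case.clm2.h0.{i:05d}.nc" for i in range(12000)]

print("long names :", estimate_arg_bytes(long_names))
print("short names:", estimate_arg_bytes(short_names))

# SC_ARG_MAX is available on most POSIX systems; guarded in case it is not
if hasattr(os, "sysconf") and "SC_ARG_MAX" in os.sysconf_names:
    print("system limit:", os.sysconf("SC_ARG_MAX"))
```

Comparing the two estimates with the system limit gives a quick indication of whether shortening the filenames or splitting into more chunks is needed.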

In [10]:
!ncrcat $PWD/..$CASE_HIST_PATH_1$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_1

In [11]:
!ncrcat $PWD/..$CASE_HIST_PATH_2$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_2

In [12]:
!ncrcat $PWD/..$CASE_HIST_PATH_3$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_3