# Combine multiple NetCDF outputs into one file

To analyse the output `.nc` files from a simulation (check the `cases/[case_id]/archive/lnd/hist/` folder to see them), it is recommended to concatenate them into a single file, which simplifies and speeds up the data analysis. A 1500-year simulation with monthly output produces too many files to concatenate in one go, so we set up three subfolders and move 500 years of output into each. The last 500 years of results will be used for further analysis in other notebooks.
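The number of files matters because of how the shell expands a wildcard: each matching path becomes a separate argument, and Linux limits the combined size of a new process's arguments and environment (`ARG_MAX`). The minimal sketch below estimates that size for a list of paths. The function names and the example path layout are hypothetical, and the real limit also counts environment variables and per-argument pointers, so the check is only approximate.

```python
import os

def argument_bytes(paths):
    """Approximate size of an argument list: each encoded path plus its NUL terminator."""
    return sum(len(os.fsencode(p)) + 1 for p in paths)

def fits_in_arg_max(paths):
    """Rough check against the system's ARG_MAX; returns True if no limit is reported."""
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return True  # os.sysconf or the SC_ARG_MAX name is unavailable on this platform
    if limit <= 0:
        return True  # a non-positive value means no definite limit is reported
    return argument_bytes(paths) < limit

# Hypothetical example: 1500 years of monthly files in one long directory path
hist_dir = "/home/ubuntu/cases/" + "x" * 56 + "/archive/lnd/hist/"
paths = [f"{hist_dir}case.clm2.h0.{1 + j // 12:04d}-{j % 12 + 1:02d}.nc" for j in range(18000)]
print(argument_bytes(paths), "bytes;", "fits" if fits_in_arg_max(paths) else "too long")
```

Splitting the files across subfolders and shortening their names both reduce this total.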

Text cells in this notebook follow Markdown syntax, while code cells are a mix of Python and bash code. Lines starting with `!` are passed by IPython to the shell, which lets us call an external command-line tool (see also the IPython documentation on [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)).

In [1]:
# Name of case folder
case_id = "179e6a7e02c1245e9185efe9dbd92dde_alp4-1500-cosmo-warmed"
# Set paths to where the files are stored and where to save the concatenated files.
# The combined files are saved one level above the original output files to avoid long loading times and conflicts.
output_path_str = f"../cases/{case_id}/archive/lnd/hist" # original monthly output, split into subfolders 1, 2 and 3 below
save_path_str = f"/cases/{case_id}/archive/lnd/" # relative to the parent of the notebook folder; prefixed in the shell commands below

**********************************************

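Count the files in a directory with `ls -1 | wc -l`. With monthly output, 1500 years of simulation gives 1500 × 12 = 18000 files, which is the value of `num_files` below.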
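The same count can be obtained from Python. This minimal sketch counts regular files matching a pattern in one folder (not recursively, and skipping subfolders) and refuses an uneven split, so that no files would silently be left behind by the integer division `num_files // num_dest_dirs`. The helper names are hypothetical.

```python
from pathlib import Path

def count_files(directory, pattern="*"):
    """Count regular files directly inside `directory` whose names match `pattern`."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())

def files_per_subfolder(num_files, num_dirs):
    """Split the files evenly; raise instead of dropping a remainder."""
    if num_files % num_dirs != 0:
        raise ValueError(f"{num_files} files cannot be split evenly into {num_dirs} folders")
    return num_files // num_dirs
```

For example, `count_files(output_path_str, "*h0*.nc")` should return 18000 before the files are moved, assuming only the monthly `h0` files match the pattern.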


In [2]:
import shutil
import os

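num_files = 18000  # total number of h0 files from `ls -1 | wc -l` (1500 years x 12 months)
num_dest_dirs = 3  # subfolders 1, 2 and 3, with 500 years each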

# Calculate number of files to move to each directory
files_per_dir = num_files // num_dest_dirs
print("files per dir to be moved:", files_per_dir)

files per dir to be moved: 6000


In [3]:
import os
cwd = os.getcwd()
print(cwd)

/home/ubuntu/notebooks
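The working directory matters because the shell commands at the end of this notebook prefix paths such as `/cases/[case_id]/archive/lnd/` with `$PWD/..`, the parent of this folder. A minimal sketch of that resolution in Python; the function name is hypothetical, and `posixpath.normpath` works purely on the string without following symbolic links.

```python
import posixpath

def resolve_from_notebook_dir(notebook_dir, rel_path):
    """Mimic the shell prefix `$PWD/..$SAVE_PATH`: go one level up, then append rel_path."""
    return posixpath.normpath(notebook_dir + "/.." + rel_path)
```

For example, `resolve_from_notebook_dir(cwd, save_path_str)` gives the absolute folder in which the combined files end up.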


In [4]:
# Move files to the new destination directories
for i in range(num_dest_dirs):
    src_dir = output_path_str
    dest_dir = os.path.join(output_path_str, str(i+1))
    os.makedirs(dest_dir, exist_ok=True)  # create subfolder 1, 2 or 3 if it does not exist yet
    start_idx = i * files_per_dir
    print("starting at ", start_idx)
    end_idx = start_idx + files_per_dir
    print("ending at", end_idx)

    # Move files to destination directory; index j counts months starting at 0001-01
    for j in range(start_idx, end_idx):
        year_month_str = '{:04d}-{:02d}'.format(1 + j // 12, j % 12 + 1)
        if j == start_idx:
            print("year_month_str: ", year_month_str)
        # Filenames without the case-ID hash prefix (removed with `rename`, see below)
        file_name = f"alp4-1500-cosmo-warmed.clm2.h0.{year_month_str}.nc"
        src_file = os.path.join(src_dir, file_name)
        dest_file = os.path.join(dest_dir, file_name)
        shutil.move(src_file, dest_file)

print("Done!")

starting at  0
ending at 6000
year_month_str:  0001-01
starting at  6000
ending at 12000
year_month_str:  0501-01
starting at  12000
ending at 18000
year_month_str:  1001-01
Done!
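To check that the move worked, the sketch below rebuilds the expected filename for each month index with the same `YYYY-MM` mapping as the loop above and lists any that are missing from a subfolder. The helper names are hypothetical, and the filename prefix is passed in explicitly rather than assumed.

```python
import os

def month_label(j):
    """Map a zero-based month index to the 'YYYY-MM' label used in the move loop."""
    return "{:04d}-{:02d}".format(1 + j // 12, j % 12 + 1)

def missing_files(dest_dir, prefix, start_idx, end_idx):
    """Return the expected monthly filenames that are absent from dest_dir."""
    expected = (f"{prefix}.{month_label(j)}.nc" for j in range(start_idx, end_idx))
    return [name for name in expected if not os.path.isfile(os.path.join(dest_dir, name))]
```

For example, `missing_files(os.path.join(output_path_str, "3"), "alp4-1500-cosmo-warmed.clm2.h0", 12000, 18000)` should return an empty list once the last 500 years have been moved.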


Now that you have specified where the files are stored, you can combine them. The `*` in the following cells is a wildcard, so this example combines **all** files for history tape 0 (`h0`) in each of the subfolders `1`, `2` and `3` of `cases/[case_id]/archive/lnd/hist/`. Adjust this if you want to combine outputs for a different history tape that you may have included when creating the case in the user interface. If you have several history tapes, repeat this notebook for each additional tape and give the output files meaningful names.

Set `NC_OUT_NAME_1`, `NC_OUT_NAME_2` and `NC_OUT_NAME_3` to descriptive names for the three combined files. The example uses the case ID, the history tape and the 500-year period covered by each file.
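If you change the simulation length or the number of subfolders, the year ranges and output names can be derived instead of typed by hand. A minimal sketch with hypothetical helper names, numbering years from 0001 to match the first monthly file `0001-01`:

```python
def chunk_year_ranges(total_years, num_chunks):
    """Split years 1..total_years into equal consecutive (first, last) ranges."""
    size = total_years // num_chunks
    return [(i * size + 1, (i + 1) * size) for i in range(num_chunks)]

def combined_name(case_id, hist_tape, first_year, last_year):
    """Build '<case_id>.<tape>.<first>-<last>.nc' with zero-padded four-digit years."""
    return f"{case_id}.{hist_tape}.{first_year:04d}-{last_year:04d}.nc"
```

For example, `[combined_name(case_id, hist_tape, a, b) for a, b in chunk_year_ranges(1500, 3)]` yields one name per subfolder.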

In [5]:
# Subfolders created above, relative to the parent of the notebook folder
output_path_str_1 = f"/cases/{case_id}/archive/lnd/hist/1/"
output_path_str_2 = f"/cases/{case_id}/archive/lnd/hist/2/"
output_path_str_3 = f"/cases/{case_id}/archive/lnd/hist/3/"


In [6]:
hist_tape = "h0" # Name of the history tape to combine into a single file

os.environ['NCFILES_TO_COMBINE'] = f"*{hist_tape}*.nc" # Wildcard pattern matching the .nc files to combine
os.environ['NC_OUT_NAME_1'] = f"{case_id}.{hist_tape}.0001-0500.nc" # Descriptive name for the resulting combined file
os.environ['NC_OUT_NAME_2'] = f"{case_id}.{hist_tape}.0501-1000.nc" 
os.environ['NC_OUT_NAME_3'] = f"{case_id}.{hist_tape}.1001-1500.nc" 

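Concatenating thousands of files at once can fail with an "Argument list too long" error, because the shell expands the wildcard into one long list of file paths. Shortening the filenames helps: remove the first 35 characters of each name (the `./` that `find -execdir` adds, the 32-character hash of the case ID and the underscore) with the `rename` tool, which needs to be installed first. Then list the first 10 files in the folder to check that the names look right. This needs to be done in a terminal outside the container, from the case folder. The file-moving cell above expects these shortened names, so rename the files before moving them: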

`sudo apt-get install rename`

`cd archive/lnd/hist`

`find . -type f -execdir rename 's/^.{35}//' {} \;`

`ls | head -10`
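If `rename` is not available, a Python sketch can do the same job by matching a known prefix instead of counting characters. It assumes the files are named `<case_id>.clm2...` and that the prefix to drop is the hash part of the case ID plus the underscore, e.g. `case_id.split("_")[0] + "_"`; `strip_prefix` is a hypothetical helper. Like the `rename` command, it must be run with permission to rename the files, and it defaults to a dry run that only reports the planned renames.

```python
from pathlib import Path

def strip_prefix(directory, prefix, dry_run=True):
    """Rename files under `directory` (recursively) whose names start with `prefix`.

    Returns the list of (old, new) paths; nothing is renamed when dry_run is True.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    renames = []
    # sorted() collects all paths first, so renaming does not disturb the traversal
    for path in sorted(Path(directory).rglob("*")):
        if not (path.is_file() and path.name.startswith(prefix)):
            continue
        target = path.with_name(path.name[len(prefix):])
        if target.exists():
            raise FileExistsError(str(target))  # never overwrite an existing file
        renames.append((path, target))
        if not dry_run:
            path.rename(target)
    return renames
```

Running it once with the default `dry_run=True` and inspecting the returned pairs plays the same role as the `ls | head -10` check.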

In [7]:
os.environ['CASE_ID'] = case_id
os.environ['CASE_HIST_PATH_1'] = output_path_str_1
os.environ['CASE_HIST_PATH_2'] = output_path_str_2
os.environ['CASE_HIST_PATH_3'] = output_path_str_3
os.environ['SAVE_PATH'] = save_path_str

Use the `!` prefix to run `ncrcat` (the record concatenator from the NCO toolkit) on each subfolder. Concatenating many files can take several minutes.

In [8]:
!ncrcat $PWD/..$CASE_HIST_PATH_1$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_1

In [9]:
!ncrcat $PWD/..$CASE_HIST_PATH_2$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_2

In [10]:
!ncrcat $PWD/..$CASE_HIST_PATH_3$NCFILES_TO_COMBINE $PWD/..$SAVE_PATH$NC_OUT_NAME_3
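As an alternative to the shell escape, the concatenation can be driven from Python with `subprocess`. This sketch sorts the matched files explicitly, because `ncrcat` concatenates records in the order its inputs are given and the zero-padded `YYYY-MM` names sort chronologically. It only uses `ncrcat`'s positional input and output files plus the NCO `-O` (overwrite) switch; the helper names are hypothetical. The argument list is still handed to the operating system in one go, so the same "Argument list too long" limit applies.

```python
import glob
import os
import shutil
import subprocess

def ncrcat_command(input_dir, pattern, output_file, overwrite=False):
    """Build an ncrcat argument list from an explicitly sorted glob (no shell involved)."""
    inputs = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not inputs:
        raise FileNotFoundError(f"no files match {pattern!r} in {input_dir}")
    cmd = ["ncrcat"]
    if overwrite:
        cmd.append("-O")  # overwrite output_file without prompting
    return cmd + inputs + [output_file]

def run_ncrcat(cmd):
    """Run the command if ncrcat is on PATH; raises CalledProcessError on failure."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("ncrcat not found on PATH")
    subprocess.run(cmd, check=True)
```

Passing absolute paths for both the input folder and the output file avoids any dependence on `$PWD` or `$HOME`.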