# Storing and compressing glacier directories for later use

"Glacier directories" are the fundamental data structure used by OGGM. They can be confusing at times and can contain a large number of files, making them hard to move between clusters or computers. This notebook explains how these directories are structured and how to store them so that they can be moved and used again later.

The main use-case documented by this notebook is the following workflow:
- pre-process a number of glacier directories
- store them and copy them to your storage, or move them to another machine
- re-start from them on another machine / instance

In [17]:
# Libs
import os
import shutil

# Locals
import oggm.cfg as cfg
from oggm import utils, workflow, tasks

## The structure of the working directory

Let's open a new workflow for two Andean glaciers: Artesonraju and Shallap in Peru.

In [18]:
# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')
rgi_version = '62'

# Here we override some of the default parameters
# How many grid points around the glacier?
# Make it large if you expect your glaciers to grow large:
# here, 80 is more than enough
cfg.PARAMS['border'] = 80

# Local working directory (where OGGM will write its output)
WORKING_DIR = utils.gettempdir('compress_gdirs_wd')
utils.mkdir(WORKING_DIR, reset=True)
cfg.PATHS['working_dir'] = WORKING_DIR

# RGI glaciers: Artesonraju and Shallap in Peru
rgi_ids = utils.get_rgi_glacier_entities(['RGI60-16.02444', 'RGI60-16.02207'])

# Go - get the pre-processed glacier directories
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=3)

2021-01-30 15:59:11: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2021-01-30 15:59:11: oggm.cfg: Multiprocessing switched ON according to the parameter file.
2021-01-30 15:59:11: oggm.cfg: Multiprocessing: using all available processors (N=4)
2021-01-30 15:59:12: oggm.cfg: PARAMS['border'] changed from `20` to `80`.
2021-01-30 15:59:19: oggm.workflow: init_glacier_directories from prepro level 3 on 2 glaciers.
2021-01-30 15:59:19: oggm.workflow: Execute entity task gdir_from_prepro on 2 glaciers


In [19]:
gdir = gdirs[0]

In [25]:
glacier_stats = workflow.execute_entity_task(utils.glacier_statistics,
                                             gdirs)

2021-01-30 16:06:15: oggm.workflow: Execute entity task glacier_statistics on 2 glaciers


In [31]:
import numpy as np
import pandas as pd

In [38]:
# Collect the statistics of interest and convert them to SI units (m, m2, m3)
rgi_id = [gs.get('rgi_id', np.nan) for gs in glacier_stats]
length = [gs.get('longuest_centerline_km', np.nan) * 1e3
          for gs in glacier_stats]
area = [gs.get('rgi_area_km2', np.nan) * 1e6 for gs in glacier_stats]
volume = [gs.get('inv_volume_km3', np.nan) * 1e9 for gs in glacier_stats]
glacier_type = [gs.get('glacier_type', np.nan) for gs in glacier_stats]
# NOTE: the values read above are overwritten by hard-coded labels here
glacier_type = ['Glacier', 'Ice cap']
# Create the DataFrame, indexed by RGI id
df = pd.DataFrame({'length': length, 'area': area, 'volume': volume,
                   'glacier_type': glacier_type},
                  index=pd.Index(rgi_id, name='rgi_id'))

In [40]:
df[df.glacier_type == 'Ice cap']

                     length       area       volume glacier_type
rgi_id
RGI60-16.02444  4447.663489  5943000.0  491074900.0      Ice cap


OGGM downloaded the pre-processed directories, stored the tar files in your cache, and extracted them in your working directory. But how is this working directory structured? Let's have a look:

In [3]:
def file_tree_print(prepro_dir=False):
    # Just a utility function to show the dir structure and selected files
    print("cfg.PATHS['working_dir']/")
    tab = '  '
    for dirname, dirnames, filenames in os.walk(cfg.PATHS['working_dir']):
        for subdirname in dirnames:
            print(tab + subdirname + '/')
        for filename in filenames:
            if '.tar' in filename and 'RGI' in filename:
                print(tab + filename)
        tab += '  '

In [4]:
file_tree_print()

cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
        RGI60-16.02207/
        RGI60-16.02444/


OK, so from the `WORKING_DIR`, OGGM creates a `per_glacier` folder (always) where the glacier directories are stored. To avoid cluttering a single folder (and for other reasons which become apparent later), the directories are organised first by RGI region (here `RGI60-16`) and then in folders containing up to 1000 glaciers (here `RGI60-16.02`, i.e. for ids `RGI60-16.02000` to `RGI60-16.02999`).

Our files are located in the final folders of this tree (not shown in the tree). For example:

In [5]:
gdirs[0].get_filepath('dem').replace(WORKING_DIR, 'WORKING_DIR')

'WORKING_DIR/per_glacier/RGI60-16/RGI60-16.02/RGI60-16.02207/dem.tif'
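
The layout described above can be derived from the RGI id alone. The following is a minimal sketch using only the standard library; `gdir_relpath` is a hypothetical helper written for illustration, and it assumes ids of the form `RGI60-16.02207` as used in this notebook. It is not OGGM's own implementation.

```python
import posixpath


def gdir_relpath(rgi_id):
    """Relative path of a glacier directory inside the working directory.

    Hypothetical helper: it mirrors the tree printed above, nothing more.
    """
    region_dir = rgi_id[:8]    # e.g. 'RGI60-16'
    bundle_dir = rgi_id[:11]   # e.g. 'RGI60-16.02' (up to 1000 glaciers)
    return posixpath.join('per_glacier', region_dir, bundle_dir, rgi_id)


print(gdir_relpath('RGI60-16.02207'))
```

The slicing relies on the fixed width of the id, which is why the region and the 1000-glacier folder can be read off directly as prefixes.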

Let's add some steps to our workflow, for example a spinup run that we would like to store for later: 

In [6]:
# Initialise glacier for run
workflow.execute_entity_task(tasks.init_present_time_glacier, gdirs);
# Run
workflow.execute_entity_task(tasks.run_from_climate_data, gdirs, 
                             output_filesuffix='_spinup',  # to use the files as input later on
                            );

2021-01-29 12:56:25: oggm.workflow: Execute entity task init_present_time_glacier on 2 glaciers
2021-01-29 12:56:26: oggm.workflow: Execute entity task run_from_climate_data on 2 glaciers


## Store the single glacier directories into tar files 

By default, the `gdir_to_tar` task compresses each glacier directory into a tar file located in the same folder (you can also write the compressed files somewhere else, e.g. to a folder in your `$HOME`):

In [7]:
utils.gdir_to_tar?

In [8]:
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=False);
file_tree_print()

2021-01-29 12:56:30: oggm.workflow: Execute entity task gdir_to_tar on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
        RGI60-16.02207/
        RGI60-16.02444/
        RGI60-16.02207.tar.gz
        RGI60-16.02444.tar.gz


Most of the time, you will actually want to delete the original directories because they are not needed for this run anymore:

In [9]:
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True);
file_tree_print()

2021-01-29 12:56:30: oggm.workflow: Execute entity task gdir_to_tar on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
        RGI60-16.02207.tar.gz
        RGI60-16.02444.tar.gz


Now the original directories are gone, and the `gdirs` objects are useless (attempting to do anything with them will lead to an error).

Since they are already available in the correct file structure, however, OGGM will know how to reconstruct them from the tar files if asked to:

In [10]:
gdirs = workflow.init_glacier_directories(['RGI60-16.02444', 'RGI60-16.02207'], from_tar=True, delete_tar=True)
file_tree_print()

2021-01-29 12:56:30: oggm.workflow: Execute entity task GlacierDirectory on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
        RGI60-16.02207/
        RGI60-16.02444/


These directories are now ready to be used again! To summarize: thanks to this first step, you already reduced the number of files to move around from N x M (where M is the number of files in each glacier directory) to N (where N is the number of glaciers).

You can now move this working directory somewhere else, and in another OGGM run instance, simply start from them as shown above.
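
To make the compress / restore round trip more concrete, here is a small sketch built on Python's `tarfile` module. The helpers `dir_to_targz` and `targz_to_dir` are hypothetical names chosen for this illustration, and the file names other than `dem.tif` are placeholders. The sketch only imitates the behaviour shown above (one archive per directory, optional deletion); it is not how OGGM implements `gdir_to_tar`.

```python
import os
import shutil
import tarfile
import tempfile


def dir_to_targz(path, delete=True):
    """Compress the directory `path` into `path + '.tar.gz'` next to it."""
    tar_path = path + '.tar.gz'
    with tarfile.open(tar_path, 'w:gz') as tf:
        # Store the folder under its own name so extraction recreates it
        tf.add(path, arcname=os.path.basename(path))
    if delete:
        shutil.rmtree(path)
    return tar_path


def targz_to_dir(tar_path, delete_tar=False):
    """Extract `tar_path` in place and return the restored directory."""
    with tarfile.open(tar_path, 'r:gz') as tf:
        tf.extractall(os.path.dirname(tar_path))
    if delete_tar:
        os.remove(tar_path)
    return tar_path[:-len('.tar.gz')]


# Demo with a fake glacier directory holding a few dummy files
base = tempfile.mkdtemp()
fake_gdir = os.path.join(base, 'RGI60-16.02207')
os.makedirs(fake_gdir)
for name in ('dem.tif', 'file_a.nc', 'file_b.nc'):  # placeholder names
    with open(os.path.join(fake_gdir, name), 'w') as f:
        f.write('dummy')

tar_path = dir_to_targz(fake_gdir, delete=True)
after_tar = sorted(os.listdir(base))      # the M files became one archive
restored = targz_to_dir(tar_path, delete_tar=True)
after_untar = sorted(os.listdir(base))    # the directory is back
restored_files = sorted(os.listdir(restored))
print(after_tar, after_untar, restored_files)
```

With `delete=True` and `delete_tar=True`, only one of the two representations exists at any time, which is the same trade-off as in the cells above.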

## Bundle of directories

It turned out that the file structure above was a bit cumbersome to use, in particular for glacier directories that we wanted to share online. For this, we found it more convenient to bundle the directories into groups of 1000 glaciers. Fortunately, this is easy to do:

In [11]:
utils.base_dir_to_tar?

In [12]:
# Tar the individual ones first
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True);
# Then tar the bundles
utils.base_dir_to_tar(WORKING_DIR, delete=True)
file_tree_print()

2021-01-29 12:56:32: oggm.workflow: Execute entity task gdir_to_tar on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02.tar


Now the glacier directories are bundled in a single file one level higher. This is even more convenient to move around (fewer files), but it is not a mandatory step. The nice part about this bundling is that you can still select individual glaciers, as we will see in the next section. In the meantime, you can do:

In [13]:
gdirs = workflow.init_glacier_directories(['RGI60-16.02444', 'RGI60-16.02207'], from_tar=True)
file_tree_print()

2021-01-29 12:56:32: oggm.workflow: Execute entity task GlacierDirectory on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
      RGI60-16.02.tar
        RGI60-16.02207/
        RGI60-16.02444/


Which did the trick! Note that the bundled tar files are never deleted. This is why they are useful for another purpose explained in the next section: creating your own "pre-processed directories".
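
The idea of picking a single glacier out of a bundle can be sketched with the standard library alone. The example below assumes that a bundle is a plain tar file containing the per-glacier `.tar.gz` archives under a folder named after the bundle; this internal layout is an assumption made for the illustration, and `extract_one` is a hypothetical helper, not an OGGM function.

```python
import io
import os
import tarfile
import tempfile

base = tempfile.mkdtemp()
bundle_dir = os.path.join(base, 'RGI60-16.02')
os.makedirs(bundle_dir)

# Build two fake per-glacier archives, each holding one dummy file
for rgi_id in ('RGI60-16.02207', 'RGI60-16.02444'):
    data = ('dummy content for ' + rgi_id).encode()
    info = tarfile.TarInfo(name=rgi_id + '/dem.tif')
    info.size = len(data)
    gz_path = os.path.join(bundle_dir, rgi_id + '.tar.gz')
    with tarfile.open(gz_path, 'w:gz') as tf:
        tf.addfile(info, io.BytesIO(data))

# Bundle the folder into one uncompressed tar (members are already gzipped)
bundle_path = bundle_dir + '.tar'
with tarfile.open(bundle_path, 'w') as tf:
    tf.add(bundle_dir, arcname='RGI60-16.02')


def extract_one(bundle_path, rgi_id, out_dir):
    """Extract a single glacier from the bundle without unpacking the rest."""
    bundle_name = os.path.basename(bundle_path)[:-len('.tar')]
    member_name = bundle_name + '/' + rgi_id + '.tar.gz'
    with tarfile.open(bundle_path, 'r') as bundle:
        inner = bundle.extractfile(member_name)
        with tarfile.open(fileobj=inner, mode='r:gz') as tf:
            tf.extractall(out_dir)
    return os.path.join(out_dir, rgi_id)


out_dir = tempfile.mkdtemp()
gdir_path = extract_one(bundle_path, 'RGI60-16.02444', out_dir)
extracted = sorted(os.listdir(out_dir))
print(extracted)
```

Because the inner archive is read straight from the bundle, the bundle itself is left untouched and can be reused for other glaciers.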

## Self-made pre-processed directories for "restart" workflows

This workflow is the one used by OGGM to prepare the preprocessed directories that many of you are using. It is a variant of the workflow above, the only difference being that the directories are re-started from a file which is located elsewhere than in the working directory:

In [14]:
# Where to put the compressed dirs
PREPRO_DIR = utils.get_temp_dir('prepro_dir')
if os.path.exists(PREPRO_DIR):
    shutil.rmtree(PREPRO_DIR)

# Let's start from a clean state
utils.mkdir(WORKING_DIR, reset=True)
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=3)

# Then tar the gdirs and bundle
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True)
utils.base_dir_to_tar(delete=True)
file_tree_print()

2021-01-29 12:56:33: oggm.workflow: init_glacier_directories from prepro level 3 on 2 glaciers.
2021-01-29 12:56:33: oggm.workflow: Execute entity task gdir_from_prepro on 2 glaciers
2021-01-29 12:56:34: oggm.workflow: Execute entity task gdir_to_tar on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02.tar


In [15]:
# Copy the outcome in a new directory: scratch folder, new machine, etc.
shutil.copytree(os.path.join(WORKING_DIR, 'per_glacier'), PREPRO_DIR);

# Show the directory structure and selected files (same logic as file_tree_print)
print("PREPRO_DIR/")
tab = '  '
for dirname, dirnames, filenames in os.walk(PREPRO_DIR):
    for subdirname in dirnames:
        print(tab + subdirname + '/')
    for filename in filenames:
        if '.tar' in filename and 'RGI' in filename:
            print(tab + filename)
    tab += '  '

PREPRO_DIR/
  RGI60-16/
    RGI60-16.02.tar


OK, so this `PREPRO_DIR` directory is where the files will stay for the longer term. You can restart from there whenever you like with:

In [42]:
PREPRO_DIR

'/var/folders/dc/r0qdkr9n45n4c2f2cc7v7pfr0000gn/T/OGGM/prepro_dir'

In [43]:
# Let's start from a clean state
utils.mkdir(WORKING_DIR, reset=True)
# This needs https://github.com/OGGM/oggm/pull/1158 to work
# It uses the files you prepared beforehand to start the dirs
gdirs = workflow.init_glacier_directories(rgi_ids, from_tar=PREPRO_DIR)
file_tree_print()

2021-01-30 23:34:10: oggm.workflow: Execute entity task gdir_from_tar on 2 glaciers


cfg.PATHS['working_dir']/
  per_glacier/
    RGI60-16/
      RGI60-16.02/
        RGI60-16.02207/
        RGI60-16.02444/
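
If you only want to copy part of a large `PREPRO_DIR` to another machine, it helps to know which bundle files a given list of glaciers needs. The sketch below groups RGI ids by the bundle path they would belong to, following the `PREPRO_DIR` tree printed earlier. `bundles_for` is a hypothetical helper, and `RGI60-11.00897` is an extra id added only to show a second region.

```python
from collections import defaultdict


def bundles_for(rgi_ids):
    """Map each bundle tar (relative to the prepro dir) to the ids it holds."""
    out = defaultdict(list)
    for rgi_id in rgi_ids:
        # Region folder, then the 1000-glacier bundle file
        out[f'{rgi_id[:8]}/{rgi_id[:11]}.tar'].append(rgi_id)
    return dict(out)


needed = bundles_for(['RGI60-16.02444', 'RGI60-16.02207', 'RGI60-11.00897'])
for bundle, ids in sorted(needed.items()):
    print(bundle, ids)
```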


## What's next?

You have several options from here:
- return to the [OGGM documentation](https://docs.oggm.org), in particular [how to set up an OGGM run](https://docs.oggm.org/en/latest/run.html)
- go back to the [table of contents](welcome.ipynb)
- explore other tutorials on the [OGGM-Edu](https://edu.oggm.org) platform