# AWS S3 "file system" management, including zarr dataset management
4/23/2020. Emilio

**WARNING: This is a messy notebook that was not run linearly from top to bottom!** Use it for the code samples and notes it provides, not as a notebook that can be re-run from start to end.

Sample code illustrating S3 "file system" interaction, editing and adding zarr attributes, and other dataset management operations that touch only metadata, not the data already in the dataset.

In [27]:
from pathlib import Path
import matplotlib.pyplot as plt
%matplotlib inline
import s3fs
import zarr
import xarray as xr

from snowmodelzarrfs import connect_fs, get_zarrstore

In [64]:
from IPython.display import Markdown, display

def printmd(string):
    """Print out a markdown string as rendered markdown"""
    display(Markdown(string))

## Establish AWS file system connection

In [2]:
# Options: localfs, localminio_s3, aws_s3, anon_aws_s3
FS_type, bucket = "aws_s3", "snowmodel"
aws_profile_name = 'cso'

In [3]:
FS = connect_fs(FS_type, aws_profile_name=aws_profile_name)

## "File System" interactions, management

https://s3fs.readthedocs.io/en/latest/api.html

In [26]:
FS.ls('snowmodel')

['snowmodel/modeloutput',
 'snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr',
 'snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr',
 'snowmodel/swe_run_a-geo.zarr',
 'snowmodel/swe_run_a-ts.zarr']

In [55]:
FS.exists('snowmodel/swe_run_a-ts.zarr')

True

In [28]:
FS.isdir('snowmodel/swe_run_a-ts.zarr')

True

In [30]:
FS.info('snowmodel/swe_run_a-ts.zarr')

{'Key': 'snowmodel/swe_run_a-ts.zarr',
 'Size': 0,
 'StorageClass': 'DIRECTORY',
 'type': 'directory',
 'size': 0,
 'name': 'snowmodel/swe_run_a-ts.zarr'}

In [35]:
# Create a file within a folder (create the folder if it doesn't exist)
# FS.touch('snowmodel/modeloutput/testfile')

In [37]:
FS.url('snowmodel/modeloutput/testfile')

'https://snowmodel.s3.amazonaws.com/modeloutput/testfile?AWSAccessKeyId=AKIARRLKWC3WHF6KH7V7&Signature=gfkGdfbMPqu0kfVmgsKUQAdu0I0%3D&Expires=1587515234'
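The URL returned by `FS.url` is pre-signed and time-limited. In this older (signature version 2) form, the `Expires` query parameter is a Unix timestamp. The following is a minimal standard-library sketch for checking when such a URL stops working. The helper name `presigned_expiry` and the placeholder credential values are illustrative assumptions, not part of s3fs.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the expiry time (UTC) encoded in a signature-v2 pre-signed URL."""
    query = parse_qs(urlsplit(url).query)
    # 'Expires' holds seconds since the Unix epoch
    return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)


# Same shape as the URL above, with placeholder credentials
url = (
    "https://snowmodel.s3.amazonaws.com/modeloutput/testfile"
    "?AWSAccessKeyId=EXAMPLEKEYID&Signature=EXAMPLESIGNATURE&Expires=1587515234"
)
expiry = presigned_expiry(url)
print(expiry.isoformat())  # 2020-04-22T00:27:14+00:00
```

Signature version 4 URLs carry `X-Amz-Date` and `X-Amz-Expires` instead of `Expires`, so this helper would need adapting for them.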

In [34]:
# Remove a zarr dataset (a folder) and all its children
# FS.rm('snowmodel/swe_gdat_1.zarr', recursive=True)

In [36]:
[fo['Key'] for fo in FS.listdir('snowmodel')]

['snowmodel/modeloutput',
 'snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr',
 'snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr',
 'snowmodel/swe_run_a-geo.zarr',
 'snowmodel/swe_run_a-ts.zarr']
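Zarr datasets show up in the listing as `.zarr` "folders", so picking them out is a simple filter. The sketch below works on the paths returned above. `zarr_datasets` is a hypothetical helper name, and the `.zarr` suffix is a naming convention used in this bucket, not something S3 enforces.

```python
from pathlib import PurePosixPath


def zarr_datasets(paths):
    """Return the names of the entries whose path ends in .zarr."""
    return [PurePosixPath(p).name for p in paths if p.endswith(".zarr")]


# Paths as returned by FS.ls('snowmodel') above
listing = [
    "snowmodel/modeloutput",
    "snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr",
    "snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr",
    "snowmodel/swe_run_a-geo.zarr",
    "snowmodel/swe_run_a-ts.zarr",
]
datasets = zarr_datasets(listing)
print(datasets)
```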

To interact with a file in the AWS S3 bucket (open, read, write, etc.), see https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3File

### Create new file on the S3 bucket and write Markdown content into it

In [57]:
test_md_content = """
# Document model run datasets

Model runs:
- `snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr`: geo, created by Nina
- `snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr`: ts, created by Nina
- `snowmodel/swe_run_a-geo.zarr`: geo, created by Emilio
- `snowmodel/swe_run_a-ts.zarr`: ts, created by Emilio
"""

In [60]:
with FS.open('snowmodel/modelruns.md', mode='w') as f:
    f.write(test_md_content)

Now read it back and render the Markdown content, formatted, here in the notebook. Without the `printmd` helper defined at the top, it would display as a plain text string.

In [65]:
with FS.open('snowmodel/modelruns.md', mode='r') as f:
    modelruns_aws_read_md = f.read()

In [66]:
printmd(modelruns_aws_read_md)


# Document model run datasets

Model runs:
- `snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr`: geo, created by Nina
- `snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr`: ts, created by Nina
- `snowmodel/swe_run_a-geo.zarr`: geo, created by Emilio
- `snowmodel/swe_run_a-ts.zarr`: ts, created by Emilio
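The Markdown content above was typed by hand. As the number of model runs grows, the same document could be generated from a small inventory. This is only a sketch: `MODEL_RUNS` and `build_modelruns_md` are hypothetical names, and the entries are copied from the list above.

```python
# Hypothetical inventory: dataset path -> (run type, creator)
MODEL_RUNS = {
    "snowmodel/swe_a-geo_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr": ("geo", "Nina"),
    "snowmodel/swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr": ("ts", "Nina"),
    "snowmodel/swe_run_a-geo.zarr": ("geo", "Emilio"),
    "snowmodel/swe_run_a-ts.zarr": ("ts", "Emilio"),
}


def build_modelruns_md(runs):
    """Build the Markdown document listing each model run dataset."""
    lines = ["# Document model run datasets", "", "Model runs:"]
    for path, (kind, creator) in runs.items():
        lines.append(f"- `{path}`: {kind}, created by {creator}")
    return "\n".join(lines) + "\n"


modelruns_md = build_modelruns_md(MODEL_RUNS)
print(modelruns_md)
```

The resulting string can be written to the bucket with the same `FS.open(..., mode='w')` call used above.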


## Read model output zarr from AWS S3 and interact with its metadata

In [45]:
zarrds = "swe_a-ts_no_rad_layers0_no_Tlapse_no_Plapse_outputs_wo_assim.zarr"

In [46]:
zarrstore = get_zarrstore(FS, FS_type, bucket, zarrds)
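`get_zarrstore` comes from the local `snowmodelzarrfs` module, whose code is not shown here. For an fsspec-style file system such as s3fs, a store of this kind is typically a key-value mapping rooted at the dataset path. The sketch below is an assumption about what such a helper might do, not the module's actual implementation; `get_zarrstore_sketch` is a hypothetical name, and only the `get_mapper` method of fsspec file systems is relied on.

```python
def get_zarrstore_sketch(fs, bucket, zarrds):
    """Return a key-value mapping rooted at <bucket>/<zarrds> (illustrative only)."""
    root = f"{bucket}/{zarrds}"
    # fsspec file systems (including s3fs.S3FileSystem) provide get_mapper
    return fs.get_mapper(root)


# Usage, assuming FS is a connected file system as created above:
# zarrstore = get_zarrstore_sketch(FS, "snowmodel", "swe_run_a-ts.zarr")
```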

In [53]:
zds = xr.open_zarr(
    store=zarrstore, 
    consolidated=True
)

Note: The global attributes below already include the additions made in the cells at the bottom of the notebook, because this notebook was not run in a linear sequence.

In [54]:
zds

| | Array | Chunk |
|---|---|---|
| Bytes | 24.98 GB | 41.40 MB |
| Shape | (1825, 2476, 1382) | (460, 150, 150) |
| Count | 681 Tasks | 680 Chunks |
| Type | float32 | numpy.ndarray |


### Changing the attributes using zarr directly, *after* creating the zarr dataset
Use this code to modify global and variable attributes of existing zarr datasets.

In [50]:
# zarr.open_consolidated doesn't allow changing the metadata (no attribute edits, no new
# arrays or groups). With mode='r+' it only allows writing data into existing arrays.
# zstore = zarr.open_consolidated(store=zarrstore, mode='r+')

zstore = zarr.open(store=zarrstore, mode='r+')

The cell below defines calibration parameters and other model run settings as global attributes. Another option would be to create a new scalar variable and attach the settings to it as granular variable attributes, one per calibration parameter or model setting. For example (as JSON):
```json
{
    "snowmodel_version": "x.y.z",
    "run_date": "2020-99-88",
    "rad": "no",
    "layers": 0,
    "Tlapse": "no",
    "Plapse": "no",
    "assimilation": "no"
}
```
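Zarr (format version 2) stores attributes as JSON, so every setting has to be JSON-serializable. Strings and plain numbers work, but a `datetime.date` object, for instance, does not. The following is a small standard-library sketch of a check to run before assigning attributes. `check_json_safe` is a hypothetical helper, and the settings are the placeholder values from the JSON above.

```python
import datetime
import json

settings = {
    "snowmodel_version": "x.y.z",
    "run_date": "2020-99-88",
    "rad": "no",
    "layers": 0,
    "Tlapse": "no",
    "Plapse": "no",
    "assimilation": "no",
}


def check_json_safe(attrs):
    """Return attrs unchanged, or raise ValueError if they cannot be encoded as JSON."""
    try:
        json.dumps(attrs)
    except TypeError as err:
        raise ValueError(f"Attributes are not JSON-serializable: {err}") from err
    return attrs


check_json_safe(settings)  # plain strings and integers pass

# A date object is rejected; store it as a string instead
rejected = False
try:
    check_json_safe({"run_date": datetime.date(2020, 4, 23)})
except ValueError:
    rejected = True
print(rejected)
```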

In [51]:
zstore.attrs['snowmodel_version'] = "x.y.z"
zstore.attrs['run_date'] = "2020-99-88"
zstore.attrs['calibration_parameters'] = "no_rad/layers0/no_Tlapse/no_Plapse/outputs/wo_assim"

# zstore.swe.attrs['swe_append_attr'] = 'my swe appended attribute'

In [52]:
# Pass the underlying store (zarrstore), not the opened group (zstore)
zarr.consolidate_metadata(zarrstore)

<zarr.hierarchy.Group '/'>
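Re-consolidating matters because readers that open the dataset with `consolidated=True` take their metadata from a single `.zmetadata` key, which is a copy made at consolidation time. The toy model below uses a plain dict as the store to show how that copy goes stale after an attribute edit. It is a simplified stand-in for the zarr v2 layout, not zarr's own code, and `consolidate` and `consolidated_attrs` are illustrative names.

```python
import json

META_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def consolidate(store):
    """Toy consolidation: copy every metadata key into a single '.zmetadata' key."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.endswith(META_SUFFIXES)
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": metadata}
    )


def consolidated_attrs(store):
    """Group attributes as seen by a reader that only looks at '.zmetadata'."""
    return json.loads(store[".zmetadata"])["metadata"][".zattrs"]


# A dict standing in for the S3-backed store of an (empty) zarr v2 group
store = {".zgroup": json.dumps({"zarr_format": 2}), ".zattrs": json.dumps({})}
consolidate(store)

# Edit the group attributes, as done through zstore.attrs above
attrs = json.loads(store[".zattrs"])
attrs["snowmodel_version"] = "x.y.z"
store[".zattrs"] = json.dumps(attrs)

stale = consolidated_attrs(store)  # still the old, empty attributes
consolidate(store)
fresh = consolidated_attrs(store)  # now includes the new attribute
print(stale, fresh)
```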