# trosat.cfconv module for creating CF-compliant NetCDF

The cfconv Python package is meant to simplify the definition and creation of [NetCDF4 files](https://www.unidata.ucar.edu/software/netcdf/) following the [Climate and Forecast Metadata conventions](https://cfconventions.org). In particular, it uses the [CF-JSON format](https://cf-json.org/specification) as a means to store global and variable attributes and to define dimensions and variables.

## Installation and Requirements
This module requires attrdict, which additionally allows JavaScript-like notation for accessing objects, e.g. cfdict.variables instead of cfdict['variables']. The jstyleson module is an optional dependency. If it is installed, JSON files can contain comments.
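
The following is a minimal stand-in that illustrates the attribute-style notation using only the standard library. The class name AttrAccessDict is hypothetical. It is not part of attrdict or cfconv, and attrdict's own classes (e.g. AttrMap, AttrDict) differ from it in the details.

```python
class AttrAccessDict(dict):
    """Hypothetical minimal stand-in for attrdict (not its real API):
    a dict whose keys can also be read as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Convert nested dicts once, so chained access like d.dimensions.x works
        # and changes made through the nested objects are stored in place.
        for key, value in list(self.items()):
            if isinstance(value, dict) and not isinstance(value, AttrAccessDict):
                self[key] = AttrAccessDict(value)

    def __getattr__(self, name):
        # Only called if normal attribute lookup fails, so dict methods such
        # as .keys() or .items() take precedence over keys with the same name.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


cfdict = AttrAccessDict({"dimensions": {"y": 720, "x": 1200}})
print(cfdict.dimensions.x)        # JavaScript-like notation
print(cfdict["dimensions"]["x"])  # equivalent item access
```

Because __getattr__ is only consulted when normal lookup fails, dict methods win over keys of the same name. A key called items would still need cfdict['items'], which is one reason the bracket notation remains useful.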

The following commands can be used to install both packages:

        > pip install jstyleson
        > pip install attrdict

NB: attrdict is also available via conda-forge.

The module can be installed using either of the following two methods:

1. Manual download/install
        > wget https://github.com/hdeneke/trosat-base/archive/master.tar.gz
        > tar -xf master.tar.gz
        > cd trosat-base-master
        > python setup.py install
    
2. Install via PIP
        > pip install git+https://github.com/hdeneke/trosat-base
   
TBD:
* register project at PyPI
* create release version
* list dependencies in setup.py

## Basic Usage
The following example illustrates the usage of this package.

First, import required modules:

    import os
    import json
    import numpy as np
    from trosat import cfconv as cf

Now read a CF-JSON file. The read_cfjson function returns a dictionary that describes the basic structure of the NetCDF file in terms of its attributes, dimensions and variables:

    cfdict = cf.read_cfjson('example_cfmeta.c01.json')
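
The layout of such a file can be seen in the dictionary printed further below. As a rough sketch, a minimal file with the same structure can be written with the standard json module. The structure consists of top-level attributes, dimensions and variables, and each variable has a shape listing dimension names, a NumPy-style type code and its own attributes. The file name minimal_cfmeta.json and all values are made up for illustration.

```python
import json

# Hypothetical minimal CF-JSON definition, mirroring the layout of the
# example file: global attributes, named dimension sizes, and variables
# whose "shape" lists dimension names and whose "type" is a type code.
minimal = {
    "attributes": {
        "Conventions": "CF-1.7",  # CF/NUG spell this attribute with a capital C
        "title": "Minimal example",
    },
    "dimensions": {"y": 2, "x": 3},
    "variables": {
        "lat": {
            "shape": ["y", "x"],
            "type": "f4",
            "attributes": {"units": "degrees_north", "standard_name": "latitude"},
        }
    },
}

# Sanity check: every dimension used in a variable shape must be defined.
for name, var in minimal["variables"].items():
    missing = set(var["shape"]) - set(minimal["dimensions"])
    if missing:
        raise ValueError(f"{name} uses undefined dimensions {missing}")

with open("minimal_cfmeta.json", "w") as fp:
    json.dump(minimal, fp, indent=4)
```

If read_cfjson accepts any file in this layout, the result could then be loaded with cf.read_cfjson('minimal_cfmeta.json').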

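For this example, the dictionary looks as follows. The standard json module cannot serialize the attrdict AttrMap objects it contains, so a small custom JSONEncoder converts them to plain dicts for printing: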

    from json import JSONEncoder
    from attrdict import AttrMap

    class CFJSONEncoder(JSONEncoder):
        def default(self, obj):
            if isinstance(obj, AttrMap):
                return dict(obj)
            return super().default(obj)

    print(json.dumps(dict(cfdict), indent=4, cls=CFJSONEncoder))

Output:

    {
        "attributes": {
            "conventions": "CF-1.7",
            "title": "Geolocation File for TROPOS CPP retrieval",
            "institution": "Leibniz Institute for Tropospheric Research (TROPOS)",
            "address": "Permoser Str. 15, 04318 Leipzig, Germany",
            "author": "Hartwig Deneke, mailto:deneke@tropos.de"
        },
        "dimensions": {
            "y": 720,
            "x": 1200
        },
        "variables": {
            "x": {
                "shape": [
                    "x"
                ],
                "type": "f4",
                "attributes": {
                    "units": "m",
                    "long_name": "x coordinate of projection",
                    "standard_name": "projection_x_coordinate"
                }
            },
            "lat": {
                "shape": [
                    "y",
                    "x"
                ],
                "type": "f4",
                "attributes": {
                    "units": "degrees_north",
                    "standard_name": "latitude",
                    "valid_range": [
                        -9

Create a NetCDF4 file based on the definitions given in the dictionary:

    f = cf.create_file('test.nc', cfdict=cfdict)
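
The following is not cfconv's implementation. It is a hedged sketch of how such a dictionary plausibly maps onto netCDF4 calls. Each entry of dimensions becomes a createDimension(name, size) call, and each variable becomes a createVariable(name, datatype, dimensions) call followed by setting its attributes. The sketch is meant to run without writing a file, so the hypothetical helper plan_netcdf_calls only lists the calls instead of executing them.

```python
def plan_netcdf_calls(cfdict):
    """Hypothetical helper: list the netCDF4.Dataset calls that a CF-JSON
    dictionary would translate to, without creating a file."""
    calls = []
    # Global attributes -> Dataset.setncatts(attributes)
    calls.append(("setncatts", dict(cfdict.get("attributes", {}))))
    # Dimensions -> Dataset.createDimension(name, size)
    for name, size in cfdict.get("dimensions", {}).items():
        calls.append(("createDimension", name, size))
    # Variables -> Dataset.createVariable(name, datatype, dimensions, **kwargs)
    for name, var in cfdict.get("variables", {}).items():
        attrs = dict(var.get("attributes", {}))
        kwargs = {}
        # netCDF4 only accepts _FillValue at creation time, via fill_value=...
        if "_FillValue" in attrs:
            kwargs["fill_value"] = attrs.pop("_FillValue")
        calls.append(("createVariable", name, var["type"], tuple(var["shape"]), kwargs))
        # Remaining attributes -> Variable.setncatts(attributes)
        if attrs:
            calls.append(("setncatts", name, attrs))
    return calls


example = {
    "attributes": {"title": "demo"},
    "dimensions": {"y": 720, "x": 1200},
    "variables": {
        "x": {"shape": ["x"], "type": "f4", "attributes": {"units": "m"}},
    },
}
for call in plan_netcdf_calls(example):
    print(call)
```

The _FillValue special case reflects a netCDF4 restriction: setting that attribute on an existing variable raises an error, so any translation from attributes has to pass it to createVariable instead.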

Print the string representation of the generated file, which shows its global attributes, dimensions and variables:

    print(f)

Output:

    <class 'trosat.cfconv.File'>
    root group (NETCDF4 data model, file format HDF5):
        conventions: CF-1.7
        title: Geolocation File for TROPOS CPP retrieval
        institution: Leibniz Institute for Tropospheric Research (TROPOS)
        address: Permoser Str. 15, 04318 Leipzig, Germany
        author: Hartwig Deneke, mailto:deneke@tropos.de
        dimensions(sizes): y(720), x(1200)
        variables(dimensions): float32 x(x), float32 lat(y,x), uint16 elev(y,x)
        groups:

The returned object can be used like a standard netCDF4.Dataset object, e.g. to initialize or set variable data:

    f['lat'][:] = 0.0  # or use a NumPy array, e.g. np.ones((720,1200), dtype=np.float32)
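
Instead of hard-coding (720,1200) and np.float32, the shape and type of a data array can be derived from the CF-JSON definition itself. The following is a minimal sketch. It assumes the dictionary layout shown above, where a variable's shape lists dimension names and its type holds a NumPy type code such as f4. The helper name empty_like_definition is hypothetical.

```python
import numpy as np


def empty_like_definition(cfdict, varname, fill=0.0):
    """Hypothetical helper: allocate an array matching a CF-JSON variable."""
    var = cfdict["variables"][varname]
    # Translate dimension names into sizes, e.g. ["y", "x"] -> (720, 1200)
    shape = tuple(cfdict["dimensions"][dim] for dim in var["shape"])
    # Type codes such as "f4" or "u2" are understood by np.dtype directly
    return np.full(shape, fill, dtype=np.dtype(var["type"]))


cfdict_example = {
    "dimensions": {"y": 720, "x": 1200},
    "variables": {"lat": {"shape": ["y", "x"], "type": "f4"}},
}
lat = empty_like_definition(cfdict_example, "lat")
print(lat.shape, lat.dtype)  # (720, 1200) float32
```

With the file from the example, and assuming cfdict supports item access as above, f['lat'][:] = empty_like_definition(cfdict, 'lat') would write an array of matching size and type.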

When done, close the file:

    f.close()

## Extensions, Usage Tips and Tricks

### Comments
The JSON format does not allow comments, although comments are useful and encouraged for annotating attributes and other definitions. Therefore, the jstyleson package is used for parsing JSON if it is available in your Python installation, in which case comments in the JSON files are tolerated. jstyleson might become a hard dependency in the future. Note that jstyleson can also be used to remove comments from a file.
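
The optional-dependency pattern described above can be sketched as follows. This illustrates the idea and is not necessarily how cfconv implements it. The helper name load_cfjson_text is hypothetical, while jstyleson.loads mirrors json.loads.

```python
import json

try:
    # jstyleson tolerates // and /* */ comments in JSON text
    import jstyleson as json_parser
except ImportError:
    # Fall back to the standard library: comments then raise a JSONDecodeError
    json_parser = json


def load_cfjson_text(text):
    """Hypothetical helper: parse CF-JSON text, allowing comments if possible."""
    return json_parser.loads(text)


print(load_cfjson_text('{"dimensions": {"y": 720, "x": 1200}}'))
```

The snippet parses plain JSON so that it runs either way. Text containing comments only parses when jstyleson is installed. To produce strict JSON from an annotated file, jstyleson.dispose(text) should return the text with its comments stripped.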

### Variable Encoding
As an extension of CF-JSON, various parameters controlling how the data are encoded in the NetCDF4 file can be set. For this purpose, an "encoding" object can be added either globally (default settings for all variables) or inside a variable object (per-variable settings). Its entries are passed as keyword arguments to the netCDF4.Dataset.createVariable function when the variables are created.

The following JSON "encoding" object enables zlib (deflate) compression at level 6 and chunking in blocks of 60×60 elements:

    "encoding" : { "zlib":true, "complevel":6, "chunksizes": [60,60]}

NB: Encoding parameters are specific to the NetCDF4 library.

!!! NB: not yet fully implemented !!!
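
The text above does not spell out how global and per-variable settings are combined. The following is a plausible sketch. It assumes that the global "encoding" object sits at the top level of the dictionary and that per-variable values override the global defaults key by key. The helper name variable_encoding is hypothetical.

```python
def variable_encoding(cfdict, varname):
    """Hypothetical helper: merge global and per-variable 'encoding' objects."""
    merged = dict(cfdict.get("encoding", {}))                        # global defaults
    merged.update(cfdict["variables"][varname].get("encoding", {}))  # per-variable overrides
    return merged


cfdict_example = {
    "encoding": {"zlib": True, "complevel": 4},
    "variables": {
        "elev": {"encoding": {"complevel": 6, "chunksizes": [60, 60]}},
        "lat": {},
    },
}
print(variable_encoding(cfdict_example, "elev"))
print(variable_encoding(cfdict_example, "lat"))
# Each result would then be unpacked into createVariable, e.g.
# ds.createVariable("elev", "u2", ("y", "x"), **variable_encoding(cfdict_example, "elev"))
```

One consequence of this design is that chunksizes needs one entry per dimension. A global chunksizes default would therefore only suit variables of the same rank, so it is better kept per variable.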

    # Add encoding object manually to cfdict
    cfdict['variables']['elev']['encoding'] = {"zlib":True, "complevel":6, "chunksizes": [60,60]}

    # create file
    f = cf.create_file('testz.nc', cfdict=cfdict)
    print(f['elev'])
    print('complevel: ', f['elev'].filters().get('complevel', False))
    f.close()

Output:

    <class 'netCDF4._netCDF4.Variable'>
    uint16 elev(y, x)
        units: m
        long_name: elevation
        comment: Based on SRTM15_PlUS V1 15 arcsec resolution dataset
        coordinates: lon lat
        scale_factor: 1.0
        add_offset: -1000.0
    unlimited dimensions:
    current shape = (720, 1200)
    filling on, default _FillValue of 65535 used

    complevel:  6


### netCDF4 Mode Settings
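The netCDF4 Python package offers various modes for reading and writing datasets. When automatic scaling is enabled (the default, controlled by set_auto_scale), the "scale_factor" and "add_offset" attributes together with an integer data type can be used to store float variables in a compact integer on-disk representation. The conversion happens transparently on write and on read.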
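
The packing works as follows. A value s is stored as the integer u nearest to (s - add_offset)/scale_factor and is recovered as s = scale_factor*u + add_offset, so the round-trip error is at most scale_factor/2. The following NumPy sketch reproduces this by hand with the elev settings from the output above (scale_factor 1.0, add_offset -1000.0, uint16). It mirrors the manual packing path in the example below, not netCDF4's internal code. The range check is an addition for illustration.

```python
import numpy as np

scale_factor, add_offset = 1.0, -1000.0   # values of the elev variable above
packed_dtype = np.uint16


def pack(s):
    # s = scale_factor*u + add_offset  =>  u = (s - add_offset)/scale_factor
    u = np.round((s - add_offset) / scale_factor)
    info = np.iinfo(packed_dtype)
    if u.min() < info.min or u.max() > info.max:
        raise ValueError("values outside the range representable by the packed type")
    return u.astype(packed_dtype)


def unpack(u):
    return scale_factor * u.astype(np.float64) + add_offset


rng = np.random.default_rng(0)
s = 1000.0 * rng.random((4, 5))           # synthetic elevations in [0, 1000) m
u = pack(s)
err = np.abs(unpack(u) - s).max()
print(u.dtype, err <= scale_factor / 2 + 1e-9)  # uint16 True
```

With these settings, uint16 can represent values from -1000 m up to 64535 m in steps of 1 m. A smaller scale_factor would give finer resolution at the cost of range.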



    # store data using autoscaling: netCDF4 packs the floats to uint16 on write
    f = cf.create_file('test1.nc', cfdict=cfdict)
    f.set_auto_scale(True)
    x = 1000.0*np.random.rand(720,1200)
    f['elev'][:,:] = x
    f.close()

    # store data without scaling: pack the values manually
    f = cf.create_file('test2.nc', cfdict=cfdict)
    f.set_auto_scale(False)
    a,b = f['elev'].scale_factor, f['elev'].add_offset
    # scaling: s = a*u+b => u=(s-b)/a
    # (np.int was removed in NumPy 1.24; use the variable's on-disk type instead)
    f['elev'][:,:] = ((x-b)/a).round().astype(np.uint16)
    f.close()

    # both files should contain identical values
    import netCDF4 as nc
    x1 = nc.Dataset('test1.nc','r')['elev']
    x2 = nc.Dataset('test2.nc','r')['elev']
    assert np.all(x1[:]==x2[:])

    # with auto-scaling, reading returns the unpacked float values
    x1.set_auto_scale(True)
    print(x1[:])
    # without auto-scaling, the raw uint16 values stored on disk are returned
    x1.set_auto_scale(False)
    print(x1[:])

    # the manually packed file reads back the same way
    x2.set_auto_scale(True)
    print(x2[:])
    x2.set_auto_scale(False)
    print(x2[:])