# Example: Hycom Data Processing

This notebook outlines some of the basic concepts needed to define a C3-based architecture for archiving and working with Hycom FMRC data.

## References
https://www.hycom.org/data/gomu0pt04/expt-90pt1m000  
https://www.unidata.ucar.edu/software/tds/current/tutorial/files/FmrcPoster.pdf  
https://tds.hycom.org/thredds/catalog/GOMu0.04/expt_90.1m000/FMRC/runs/catalog.xml

## Background and Goals

The cells below download and access a single run of the Hycom simulation for the Gulf of Mexico, called "GOMu0.04_901m000_FMRC". FMRC stands for Forecast Model Run Collection. The retrieved files are in NetCDF format. [NetCDF](https://www.unidata.ucar.edu/software/netcdf/) is a binary file format (with its own spec, API, and library); its netCDF-4 variant is built on top of the more general [HDF5](https://www.hdfgroup.org/solutions/hdf5/) library.
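Whether a given NetCDF file is actually stored as HDF5 depends on the format it was written in. netCDF-4 files are HDF5 underneath, while the classic CDF-1/2/5 formats use their own layout. The stdlib sketch below tells them apart by their leading magic bytes. `sniff_netcdf_format` is a hypothetical helper, not part of netCDF4-python, and it cannot distinguish a netCDF-4 file from an arbitrary HDF5 file.

```python
# Leading "magic" bytes of each on-disk format (checked at byte offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_SIGNATURES = {
    b"CDF\x01": "netCDF classic (CDF-1)",
    b"CDF\x02": "netCDF 64-bit offset (CDF-2)",
    b"CDF\x05": "netCDF 64-bit data (CDF-5)",
}


def sniff_netcdf_format(path):
    """Guess a NetCDF file's storage format from its first bytes.

    Assumes any HDF5 superblock starts at offset 0 (no user block), which is
    the usual layout for files written by the netCDF-4 library.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == HDF5_SIGNATURE:
        return "netCDF-4 / HDF5"
    return CLASSIC_SIGNATURES.get(head[:4], "unknown")
```

Running it on a file pulled from the Hycom server shows which format the download step actually produced.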

### Initial Goals
1. Define a type that mixes in `File` and/or `Client` where Hycom simulation data can be collected. An initial prototype, `HycomFMRC`, is already provisioned.
2. Define a type to handle the data download, possibly mixing in the "REST" type.
  - Introspect the downloaded NetCDF/HDF5 file to populate fields of the `HycomFMRC` type.
  - Automate retrieval using Cron or a similar scheduler.
3. Explore options for retrieving data from the files:
  - One use case: retrieve a time series of 2D slices (e.g., surface temperature) and either load them directly or stream them.
  
Generally, after solving the storage issue and settling on the source/transform and entity types, I _assume_ we will want to support retrieving and/or streaming data from any one of the datasets (variables) in the collection of runs _across time_.
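FMRC runs overlap in time, so retrieving a variable "across time" means choosing which run supplies each timestep. THREDDS FMRC collections offer a "best" time series for this purpose, which takes each valid time from the most recent run that covers it. The sketch below implements that selection and adds a generator for the load-or-stream question from the goals. The run layout, the 6-hour interval, and `read_slice` are assumptions for illustration. They do not reflect the actual Hycom schedule or any C3 API.

```python
from datetime import datetime, timedelta


def best_time_series(runs):
    """Pick, for every valid time, the most recent run that forecasts it.

    `runs` maps each run (initialization) time to the valid times it covers.
    Returns (valid_time, run_time) pairs sorted by valid time.
    """
    best = {}
    for run_time in sorted(runs):  # oldest first, so newer runs overwrite
        for valid_time in runs[run_time]:
            best[valid_time] = run_time
    return sorted(best.items())


def stream_best(runs, read_slice):
    """Lazily yield (valid_time, slice) along the best time series.

    `read_slice(run_time, valid_time)` is a hypothetical reader returning one
    2D field; nothing is read until the caller advances the generator.
    """
    for valid_time, run_time in best_time_series(runs):
        yield valid_time, read_slice(run_time, valid_time)


def run_valid_times(run_time, steps, step=timedelta(hours=6)):
    """Illustrative helper: valid times of one run at a fixed (assumed) interval."""
    return [run_time + i * step for i in range(steps)]
```

Wrapping the generator in `list(...)` loads the whole series at once. Iterating over it instead reads one slice at a time, which keeps memory use flat when the series is long.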
  
### More on NetCDF
NetCDF-4 files are HDF5 files (the older classic and 64-bit-offset NetCDF formats are not). Both formats have a rich software ecosystem for accessing data efficiently and are used to manage large multidimensional datasets in many large-scale HPC codes. If one were to support the use case above using NetCDF/HDF5 alone, it could be done as follows:
* Create a directory containing the collection of FMRC run files.
* Add a "parent" file containing a dataset that points to each dataset in the individual run files (for example, an HDF5 virtual dataset or external links).
* Use the NetCDF (or HDF5) library to open the parent file and request arrays sliced however one likes across all the files.
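The "parent" file described above maps one logical time axis onto slices of the underlying run files. The sketch below shows that bookkeeping in plain Python. `VirtualTimeAxis` is hypothetical. In practice, HDF5 virtual datasets, `netCDF4.MFDataset`, `xarray.open_mfdataset`, or THREDDS NcML aggregation handle this mapping.

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualTimeAxis:
    """Stitch per-file time axes into one logical axis (hypothetical helper).

    `files` is a list of (path, n_timesteps) pairs in time order.
    """

    def __init__(self, files):
        self.paths = [path for path, _ in files]
        # offsets[k] is the global index of the first timestep stored in file k;
        # the final entry is the total number of timesteps.
        self.offsets = [0, *accumulate(n for _, n in files)]

    def __len__(self):
        return self.offsets[-1]

    def _file_for(self, i):
        if not 0 <= i < len(self):
            raise IndexError(f"timestep {i} out of range")
        return bisect_right(self.offsets, i) - 1

    def locate(self, i):
        """Map a global timestep index to (path, local index)."""
        k = self._file_for(i)
        return self.paths[k], i - self.offsets[k]

    def plan(self, start, stop):
        """Split the global range [start, stop) into per-file local slices."""
        if not 0 <= start <= stop <= len(self):
            raise IndexError(f"range {start}:{stop} out of bounds")
        reads = []
        i = start
        while i < stop:
            k = self._file_for(i)
            end = min(stop, self.offsets[k + 1])
            reads.append((self.paths[k], slice(i - self.offsets[k], end - self.offsets[k])))
            i = end
        return reads
```

`plan` returns one read per file touched. A request that spans several runs therefore becomes a handful of contiguous reads rather than one read per timestep.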


## Requirements
This notebook requires the py-hycom_1_0_0 kernel.

A prototype `HycomFMRC` type is provisioned with the `dti-jupyter` package:

In [1]:
from datetime import date, timedelta
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin
import netCDF4 as nc
import requests
import pandas as pd
from pivottablejs import pivot_ui
import xmltodict  # may be missing from the kernel; install with `pip install xmltodict`
from IPython.display import display


## Types
The following types are currently provisioned to support Hycom Data:  
(TODO: run a query to list all types in the hycom- package.)  
```
HycomDataset
HycomFMRC
HycomFMRCFile
GeospatialCoverage
```
Uncomment and run the `help` cells below for more information.

In [None]:
#help(c3.HycomDataset)

In [None]:
#help(c3.HycomFMRC)

In [None]:
#help(c3.HycomFMRCFile)

In [None]:
# Ensure we have a Dataset entry for the desired catalog
cat_url = "https://tds.hycom.org/thredds/catalog/GOMu0.04/expt_90.1m000/FMRC/runs/catalog.xml"
gom_dataset = c3.HycomDataset.upsertHycomDatasetFromCatalog(url = cat_url)

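`upsertHycomDatasetFromCatalog` is a C3 method, and its implementation isn't shown here. The stdlib `xml.etree.ElementTree` module (already imported above as `ET`) can list the run datasets in a THREDDS `catalog.xml`, which gives a rough idea of the parsing that method presumably performs. The sample catalog is a trimmed, made-up example. The `_RUN_<time>` naming is an assumption to check against the live catalog.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace used by THREDDS catalog.xml documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}


def list_catalog_datasets(catalog_xml):
    """Return (name, urlPath) for every <dataset> that has a urlPath."""
    root = ET.fromstring(catalog_xml)
    return [
        (ds.get("name"), ds.get("urlPath"))
        for ds in root.findall(".//t:dataset", NS)
        if ds.get("urlPath")
    ]


def run_time_from_name(name):
    """Parse the run time from a name such as '..._RUN_2024-01-01T12:00:00Z'.

    Assumes the '<collection>_RUN_<ISO time>' naming used by THREDDS FMRC runs.
    """
    stamp = name.rsplit("_RUN_", 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Trimmed, illustrative catalog (not copied from tds.hycom.org).
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" version="1.0.1">
  <dataset name="FMRC runs" ID="example/runs">
    <dataset name="GOMu0.04_901m000_FMRC_RUN_2024-01-01T12:00:00Z" urlPath="example/run_a"/>
    <dataset name="GOMu0.04_901m000_FMRC_RUN_2024-01-02T12:00:00Z" urlPath="example/run_b"/>
  </dataset>
</catalog>"""
```

Against the live server, `list_catalog_datasets(requests.get(cat_url, timeout=30).content)` should return the real entries, because `ET.fromstring` accepts bytes as well as strings.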
In [None]:
# Grab the HycomDataset record that was created.
objs = c3.HycomDataset.fetch().objs
if objs:
    display(pd.DataFrame(objs.toJson()))


In [None]:
# Create HycomFMRC records for every run currently listed in the catalog
# This uses the...
fmrcs = gom_dataset.upsertFMRCFromDatasetCatalog()
fmrcs

In [None]:
# Grab the HycomFMRC records that were created.
objs = c3.HycomFMRC.fetch().objs
if objs:
    display(pd.DataFrame(objs.toJson()))

In [None]:
# Detail: look at the timeCoverage for a single HycomFMRC
fmrcs = c3.HycomFMRC.fetch()
fmrcs.objs[0].timeCoverage

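`timeCoverage` exposes a `start`, which the download cell below uses, and presumably an `end` as well (an assumption). If the download method fetches only one timestep per call, covering a whole run means enumerating the timesteps between the two. The sketch below does this under that assumption. The output interval is a parameter because this notebook does not state the actual Hycom interval. `to_ncss_time` also converts timezone-aware values to UTC before appending `Z`, which a bare `strftime` does not do.

```python
from datetime import datetime, timedelta, timezone


def forecast_times(start, end, step):
    """Return every timestep from start to end (inclusive), spaced by `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    return times


def to_ncss_time(t):
    """Format a datetime as ISO 8601 UTC, e.g. '2024-01-01T12:00:00Z'.

    Naive datetimes are assumed to already be in UTC.
    """
    if t.tzinfo is not None:
        t = t.astimezone(timezone.utc)
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Suppose `downloadFMRCRunData` behaves as the comment in the download cell describes. A loop over `forecast_times(...)` that passes `time_start=to_ncss_time(t)` and `time_end=to_ncss_time(t)` would then yield one `HycomFMRCFile` per timestep.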
In [None]:
# Download a data file for each FMRC record.
# Note: currently only a fetch of a single timestep is supported, but multiple
# files can be retrieved for a single HycomFMRC record.
# This demo grabs the first available forecast time for each run.
def downloadAll(fmrcs):
    fmrc_files = []
    for fmrc in fmrcs.objs:
        # Same timestamp for start and end requests a single timestep.
        first_step = fmrc.timeCoverage.start.strftime("%Y-%m-%dT%H:%M:%SZ")
        fmrc_files.append(
            fmrc.downloadFMRCRunData(time_start=first_step, time_end=first_step)
        )
    return fmrc_files

downloadAll(fmrcs)

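The request that `downloadFMRCRunData` sends isn't visible here. If it uses the THREDDS NetCDF Subset Service (NCSS), its `time_start`/`time_end` arguments correspond to NCSS query parameters. The sketch below builds such a URL with the `urlencode`/`urljoin` helpers imported at the top of the notebook. `ncss_url`, the dataset path, and the variable name `water_temp` are placeholders, and the `ncss/` endpoint prefix may differ by server version.

```python
from urllib.parse import urlencode, urljoin


def ncss_url(base, url_path, variables, time_start, time_end, bbox=None, accept="netcdf4"):
    """Build a NetCDF Subset Service (NCSS) grid request URL (hypothetical helper).

    base: server root ending in "/", e.g. "https://tds.hycom.org/thredds/"
    url_path: the dataset's urlPath from catalog.xml
    variables: variable names to request (one `var` parameter each)
    bbox: optional (north, south, east, west) in degrees
    """
    params = [("var", name) for name in variables]
    params += [("time_start", time_start), ("time_end", time_end), ("accept", accept)]
    if bbox is not None:
        north, south, east, west = bbox
        params += [("north", north), ("south", south), ("east", east), ("west", west)]
    # urljoin keeps the "/thredds/" prefix only because `base` ends with a slash.
    return urljoin(base, "ncss/" + url_path) + "?" + urlencode(params)
```

For large subsets, `requests.get(url, stream=True, timeout=...)` combined with `iter_content(chunk_size=...)` writes the response to disk in chunks rather than holding it all in memory.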
In [None]:
# List the resulting HycomFMRCFile records
objs = c3.HycomFMRCFile.fetch().objs
if objs:
    display(pd.DataFrame(objs.toJson()))

In [None]:
files = c3.FileSystem.inst().listFiles("hycom-data")
files

In [None]:
# TODO: Open a file to confirm...
# Question: How do I call member functions of type "File" from HycomFMRCFile?
file = c3.HycomFMRCFile.fetch().objs[0]
file.directoryUrl()

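Both the introspection goal listed at the top and the check that a downloaded file is sane can start by opening a local copy with netCDF4-python and summarizing it. The sketch below collects dimension sizes, variable dimensions, and global attributes. `summarize_netcdf` is hypothetical. Which of these values belong on `HycomFMRC` or `HycomFMRCFile` is left open, because their fields aren't shown here.

```python
def summarize_netcdf(ds):
    """Collect dimension sizes, variable dimensions and global attributes.

    `ds` is expected to behave like netCDF4.Dataset: `.dimensions` maps names
    to objects supporting len(), `.variables` maps names to objects with a
    `.dimensions` tuple, and `ncattrs()` / `getncattr()` expose global attributes.
    """
    return {
        "dimensions": {name: len(dim) for name, dim in ds.dimensions.items()},
        "variables": {name: tuple(var.dimensions) for name, var in ds.variables.items()},
        "attributes": {key: ds.getncattr(key) for key in ds.ncattrs()},
    }
```

With a local file path, `with nc.Dataset(path) as ds: summarize_netcdf(ds)` returns the dictionary. Here `nc` is the `netCDF4` alias imported at the top.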
In [None]:
# Cleanup
print(f"Removed {c3.HycomFMRCFile.removeAll()} HycomFMRCFile records.")
print(f"Removed {c3.HycomFMRC.removeAll()} HycomFMRC records.")
print(f"Removed {c3.HycomDataset.removeAll()} HycomDataset records.")
files = c3.FileSystem.inst().listFiles("hycom-data")
if files.files:
    print(f"Deleting {len(files.files)} files")
    c3.FileSystem.inst().deleteFilesBatch(files.files)
print("Done.")