# Using EcoFOCIpy to process raw field data

## Mooring / Timeseries Data

The basic workflow for each instrument grouping is *(initial archive level)*:
- Parse data from raw files into a pandas DataFrame
- Output initial files (pandas->csv); these are **ERDDAP NRT** when no metadata is added
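
The two steps above can be sketched as follows. This is a minimal illustration, not the EcoFOCIpy parser: the raw text, its header line, and the column names are all made up for the example.

```python
import io
import pandas as pd

# Hypothetical raw logger output: one comment header, then comma-separated records.
raw_text = """# hypothetical logger header
2021-05-06 00:00:00,1015.55,3.42
2021-05-06 00:10:00,1015.41,3.43
"""

# Parse into a DataFrame indexed by time; the column names here are assumptions.
df = pd.read_csv(
    io.StringIO(raw_text),
    comment="#",
    header=None,
    names=["date_time", "pressure", "temperature"],
    parse_dates=["date_time"],
    index_col="date_time",
)

# Initial archive level: a straight CSV translation with no metadata added.
csv_text = df.to_csv()
print(csv_text)
```

In practice `df.to_csv(path)` would write to a file; the string form is used here only so the sketch runs without touching the filesystem.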

Convert to an xarray Dataset for all following work *(working or final data level)*:
- ingest metadata from deployment/recovery records or cast logs
- process data beyond simple file translation
- apply any calibrations or corrections
    + field corrections
    + offsets
    + instrument compensations
    + some QC where available (mostly old-school simple bounds checks)
- adjust time bounds and sample frequency (xarray Dataset)
- save as CF netcdf via xarray. Many of the steps above are optional, so the saved file falls into one of these levels:
    + **ERDDAP NRT or preliminary**: no corrections, offsets, or time bounds are applied, but none or some metadata is. This is an internally hosted dataset for primary analysis, quick review, and historical purposes.
    + **Working and awaiting QC**: has no ERDDAP representation and is a holding spot, often with deck data removed. It is usually a combination of editable csv files and archivable netcdf files.
    + **ERDDAP Final**: fully calibrated, QC'd, and populated with metadata. Used for distribution and science analysis.
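
As a rough sketch of the correction, bounds-QC, and time-adjustment steps listed above, the following uses plain pandas rather than EcoFOCIpy. The data values, the offset, the plausible range, and the in-water period are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical 10-minute temperature record; first and last samples are "deck" data.
times = pd.date_range("2021-05-06 00:00", periods=6, freq="10min")
temperature = pd.Series([15.0, 3.4, 3.5, 99.0, 3.6, 14.0], index=times, name="temperature")

# 1. Apply a (made-up) calibration offset.
offset = -0.1
corrected = temperature + offset

# 2. Simple bounds QC: values outside the assumed plausible range become NaN.
lower, upper = -40.0, 40.0
qcd = corrected.where((corrected >= lower) & (corrected <= upper))

# 3. Trim to the (assumed) in-water period, then resample to 20-minute means.
deployed = qcd.loc["2021-05-06 00:10":"2021-05-06 00:40"]
resampled = deployed.resample("20min").mean()
print(resampled)
```

The same operations exist on xarray objects (`where`, `sel`, `resample`), so the order of steps carries over once the data has been converted.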

Plot for preview and QC:
- preview images (individually and/or collectively)
- manual QC process
- automated QC process (ML/AI)

## Example below is for TELOS Meteorological Data that has already been preprocessed into preliminary data.

Future processing of this instrument can be a simplified (no markdown) process, which can be archived so that the procedure can be traced or updated.

In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
import yaml
import pandas as pd

import EcoFOCIpy.io.ncCFsave as ncCFsave
import EcoFOCIpy.metaconfig.load_config as load_config

The sample_data_dir should be included in the GitHub package but may not be included in the pip install of the package.

## Simple Processing - first step

In [51]:
sample_data_dir = '/Users/bell/Programs/EcoFOCIpy/'
user_data_dir = '/Users/bell/ecoraid/2021/Moorings/21bspr2a/'
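
The cells in this notebook build file paths by string concatenation, which depends on each directory string ending in a slash. A hedged alternative is sketched below with `pathlib`; the directory strings are the ones from the cell above, the relative file names are the ones used in the next cell, and `PurePosixPath` is used only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath

# Directories as given in the cell above.
sample_data_dir = PurePosixPath('/Users/bell/Programs/EcoFOCIpy/')
user_data_dir = PurePosixPath('/Users/bell/ecoraid/2021/Moorings/21bspr2a/')

# The / operator inserts separators as needed, with or without a trailing slash.
mooring_meta_file = user_data_dir / 'logs' / '21BSPR-2A.yaml'
inst_meta_file = sample_data_dir / 'staticdata' / 'instr_metaconfig' / 'telos_wx_cf.yaml'
print(mooring_meta_file)
print(inst_meta_file)
```

With real files, `pathlib.Path` works the same way, and both `open()` and `pd.read_csv()` accept path objects directly.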

In [53]:
###############################################################
# edit to point to {instrument specific} raw datafile
datafile = user_data_dir+'final_data_cf/21BSPR-2A_Met.csv'
instrument = 'ATRH '
mooring_meta_file = user_data_dir+'logs/21BSPR-2A.yaml'
inst_meta_file = sample_data_dir+'staticdata/instr_metaconfig/telos_wx_cf.yaml'
institution_meta_file = sample_data_dir+'staticdata/institutional_meta_example.yaml' #include uaf?
inst_shortname = 'met'
###############################################################

#init and load data
met_wop = pd.read_csv(datafile, parse_dates=True, index_col='time (UTC)')


In [54]:
met_wop.index = met_wop.index.rename('date_time')

In [55]:
met_wop = met_wop.rename(columns={'BaroPres':'pressure',
                                  'Air_Temp':'temperature',
                                  'U_wind':'eastward_wind',   # U is the eastward component
                                  'V_wind':'northward_wind',  # V is the northward component
                                  'RH':'relative_humidity',
                                  })
met_wop.sample()

| date_time | latitude | longitude | pressure | eastward_wind | Uwind_Std | northward_wind | Vwind_Std | relative_humidity | temperature | wind_speed | wind_from_direction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-10-03 05:10:00 | 58.96 | -166.18 | 1021.4 | 7.9 | 0.0 | -6.6 | 0.0 | 85.92 | 4.93 | 10.294173 | 309.876834 |
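
The `wind_speed` and `wind_from_direction` columns can be cross-checked against the wind components. The sketch below assumes the usual meteorological convention (u is the eastward component, v is the northward component, and direction is where the wind blows from, in degrees clockwise from north). It takes the value preceding `Uwind_Std` in the sampled row (7.9) as u and the value preceding `Vwind_Std` (-6.6) as v. The function name is hypothetical.

```python
import math

def speed_and_from_direction(u, v):
    """Return (speed, direction the wind blows from, degrees clockwise from north)."""
    speed = math.hypot(u, v)
    # Negate both components to turn "blowing toward" into "blowing from".
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

speed, direction = speed_and_from_direction(7.9, -6.6)
print(round(speed, 3), round(direction, 3))
```

Under that assumption the result agrees with the speed and direction recorded in the sampled row, which supports treating the U column as the eastward component.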


In [56]:
met_wop.drop(columns=['latitude','longitude'],inplace=True)

In [57]:
#just a dictionary of dictionaries - simple
with open(mooring_meta_file) as file:
    mooring_config = yaml.full_load(file)

In [58]:
mooring_config['Instrumentation'][instrument]

{'InstType': 'ATRH',
 'SerialNo': '',
 'DesignedDepth': -1.0,
 'ActualDepth': 0.0,
 'PreDeploymentNotes': 'crossed off - 2044',
 'PostDeploymentNotes': 'Erratic (low) values in Sept/Oct for RH and Temp, likely due to submersion of instrument during fall storms',
 'Deployed': 'y',
 'Recovered': 'y'}
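
Because the configuration is a plain dictionary of dictionaries, lookups are exact string matches: the `instrument` key set earlier is `'ATRH '` (with a trailing space), while the `InstType` value is `'ATRH'`. The sketch below shows one way to make such lookups tolerant of stray whitespace. The helper `find_instrument` is hypothetical and not part of EcoFOCIpy, and the dictionary is an abbreviated stand-in for the loaded YAML.

```python
# Abbreviated stand-in for the structure returned by the YAML load.
mooring_config = {
    'Instrumentation': {
        'ATRH ': {'InstType': 'ATRH', 'DesignedDepth': -1.0, 'ActualDepth': 0.0},
    },
}

def find_instrument(config, name):
    """Hypothetical helper: look up an instrument entry, ignoring surrounding whitespace."""
    wanted = name.strip()
    for key, entry in config['Instrumentation'].items():
        if key.strip() == wanted:
            return entry
    raise KeyError(f"no instrument entry matching {wanted!r}")

entry = find_instrument(mooring_config, 'ATRH')
print(entry['ActualDepth'])
```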

## Add Instrument meta information

Time, depth, lat, and lon should always be added (they are always our coordinates), but for a mooring site the result is going to be a (1,1,1,t) dataset.
The variables of interest should be read from the data file and matched to a key for naming. That key is in the inst_config file loaded below and should represent common conversion names in the raw data.
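
To make the (1,1,1,t) shape concrete, here is a small NumPy sketch. The four temperature values are hypothetical, and the axis order is illustrative only; the order EcoFOCIpy actually uses is not shown in this notebook.

```python
import numpy as np

# Hypothetical record: four time steps at a single depth, latitude, and longitude.
temperature = np.array([3.42, 3.43, 3.46, 3.43])

# Add three length-1 axes so the array is indexed (depth, latitude, longitude, time).
gridded = temperature.reshape(1, 1, 1, -1)
print(gridded.shape)
```

The data itself is unchanged; only the indexing gains the three singleton coordinate axes that a fixed mooring site implies.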

In [59]:
#just a dictionary of dictionaries - simple
with open(institution_meta_file) as file:
    institution_config = yaml.full_load(file)

In [60]:
with open(inst_meta_file) as file:
    inst_config = yaml.full_load(file)

In [61]:
# Add meta data and prelim processing based on meta data
# Convert to xarray and add meta information - save as CF netcdf file
# pass -> data, instmeta, depmeta
met_wop_nc = ncCFsave.EcoFOCI_CFnc(df=met_wop, 
                                instrument_yaml=inst_config, 
                                operation_yaml=mooring_config,
                                operation_type='mooring', 
                                instrument_id=instrument, 
                                inst_shortname=inst_shortname)
met_wop

| date_time | pressure | eastward_wind | Uwind_Std | northward_wind | Vwind_Std | relative_humidity | temperature | wind_speed | wind_from_direction |
|---|---|---|---|---|---|---|---|---|---|
| 2021-05-06 00:00:00 | 1015.55 | -5.5 | 0.0 | 0.8 | 0.0 | 86.50 | 3.42 | 5.557877 | 98.275893 |
| 2021-05-06 00:10:00 | 1015.41 | -6.3 | 0.0 | 0.5 | 0.0 | 82.52 | 3.43 | 6.319810 | 94.537773 |
| 2021-05-06 00:20:00 | 1015.05 | -6.9 | 0.0 | -0.1 | 0.0 | 83.06 | 3.46 | 6.900725 | 89.169685 |
| 2021-05-06 00:30:00 | 1014.71 | -7.1 | 0.0 | -0.4 | 0.0 | 82.39 | 3.43 | 7.111259 | 86.775477 |
| 2021-05-06 00:40:00 | 1014.54 | -7.5 | 0.0 | -0.6 | 0.0 | 85.20 | 3.41 | 7.523962 | 85.426079 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-01-19 16:50:00 | 1001.83 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-19 17:00:00 | 1001.86 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-19 17:10:00 | 1001.99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-19 17:20:00 | 1002.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


At this point, you could save your file with the `.xarray2netcdf_save()` method and have a functioning dataset, but it would be very simple, with no additional QC, metadata, or tuned parameters for optimizing software like ferret or erddap.

In [62]:
# expand the dimensions and coordinate variables
# renames them appropriately and prepares them for meta-filled values
met_wop_nc.expand_dimensions()

In [63]:
#build list from columns in data - if a variable isn't in the yaml file, it will be dropped from the final data fields
met_wop_nc.variable_meta_data(variable_keys=list(met_wop.columns.values),drop_missing=True)
met_wop_nc.temporal_geospatioal_meta_data(depth='actual')
#adding dimension meta needs to come after updating the dimension values... BUG?
met_wop_nc.dimension_meta_data(variable_keys=['depth','latitude','longitude'])
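
The idea behind `drop_missing=True` in the cell above (columns without a matching entry in the instrument YAML are dropped) can be illustrated with plain Python. This is a sketch of the concept, not EcoFOCIpy's implementation: `split_by_config` is a hypothetical helper, and the set of configured keys is an assumption, since the contents of the YAML file are not shown here.

```python
# Column names from the dataframe shown earlier.
columns = ['pressure', 'northward_wind', 'Uwind_Std', 'eastward_wind', 'Vwind_Std',
           'relative_humidity', 'temperature', 'wind_speed', 'wind_from_direction']

# Assumed stand-in for inst_config: which variable names have metadata entries.
inst_config = {
    'pressure': {}, 'temperature': {}, 'relative_humidity': {},
    'eastward_wind': {}, 'northward_wind': {},
    'wind_speed': {}, 'wind_from_direction': {},
}

def split_by_config(columns, config):
    """Hypothetical helper: separate columns with a config entry from those without."""
    kept = [c for c in columns if c in config]
    dropped = [c for c in columns if c not in config]
    return kept, dropped

kept, dropped = split_by_config(columns, inst_config)
print('kept:', kept)
print('dropped:', dropped)
```

Listing the dropped names before saving is a cheap way to catch a column that was silently excluded because of a naming mismatch.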

The following steps can happen in just about any order and are all metadata driven. They are not required for a functioning dataset, but they are required for a well-described dataset.

In [64]:
#add global attributes
met_wop_nc.deployment_meta_add()

#add institutional global attributes
met_wop_nc.institution_meta_add(institution_yaml=institution_config)

#add instrument global attributes
met_wop_nc.instrument_meta_data()

#add creation date/time - provenance data
met_wop_nc.provinance_meta_add()

#provide initial qc status field
met_wop_nc.qc_status(qc_status='unknown')
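
Conceptually, the calls above merge several sources of metadata into the dataset's global attributes. The sketch below shows that idea with plain dictionaries. The function, the attribute names (`date_created`, `QC_indicator`), and the input values are all assumptions for illustration and may differ from what EcoFOCIpy writes.

```python
from datetime import datetime, timezone

def build_global_attrs(institution, deployment, qc_status='unknown'):
    """Hypothetical helper: merge metadata sources into one attribute dictionary."""
    attrs = {}
    attrs.update(institution)   # institution-level attributes
    attrs.update(deployment)    # deployment-level attributes
    # Provenance: record when the file was created, in UTC.
    attrs['date_created'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    attrs['QC_indicator'] = qc_status
    return attrs

# Made-up inputs standing in for the institution and mooring YAML contents.
attrs = build_global_attrs({'institution': 'example institution'},
                           {'MooringID': 'example-id'})
print(attrs)
```

Because each source only adds keys, the order of the calls matters only when two sources define the same attribute name, in which case the later one wins.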


In [65]:
met_wop_nc.get_xdf()

In [66]:
# combine trim (not mandatory) and filename together (saves to test.nc without name)

# mooring_yaml['Instrumentation'][self.instrument_id]['DesignedDepth'])).zfill(4) #<-- alternative
filename = "".join(mooring_config['MooringID'].split('-')).lower()+'_'+inst_shortname+'.nc'
met_wop_nc.xarray2netcdf_save(xdf = met_wop_nc.autotrim_time(),
                           filename=filename,format="NETCDF3_CLASSIC")

  xdf.to_netcdf(filename,format=kwargs['format'],encoding={'time':{'units':'days since 1900-01-01'}})
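
Two details of the save step can be checked by hand: the filename built from the mooring ID, and the `days since 1900-01-01` time encoding that appears in the `to_netcdf` line above. The sketch below uses only the standard library. The mooring ID value is an assumption based on the YAML file name, and the sample time is one of the timestamps from the data shown earlier.

```python
from datetime import datetime

# Assumed value of mooring_config['MooringID'], inferred from the YAML file name.
mooring_id = '21BSPR-2A'
inst_shortname = 'met'

# Same expression as in the cell above: drop hyphens, lowercase, append the short name.
filename = "".join(mooring_id.split('-')).lower() + '_' + inst_shortname + '.nc'
print(filename)

# Express one timestamp as fractional days since the encoding's reference date.
epoch = datetime(1900, 1, 1)
sample_time = datetime(2021, 5, 6, 0, 10)
days_since_epoch = (sample_time - epoch).total_seconds() / 86400.0
print(days_since_epoch)
```

Reading the file back with xarray decodes these numbers to timestamps again, so the raw day counts are normally only visible in tools that do not decode CF time units.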
