# EcoFOCIpy RCM QC example (separated)

See [(EcoFOCIpy_19ckp12a_rcm.ipynb)](EcoFOCIpy_19ckp12a_rcm.ipynb) for initial processing and formatting. By this point the data have been converted from the raw instrument format to csv files, initial quick and dirty plots have been created, and both a metadata-rich and a metadata-poor dataset have been created. The metadata-poor csv file is used for the preliminary data archive; the metadata-rich netcdf file is used for QC procedures and the eventual final data archive.

**Two Options**
- use xarray to load the netcdf data directly (working file), or pandas to load the csv file (initial archive)
- use erddapy to load the ERDDAP hosted dataset

## Next Steps

QC of data (plot parameters with other instruments)
- be sure to update the qc_status and the history

- **TODO** Programmatically simplify the following tools
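One way the status and history bookkeeping could be wrapped up is sketched below. The helper name `update_qc_attrs` is hypothetical and not part of EcoFOCIpy; it works on a plain dictionary, so it could be applied to a dataset's `attrs`. The default key `QC_indicator` is the attribute name used later in this notebook, and the UTC timestamp is an assumption (the cells below use local time via `datetime.datetime.today()`).

```python
import datetime


def update_qc_attrs(attrs, qc_status, note, status_key="QC_indicator"):
    """Return a copy of `attrs` with the QC status, history and date_modified updated.

    `attrs` is any dict of global attributes, e.g. dict(ds.attrs).
    This helper is a hypothetical sketch, not an EcoFOCIpy function.
    """
    # Assumption: timestamps are recorded in UTC in ISO 8601 form.
    now = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    updated = dict(attrs)
    updated[status_key] = qc_status
    previous = updated.get("history", "")
    entry = f"{note}: {now}"
    # Append to any existing history rather than overwriting it.
    updated["history"] = f"{previous}\n{entry}" if previous else entry
    updated["date_modified"] = now
    return updated


example = update_qc_attrs(
    {"history": "File created."},
    qc_status="ProbablyGood",
    note="Oxygen over-range period removed",
)
print(example["history"])
```

With an xarray dataset this would be used as `ds.attrs.update(update_qc_attrs(ds.attrs, ...))`, so the status, history, and modification date change together in one step.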

In [26]:
%matplotlib inline
import matplotlib.pyplot as plt
import yaml

import pandas as pd
import xarray as xr
import datetime
import ecofocipy.io.erddap as erddap

import ecofocipy.metaconfig.load_config as load_config

In [27]:
sample_data_dir = '/Users/bell/Programs/EcoFOCIpy/'
user_data_dir = '/Users/bell/ecoraid/2019/Moorings/19ckp12a/'

In [28]:
###############################################################
# edit to point to {instrument specific} raw datafile
datafile = user_data_dir+'rawconverted/sbe16/19ckp12a_an9_0056m.trimmed.cnv'
instrument = 'RCM9 726'
mooring_meta_file = user_data_dir+'logs/19CKP-12A.yaml'
inst_meta_file = sample_data_dir+'staticdata/instr_metaconfig/rcm_cf.yaml'
institution_meta_file = sample_data_dir+'staticdata/institutional_meta_example.yaml' #include uaf?
inst_shortname = 'an9'
###############################################################

In [29]:
#just a dictionary of dictionaries - simple
with open(mooring_meta_file) as file:
    mooring_config = yaml.full_load(file)

### QC Notes from plots

- Oxygen has a period of over-ranging in June of 2020 (add a base value of 390? or just eliminate it, as it's a short span and it's not really clear what's happening here)
- recovery date is too long after the instrument stopped recording data to be a useful cal point
- rough field check points look good at the large scale
- oxygen needs to be salinity/depth corrected - done in workflow above
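If the over-range span is eliminated rather than offset, one option is to mask it with NaN so that the record length stays the same. The sketch below uses a synthetic daily series, and the start and end dates are hypothetical placeholders; the real bounds would have to be read off the plots.

```python
import numpy as np
import pandas as pd

# Synthetic daily record standing in for the real oxygen series.
time = pd.date_range("2020-05-25", "2020-07-05", freq="D")
df = pd.DataFrame({"oxy_conc": np.linspace(300.0, 400.0, time.size)}, index=time)
n_records = len(df)

# Hypothetical bounds for the over-range period (assumption, not from the data).
bad_start, bad_end = "2020-06-05", "2020-06-12"

# Label-based slicing on a DatetimeIndex includes both end dates.
df.loc[bad_start:bad_end, "oxy_conc"] = np.nan

print(df["oxy_conc"].isna().sum(), "values masked;", len(df), "records kept")
```

Masking instead of dropping rows keeps the time axis intact, which matters when the edited values are later written back into a netcdf file of the same shape.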


### Post Manual QC load data and rebuild nc file

- use Excel for a few points, a dynamic web map for many (a tool on ecofoci-field.pmel.noaa.gov), or any other method to identify spikes
**NOTE** if you use Excel, be careful about how the time column is formatted (it's important): use custom formatting of the form `yyyy-mm-ddTHH:MM:SSZ` to ensure it reads properly into xarray
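A quick check of the time strings before reloading the csv can catch a spreadsheet that silently reformatted them. The sketch below is a minimal example using only the standard library; the helper name `badly_formatted_times` and the sample values are made up for illustration.

```python
from datetime import datetime

# strptime equivalent of the yyyy-mm-ddTHH:MM:SSZ pattern described above
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def badly_formatted_times(values):
    """Return the entries that do not match TIME_FORMAT (hypothetical helper)."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, TIME_FORMAT)
        except ValueError:
            bad.append(value)
    return bad


# Made-up sample: one good value and two typical spreadsheet reformats.
sample = ["2020-06-01T00:00:00Z", "6/1/20 1:00", "2020-06-01 02:00:00"]
print(badly_formatted_times(sample))
```

In practice the function could be given the `time` column of the edited csv read as plain strings; an empty list means every timestamp has the expected form.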

*TODO:* use erddap hosted data 

In [30]:
# this reads the previous csv and assumes you've modified the content but not the structure (record length or variable names)
# NOTE: `filename` must already hold the working netcdf file name; it is not set anywhere in this notebook
rcm_df_qc = pd.read_csv(user_data_dir+'working/'+filename.replace('.nc','.trimmed.csv'), index_col=['time','depth','latitude','longitude']) #order is important
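Since this step assumes the csv kept its structure, it may be worth checking that assumption before overwriting anything. The sketch below compares record length and column names of two dataframes; `structure_problems` is a hypothetical helper, and the small frames (including the `temperature` column) are invented stand-ins for the original and edited files.

```python
import pandas as pd


def structure_problems(original, edited):
    """List differences in record length or column names between two DataFrames."""
    problems = []
    if len(original) != len(edited):
        problems.append(f"record length changed: {len(original)} -> {len(edited)}")
    if list(original.columns) != list(edited.columns):
        problems.append("variable names or order changed")
    return problems


# Invented stand-ins for the csv before and after manual editing.
original = pd.DataFrame({"oxy_conc": [310.0, 305.0, 999.0], "temperature": [1.2, 1.3, 1.1]})
edited = original.copy()
edited.loc[2, "oxy_conc"] = float("nan")  # a value was edited, structure intact

print(structure_problems(original, edited))           # structure unchanged
print(structure_problems(original, edited.iloc[:2]))  # a row was deleted
```

An empty list means the edited values can be copied back into the working dataset variable by variable; anything else suggests rows were deleted or columns renamed during manual QC.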

In [31]:
# this loads the initial working netcdf file
rcm_wop_nc_qc = xr.load_dataset(user_data_dir+'working/'+filename)

In [32]:
rcm_wop_nc_qc

In [33]:
#replace original data with edited data (convert the edited dataframe once and reuse it)
rcm_df_qc_ds = xr.Dataset.from_dataframe(rcm_df_qc)
for varname in ['oxy_conc','oxy_percentsat','oxy_conc_umkg']:
    rcm_wop_nc_qc[varname].values = rcm_df_qc_ds[varname].values


### Update global attributes

In [34]:
rcm_wop_nc_qc.attrs.update({'QC_indicator': 'ProbablyGood'})
rcm_wop_nc_qc.attrs.update({'history':(rcm_wop_nc_qc.history + "\nOxygen record adjusted due to overrange period (data removed): "+ str(datetime.datetime.today()))})


In [35]:
rcm_wop_nc_qc.attrs.update({'date_modified':str(datetime.datetime.today())})

In [36]:
rcm_wop_nc_qc.to_netcdf(user_data_dir+'working/'+filename,format="NETCDF3_CLASSIC",encoding={'time':{'units':'days since 1900-01-01'}})