cr2c-monitoring

Scripts for monitoring operational data at the Codiga Center
Synopsis

The cr2c-monitoring project manages the data systems for the Bill & Cloy Codiga Resource Recovery Center (CR2C). CR2C produces three principal streams of data: laboratory data from water quality and other testing, operational data from the facility's automated sensors, and field data collected by the facility's operators in their daily checks of the plant's performance. The scripts in this repository process and output these data streams to a single data store, perform various analyses to monitor and validate the plant's performance on a daily basis, and integrate important information from all data into a single, online visualization and data querying platform.

Contributors

The contributors to this project are managers and operators of the Codiga Center:

  • Sebastien Tilmans (Director of Operations)
  • Jose Bolorinos (Operator)

Prerequisites

To use this repository on your machine, open a terminal window, change to the directory where you would like to place the repository, and clone it:

cd "/mydir"
git clone https://github.com/stanfordcr2c/cr2c-monitoring

This project is based on Python 3 and makes use of Python's data management modules, including Numpy, Pandas, and sqlite3. In addition, all interactive plotting is done with Dash, a web application module developed by the Plotly Project, built on top of the Flask framework. We also make extensive use of the Google Cloud Services platform, including:

Google App Engine

Google BigQuery

Google Sheets API

All dependencies are listed in the "requirements.txt" document in the repository.
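These can be installed from the repository root with pip:

pip install -r requirements.txt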

Data Structures

Laboratory Data

Raw laboratory data are entered manually into the “CR2CMonitoringData” Google spreadsheet. The cr2c_labdata scripts read these data from the Google spreadsheet using the Google Sheets API, verify that dates, stages, and values have been entered correctly, and compute the resulting values. For example, raw COD data consist of a dilution factor and a reading from the Hach spectrometer, from which the code computes the resulting COD value (in mg/L).
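To illustrate the kind of computation performed, a COD result is just the spectrometer reading scaled by the sample's dilution factor. A minimal sketch (the function and variable names here are hypothetical, not the repository's):

def compute_cod(reading_mg_l, dilution_factor):
    # Scale the Hach spectrometer reading by the dilution factor to
    # recover the COD concentration (mg/L) of the undiluted sample
    return reading_mg_l * dilution_factor

compute_cod(150, 20)  # a 1:20 dilution reading 150 mg/L gives 3000 mg/L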

Cleaned lab data are inserted or updated into their respective tables in the cr2c_labdata.db data store in Codiga’s Box folder. The Laboratory Data Schematic figure below shows the structure of the laboratory data. The data store contains a table for each parameter type (e.g. the “COD_data” table contains processed COD lab data). Each table has columns indicating:

  • The date and time of sample collection (“Date_Time”)
  • The treatment stage of the collected sample (“Stage”, the location in the treatment plant from which the sample was collected)
  • The parameter value type (“Type”, where appropriate; e.g. COD can be "Total", "Soluble", or "Particulate", whereas PH has only one type)
  • The value of the laboratory parameter (“Value”)
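For illustration, these tables can be queried directly with Python's sqlite3 module. A minimal sketch (the path to the data store is hypothetical):

import sqlite3

# Pull all processed COD readings from the lab data store
conn = sqlite3.connect('path/to/cr2c_labdata.db')
cursor = conn.execute("SELECT Date_Time, Stage, Type, Value FROM COD_data")
for date_time, stage, cod_type, value in cursor:
    print(date_time, stage, cod_type, value)
conn.close()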

Laboratory Data Schematic:

Field Data

Data from monitoring forms that are filled out on site are synced to the “CR2CMonitoringProcedures” spreadsheet through the Google Forms app. As with the laboratory data, these data can be downloaded from the Google spreadsheet using the Google Sheets API. The cr2c_fielddata scripts clean the data from the Google spreadsheet where they are stored and output them to the cr2c_fielddata.db data store.

The Field Data Schematic figure below shows the structure of the field data as they are currently stored in the cr2c_fielddata.db store. Somewhat analogously to a spreadsheet's tabs, each Google form has a corresponding table in the data store that records the answers to that form (so if the name of the Google form is "DailyLogSheet", the corresponding table in the cr2c_fielddata.db store is also named "DailyLogSheet"). Each variable name in the table for a given Google form is the name of a question in that form and each row is an answer to that question. For example, if the answer to the question "Operator Initials" in the "DailyLogSheet" form is "PB", this will be recorded as a value of "PB" for the variable named "Operator_Initials" in a table named "DailyLogSheet" in the cr2c_fielddata.db store. The time stamp of each form submission is stored automatically by Google Forms. Note: the script that processes form submissions cleans the question text string, so the name of the variable in the form table is not identical to the text of the question in the form (see clean_varname for details).

Field Data Schematic:

Operational Data

Operational data logged by the facility's automated sensors are wired through its programmable logic controller (PLC) and transmitted to a remote data collector (i.e. server) via open platform communication (OPC). These data are then pre-processed by WonderWare's eDNA data historian, which compresses and cleans raw OPC data. Currently, these operational data are obtained by manually remoting into the data collector, running an eDNA query, and transferring the output to a local machine, where further cleaning and compression are performed by the cr2c_opdata scripts. These scripts clean the operational data by removing outlying values, calculating average parameter values for a specified time period and resolution (minute-level, hourly, daily, etc.), and outputting the result to the cr2c_opdata.db data store.

The Operational Data Schematic below shows the structure of the operational data as they are currently stored in the cr2c_opdata.db store. Each table in the data store corresponds to a sensor id that identifies an automated sensor at the plant, together with a sensor type indicating what the sensor logs: water flows ("WATER"), biogas flows ("GAS"), or other parameters such as pH ("PH"), temperature ("TEMP"), and trans-membrane pressure ("TMP"). Along with the sensor id and the sensor type, each table in the data store is specific to a certain level of temporal aggregation: 1 minute, 5 minutes, 1 hour, 6 hours, 1 day, etc. To illustrate how sensor data map to a table name, 1-hour average water flows read by sensor FT200 would be stored in cr2c_opdata.db in a table called "WATER_FT200_1_HOUR_AVERAGES".

Each table in the cr2c_opdata.db store contains four variables:

  • The time stamp corresponding to the start of the time period in question ("Time")
  • The year of the time period in question ("Year")
  • The month of the time period in question ("Month")
  • The value read by the sensor ("Value"). Note: when querying these data, the "Value" variable name can be changed to the corresponding sensor id to permit easy merging of multiple sensor readings into a single wide table (see Operational Data: get_data documentation below for more details)
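The naming convention makes it easy to construct a table name from a sensor's type, id, and aggregation period. A minimal sketch:

def opdata_table_name(stype, sid, tperiod, ttype):
    # e.g. opdata_table_name('WATER', 'FT200', 1, 'HOUR') -> 'WATER_FT200_1_HOUR_AVERAGES'
    return '{}_{}_{}_{}_AVERAGES'.format(stype, sid, tperiod, ttype)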

Operational Data Schematic:

Documentation

update_data

Description: Function to manually run all processes that update the cr2c-monitoring dashboard app

Arguments:

  • pydir: String, directory with client secret file and google spreadsheet ids
  • lab_update: Logical, runs labrun.process_data() method to update with new lab data
  • fld_update: Logical, runs process_data() method to update with new field measurements data
  • op_update: Logical, runs opdata_agg.run_agg() method on desired sids to calculate hourly or minute averages for new sensor data
    • hmi_path: String, path to the csv file with raw sensor data
    • hour_sids: List of sensor ids whose hourly averages we want to update
    • minute_sids: List of sensor ids whose minute averages we want to update
    • op_start_dt_str: String, format 'mm-dd-yy' giving first day for which to calculate hourly/minute averages
    • op_end_dt_str: String, format 'mm-dd-yy' giving last day for which to calculate hourly/minute averages
  • val_update: Logical, runs processes that update data validation parameters
    • biotech_params: Logical, runs cr2c_validation.get_biotech_params() and cr2c_validation.get_cod_bal() to update biotech validation parameters
    • val_sids: List of sensors for which to run validation process (as part of cr2c_validation.instr_val() function)
    • val_end_dt_str: String, format 'mm-dd-yy' giving last day to check operational data for sensor validation procedure
    • nweeks_back: Integer, number of weeks looking back when running validation processes (relative to "val_end_dt_str")

Example Caller:

update_data(
    pydir = 'path/to/GoogleProjectsAdmin',
    lab_update = True,
    fld_update = True,
    op_update = True,
    hmi_path = 'path/to/hmi_data/hmi_data.csv',
    hour_sids =
        ['AT201','AT303','AT306','AT309'] + ['AT203','AT305','AT308','AT311'] +
        ['FT200','FT201','FT202','FT300','FT301','FT302','FT303','FT304','FT305','FIT600'] +
        ['FT700','FT702','FT704'] + ['AT202','AT304','AT310'] + ['AIT302'] +
        ['DPIT300','DPIT301','DPIT302'] + ['PIT205','PIT700','PIT702'] +
        ['LT100','LT200','LT201','LIT300','LIT301'],
    minute_sids = ['AT203','AT305'] + ['FT305'] + ['AIT302'] + ['DPIT300','DPIT301'] + ['PIT700'],
    op_start_dt_str = '8-22-19',
    op_end_dt_str = '8-31-19',
    val_update = False,
    biotech_params = False,
    val_sids = ['AT203','AT305','AT308','AT311','DPIT300','DPIT301','DPIT302','PIT700','PIT702','PIT704'],
    val_end_dt_str = '8-31-19',
    nweeks_back = 4
)

cr2c-utils

Description: For now, just a set of general-purpose methods that are useful to any of the cr2c-monitoring scripts

get_gsheet_data(sheet_names)

Description: Retrieves data from specified tabs in a gsheets file

Arguments:

  • sheet_names: List of gsheet names from which to obtain data

Output:

  • df: Pandas dataframe with raw lab data from specified gsheet name

get_credentials(pydir)

Description: Gets valid user credentials from storage. If nothing has been stored, or if the stored credentials are invalid, the OAuth2 flow is completed to obtain the new credentials.

Arguments:

  • pydir: String, directory with client secret file and google spreadsheet ids

Output:

  • credentials: user credentials for accessing google spreadsheet file
  • spreadsheetID: id of the google spreadsheet file

get_data(datasetid, table_names, varnames = None, local = False, local_dir = None, start_dt_str = None, end_dt_str = None, output_csv = False, outdir = None)

Description: Retrieves data from the specified tables of a given dataset

Arguments:

  • datasetid: A string giving the id of the dataset from which data are being queried; can be "labdata", "fielddata", "opdata", or "validation"
  • table_names: List of table names within the dataset from which we want to query data
  • varnames: (Optional) List of variables within each table for which we want data. Default is None
  • local: (Optional) Boolean indicating whether a local database is being queried rather than the Google BigQuery database. Default is False
  • local_dir: (Optional, required if local = True) String giving the directory of the local database. Default is None
  • start_dt_str: (Optional) date string to filter the result by date; sets the minimum date of the resulting data. Format MUST BE 'mm-dd-yy', so 1-1-18 for January 1st, 2018
  • end_dt_str: (Optional) Same as start_dt_str but sets the maximum date of the resulting data
  • output_csv: (Optional) Logical, if True, will output a csv file for each of the table_names specified above
  • outdir: (Optional, required if output_csv is True) String giving the directory to output the csv file(s) to

Output:

  • df: A dictionary of Pandas dataframes. The entries are (key, value): (table_name, dataframe)
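For example, a call to query a year of hourly flow data and unpack the resulting dictionary might look like this (a sketch; whether this exact table exists depends on what has been aggregated):

data = get_data(
    datasetid = 'opdata',
    table_names = ['WATER_FT200_1_HOUR_AVERAGES'],
    start_dt_str = '1-1-18',
    end_dt_str = '12-31-18'
)
ft200_hourly = data['WATER_FT200_1_HOUR_AVERAGES']  # a pandas dataframe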

write_to_db(df, projectid, dataset_id, table_name, create_mode = False, local = False, local_dir = None)

Description: Writes a pandas dataframe to a database. Ensures that only new records are being added

Arguments:

  • df: Pandas dataframe to be written to database
  • projectid: String, project id (currently only "cr2c-monitoring")
  • dataset_id: String, dataset being written to
  • table_name: String, table being written to
  • create_mode: Boolean indicating whether a new table is being created. This is useful when a table schema needs to be modified and all table records rewritten with the new schema
  • local: (Optional) Boolean indicating whether a local database is being written to rather than the Google BigQuery database. Default is False
  • local_dir: (Optional, required if local = True) String giving the directory of the local database. Default is None

cr2c-labdata

get_data(ltypes, start_dt_str = None, end_dt_str = None, output_csv = False, outdir = None)

Description: Wrapper for querying data from the cr2c_labdata.db data store

Arguments:

  • ltypes: A list of the types of lab data the user wishes to query; these can be any of the following:
    • 'PH' - Refers to pH measurements
    • 'COD' - Refers to Chemical Oxygen Demand measurements
    • 'TSS_VSS' - Refers to Total Suspended and Volatile Suspended Solids measurements
    • 'ALKALINITY' - Refers to alkalinity measurements
    • 'VFA' - Refers to Volatile Fatty Acid measurements
    • 'GASCOMP' - Refers to gas composition measurements (% of gas that is CO2, Methane, O2, etc)
    • 'AMMONIA' - Refers to ammonia measurements
    • 'SULFATE' - Refers to sulfate measurements
    • 'TKN' - Refers to Total Kjeldahl Nitrogen measurements
    • 'BOD' - Refers to Biochemical Oxygen Demand measurements
  • start_dt_str: (Optional) date string to filter the result by date, sets the minimum date of the resulting data. Format MUST BE 'mm-dd-yy' so 1-1-18 for January 1st, 2018
  • end_dt_str: (Optional) Same as start_dt_str but sets the maximum date of the resulting data
  • output_csv: (Optional) Logical, if True, will output a csv file for each of the ltypes specified above
  • outdir: (Optional, required if output_csv is True) String giving the directory to output the csv file(s) to

Output:

  • ldata_all: A dictionary with the resulting data. The keys are each of the ltypes specified above and the values are pandas dataframes with data for the given ltype

Example Caller:

get_data( ltypes = ['PH','COD'], start_dt_str = '1-1-17', end_dt_str = '12-31-17', output_csv = True, outdir = 'path/to/output/dir' )

labrun

Description: A class for managing the cleaning and processing of laboratory data

Inputs:

  • verbose: (Optional) Logical (default false) indicating whether to print more error messages to log while processing lab data

labrun.get_stage_descs()

Description: Gets a more descriptive value for each of the treatment stages entered by operators into the lab data gsheet (as the variable "Stage").

labrun.manage_dups(ltype, id_vars)

Description: Manages duplicate observations by removing duplicates (with warnings) and gets observation ids for readings with duplicate Date-Stage values (if multiple readings are intentionally taken)

Arguments:

  • ltype: One of the ltypes specified in get_data above whose duplicates are being managed
  • id_vars: A list of the variables that should identify a unique reading, these are usually "Date_Time", "Stage", and "Type" (if appropriate)

labrun.set_var_format(ltype, variable, var_format)

Description: Tries to format a raw input variable from gsheets data as a datetime or a floating number; outputs error message if the input data cannot be re-formatted

Arguments:

  • ltype: One of the ltypes specified in get_data above whose data are being re-formatted
  • variable: The name of the variable being re-formatted, as it appears in the gsheet
  • var_format: The desired format of the variable; in practice this is either "None" (if the variable is "Date") or "float", but additional flexibility is kept in case it is needed in the future

labrun.clean_dataset(ltype, id_vars)

Description: Cleans the raw gsheets lab data by reformatting variables and removing duplicates (or tagging intentional duplicates)

Arguments:

  • ltype: One of the ltypes specified in get_data above whose data are being cleaned
  • id_vars: As in labrun.manage_dups, a list of the variables that should identify a unique reading

labrun.wide_to_long(ltype, id_vars, value_vars)

Description: Performs a pandas melt procedure on the lab data with the right column ordering

Arguments:

  • ltype: One of the ltypes specified in get_data above whose data are being cleaned
  • id_vars: As in labrun.manage_dups, a list of the variables that should identify a unique reading
  • value_vars: A list of the variables that contain the values of lab measurements taken

labrun.long_to_wide(df, id_vars)

Description: Performs sequential pandas unstack procedures to convert a long dataframe to a wide dataframe

Arguments:

  • df: The input dataframe that is to be unstacked
  • id_vars: As in labrun.manage_dups, a list of the variables that should identify a unique reading

Output:

  • A wide pandas dataframe

labrun.count_multichars(string)

Description: Counts the characters in a string that appear more than once and outputs a string of these characters

Arguments:

  • string: The input string whose characters are to be counted

Output:

  • A string of the characters that appear more than once in the input string
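A minimal sketch consistent with this description (not necessarily the repository's implementation):

from collections import Counter

def count_multichars(string):
    # Return the characters that appear more than once, in order of first appearance
    counts = Counter(string)
    repeated = []
    for char in string:
        if counts[char] > 1 and char not in repeated:
            repeated.append(char)
    return ''.join(repeated)

count_multichars('aabbc')  # 'ab'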

labrun.clean_wide_table(dfwide, value_vars, start_dt, end_dt, add_time_el)

Description: Cleans the wide table of lab data results

Arguments:

  • dfwide: A wide pandas dataframe of the cleaned lab data
  • value_vars: A list of variables that refer to lab measurements
  • start_dt: A datetime variable for the minimum desired date
  • end_dt: A datetime variable for the maximum desired date
  • add_time_el: Logical, whether or not to add the days since the reactors were seeded to the wide tables

Output:

  • A clean, wide pandas dataframe

labrun.summarize_tables(end_dt_str, ndays, add_time_el = True, outdir = None, opfile_suff = None)

Description: Combines and filters a set of wide tables and outputs the resulting table as a csv file

Arguments:

  • end_dt_str: A string for the last date for which data should be included in wide tables, format should be 'mm-dd-yy' as in '1-1-18' for Jan 1, 2018
  • ndays: Integer, the number of days to look back relative to end_dt_str
  • add_time_el: Logical, whether to include a variable in the table indicating the time elapsed since seeding the reactor
  • outdir: A string indicating the output directory of the csv file
  • opfile_suff: An optional string that will be appended to the csv output file

Output:

  • A csv file with clean and combined wide tables

labrun.process_data()

Description: The main caller that executes all methods to read data from Google Sheets and clean and reformat them. It performs necessary computations on laboratory results, converts all data to a long format, and outputs the result to the cr2c_labdata.db data store.

cr2c-opdata

get_data(stypes, sids, tperiods, ttypes, combine_all = True, year_sub = None, month_sub = None, start_dt_str = None, end_dt_str = None, output_csv = False, outdir = None)

Description: Wrapper for querying aggregated data from the cr2c_opdata.db data store. Note that for the data to be available, opdata_agg.run_agg has to have already been run for the data point in question

Arguments:

  • stypes: A list of sensor types corresponding to each sensor id given in sids below (in the same order!). These types can be any of:
    • "WATER": Sensors that measure flows of water through the treatment plant
    • "GAS": Sensors that measure flows of gas from the plant's reactors
    • "TEMP": Sensors that measure temperature of water at various points in the treatment process
    • "TMP": Sensors that measure the trans-membrane pressure in the plant's membrane bioreactors
    • "PRESSURE": Sensors the measure pressure in the reactors
    • "PH": Sensors that measure pH at various points in the treatment process
    • "DPI": Sensors that measure differential pressure in the reactors' recirculation loops
    • "LEVEL": Sensors that measure water levels at various points in the treatment process
  • sids: A list of sensor ids of length equal to stypes whose aggregated data we want
  • tperiods: A list of integers of length equal to stypes giving the lengths of the time periods for which we are obtaining aggregated data.
  • ttypes: A list of time period type strings of length equal to stypes giving the time period "type" corresponding to the time period length for which we are obtaining aggregated data. These strings can be "MINUTE" or "HOUR"
  • combine_all: (Optional) Logical indicating whether the list of tables being queried should be output as a single wide dataframe or not.
  • year_sub: (Optional) Integer indicating which year we want aggregated data for
  • month_sub: (Optional) Integer indicating which month we want data for
  • start_dt_str: (Optional) String of format 'mm-dd-yy' giving the earliest date for which we want data
  • end_dt_str: (Optional) String of format 'mm-dd-yy' giving the latest date for which we want data
  • output_csv: (Optional) Logical, if true will output aggregated data to csv file(s) (depending on whether combine_all is True or False)
  • outdir: (Optional, required if output_csv is True) String giving the directory to output csv file(s) to

Output:

  • A single dataframe or a dictionary of dataframes with keys equal to the table names in the SQL file. If a single dataframe is output, all dataframes will be merged on time and the value of each sensor id will be a variable named "sid". If a dictionary of tables, each table name will be "stype"_"sid"_"tperiod"_"ttype"_AVERAGES; for example, WATER_FT200_6_HOUR_AVERAGES for sid "FT200", which measures flows of "WATER", averaged over 6-hour periods

Example Caller: The following example gets 1-hour averages for sensors FT200, FT305 and FT700 for November 2017 (the date range can be specified through the "year_sub" and "month_sub" arguments or with the "start_dt_str" and "end_dt_str" arguments):

get_data(
    stypes = ['WATER','WATER','GAS'],
    sids = ['FT200','FT305','FT700'],
    tperiods = [1, 1, 1],
    ttypes = ['HOUR','HOUR','HOUR'],
    year_sub = 2017,
    month_sub = 11,
    start_dt_str = '11-1-17',
    end_dt_str = '11-30-17',
    output_csv = True,
    outdir = 'path/to/my/dir'
)

get_table_names()

Description: Gets a list of the table names in the cr2c_opdata.db data store

Output:

  • A list of table names

cat_dfs(ip_paths, idx_var = None, output_csv = False, outdir = None, output_dsn = None)

Description: Concatenates a list of csv files in a directory into a single dataframe

Arguments:

  • ip_paths: A list of strings giving the paths to each csv file
  • idx_var: (Optional), String referring to the index variable common to all of the data frames. If given, the stacked data will be sorted by this variable
  • output_csv: (Optional), Logical, if true will output a csv of the stacked dataframes
  • outdir: (Optional, required if output_csv is True), String giving the directory to output stacked dataframes
  • output_dsn: (Optional, required if output_csv is True), String giving the name of the output stacked dataframes csv file

Output:

  • A Pandas stacked dataframe
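For example, to stack all csv files in a directory and sort the result by a shared time stamp (paths hypothetical):

import glob

stacked = cat_dfs(
    ip_paths = glob.glob('path/to/csv/dir/*.csv'),
    idx_var = 'Time',
    output_csv = True,
    outdir = 'path/to/output/dir',
    output_dsn = 'stacked_data.csv'
)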

opdata_agg

Description: A class for managing the cleaning and aggregation of operational (sensor) data Inputs:

  • start_dt_str: String, format 'mm-dd-yy', gives the first date for which aggregated data are desired
  • end_dt_str: String, format 'mm-dd-yy', gives the last date for which aggregated data are desired
  • ip_path: String, path to the csv file with raw sensor data

opdata_agg.prep_opdata(stype, sid)

Description: Reads in a csv of raw sensor data, cleans and re-formats variables, and removes missing values.

Arguments:

  • stype: The type of sensor whose operational data are being prepped (types can be one of the stypes described in get_data above)
  • sid: The id of the sensor whose operational data are being prepped (data for each sensor are processed separately)

Output:

  • A clean and prepped pandas dataframe of readings for the given sensor (the input expected by opdata_agg.get_average)

opdata_agg.get_average(opdata, tperiod, ttype)

Description: Uses linear interpolation to convert a dataframe of sensor readings with arbitrary time stamps into periodic averages

Arguments:

  • opdata: A clean and prepped pandas dataframe with readings for a given sensor
  • tperiod: An integer giving the length of the time period for which averages are being obtained
  • ttype: A string giving the time period unit, can be either "HOUR" or "MINUTE"

Output:

  • A pandas dataframe with periodic interpolated average readings for a sensor
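The technique can be sketched with pandas as follows (a simplified illustration of interpolating irregular readings onto a periodic grid, not the repository's exact implementation):

import pandas as pd

def periodic_averages(opdata, tperiod, ttype):
    # opdata: dataframe with a datetime 'Time' column and a numeric 'Value' column
    freq = '{}{}'.format(tperiod, 'H' if ttype == 'HOUR' else 'T')
    series = opdata.set_index('Time')['Value']
    # Interpolate the irregular readings onto a minute-level grid, then
    # average the interpolated values within each reporting period
    grid = series.resample('T').mean().interpolate(method = 'linear')
    return grid.resample(freq).mean().reset_index()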

opdata_agg.run_agg(stypes, sids, tperiods, ttypes, output_csv = False, output_sql = True, outdir = None)

Description: Runs a report to obtain aggregated data for a series of stypes, sids, tperiods and ttypes. Outputs the result to the cr2c_opdata.db data store and, if requested, to a series of csv files

Arguments:

  • stypes: List of strings giving type of sensor whose operational data are being prepped (types can be one of stypes described in get_data above)
  • sids: List of sensor ids of length equal to stypes whose aggregated data we want
  • tperiods: List of integers of length equal to stypes giving the lengths of the time periods for which we are obtaining aggregated data.
  • ttypes: List of time period type strings of length equal to stypes giving the time period "type" corresponding to the time period length
  • output_csv: (Optional) Logical, if true will output aggregated data to csv file(s)
  • output_sql: (Optional) Logical, if true will output aggregated data to the cr2c_opdata.db data store
  • outdir: (Optional, required if output_csv is True) String giving the directory to output csv file(s) to

Example Caller:

opdata_agg.run_agg( stypes = ['WATER','WATER','GAS'], sids = ['FT200','FT305','FT700'], tperiods = [1, 1, 1], ttypes = ['HOUR','HOUR','HOUR'], output_csv = True, outdir = 'path/to/my/dir' )

cr2c-fielddata

get_data(varNames = None, start_dt_str = None, end_dt_str = None, output_csv = False, outdir = None)

Description: Wrapper for querying data from the cr2c_fielddata.db data store.

Arguments:

  • varNames: (Optional) A list of strings giving the names of the variables for which we want data. If not specified, all of the response data will be returned. All variable names recorded in the data store are capitalized so the input variable names argument is not case sensitive. Note: For any variable in the form, the variable name in the data store will have all special characters removed, and all spaces replaced with a '_'.
  • start_dt_str: (Optional) String of format 'mm-dd-yy' giving the first date for which we want data
  • end_dt_str: (Optional) String of format 'mm-dd-yy' giving the last date for which we want data
  • output_csv: (Optional) Logical, if true will output the queried data to a csv file
  • outdir: (Optional, required if output_csv is True) String giving the directory to output the csv file to

Output:

  • A pandas dataframe with the form data

Example Caller:

get_data( varNames = ['Operator_Initials','Barometer_Pressure_mmHg'], start_dt_str = '01-01-17', end_dt_str = '12-31-17', output_csv = True, outdir = 'path/to/my/dir' )

clean_varname(varname)

Description: Cleans a variable name in the log sheet form by eliminating all special characters, replacing spaces with '_', and converting all characters to upper-case.

Arguments:

  • varname: A variable name string

Output:

  • A clean variable name string
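A minimal sketch consistent with this description (the repository's implementation may differ in exactly which characters it strips):

import re

def clean_varname(varname):
    # Replace spaces with underscores, drop remaining special characters, upper-case
    varname = varname.replace(' ', '_')
    varname = re.sub(r'[^A-Za-z0-9_]', '', varname)
    return varname.upper()

clean_varname('Barometer Pressure (mmHg)')  # 'BAROMETER_PRESSURE_MMHG'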

process_data(tableName = 'DailyLogResponses')

Description: Processes responses from the daily log form by reading data in from the google sheet and cleaning variable values. Outputs the result to the cr2c_fielddata.db data store.

Arguments:

  • tableName: (Optional) A string giving the name of the table whose responses are to be processed. This corresponds to the sheetname of the google sheets file where the google form responses are stored and the name of the google form itself

cr2c-validation

cr2c_validation

Description: A class for managing the validation exercises performed on the integrated lab, field, and operational sensor data

Inputs:

  • outdir: A string giving the directory to output the results of validation exercises to
  • ip_path: (Optional, required if any of run_agg_feeding, run_agg_gasprod, run_agg_temp, or run_agg_press are True) A string giving the path to the dataset that contains raw sensor data that will be used to execute an opdata_agg.run_agg method, if desired.
  • run_agg_feeding: (Optional) Logical, if True will execute an opdata_agg.run_agg method on the sensors measuring reactor feeding
  • run_agg_gasprod: (Optional) Logical, if True will execute an opdata_agg.run_agg method on the sensors measuring reactor gas production
  • run_agg_temp: (Optional) Logical, if True will execute an opdata_agg.run_agg method on the sensors measuring temperature
  • run_agg_press: (Optional) Logical, if True will execute an opdata_agg.run_agg method on the sensors measuring reactor pressure

cr2c_validation.adj_Hcp(Hcp_gas, deriv_gas, temp)

Description: Computes an adjusted Henry's constant (in concentration/pressure units) for a given gas and temperature

Arguments:

  • Hcp_gas: The Henry's constant of the gas at STP (in concentration/pressure units)
  • deriv_gas: The Clausius-Clapeyron constant for the gas
  • temp: The gas temperature

Output:

  • Floating number giving the adjusted Henry's constant of the gas (in concentration/pressure units)
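The adjustment is typically the van 't Hoff temperature correction for Henry's constants; a sketch under that assumption (temperature in Kelvin; the repository's exact convention may differ):

import math

def adj_Hcp(Hcp_gas, deriv_gas, temp):
    # Van 't Hoff form: Hcp(T) = Hcp(298.15 K) * exp(C * (1/T - 1/298.15)),
    # where C = deriv_gas is the temperature-dependence constant in Kelvin
    return Hcp_gas * math.exp(deriv_gas * (1.0 / temp - 1.0 / 298.15))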

cr2c_validation.est_diss_ch4(temp, percCH4)

Description: Estimates the dissolved methane concentration for a given temperature and gas composition in a reactor (pressure of 1 atm)

Arguments:

  • temp: The temperature at which to estimate the dissolved methane concentration
  • percCH4: The percentage of gas in a reactor that is composed of methane

Output:

  • A floating number giving the assumed liquid concentration of methane at a pressure of 1 atm
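Under Henry's law at a total reactor pressure of 1 atm, the dissolved concentration is the temperature-adjusted Henry's constant times the methane partial pressure. A sketch using adj_Hcp above (the default constants are rough literature values for methane and purely illustrative):

def est_diss_ch4(temp, percCH4, Hcp_ch4 = 1.4e-3, deriv_ch4 = 1700):
    # Partial pressure of methane (atm) at a total pressure of 1 atm
    p_ch4 = percCH4 / 100.0
    # Dissolved methane concentration (mol/L) via the adjusted Henry's constant
    return adj_Hcp(Hcp_ch4, deriv_ch4, temp) * p_ch4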

cr2c_validation.get_cod_bal(end_dt_str, nweeks, plot = True, table = True)

Description: Computes a Chemical Oxygen Demand (COD) balance for the treatment plant on a weekly basis using flowrate data, COD concentrations, and biogas production. Outputs plots and csv files

Arguments:

  • end_dt_str: A string of format 'mm-dd-yy' giving the date from which to start counting nweeks weeks back
  • nweeks: The number of weeks to compute the COD balance for
  • plot: (Optional) Logical, if True will output a bar chart of the COD balance
  • table: (Optional) Logical, if True will output csv file with the COD balance data

Example Caller:

cr2c_validation.get_cod_bal( '1-12-17', 12, plot = True, table = True )

cr2c_validation.get_biotech_params(end_dt_str, nweeks, plot = True, table = True)

Description: Computes key solids wasting and growth parameters to monitor the biology in the reactors on a weekly basis

Arguments:

  • end_dt_str: A string of format 'mm-dd-yy' giving the date from which to start counting nweeks weeks back
  • nweeks: The number of weeks to compute solids wasting and growth parameters for
  • plot: (Optional) Logical, if True will output a plot of solids wasting and growth parameters
  • table: (Optional) Logical, if True will output csv file with the solids wasting and growth parameters

Example Caller:

cr2c_val = cr2c_validation(outdir = 'path/to/out/dir')
cr2c_val.get_biotech_params( '1-12-17', 12, plot = True, table = True )

cr2c_validation.instr_val(valtypes, start_dt_str, end_dt_str, op_sids, fld_varnames = None, ltypes = None, lstages = None, run_op_report = False, ip_path = None)

Description: General purpose function for validating measurements from the facility's sensors with laboratory or field measurement data logged and stored in the cr2c_labdata.db and cr2c_fielddata.db data stores.

Arguments:

  • valtypes: A list of strings corresponding to the operational sensor data types outlined in Operational Data: get_data
  • op_sids: List of strings the same length as valtypes giving the ids of the sensors that are being validated
  • start_dt_str: String of format 'mm-dd-yy' giving the first date for which we want to validate sensors
  • end_dt_str: String of format 'mm-dd-yy' giving the last date for which we want to validate sensors
  • fld_varnames: (Optional, refers to field measurements that correspond to the op_sids being validated) List of tuples of strings of length equal to valtypes indicating the appropriate field measurement(s). For differential pressure sensors, this is typically a list of tuples of two strings: one for the upstream manometer measurement and one for the downstream manometer measurement (both are needed to validate a sensor's differential pressure reading)
  • ltypes: (Optional, refers to lab measurements that correspond to the op_sids being validated) List of strings of length equal to valtypes giving the laboratory data types being used for validation
  • lstages: (Optional, refers to lab measurements that correspond to the op_sids being validated) List of strings of length equal to valtypes giving the treatment stages from which samples were collected and laboratory measurements have been saved. These are long/descriptive stage names. Current long/descriptive stage names that can be selected for validation are:
    • 'Microscreen'
    • 'Duty AFMBR MLSS'
    • 'Research AFMBR MLSS'
  • run_op_report: (Optional) Logical, if True will execute an opdata_agg.run_agg method on the sensor(s) being validated
  • ip_path: (Optional, required if run_op_report is True) A string giving the path to the dataset that contains raw sensor data that will be used to execute an opdata_agg.run_agg method, if desired.

Example Caller:

cr2c_val = cr2c_validation(outdir = 'path/to/out/dir')
cr2c_val.instr_val( valtypes = ['PH','PH'], start_dt_str = '1-1-17', end_dt_str = '12-31-17', op_sids = ['AT203','AT305'], ltypes = ['PH','PH'], lstages = ['Microscreen','AFBR'] )

main

Description: This is the interactive data visualization and download app that queries operational data generated at the Bill and Cloy Codiga Resource Recovery Center (CR2C). Its dependencies are cr2c-utils, cr2c-labdata, cr2c-opdata, cr2c-fielddata, and cr2c-validation. The data visualized in the app include lab data, field data, sensor data, and validation data.

dclass_tab(dclass)

Description: Callback function that displays dtype sub-tab when dclass tab is selected

Arguments:

  • dclass: A string, first layer of the data layout, including Lab Data, Operation Data, ...

Output:

  • cr2c_objects[dclass]['tab']: Dash Core Components Tabs object with values equal to the dtypes in the given dclass

generate_dclass_dtype_tab(dclass, dtype)

Description: Generates a callback function that displays vtype sub-tab when dtype tab is selected

Arguments:

  • dclass: A string, first layer of the data layout, including Lab Data, Operation Data, ...
  • dtype: A string, second layer of the data layout, including COD, pH, WATER, GAS, ...

Output:

  • dclass_dtype_tab: Dash Core Components Tabs object with values equal to the vtypes in the given dclass-dtype

generate_dclass_dtype_vtype_selection(dclass, dtype, vtype)

Description: Generates a callback function that displays vtype selection object when dclass-dtype-vtype tab is selected

Arguments:

  • dclass: A string, first layer of the data layout, including Lab Data, Operation Data, ...
  • dtype: A string, second layer of the data layout, including COD, pH, WATER, GAS, ...
  • vtype: A string, third layer of the data layout, including "Stage" or/and "Type", Value, ...

Output:

  • dclass_dtype_vtype_selection: Dash Core Components Checklist object with options equal to the possible values of the given vtype of the given dclass-dtype

generate_update_selection_history(selectionID, selection)

Description: Generates a callback function that prints selectionID and selection as a key-value pair stored as a string

Arguments:

  • selectionID: The ID of the selection history of the user (corresponds to the given dclass-dtype-vtype combination)
  • selection: A python list with the unique vtype values in the given dclass-dtype

Output:

  • update_selection_history: A string representation of the selectionID, selection key-value pair

generate_load_selection_value(selectionID, jhistory)

Description: Generates a callback function that loads a string representation of a key-value pair to a dictionary

Arguments:

  • selectionID: The ID of the selection history of the user (corresponds to the given dclass-dtype-vtype combination)
  • jhistory: A string representation of a python list with the unique vtype values selected from the given dclass-dtype

Output:

  • load_selection_value: A dictionary entry of the selectionID, jhistory key-value pair

load_data_selection(sel1, sel2, sel3,...)

Description: Returns a string representation of a nested dictionary with the data requested by the user.

Arguments:

  • sel1, sel2, sel3, ...: 25 possible selections of data generated by Codiga Center

Output:

  • json.dumps(dataSelected): A string representation of the nested dictionary corresponding to the data requested by the user.

render_plot(dataSelected, time_resolution, time_order, start_date, end_date)

Description: Callback function that generates a plotly plot with the data points selected by the user.

Arguments:

  • dataSelected: A nested dictionary of selected data from the user
  • time_resolution: A string for the desired temporal resolution of the desired data from the user, including "Hourly", "Daily", "Weekly", and "Monthly"
  • time_order: A string for the desired temporal order of the desired data from the user, including "Chronological", "By Hour", "By Weekday", and "By Month"
  • start_date: String of format 'mm-dd-yy' representing the first date for which data are desired, or empty string if not-applicable (no first date)
  • end_date: String of format 'mm-dd-yy' representing the last date for which data are desired, or empty string if not-applicable (no last date)

Output:

  • A dictionary with 'data' and 'layout' entries for the 'figure' attribute of a Dash Core Components Graph Object

get_nseries(dataSelected)

Description: Counts the number of dtype variables in the selected data from the user

Arguments:

  • dataSelected: A nested dictionary of selected data from the user

Output:

  • nseries: The number of dtype variables in the selected data from the user

retrieve_value(dictionary, key)

Description: Given a key, retrieve the value if the entry is in the dictionary (avoids error if the entry is not in the dictionary).

Arguments:

  • dictionary: A dictionary including different keys
  • key: A string object of a targeted key

Output:

  • dictionary[key]: The value of the targeted key in the dictionary

pad_na(df, time_var)

Description: Given a Pandas dataframe generated from the raw data, adds missing values for every day with no observations. The purpose of this function is to ensure that plotly displays days without data as "gaps" between the lines in a series (otherwise a straight line would connect two non-adjacent points)

Arguments:

  • df: Pandas dataframe to be padded
  • time_var: A string, the name of the time variable of the selected data from the user

Output:

  • padded_df: Pandas dataframe updated with empty entries for all days without data between the first and last days with non-missing data.
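A pandas sketch of the padding step (not necessarily the repository's implementation):

import pandas as pd

def pad_na(df, time_var):
    # Insert NaN rows for days with no observations so plotly draws gaps
    df = df.copy()
    df[time_var] = pd.to_datetime(df[time_var])
    full_days = pd.date_range(df[time_var].min().normalize(),
                              df[time_var].max().normalize(), freq = 'D')
    observed_days = pd.DatetimeIndex(df[time_var].dt.normalize().unique())
    missing = full_days.difference(observed_days)
    # Append empty rows for the missing days and restore chronological order
    pad = pd.DataFrame({time_var: missing})
    padded_df = pd.concat([df, pad], ignore_index = True).sort_values(time_var)
    return padded_df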

get_series(dclass, dtype, time_resolution, time_order, start_date, end_date, stages, types, sids, plotFormat)

Description: Given the input of dclass, dtype, desired time resolution, desired time order, start and end dates of desired time period, stages, types, sensor ids and the plot format, return a list of corresponding plotly go.Scatter objects.

Arguments:

  • dclass: A string, first layer of the data layout, including Lab Data, Operation Data, ...
  • dtype: A string, second layer of the data layout, including COD, pH, WATER, GAS, ...
  • time_resolution: A string for the desired temporal resolution of the desired data from the user, including "Hourly", "Daily", "Weekly", and "Monthly"
  • time_order: A string for the desired temporal order of the desired data from the user, including "Chronological", "By Hour", "By Weekday", and "By Month"
  • start_date: String of format 'mm-dd-yy' representing the first date for which data are desired, or empty string if not-applicable (no first date)
  • end_date: String of format 'mm-dd-yy' representing the last date for which data are desired, or empty string if not-applicable (no last date)
  • stages: A string of stages for variables
  • types: A string of types for variables
  • sids: A list of sensor ids of length equal to stypes whose aggregated data we want
  • plotFormat: A dictionary of plot formats

Output:

  • series: A list of plotly go.Scatter objects for the selected data, formatted according to plotFormat

filter_resolve_time(dfsub, dtype, time_resolution, time_order, start_date, end_date)

Description: Given a Pandas dataframe, dtype variables, time resolution, time order, and the start and end dates of the desired time period, returns a list of data dictionaries padded with missing data and resolved by the desired time resolution and time order

Arguments:

  • dfsub: Pandas dataframe of user's desired data
  • dtype: A string, second layer of the data layout, including COD, pH, ...
  • time_resolution: A string for the desired temporal resolution of the desired data from the user, including "Hourly", "Daily", "Weekly", and "Monthly"
  • time_order: A string for the desired temporal order of the desired data from the user, including "Chronological", "By Hour", "By Weekday", and "By Month"
  • start_date: String of format 'mm-dd-yy' representing the first date for which data are desired, or empty string if not-applicable (no first date)
  • end_date: String of format 'mm-dd-yy' representing the last date for which data are desired, or empty string if not-applicable (no last date)

Output:

  • dflist: a list of dictionaries with entries of 'data' as pandas data frames and 'timeSuffix' as strings representing the time period to be displayed in the plot legend

get_layout(dataSelected, axes_dict, time_resolution, time_order)

Description: Given the nested dictionary of desired data, dictionary of axes, desired time resolution and time order, return a graph layout object

Arguments:

  • dataSelected: A nested dictionary of selected data from the user
  • axes_dict: A dictionary of dtype keys and values corresponding to the order of the vertical axis being plotted
  • time_resolution: A string for the desired temporal resolution of the desired data from the user, including "Hourly", "Daily", "Weekly", and "Monthly"
  • time_order: A string for the desired temporal order of the desired data from the user, including "Chronological", "By Hour", "By Weekday", and "By Month"

Output:

  • go.Layout(layoutItems): A graph objects layout object
