# QuinCe Gas Calibration Check
This notebook takes in a file exported from QuinCe and calculates various statistics on the gas calibration calculations that have been performed.

The notebook requires a file exported in **ICOS OTC** format and compares the column `CO2 Mole Fraction [umol mol-1]` with `xCO2 In Water - Calibrated In Dry Air [umol mol-1]`.

## Getting Started
The cell below sets up the package imports and performs the other preparation tasks the notebook needs. Run it before any other cell.

In [None]:
# Package imports
import os
import pandas as pd
import numpy as np
import statistics
from ipywidgets import interact
import ipywidgets as widgets

# Bokeh Plots
from bokeh.io import output_notebook, push_notebook, show
from bokeh.plotting import figure
output_notebook()

# Hide warnings
import warnings
warnings.filterwarnings('ignore')

# Create the data file directory if it doesn't exist
FILE_DIR='data_files'

os.makedirs(FILE_DIR, exist_ok=True)


## Choose a file to check
Any file you want to check must be uploaded to the server before we can use it.

The cell below lets you choose a file from the `Validation/data_files` folder. If you don't see your file there, you can upload it using the main Jupyter hub and run the cell again.

In [None]:
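# List the ICOS OTC exports available in the data directory
available_files = sorted(f for f in os.listdir(FILE_DIR) if f.endswith('ICOS OTC.csv'))
chosen_file = None

def load_file(filename):
    # Record the selection so that later cells can read it
    global chosen_file
    chosen_file = filename

# Show a dropdown of the available files
dummy = interact(load_file, filename=available_files)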


## Load and prepare data

In [None]:
in_data = pd.read_csv(os.path.join(FILE_DIR, chosen_file))

# Work on a copy of the columns we need, with shorter names
data = in_data[['Date/Time', 'CO2 Mole Fraction [umol mol-1]', 'xCO2 In Water - Calibrated In Dry Air [umol mol-1]']].copy()
data.rename(columns={
    'Date/Time': 'Timestamp',
    'CO2 Mole Fraction [umol mol-1]': 'Measured',
    'xCO2 In Water - Calibrated In Dry Air [umol mol-1]': 'Calibrated'
}, inplace=True)
data['Timestamp'] = pd.to_datetime(data['Timestamp'])

# Keep only rows with a numeric calibrated value
data['Calibrated'] = pd.to_numeric(data['Calibrated'], errors='coerce')
data = data[data['Calibrated'].notnull()]
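
Because the notebook depends on three specific columns from the ICOS OTC export, it can help to confirm they are present before going further. The following is a minimal sketch of such a check; the helper name `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical and not part of QuinCe or this notebook.

```python
# Column names taken from the loading cell above
REQUIRED_COLUMNS = [
    'Date/Time',
    'CO2 Mole Fraction [umol mol-1]',
    'xCO2 In Water - Calibrated In Dry Air [umol mol-1]',
]

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# Hypothetical header with one required column missing
example_header = ['Date/Time', 'CO2 Mole Fraction [umol mol-1]']
print(missing_columns(example_header))
```

In the notebook this could be called as `missing_columns(in_data.columns)`; an empty list means the file has everything the later cells expect.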


## Time Series
The cell below builds a simple plot of the time series, showing the measured vs calibrated CO₂ for water measurements.

In [None]:
timeseries = figure(plot_width=900, plot_height=600, x_axis_type='datetime', x_axis_label='Time', y_axis_label='CO₂')
timeseries.circle(data['Timestamp'], data['Measured'], color='black', size=5, legend_label='Measured')
timeseries.circle(data['Timestamp'], data['Calibrated'], color='blue', size=5, legend_label='Calibrated')
show(timeseries)

## Measured vs Calibrated
This cell draws a scatter plot of the measured vs calibrated values. This should be very close to a linear relationship.

In [None]:
vs_plot = figure(plot_width=600, plot_height=600, x_axis_label='Measured', y_axis_label='Calibrated')
vs_plot.circle(data['Measured'], data['Calibrated'], size=5)
show(vs_plot)
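
One way to put a number on "very close to linear" is an ordinary least-squares fit with a Pearson correlation coefficient. The sketch below is an illustration rather than part of the original notebook: the function name `linear_fit` is hypothetical, and the sample values are made up for demonstration.

```python
import math
import statistics

def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson r.

    Assumes x and y have the same length and that neither is constant
    (a constant input would cause a division by zero).
    """
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Made-up values for illustration only, not real QuinCe output
measured = [380.0, 390.0, 400.0, 410.0]
calibrated = [381.0, 391.0, 401.0, 411.0]
slope, intercept, r = linear_fit(measured, calibrated)
print(f'slope={slope:.4f} intercept={intercept:.4f} r={r:.6f}')
```

With the notebook's data, the call would be `linear_fit(data['Measured'], data['Calibrated'])`. A slope near 1 and `r` near 1 would support the expectation of a near-linear relationship.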

## Differences
Below are statistics on the differences between the measured and calibrated CO₂ values.

In [None]:
data['Difference'] = data['Calibrated'] - data['Measured']

# Time series of differences
diff_timeseries = figure(plot_width=900, plot_height=600, x_axis_type='datetime', x_axis_label='Time', y_axis_label='Calibrated - Measured')
diff_timeseries.circle(data['Timestamp'], data['Difference'], color='black', size=5, legend_label='Calibrated - Measured')
show(diff_timeseries)

In [None]:
hist, edges = np.histogram(data['Difference'], density=True, bins=100)

print(f'Difference range {min(data["Difference"])} to {max(data["Difference"])}')
print(f'Mean difference {statistics.mean(data["Difference"])}')
print(f'Median difference {statistics.median(data["Difference"])}')

p = figure(plot_width=900, plot_height=600, x_axis_label='Difference', y_axis_label='Proportion')
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.5)
show(p)
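
The range, mean and median printed above can be complemented with measures of spread. The sketch below collects them in one place using only the standard library; the helper name `difference_summary` is hypothetical, and the standard deviation and RMS entries are additions not computed by the original cell.

```python
import math
import statistics

def difference_summary(differences):
    """Summarise a sequence of differences (needs at least two values)."""
    diffs = [float(d) for d in differences]
    return {
        'min': min(diffs),
        'max': max(diffs),
        'mean': statistics.mean(diffs),
        'median': statistics.median(diffs),
        # Sample standard deviation (n - 1 in the denominator)
        'stdev': statistics.stdev(diffs),
        # Root mean square of the differences
        'rms': math.sqrt(statistics.mean([d * d for d in diffs])),
    }

# Made-up values for illustration only
for name, value in difference_summary([-1.0, 0.0, 1.0]).items():
    print(f'{name}: {value}')
```

In the notebook this could be called as `difference_summary(data['Difference'])`.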