# MiBiPreT example: Contaminant Data Screening with Amersfoort data

Diagnostic plots for analyzing microbial biodegradation at the contaminated Amersfoort site. Author: Alraune Zech

Data based on the PhD thesis of *Johan van Leeuwen*, 2021, 'Biodegradation of mono- and polyaromatic hydrocarbons in a contaminated aquifer originating from a former Pintsch gas factory site', which is equivalent to the manuscript of van Leeuwen et al., 2022, 'Anaerobic degradation of benzene and other aromatic hydrocarbons in a tar-derived plume: Nitrate versus iron reducing conditions', J. of Cont. Hydrol. The data were provided by Johan van Leeuwen.
  
## Background: Amersfoort contaminant site

Close to the train station in Amersfoort, the Netherlands, the subsurface is contaminated with organic hydrocarbons forming a non-aqueous phase liquid (NAPL). The contamination originates from decades of operation of a manufactured gas plant, which dumped tar by-products in waste lagoons. The tar is a dense NAPL (DNAPL) and has spread into the underlying shallow unconfined aquifer. Sample wells were installed to measure various characteristics of the subsurface. The raw data contain measurements of
* environmental conditions, such as pH, redox potential, concentrations of oxygen, nitrate, etc.
* contaminant concentrations, such as BTEX, indene, indane, naphthalene and multiple other (typically cyclic) petroleum hydrocarbons
* metabolite concentrations, i.e. by-products of the degradation processes of the contaminants
* isotope measurements for specific contaminants and samples
* counts of genes (RNA/DNA) of microbiota known to perform biodegradation, as well as of functional enzymes known to be responsible for biodegradation

**Required packages**

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
import mibiscreen as mbs

## Data loading

In [None]:
file_path = './amersfoort.xlsx'

Load and standardize data of contaminants:

In [None]:
contaminants_raw,units = mbs.load_excel(file_path,
                                        sheet_name = 'contaminants',
                                        verbose = False)
contaminants,units = mbs.standardize(contaminants_raw,
                                      reduce = False,
                                      verbose = False)

Load and standardize data of metabolites

## Basic analysis of contaminant concentrations per sample

### Concentrations of contaminant subsets

**Total concentration of all contaminants provided in the Excel sheet/DataFrame:**

In [None]:
contaminants_concentration = mbs.total_concentration(contaminants,
                                                     name_list = 'all',
                                                     #include_as = "concentration_contaminants",
                                                     verbose = True)

This is equivalent to the following call, which additionally stores the result as its own column in the *contaminants* data frame:

In [None]:
mbs.total_contaminant_concentration(contaminants,
                                    include = True,
                                    verbose = True,
                                    )
#display(contaminants)

**Total concentration of contaminant subgroup: BTEX (Benzene, Toluene, Ethylbenzene, Xylene)**

In [None]:
contaminants_BTEX = mbs.total_concentration(contaminants,
                                            name_list = 'BTEX',
                                            #include_as = "concentration_BTEX",
                                            verbose = True,
                                            )

This is equivalent to the following call, which additionally stores the result as its own column in the *contaminants* data frame:

In [None]:
mbs.total_contaminant_concentration(contaminants,
                                    contaminant_group = 'BTEX',
                                    include = True,
                                    verbose = False)

**Total concentration of contaminant subgroup: BTEX + indene, indane and naphthalene**

In [None]:
contaminants_BTEXIIN = mbs.total_concentration(contaminants,
                                               name_list = 'BTEXIIN',
                                               #include_as = "concentration_BTEXIIN",
                                               verbose = True,
                                               )

Using the wrapper function for this specific type of data, the previous command is equivalent to the following call, which additionally stores the result under a standard column name in the *contaminants* data frame:

In [None]:
mbs.total_contaminant_concentration(contaminants,
                                    contaminant_group = 'BTEXIIN',
                                    include = True,
                                    verbose = False)

**Summed concentration of selected subgroup of contaminants: benzene & toluene**

In [None]:
contaminants_BT = mbs.total_concentration(contaminants,
                                          name_list = ['benzene','toluene'],
                                          include_as = "concentration_BT",
                                          verbose = True,
                                          )
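
The summation over a subgroup can be pictured as a row-wise sum over the selected columns. The following is a minimal sketch in plain `pandas`, not the `mibiscreen` implementation; the toy data frame, its values and the variable names are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data: concentrations in ug/l, one row per sample.
toy = pd.DataFrame({
    "sample_nr": [1, 2, 3],
    "benzene": [10.0, 0.0, 250.0],
    "toluene": [5.0, 0.0, 40.0],
    "indene": [1.0, 2.0, 0.0],
})

# Assumed subgroup definition, chosen for illustration only.
group_BT = ["benzene", "toluene"]

# Row-wise sum over the selected columns gives one total per sample.
total_BT = toy[group_BT].sum(axis=1)

# Storing the result as a new column mirrors what `include_as` appears to do.
toy["concentration_BT"] = total_BT

print(toy[["sample_nr", "concentration_BT"]])
```

Keeping the total as a column of the same data frame makes it available to later plotting calls by name.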

### Visualization of contaminant concentrations per sample

**Using the mibiscreen plotting routine for concentration bar plots**

In [None]:
list_contaminants = ['concentration_contaminants','concentration_BTEXIIN',
                     'concentration_BTEX','concentration_BT','benzene']

mbs.contaminants_bar(contaminants,
                 list_contaminants,
                 list_labels = ['all','BTEXIIN','BTEX','BT','B'],
                 sort = True,
                 figsize = [5.2,3],
                 textsize = 12,
                 save_fig = 'contaminants_bar.png',
                 yscale = 'log',
                 loc='upper left',
                 title_text = False,
                 )

Producing the plot manually with `matplotlib` and `numpy` allows individual adaptations:

In [None]:
sort_args = np.argsort(contaminants_concentration.values)
#sort_args = np.arange(len(contaminants_concentration.values))
plt.bar(np.arange(len(contaminants_concentration.values)),contaminants_concentration.values[sort_args],label='all')
plt.bar(np.arange(len(contaminants_BTEXIIN.values)),contaminants_BTEXIIN.values[sort_args],label='BTEXIIN')
plt.bar(np.arange(len(contaminants_BTEX.values)),contaminants_BTEX.values[sort_args],label='BTEX')
plt.bar(np.arange(len(contaminants_BT.values)),contaminants_BT.values[sort_args],label='BT')
plt.bar(np.arange(len(contaminants['toluene'].values)),contaminants['toluene'].values[sort_args],label='T')
plt.xlabel('Samples')
plt.ylabel(r'Total concentration [$\mu$g/l]')
plt.yscale('log')
plt.legend()
plt.title('Total concentration of contaminants per sample')
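
The manual plot applies one sort order, derived from the total concentration, to every series, so that each bar position refers to the same sample across all series. A small sketch with hypothetical numbers shows why this differs from sorting each series on its own:

```python
import numpy as np

# Hypothetical per-sample values, for illustration only.
total = np.array([300.0, 20.0, 150.0])  # sum over all contaminants
btex = np.array([100.0, 15.0, 5.0])     # sum over one subgroup

# One shared order: ascending total concentration.
order = np.argsort(total)

total_sorted = total[order]  # totals in ascending order
btex_sorted = btex[order]    # subgroup values of the same samples

# Sorting the subgroup independently would break the sample correspondence.
btex_independent = np.sort(btex)

print(order, total_sorted, btex_sorted, btex_independent)
```

With a shared order, the subgroup bars are not necessarily monotonic, but each one stays aligned with the total of its own sample.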

### Analysis of number of contaminants per sample

In [None]:
count_contaminants = mbs.total_count(contaminants,
                                     name_list = 'all_cont',
                                     verbose = True)

count_BTEXIIN = mbs.total_count(contaminants,
                                name_list = 'BTEXIIN',
                                verbose = True)

count_BTEX = mbs.total_count(contaminants,
                             name_list = 'BTEX',
                             verbose = True)

mbs.total_count(contaminants,
                name_list = ['benzene'],
                include_as = "count_benzene",
                verbose = True)

Using the wrapper function for this specific type of data, the previous commands are equivalent to the following calls, which additionally store the results under standard column names in the *contaminants* data frame:

In [None]:
mbs.total_contaminant_count(contaminants,
                contaminant_group = 'all',
                include = True)

mbs.total_contaminant_count(contaminants,
                contaminant_group = 'BTEXIIN',
                include = True)

mbs.total_contaminant_count(contaminants,
                contaminant_group = 'BTEX',
                include = True)
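
Conceptually, the count per sample is the number of quantities detected in that sample. The sketch below uses plain `pandas` and assumes that "detected" means a concentration above zero; this criterion, the toy data and the variable names are assumptions for illustration and may differ from what `mibiscreen` does internally.

```python
import pandas as pd

# Hypothetical toy data: concentrations in ug/l, one row per sample.
toy = pd.DataFrame({
    "benzene": [10.0, 0.0, 250.0],
    "toluene": [5.0, 0.0, 40.0],
    "indene": [1.0, 2.0, 0.0],
})

# Assumption: a contaminant counts as present when its concentration is > 0.
present = toy > 0

# Row-wise sum of the boolean mask gives the number of contaminants per sample.
count_per_sample = present.sum(axis=1)

print(count_per_sample)
```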

### Visualization of number of contaminants per sample

In [None]:
list_counts = ['count_contaminants','count_BTEXIIN','count_BTEX','count_benzene']

mbs.contaminants_bar(contaminants,
                     list_counts,
                     list_labels = ['all','BTEXIIN','BTEX','B'],
                     sort = True,
                     figsize = [5.2,3],
                     textsize = 12,
                     ylabel = 'Total count',
                     yscale = 'linear',
                     save_fig = 'count_bar.png',
                     loc='upper left',
                     title_text = False,
                     )

Producing the plot manually with `matplotlib` and `numpy` allows individual adaptations:

In [None]:
plt.figure(num=2)
# Sort all series by the same order (ascending total count),
# so that each bar position refers to one and the same sample
sort_args = np.argsort(count_contaminants.values)
x = np.arange(len(sort_args))
plt.bar(x,count_contaminants.values[sort_args],label='all')
plt.bar(x,count_BTEXIIN.values[sort_args],label='BTEXIIN')
plt.bar(x,count_BTEX.values[sort_args],label='BTEX')
plt.xlabel('Samples')
plt.ylabel('Total number')
plt.title('Total number of contaminants per sample')
plt.legend()

## Evaluation of intervention threshold exceedance

In [None]:
data_thresh_ratio = mbs.thresholds_for_intervention_ratio(contaminants)
display(data_thresh_ratio)
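
A threshold ratio can be read as the measured concentration divided by the intervention threshold of the same contaminant, where a value above one indicates exceedance. The sketch below illustrates this idea in plain `pandas`; the threshold values are hypothetical placeholders, not regulatory values, and the code is not the `mibiscreen` implementation.

```python
import pandas as pd

# Hypothetical toy concentrations in ug/l, one row per sample.
toy = pd.DataFrame({
    "benzene": [10.0, 60.0],
    "toluene": [50.0, 400.0],
})

# Hypothetical placeholder thresholds in ug/l (NOT regulatory values).
thresholds = pd.Series({"benzene": 20.0, "toluene": 100.0})

# Division aligns the Series index with the DataFrame column names.
ratio = toy / thresholds

# A ratio above one marks an exceedance of the threshold.
exceeded = ratio > 1

# Number of contaminants exceeding their threshold in each sample.
n_exceeded = exceeded.sum(axis=1)

print(ratio)
print(n_exceeded)
```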

In [None]:
fig,ax = mbs.threshold_ratio_bar(data_thresh_ratio,
                                 list_samples =  [31,9,11],
                                 figsize = [12,3],
                                 nrows=1,ncols=3,
                                 list_colors = ['olive','lightblue','tomato'],
                                 sharey = True,
                                 grid = True,
                                )


In [None]:
quantities = ['toluene','naphthalene','indene','pm_xylene','ethylbenzene','o_xylene','benzene']
mbs.threshold_ratio_bar(data_thresh_ratio,
                        list_samples = [9],
                        list_labels =  quantities,
                        figsize = [6,3],
                        unity_line = True,
                        title_text= 'Evaluation of threshold exceedance for BTEXIIN',
                        )