# Borneo Probabilistic Risk Assessment

This notebook performs a probabilistic risk assessment of chemicals found in oil palm plantations in Borneo.

## Setup

Load the required libraries and set some useful variables.

In [5]:
import numpy as np

compartments = ['water', 'soil']
units = {'water': 'μg/L', 'soil': 'mg/kg'}

## Parse the data

We call the `parse_data` function from the [data-parsing.ipynb](./data-parsing.ipynb) notebook to load, clean and filter the data, ready for use in the risk assessment.

In [6]:
import pandas as pd
import numpy as np

# Run the data parsing notebook, which will import the parse_data function
%run ./data-parsing.ipynb

# Path to the files
mec_filepath = '../data/Oil_palm_chemicals_literature.xlsx'
tox_filepath = '../data/Tox data_HCvalues_March.xlsx'

# Parse the data
df_mec, df_tox = parse_data(mec_filepath, tox_filepath)

## Grouping by chemical and species

We now group the data by chemical and species to obtain the lists of chemicals and species that the assessment will cover.
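
As a minimal sketch of the filtering logic used in the cell that follows, the example below applies the same pattern to a small made-up dataset. The chemical names, the `concentration` column and all values are hypothetical and chosen only for illustration; the real column layout comes from `parse_data`.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative only
toy_mec = pd.DataFrame({
    'active_ingredient': ['glyphosate', 'glyphosate', 'glyphosate',
                          'paraquat', 'paraquat', 'diuron'],
    'concentration': [1.2, 0.8, 2.5, 0.3, 0.4, 0.1],
})
toy_tox = pd.DataFrame({
    'Chemical': ['glyphosate', 'glyphosate', 'paraquat', 'paraquat', 'diuron'],
    'LC50': [10.0, 25.0, 5.0, 5.0, 1.0],
})

# Count rows per chemical (most data first) and keep chemicals with > 1 row
mec_counts = toy_mec.groupby('active_ingredient') \
                    .size() \
                    .sort_values(ascending=False)
mec_names = mec_counts[mec_counts > 1].index.tolist()

# For the tox data, require more than one row AND more than one distinct LC50
tox_counts = toy_tox.groupby('Chemical') \
                    .agg(size=('LC50', 'count'),
                         nunique=('LC50', 'nunique'))
tox_keep = tox_counts[(tox_counts['size'] > 1) & (tox_counts['nunique'] > 1)]
tox_names = tox_keep.index.tolist()

# Chemicals present in both lists, keeping the MEC ordering
common = [name for name in mec_names if name in tox_names]
print(common)
```

In this toy case only `glyphosate` survives: `diuron` has a single row in both tables, and `paraquat` has two tox rows but only one distinct `LC50` value, so it is dropped by the `nunique` condition.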

In [7]:
mec_chem_names = {}
tox_chem_names = {}
species_names = {}

# Get the list of chemical and species names for soils and waters
for c in compartments:
    # Count the rows per chemical, sorted by amount of data (most first)
    mec_gb = df_mec[c].groupby('active_ingredient') \
                      .size() \
                      .sort_values(ascending=False)
    # Filter to include only chemicals that have more than one row
    mec_gb = mec_gb[mec_gb > 1]
    # Get the names as an array
    mec_chem_names[c] = mec_gb.index.values
    
    # Do the same for the tox data
    tox_gb = df_tox[c].groupby('Chemical') \
                      .agg(size=('LC50', 'count'),
                           nunique=('LC50', 'nunique')) \
                      .sort_values(ascending=False, by='size')
    # Filter to include only chemicals that have more than one row and
    # more than one distinct LC50 value
    tox_gb = tox_gb[(tox_gb['size'] > 1) & (tox_gb['nunique'] > 1)]
    # Get the names as an array
    tox_chem_names[c] = tox_gb.index.values
    
    # Species are handled per chemical once the common chemicals are known

# Now get the common names across the MEC and tox data. This should retain the
# order from the MEC data
common_chem_names = {c: [name for name in mec_chem_names[c] if name in tox_chem_names[c]]
                     for c in compartments}

# Now, for each chemical, get the list of species that we have tox data for
for c in compartments:
    for chem in common_chem_names[c]:
        df_tox_chem = df_tox[c][df_tox[c]['Chemical'] == chem]
        df_species_gb = df_tox_chem.groupby('Species')['Species'].count()
        print(df_species_gb)

Species
Adelotus brevis                 1
Aplexa hypnorum                 1
Brachionus calyciflorus         1
Bufo marinus                    1
Caenorhabditis elegans          1
Carassius auratus               1
Catostomus latipinnis           1
Ceriodaphnia dubia              1
Ceriodaphnia reticulata         1
Chironomus sp.                  1
Chironomus zealandicus          1
Coregonus hoyi                  1
Daphnia magna                   1
Daphnia pulex                   1
Esox masquinongy                1
Fundulus heteroclitus           1
Gambusia affinis                1
Gammarus pseudolimnaeus         1
Hybognathus amarus              1
Ictalurus punctatus             1
Jordanella floridae             1
Lepomis cyanellus               1
Lepomis macrochirus             1
Limnodynastes peroni            1
Morone saxatilis                1
Notemigonus crysoleucas         1
Oncorhynchus kisutch            1
Oncorhynchus mykiss             1
Oncorhynchus tshawytscha        1
Philod