# MiBiPreT example: Data handling using Griftpark data

## Background: Griftpark contaminant site

Text to be added.

In [1]:
import mibiscreen as mbs

### Data handling

**Load in data:**

Returns the loaded data as a pandas DataFrame together with the units of each quantity. Both are printed when the `verbose` flag is `True`.
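
The file layout this step relies on, visible in the output of the loading cell, is a header row followed by a row of units and then one row per sample. The following is a minimal standard-library sketch of that layout, not the package's implementation: `split_units` and the sample values are hypothetical, and only the column names and units are taken from the output shown in this notebook. The sketch separates units from samples for clarity, whereas the loaded DataFrame in this notebook still carries the units row as row 0.

```python
import csv
import io

# Invented miniature file: header row, units row, then samples (values are made up).
SAMPLE_CSV = """sample nr,Depth,pH,Naphthalene
,m,,ug/L
S1,-10.0,7.0,4.2
S2,-20.0,6.8,0.5
"""


def split_units(text):
    """Split CSV text into (column names, units by column, sample records)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, unit_row, samples = rows[0], rows[1], rows[2:]
    # An empty unit cell means "no unit", e.g. for pH or sample identifiers.
    units = {name: (unit or None) for name, unit in zip(header, unit_row)}
    records = [dict(zip(header, row)) for row in samples]
    return header, units, records


header, units, records = split_units(SAMPLE_CSV)
print(units)
print(records[0])
```

All sample values are still strings at this stage; converting them to numbers is a separate step.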

In [2]:
file_path = './grift_BTEXIIN.csv'
data_raw, units = mbs.load_csv(file_path, verbose=True)
# print(type(data_raw))

 Running function 'load_csv()' on data file  ./grift_BTEXIIN.csv
Units of quantities:
-------------------
  sample nr obs_well well type Depth  pH     EC Redox  pE  NPOC sulfate  ...  \
0       NaN      NaN       NaN     m NaN  uS/cm    mV NaN  mg/L    mg/L  ...   

  Propylbenzene M-Ethyltoluene O-Ethyltoluene 1,2,4-Trimethylbenzene  \
0          ug/L           ug/L           ug/L                   ug/L   

  1,2,3-Trimethylbenzene Indane Indene Naphthalene metabolites_variety  \
0                   ug/L   ug/L   ug/L        ug/L                 NaN   

  metabolites_concentration  
0                      ug/L  

[1 rows x 31 columns]
________________________________________________________________
Loaded data as pandas DataFrame:
--------------------------------
       sample nr           obs_well  well type      Depth    pH     EC Redox  \
0            NaN                NaN        NaN          m   NaN  uS/cm    mV   
1   2019-031-001        A-32mm-52,5  dsn 32 mm      -52.5  7.31  

**Check on column names:**
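
The check pairs each column name found in the file with a standard name and reports which names were identified. A minimal sketch of such a lookup, assuming a small alias table: `ALIASES` and `match_columns` are hypothetical and do not reproduce the package's internal name list, while the standard names on the right-hand side are copied from the output of `check_columns()` in this notebook.

```python
# Hypothetical alias table: normalised input name -> standard name.
# Standard names are copied from the check_columns() output in this notebook.
ALIASES = {
    "sample nr": "sample_nr",
    "depth": "depth",
    "redox": "redoxpot",
    "fe ii": "iron2",
    "mn": "manganese",
    "cumene": "isopropylbenzene",
    "p/m xylene": "pm_xylene",
}


def match_columns(columns):
    """Return (known, unknown, standard) lists for the given column names."""
    known, unknown, standard = [], [], []
    for name in columns:
        # Compare case-insensitively and ignore repeated or surrounding whitespace.
        key = " ".join(name.lower().split())
        if key in ALIASES:
            known.append(name)
            standard.append(ALIASES[key])
        else:
            unknown.append(name)
    return known, unknown, standard


known, unknown, standard = match_columns(["Sample Nr", "Depth", "Fe II", "Cumene", "NPOC"])
for old, new in zip(known, standard):
    print(f"{old}  -->  {new}")
print("Not identified:", unknown)
```

Names that fall through the lookup, such as `NPOC` here, are returned separately so they can be inspected for typos or missing entries.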

In [5]:
column_names_known, column_names_unknown, column_names_standard = mbs.check_columns(data_raw, verbose=True)

 Running function 'check_columns()' on data
29 quantities identified in provided data.
List of names with standard names:
----------------------------------
sample nr  -->  sample_nr
obs_well  -->  obs_well
well type  -->  well_type
Depth  -->  depth
pH  -->  pH
EC  -->  EC
Redox  -->  redoxpot
pE  -->  pE
sulfate  -->  sulfate
ammonium  -->  ammonium
sulfide  -->  sulfide
methane  -->  methane
Fe II  -->  iron2
Mn  -->  manganese
Benzene  -->  benzene
Toluene  -->  toluene
Ethylbenzene  -->  ethylbenzene
P/M Xylene  -->  pm_xylene
O Xylene  -->  o_xylene
Cumene  -->  isopropylbenzene
Propylbenzene  -->  n_propylbenzene
M-Ethyltoluene  -->  3_ethyltoluene
O-Ethyltoluene  -->  2_ethyltoluene
1,2,4-Trimethylbenzene  -->  124_trimethylbenzene
1,2,3-Trimethylbenzene  -->  123_trimethylbenzene
Indane  -->  indane
Indene  -->  indene
Naphthalene  -->  naphthalene
metabolites_variety  -->  metabolites_variety
----------------------------------

Renaming can be done by setting keyword 'standar

**Check on units:**
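
Conceptually, a unit check compares the unit reported for each quantity with the units the analysis expects. A minimal sketch under stated assumptions: `ACCEPTED_UNITS` and `find_unit_mismatches` are hypothetical names, and the table lists only the units that appear in this notebook's data, not the package's full set of requested units.

```python
# Hypothetical table of accepted units per standard quantity name.
# The units listed are those reported for the Griftpark file in this notebook.
ACCEPTED_UNITS = {
    "depth": {"m"},
    "EC": {"uS/cm"},
    "redoxpot": {"mV"},
    "sulfate": {"mg/L"},
    "naphthalene": {"ug/L"},
}


def find_unit_mismatches(units):
    """Return the quantities whose reported unit is not among the accepted ones."""
    mismatches = []
    for quantity, unit in units.items():
        accepted = ACCEPTED_UNITS.get(quantity)
        # Quantities without an entry in the table are skipped, not flagged.
        if accepted is not None and unit not in accepted:
            mismatches.append(quantity)
    return mismatches


ok = find_unit_mismatches({"depth": "m", "EC": "uS/cm", "naphthalene": "ug/L"})
bad = find_unit_mismatches({"depth": "cm", "naphthalene": "ug/L"})
print("Mismatches in first set:", ok)
print("Mismatches in second set:", bad)
```

An empty list corresponds to the situation in which all identified quantities are given in the requested units.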

In [6]:
check_list = mbs.check_units(data_raw, verbose=True)

 Running function 'check_units()' on data
________________________________________________________________
 All identified quantities given in requested units.


**Check on values in columns (transformation to numerical values, handling of NaN values):**

Returns a DataFrame without the units row, in which the quantities are stored as numerical values.
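
The core of this step is converting text cells to numbers and marking everything that cannot be converted as NaN. A minimal plain-Python sketch, assuming that empty and non-numeric cells should both become NaN; `to_number` and the sample column are hypothetical and not taken from the package. In pandas, `pandas.to_numeric(series, errors="coerce")` performs the same kind of conversion on a whole column.

```python
import math


def to_number(cell):
    """Convert one raw cell to float; empty or non-numeric text becomes NaN."""
    try:
        return float(str(cell).strip())
    except ValueError:
        return math.nan


# Invented example column: a number, an empty cell, a text marker, a padded number.
raw_column = ["7.31", "", "n.a.", " 6.9 "]
clean = [to_number(value) for value in raw_column]
print(clean)
```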

In [7]:
data_pure = mbs.check_values(data_raw, verbose=True)

 Running function 'check_values()' on data
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
Quantities with values transformed to numerical (int/float):
-----------------------------------------------------------
pH
EC
Redox
pE
NPOC
sulfate
ammonium
sulfide
methane
Fe II
Mn
Benzene
Toluene
Ethylbenzene
P/M Xylene
O Xylene
Cumene
Propylbenzene
M-Ethyltoluene
O-Ethyltoluene
1,2,4-Trimethylbenzene
1,2,3-Trimethylbenzene
Indane
Indene
Naphthalene
metabolites_variety
metabolites_concentration


**Standardization of input data:**

Runs all checks (column names, units and values) in one go and returns the transformed data with standard column names and values in numerical type where possible. With `reduce = False`, the data is not reduced to the columns containing known quantities, so columns with unknown names are kept as well. Unknown quantities can be contaminants or metabolites not yet included in the analysis scheme, or quantities with a typo in the name.
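
The effect of the `reduce` keyword can be sketched on a single record. This is an illustration under stated assumptions, not the package's code: `standardize_columns` is a hypothetical helper, the name map entries are copied from the `check_columns()` output in this notebook, and the record values are invented.

```python
def standardize_columns(record, name_map, reduce=False):
    """Rename known columns via name_map; drop unknown ones only if reduce is True."""
    result = {}
    for name, value in record.items():
        if name in name_map:
            result[name_map[name]] = value
        elif not reduce:
            # Unknown quantity: kept under its original name.
            result[name] = value
    return result


# Name map entries taken from the check_columns() output; record values are invented.
NAME_MAP = {"Depth": "depth", "Redox": "redoxpot", "Fe II": "iron2"}
record = {"Depth": -10.0, "Redox": -120.0, "Fe II": 3.2, "NPOC": 5.0}

full = standardize_columns(record, NAME_MAP, reduce=False)
reduced = standardize_columns(record, NAME_MAP, reduce=True)
print("reduce=False:", list(full))
print("reduce=True: ", list(reduced))
```

Keeping unknown columns by default makes it easier to spot a misspelled quantity before it is silently dropped.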

In [8]:
data, units = mbs.standardize(data_raw, reduce=False, verbose=True)
print("Number of columns:", data.shape[1])

 Running function 'standardize()' on data
 Function performing check of data including:
  * check of column names and standardizing them.
  * check of units and outlining which to adapt.
  * check of values, replacing empty values by nan 
    and making them numeric
 Running function 'check_columns()' on data
29 quantities identified in provided data.
List of names with standard names:
----------------------------------
sample nr  -->  sample_nr
obs_well  -->  obs_well
well type  -->  well_type
Depth  -->  depth
pH  -->  pH
EC  -->  EC
Redox  -->  redoxpot
pE  -->  pE
sulfate  -->  sulfate
ammonium  -->  ammonium
sulfide  -->  sulfide
methane  -->  methane
Fe II  -->  iron2
Mn  -->  manganese
Benzene  -->  benzene
Toluene  -->  toluene
Ethylbenzene  -->  ethylbenzene
P/M Xylene  -->  pm_xylene
O Xylene  -->  o_xylene
Cumene  -->  isopropylbenzene
Propylbenzene  -->  n_propylbenzene
M-Ethyltoluene  -->  3_ethyltoluene
O-Ethyltoluene  -->  2_ethyltoluene
1,2,4-Trimethylbenzene  -->  124_

**Standardization of input data with reduction and export:**

Rerun the data standardization without reporting (`verbose=False`). This time the data is reduced to the known (identified) quantities, and the standardized DataFrame is saved to a csv file via the `store_csv` keyword.
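
Writing a standardized table to disk can be sketched with the standard library. The layout used here (header row, units row, then samples, mirroring the input file) is an assumption; the format the package actually writes is not shown in this notebook. `write_standard_csv`, the file name and the sample values are hypothetical.

```python
import csv
import os
import tempfile


def write_standard_csv(path, columns, units, records):
    """Write a header row, a units row, then one row per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        # Quantities without a unit get an empty cell in the units row.
        writer.writerow([units.get(name) or "" for name in columns])
        for record in records:
            writer.writerow([record.get(name, "") for name in columns])


# Invented example data using standard names from this notebook.
columns = ["sample_nr", "depth", "naphthalene"]
units = {"depth": "m", "naphthalene": "ug/L"}
records = [{"sample_nr": "S1", "depth": -10.0, "naphthalene": 4.2}]

# Write to a temporary folder and read the file back to inspect the layout.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "standard_sketch.csv")
    write_standard_csv(path, columns, units, records)
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
print(rows)
```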

In [9]:
file_standard = './grift_BTEXIIN_standard.csv'
data, units = mbs.standardize(data_raw, reduce=True, store_csv=file_standard, verbose=False)
print("Number of columns:", data.shape[1])

________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
Number of columns: 29
