### 1. Background

Another Python script (*cation_processing_script*) compares different types of vials and assumes that the data is normally distributed. No formal statistical tests are done there, because with only two measurements per variable the statistical power would be very low. To check this assumption, we can use historical data from the experiment in which the LOD/LOQ of the ion chromatograph was determined. In that experiment, five measurements were taken per variable, which is enough to run a Shapiro-Wilk test for normality. This test is carried out below.

The LOD/LOQ was determined for anions only. Here we assume that the results can be extrapolated to cations. In the future, the LOD/LOQ experiment should be repeated for cations to check whether those measurements are normally distributed too.

The concentration used in this experiment was 0.1 mg/L (standard).

### 2. Import the data from an open-access repository

To ensure open access and readability of the data, the dataset that is used below is saved to an open access repository on Zenodo. The lines of code below enable downloading this data from the repository.

In [1]:
%pip install wget

# %pip installs wget into the environment of the running kernel (unlike '!pip', which runs pip as a
# shell command and may target a different Python installation)

import wget # to download from Zenodo

# if importing wget still fails, install it from the Anaconda (PowerShell) Prompt of the same environment
# by typing 'pip install wget', then restart the Jupyter kernel and run this cell again

# file name and Zenodo URL
file_name  = "LOD_LOQ_anions.csv"
zenodo_url = "https://zenodo.org/record/5901444/files/" # 5901444 is the record number at the end of the dataset DOI

In [2]:
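import os
# only download if the file is not present yet; otherwise wget saves a copy such as 'LOD_LOQ_anions (1).csv'
if not os.path.exists(file_name):
    wget.download(zenodo_url + file_name, "./" + file_name) # input (URL), output (local path)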
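If the third-party `wget` package cannot be installed, the same download can be sketched with the standard library alone. This is a minimal sketch, assuming Zenodo serves single files under `https://zenodo.org/record/<id>/files/<name>`; `zenodo_file_url` and `download_once` are hypothetical helper names, not part of any library.

```python
import os
import urllib.request

# Record number and file name taken from the notebook above
ZENODO_RECORD = "5901444"
FILE_NAME = "LOD_LOQ_anions.csv"


def zenodo_file_url(record_id: str, file_name: str) -> str:
    """Build the direct-download URL of one file in a Zenodo record (assumed URL pattern)."""
    return f"https://zenodo.org/record/{record_id}/files/{file_name}"


def download_once(url: str, path: str) -> str:
    """Download url to path unless path already exists; return the local path."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Usage (needs network access the first time):
# local_csv = download_once(zenodo_file_url(ZENODO_RECORD, FILE_NAME), FILE_NAME)
```

Checking for an existing file first keeps repeated notebook runs from creating numbered copies and always reads the same local file.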

### 3. Importing the relevant packages and reading csv file

In [3]:
# data should be saved as a UTF-8 compatible csv file

import pandas as pd
import numpy as np
anion_raw = pd.read_csv("./" + file_name, sep=';', decimal=",") # ';'-separated file with decimal commas
pd.set_option('display.max_columns', None)
anion_raw

# 'display.max_columns' is set to None so that all columns are displayed, because pandas
# limits the number of columns shown in Jupyter by default
# the table below displays the raw data

| | Determination start | Ident | Sample type | Method name | User (short name) | Info 1 | RS.01 Fluoride concentration | RS.02 Formate concentration | RS.03 Acetate concentration | RS.04 Chloride concentration | RS.05 Nitrite concentration | RS.06 Propionate concentration | RS.07 Bromide concentration | RS.08 Nitrate concentration | RS.09 Phosphate concentration | RS.10 Sulfate concentration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-07-13 12:18:36 UTC+2 | S_0.1_5 | Sample | ISPT-anions | Marjo | CD | 0.085 | Not detected | Not detected | 0.085 | 0.065 | Not detected | 0.089 | 0.058 | 0.055 | 0.114 |
| 1 | 2021-07-13 11:42:57 UTC+2 | S_0.1_4 | Sample | ISPT-anions | Marjo | CD | 0.085 | Not detected | Not detected | 0.084 | 0.065 | Not detected | 0.085 | 0.063 | 0.071 | 0.101 |
| 2 | 2021-07-13 11:07:16 UTC+2 | S_0.1_3 | Sample | ISPT-anions | Marjo | CD | 0.09 | Not detected | Not detected | 0.085 | 0.072 | Not detected | 0.09 | 0.067 | 0.078 | 0.101 |
| 3 | 2021-07-13 10:31:36 UTC+2 | S_0.1_2 | Sample | ISPT-anions | Marjo | CD | 0.084 | Not detected | Not detected | 0.082 | 0.065 | Not detected | 0.087 | 0.068 | 0.073 | 0.1 |
| 4 | 2021-07-13 09:55:55 UTC+2 | S_0.1_1 | Sample | ISPT-anions | Marjo | CD | 0.088 | Not detected | Not detected | 0.081 | 0.066 | Not detected | 0.086 | 0.062 | 0.067 | 0.095 |
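Columns for anions that were never detected (formate, acetate, propionate) contain the text "Not detected", so `read_csv` reads them as text (object dtype). A minimal sketch of how the `na_values` argument of `pandas.read_csv` turns that label into `NaN`, using a hypothetical two-row excerpt in the same ';'-separated, decimal-comma layout:

```python
import io

import pandas as pd

# Hypothetical excerpt with the same layout as the raw export (few columns for brevity)
raw_text = (
    "Ident;RS.01 Fluoride concentration;RS.02 Formate concentration;RS.10 Sulfate concentration\n"
    "S_0.1_5;0,085;Not detected;0,114\n"
    "S_0.1_4;0,085;Not detected;0,101\n"
)

# na_values maps the instrument label 'Not detected' to NaN, so every
# concentration column is parsed as float instead of text
demo = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",", na_values=["Not detected"])
print(demo.dtypes)
```

With all concentration columns numeric, the never-detected anions can later be dropped with `dropna(axis="columns", how="all")` instead of being listed by hand.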


In [4]:
# the lines of code below create a new table that only contains the relevant columns (sample ID and the
# anions that were detected), which is easier to visualize and to process afterwards

anion_parameters = anion_raw.loc[:, ['Ident', 'RS.01 Fluoride concentration', 'RS.04 Chloride concentration','RS.05 Nitrite concentration','RS.07 Bromide concentration','RS.08 Nitrate concentration','RS.09 Phosphate concentration','RS.10 Sulfate concentration']]
pd.set_option('display.max_rows', None)
anion_parameters

| | Ident | RS.01 Fluoride concentration | RS.04 Chloride concentration | RS.05 Nitrite concentration | RS.07 Bromide concentration | RS.08 Nitrate concentration | RS.09 Phosphate concentration | RS.10 Sulfate concentration |
|---|---|---|---|---|---|---|---|---|
| 0 | S_0.1_5 | 0.085 | 0.085 | 0.065 | 0.089 | 0.058 | 0.055 | 0.114 |
| 1 | S_0.1_4 | 0.085 | 0.084 | 0.065 | 0.085 | 0.063 | 0.071 | 0.101 |
| 2 | S_0.1_3 | 0.09 | 0.085 | 0.072 | 0.09 | 0.067 | 0.078 | 0.101 |
| 3 | S_0.1_2 | 0.084 | 0.082 | 0.065 | 0.087 | 0.068 | 0.073 | 0.1 |
| 4 | S_0.1_1 | 0.088 | 0.081 | 0.066 | 0.086 | 0.062 | 0.067 | 0.095 |
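Instead of typing every column name, the concentration columns can also be selected by pattern and renamed to short anion names. A minimal sketch on a small hypothetical frame with the same column naming as `anion_raw`; it assumes that never-detected anions were already converted to `NaN` (for example with `na_values` in `read_csv`):

```python
import pandas as pd

# Hypothetical two-row frame using the column names of the raw export
df = pd.DataFrame({
    "Ident": ["S_0.1_5", "S_0.1_4"],
    "Sample type": ["Sample", "Sample"],
    "RS.01 Fluoride concentration": [0.085, 0.085],
    "RS.02 Formate concentration": [float("nan"), float("nan")],  # 'Not detected'
    "RS.05 Nitrite concentration": [0.065, 0.065],
})

conc = (
    df.set_index("Ident")                      # rows labelled by sample ID
      .filter(regex=r"concentration$")         # keep only '... concentration' columns
      .dropna(axis="columns", how="all")       # drop anions that were never detected
      .rename(columns=lambda c: c.split()[1])  # 'RS.01 Fluoride concentration' -> 'Fluoride'
)
print(conc)
```

Short column names also remove the need for the substring match inside the Shapiro-Wilk loop below.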


### 4. Statistical analysis

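The Shapiro-Wilk test is well suited to small samples: it was originally developed for sample sizes between 3 and 50, which covers the five measurements per anion used here.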
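For reference, the test statistic compares an estimate of the spread built from the sorted measurements with the ordinary sum of squares. With $x_{(1)} \le \dots \le x_{(n)}$ the sorted measurements, $\bar{x}$ their mean and $a_i$ tabulated constants derived from the expected order statistics of a standard normal sample:

$$
W = \frac{\left(\sum_{i=1}^{n} a_i\, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

$W$ lies between 0 and 1; values close to 1 agree with a normal distribution, and small values lead to rejection. For $n = 5$ the constants are antisymmetric ($a_5 = -a_1 = 0.6646$, $a_4 = -a_2 = 0.2413$, $a_3 = 0$), so for the fluoride measurements (mean 0.0864 mg/L):

$$
W = \frac{\left(0.6646\,(0.090 - 0.084) + 0.2413\,(0.088 - 0.085)\right)^2}{2.52 \times 10^{-5}} \approx 0.881
$$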

In [5]:
# We loop over the anions, select the matching concentration column and compute the Shapiro-Wilk statistic
# and p-value. Each result is stored as a one-row DataFrame in a list, and the list is combined with pd.concat

from scipy import stats

list_anions = ['Fluoride','Chloride','Nitrite','Bromide','Nitrate','Phosphate','Sulfate']
tbl_list = []

for anion in list_anions:

    # the single column whose name contains the anion, e.g. 'RS.01 Fluoride concentration'
    column = next(c for c in anion_parameters.columns if anion in c)
    sw_anion = stats.shapiro(anion_parameters[column])
    tbl_anion = pd.DataFrame({'Statistic':[sw_anion.statistic],'P-value':[sw_anion.pvalue]})
    tbl_anion.index = [anion]

    tbl_list.append(tbl_anion)


table_sw = pd.concat(tbl_list)
# highlight p-values below 0.05 in red (Styler.applymap is called Styler.map in pandas >= 2.1)
table_sw.style.applymap(lambda x: 'color : red' if x<0.05 else '',subset=['P-value'])

| | Statistic | P-value |
|---|---|---|
| Fluoride | 0.881038 | 0.31404 |
| Chloride | 0.866836 | 0.253847 |
| Nitrite | 0.643807 | 0.002247 |
| Bromide | 0.952351 | 0.753972 |
| Nitrate | 0.945899 | 0.707887 |
| Phosphate | 0.931036 | 0.603454 |
| Sulfate | 0.83312 | 0.146784 |
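The same table can be built more compactly by iterating over the columns directly and adding an explicit decision column. This sketch hard-codes the fluoride, nitrite and sulfate concentrations from the `anion_parameters` table (with short anion names, an assumption for readability) so that it runs on its own:

```python
import pandas as pd
from scipy import stats

# Concentrations (mg/L) copied from the anion_parameters table above
conc = pd.DataFrame({
    "Fluoride": [0.085, 0.085, 0.090, 0.084, 0.088],
    "Nitrite":  [0.065, 0.065, 0.072, 0.065, 0.066],
    "Sulfate":  [0.114, 0.101, 0.101, 0.100, 0.095],
})

ALPHA = 0.05  # significance level used in this notebook

rows = {}
for anion, values in conc.items():
    result = stats.shapiro(values)
    rows[anion] = {"Statistic": result.statistic, "P-value": result.pvalue}

table = pd.DataFrame.from_dict(rows, orient="index")
# H0 (normality) is rejected when the p-value is below ALPHA
table["Reject H0"] = table["P-value"] < ALPHA
print(table)
```

With these values, only nitrite is flagged, in line with the table above.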


### 5. Conclusion

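The null hypothesis of the Shapiro-Wilk test is that the data is normally distributed; the alternative hypothesis is that it is not.

At a significance level of 0.05, the null hypothesis is rejected only for nitrite (p = 0.002). For all other anions, the null hypothesis cannot be rejected, so their measurements are consistent with a normal distribution. This does not prove normality: with only five measurements per anion, the test has limited power to detect deviations from it.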
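Why a non-significant result with five measurements is only weak evidence of normality can be illustrated with a small simulation. The sketch below is an illustration, not part of the original analysis: it draws repeated samples from a normal distribution and from an exponential distribution (chosen here only as an example of clearly skewed, non-normal data) and counts how often the Shapiro-Wilk test rejects normality at α = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # fixed seed so the sketch is reproducible
ALPHA = 0.05
N_REPEATS = 1000


def rejection_rate(sampler, n):
    """Fraction of simulated samples of size n for which Shapiro-Wilk rejects normality."""
    rejections = sum(stats.shapiro(sampler(n)).pvalue < ALPHA for _ in range(N_REPEATS))
    return rejections / N_REPEATS


for n in (5, 20, 50):
    normal = rejection_rate(lambda k: rng.normal(size=k), n)         # should stay near ALPHA
    skewed = rejection_rate(lambda k: rng.exponential(size=k), n)    # estimated power
    print(f"n={n:2d}  normal data: {normal:.2f}  exponential data: {skewed:.2f}")
```

For normal data the rejection rate should stay close to 0.05 at every sample size, while for the skewed data it is low at n = 5 and increases with n.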

### 6. Dependencies

Below, the versions of all software, hardware and Python packages are displayed, along with a timestamp, using the *Watermark* package.

In [1]:
# If watermark is not installed yet, install it from the Anaconda (PowerShell) Prompt of the same
# environment by typing 'pip install watermark'

%load_ext watermark

# python, ipython, packages, and machine characteristics
%watermark -v -m -p wget,pandas,numpy,watermark 

# date
print(" ")
%watermark -u -n -t -z 



Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.19.0

wget     : 3.2
pandas   : 1.1.3
numpy    : 1.19.2
watermark: 2.3.0

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
CPU cores   : 8
Architecture: 64bit

 
Last updated: Tue Jan 25 2022 12:02:00Romance Standard Time
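If watermark is not available, a similar but less detailed report can be assembled from the standard library. This is a hedged sketch: `environment_report` is a hypothetical helper, not part of watermark, and it relies on `importlib.metadata`, which is available from Python 3.8 (the version used here).

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def environment_report(packages):
    """Collect Python, OS and package versions, similar in spirit to %watermark."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report(["wget", "pandas", "numpy", "watermark"]).items():
    print(f"{key:15}: {value}")
```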

