### 1. Background

In evaluating different types of vials in another Python script (*cation_processing_script*), the data are assumed to be normally distributed. No formal statistical tests are performed there, however, because there are only two measurements per variable, so the statistical power would be very low. To work around this problem, we can check whether the data generated by the ion chromatograph are normally distributed by examining historical data collected when the LOD/LOQ was determined. In that experiment, five measurements were taken per variable, which allows a Shapiro-Wilk test for normality to be performed. This test is carried out below.

In addition, the limit of detection (LOD) and limit of quantification (LOQ) for the cations are determined here.

The concentration used in this experiment was 0.1 mg/L (standard).
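A minimal sketch of why the vial data cannot be tested directly while the LOD/LOQ data can: the Shapiro-Wilk statistic is not defined for fewer than three observations, so two replicates per variable are not enough, whereas five are. The helper `shapiro_if_possible` is hypothetical (it is not part of *cation_processing_script*), the two-replicate values are illustrative, and the five values are the potassium concentrations reported later in this notebook.

```python
import numpy as np
from scipy import stats


def shapiro_if_possible(values):
    """Hypothetical helper: run Shapiro-Wilk, or return None if there are fewer than 3 values."""
    values = np.asarray(values, dtype=float)
    if values.size < 3:
        # The Shapiro-Wilk statistic is not defined for n < 3
        return None
    return stats.shapiro(values)


# Two replicates, as in the vial comparison (illustrative values)
print(shapiro_if_possible([0.120, 0.122]))

# Five replicates: potassium concentrations (mg/L) from the LOD/LOQ experiment
result = shapiro_if_possible([0.120, 0.122, 0.122, 0.128, 0.126])
print(f"W = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```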

### 2. Import the data from an open-access repository

To keep the data openly accessible and readable, the dataset used below is stored in an open-access repository on Zenodo. The code cells below download the data from this repository.

In [1]:
%pip install wget

# If wget is not installed in your environment, the command above installs it. The '%pip' magic installs
# into the environment of the running kernel ('!pip' runs pip in a shell and may target a different Python).

import wget  # to download from Zenodo

# If importing wget still fails, run 'pip install wget' from the Anaconda PowerShell Prompt of this
# environment, then restart the Jupyter kernel.

# file name and Zenodo URL (the file itself is served from the record's '/files/' path)
file_name  = "LOD_LOQ_cations.csv"
zenodo_url = "https://zenodo.org/record/5909650/files/"  # 5909650 is the record ID (last digits of the DOI of this dataset version)



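In [2]:
import os
if os.path.exists(file_name):  # wget.download() would otherwise save to 'LOD_LOQ_cations (1).csv'
    os.remove(file_name)       # remove the stale copy so the next cell reads the fresh download
wget.download(zenodo_url + file_name, "./" + file_name)  # input URL, output path

'./LOD_LOQ_cations.csv'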
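A Zenodo file has to be downloaded from a file-specific URL rather than from the record page. The sketch below, using only the standard library, shows that appending the file name to a page URL ending in a `#...` fragment only lengthens the fragment, which HTTP clients do not send to the server, so the request still targets the record page. The `zenodo_file_url` helper is hypothetical, and the `/record/<id>/files/<name>` pattern it builds is an assumption about Zenodo's URL layout; if it fails, use the download link shown on the record page.

```python
from urllib.parse import urlsplit


def zenodo_file_url(record_id: str, file_name: str) -> str:
    """Hypothetical helper: direct-download URL for one file in a Zenodo record (assumed URL pattern)."""
    return f"https://zenodo.org/record/{record_id}/files/{file_name}"


page_url = "https://zenodo.org/record/5909650#.YfJZl98o9hE"
file_name = "LOD_LOQ_cations.csv"

# Appending the file name to the page URL only extends the fragment (the part after '#'),
# so the path, and therefore the requested resource, is still the record page
naive = urlsplit(page_url + file_name)
print("path:", naive.path, "| fragment:", naive.fragment)

# The file-specific URL has the file name in its path and no fragment
direct = urlsplit(zenodo_file_url("5909650", file_name))
print("path:", direct.path, "| fragment:", repr(direct.fragment))
```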

### 3. Importing the relevant packages and reading csv file

In [4]:
# The data are stored as a UTF-8 compatible CSV file with ';' as field separator and ',' as decimal mark

import pandas as pd
import numpy as np

cation_raw = pd.read_csv("./" + file_name, sep=';', decimal=",")

# pandas limits the number of columns shown in Jupyter by default; this option displays all of them
pd.set_option('display.max_columns', None)

# display the raw data (table below)
cation_raw

|   | analysis_time | ident | sample_type | method_name | user | info | sodium_concentration | potassium_concentration | calcium_concentration | magnesium_concentration |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-01-26 14:51:49 UTC+1 | 0.1_S5 | Sample | Improved-cations | Marjo | CD_LOQ_cations | -0.098 | 0.12 | 0.033 | 0.114 |
| 1 | 2022-01-26 14:18:26 UTC+1 | 0.1_S4 | Sample | Improved-cations | Marjo | CD_LOQ_cations | -0.099 | 0.122 | 0.039 | 0.108 |
| 2 | 2022-01-26 13:45:02 UTC+1 | 0.1_S3 | Sample | Improved-cations | Marjo | CD_LOQ_cations | -0.097 | 0.122 | 0.021 | 0.1 |
| 3 | 2022-01-26 13:11:38 UTC+1 | 0.1_S2 | Sample | Improved-cations | Marjo | CD_LOQ_cations | -0.099 | 0.128 | 0.014 | 0.099 |
| 4 | 2022-01-26 12:38:15 UTC+1 | 0.1_S1 | Sample | Improved-cations | Marjo | CD_LOQ_cations | -0.118 | 0.126 | 0.011 | 0.092 |
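The `sep=';'` and `decimal=','` arguments matter because, judging from the `read_csv` call above, the exported file uses a semicolon as field separator and a comma as decimal mark. The self-contained sketch below uses two illustrative rows in that assumed layout to show that, without `decimal=','`, the concentrations are not parsed as numbers.

```python
import io

import pandas as pd

# Two illustrative rows in the assumed export layout: ';' separates fields, ',' is the decimal mark
raw_text = (
    "ident;sodium_concentration;potassium_concentration\n"
    "0.1_S5;-0,098;0,12\n"
    "0.1_S4;-0,099;0,122\n"
)

parsed = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",")
naive = pd.read_csv(io.StringIO(raw_text), sep=";")  # decimal mark not specified

print(parsed.dtypes)  # concentration columns are parsed as floats
print(naive.dtypes)   # concentration columns remain text
```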


In [5]:
# Create a new table that contains only the relevant information (sample identifier and the four
# cation concentrations), which is easier to inspect and to process afterwards

cation_parameters = cation_raw.loc[:, ['ident', 'sodium_concentration', 'potassium_concentration', 'calcium_concentration', 'magnesium_concentration']]
pd.set_option('display.max_rows', None)  # display all rows
cation_parameters

|   | ident | sodium_concentration | potassium_concentration | calcium_concentration | magnesium_concentration |
|---|---|---|---|---|---|
| 0 | 0.1_S5 | -0.098 | 0.12 | 0.033 | 0.114 |
| 1 | 0.1_S4 | -0.099 | 0.122 | 0.039 | 0.108 |
| 2 | 0.1_S3 | -0.097 | 0.122 | 0.021 | 0.1 |
| 3 | 0.1_S2 | -0.099 | 0.128 | 0.014 | 0.099 |
| 4 | 0.1_S1 | -0.118 | 0.126 | 0.011 | 0.092 |


### 4. Statistical analysis

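The Shapiro-Wilk test is well suited to sample sizes between 3 and 50, although with only five measurements per ion its power to detect deviations from normality is limited.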

In [7]:
# Loop over the cations, run the Shapiro-Wilk test on each concentration column, and store the results
# as one-row tables in a list; the list is then concatenated into a single summary table

from scipy import stats

list_cations = ['sodium', 'potassium', 'calcium', 'magnesium']
tbl_list = []

for cation in list_cations:

    sw_cation = stats.shapiro(cation_parameters[cation + '_concentration'])
    tbl_cation = pd.DataFrame({'Statistic': [sw_cation.statistic], 'P-value': [sw_cation.pvalue]}, index=[cation])

    tbl_list.append(tbl_cation)


table_sw = pd.concat(tbl_list)

# show p-values below 0.05 in red (Styler.applymap is renamed Styler.map in pandas >= 2.1)
table_sw.style.applymap(lambda x: 'color: red' if x < 0.05 else '', subset=['P-value'])

|   | Statistic | P-value |
|---|---|---|
| sodium | 0.640424 | 0.002046 |
| potassium | 0.913669 | 0.48995 |
| calcium | 0.922569 | 0.546648 |
| magnesium | 0.968574 | 0.866022 |
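As a standalone cross-check of the table above, the sketch below runs the test on the five concentrations per ion copied from the earlier output and applies an explicit decision rule at a significance level of 0.05 (the same threshold used for the red highlighting). The `summary` table and its `reject_normality` column are illustrative names, not part of the original notebook.

```python
import pandas as pd
from scipy import stats

# Concentrations (mg/L) copied from the cation_parameters output, samples 0.1_S5 ... 0.1_S1
data = {
    "sodium": [-0.098, -0.099, -0.097, -0.099, -0.118],
    "potassium": [0.120, 0.122, 0.122, 0.128, 0.126],
    "calcium": [0.033, 0.039, 0.021, 0.014, 0.011],
    "magnesium": [0.114, 0.108, 0.100, 0.099, 0.092],
}
ALPHA = 0.05  # significance level

rows = []
for ion, values in data.items():
    res = stats.shapiro(values)
    # H0: the data are normally distributed; reject H0 when p < ALPHA
    rows.append({"ion": ion, "W": res.statistic, "p": res.pvalue, "reject_normality": res.pvalue < ALPHA})

summary = pd.DataFrame(rows).set_index("ion")
print(summary)
```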


### 5. Conclusion

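The null hypothesis of the Shapiro-Wilk test is that the data are normally distributed; the alternative hypothesis is that they are not.

At a significance level of 0.05, the null hypothesis cannot be rejected for potassium, calcium, and magnesium (p > 0.05), whereas it is rejected for sodium (p = 0.002). Failing to reject the null hypothesis does not prove normality, but it means that the potassium, calcium, and magnesium data are consistent with a normal distribution, while the sodium data are not.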

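Remark: in this experiment, the calibration curve for sodium was not ideal, and the concentrations calculated from the peak areas were negative (< 0). In addition, one sodium value (-0.118 mg/L, sample 0.1_S1) is noticeably lower than the other four (-0.097 to -0.099 mg/L). Both factors may have influenced the assessment of normality for sodium.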
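One way to check which sodium value stands out is the modified z-score of Iglewicz and Hoaglin, which is based on the median and the median absolute deviation (MAD) so that a single extreme value does not inflate the spread estimate. This check is not part of the original analysis; the cutoff of 3.5 is a commonly used convention, and the values are the sodium concentrations from the tables above.

```python
import numpy as np

# Sodium concentrations (mg/L) from the tables above, in the order the samples are listed
idents = ["0.1_S5", "0.1_S4", "0.1_S3", "0.1_S2", "0.1_S1"]
sodium = np.array([-0.098, -0.099, -0.097, -0.099, -0.118])

median = np.median(sodium)
mad = np.median(np.abs(sodium - median))  # median absolute deviation

# Modified z-score: MAD / 0.6745 estimates the standard deviation for normally distributed data
modified_z = 0.6745 * (sodium - median) / mad

for ident, z in zip(idents, modified_z):
    note = "  <- possible outlier (|z| > 3.5)" if abs(z) > 3.5 else ""
    print(f"{ident}: {z:+.2f}{note}")
```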

### 6. LOD/LOQ determination

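In the code below, the LOD and LOQ are calculated from the standard deviation of the five replicate measurements of each ion. The standard deviation used is the population standard deviation (`ddof=0`, the NumPy default), and the results are in mg/L, the unit of the measured concentrations.

The LOD is calculated as 3 times the standard deviation.

The LOQ is calculated as 10 times the standard deviation.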

In [8]:
# Removing the 'ident' column

cation_numbers = cation_parameters.drop(labels='ident', axis=1)
cation_numbers

|   | sodium_concentration | potassium_concentration | calcium_concentration | magnesium_concentration |
|---|---|---|---|---|
| 0 | -0.098 | 0.12 | 0.033 | 0.114 |
| 1 | -0.099 | 0.122 | 0.039 | 0.108 |
| 2 | -0.097 | 0.122 | 0.021 | 0.1 |
| 3 | -0.099 | 0.128 | 0.014 | 0.099 |
| 4 | -0.118 | 0.126 | 0.011 | 0.092 |


In [9]:
# Calculate the LOD (3 x SD) and LOQ (10 x SD) for each cation and collect them in one table
lo_list = []

for cation in list_cations:

    # population standard deviation (ddof=0, the NumPy default), which reproduces the results below
    sd_cation = cation_numbers[cation + '_concentration'].std(ddof=0)

    tbl_lo = pd.DataFrame({'LOD': [3 * sd_cation], 'LOQ': [10 * sd_cation]}, index=[cation])

    lo_list.append(tbl_lo)

table_lo = pd.concat(lo_list)
table_lo



|   | LOD | LOQ |
|---|---|---|
| sodium | 0.023804 | 0.079347 |
| potassium | 0.008818 | 0.029394 |
| calcium | 0.0324 | 0.108 |
| magnesium | 0.022895 | 0.076315 |
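The LOD and LOQ depend on which standard deviation is used. The values above are based on the population standard deviation (`ddof=0`), whereas the sample standard deviation (`ddof=1`) is often used for replicate measurements; with five replicates the latter is larger by a factor of √(5/4) ≈ 1.118. The sketch below compares both for the potassium values; which convention is appropriate depends on the guideline being followed.

```python
import numpy as np

# Potassium concentrations (mg/L) of the five replicate measurements of the 0.1 mg/L standard
potassium = np.array([0.120, 0.122, 0.122, 0.128, 0.126])

for ddof, label in [(0, "population SD (ddof=0)"), (1, "sample SD (ddof=1)")]:
    s = np.std(potassium, ddof=ddof)
    print(f"{label}: s = {s:.6f} mg/L, LOD = {3 * s:.6f} mg/L, LOQ = {10 * s:.6f} mg/L")
```

The `ddof=0` line reproduces the potassium row of the table above (LOD ≈ 0.008818 mg/L, LOQ ≈ 0.029394 mg/L).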


### 7. References

1. Shrivastava, A., & Gupta, V. B. (2011). Methods for the determination of limit of detection and limit of quantitation of the analytical methods. *Chronicles of Young Scientists*, 2(1), 21-25.

### 8. Dependencies

Below, the versions of the software, hardware, and Python packages used are displayed, along with a timestamp generated with the *watermark* extension.

In [10]:
# If watermark is not installed yet, install it first, e.g. with '%pip install watermark' in a notebook cell
# or 'pip install watermark' in the Anaconda PowerShell Prompt

%load_ext watermark

# Python, IPython, package versions and machine characteristics
%watermark -v -m -p wget,pandas,numpy,watermark

# date
print(" ")
%watermark -u -n -t -z



Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.19.0

wget     : 3.2
pandas   : 1.1.3
numpy    : 1.19.2
watermark: 2.3.0

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
CPU cores   : 8
Architecture: 64bit

 
Last updated: Thu Jan 27 2022 09:44:56Romance Standard Time
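If the *watermark* extension is not available, a comparable (though less detailed) record can be produced with the standard library alone, as sketched below. The helper `package_versions` is hypothetical; `scipy`, which is used for the Shapiro-Wilk test but is not in the `%watermark -p` list above, is included in the list of packages here.

```python
import platform
from importlib import metadata  # available from Python 3.8


def package_versions(names):
    """Hypothetical helper: map each distribution name to its installed version (None if missing)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", platform.python_version(), "|", platform.system(), platform.release(), "|", platform.machine())
for name, version in package_versions(["wget", "pandas", "numpy", "scipy", "watermark"]).items():
    print(f"{name:9}: {version or 'not installed'}")
```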

