name: lcms_data_processing
date: 08/20/2024
version: 1.0
author: Justin Sankey

description: Takes raw liquid chromatography mass spectrometry (LCMS) data, 
extracts relevant parameters for analysis and writes them to Excel.

The first time you run the notebook, you need to install all required Python packages.
To do so, type the following commands in your Python console or Anaconda prompt:
- pip install pandas
- pip install numpy
- pip install matplotlib
- pip install seaborn
- pip install scikit-learn
- pip install jinja2
- pip install openpyxl

In [13]:
# import all needed packages
import os
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from openpyxl import load_workbook
from openpyxl.drawing.image import Image
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

This is the only cell that you should have to edit.
Enter your desired input and output file paths 
and change the limits of what you deem to be an acceptable recovery range.
*Replaced the tab-separated (\t) .txt input with a comma-separated .csv input.*
*Should we implement both options? Which one will be used?*
*New: indicate the directory to save plots to.*


In [27]:
# raw data upload file path
raw_filepath = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Raw_Data\PFAS_ACF_Batch1.csv'
# file path for IDL and IQL data
IDL_IQL_filepath = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\ACF\IDL_IQL.csv'

# processed data output file path
processed_filepath =r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Processed_Data\PFAS_ACF_Batch1_processed.xlsx'

# file path to write QCS0 data to 
qcs0_filepath = r'C:\Users\jhsan\OneDrive\Desktop\QCS0_area_values\QCS0_area_values.csv' 

# directory to save plots to
plot_directory = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Processed_Data\Plots'

#color-coding for recoveries table
in_range = 'background-color: green'
in_range_min_val = 0.6 
in_range_max_val = 1.4
out_range = 'background-color: red'
out_range_min_val = 0.4 
out_range_max_val = 1.8
question_range = 'background-color: yellow'

In [15]:
# Load data file and remove calibration data
data = pd.read_csv(raw_filepath, delimiter=',', low_memory=False, header=0,)
data_calibration_excluded = data[(data['Sample Type'] != 'Standard')].copy()
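
The open question of supporting both the tab-separated .txt export and the comma-separated .csv export could be handled by choosing the separator from the file extension. The following is a minimal sketch, not part of the notebook: the helper name `read_raw_table` and its `filename` parameter are hypothetical, and it assumes that .txt files are always tab-separated.

```python
import io
import os

import pandas as pd


def read_raw_table(path_or_buffer, filename=None):
    """Read an export, picking the separator from the file extension."""
    # filename lets the caller state the extension when a buffer is passed
    name = filename if filename is not None else str(path_or_buffer)
    extension = os.path.splitext(name)[1].lower()
    # Assumption: .txt exports are tab-separated, everything else uses commas
    separator = '\t' if extension == '.txt' else ','
    return pd.read_csv(path_or_buffer, sep=separator, header=0, low_memory=False)


# Toy usage with in-memory text instead of real instrument files
csv_text = "Sample Name,Sample Type,Area\nS1,Unknown,10\nCS1,Standard,20\n"
txt_text = csv_text.replace(',', '\t')
from_csv = read_raw_table(io.StringIO(csv_text), filename='batch.csv')
from_txt = read_raw_table(io.StringIO(txt_text), filename='batch.txt')
```

With a real file, the call would simply be `read_raw_table(raw_filepath)`.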

In [16]:
# extract values of isotope dilution analysis (IDA) and save areas of intensity peaks to the variable ida_area
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area']
ida_area = data_calibration_excluded[selected_columns_area]
ida_area = ida_area[ida_area['Component Name'].str.contains('IDA')].copy()
ida_area.loc[:,'Sample Name Date'] = ida_area['Sample Name'].astype(str) + "_" + ida_area['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
ida_area_piv = ida_area.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area', aggfunc='first')
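
To illustrate what the pivot step does, here is a small sketch on made-up data (the sample and component names are invented for the example). It turns a long table with one row per sample and component into a wide table with one row per sample, and it shows that `aggfunc='first'` keeps the first value when a sample/component pair occurs more than once.

```python
import pandas as pd

# Long format: one row per (sample, component); the last row is a duplicate pair
long_table = pd.DataFrame({
    'Sample Name Date': ['S1_t1', 'S1_t1', 'S2_t2', 'S2_t2', 'S1_t1'],
    'Component Name': ['IDA-PFOA', 'IDA-PFOS', 'IDA-PFOA', 'IDA-PFOS', 'IDA-PFOA'],
    'Area': [100.0, 200.0, 110.0, 190.0, 999.0],
})

# Wide format: samples as rows, components as columns, areas as values
wide_table = long_table.pivot_table(
    index='Sample Name Date', columns='Component Name', values='Area', aggfunc='first'
)
```

The duplicated S1_t1/IDA-PFOA entry (999.0) is silently dropped in favour of the first one (100.0), which is worth keeping in mind if a sample is injected twice with the same acquisition time.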

In [17]:
# extract the 'Area IDA' values of all components that are not IDA, IPS or 13C entries
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IDA']
area_ida = data_calibration_excluded[selected_columns_area]
# .copy() avoids a SettingWithCopyWarning when the new column is added below
area_ida = area_ida[~area_ida['Component Name'].str.contains('IDA|IPS|13C')].copy()
area_ida.loc[:,'Sample Name Date'] = area_ida['Sample Name'].astype(str) + "_" + area_ida['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ida_piv = area_ida.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IDA', aggfunc='first')
area_ida_piv

Component Name,10:2 FTS_TOF MS,11ClPF3OUdS,11ClPF3OUdS_TOF MS,"3,6-OPFHpA","3,6-OPFHpA_TOF",4:2 FTS,4:2 FTS_TOF MS,5:3 FTCA,5:3 FTCA_TOF MS,6:2 FTS,...,PFTeDA,PFTeDA_TOF MS,PFTrDA,PFTrDA_TOF MS,PFUdA,PFUdA_TOF MS,d-EtFOSA_TOF MS,d-MeFOSA_TOF MS,d3-MeFOSAA_TOF MS,d5-EtFOSAA_TOF MS
Sample Name Date
230620_MS_1_6/30/2023 5:18,,668.949417,3187.583423,29367.55081,87904.05234,6392.437,38993.87,,103.649686,6234.813296,...,176.36001,103.161127,463.8541,370.2964,4127.98,6269.475,377.667361,1762.958361,24801.32935,11437.61139
230620_PB_1_6/29/2023 16:01,,,,,,,,,61.966066,,...,379.256425,,312.6663,,115.8729,,,428.453234,14214.39752,8300.091708
230622_MS_2_6/30/2023 5:35,,,2067.545284,,642.86288,,138.4622,,205.927383,56.450401,...,215.845216,100.245854,509.9297,,529.6089,993.9493,8522.3495,9390.586734,1895.895411,1715.931437
230622_MS_3_6/30/2023 5:51,,752.67058,2674.999697,37922.27233,112262.1071,8555.154,51594.52,,189.189135,9388.797708,...,188.241276,115.75965,613.6924,59.61742,4648.688,7328.056,411.020922,752.392698,27259.17911,18167.62579
230622_PB_2_6/29/2023 16:18,,,,,,1048.382,5726.77,,65.298456,416.777914,...,2752.825257,150.513087,1667.016,,340.2505,,626.739181,,48193.94705,12673.6364
230622_PB_3_6/29/2023 16:34,,,,,,,,,,18.735062,...,427.968137,,502.6732,78.48192,87.49975,,,,6144.337081,3225.10154
ACN_1_6/30/2023 4:45,,,,,,17758.86,111634.5,,344.331262,4863.702485,...,2674.832063,4250.488214,46837.51,26416.14,288180.1,193813.6,455.940204,183.66012,19644.176,17427.01443
ACN_2_6/30/2023 5:02,434.156268,,,456.523255,1371.024817,66138.31,446555.8,10.47767,2325.37074,11457.8692,...,35882.44963,49442.39049,153996.4,56170.79,329382.3,422299.9,646.882559,,31953.40986,19163.39828
ACN_F1_6/30/2023 1:26,,,,1778.306806,7968.562814,378836.3,1359948.0,,183.242033,67834.80765,...,57183.73459,58223.33964,771469.5,681909.6,1428667.0,1967390.0,297.355785,,35307.89478,11325.86466
ACN_F2_6/30/2023 1:42,,,,2720.619298,7594.118666,333554.5,1440386.0,12.636901,4201.91302,61951.00523,...,103743.9704,69198.26335,872327.6,897621.4,1257836.0,2208344.0,1266.247699,2075.613785,65105.4225,27797.99545


In [18]:
# extract the 'Area IPS' values of all components that are not IDA, IPS or 13C entries
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IPS']
area_ips = data_calibration_excluded[selected_columns_area]
# .copy() avoids a SettingWithCopyWarning when the new column is added below
area_ips = area_ips[~area_ips['Component Name'].str.contains('IDA|IPS|13C')].copy()
area_ips.loc[:,'Sample Name Date'] = area_ips['Sample Name'].astype(str) + "_" + area_ips['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ips_piv = area_ips.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IPS', aggfunc='first')
area_ips_piv

Component Name,11ClPF3OUdS,11ClPF3OUdS_TOF MS,"3,6-OPFHpA","3,6-OPFHpA_TOF",4:2 FTS,4:2 FTS_TOF MS,5:3 FTCA,5:3 FTCA_TOF MS,6:2 FTS,6:2 FTS_TOF MS,...,PFTeDA,PFTeDA_TOF MS,PFTrDA,PFTrDA_TOF MS,PFUdA,PFUdA_TOF MS,d-EtFOSA_TOF MS,d-MeFOSA_TOF MS,d3-MeFOSAA_TOF MS,d5-EtFOSAA_TOF MS
Sample Name Date
230620_MS_1_6/30/2023 5:18,668.949417,668.949417,29367.55081,29367.55081,6392.437,6392.437,,,6234.813296,6234.813296,...,176.36001,176.36001,463.8541,463.8541,4127.98,4127.98,,,2356.197525,
230620_PB_1_6/29/2023 16:01,,,,,,,,,,,...,379.256425,379.256425,312.6663,312.6663,115.8729,115.8729,,,,
230622_MS_2_6/30/2023 5:35,,,,,,,,,56.450401,56.450401,...,215.845216,215.845216,509.9297,509.9297,529.6089,529.6089,133.476217,102.053077,,
230622_MS_3_6/30/2023 5:51,752.67058,752.67058,37922.27233,37922.27233,8555.154,8555.154,,,9388.797708,9388.797708,...,188.241276,188.241276,613.6924,613.6924,4648.688,4648.688,,,4059.086725,670.784276
230622_PB_2_6/29/2023 16:18,,,,,1048.382,1048.382,,,416.777914,416.777914,...,2752.825257,2752.825257,1667.016,1667.016,340.2505,340.2505,,,,
230622_PB_3_6/29/2023 16:34,,,,,,,,,18.735062,18.735062,...,427.968137,427.968137,502.6732,502.6732,87.49975,87.49975,,,,
ACN_1_6/30/2023 4:45,,,,,17758.86,17758.86,,,4863.702485,4863.702485,...,2674.832063,2674.832063,46837.51,46837.51,288180.1,288180.1,,348.510596,,
ACN_2_6/30/2023 5:02,,,456.523255,456.523255,66138.31,66138.31,10.47767,10.47767,11457.8692,11457.8692,...,35882.44963,35882.44963,153996.4,153996.4,329382.3,329382.3,226.245278,602.131027,,
ACN_F1_6/30/2023 1:26,,,1778.306806,1778.306806,378836.3,378836.3,,,67834.80765,67834.80765,...,57183.73459,57183.73459,771469.5,771469.5,1428667.0,1428667.0,,699.36786,,
ACN_F2_6/30/2023 1:42,,,2720.619298,2720.619298,333554.5,333554.5,12.636901,12.636901,61951.00523,61951.00523,...,103743.9704,103743.9704,872327.6,872327.6,1257836.0,1257836.0,329.293335,1250.57135,,


In [19]:
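#IPS normalized recoveries
ips_norm_recovery = area_ida_piv/area_ips_piv

# color code recoveries
def color_map(val):
    # leave empty cells (NaN) unstyled; without this check they would fall
    # through to the else branch and be marked as questionable
    if pd.isna(val):
        return ''
    if in_range_min_val <= val <= in_range_max_val:
        return in_range
    elif val < out_range_min_val or val > out_range_max_val:
        return out_range
    else:
        return question_range

# Apply the style function to every cell of the DataFrame
# (pandas >= 2.1 offers Styler.map as the replacement for the deprecated Styler.applymap)
styled_ips_norm_recovery = ips_norm_recovery.style.applymap(color_map)
styled_ips_norm_recovery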

Component Name,10:2 FTS_TOF MS,11ClPF3OUdS,11ClPF3OUdS_TOF MS,"3,6-OPFHpA","3,6-OPFHpA_TOF",4:2 FTS,4:2 FTS_TOF MS,5:3 FTCA,5:3 FTCA_TOF MS,6:2 FTS,6:2 FTS_TOF MS,7:3 FTCA,7:3 FTCA_TOF MS,8:2 FTS,8:2 FTS_TOF MS,9:3 FTCA_TOF MS,9ClPF3ONS,9ClPF3ONS_TOF MS,DONA,DONA_TOF MS,EtFOSA,EtFOSA_TOF MS,FBSA,FBSA_TOF MS,FHxSA,FHxSA_TOF MS,FOSA,FOSA_TOF MS,FPeSA,FPeSA_TOF MS,HFPO-DA,HFPO-DA_TOF MS,MeFBSA,MeFBSA_TOF MS,MeFOSA,MeFOSA_TOF MS,N-EtFOSAA,N-EtFOSAA_TOF MS,N-EtFOSAA_branched,N-EtFOSAA_branched_TOF MS,N-MeFOSAA,N-MeFOSAA_TOF MS,N-MeFOSAA_branched,N-MeFOSAA_branched_TOF MS,PF4OPeA,PF4OPeA_TOF,PF5OHxA,PF5OHxA_TOF,PFBA,PFBA_TOF MS,PFBS,PFBS_TOF MS,PFDA,PFDA_TOF MS,PFDS,PFDS_TOF MS,PFDoA,PFDoA_TOF MS,PFEA,PFEA_TOF MS,PFECHS,PFECHS_TOF MS,PFEESA_TOF,PFHpA,PFHpA_TOF MS,PFHpS,PFHpS_TOF MS,PFHxA,PFHxA_TOF MS,PFHxS,PFHxS_TOF MS,PFHxS_branched,PFHxS_branched_TOF MS,PFNA,PFNA_TOF MS,PFNS,PFNS_TOF MS,PFOA,PFOA_TOF MS,PFOS,PFOS_TOF MS,PFOS_branched,PFOS_branched_TOF MS,PFPeA,PFPeA_TOF MS,PFPeS,PFPeS_TOF MS,PFPrSA_TOF MS,PFTeDA,PFTeDA_TOF MS,PFTrDA,PFTrDA_TOF MS,PFUdA,PFUdA_TOF MS,d-EtFOSA_TOF MS,d-MeFOSA_TOF MS,d3-MeFOSAA_TOF MS,d5-EtFOSAA_TOF MS
Sample Name Date
230620_MS_1_6/30/2023 5:18,,1.0,4.765059,1.0,2.993237,1.0,6.1,,,1.0,6.167332,1.0,4.357471,1.0,6.122152,,1.0,4.973258,1.0,2.350445,,,1.0,6.345116,1.0,9.449025,1.0,7.98573,,,1.0,1.684553,1.0,27.902283,,,,,,,1.0,2.767408,0.883696,3.629983,1.0,1.218637,,,1.0,1.894016,1.0,12.869275,1.0,1.988476,1.0,10.287587,1.0,1.774233,1.0,4.136538,1.0,17.011465,,1.0,1.788671,1.0,19.71299,1.0,1.741959,1.0,16.967676,0.235075,2.227979,1.0,1.687434,1.0,19.585121,1.0,2.019125,1.0,14.827661,0.403837,0.901085,1.0,2.411251,1.0,15.220727,,1.0,0.584946,1.0,0.798304,1.0,1.518775,,,10.525998,
230620_PB_1_6/29/2023 16:01,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,29.137523,,,,,,,,,,,,,,,,,,,,,1.0,2.06471,,,1.0,1.200507,,,1.0,2.916772,1.0,1.753212,,,,1.0,1.459472,1.0,2.558851,1.0,1.61151,,,,,1.0,0.175554,1.0,0.999488,1.0,1.941866,,,,,1.0,1.633514,,,,1.0,,1.0,,1.0,,,,,
230622_MS_2_6/30/2023 5:35,,,,,,,,,,1.0,3.29384,1.0,3.615623,,,,1.0,3.285073,1.0,2.022909,1.0,16.050937,1.0,4.062766,1.0,7.675823,1.0,6.720927,,,,,,,1.0,13.820337,,,,,,,,,,,,,1.0,2.124061,1.0,11.260691,1.0,2.154838,1.0,11.206529,1.0,0.961784,1.0,2.454612,,,,1.0,1.718559,1.0,12.231955,1.0,1.074463,1.0,12.024246,0.154091,1.669364,1.0,0.623476,1.0,17.422978,1.0,1.855107,1.0,5.07818,0.259115,0.609063,1.0,2.217044,1.0,13.361685,,1.0,0.464434,1.0,,1.0,1.876761,63.849199,92.016694,,
230622_MS_3_6/30/2023 5:51,,1.0,3.554011,1.0,2.960321,1.0,6.030811,,,1.0,5.357943,1.0,4.084506,1.0,5.599727,,1.0,3.917943,1.0,2.208184,,,1.0,6.060327,1.0,10.449135,1.0,8.309177,,,1.0,1.556697,,,,,1.0,9.565474,1.002778,1.518231,1.0,1.748963,0.084443,0.591831,1.0,1.225544,,,1.0,1.758845,1.0,13.355157,1.0,1.97829,1.0,11.778201,1.0,0.526703,1.0,5.203798,1.0,18.33176,,1.0,1.710648,1.0,18.341381,1.0,1.732123,1.0,15.932426,0.273179,2.463201,1.0,1.530215,1.0,16.79682,1.0,2.037899,1.0,14.159378,0.411241,0.854248,1.0,2.552269,1.0,16.7279,,1.0,0.614954,1.0,0.097145,1.0,1.576371,,,6.715594,27.084156
230622_PB_2_6/29/2023 16:18,,,,,,1.0,5.462486,,,1.0,2.754348,,,,,,,,1.0,1.263896,,,1.0,8.336985,1.0,10.152938,1.0,10.126133,1.0,1.825062,1.0,3.19636,,,,,,,,,,,,,,,,,1.0,0.974594,1.0,12.00214,1.0,1.648992,1.0,,1.0,,1.0,33.429376,,,,1.0,1.954429,1.0,16.883138,1.0,1.719492,1.0,14.372477,0.229228,1.927057,1.0,1.222132,1.0,0.365279,1.0,2.203252,1.0,8.340558,0.535257,1.063072,1.0,2.692306,1.0,9.430975,,1.0,0.054676,1.0,,1.0,,,,,
230622_PB_3_6/29/2023 16:34,,,,,,,,,,1.0,1.368603,,,,,,,,,,,,,,1.0,2.772027,1.0,5.228044,,,,,,,,,,,,,,,,,,,,,1.0,1.538629,,,1.0,1.566911,,,1.0,0.930389,1.0,36.381548,,,,1.0,1.518714,1.0,7.472788,1.0,1.89022,1.0,11.84596,0.68169,1.025727,1.0,1.227825,,,1.0,1.741659,1.0,26.811795,2.544459,0.615204,1.0,1.900826,,,,1.0,,1.0,0.156129,1.0,,,,,
ACN_1_6/30/2023 4:45,,,,,,1.0,6.286131,,,1.0,1.909702,,,1.0,3.657444,,,,,,,,1.0,5.43119,1.0,5.966098,1.0,7.178309,1.0,3.533152,1.0,1.103389,,,1.0,,,,,,,,,,1.0,1.233849,,,1.0,1.002303,1.0,14.450571,1.0,1.323543,,,1.0,0.213981,1.0,13.226458,,,,1.0,1.333379,1.0,12.817832,1.0,1.160939,1.0,9.927629,0.176427,2.475915,1.0,1.083504,1.0,5.121494,1.0,1.047008,1.0,9.037233,0.324811,0.766571,1.0,1.419324,1.0,11.476279,,1.0,1.589067,1.0,0.563995,1.0,0.672544,,0.526986,,
ACN_2_6/30/2023 5:02,,,,1.0,3.003187,1.0,6.751847,1.0,221.935857,1.0,1.339878,,,1.0,2.861025,,,,,,1.0,17.477689,1.0,2.987066,1.0,4.109938,1.0,4.376683,1.0,3.788933,1.0,0.530221,,,1.0,3.960695,,,,,,,,,1.0,1.124926,,,1.0,1.055816,1.0,11.63711,1.0,1.449728,,,1.0,0.54347,,,,,,1.0,2.00392,1.0,11.116408,1.0,1.938514,1.0,7.159334,0.203903,2.362496,1.0,1.596816,1.0,6.261307,1.0,1.095885,1.0,8.675779,0.417662,0.970017,1.0,1.829213,1.0,14.836615,,1.0,1.377899,1.0,0.364754,1.0,1.282097,2.859209,,,
ACN_F1_6/30/2023 1:26,,,,1.0,4.480983,1.0,3.589804,,,1.0,1.543207,,,1.0,1.770088,,,,,,,,1.0,3.003771,1.0,4.248204,1.0,2.985507,1.0,3.862762,1.0,0.446061,,,1.0,6.032816,,,,,,,,,1.0,1.077605,1.0,2.467163,1.0,1.137392,1.0,5.899918,1.0,2.764918,1.0,0.791278,1.0,1.393335,1.0,1.880586,1.0,10.545324,,1.0,4.112686,1.0,6.092441,1.0,3.020904,1.0,4.497339,0.296912,1.398535,1.0,3.491785,1.0,6.417252,1.0,1.831714,1.0,7.724487,0.83462,0.697495,1.0,2.222781,1.0,11.808209,,1.0,1.01818,1.0,0.88391,1.0,1.377081,,,,
ACN_F2_6/30/2023 1:42,,,,1.0,2.79132,1.0,4.318292,1.0,332.511345,1.0,1.888902,1.0,66.759278,1.0,1.826866,,,,,,1.0,2.711515,1.0,2.844236,1.0,5.456384,1.0,4.273301,1.0,4.225505,1.0,0.436561,,,1.0,9.896288,,,,,,,,,1.0,1.182828,,,1.0,1.107988,1.0,6.762666,1.0,2.854175,1.0,1.644594,1.0,1.592093,,,1.0,15.819716,,1.0,4.103843,1.0,7.729959,1.0,2.811263,1.0,5.209646,0.285383,1.824532,1.0,3.713675,1.0,7.311752,1.0,2.222657,1.0,9.239589,0.754743,0.865857,1.0,2.074775,1.0,14.011835,,1.0,0.66701,1.0,1.028996,1.0,1.755669,3.845349,1.659732,,


In [20]:
# This cell extracts the quality control standard (QCS0) data from the isotope dilution analysis (IDA) areas
# and appends it to an existing QCS0 file, whose path is set in the second code cell
# (*qcs0_filepath*).

# Filter rows where 'Sample Name' contains 'QCS0'
qcs0_samples = ida_area_piv.reset_index()
qcs0_samples = qcs0_samples[qcs0_samples['Sample Name Date'].str.contains('QCS0')]

# Append the filtered DataFrame to an existing CSV file
if os.path.exists(qcs0_filepath):
    # Load the existing data
    qsc0_existing_data = pd.read_csv(qcs0_filepath)
    
    # Combine existing data with new data, avoiding duplicates
    qsc0_combined_data = pd.concat([qsc0_existing_data, qcs0_samples]).drop_duplicates(subset=['Sample Name Date'])
    
    # Overwrite the CSV with the combined data (the header row is written once)
    qsc0_combined_data.to_csv(qcs0_filepath, index=False)
else:
    # If the file doesn't exist, write the data with headers
    qcs0_samples.to_csv(qcs0_filepath, index=False)
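
The combine-and-deduplicate step can be checked on a toy example. This sketch uses invented QCS0 rows and in-memory DataFrames instead of the CSV file; it shows that when the same 'Sample Name Date' appears in both tables, the row already stored in the file wins, because `drop_duplicates` keeps the first occurrence by default.

```python
import pandas as pd

# Rows already stored in the QCS0 file (made-up values)
existing = pd.DataFrame({
    'Sample Name Date': ['QCS0_a', 'QCS0_b'],
    'IDA-PFOA': [100.0, 105.0],
})
# Rows from the current batch; QCS0_b is already known
new = pd.DataFrame({
    'Sample Name Date': ['QCS0_b', 'QCS0_c'],
    'IDA-PFOA': [999.0, 110.0],
})

combined = (
    pd.concat([existing, new], ignore_index=True)
    .drop_duplicates(subset=['Sample Name Date'], keep='first')
)
```

Re-running the notebook on the same batch therefore does not add the QCS0 samples twice, but it also does not update a stored value if the raw data were reprocessed.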

Computation of recovery rates: for each IDA component, the peak area measured in a sample is divided by the average area of that component over all stored QCS0 values.

In [21]:
# Calculate recoveries
# Average QCS0 from saved csv file
qsc0_combined_data=pd.read_csv(qcs0_filepath, header=0, low_memory=False)
qsc0_combined_data.drop(columns=['Sample Name Date'], inplace=True)
# numeric_only=True skips any non-numeric column instead of raising an error
qsc0_avg = qsc0_combined_data.mean(numeric_only=True)

# Generate base of recovery table by copying the area pivot data
recovery = ida_area_piv.copy()

# Recovery calculation (sample area/avg QCS0 area)
for index, row in ida_area_piv.iterrows():
    for col in ida_area_piv.columns:
        if ida_area_piv[col].dtype in ['float64', 'int64']:
            recovery.at[index, col] = row[col] / qsc0_avg[col]
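
The same recovery calculation can be written without explicit loops. The following is a sketch on made-up numbers, not a drop-in replacement: `areas` stands in for the area pivot table and `qcs0_mean` for the QCS0 averages, and both names are hypothetical.

```python
import pandas as pd

# Peak areas per sample (rows) and IDA component (columns), invented values
areas = pd.DataFrame(
    {'IDA-PFOA': [50.0, 120.0], 'IDA-PFOS': [200.0, 100.0]},
    index=['S1', 'S2'],
)
# Average QCS0 area per component
qcs0_mean = pd.Series({'IDA-PFOA': 100.0, 'IDA-PFOS': 200.0})

# Divide every column by the matching QCS0 average (aligned on column name)
recovery_sketch = areas.div(qcs0_mean, axis='columns')
```

One difference to the loop version: a component that is missing from the QCS0 averages yields NaN here instead of raising a KeyError.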

In [22]:
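# color code recoveries
def color_map(val):
    # leave empty cells (NaN) unstyled; without this check they would fall
    # through to the else branch and be marked as questionable
    if pd.isna(val):
        return ''
    if in_range_min_val <= val <= in_range_max_val:
        return in_range
    elif val < out_range_min_val or val > out_range_max_val:
        return out_range
    else:
        return question_range

# Apply the style function to every cell of the DataFrame
# (pandas >= 2.1 offers Styler.map as the replacement for the deprecated Styler.applymap)
styled_recovery = recovery.style.applymap(color_map)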
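
How the thresholds from the settings cell translate into colors can be seen in a small stand-alone sketch. It uses the default limits (0.6 to 1.4 in range, below 0.4 or above 1.8 out of range) and returns plain labels instead of CSS strings. The function name `classify_recovery` and the separate 'missing' label for NaN are assumptions made for this example.

```python
import math

IN_MIN, IN_MAX = 0.6, 1.4    # acceptable recovery range (green)
OUT_MIN, OUT_MAX = 0.4, 1.8  # outside these limits the recovery is rejected (red)


def classify_recovery(value):
    """Return a color label for one recovery value."""
    if isinstance(value, float) and math.isnan(value):
        return 'missing'  # assumption: empty cells get their own label
    if IN_MIN <= value <= IN_MAX:
        return 'green'
    if value < OUT_MIN or value > OUT_MAX:
        return 'red'
    # 0.4 <= value < 0.6 or 1.4 < value <= 1.8
    return 'yellow'


labels = {v: classify_recovery(v) for v in (1.0, 0.5, 1.6, 0.2, 2.5)}
```

The yellow band is therefore the two gaps between the inner and outer limits.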

Compute method detection limits (MDL) based on the average and standard deviation of the process blanks (MDL = PB_avg + 3 * PB_stdev). For PFAS compounds that have no process-blank MDL, the IQL value from the IDL/IQL file is used instead.

In [23]:
selected_columns = ['Sample Name', 'Component Name', 'Calculated Concentration']

# Isolate blank data and remove IDA/IPS values
# na=False treats rows without a sample comment as non-blank instead of raising an error
data_isolated_blank = data_calibration_excluded[data_calibration_excluded['Sample Comment'].str.contains('Blank', na=False)]
data_isolated_blank = data_isolated_blank[selected_columns]
data_isolated_blank = data_isolated_blank[~data_isolated_blank['Component Name'].str.contains('IDA|IPS')]

# Create pivot table with Sample name as the index, component name as the column header, and concentration as the value
blank_data_piv = data_isolated_blank.pivot_table(index='Sample Name', columns='Component Name', values='Calculated Concentration', aggfunc='first')

# Isolate the process blank data
process_blank_data = blank_data_piv[blank_data_piv.index.str.contains('PB')]

# replace all <1 point values with NaN values
process_blank_data = process_blank_data.replace("<1 points", np.nan)

# Change any non numeric values to numeric
process_blank_data = process_blank_data.apply(pd.to_numeric, errors='coerce')

# calculate the average and standard deviation of the PB values, excluding NaN values;
# both are computed before the summary rows are appended so that the PB_avg row
# does not enter the standard deviation
process_blank_data_avg = process_blank_data.mean(skipna=True)
process_blank_data_stdev = process_blank_data.std(skipna=True)
process_blank_data.loc['PB_avg'] = process_blank_data_avg
process_blank_data.loc['PB_stdev'] = process_blank_data_stdev

# MDL calculation PB_avg + 3 * PB_stdev
process_blank_data.loc['MDL'] = np.nan_to_num(process_blank_data.loc['PB_avg']) + 3 * np.nan_to_num(process_blank_data.loc['PB_stdev'])
process_blank_data.loc['MDL'] = process_blank_data.loc['MDL'].replace(0, np.nan)

# Load IDL_IQL file
IDL_IQL = pd.read_csv(IDL_IQL_filepath, index_col=0, low_memory=False)

# change all non numeric values to numeric
IDL_IQL = IDL_IQL.apply(pd.to_numeric, errors='coerce')
IQL = IDL_IQL.loc['IQL']  # Series indexed by component name

# replace all NaN values in MDL with the IQL value of the same component
common_columns = process_blank_data.columns.intersection(IQL.index)
process_blank_data.loc['MDL', common_columns] = process_blank_data.loc['MDL', common_columns].fillna(IQL[common_columns])
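
As a worked example of the MDL rule (PB_avg + 3 * PB_stdev, with a fallback to the IQL value), here is a sketch on invented blank concentrations. The names `blanks`, `iql` and `mdl` are hypothetical stand-ins for the notebook's tables.

```python
import numpy as np
import pandas as pd

# Process blank concentrations; PFOS was never detected in the blanks
blanks = pd.DataFrame(
    {'PFOA': [0.10, 0.20, 0.30], 'PFOS': [np.nan, np.nan, np.nan]},
    index=['PB_1', 'PB_2', 'PB_3'],
)
# Fallback values taken from the instrument file (made-up numbers)
iql = pd.Series({'PFOA': 0.05, 'PFOS': 0.50})

# Statistics are taken over the blank rows only; NaN is skipped
pb_avg = blanks.mean()
pb_stdev = blanks.std()  # sample standard deviation (ddof=1)

# A missing average or deviation counts as zero; an MDL of zero means "no blank data"
mdl = pb_avg.fillna(0) + 3 * pb_stdev.fillna(0)
mdl = mdl.replace(0, np.nan).fillna(iql)
```

For PFOA the average is 0.20 and the standard deviation 0.10, so the MDL is 0.20 + 3 * 0.10 = 0.50. PFOS has no blank data and falls back to its IQL value of 0.50.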



Load concentration data and apply the method detection limits (MDL): values below the MDL are replaced with \< MDL.

Remark 1: the resulting table contains a mix of floats and strings.
Remark 2: when the concentration calculation outputs NA, the value is also converted to \< MDL. Is this really what we want to do?

In [24]:
#Load concentration data and exclude blanks
selected_columns = ['Sample Name', 'Component Name', 'Calculated Concentration']
data_blank_excluded = data_calibration_excluded[data_calibration_excluded['Sample Comment'] != 'Blank'][selected_columns].copy()
data_blank_excluded = data_blank_excluded[~data_blank_excluded['Component Name'].str.contains('IDA|IPS')]
data_blank_excluded = data_blank_excluded.pivot_table(
    index='Sample Name', columns='Component Name', values='Calculated Concentration', aggfunc='first'
    )
data_blank_excluded = data_blank_excluded.replace("<1 points", np.nan)

# Align the columns of PB with data_blank_excluded
common_columns = data_blank_excluded.columns.intersection(process_blank_data.columns)
PB_aligned = process_blank_data[common_columns]

# Convert MDL values to numeric to handle both numeric and string types
mdl_values = pd.to_numeric(PB_aligned.loc['MDL'], errors='coerce')

# Use np.where with numeric comparison
data_blank_excluded[common_columns] = np.where(
    data_blank_excluded[common_columns].apply(pd.to_numeric, errors='coerce') < mdl_values.values,
    "<MDL", data_blank_excluded[common_columns]
    )
data_blank_excluded = data_blank_excluded.fillna('<MDL')
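
Regarding Remark 2, one possible alternative is to keep values below the MDL and values without a result distinguishable. The sketch below uses invented concentrations. The label 'no result' and the variable names are assumptions for illustration only, not what the notebook currently does.

```python
import numpy as np
import pandas as pd

# Concentrations per sample and component; NaN means no concentration was calculated
conc = pd.DataFrame(
    {'PFOA': [0.8, 0.1], 'PFOS': [np.nan, 2.0]},
    index=['S1', 'S2'],
)
mdl = pd.Series({'PFOA': 0.5, 'PFOS': 1.0})

# True where a numeric value lies below the MDL of its column (NaN compares as False)
below = conc.lt(mdl, axis='columns')
not_reported = conc.isna()

# Object dtype allows floats and strings in the same table
flagged = conc.astype(object).mask(below, '<MDL').mask(not_reported, 'no result')
```

Here S2/PFOA becomes '<MDL', S1/PFOS becomes 'no result', and the two remaining values are left as numbers.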

Determine linear calibration curve based on calibration data.
*to be discussed: method for R2 computation*

In [25]:
# Function to sanitize file names
def sanitize_filename(name):
    """Replaces characters that are not allowed in file names with underscores."""
    return re.sub(r'[\\/*?:"<>|]', "_", name)

# extract PFAS data
selected_columns = ['Sample Name', 'Component Name', 'Actual Concentration','IS Actual Concentration','Area','IS Area']
data_pfas = data[data['Sample Name'].str.contains('PFAS CS')].copy()
data_pfas = data_pfas.fillna(0)
data_pfas = data_pfas[~data_pfas['Component Name'].str.contains('IDA|IPS|13C|d3|d5|TOF')]
data_pfas['Concentration/IS Concentration'] = data_pfas['Actual Concentration']/data_pfas['IS Actual Concentration']
data_pfas['Area/IS Area'] = data_pfas['Area']/data_pfas['IS Area']

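# Create a list of unique component names and count them.
components = data_pfas['Component Name'].unique()
n_components = len(components)

# make sure the plot directory exists before figures are saved to it
os.makedirs(plot_directory, exist_ok=True)

image_paths = []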
# Iterate over each component and create scatter plots with regression lines
for i, component in enumerate(components):
    component_data = data_pfas[data_pfas['Component Name'] == component]
    
    # Extract x and y values
    x = component_data['Concentration/IS Concentration'].values.reshape(-1, 1)
    y = component_data['Area/IS Area'].values
    
    # Perform linear regression
    model = LinearRegression()
    model.fit(x, y)
    y_pred = model.predict(x)
    r2 = r2_score(y, y_pred)

    # Regression equation
    slope = model.coef_[0]
    intercept = model.intercept_
    equation = f'y = {slope:.2f}x + {intercept:.2f}'

    plt.figure(figsize = (8,6))
    # Plot with Seaborn
    sns.regplot(
        x=x.flatten(), 
        y=y, 
        scatter=True, 
        fit_reg=True,
        line_kws={"color": "red"},  # Color of the regression line
        scatter_kws={"s": 50, "alpha": 0.7},  # Customize scatter points
        ci=95
    )
    # Set the title with the component name
    plt.title(f'{component}')
    plt.xlabel('Concentration/IS Concentration')
    plt.ylabel('Area/IS Area')
    plt.text(0.05, 0.95, f'{equation}\n$R^2$ = {r2:.2f}', 
             transform=plt.gca().transAxes, 
             fontsize=10, 
             verticalalignment='top', 
             bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="white"))
   
    # Sanitize the file name
    sanitized_component = sanitize_filename(component)
    image_path = os.path.join(plot_directory, f'{sanitized_component}.png')
    
    # Save the plot as an image
    plt.savefig(image_path)
    plt.close()
    
    image_paths.append(image_path)
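
As input for the open discussion on the R2 computation: for an ordinary least-squares line with an intercept, the coefficient of determination 1 - SS_res/SS_tot (which is what `r2_score` evaluates on the fitted values) equals the squared Pearson correlation between x and y. The sketch below checks this on made-up calibration points using NumPy only. It says nothing about weighted fits or fits forced through the origin, where the two quantities can differ.

```python
import numpy as np

# Invented calibration points: concentration ratio (x) and area ratio (y)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.0])

# Ordinary least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Coefficient of determination from residual and total sums of squares
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Squared Pearson correlation coefficient for comparison
r_pearson = np.corrcoef(x, y)[0, 1]
```

For these points the fit is y = 0.98x + 0.04 and both measures give an R2 of about 0.996.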

Writes all relevant data to an Excel file and adds the calibration curves.

In [28]:
# create excel with pandas excelwriter
with pd.ExcelWriter(processed_filepath, engine = 'openpyxl') as writer:
    styled_recovery.to_excel(writer, sheet_name = 'Recoveries')
    styled_ips_norm_recovery.to_excel(writer, sheet_name = 'IPS Normalized Recoveries')
    process_blank_data.to_excel(writer, sheet_name = 'MDL_Values')
    data_blank_excluded.to_excel(writer, sheet_name = 'Concentration_filtered_MDL')
    ida_area_piv.to_excel(writer, sheet_name = "Area_Pivot")
    IDL_IQL.to_excel(writer, sheet_name = "IDL_IQL")
    data_pfas.to_excel(writer, sheet_name = "Calibration Data")

workbook = load_workbook(processed_filepath)
plot_sheet = workbook.create_sheet('Calibration Curves')
    
# Insert all images into one sheet in a grid format
row_offset = 1  # Start at the first row
col_offset = 1  # Start at the first column
images_per_row = 2  # Number of images per row

for i, image_path in enumerate(image_paths):
    # Calculate the position for each image
    row_position = row_offset + (i // images_per_row) * 15  # Adjust the multiplier to control spacing
    col_position = col_offset + (i % images_per_row) * 20   # Adjust the multiplier to control spacing
    
    # Load the image
    img = Image(image_path)
    
    # Place the image at the calculated position
    cell_position = plot_sheet.cell(row=row_position, column=col_position).coordinate
    plot_sheet.add_image(img, cell_position)

workbook.save(processed_filepath)
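
The grid placement of the images can be previewed without opening a workbook. This sketch reproduces the row and column arithmetic from the loop above in plain Python. The helper names `grid_anchor_cell` and `column_letter` are hypothetical, and the defaults mirror the spacing used above (15 rows and 20 columns per image, 2 images per row).

```python
def column_letter(col):
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ''
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def grid_anchor_cell(i, images_per_row=2, row_step=15, col_step=20,
                     row_offset=1, col_offset=1):
    """Return the top-left cell (e.g. 'U16') for the i-th image in the grid."""
    row = row_offset + (i // images_per_row) * row_step
    col = col_offset + (i % images_per_row) * col_step
    return f'{column_letter(col)}{row}'


# Anchor cells of the first five calibration plots
anchors = [grid_anchor_cell(i) for i in range(5)]
```

If the saved figures overlap in Excel, increasing `row_step` or `col_step` moves the anchors further apart.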
