name: lcms_data_processing
date: 08/20/2024
version: 1.0
author: Justin Sankey

description: Quick fix for an error in the IPS area of the old method for PFHxA, PFHpA, PFOA, PFNA, and HFPO-DA. Takes raw liquid chromatography mass spectrometry (LCMS) data, extracts the relevant parameters for analysis, and writes them to Excel.

When you execute the notebook for the first time, you need to install all required Python packages.
Type the following commands in your Python console or Anaconda prompt:
- pip install pandas
- pip install numpy
- pip install openpyxl
- pip install jinja2 (used by pandas to build the styled tables)

In [19]:
# import all needed packages
import pandas as pd
import numpy as np
from openpyxl import load_workbook

This is the only cell that you should need to edit.
Enter your input and output file paths and adjust the recovery ranges you consider acceptable.
The input file is the .txt export generated by the processing station.
The output path should point to the folder where the output file is to be created and end with a unique file name.

In [20]:
# raw data upload file path
raw_filepath = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Raw_Data\20240903_pfas_kynol_ks_single_compound.txt'

# processed data output file path
processed_filepath =r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Processed_Data\20240903_pfas_kynol_ks_single_compound_recoveries.xlsx'

# Color-coding for the recoveries tables. Enter your acceptable ranges here, e.g. 0.6-1.4 is in range (green),
# <0.4 or >1.8 is out of range (red), and 0.4-0.6 or 1.4-1.8 is questionable (yellow).
in_range = 'background-color: green'
in_range_min_val = 0.6 
in_range_max_val = 1.4
out_range = 'background-color: red'
out_range_min_val = 0.4 
out_range_max_val = 1.8
question_range = 'background-color: yellow'
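To make the three bands concrete, here is a minimal sketch of how these thresholds partition a recovery value. It mirrors the logic of the `color_map` function defined later in the notebook; the name `classify_recovery` and the plain color labels are illustrative only and are not part of the notebook.

```python
import math

# Thresholds copied from the settings cell
in_range_min_val, in_range_max_val = 0.6, 1.4
out_range_min_val, out_range_max_val = 0.4, 1.8

def classify_recovery(val):
    """Hypothetical helper: return the color band for a fractional recovery."""
    if in_range_min_val <= val <= in_range_max_val:
        return 'green'      # acceptable
    elif val < out_range_min_val or val > out_range_max_val:
        return 'red'        # clearly out of range
    else:
        return 'yellow'     # questionable band between the two

for v in (0.92, 0.5, 1.6, 0.09, 2.5, math.nan):
    print(v, classify_recovery(v))
```

Note that a missing value (NaN) fails every comparison and therefore lands in the yellow band, so a yellow cell can mean either a questionable recovery or no data.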

In [3]:
# Load the tab-delimited raw data file (calibration data is separated out at the end of this cell)
data = pd.read_csv(raw_filepath, delimiter='\t', encoding='utf-8', low_memory=False, header=0)
# Step 1: Replace 'Component Group Name' values
data['Component Group Name'] = data['Component Group Name'].replace('IPS-13C2_PFOA', 'IPS-13C4_PFOA')

# Step 2: Find rows where 'Component Group Name' is 'IPS-13C4_PFOA' (after replacement)
mask = data['Component Group Name'] == 'IPS-13C4_PFOA'

# Step 3: Iterate through each of these rows
for idx, row in data[mask].iterrows():
    sample_name = row['Sample Name']
    
    # Find the corresponding row with 'Component Name' == 'IPS-13C4_PFOA' and the same 'Sample Name'
    matching_row = data[(data['Component Name'] == 'IPS-13C4_PFOA') & (data['Sample Name'] == sample_name)]
    
    if not matching_row.empty:
        # Update the 'Area IPS' with the value from 'Area' in the matching row
        data.at[idx, 'Area IPS'] = matching_row['Area'].values[0]

# Split the data into samples (calibration standards excluded) and calibration standards
data_calibration_excluded = data[(data['Sample Type'] != 'Standard')].copy()
data_calibration = data[data['Sample Type'] == 'Standard'].copy()

# Display settings for inspecting DataFrames in the notebook
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_colwidth', None)  # Show full width of columns
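The row-by-row loop in this cell can be slow on large exports. As a rough sketch of the same correction without `iterrows`, the example below builds a per-sample lookup of the IPS-13C4_PFOA area and maps it onto the affected rows. The toy values are made up; only the column names are taken from the notebook, and taking the first match per sample name is an assumption that mirrors `.values[0]` in the loop.

```python
import pandas as pd

# Toy data: column names follow the notebook, values are invented
toy = pd.DataFrame({
    'Sample Name': ['S1', 'S1', 'S2', 'S2'],
    'Component Name': ['IDA-13C8_PFOA', 'IPS-13C4_PFOA',
                       'IDA-13C8_PFOA', 'IPS-13C4_PFOA'],
    'Component Group Name': ['IPS-13C4_PFOA'] * 4,
    'Area': [100.0, 50.0, 200.0, 80.0],
    'Area IPS': [1.0, 1.0, 1.0, 1.0],  # deliberately wrong, to be overwritten
})

# Area of the IPS-13C4_PFOA peak per sample (first occurrence if repeated)
ips_area = (toy[toy['Component Name'] == 'IPS-13C4_PFOA']
            .drop_duplicates('Sample Name')
            .set_index('Sample Name')['Area'])

# Overwrite 'Area IPS' for rows in the IPS-13C4_PFOA group;
# rows without a matching IPS peak keep their original value
mask = toy['Component Group Name'] == 'IPS-13C4_PFOA'
new_vals = toy.loc[mask, 'Sample Name'].map(ips_area)
toy.loc[mask, 'Area IPS'] = new_vals.fillna(toy.loc[mask, 'Area IPS'])

print(toy[['Sample Name', 'Component Name', 'Area IPS']])
```

Like the loop, this matches on 'Sample Name' alone, so two injections that share a sample name would both receive the area from the first one.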

In [4]:
#Area IDA values for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IDA']
area_ida = data_calibration_excluded[selected_columns_area]
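# .copy() gives an independent frame so the new column below does not trigger SettingWithCopyWarning
area_ida = area_ida[area_ida['Component Name'].str.contains('IDA')].copy()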
area_ida.loc[:,'Sample Name Date'] = area_ida['Sample Name'].astype(str) + "_" + area_ida['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ida_piv = area_ida.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IDA', aggfunc='first')
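For readers unfamiliar with `pivot_table`, the following toy example shows the long-to-wide reshaping used here and what `aggfunc='first'` does when a sample/component pair appears more than once. The sample and component labels are invented for illustration.

```python
import pandas as pd

# Long-form toy data; the last row duplicates the (LB2_t2, IDA-B) pair
long_form = pd.DataFrame({
    'Sample Name Date': ['LB1_t1', 'LB1_t1', 'LB2_t2', 'LB2_t2', 'LB2_t2'],
    'Component Name': ['IDA-A', 'IDA-B', 'IDA-A', 'IDA-B', 'IDA-B'],
    'Area IDA': [10.0, 20.0, 30.0, 40.0, 99.0],
})

# One row per sample, one column per component;
# 'first' keeps the first value seen for a duplicated pair
wide = long_form.pivot_table(index='Sample Name Date',
                             columns='Component Name',
                             values='Area IDA',
                             aggfunc='first')
print(wide)
```

Because duplicates are silently reduced to the first value, combining the sample name with the acquisition timestamp in the index is what keeps repeat injections of the same sample on separate rows.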

In [5]:
#Calibration Area IDA Average for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IDA']
cal_area_ida = data_calibration[selected_columns_area]
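# .copy() gives an independent frame so the new column below does not trigger SettingWithCopyWarning
cal_area_ida = cal_area_ida[cal_area_ida['Component Name'].str.contains('IDA')].copy()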
cal_area_ida.loc[:,'Sample Name Date'] = cal_area_ida['Sample Name'].astype(str) + "_" + cal_area_ida['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
cal_area_ida_piv = cal_area_ida.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IDA', aggfunc='first')
#Establishes a row with average of each column
mean_cal = cal_area_ida_piv.mean(numeric_only=True)
mean_cal=pd.DataFrame(mean_cal).T
mean_cal.index=['Average']
cal_area_ida_piv = pd.concat([cal_area_ida_piv,mean_cal])
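The averaging step can be seen in isolation on a small table. This is only a sketch with invented calibration levels (`CS1`, `CS2`) and component names; it uses `Series.to_frame()` where the notebook uses `pd.DataFrame(...)`, which gives the same result.

```python
import pandas as pd

# Toy calibration pivot: rows are calibration injections, columns are components
cal = pd.DataFrame({'IDA-A': [10.0, 20.0], 'IDA-B': [1.0, 3.0]},
                   index=['CS1', 'CS2'])

# Column means as a one-row DataFrame labelled 'Average'
avg = cal.mean(numeric_only=True).to_frame().T
avg.index = ['Average']

# Append the average row below the individual injections
cal_with_avg = pd.concat([cal, avg])
print(cal_with_avg)
```

Once the 'Average' row has been appended, calling `.mean()` on the combined table again would include that row in the calculation, so later steps should read the value with `.loc['Average']`, as the notebook does.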

In [6]:
#IPS Area Values for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IPS']
area_ips = data_calibration_excluded[selected_columns_area]
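# .copy() gives an independent frame so the new column below does not trigger SettingWithCopyWarning
area_ips = area_ips[area_ips['Component Name'].str.contains('IDA')].copy()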
area_ips.loc[:,'Sample Name Date'] = area_ips['Sample Name'].astype(str) + "_" + area_ips['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ips_piv = area_ips.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IPS', aggfunc='first')

In [7]:
#IPS Calibration Area and Average for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IPS']
cal_area_ips = data_calibration[selected_columns_area]
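# .copy() gives an independent frame so the new column below does not trigger SettingWithCopyWarning
cal_area_ips = cal_area_ips[cal_area_ips['Component Name'].str.contains('IDA')].copy()
# Build the row label from this frame's own timestamp column (previously read from cal_area_ida)
cal_area_ips.loc[:,'Sample Name Date'] = cal_area_ips['Sample Name'].astype(str) + "_" + cal_area_ips['Acquisition Date & Time']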

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
cal_area_ips_piv = cal_area_ips.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IPS', aggfunc='first')
#establishes mean for each column
mean_cal = cal_area_ips_piv.mean(numeric_only=True)
mean_cal=pd.DataFrame(mean_cal).T
mean_cal.index=['Average']
cal_area_ips_piv = pd.concat([cal_area_ips_piv,mean_cal])


In [8]:
#calibration IDA/IPS ratio
cal_ida_ips_ratio = cal_area_ida_piv.loc['Average']/cal_area_ips_piv.loc['Average']
cal_ida_ips_ratio
#Sample IDA/IPS Ratio
sample_ida_ips_ratio = area_ida_piv/area_ips_piv
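Written as a formula, the ratios computed in this cell and the normalized recovery derived from them in the next cell are as follows. The symbols are notation introduced here for illustration: $A$ is a peak area, $s$ indexes samples, $c$ indexes IDA components, and a bar denotes the average over the calibration standards.

```latex
r_{s,c} = \frac{A^{\mathrm{IDA}}_{s,c}}{A^{\mathrm{IPS}}_{s,c}},
\qquad
r^{\mathrm{cal}}_{c} = \frac{\bar{A}^{\mathrm{IDA}}_{\mathrm{cal},c}}{\bar{A}^{\mathrm{IPS}}_{\mathrm{cal},c}},
\qquad
R_{s,c} = \frac{r_{s,c}}{r^{\mathrm{cal}}_{c}}
```

Here $r_{s,c}$ corresponds to `sample_ida_ips_ratio`, $r^{\mathrm{cal}}_{c}$ to `cal_ida_ips_ratio`, and $R_{s,c}$ to `ips_norm_recovery`. A value of $R_{s,c} = 1$ means the sample's IDA/IPS ratio equals the calibration average.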

In [9]:
#IPS normalized recoveries to calibration data
ips_norm_recovery = sample_ida_ips_ratio/cal_ida_ips_ratio
ips_norm_recovery
# color code recoveries
def color_map(val):
    if in_range_min_val <= val <= in_range_max_val:
        return in_range
    elif val < out_range_min_val or val > out_range_max_val:
        return out_range
    else:
        return question_range

# Apply the style function to every cell of the DataFrame
# (in pandas 2.1 and later, Styler.map is the preferred name for Styler.applymap)
styled_ips_norm_recovery = ips_norm_recovery.style.applymap(color_map)
styled_ips_norm_recovery

Sample Name Date,IDA-13C2 4:2 FTS,IDA-13C2 6:2 FTS,IDA-13C2 8:2 FTS,IDA-13C2_PFDoA,IDA-13C2_PFTeDA,IDA-13C3_HFPO-DA,IDA-13C3_PFBS,IDA-13C3_PFHxS,IDA-13C4_PFBA,IDA-13C4_PFHpA,IDA-13C5_PFHxA,IDA-13C5_PFPeA,IDA-13C6_PFDA,IDA-13C7_PFUdA,IDA-13C8_FOSA,IDA-13C8_PFOA,IDA-13C8_PFOS,IDA-13C9_PFNA,IDA-d-EtFOSA,IDA-d-MeFOSA,IDA-d3-MeFOSAA,IDA-d5-EtFOSAA
240821 PB_09/05/2024 10:47:26,0.923776,1.125332,1.049036,0.08684,0.06586,0.862824,1.079308,1.089281,0.833143,0.927903,0.880329,0.803765,0.545612,0.312109,0.552755,0.882547,0.781124,0.754125,0.041298,0.062844,0.705357,0.625901
LB1_09/05/2024 11:04:02,1.27714,2.512225,1.252502,0.106443,0.018723,0.790796,1.256154,1.165236,0.838198,0.870636,0.844333,0.800051,0.511844,0.273537,0.748403,0.909626,0.760701,0.703573,0.032084,0.052277,0.890472,0.931408
LB2_09/05/2024 11:20:39,3.454902,5.477825,2.838859,0.048519,0.015356,0.997474,2.827415,1.884564,0.868053,0.964943,1.035862,0.923197,0.378651,0.166351,0.999662,0.835767,0.786108,0.622857,0.040581,0.090548,1.220937,1.145031
LB3_09/05/2024 11:37:14,1.638913,2.890124,1.233461,0.042214,0.010682,0.762579,1.660089,1.349791,0.689586,0.821596,0.788265,0.704736,0.360338,0.151969,0.836474,0.745496,0.627121,0.597609,0.017463,0.056945,0.889004,0.805292
LB4_09/05/2024 11:53:51,1.526287,2.268879,1.580288,0.115504,0.030681,0.759844,1.255327,1.161088,0.800189,0.881232,0.854332,0.910457,0.578188,0.362235,0.901524,0.851418,0.786045,0.739415,0.032722,0.08754,0.963985,1.002098
NC1_09/05/2024 12:10:26,1.434282,2.711852,1.498939,0.075141,0.028355,0.749048,1.349429,1.1487,0.751748,0.79459,0.741729,0.757967,0.503041,0.257642,0.737212,0.840989,0.729073,0.682096,0.02911,0.050204,0.808533,0.897808
NC2_09/05/2024 12:27:02,1.212133,1.678171,1.129804,0.094585,0.013467,0.714696,0.847693,0.823825,0.745972,0.795957,0.734525,0.825721,0.692108,0.431268,0.554943,0.781678,0.648364,0.76625,0.006559,0.016462,0.530351,0.663246
NC3_09/05/2024 12:43:36,1.217695,2.30229,1.484035,0.088976,0.058182,0.737074,1.086247,1.015701,0.808349,0.809212,0.777891,0.813499,0.525677,0.321192,0.634808,0.832477,0.687356,0.701581,0.038518,0.054884,0.815196,0.776878
NC4_09/05/2024 13:00:14,1.443563,2.848643,1.797581,0.059463,0.019041,0.879637,1.741706,1.468259,0.846606,0.98357,0.886874,0.828348,0.512032,0.20815,0.889405,0.993041,0.814093,0.675509,0.035299,0.062783,0.99906,0.969725
PFAS CS6 4ng/ml_01/30/2024 14:54:59,0.601713,0.952998,1.121257,0.57102,0.344285,0.301249,0.938827,1.02585,1.004988,0.445215,0.391927,0.972087,0.597701,0.638381,0.961774,0.518911,1.065247,0.547157,0.260189,0.364729,0.934482,0.995357
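The normalized recoveries in the table above come from dividing each sample's IDA/IPS area ratio by the calibration-average ratio for the same component. The sketch below walks through that arithmetic with invented areas and component names, and shows that pandas aligns the calibration Series with the DataFrame columns by label.

```python
import pandas as pd

# Made-up peak areas for two samples and two IDA components
area_ida = pd.DataFrame({'IDA-A': [900.0, 450.0], 'IDA-B': [300.0, 600.0]},
                        index=['S1', 'S2'])
area_ips = pd.DataFrame({'IDA-A': [1000.0, 1000.0], 'IDA-B': [500.0, 500.0]},
                        index=['S1', 'S2'])

# Made-up calibration averages per component
cal_ida_avg = pd.Series({'IDA-A': 1000.0, 'IDA-B': 600.0})
cal_ips_avg = pd.Series({'IDA-A': 1000.0, 'IDA-B': 500.0})

cal_ratio = cal_ida_avg / cal_ips_avg   # one ratio per component
sample_ratio = area_ida / area_ips      # element-wise, per sample and component
recovery = sample_ratio / cal_ratio     # Series is matched to columns by name

print(recovery)
```

For sample S1 and component IDA-B, the sample ratio is 300/500 = 0.6 and the calibration ratio is 600/500 = 1.2, giving a recovery of about 0.5, which would fall in the questionable band with the default thresholds.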


In [10]:
#Reported Recovery Pivot Table
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Reported Recovery']
reported_recovery = data_calibration_excluded[selected_columns_area]
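# .copy() gives an independent frame so the new column below does not trigger SettingWithCopyWarning
reported_recovery = reported_recovery[reported_recovery['Component Name'].str.contains('IDA')].copy()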
reported_recovery.loc[:,'Sample Name Date'] = reported_recovery['Sample Name'].astype(str) + "_" + reported_recovery['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and reported recovery as the values
reported_recovery_piv = reported_recovery.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Reported Recovery', aggfunc='first')
# Divide by 100 so the values are on the same fractional scale as the thresholds
reported_recovery_piv = reported_recovery_piv / 100

In [11]:
#Color map of Reported recoveries
styled_reported_recovery = reported_recovery_piv.style.applymap(color_map)
styled_reported_recovery

Sample Name Date,IDA-13C2 4:2 FTS,IDA-13C2 6:2 FTS,IDA-13C2 8:2 FTS,IDA-13C2_PFDoA,IDA-13C2_PFTeDA,IDA-13C3_HFPO-DA,IDA-13C3_PFBS,IDA-13C3_PFHxS,IDA-13C4_PFBA,IDA-13C4_PFHpA,IDA-13C5_PFHxA,IDA-13C5_PFPeA,IDA-13C6_PFDA,IDA-13C7_PFUdA,IDA-13C8_FOSA,IDA-13C8_PFOA,IDA-13C8_PFOS,IDA-13C9_PFNA,IDA-d-EtFOSA,IDA-d-MeFOSA,IDA-d3-MeFOSAA,IDA-d5-EtFOSAA
240821 PB_09/05/2024 10:47:26,0.923776,1.125332,1.049036,0.098645,0.074813,0.980115,1.079308,1.089281,0.833143,1.054041,1.691243,0.803765,0.619782,0.354537,0.552755,1.002519,0.781124,0.856639,0.041298,0.062844,0.705357,0.625901
LB1_09/05/2024 11:04:02,1.27714,2.512225,1.252502,0.126068,0.022175,0.936592,1.256154,1.165236,0.838198,1.031152,1.792273,0.800051,0.60621,0.323968,0.748403,1.07733,0.760701,0.833288,0.032084,0.052277,0.890472,0.931408
LB2_09/05/2024 11:20:39,3.454902,5.477825,2.838859,0.046839,0.014825,0.962942,2.827415,1.884564,0.868053,0.931537,1.560302,0.923197,0.365543,0.160592,0.999662,0.806832,0.786108,0.601294,0.040581,0.090548,1.220937,1.145031
LB3_09/05/2024 11:37:14,1.638913,2.890124,1.233461,0.053553,0.013551,0.967414,1.660089,1.349791,0.689586,1.042284,1.77536,0.704736,0.457127,0.19279,0.836474,0.945742,0.627121,0.758132,0.017463,0.056945,0.889004,0.805292
LB4_09/05/2024 11:53:51,1.526287,2.268879,1.580288,0.135198,0.035912,0.889401,1.255327,1.161088,0.800189,1.031487,1.799936,0.910457,0.676772,0.423998,0.901524,0.996589,0.786045,0.865489,0.032722,0.08754,0.963985,1.002098
NC1_09/05/2024 12:10:26,1.434282,2.711852,1.498939,0.101305,0.038229,1.009867,1.349429,1.1487,0.751748,1.071267,1.783001,0.757967,0.6782,0.347353,0.737212,1.133822,0.729073,0.919603,0.02911,0.050204,0.808533,0.897808
NC2_09/05/2024 12:27:02,1.212133,1.678171,1.129804,0.12877,0.018334,0.973004,0.847693,0.823825,0.745972,1.083634,2.067951,0.825721,0.942252,0.587138,0.554943,1.064194,0.648364,1.04319,0.006559,0.016462,0.530351,0.663246
NC3_09/05/2024 12:43:36,1.217695,2.30229,1.484035,0.114381,0.074794,0.947528,1.086247,1.015701,0.808349,1.040264,2.124271,0.813499,0.675771,0.412901,0.634808,1.070171,0.687356,0.901901,0.038518,0.054884,0.815196,0.776878
NC4_09/05/2024 13:00:14,1.443563,2.848643,1.797581,0.067048,0.02147,0.99184,1.741706,1.468259,0.846606,1.10903,1.38627,0.828348,0.577344,0.2347,0.889405,1.11971,0.814093,0.761674,0.035299,0.062783,0.99906,0.969725
PFAS CS6 4ng/ml_01/30/2024 14:54:59,0.601713,0.952998,1.121257,1.456954,0.87844,0.768636,0.938827,1.02585,1.004988,1.135965,0.419773,0.972087,1.525032,1.628824,0.961774,1.323999,1.065247,1.396069,0.260189,0.364729,0.934482,0.995357


Writes the color-coded calculated and reported recovery tables to separate sheets of an Excel file.

In [21]:
# create excel with pandas excelwriter
with pd.ExcelWriter(processed_filepath, engine='openpyxl') as writer:
    styled_ips_norm_recovery.to_excel(writer, sheet_name = 'Calculated Recoveries')
    styled_reported_recovery.to_excel(writer, sheet_name = 'Reported Recoveries')
# Re-open and re-save the workbook; no changes are made to it at this point
workbook = load_workbook(processed_filepath)
workbook.save(processed_filepath)