name: lcms_data_processing
date: 08/20/2024
version: 1.0
author: Justin Sankey

description: Takes raw liquid chromatography mass spectrometry (LCMS) data,
extracts the parameters relevant for analysis, and writes them to Excel.

When you execute the notebook for the first time, you need to install all required Python packages.
To do so, type the following commands in your Python console or Anaconda prompt:
- pip install pandas
- pip install numpy
- pip install matplotlib
- pip install seaborn
- pip install scikit-learn
- pip install jinja2
- pip install openpyxl

In [45]:
# import all needed packages
import os
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from openpyxl import load_workbook
from openpyxl.drawing.image import Image
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

This is the only cell that you should have to edit.
Enter your desired input and output file paths
and change what you deem to be an acceptable recovery range.
*replaced tab-separated .txt input with comma-separated .csv input*
*should we implement both options? Which will be used?*
*New: indicate the directory to save plots to.*
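
One possible way to support both input formats raised in the question above is to pick the separator from the file extension. This is only a sketch under assumptions: the helper name `read_raw_table` and the extension-to-separator mapping are hypothetical and not part of the notebook, and the two temporary files contain made-up data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical mapping from file extension to column separator (an assumption).
SEPARATORS = {'.csv': ',', '.txt': '\t'}


def read_raw_table(filepath):
    """Read an LCMS export, choosing the separator from the file extension."""
    extension = os.path.splitext(filepath)[1].lower()
    if extension not in SEPARATORS:
        raise ValueError(f'Unsupported file extension: {extension!r}')
    return pd.read_csv(filepath, sep=SEPARATORS[extension], header=0, low_memory=False)


# Small self-check with two temporary files that hold the same made-up table.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'example.csv')
    txt_path = os.path.join(tmp_dir, 'example.txt')
    with open(csv_path, 'w') as f:
        f.write('Sample Name,Area\nS1,10\nS2,20\n')
    with open(txt_path, 'w') as f:
        f.write('Sample Name\tArea\nS1\t10\nS2\t20\n')
    table_from_csv = read_raw_table(csv_path)
    table_from_txt = read_raw_table(txt_path)
```

With such a helper, the loading cell would only need `data = read_raw_table(raw_filepath)`, and an unexpected extension fails early with a clear message.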


In [46]:
# raw data upload file path
raw_filepath = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Raw_Data\20240903_pfas_kynol_ks_single_compound.csv'
# file path for IDL and IQL data
IDL_IQL_filepath = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\ACF\IDL_IQL.csv'

# processed data output file path
processed_filepath =r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Processed_Data\20240903_pfas_kynol_ks_single_compound_processed.xlsx'

# file path to write QCS0 data to 
qcs0_filepath = r'C:\Users\jhsan\OneDrive\Desktop\QCS0_area_values\QCS0_area_values.csv' 

# directory to save plots to
plot_directory = r'C:\Users\jhsan\OneDrive\Desktop\ACF_Project\Processed_Data\Plots'

#color-coding for recoveries table
in_range = 'background-color: green'
in_range_min_val = 0.6 
in_range_max_val = 1.4
out_range = 'background-color: red'
out_range_min_val = 0.4 
out_range_max_val = 1.8
question_range = 'background-color: yellow'

In [47]:
# Load data file and remove calibration data
data = pd.read_csv(raw_filepath, delimiter=',', low_memory=False, header=0,)
data_calibration_excluded = data[(data['Sample Type'] != 'Standard')].copy()
data_calibration = data[data['Sample Type']=='Standard']

In [48]:
# extract values of isotope dilution analysis (IDA) and save areas of intensity peaks to the variable ida_area
#selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area']
#ida_area = data_calibration_excluded[selected_columns_area]
#ida_area = ida_area[ida_area['Component Name'].str.contains('IDA')].copy()
#ida_area.loc[:,'Sample Name Date'] = ida_area['Sample Name'].astype(str) + "_" + ida_area['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
#ida_area_piv = ida_area.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area', aggfunc='first')

In [49]:
#Area IDA values for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IDA']
area_ida = data_calibration_excluded[selected_columns_area]
area_ida = area_ida[area_ida['Component Name'].str.contains('IDA')].copy()
area_ida.loc[:,'Sample Name Date'] = area_ida['Sample Name'].astype(str) + "_" + area_ida['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ida_piv = area_ida.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IDA', aggfunc='first')


In [50]:
#Calibration Area IDA Average for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IDA']
cal_area_ida = data_calibration[selected_columns_area]
cal_area_ida = cal_area_ida[cal_area_ida['Component Name'].str.contains('IDA')].copy()
cal_area_ida.loc[:,'Sample Name Date'] = cal_area_ida['Sample Name'].astype(str) + "_" + cal_area_ida['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
cal_area_ida_piv = cal_area_ida.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IDA', aggfunc='first')
#Establishes a row with average of each column
mean_cal = cal_area_ida_piv.mean(numeric_only=True)
mean_cal=pd.DataFrame(mean_cal).T
mean_cal.index=['Average']
cal_area_ida_piv = pd.concat([cal_area_ida_piv,mean_cal])


In [51]:
#IPS Area Values for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IPS']
area_ips = data_calibration_excluded[selected_columns_area]
area_ips = area_ips[area_ips['Component Name'].str.contains('IDA')].copy()
area_ips.loc[:,'Sample Name Date'] = area_ips['Sample Name'].astype(str) + "_" + area_ips['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
area_ips_piv = area_ips.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IPS', aggfunc='first')


In [52]:
#IPS Calibration Area and Average for IDA components
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Area IPS']
cal_area_ips = data_calibration[selected_columns_area]
cal_area_ips = cal_area_ips[cal_area_ips['Component Name'].str.contains('IDA')].copy()
cal_area_ips.loc[:,'Sample Name Date'] = cal_area_ips['Sample Name'].astype(str) + "_" + cal_area_ips['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
cal_area_ips_piv = cal_area_ips.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Area IPS', aggfunc='first')
#establishes mean for each column
mean_cal = cal_area_ips_piv.mean(numeric_only=True)
mean_cal=pd.DataFrame(mean_cal).T
mean_cal.index=['Average']
cal_area_ips_piv = pd.concat([cal_area_ips_piv,mean_cal])


In [53]:
#calibration IDA/IPS ratio
cal_ida_ips_ratio = cal_area_ida_piv.loc['Average']/cal_area_ips_piv.loc['Average']
cal_ida_ips_ratio
#Sample IDA/IPS Ratio
sample_ida_ips_ratio = area_ida_piv/area_ips_piv


In [54]:
#IPS normalized recoveries to calibration data
ips_norm_recovery = sample_ida_ips_ratio/cal_ida_ips_ratio
ips_norm_recovery
# color code recoveries
def color_map(val):
    if in_range_min_val <= val <= in_range_max_val:
        return in_range
    elif val < out_range_min_val or val > out_range_max_val:
        return out_range
    else:
        return question_range

# Apply the style function to the entire DataFrame
styled_ips_norm_recovery = ips_norm_recovery.style.applymap(color_map)
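
To make the three color bands concrete, the following sketch applies the same `color_map` logic to a few made-up recovery values, using the thresholds from the configuration cell. The example values are assumptions chosen to hit each band.

```python
import math

# Thresholds copied from the configuration cell.
in_range = 'background-color: green'
in_range_min_val = 0.6
in_range_max_val = 1.4
out_range = 'background-color: red'
out_range_min_val = 0.4
out_range_max_val = 1.8
question_range = 'background-color: yellow'


def color_map(val):
    if in_range_min_val <= val <= in_range_max_val:
        return in_range
    elif val < out_range_min_val or val > out_range_max_val:
        return out_range
    else:
        return question_range


# Made-up recovery values: one inside the accepted range, two in the
# questionable bands, two outside the outer limits, and one missing value.
examples = [1.0, 0.5, 1.6, 0.3, 2.0, math.nan]
colors = [color_map(value) for value in examples]
```

Note that a missing value lands in the yellow band, because every comparison with NaN is false and the function falls through to the final `else`.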


In [55]:
#Reported Recovery Pivot Table
selected_columns_area = ['Sample Name', 'Sample Index','Acquisition Date & Time','Component Name', 'Reported Recovery']
reported_recovery = data_calibration_excluded[selected_columns_area]
reported_recovery = reported_recovery[reported_recovery['Component Name'].str.contains('IDA')].copy()
reported_recovery.loc[:,'Sample Name Date'] = reported_recovery['Sample Name'].astype(str) + "_" + reported_recovery['Acquisition Date & Time']

# Create pivot table with Sample name as the index, component name as the column headers, and area as the values
reported_recovery_piv = reported_recovery.pivot_table(index=('Sample Name Date',), columns='Component Name', values='Reported Recovery', aggfunc='first')
reported_recovery_piv=reported_recovery_piv/100


In [56]:
#Color map of Reported recoveries
styled_reported_recovery = reported_recovery_piv.style.applymap(color_map)

In [57]:
# The following cell extracts quality control standard (QCS0) data from the isotope dilution analysis (IDA)
# and appends it to an existing QCS0 file, which is set in the second code cell
# (*qcs0_filepath*).

# Filter rows where 'Sample Name' contains 'QCS0'
#qcs0_samples = ida_area_piv.reset_index()
#qcs0_samples = qcs0_samples[qcs0_samples['Sample Name Date'].str.contains('QCS0')]

# Append the filtered DataFrame to an existing CSV file
#if os.path.exists(qcs0_filepath):
    # Load the existing data
   # qsc0_existing_data = pd.read_csv(qcs0_filepath)
    
    # Combine existing data with new data, avoiding duplicates
    #qsc0_combined_data = pd.concat([qsc0_existing_data, qcs0_samples]).drop_duplicates(subset=['Sample Name Date'])
    
    # Write back to the CSV without writing headers again
    #qsc0_combined_data.to_csv(qcs0_filepath, index=False)
#else:
    # If the file doesn't exist, write the data with headers
    #qcs0_samples.to_csv(qcs0_filepath, index=False)

Computation of recovery rates: for each IDA row, divide the area by the average area of the QCS0 values.
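
As a worked illustration of this recovery calculation, the sketch below divides made-up sample areas by the column-wise average of made-up QCS0 areas. The component names `IDA-A` and `IDA-B` and all numbers are hypothetical. The vectorized `DataFrame.div` is used here instead of a row-by-row loop, which is a design choice of this sketch.

```python
import pandas as pd

# Made-up IDA peak areas: rows are samples, columns are IDA components.
ida_area_example = pd.DataFrame(
    {'IDA-A': [900.0, 1100.0], 'IDA-B': [450.0, 500.0]},
    index=['Sample_1', 'Sample_2'],
)
# Made-up QCS0 history with the same component columns.
qcs0_example = pd.DataFrame({'IDA-A': [1000.0, 1000.0], 'IDA-B': [400.0, 600.0]})

# Average QCS0 area per component.
qcs0_avg_example = qcs0_example.mean()

# Recovery = sample area / average QCS0 area, aligned on the component columns.
recovery_example = ida_area_example.div(qcs0_avg_example, axis='columns')
```

A recovery of 1.0 means the sample area equals the average QCS0 area for that component.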

In [58]:
# Calculate recoveries
# Average QCS0 from saved csv file
#qsc0_combined_data=pd.read_csv(qcs0_filepath, header=0, low_memory=False)
#qsc0_combined_data.drop(columns=['Sample Name Date'], inplace=True)
#qsc0_avg = qsc0_combined_data.mean()

# Generate base of recovery table by copying the area pivot data
#recovery = ida_area_piv.copy()

# Recovery calculation (sample area/avg QCS0 area)
#for index, row in ida_area_piv.iterrows():
    #for col in ida_area_piv.columns:
       # if ida_area_piv[col].dtype in ['float64', 'int64']:
           # recovery.at[index, col] = row[col] / qsc0_avg[col]

In [59]:
# color code recoveries
#def color_map(val):
    #if in_range_min_val <= val <= in_range_max_val:
        #return in_range
    #elif val < out_range_min_val or val > out_range_max_val:
        #return out_range
    #else:
        #return question_range

# Apply the style function to the entire DataFrame
#styled_recovery = recovery.style.applymap(color_map)

Compute method detection limits (MDL) based on the average and standard deviation of the process blanks. Use instrument detection limits (IDL) for the PFAS compounds not included in the process blanks.
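
The calculation can be sketched on a small made-up table as MDL = blank average + 3 × blank standard deviation, with a fallback limit for compounds that have no usable blank values. All numbers and the name `fallback_example` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up process blank concentrations; NaN marks a blank without a numeric value.
pb_example = pd.DataFrame(
    {'PFOA': [1.0, 2.0, 3.0], 'PFOS': [np.nan, np.nan, np.nan]},
    index=['PB_1', 'PB_2', 'PB_3'],
)
# Made-up fallback limits for compounds without usable blank data.
fallback_example = pd.Series({'PFOA': 0.5, 'PFOS': 0.2})

# Both statistics are computed on the blank rows only, ignoring NaN values.
pb_avg = pb_example.mean(skipna=True)
pb_stdev = pb_example.std(skipna=True)  # sample standard deviation (ddof=1)

# MDL = average + 3 * standard deviation; undefined entries take the fallback.
mdl_example = (pb_avg + 3 * pb_stdev).fillna(fallback_example)
```

In this sketch a compound with a single usable blank also takes the fallback value, because its standard deviation is undefined. This differs from the cell below, which treats a missing standard deviation as zero.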

In [60]:
selected_columns = ['Sample Name', 'Component Name', 'Calculated Concentration']

# Isolate blank data and remove IDA/IPS values
data_isolated_blank = data_calibration_excluded[data_calibration_excluded['Sample Comment'].str.contains('Blank')]
data_isolated_blank = data_isolated_blank[selected_columns]
data_isolated_blank = data_isolated_blank[~data_isolated_blank['Component Name'].str.contains('IDA|IPS')]

# Create pivot table with Sample name as the index, component name as the column header, and concentration as the value
blank_data_piv = data_isolated_blank.pivot_table(index='Sample Name', columns='Component Name', values='Calculated Concentration', aggfunc='first')

# Isolate the process blank data
process_blank_data = blank_data_piv[blank_data_piv.index.str.contains('PB')]

# replace all <1 point values with NaN values
process_blank_data = process_blank_data.replace("<1 points", np.nan)

# Change any non numeric values to numeric
process_blank_data = process_blank_data.apply(pd.to_numeric, errors='coerce')

# calculate the average and the standard deviation of the PB values, excluding NaN values
# (both are computed before the summary rows are appended, so that PB_avg is not counted as a blank)
process_blank_data_avg = process_blank_data.mean(skipna=True)
process_blank_data_stdev = process_blank_data.std(skipna=True)
process_blank_data.loc['PB_avg'] = process_blank_data_avg
process_blank_data.loc['PB_stdev'] = process_blank_data_stdev

# MDL calculation PB_avg + 3 * PB_stdev
process_blank_data.loc['MDL'] = np.nan_to_num(process_blank_data.loc['PB_avg']) + 3 * np.nan_to_num(process_blank_data.loc['PB_stdev'])
process_blank_data.loc['MDL'] = process_blank_data.loc['MDL'].replace(0, np.nan)

#Load IDL_IQL file
IDL_IQL = pd.read_csv(IDL_IQL_filepath, index_col=0, low_memory=False)

#change all non numeric values to numeric
IDL_IQL = IDL_IQL.apply(pd.to_numeric, errors='coerce')
IQL = IDL_IQL.loc[['IQL']]  
IQL = IQL.apply(pd.to_numeric, errors='coerce')

#replace all NaN values in MDL with the IQL value
common_columns = process_blank_data.columns.intersection(IQL.columns)
process_blank_data.loc['MDL', common_columns] = process_blank_data.loc['MDL', common_columns].combine_first(IQL.squeeze())



Load concentration data and apply the method detection limits (MDL) to filter out values below the MDL.
Values below the MDL are replaced with <MDL.

Remark 1: the resulting table contains a mix of floats and strings.
Remark 2: when the concentration calculation outputs NA, the value is converted to <MDL. Is this really what we want to do?
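
Regarding Remark 2, one alternative would be to label missing values separately from values below the MDL. The sketch below shows this on made-up data. The label `'NA'` and all numbers are assumptions, and this is not what the cell below currently does.

```python
import numpy as np
import pandas as pd

# Made-up concentrations: rows are samples, columns are compounds.
conc_example = pd.DataFrame(
    {'PFOA': [10.0, 2.0, np.nan], 'PFOS': [0.1, np.nan, 4.0]},
    index=['S1', 'S2', 'S3'],
)
# Made-up MDL per compound.
mdl_example = pd.Series({'PFOA': 5.0, 'PFOS': 0.2})

# Comparisons with NaN are False, so missing values are not flagged here.
below_mdl = conc_example.lt(mdl_example, axis='columns')
missing = conc_example.isna()

# Switch to object dtype so that numbers and labels can share a column.
labelled = conc_example.astype(object)
labelled = labelled.mask(below_mdl, '<MDL').mask(missing, 'NA')
```

This keeps the information that a concentration could not be calculated, instead of reporting it as a measured value below the detection limit.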

In [61]:
# Load concentration data and exclude blanks
selected_columns = ['Sample Name', 'Component Name', 'Calculated Concentration']
data_blank_excluded = data_calibration_excluded[data_calibration_excluded['Sample Comment'] != 'Blank'][selected_columns].copy()
data_blank_excluded = data_blank_excluded[~data_blank_excluded['Component Name'].str.contains('IDA|IPS')]
# pivot to a table with one row per sample and one column per compound
data_blank_excluded_table = data_blank_excluded.copy().pivot_table(
    index='Sample Name', columns='Component Name', values='Calculated Concentration', aggfunc='first'
    )
data_blank_excluded_table = data_blank_excluded_table.replace("<1 points", np.nan)

# Align the columns of the process blank table with the concentration table
common_columns = data_blank_excluded_table.columns.intersection(process_blank_data.columns)
PB_aligned = process_blank_data[common_columns]

# Convert MDL values to numeric to handle both numeric and string types
mdl_values = pd.to_numeric(PB_aligned.loc['MDL'], errors='coerce')

# Use np.where with numeric comparison
data_blank_excluded_table[common_columns] = np.where(
    data_blank_excluded_table[common_columns].apply(pd.to_numeric, errors='coerce') < mdl_values.values,
    "<MDL", data_blank_excluded_table[common_columns]
    )
data_blank_excluded_table = data_blank_excluded_table.fillna('<MDL')

In [62]:
# correct channel names in the original data (almost all TOF channels end with '_TOF MS', only 2 of them end with just '_TOF')
mask_names = data_blank_excluded['Component Name'].str.endswith('_TOF')
# use .loc to avoid chained assignment, which may silently write to a temporary copy
data_blank_excluded.loc[mask_names, 'Component Name'] = data_blank_excluded.loc[mask_names, 'Component Name'] + ' MS'

# get list of all compounds
compounds_list = data_blank_excluded['Component Name'].value_counts().index.to_list()

# sort compounds by channels
compounds_list_channel_tof = [compound for compound in compounds_list if '_TOF MS' in compound]
compounds_list_channel_default = [compound for compound in compounds_list if '_TOF MS' not in compound]

# try to assign TOF channel_name to each default channel name and keep record of TOF channel values lost in the process
compounds_sorted_default = []
compounds_sorted_tof = []
for compound in compounds_list_channel_default:
    if compound + '_TOF MS' in compounds_list_channel_tof:
        compounds_sorted_default.append(compound)
        compounds_sorted_tof.append(compound + '_TOF MS')
    else:
        print(f'{compound} is lost')

print(f'{[compound for compound in compounds_list_channel_tof if compound not in compounds_sorted_tof]} are lost')
mapper = {old_name: new_name for (old_name, new_name) in zip(compounds_sorted_tof, compounds_sorted_default)}

# create table with sample name as row, compound as header and calculated concentration as value for default channel
concentration_table_all = data_blank_excluded.pivot_table(
    values='Calculated Concentration', index=['Sample Name'], columns=['Component Name'], aggfunc='first', dropna=False
    )
# extract default channel compound columns and tof channel compound values respectively
concentration_table_default = concentration_table_all[compounds_sorted_default]
concentration_table_default = concentration_table_default.replace({None: np.nan})
concentration_table_tof = concentration_table_all[compounds_sorted_tof]
concentration_table_tof = concentration_table_tof.replace({None: np.nan})

print(concentration_table_tof, concentration_table_default)

# cleanup (string to np.nan, and everything else to float)
concentration_table_default_np = concentration_table_default.replace(
    {'<1 points': np.nan, '< 0': np.nan, 'no root': np.nan, }
    ).to_numpy().astype(float)
concentration_table_tof_np = concentration_table_tof.replace(
    {'<1 points': np.nan, '< 0': np.nan, 'no root': np.nan, }
    ).to_numpy().astype(float)

# calculate percentage difference and transform back to dataframe
ratio = np.divide(
    200 * np.subtract(concentration_table_default_np, concentration_table_tof_np),
    np.add(concentration_table_default_np, concentration_table_tof_np),
    )
concentration_table_ratio = pd.DataFrame(
    ratio, index=concentration_table_default.index, columns=concentration_table_default.columns,
    ).round(decimals=1)

# color code percentage deviation
def deviation_color_map(val):
    # NaN means that at least one of the two channels has no numeric value: apply no color
    if pd.isna(val):
        return ''
    if -30 <= val <= 30:
        return in_range
    return out_range


# Apply the style function to the entire DataFrame
styled_channel_percentage_difference = concentration_table_ratio.style.applymap(deviation_color_map)


FOEA_TOF_MS is lost
['PFPrA_TOF MS', 'N-TAmP-FHxSA_TOF MS', 'N-AP-FHxSA_TOF MS', 'PFEESA_TOF MS', 'PFECHS_TOF MS', 'PF5OHxA_TOF MS', 'PF4OPeA_TOF MS', '3,6-OPFHpA_TOF MS', '11ClPF3OUdS_TOF MS', '9ClPF3ONS_TOF MS', 'DONA_TOF MS', 'HFPO-DA_TOF MS', 'EtFOSA_TOF MS', 'MeFOSA_TOF MS', 'Br-N-EtFOSAA_TOF MS', 'L-N-EtFOSAA_TOF MS', 'Br-N-MeFOSAA_TOF MS', 'L-N-MeFOSAA_TOF MS', '13C3_PFBA_TOF MS', '13C4_PFBA_TOF MS', '13C5_PFPeA_TOF MS', '13C3_PFBS_TOF MS', '13C2 6:2 FTS_TOF MS', '13C2 4:2 FTS_TOF MS', '13C3_HFPO-DA_TOF MS', 'd-EtFOSA_TOF MS', 'd-MeFOSA_TOF MS', 'd5-EtFOSAA_TOF MS', 'd3-MeFOSAA_TOF MS', '13C8_FOSA_TOF MS', '13C8_PFOS_TOF MS', '13C4_PFOS_TOF MS', '13C3_PFHxS_TOF MS', '13C2_PFTeDA_TOF MS', '13C2_PFHxA_TOF MS', '13C2_PFDoA_TOF MS', '13C7_PFUdA_TOF MS', '13C6_PFDA_TOF MS', '13C2_PFDA_TOF MS', '13C9_PFNA_TOF MS', '13C5_PFNA_TOF MS', '13C8_PFOA_TOF MS', '13C2_PFOA_TOF MS', '13C4_PFOA_TOF MS', '13C4_PFHpA_TOF MS', '13C5_PFHxA_TOF MS', '7:3 FTCA_TOF MS', '5:3 FTCA_TOF MS', '3:3 FTCA_TOF
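
The percentage difference computed in the cell above is the difference between the two channels relative to their mean, i.e. 200 * (default - TOF) / (default + TOF). A short worked example with made-up concentrations:

```python
import numpy as np

# Made-up concentrations for three samples in the two channels.
default_channel = np.array([10.0, 8.0, np.nan])
tof_channel = np.array([10.0, 12.0, 5.0])

# Difference relative to the mean of both channels, in percent.
percent_difference = 200 * (default_channel - tof_channel) / (default_channel + tof_channel)
```

Equal values give 0 %, the pair (8, 12) gives -40 %, and a missing value in either channel propagates as NaN.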

Determine linear calibration curve based on calibration data.
*to be discussed: method for R2 computation*
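
As input for that discussion, the sketch below compares two common definitions of R2 on made-up calibration points: the coefficient of determination, 1 - SS_res / SS_tot (this is what `r2_score(y, y_pred)` computes), and the squared Pearson correlation. The data are assumptions, and `np.polyfit` stands in for the regression model to keep the example short.

```python
import numpy as np

# Made-up calibration points (concentration ratio vs. area ratio).
x = np.array([0.1, 0.5, 1.0, 5.0, 10.0])
y = np.array([0.12, 0.48, 1.05, 4.90, 10.20])

# Ordinary least squares fit with an intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Definition 1: coefficient of determination, 1 - SS_res / SS_tot.
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_determination = 1 - ss_res / ss_tot

# Definition 2: squared Pearson correlation between x and y.
r2_pearson = np.corrcoef(x, y)[0, 1] ** 2
```

For an unweighted least squares line with an intercept, both definitions give the same number. They start to differ once the fit is weighted or forced through the origin, so the choice only matters if the calibration model changes.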

In [63]:
# Function to sanitize file names
def sanitize_filename(name):
    """Removes special characters from PFAS name to create valid directory names."""    
    return re.sub(r'[\\/*?:"<>|]', "_", name)

# extract PFAS data
selected_columns = ['Sample Name', 'Component Name', 'Actual Concentration','IS Actual Concentration','Area','IS Area']
data_pfas = data[data['Sample Name'].str.contains('PFAS CS')].copy()
data_pfas = data_pfas.fillna(0)
data_pfas = data_pfas[~data_pfas['Component Name'].str.contains('IDA|IPS|13C|d3|d5')]
data_pfas_ = data_pfas[~data_pfas['Component Name'].str.contains('TOF')].copy()
data_pfas_['Concentration/IS Concentration'] = data_pfas_['Actual Concentration']/data_pfas_['IS Actual Concentration']
data_pfas_['Area/IS Area'] = data_pfas_['Area']/data_pfas_['IS Area']

# Create a list of unique component names and count them.
components = data_pfas_['Component Name'].unique()
n_components = len(components)

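# make sure the plot directory exists before saving into it
os.makedirs(plot_directory, exist_ok=True)

image_paths = []
# Iterate over each component and create scatter plots with regression lines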
for i, component in enumerate(components):
    component_data = data_pfas_[data_pfas_['Component Name'] == component]
    
    # Extract x and y values
    x = component_data['Concentration/IS Concentration'].values.reshape(-1, 1)
    y = component_data['Area/IS Area'].values
    
    # Perform linear regression
    model = LinearRegression()
    model.fit(x, y)
    y_pred = model.predict(x)
    r2 = r2_score(y, y_pred)

    # Regression equation
    slope = model.coef_[0]
    intercept = model.intercept_
    equation = f'y = {slope:.2f}x + {intercept:.2f}'

    plt.figure(figsize = (8,6))
    # Plot with Seaborn
    sns.regplot(
       # ax=axes[i], 
        x=x.flatten(), 
        y=y, 
        scatter=True, 
        fit_reg=True,
        line_kws={"color": "red"},  # Color of the regression line
        scatter_kws={"s": 50, "alpha": 0.7},  # Customize scatter points
        ci=95
    )
    # Set the title with the component name
    plt.title(f'{component}')
    plt.xlabel('Concentration/IS Concentration')
    plt.ylabel('Area/IS Area')
    plt.text(0.05, 0.95, f'{equation}\n$R^2$ = {r2:.2f}', 
             transform=plt.gca().transAxes, 
             fontsize=10, 
             verticalalignment='top', 
             bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="white"))
   
    # Sanitize the file name
    sanitized_component = sanitize_filename(component)
    image_path = os.path.join(plot_directory, f'{sanitized_component}.png')
    
    # Save the plot as an image
    plt.savefig(image_path)
    plt.close()
    
    image_paths.append(image_path)



Write all relevant data to the Excel file and add the calibration curves.
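
The calibration curve images are placed on a grid, with a fixed number of rows and columns between neighboring anchors. The following sketch reproduces the anchor arithmetic without any Excel library. The helper names `column_letter` and `grid_anchor` are hypothetical. The default step sizes are the ones used in the cell below.

```python
def column_letter(column_index):
    """Convert a 1-based column index to its spreadsheet letter (1 -> 'A', 27 -> 'AA')."""
    letters = ''
    while column_index > 0:
        column_index, remainder = divmod(column_index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def grid_anchor(i, images_per_row=2, row_step=15, col_step=20, row_offset=1, col_offset=1):
    """Return the top-left cell for the i-th image (0-based) on the grid."""
    row = row_offset + (i // images_per_row) * row_step
    column = col_offset + (i % images_per_row) * col_step
    return f'{column_letter(column)}{row}'


# Anchors of the first four images with the default settings.
anchors = [grid_anchor(i) for i in range(4)]
```

Whether neighboring images overlap depends on the saved image size and on the row heights and column widths of the sheet, so the two step sizes may need adjusting.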

In [64]:
# create excel with pandas excelwriter
with pd.ExcelWriter(processed_filepath, engine='openpyxl') as writer:
    #styled_recovery.to_excel(writer, sheet_name='Recoveries')
    styled_ips_norm_recovery.to_excel(writer, sheet_name = 'Calculated Recoveries')
    styled_reported_recovery.to_excel(writer, sheet_name = 'Reported Recoveries')
    process_blank_data.to_excel(writer, sheet_name='MDL_Values')
    data_blank_excluded_table.to_excel(writer, sheet_name='Concentration_filtered_MDL')
    #ida_area_piv.to_excel(writer, sheet_name='Area_Pivot')
    cal_ida_ips_ratio.to_excel(writer, sheet_name='Calibration IDA_IPS Ratio')
    cal_area_ida_piv.to_excel(writer, sheet_name='Calibration IDA')
    cal_area_ips_piv.to_excel(writer, sheet_name='Calibration IPS')
    sample_ida_ips_ratio.to_excel(writer, sheet_name='Sample IDA_IPS Ratio')
    area_ida_piv.to_excel(writer, sheet_name='Sample IDA')
    area_ips_piv.to_excel(writer, sheet_name='Sample IPS')
    IDL_IQL.to_excel(writer, sheet_name='IDL_IQL')
    data_pfas_.to_excel(writer, sheet_name='Calibration Data')
    concentration_table_default.to_excel(
        writer, sheet_name='Calculated concentration', na_rep='=na()'
        )
    concentration_table_tof.to_excel(
        writer, sheet_name='TOF Calculated concentration', na_rep='=na()'
        )
    styled_channel_percentage_difference.to_excel(
        writer, sheet_name='Concentration Difference (%)', na_rep='=na()'
        )

workbook = load_workbook(processed_filepath)
plot_sheet = workbook.create_sheet('Calibration Curves')
    
# Insert all images into one sheet in a grid format
row_offset = 1  # Start at the first row
col_offset = 1  # Start at the first column
images_per_row = 2  # Number of images per row

for i, image_path in enumerate(image_paths):
    # Calculate the position for each image
    row_position = row_offset + (i // images_per_row) * 15  # Adjust the multiplier to control spacing
    col_position = col_offset + (i % images_per_row) * 20   # Adjust the multiplier to control spacing
    
    # Load the image
    img = Image(image_path)
    
    # Place the image at the calculated position
    cell_position = plot_sheet.cell(row=row_position, column=col_position).coordinate
    plot_sheet.add_image(img, cell_position)

workbook.save(processed_filepath)
