# Associations of correction slopes across datasets

Because ion suppression is a complex, multifactorial process, it is hard to address mechanistically. Although the correction method uses only the area of overlap as its explanatory parameter, the ion-specific slopes obtained when correcting different datasets should still carry molecular information about each ion's susceptibility to ion suppression. One indication of this would be a correlation between the slopes of ions that were acquired in more than one dataset. Since two Metabolomics and two Lipidomics datasets were analyzed, the two pairs can be compared separately.

In [None]:
import os
import pandas as pd
import scanpy as sc
import seaborn as sns

In [None]:
source_path = '/home/mklein/FDA_project/data'
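# Every non-hidden subdirectory of source_path is treated as one dataset
# (the loop variable is named `entry` to avoid shadowing the builtin `dir`)
datasets = [entry.name for entry in os.scandir(source_path) if entry.is_dir() and not entry.name.startswith(".")]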
datasets

In [None]:
# Collect the per-ion correction annotations (adata.var) of every dataset
adatas = {}
for dset in datasets:
    adata = sc.read(os.path.join(source_path, dset, 'corrected_batch_sm_matrix.h5ad'))
    df = adata.var[['corrected_only_using_pool', 'mean_correction_quantreg_slope', 'sum_correction_using_ion_pool']]
    adatas[dset] = df


All slope information is loaded from the respective annotated data matrices. Because the datasets usually consist of multiple wells, the reported correction slope of an ion is the mean over the corresponding set of wells. Ions that were corrected only using the reference pool are excluded. A stricter filter would be to threshold on the maximum fraction of wells corrected by the reference pool, e.g. 50%.

In [None]:
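# Stack all datasets into one long table with one row per (dataset, ion)
df = pd.concat(adatas)
df.index.names = ['dataset', 'ion']
df = df.reset_index()
# Drop ions corrected only via the reference pool; copy so that columns
# added later do not trigger a SettingWithCopyWarning
df = df[df['corrected_only_using_pool'] == False].copy()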
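
The stricter filter mentioned above could be sketched as follows. This is only an illustration on toy data: the column `fraction_wells_pool_corrected`, the helper `filter_by_pool_fraction` and the toy values are assumptions and are not part of the annotated data matrices loaded here.

```python
import pandas as pd


def filter_by_pool_fraction(slopes, fraction_col="fraction_wells_pool_corrected", max_fraction=0.5):
    """Keep ions for which at most `max_fraction` of the wells were pool-corrected.

    `fraction_col` is a hypothetical column holding, per ion, the fraction of
    wells in which the correction fell back to the reference pool.
    """
    return slopes[slopes[fraction_col] <= max_fraction].copy()


# Toy table standing in for the long slope table (values are made up)
toy = pd.DataFrame({
    "ion": ["C6H12O6+H", "C5H9NO4+Na", "C3H7NO2+K"],
    "mean_correction_quantreg_slope": [-0.8, -1.1, -0.4],
    "fraction_wells_pool_corrected": [0.0, 0.75, 0.5],
})

kept = filter_by_pool_fraction(toy, max_fraction=0.5)
print(kept["ion"].tolist())  # ['C6H12O6+H', 'C3H7NO2+K']
```

With a threshold of 50%, the ion that was pool-corrected in 75% of its wells is dropped, while ions at or below the threshold are kept.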

In [None]:
wide_df = df.pivot(index='ion', columns='dataset', values='mean_correction_quantreg_slope')

In [None]:
sns.pairplot(wide_df)
wide_df.corr(method="spearman")

No ions are shared between the two Metabolomics datasets, because the ions in the coculture dataset are available only as sum formulas without a specific adduct, whereas all other datasets use sum formulas with specific adducts. Between the Lipidomics datasets, however, a number of ions overlap, and their slopes are positively correlated (Spearman r = 0.583).

In [None]:
# Pass the table via `data=`; older seaborn releases accept it as keyword only
sns.lmplot(data=wide_df[['Lx_Glioblastoma', 'Lx_Pancreatic_Cancer']], x='Lx_Pancreatic_Cancer', y='Lx_Glioblastoma')
wide_df[['Lx_Glioblastoma', 'Lx_Pancreatic_Cancer']].corr(method='spearman')
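
Because the correlation is computed only over ions present in both datasets, it can help to report the size of that overlap next to the coefficient. The following is a minimal sketch on toy data; the helper `overlap_and_spearman` and the column names `A` and `B` are hypothetical stand-ins for two dataset columns of the wide table.

```python
import pandas as pd


def overlap_and_spearman(wide, col_x, col_y):
    """Return (number of jointly annotated ions, Spearman correlation)."""
    # Keep only ions that have a slope in both datasets
    pair = wide[[col_x, col_y]].dropna()
    rho = pair.corr(method="spearman").loc[col_x, col_y]
    return len(pair), rho


# Toy wide table: one row per ion, one column per dataset (values are made up)
toy_wide = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, None], "B": [1.5, 2.5, 2.0, 5.0, 3.0]},
    index=["ion1", "ion2", "ion3", "ion4", "ion5"],
)

n_shared, rho = overlap_and_spearman(toy_wide, "A", "B")
print(n_shared, round(rho, 3))  # 4 0.8
```

A coefficient based on only a handful of shared ions should be read with more caution than one based on many.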

To compare the metabolites annotated in the two Metabolomics datasets, the adducts have to be stripped from the ions in the Seahorse dataset; slopes of ions that then share a sum formula are averaged. Even so, because only few metabolites are annotated in the coculture dataset (58), just a very small set of 8 jointly annotated metabolites remains. Their slopes do not show a positive correlation.

In [None]:
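# Keep everything before the first '+' or '-', i.e. the bare sum formula
# (the stray '^' inside the original character class is removed)
df["ion_stripped"] = df['ion'].str.extract(r'([^+-]+)', expand=False)
# Average the slopes of all adducts that share a sum formula within a dataset
df_stripped = df.groupby(['dataset', 'ion_stripped']).mean(numeric_only=True).reset_index()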
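
The effect of the adduct stripping can be illustrated on a few toy ions. This sketch assumes that ions are written as a sum formula followed by `+` or `-` and the adduct; the ion strings and slope values are made up for illustration.

```python
import pandas as pd

# Toy long table: three adducts of one formula and one adduct of another
toy_ions = pd.DataFrame({
    "ion": ["C6H12O6+H", "C6H12O6+Na", "C6H12O6-H", "C5H9NO4+K"],
    "slope": [-0.8, -1.0, -0.6, -0.3],
})

# Everything before the first '+' or '-' is taken as the sum formula
toy_ions["ion_stripped"] = toy_ions["ion"].str.extract(r"([^+-]+)", expand=False)
print(toy_ions["ion_stripped"].tolist())
# ['C6H12O6', 'C6H12O6', 'C6H12O6', 'C5H9NO4']

# Adducts that share a formula collapse into a single averaged slope
toy_stripped = toy_ions.groupby("ion_stripped")["slope"].mean()
print(len(toy_stripped))  # 2
```

Averaging over adducts is a simplification: different adducts of the same molecule may not be equally susceptible to ion suppression.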

In [None]:
wide_df_stripped = df_stripped.pivot(index='ion_stripped', columns='dataset', values='mean_correction_quantreg_slope')
wide_df_stripped = wide_df_stripped.reset_index()

In [None]:
# Pass the table via `data=`; older seaborn releases accept it as keyword only
sns.lmplot(data=wide_df_stripped[['Mx_Co_Cultured', 'Mx_Seahorse']], x='Mx_Co_Cultured', y='Mx_Seahorse')
wide_df_stripped[['Mx_Co_Cultured', 'Mx_Seahorse']].corr(method='spearman')