This notebook compares FERMO's implementation of the Phenotype Score calculation with that of Bioactivity-Based Molecular Networking (BioMN), focusing on how sample size affects the significance (p-value) of the calculation.
For this demonstration, we use the molecular feature with m/z 563.296 at RT 21.5 min, which the BioMN publication showed to be co-responsible for the observed phenotype.

In [3]:
!pip install -q scipy numpy

In [4]:
from scipy.stats import pearsonr

In [7]:
# FERMO: 7 samples, all with a non-zero peak area for the feature
fermo_563_21_activ = [16.0, 57.0, 140.0, 17.0, 5.0, 10.5, 68.0]
fermo_563_21_area = [22080.0, 95660.0, 3017000.0, 146400.0, 209200.0, 250600.0, 1258000.0]

# BioMN: 14 samples, six of which have a peak area of 0
bioMN_563_21_activ = [68, 1, 4, 1, 3, 19, 8, 16, 41, 140, 17, 10.5, 5, 57]
bioMN_563_21_area = [67070302.13, 0, 0, 0, 0, 0, 0, 1975562.5, 31625.60742, 170314273.7, 8940636.965, 10201220.78, 2744461.127, 5524503.326]

pearson_s, p_val = pearsonr(fermo_563_21_area, fermo_563_21_activ)
print(f"FERMO's pearson correlation: {pearson_s}")
print(f"FERMO's p-val: {p_val}")

pearson_s, p_val = pearsonr(bioMN_563_21_area, bioMN_563_21_activ)
print(f"BioMN's pearson correlation: {pearson_s}")
print(f"BioMN's p-val: {p_val}")

FERMO's pearson correlation: 0.925249569559756
FERMO's p-val: 0.002817548091649315
BioMN's pearson correlation: 0.9156924207775412
BioMN's p-val: 4.3142761865907734e-06
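
The two correlation coefficients are nearly identical, yet the p-values differ by about three orders of magnitude. A minimal sketch of why: under the usual assumption of roughly bivariate-normal data, the two-sided p-value of a Pearson correlation can be obtained from a t-statistic that depends only on r and the number of samples n. The helper name `p_value_from_r` and the fixed value `r_fixed = 0.92` are illustrative choices, not part of FERMO or BioMN.

```python
from math import sqrt

from scipy.stats import t


def p_value_from_r(r, n):
    """Two-sided p-value for a Pearson correlation r observed on n samples.

    Hypothetical helper for illustration: it uses the t-statistic
    t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
    """
    dof = n - 2
    t_stat = r * sqrt(dof / (1.0 - r**2))
    # Survival function gives the upper tail; double it for a two-sided test
    return 2.0 * t.sf(abs(t_stat), dof)


# Keep r fixed (assumed value close to the ones above) and vary only n
r_fixed = 0.92
for n in (7, 8, 13, 14):
    print(f"n = {n:2d}, r = {r_fixed}: p = {p_value_from_r(r_fixed, n):.2e}")
```

With r held constant, the p-value shrinks as n grows, which is the effect explored in the rest of this notebook.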


To demonstrate how a larger number of samples affects the significance of the calculation, let's add the six samples with a peak area of 0 from the BioMN data to the FERMO data.

In [8]:
fermo_563_21_activ = [16.0, 57.0, 140.0, 17.0, 5.0, 10.5, 68.0, 1, 4, 1, 3, 19, 8]
fermo_563_21_area = [22080.0, 95660.0, 3017000.0, 146400.0, 209200.0, 250600.0, 1258000.0, 0, 0, 0, 0, 0, 0]
pearson_s, p_val = pearsonr(fermo_563_21_area, fermo_563_21_activ)
print(f"FERMO's pearson correlation with more values: {pearson_s}")
print(f"FERMO's p-val with more values: {p_val}")

FERMO's pearson correlation with more values: 0.9323354044529782
FERMO's p-val with more values: 3.4462663327822203e-06


Let's do it the other way around and remove the six samples with a peak area of 0 from the BioMN data, leaving eight samples.

In [10]:
bioMN_563_21_activ = [68, 16, 41, 140, 17, 10.5, 5, 57]
bioMN_563_21_area = [67070302.13, 1975562.5, 31625.60742, 170314273.7, 8940636.965, 10201220.78, 2744461.127, 5524503.326]
pearson_s, p_val = pearsonr(bioMN_563_21_area, bioMN_563_21_activ)
print(f"BioMN's pearson correlation: {pearson_s}")
print(f"BioMN's p-val: {p_val}")

BioMN's pearson correlation: 0.9184969137058843
BioMN's p-val: 0.0012721242841522702
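
The manual removal of zero-area samples above can also be written as a small filter, which makes the resulting sample count explicit. This is only a sketch: the function name `pearson_nonzero` and the `*_all` variable names are hypothetical, and the data are the original 14 BioMN samples from the first cell.

```python
from scipy.stats import pearsonr


def pearson_nonzero(area, activ):
    """Pearson correlation using only samples with a peak area above 0.

    Hypothetical helper for illustration; returns (n, r, p).
    """
    # Keep (area, activity) pairs together so the filtering stays aligned
    pairs = [(a, b) for a, b in zip(area, activ) if a > 0]
    kept_area = [a for a, _ in pairs]
    kept_activ = [b for _, b in pairs]
    r, p = pearsonr(kept_area, kept_activ)
    return len(pairs), r, p


bioMN_activ_all = [68, 1, 4, 1, 3, 19, 8, 16, 41, 140, 17, 10.5, 5, 57]
bioMN_area_all = [67070302.13, 0, 0, 0, 0, 0, 0, 1975562.5, 31625.60742,
                  170314273.7, 8940636.965, 10201220.78, 2744461.127, 5524503.326]

n, r, p = pearson_nonzero(bioMN_area_all, bioMN_activ_all)
print(f"BioMN without zero-area samples: n = {n}, r = {r}, p = {p}")
```

Reporting n alongside r and p makes it easier to see that the change in significance comes from the number of samples rather than from the strength of the correlation.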
