### The aim of this analysis is to assign a chemical class to each spotted standard

First, we mapped the Classyfire classification onto the standards using each compound's HMDB ID. In a few cases the compound had no entry in HMDB4 (version 2020-09-09), so the closest possible substitute was used instead (these substitutions are recorded).

Then we went through all compounds manually and selected the best class for each one based on our chemistry knowledge. The coarse classification scheme includes 7 categories, while the fine classification has 30+ categories that we believe are important.
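
Because the scheme has two levels, one useful sanity check is that every fine class sits under exactly one coarse class. The snippet below is a minimal sketch of such a check, not part of the original analysis. It assumes the `coarse_class` and `fine_class` columns that appear later in this notebook, and the rows in `toy` are invented purely for illustration.

```python
import pandas as pd

# Hypothetical toy table; the rows are invented for illustration and only
# mimic the coarse_class / fine_class columns used later in this notebook.
toy = pd.DataFrame({
    "name_short": ["Glutamic acid", "Aspartic acid", "Citric acid", "Glucose"],
    "coarse_class": [
        "Amino acids, peptides, and analogues",
        "Amino acids, peptides, and analogues",
        "Carboxylic acids",
        "Carbohydrates",
    ],
    "fine_class": [
        "Acidic amino acids",
        "Acidic amino acids",
        "Hydroxy acids",
        "Carbohydrates",
    ],
})

# Number of distinct coarse classes that each fine class is assigned to
n_parents = toy.groupby("fine_class")["coarse_class"].nunique()

# Fine classes that appear under more than one coarse class (ideally none)
ambiguous = n_parents[n_parents > 1]

# Number of distinct fine classes per coarse class, in order of appearance
fine_per_coarse = toy.groupby("coarse_class", sort=False)["fine_class"].nunique()

print(ambiguous)
print(fine_per_coarse)
```

Running the same three statements on the real table instead of `toy` would flag any fine class that was accidentally assigned to two coarse classes during manual curation.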

### Map Classyfire classification

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [4]:
p_root_dir = Path.cwd().parent
p_analysis = p_root_dir / "custom_classification"
p_compounds = p_root_dir / r"metadata/compounds_ids.csv"
p_classyfire = p_analysis / "hmdb_classyfire_2020-09-09.csv"

In [3]:
compounds = pd.read_csv(p_compounds)
classyfire = pd.read_csv(p_classyfire)

# Left merge keeps every standard, even if it has no Classyfire entry
comp_class = pd.merge(
    compounds[['internal_id', 'name_short', 'hmdb_primary', 'is_hmdbid_matching']],
    classyfire,
    on='hmdb_primary',
    how='left',
)
comp_class.to_csv(p_analysis / 'compounds_classyfire.csv', index=False)
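
A left merge silently leaves missing values for any standard whose HMDB ID is absent from the Classyfire table, so it can help to list those rows explicitly. The following is a minimal sketch on made-up data, not the authors' procedure. The IDs `HMDB_A` to `HMDB_C` and the `class` column are placeholders, while `indicator` and `validate` are standard `pandas.DataFrame.merge` parameters.

```python
import pandas as pd

# Hypothetical stand-ins for the two input tables (placeholder IDs and names)
compounds_toy = pd.DataFrame({
    "internal_id": [1, 2, 3],
    "name_short": ["compound_a", "compound_b", "compound_c"],
    "hmdb_primary": ["HMDB_A", "HMDB_B", "HMDB_C"],
})
classyfire_toy = pd.DataFrame({
    "hmdb_primary": ["HMDB_A", "HMDB_B"],
    "class": ["Class X", "Class Y"],  # placeholder column name
})

# indicator=True adds a `_merge` column telling where each row came from;
# validate="many_to_one" raises if an HMDB ID is duplicated in the right table
merged = compounds_toy.merge(
    classyfire_toy,
    on="hmdb_primary",
    how="left",
    indicator=True,
    validate="many_to_one",
)

# Standards that found no match in the Classyfire table
unmatched = merged.loc[merged["_merge"] == "left_only", "name_short"].tolist()
print(unmatched)
```

With `validate="many_to_one"`, a duplicated key in the Classyfire table raises an error instead of quietly duplicating rows for the affected standards.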

### View resulting custom classification

In [11]:
p_our_clas = p_analysis / "custom_classification.csv"

In [15]:
our_clas = pd.read_csv(p_our_clas)

tab = '\t'
line = '\n'

# Print each coarse class with its compound count, followed by its fine classes
for x in our_clas.coarse_class.unique():
    in_coarse = our_clas.coarse_class == x
    count = in_coarse.sum()
    fine = our_clas[in_coarse].fine_class.unique()
    print(f"{x}, {count}")

    for y in fine:
        count_fine = (in_coarse & (our_clas.fine_class == y)).sum()
        print(f"{tab}{y}, {count_fine}")
    print(line)

Amino acids, peptides, and analogues, 52
	Acidic amino acids, 9
	Arginine derivatives (guanidines), 5
	Aromatic amino acids, 6
	Histidine derivatives (imidazoles), 4
	Nonpolar amino acids, 3
	Polar amino acids, 10
	Sulphur-containing amino acids, 10
	Tryptophan derivatives (indoles), 5


Carboxylic acids, 24
	Aromatic acids, 3
	Carboxylic acid phosphate, 2
	Carboxylic acids, 6
	Hydroxy acids, 9
	Keto acid, 4


Carbohydrates, 19
	Carbohydrate amines, 4
	Carbohydrate phosphates, 10
	Carbohydrates, 5


Vitamins and cofactors, 17
	CoA and derivatives, 3
	Flavins, 3
	Folates, 2
	Vitamins and cofactors, 9


Lipids and lipid-like molecules, 39
	Fatty acyl, 5
	Glycerolipids, 3
	Glycerophospholipids, 15
	Prenol lipids, 3
	Sphingolipids, 5
	Steroids and steroid derivatives, 8


Nucleosides, nucleotides, and analogues, 32
	Nicotinamide derivatives, 4
	Nucleobases and analogs, 10
	Nucleosides, 4
	Nucleotides, 14


Amines, 14
	Other amines, 6
	Quarternary ammonium amines, 8


Thermometers, 5
	Therm