In [1]:
import hdcms as hdc
import matplotlib.pyplot as plt
from PIL import Image

# Library

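Next, we build a library of mass spectra as a list of tuples (`number`, `summary_statistic`), where `number` is the compound's two-digit identifier and `summary_statistic` is computed from all files in `./data/CM1` that match that identifier.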

The resulting library will look like:
```
[
  ("03", [...]),
  ("04", [...]),
  ...,
]
```

In [15]:
compounds = {
    "03": "Acrylfentanyl, C22H26N2O", 
    "04": "p-Fluorobutyryl Fentanyl, C23H29FN2O", 
    "06": "3-Furanyl fentanyl, C24H26N2O2", 
    "10": "4'-methyl Acetyl Fentanyl, C22H28N2O",
    "11": "Carfentanil, C24H30N2O3",
    "12": "p-Methoxyfentanyl, C23H30N2O2",
    "14": "FIBF, C23H29FN2O",
    "15": "p-Fluorofentanyl, C22H27FN2O",
    "16": "Crotonyl Fentanyl, C23H28N2O",
    "25": "Cyclopropyl Fentanyl, C23H28N2O",
}
library = [(number, hdc.regex2stats1d(f"{number}-\\d+.txt", dir="./data/CM1")) for number in compounds.keys()]

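Next, we define a search function for the library. It compares a query summary statistic against every library entry with `hdc.compare` and returns a list of tuples (`name`, `similarity_score`), sorted from most to least similar, with scores rounded to three decimal places. The output will look like:

```
[
  ("FIBF, C23H29FN2O", 0.988),
  ("Acrylfentanyl, C22H26N2O", 0.032),
  ...,
]
```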

In [17]:
def search(query_stat):
    """Score query_stat against every library entry; return (name, score) pairs, best first."""
    results = [(compounds[number], round(hdc.compare(sum_stat, query_stat), 3)) for number, sum_stat in library]
    results.sort(key=lambda x: x[1], reverse=True)
    return results
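
The ranking pattern used by `search` can be tried without any spectra on disk. The sketch below is illustrative only: `cosine_similarity` is a toy stand-in for `hdc.compare` (it is not how `hdcms` scores spectra), and `rank` and `toy_library` are hypothetical names holding made-up vectors.

```python
import math


def cosine_similarity(a, b):
    """Toy stand-in for hdc.compare: cosine similarity of two plain vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, toy_library, compare=cosine_similarity):
    """Score the query against every (name, entry) pair, highest score first."""
    scored = [(name, round(compare(entry, query), 3)) for name, entry in toy_library]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors for illustration, not real summary statistics.
toy_library = [
    ("A", [1.0, 0.0, 0.0]),
    ("B", [0.0, 1.0, 0.0]),
    ("C", [1.0, 1.0, 0.0]),
]

ranking = rank([1.0, 0.0, 0.0], toy_library)
print(ranking)
```

Passing the comparison function as a parameter keeps the scoring and the sorting separate, so the same ranking code would work with any similarity measure that returns a number.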

Now we will use this search function on two queries: first a summary statistic that is already in the library, and then a new one that is not.

In [19]:
acrylfentanyl = hdc.regex2stats1d(r"03-\d+.txt", dir="./data/CM1")

In [20]:
search(acrylfentanyl)

[('Acrylfentanyl, C22H26N2O', 1.0),
 ("4'-methyl Acetyl Fentanyl, C22H28N2O", 0.102),
 ('Carfentanil, C24H30N2O3', 0.093),
 ('Crotonyl Fentanyl, C23H28N2O', 0.09),
 ('Cyclopropyl Fentanyl, C23H28N2O', 0.087),
 ('p-Fluorobutyryl Fentanyl, C23H29FN2O', 0.075),
 ('p-Fluorofentanyl, C22H27FN2O', 0.073),
 ('FIBF, C23H29FN2O', 0.057),
 ('p-Methoxyfentanyl, C23H30N2O2', 0.053),
 ('3-Furanyl fentanyl, C24H26N2O2', 0.028)]

In [25]:
# Print the results as LaTeX table rows (name only, without the formula)
results = search(acrylfentanyl)
for (name, score) in results:
    print(f"{name.split(', ')[0]} &", score, '\\\\')

Acrylfentanyl & 1.0 \\
4'-methyl Acetyl Fentanyl & 0.102 \\
Carfentanil & 0.093 \\
Crotonyl Fentanyl & 0.09 \\
Cyclopropyl Fentanyl & 0.087 \\
p-Fluorobutyryl Fentanyl & 0.075 \\
p-Fluorofentanyl & 0.073 \\
FIBF & 0.057 \\
p-Methoxyfentanyl & 0.053 \\
3-Furanyl fentanyl & 0.028 \\
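
The rows above print scores with a varying number of digits (`1.0`, `0.09`, `0.102`), which can look uneven in a table. A minimal sketch of one way to fix the width, assuming three decimal places are wanted; `latex_rows` is a hypothetical helper, and `sample` reuses two rows from the output above.

```python
def latex_rows(results, decimals=3):
    """Format (name, score) pairs as LaTeX table rows with a fixed number of decimals."""
    rows = []
    for name, score in results:
        short_name = name.split(", ")[0]  # drop the molecular formula
        rows.append(f"{short_name} & {score:.{decimals}f} \\\\")
    return rows


# Two rows taken from the search output above.
sample = [
    ("Acrylfentanyl, C22H26N2O", 1.0),
    ("Crotonyl Fentanyl, C23H28N2O", 0.09),
]

rows = latex_rows(sample)
print("\n".join(rows))
```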


In [21]:
compound28 = hdc.regex2stats1d(r"28-\d+.txt", dir="./data/CM1")
search(compound28)

[('Carfentanil, C24H30N2O3', 0.066),
 ('Acrylfentanyl, C22H26N2O', 0.05),
 ('Crotonyl Fentanyl, C23H28N2O', 0.046),
 ('Cyclopropyl Fentanyl, C23H28N2O', 0.041),
 ("4'-methyl Acetyl Fentanyl, C22H28N2O", 0.04),
 ('p-Methoxyfentanyl, C23H30N2O2', 0.039),
 ('3-Furanyl fentanyl, C24H26N2O2', 0.036),
 ('p-Fluorofentanyl, C22H27FN2O', 0.035),
 ('p-Fluorobutyryl Fentanyl, C23H29FN2O', 0.031),
 ('FIBF, C23H29FN2O', 0.023)]

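In the first example, the query is built from the same files as the Acrylfentanyl library entry, so that entry scores 1.0: a compound compared with itself is identical. In the second example, compound 28 is not in the library, so no entry scores anywhere near 1; the best match, Carfentanil, reaches only 0.066.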
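
The two result lists above differ sharply at the top: 1.0 for the Acrylfentanyl query versus 0.066 for compound 28. A minimal sketch of turning that into a decision is to accept the top hit only when its score clears a cutoff. The `best_match` helper and the cutoff of 0.5 are assumptions made for illustration, not part of `hdcms` and not a recommended value; the sample lists copy the first two rows of each output above.

```python
def best_match(results, threshold=0.5):
    """Return the top (name, score) pair, or None if its score is below threshold.

    Assumes results is already sorted with the highest score first,
    as returned by search(). The default threshold is a hypothetical value.
    """
    if not results:
        return None
    name, score = results[0]
    return (name, score) if score >= threshold else None


# First two rows of each search output shown above.
acrylfentanyl_hits = [
    ("Acrylfentanyl, C22H26N2O", 1.0),
    ("4'-methyl Acetyl Fentanyl, C22H28N2O", 0.102),
]
compound28_hits = [
    ("Carfentanil, C24H30N2O3", 0.066),
    ("Acrylfentanyl, C22H26N2O", 0.05),
]

print(best_match(acrylfentanyl_hits))  # the Acrylfentanyl entry is accepted
print(best_match(compound28_hits))     # no entry clears the cutoff
```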