# LOTUS - dataset overview

The [LOTUS initiative](https://lotus.nprod.net/) is a database of natural products linked to the organisms in which they occur. Before expanding the database, we want an overview of the latest LOTUS dataset.

[latest LOTUS dataset (v10)](https://zenodo.org/records/7534071)

More information is available in the [paper (DOI: 10.7554/eLife.70780)](https://elifesciences.org/articles/70780).

## Load dataset

In [2]:
# Example of loading LOTUS datasets with polars (python module)
import polars as pl
import numpy as np
from defl import *  # module for handling LOTUS and MINES

df_lotus = pl.read_parquet("../data/LOTUS/230106_frozen_metadata_cleaned.parquet")
print(f"all columns of LOTUS (total: {df_lotus.shape[1]}): \n{df_lotus.columns}")

all columns of LOTUS (total: 39): 
['structure_wikidata', 'structure_inchikey', 'structure_inchi', 'structure_smiles', 'structure_molecular_formula', 'structure_exact_mass', 'structure_xlogp', 'structure_smiles_2D', 'structure_cid', 'structure_nameIupac', 'structure_nameTraditional', 'structure_stereocenters_total', 'structure_stereocenters_unspecified', 'structure_taxonomy_npclassifier_01pathway', 'structure_taxonomy_npclassifier_02superclass', 'structure_taxonomy_npclassifier_03class', 'structure_taxonomy_classyfire_chemontid', 'structure_taxonomy_classyfire_01kingdom', 'structure_taxonomy_classyfire_02superclass', 'structure_taxonomy_classyfire_03class', 'structure_taxonomy_classyfire_04directparent', 'organism_wikidata', 'organism_name', 'organism_taxonomy_gbifid', 'organism_taxonomy_ncbiid', 'organism_taxonomy_ottid', 'organism_taxonomy_01domain', 'organism_taxonomy_02kingdom', 'organism_taxonomy_03phylum', 'organism_taxonomy_04class', 'organism_taxonomy_05order', 'organism_taxono

In [5]:
# example of "limonene" entry

df = df_lotus.filter((pl.col("structure_nameTraditional") == "Limonene") & (pl.col("organism_taxonomy_09species") == "Cannabis sativa"))[0, :]

#df.write_csv("Limonene.csv", separator=",")

The 39 columns fall into four groups: **natural products (NP)**, **chemical structure**, **organism** and the **source** (reference).  
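The column-name prefixes give a quick way to check this grouping. The following is a minimal sketch in plain Python that uses the column names printed for this dataset; the helper name `group_by_prefix` is made up for illustration and is not part of polars or of the `defl` module.

```python
from collections import defaultdict

# Column names as printed for the LOTUS table (39 in total).
LOTUS_COLUMNS = [
    "structure_wikidata", "structure_inchikey", "structure_inchi",
    "structure_smiles", "structure_molecular_formula", "structure_exact_mass",
    "structure_xlogp", "structure_smiles_2D", "structure_cid",
    "structure_nameIupac", "structure_nameTraditional",
    "structure_stereocenters_total", "structure_stereocenters_unspecified",
    "structure_taxonomy_npclassifier_01pathway",
    "structure_taxonomy_npclassifier_02superclass",
    "structure_taxonomy_npclassifier_03class",
    "structure_taxonomy_classyfire_chemontid",
    "structure_taxonomy_classyfire_01kingdom",
    "structure_taxonomy_classyfire_02superclass",
    "structure_taxonomy_classyfire_03class",
    "structure_taxonomy_classyfire_04directparent",
    "organism_wikidata", "organism_name", "organism_taxonomy_gbifid",
    "organism_taxonomy_ncbiid", "organism_taxonomy_ottid",
    "organism_taxonomy_01domain", "organism_taxonomy_02kingdom",
    "organism_taxonomy_03phylum", "organism_taxonomy_04class",
    "organism_taxonomy_05order", "organism_taxonomy_06family",
    "organism_taxonomy_07tribe", "organism_taxonomy_08genus",
    "organism_taxonomy_09species", "organism_taxonomy_10varietas",
    "reference_wikidata", "reference_doi", "manual_validation",
]


def group_by_prefix(columns):
    """Group column names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in columns:
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)


groups = group_by_prefix(LOTUS_COLUMNS)
for prefix, names in groups.items():
    print(f"{prefix}: {len(names)} columns")
# structure: 21 columns
# organism: 15 columns
# reference: 2 columns
# manual: 1 columns
```

In polars, a prefix group can also be selected with a regular expression, for example `df_lotus.select(pl.col("^structure_.*$"))`, which avoids counting index positions by hand.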

In [11]:
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

for columnname in df_lotus.columns:
    # count how often each distinct value occurs in the given column
    df_lotus_plot = df_lotus.select(pl.col(columnname).value_counts(sort=True, name="n")).unnest(columnname)

    # print the 5 most common names
    print(f"{columnname} (shape: {df_lotus_plot.shape[0]:_}) : {df_lotus_plot[0:5, :]}")

structure_wikidata (shape: 220_783) : shape: (5, 2)
┌────────────────────────────────────────┬──────┐
│ structure_wikidata                     ┆ n    │
│ ---                                    ┆ ---  │
│ str                                    ┆ u32  │
╞════════════════════════════════════════╪══════╡
│ http://www.wikidata.org/entity/Q209727 ┆ 4156 │
│ http://www.wikidata.org/entity/Q121802 ┆ 3142 │
│ http://www.wikidata.org/entity/Q425004 ┆ 3006 │
│ http://www.wikidata.org/entity/Q409478 ┆ 2390 │
│ http://www.wikidata.org/entity/Q278809 ┆ 2294 │
└────────────────────────────────────────┴──────┘
structure_inchikey (shape: 220_823) : shape: (5, 2)
┌─────────────────────────────┬──────┐
│ structure_inchikey          ┆ n    │
│ ---                         ┆ ---  │
│ str                         ┆ u32  │
╞═════════════════════════════╪══════╡
│ IPCSVZSSVZVIGE-UHFFFAOYSA-N ┆ 4156 │
│ KZJWDPNRJALLNS-VJSFXXLFSA-N ┆ 3142 │
│ HCXVJBMSMIARIN-PHZDYDNGSA-N ┆ 3006 │
│ REFJWTPEDVJJIY-UHFFFAOYSA-N ┆ 23

## Input file (Wikidata and SMILES)

Pickaxe (MINEs) needs a SMILES string and an ID for every compound.  
Here we would use the columns **3D SMILES (structure_smiles)** and **structure_wikidata**.  
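As an illustration of what such an input file could look like, here is a minimal sketch in plain Python. The header names `id` and `smiles`, the helper names, and the choice to shorten the Wikidata URL to its Q-identifier are all assumptions made for this example; check the pickaxe documentation for the exact format it expects. The example rows reuse a SMILES and two Wikidata entries that occur in LOTUS.

```python
import csv
import io


def wikidata_qid(url):
    """Return the trailing Q-identifier of a Wikidata entity URL."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def write_compound_file(pairs, handle, header=("id", "smiles")):
    """Write unique (identifier, SMILES) pairs as CSV.

    The header names are an assumption, not a confirmed pickaxe requirement.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    seen = set()
    for url, smiles in pairs:
        row = (wikidata_qid(url), smiles)
        if row not in seen:  # keep only the first copy of each pair
            seen.add(row)
            writer.writerow(row)


# (structure_wikidata, structure_smiles) rows; the first row is repeated on purpose
rows = [
    ("http://www.wikidata.org/entity/Q804105", "C1CCC2(CCCCO2)OC1"),
    ("http://www.wikidata.org/entity/Q804105", "C1CCC2(CCCCO2)OC1"),
    ("http://www.wikidata.org/entity/Q55620521", "C1CCC2(CCCCO2)OC1"),
]

buffer = io.StringIO()
write_compound_file(rows, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an opened file handle instead would write the same content to disk.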

In [3]:
# print the section title in bold (ANSI escape codes)
print("\033[1m\nuniqueness over categories:\033[0m")

# unique row combinations over the "chemical structure" columns
# note: the slice 0:20 covers indices 0-19, so the last structure column (index 20) is not included
unique_counts = df_lotus.select(
    df_lotus.columns[0:20]
    ).n_unique()
print("all structure columns:", str(unique_counts))

# unique row combinations over the "organism" columns
# note: the slice 21:35 covers indices 21-34, so "organism_taxonomy_10varietas" (index 35) is not included
unique_counts = df_lotus.select(
    df_lotus.columns[21:35]
    ).n_unique()
print("all organism columns:", str(unique_counts))

# unique rows of the full dataset
unique_counts = df_lotus.select(
    df_lotus.columns[:]
    ).n_unique()
print("all columns:", str(unique_counts))

# unique counts for the pickaxe/MINEs input
print("\033[1m\nImportant Info for pickaxe (MINEs):\033[0m")
print(f'rows for LOTUS dataset: {df_lotus.shape[0]}')
print(f'\nunique "structure_smiles": {df_lotus.select(["structure_smiles"]).n_unique()}')
print(f'unique "structure_wikidata": {df_lotus.select(["structure_wikidata"]).n_unique()}')
print(f'unique "structure_smiles" and "structure_wikidata": {df_lotus.select(["structure_smiles", "structure_wikidata"]).n_unique()}')
print(f'\nunique "structure_smiles": {df_lotus.select(["structure_smiles"]).n_unique()}')
print(f'unique "structure_inchi": {df_lotus.select(["structure_inchi"]).n_unique()}')
print(f'unique "structure_smiles" and "structure_inchi": {df_lotus.select(["structure_smiles", "structure_inchi"]).n_unique()}')

uniqueness over categories:
all structure columns: 257226
all organism columns: 36803
all columns: 792364
Important Info for pickaxe (MINEs):
rows for LOTUS dataset: 792364

unique "structure_smiles": 220820
unique "structure_wikidata": 220783
unique "structure_smiles" and "structure_wikidata": 220834

unique "structure_smiles": 220820
unique "structure_inchi": 220823
unique "structure_smiles" and "structure_inchi": 220823


Keeping only the unique combinations of "structure_smiles" and "structure_wikidata" for the pickaxe input file reduces the full dataset from 792'364 rows to 220'834 rows, which is **27.9%** of the full dataset.  

This is good news: the input shrinks drastically.  
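The percentage follows directly from the two row counts reported above. A short check in Python (the variable names are chosen for this example):

```python
total_rows = 792_364    # rows in the full LOTUS table
unique_pairs = 220_834  # unique (structure_smiles, structure_wikidata) pairs

share = unique_pairs / total_rows
print(f"kept: {share:.1%}, dropped: {1 - share:.1%}")
# kept: 27.9%, dropped: 72.1%
```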

It is a little unusual, however, that there are more unique SMILES (220'820) than Wikidata links (220'783). 
One explanation would be that some SMILES do not have a Wikidata link yet, but that cannot be the case here, because LOTUS is ‘complete’ and has no empty fields.  

The InChIs are also slightly surprising: there are 3 more unique InChIs (220'823) than unique SMILES (220'820), so a few SMILES appear with more than one InChI. 

In [4]:
# make a dataframe with only the two columns: "structure_smiles" and "structure_wikidata"
df_lotus_for_pickaxe_with_wikidata = df_lotus.select(["structure_smiles", "structure_wikidata"]).unique()

# search for the duplicates and print them
df_lotus_for_pickaxe_with_wikidata_duplicates = df_lotus_for_pickaxe_with_wikidata.filter(df_lotus_for_pickaxe_with_wikidata.select(['structure_smiles']).is_duplicated()).sort('structure_smiles')
print(f'{df_lotus_for_pickaxe_with_wikidata_duplicates[0:2, :]}')


# make a dataframe with only the two columns: "structure_smiles" and "structure_inchi"
df_lotus_for_pickaxe_with_inchi = df_lotus.select(["structure_smiles", "structure_inchi"]).unique()

# search for the duplicates and print them
df_lotus_for_pickaxe_with_inchi_duplicates = df_lotus_for_pickaxe_with_inchi.filter(df_lotus_for_pickaxe_with_inchi.select(['structure_smiles']).is_duplicated()).sort('structure_smiles')
print(f'{df_lotus_for_pickaxe_with_inchi_duplicates[0:2, :]}')


shape: (2, 2)
┌───────────────────┬──────────────────────────────────────────┐
│ structure_smiles  ┆ structure_wikidata                       │
│ ---               ┆ ---                                      │
│ str               ┆ str                                      │
╞═══════════════════╪══════════════════════════════════════════╡
│ C1CCC2(CCCCO2)OC1 ┆ http://www.wikidata.org/entity/Q55620521 │
│ C1CCC2(CCCCO2)OC1 ┆ http://www.wikidata.org/entity/Q804105   │
└───────────────────┴──────────────────────────────────────────┘
shape: (2, 2)
┌─────────────────────────────────────────────────┬────────────────────────────────────────────────┐
│ structure_smiles                                ┆ structure_inchi                                │
│ ---                                             ┆ ---                                            │
│ str                                             ┆ str                                            │
╞═══════════════════════════════════════════════

The real reason is that some SMILES have two Wikidata entries, which is confusing. Following the links shows that both entries point to the "same" chemical compound.
The numbers are consistent with this: 14 SMILES carry a second Wikidata entry (28 duplicated rows in total), and the 220'820 unique SMILES plus these 14 additional entries give the 220'834 unique pairs.

The same problem shows up with the InChIs: some SMILES have more than one InChI. This is possible because the InChI describes the molecule more precisely than the SMILES.
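The counting argument can be replayed on a toy table. This sketch is plain Python rather than polars, and the helper name `values_per_key` is made up for illustration. The first SMILES and its two Wikidata IDs come from the output above; the other rows are hypothetical placeholders.

```python
from collections import defaultdict


def values_per_key(pairs):
    """Map each key to the set of distinct values it is paired with."""
    mapping = defaultdict(set)
    for key, value in pairs:
        mapping[key].add(value)
    return dict(mapping)


# (SMILES, identifier) rows
pairs = [
    ("C1CCC2(CCCCO2)OC1", "Q55620521"),
    ("C1CCC2(CCCCO2)OC1", "Q804105"),
    ("CCO", "Q_example_1"),      # hypothetical identifier
    ("CCO", "Q_example_1"),      # repeated row, removed by set()
    ("CC(=O)O", "Q_example_2"),  # hypothetical identifier
]

unique_pairs = set(pairs)
by_smiles = values_per_key(unique_pairs)

# SMILES that are linked to more than one identifier
ambiguous = {s: sorted(ids) for s, ids in by_smiles.items() if len(ids) > 1}
# number of identifiers beyond the first one, summed over all SMILES
extra = sum(len(ids) - 1 for ids in by_smiles.values())

# unique pairs = unique SMILES + additional identifiers
assert len(unique_pairs) == len(by_smiles) + extra
print(ambiguous)
# {'C1CCC2(CCCCO2)OC1': ['Q55620521', 'Q804105']}
```

Swapping the two positions in each pair gives the opposite view, that is, identifiers linked to more than one SMILES.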

### Why do we have fewer Wikidata links than SMILES? Is this a loss of information?  
We can see below that some Wikidata links point to multiple SMILES. This can happen because a Wikidata entry sometimes represents a group of chemical compounds.

In [5]:
# make a dataframe with only the two columns: "structure_smiles" and "structure_wikidata"
df_lotus_for_pickaxe_with_wikidata = df_lotus.select(["structure_smiles", "structure_wikidata"]).unique()

# search for the duplicates and print them
df_lotus_for_pickaxe_with_wikidata_duplicates = df_lotus_for_pickaxe_with_wikidata.filter(df_lotus_for_pickaxe_with_wikidata.select(['structure_wikidata']).is_duplicated()).sort('structure_wikidata')
print(f'{df_lotus_for_pickaxe_with_wikidata_duplicates[0:2, :]}')

shape: (2, 2)
┌──────────────────────────────────────────────────────┬───────────────────────────────────────────┐
│ structure_smiles                                     ┆ structure_wikidata                        │
│ ---                                                  ┆ ---                                       │
│ str                                                  ┆ str                                       │
╞══════════════════════════════════════════════════════╪═══════════════════════════════════════════╡
│ C/C=C1/CN2CC[C@@]34c5ccccc5N5[C@H](C(=O)OC)[C@H]1C[C ┆ http://www.wikidata.org/entity/Q105144092 │
│ @H]2[C@]53Oc1c4cc2c(c1O)C(=O)O[C@]13[C@@H]4C[C@H]5/C ┆                                           │
│ (=C\C)CN4CC[C@@]21c1ccccc1N3[C@@H]5C(=O)OC           ┆                                           │
│ C/C=C1/CN2CC[C@]34c5ccccc5N5[C@H](C(=O)OC)[C@H]1C[C@ ┆ http://www.wikidata.org/entity/Q105144092 │
│ H]2[C@@]53Oc1c4cc2c(c1O)C(=O)O[C@@]13[C@@H]4C[C@H]5/ ┆                     

## Input file (InChIKey and SMILES)

In [5]:
df_lotus.columns

['structure_wikidata',
 'structure_inchikey',
 'structure_inchi',
 'structure_smiles',
 'structure_molecular_formula',
 'structure_exact_mass',
 'structure_xlogp',
 'structure_smiles_2D',
 'structure_cid',
 'structure_nameIupac',
 'structure_nameTraditional',
 'structure_stereocenters_total',
 'structure_stereocenters_unspecified',
 'structure_taxonomy_npclassifier_01pathway',
 'structure_taxonomy_npclassifier_02superclass',
 'structure_taxonomy_npclassifier_03class',
 'structure_taxonomy_classyfire_chemontid',
 'structure_taxonomy_classyfire_01kingdom',
 'structure_taxonomy_classyfire_02superclass',
 'structure_taxonomy_classyfire_03class',
 'structure_taxonomy_classyfire_04directparent',
 'organism_wikidata',
 'organism_name',
 'organism_taxonomy_gbifid',
 'organism_taxonomy_ncbiid',
 'organism_taxonomy_ottid',
 'organism_taxonomy_01domain',
 'organism_taxonomy_02kingdom',
 'organism_taxonomy_03phylum',
 'organism_taxonomy_04class',
 'organism_taxonomy_05order',
 'organism_taxonomy_0

In [7]:
# print the section title in bold (ANSI escape codes)
print("\033[1m\nuniqueness over categories:\033[0m")

# unique row combinations over the "chemical structure" columns
# note: the slice 0:20 covers indices 0-19, so the last structure column (index 20) is not included
unique_counts = df_lotus.select(
    df_lotus.columns[0:20]
    ).n_unique()
print("all structure columns:", str(unique_counts))

# unique row combinations over the "organism" columns
# note: the slice 21:35 covers indices 21-34, so "organism_taxonomy_10varietas" (index 35) is not included
unique_counts = df_lotus.select(
    df_lotus.columns[21:35]
    ).n_unique()
print("all organism columns:", str(unique_counts))

# unique rows of the full dataset
unique_counts = df_lotus.select(
    df_lotus.columns[:]
    ).n_unique()
print("all columns:", str(unique_counts))

# unique counts for the pickaxe/MINEs input
print("\033[1m\nImportant Info for pickaxe (MINEs):\033[0m")
print(f'unique "structure_smiles": {df_lotus.select(["structure_smiles"]).n_unique()}')
print(f'unique "structure_inchikey": {df_lotus.select(["structure_inchikey"]).n_unique()}')
print(f'unique "structure_smiles" and "structure_inchikey": {df_lotus.select(["structure_smiles", "structure_inchikey"]).n_unique()}')

uniqueness over categories:
all structure columns: 257226
all organism columns: 36803
all columns: 792364
Important Info for pickaxe (MINEs):
unique "structure_smiles": 220820
unique "structure_inchikey": 220823
unique "structure_smiles" and "structure_inchikey": 220823


In [9]:
# make a dataframe with only the two columns: "structure_smiles" and "structure_inchikey"
df_lotus_for_pickaxe_with_inchikey = df_lotus.select(["structure_smiles", "structure_inchikey"]).unique()

# search for the duplicates and print them
df_lotus_for_pickaxe_with_inchikey_duplicates = df_lotus_for_pickaxe_with_inchikey.filter(df_lotus_for_pickaxe_with_inchikey.select(['structure_smiles']).is_duplicated()).sort('structure_smiles')
print(f'{df_lotus_for_pickaxe_with_inchikey_duplicates}')

shape: (6, 2)
┌────────────────────────────────────────────────────────────────────┬─────────────────────────────┐
│ structure_smiles                                                   ┆ structure_inchikey          │
│ ---                                                                ┆ ---                         │
│ str                                                                ┆ str                         │
╞════════════════════════════════════════════════════════════════════╪═════════════════════════════╡
│ CC1=C\C=C\C(C)=C\C[C@H](C)NC(=O)/C(CC(C)C)=C/C(C)=C/C=C/C=C/[C@@]( ┆ SCXSAOWAWJDUND-ROXKSDJVSA-N │
│ C)(O)[C@@H](O[C@@H]2OC[C@@H](O[C@H]3C[C@@](C)(O)[C@H](N(C)C)[C@@H] ┆                             │
│ (C)O3)[C@H](O)[C@H]2N)/C=C\C=C\1                                   ┆                             │
│ CC1=C\C=C\C(C)=C\C[C@H](C)NC(=O)/C(CC(C)C)=C/C(C)=C/C=C/C=C/[C@@]( ┆ SCXSAOWAWJDUND-ZBTULVDDSA-N │
│ C)(O)[C@@H](O[C@@H]2OC[C@@H](O[C@H]3C[C@@](C)(O)[C@H](N(C)C)[C@@H] ┆       
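
The two InChIKeys shown for the same SMILES can be compared block by block. In a standard InChIKey, the first 14-character block is a hash of the connectivity, while the second block covers the remaining layers such as stereochemistry. The sketch below splits the two keys from the output above; the helper name `split_inchikey` is made up for illustration.

```python
def split_inchikey(key):
    """Split an InChIKey into its three hyphen-separated blocks."""
    first, second, third = key.split("-")
    return first, second, third


key_a = "SCXSAOWAWJDUND-ROXKSDJVSA-N"
key_b = "SCXSAOWAWJDUND-ZBTULVDDSA-N"

blocks_a = split_inchikey(key_a)
blocks_b = split_inchikey(key_b)

same_connectivity = blocks_a[0] == blocks_b[0]
same_second_block = blocks_a[1] == blocks_b[1]
print(same_connectivity, same_second_block)
# True False
```

The matching first block suggests that both records describe the same skeleton and differ only in details such as stereochemistry, which would be in line with the SMILES strings being identical. This is an interpretation of the keys, not something verified against the structures themselves.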

## Filter


In [3]:
df_lotus.filter(pl.col("structure_taxonomy_classyfire_04directparent") == "Flavonoid-3-O-glycosides")

structure_wikidata,structure_inchikey,structure_inchi,structure_smiles,structure_molecular_formula,structure_exact_mass,structure_xlogp,structure_smiles_2D,structure_cid,structure_nameIupac,structure_nameTraditional,structure_stereocenters_total,structure_stereocenters_unspecified,structure_taxonomy_npclassifier_01pathway,structure_taxonomy_npclassifier_02superclass,structure_taxonomy_npclassifier_03class,structure_taxonomy_classyfire_chemontid,structure_taxonomy_classyfire_01kingdom,structure_taxonomy_classyfire_02superclass,structure_taxonomy_classyfire_03class,structure_taxonomy_classyfire_04directparent,organism_wikidata,organism_name,organism_taxonomy_gbifid,organism_taxonomy_ncbiid,organism_taxonomy_ottid,organism_taxonomy_01domain,organism_taxonomy_02kingdom,organism_taxonomy_03phylum,organism_taxonomy_04class,organism_taxonomy_05order,organism_taxonomy_06family,organism_taxonomy_07tribe,organism_taxonomy_08genus,organism_taxonomy_09species,organism_taxonomy_10varietas,reference_wikidata,reference_doi,manual_validation
str,str,str,str,str,f64,f64,str,i64,str,str,i64,i64,str,str,str,i64,str,str,str,str,str,str,str,i64,i64,str,str,str,str,str,str,str,str,str,str,str,str,str
"""http://www.wikidata.org/entity/Q407857""","""IKGXIBQEEMLURG-NVPNHPEKSA-N""","""InChI=1S/C27H30O16/c1-8-17(32)20(35)22(37)26(40-8)39-7-15-18(33)21(36)23(38)27(42-15)43-25-19(34)16-13(31)5-10(28)6-14(16)41-24(25)9-2-3-11(29)12(30)4-9/h2-6,8,15,17-18,20-23,26-33,35-38H,7H2,1H3/t8-,15+,17-,18+,20+,21-,22+,23+,26+,27-/m0/s1""","""C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O""","""C27H30O16""",610.153385,-1.6871,"""CC1OC(OCC2OC(Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)C(O)C(O)C2O)C(O)C(O)C1O""",5280805,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-[[(2R,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxymethyl]oxan-2-yl]oxychromen-4-one""","""Rutin""",10,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q15232815""","""Nerium indicum""",,,65881,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Gentianales""","""Apocynaceae""","""Nerieae""","""Nerium""","""Nerium oleander""",,"""http://www.wikidata.org/entity/Q104384464""","""10.1016/0031-9422(95)00837-3""",
"""http://www.wikidata.org/entity/Q407857""","""IKGXIBQEEMLURG-NVPNHPEKSA-N""","""InChI=1S/C27H30O16/c1-8-17(32)20(35)22(37)26(40-8)39-7-15-18(33)21(36)23(38)27(42-15)43-25-19(34)16-13(31)5-10(28)6-14(16)41-24(25)9-2-3-11(29)12(30)4-9/h2-6,8,15,17-18,20-23,26-33,35-38H,7H2,1H3/t8-,15+,17-,18+,20+,21-,22+,23+,26+,27-/m0/s1""","""C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O""","""C27H30O16""",610.153385,-1.6871,"""CC1OC(OCC2OC(Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)C(O)C(O)C2O)C(O)C(O)C1O""",5280805,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-[[(2R,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxymethyl]oxan-2-yl]oxychromen-4-one""","""Rutin""",10,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q155833""","""Nerium oleander""",,63479,65881,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Gentianales""","""Apocynaceae""","""Nerieae""","""Nerium""","""Nerium oleander""",,"""http://www.wikidata.org/entity/Q104384464""","""10.1016/0031-9422(95)00837-3""",
"""http://www.wikidata.org/entity/Q407857""","""IKGXIBQEEMLURG-NVPNHPEKSA-N""","""InChI=1S/C27H30O16/c1-8-17(32)20(35)22(37)26(40-8)39-7-15-18(33)21(36)23(38)27(42-15)43-25-19(34)16-13(31)5-10(28)6-14(16)41-24(25)9-2-3-11(29)12(30)4-9/h2-6,8,15,17-18,20-23,26-33,35-38H,7H2,1H3/t8-,15+,17-,18+,20+,21-,22+,23+,26+,27-/m0/s1""","""C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O""","""C27H30O16""",610.153385,-1.6871,"""CC1OC(OCC2OC(Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)C(O)C(O)C2O)C(O)C(O)C1O""",5280805,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-[[(2R,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxymethyl]oxan-2-yl]oxychromen-4-one""","""Rutin""",10,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q1137159""","""Daphnis nerii""",,522810,3094519,"""Eukaryota""","""Metazoa""","""Arthropoda""","""Insecta""","""Lepidoptera""","""Sphingidae""","""Macroglossini""","""Daphnis""","""Daphnis nerii""",,"""http://www.wikidata.org/entity/Q104384464""","""10.1016/0031-9422(95)00837-3""",
"""http://www.wikidata.org/entity/Q407857""","""IKGXIBQEEMLURG-NVPNHPEKSA-N""","""InChI=1S/C27H30O16/c1-8-17(32)20(35)22(37)26(40-8)39-7-15-18(33)21(36)23(38)27(42-15)43-25-19(34)16-13(31)5-10(28)6-14(16)41-24(25)9-2-3-11(29)12(30)4-9/h2-6,8,15,17-18,20-23,26-33,35-38H,7H2,1H3/t8-,15+,17-,18+,20+,21-,22+,23+,26+,27-/m0/s1""","""C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O""","""C27H30O16""",610.153385,-1.6871,"""CC1OC(OCC2OC(Oc3c(-c4ccc(O)c(O)c4)oc4cc(O)cc(O)c4c3=O)C(O)C(O)C2O)C(O)C(O)C1O""",5280805,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-[[(2R,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxymethyl]oxan-2-yl]oxychromen-4-one""","""Rutin""",10,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q4483174""","""Ferulago sylvatica""",,1333700,3888160,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Apiales""","""Apiaceae""",,"""Ferulago""","""Ferulago sylvatica""",,"""http://www.wikidata.org/entity/Q104383974""","""10.18535/IJETST/V2I8.18""",
"""http://www.wikidata.org/entity/Q105201246""","""OVSQVDMCBVZWGM-IDRAQACASA-N""","""InChI=1S/C21H20O12/c22-6-13-15(27)17(29)18(30)21(32-13)33-20-16(28)14-11(26)4-8(23)5-12(14)31-19(20)7-1-2-9(24)10(25)3-7/h1-5,13,15,17-18,21-27,29-30H,6H2/t13-,15-,17-,18-,21+/m1/s1""","""O=c1c(O[C@@H]2O[C@H](CO)[C@@H](O)[C@@H](O)[C@H]2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""","""C21H20O12""",464.095476,-0.5389,"""O=c1c(OC2OC(CO)C(O)C(O)C2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""",12304327,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4R,5S,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxychromen-4-one""","""Quercetin 3-alloside""",5,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q166843""","""Psidium guajava""",,120290,339092,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Myrteae""","""Psidium""","""Psidium guajava""",,"""http://www.wikidata.org/entity/Q50633628""","""10.1016/J.FOODRES.2017.03.019""",
"""http://www.wikidata.org/entity/Q105201322""","""OVSQVDMCBVZWGM-SJWGPRHPSA-N""","""InChI=1S/C21H20O12/c22-6-13-15(27)17(29)18(30)21(32-13)33-20-16(28)14-11(26)4-8(23)5-12(14)31-19(20)7-1-2-9(24)10(25)3-7/h1-5,13,15,17-18,21-27,29-30H,6H2/t13-,15+,17-,18+,21+/m1/s1""","""O=c1c(O[C@@H]2O[C@H](CO)[C@H](O)[C@@H](O)[C@@H]2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""","""C21H20O12""",464.095476,-0.5389,"""O=c1c(OC2OC(CO)C(O)C(O)C2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""",51521831,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3S,4R,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxychromen-4-one""","""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3S,4R,5R,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxychromen-4-one""",5,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q166843""","""Psidium guajava""",,120290,339092,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Myrteae""","""Psidium""","""Psidium guajava""",,"""http://www.wikidata.org/entity/Q50633628""","""10.1016/J.FOODRES.2017.03.019""",
"""http://www.wikidata.org/entity/Q105202673""","""OXGUCUVFOIWWQJ-LDWSSRBJSA-N""","""InChI=1S/C21H20O11/c1-7-15(26)17(28)18(29)21(30-7)32-20-16(27)14-12(25)5-9(22)6-13(14)31-19(20)8-2-3-10(23)11(24)4-8/h2-7,15,17-18,21-26,28-29H,1H3/t7-,15-,17+,18-,21-/m0/s1""","""C[C@@H]1O[C@@H](Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)[C@@H](O)[C@H](O)[C@H]1O""","""C21H20O11""",448.100561,0.4887,"""CC1OC(Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)C(O)C(O)C1O""",26339717,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3S,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxychromen-4-one""","""3-(6-Deoxy-alpha-L-glucopyranosyloxy)-2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-4H-1-benzopyran-4-one""",5,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q166843""","""Psidium guajava""",,120290,339092,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Myrteae""","""Psidium""","""Psidium guajava""",,"""http://www.wikidata.org/entity/Q50633628""","""10.1016/J.FOODRES.2017.03.019""",
"""http://www.wikidata.org/entity/Q105024324""","""GZBROUOOAWUBQH-UHFFFAOYSA-N""","""InChI=1S/C27H22O15/c28-11-6-14(31)19-17(7-11)40-24(9-1-2-12(29)13(30)3-9)25(22(19)36)42-27-23(37)21(35)18(41-27)8-39-26(38)10-4-15(32)20(34)16(33)5-10/h1-7,18,21,23,27-35,37H,8H2""","""O=C(OCC1OC(Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)C(O)C1O)c1cc(O)c(O)c(O)c1""","""C27H22O15""",586.09587,1.0817,"""O=C(OCC1OC(Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)C(O)C1O)c1cc(O)c(O)c(O)c1""",72769157,"""[5-[2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-4-oxochromen-3-yl]oxy-3,4-dihydroxyoxolan-2-yl]methyl 3,4,5-trihydroxybenzoate""","""[5-[2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxochromen-3-yl]oxy-3,4-dihydroxyoxolan-2-yl]methyl 3,4,5-trihydroxybenzoate""",4,4,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q166843""","""Psidium guajava""",,120290,339092,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Myrteae""","""Psidium""","""Psidium guajava""",,"""http://www.wikidata.org/entity/Q50633628""","""10.1016/J.FOODRES.2017.03.019""",
"""http://www.wikidata.org/entity/Q105217273""","""PZZRDJXEMZMZFD-ODPGBAFUSA-N""","""InChI=1S/C20H18O11/c21-8-4-11(24)14-13(5-8)30-18(7-1-2-9(22)10(23)3-7)19(16(14)27)31-20-17(28)15(26)12(25)6-29-20/h1-5,12,15,17,20-26,28H,6H2/t12-,15+,17-,20+/m0/s1""","""O=c1c(O[C@H]2OC[C@H](O)[C@@H](O)[C@@H]2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""","""C20H18O11""",434.084911,0.1002,"""O=c1c(OC2OCC(O)C(O)C2O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12""",26202188,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2R,3S,4R,5S)-3,4,5-trihydroxyoxan-2-yl]oxychromen-4-one""","""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2R,3S,4R,5S)-3,4,5-trihydroxyoxan-2-yl]oxychromen-4-one""",4,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q166843""","""Psidium guajava""",,120290,339092,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Myrteae""","""Psidium""","""Psidium guajava""",,"""http://www.wikidata.org/entity/Q50633628""","""10.1016/J.FOODRES.2017.03.019""",
"""http://www.wikidata.org/entity/Q1649777""","""OXGUCUVFOIWWQJ-HQBVPOQASA-N""","""InChI=1S/C21H20O11/c1-7-15(26)17(28)18(29)21(30-7)32-20-16(27)14-12(25)5-9(22)6-13(14)31-19(20)8-2-3-10(23)11(24)4-8/h2-7,15,17-18,21-26,28-29H,1H3/t7-,15-,17+,18+,21-/m0/s1""","""C[C@@H]1O[C@@H](Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)[C@H](O)[C@H](O)[C@H]1O""","""C21H20O11""",448.100561,0.4887,"""CC1OC(Oc2c(-c3ccc(O)c(O)c3)oc3cc(O)cc(O)c3c2=O)C(O)C(O)C1O""",5280459,"""2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyloxan-2-yl]oxychromen-4-one""","""Quercitrin""",5,0,"""Shikimates and Phenylpropanoids""","""Flavonoids""","""Flavonols""",3531,"""Organic compounds""","""Phenylpropanoids and polyketides""","""Flavonoids""","""Flavonoid-3-O-glycosides""","""http://www.wikidata.org/entity/Q1813408""","""Melaleuca quinquenervia""",,164942,457009,"""Eukaryota""","""Archaeplastida""","""Streptophyta""","""Magnoliopsida""","""Myrtales""","""Myrtaceae""","""Melaleuceae""","""Melaleuca""","""Melaleuca quinquenervia""",,"""http://www.wikidata.org/entity/Q115784348""","""10.1300/J044V05N02_05""",
