How to download all the chemical compound and their related data of an organism from LOTUS ? #27

ap1438 · 2022-06-10T15:05:33Z

I have an organism, and I want to download all the chemical compounds related to it, together with their SMILES and the species that produce them.

I searched on the web page and found all the compound entries related to that organism, then downloaded the SDF file, which was the only download option available, and converted it to Excel format.

But I realized that the file was missing the compound names.

What I want is: compound name, SMILES, and the species in which the compound is present.

Is it possible to get this from the LOTUS database by any means?

Adafede · 2022-06-12T04:20:08Z

Hi!

Thank you for your issue.

Actually, we support a lot of custom searches (see https://lotus.naturalproducts.net/documentation) but not the specific one you requested.

We might provide a SPARQL endpoint in the future to handle such requests but in the meantime, querying Wikidata directly seems a good option.

I prepared a query that you can easily adapt: https://w.wiki/5GSw. You can download the results directly as a tabular file there.

Another option is to search https://pubchem.ncbi.nlm.nih.gov/classification/#hid=115 directly; they also offer CSV download.

More generally, the compounds' names are automatically generated so we would advise being very cautious with them.

Best

ap1438 · 2022-06-13T04:49:41Z

Thank you for your quick response and valuable suggestion.
When I ran the query and downloaded the data, the molecular formula field was missing.
So I tried to modify the query to download the molecular formula as well,
but it reports that the query time limit was reached, and I don't know why.
This is the query I tried:

https://w.wiki/5GgJ

Can you check it and tell me where I went wrong?

Adafede · 2022-06-13T05:09:31Z

You were almost there!

I think the query you want is: https://w.wiki/5Ggd

Yours was querying the whole of Wikidata for molecules again.

ap1438 · 2022-06-13T05:54:33Z

Thanks for the correction and insights.

ap1438 · 2022-06-16T13:59:43Z

A search for "Gentiana" on the LOTUS web page returned 483 natural products,
but the Wikidata query returns 768.

Can you please let me know the reason for the difference?

Adafede · 2022-06-16T14:19:06Z

Hi,

Not exactly: the query I wrote for you returns structure-organism pairs, so the same structure can appear multiple times. If you want to reduce it to distinct structures, use this one: https://w.wiki/5J73.

Hope this clarifies
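
To illustrate the point about structure-organism pairs versus distinct structures, here is a minimal Python sketch. The rows are made-up placeholders (not real LOTUS or Wikidata results), and the helper names `distinct_structures` and `organisms_per_structure` are hypothetical; it only shows why a pair-level result can have more rows than there are structures.

```python
# Hypothetical (structure, organism) pairs, shaped like the rows a
# pair-level query returns. The values are placeholders, not real data.
pairs = [
    ("structure_A", "Gentiana lutea"),
    ("structure_A", "Gentiana acaulis"),
    ("structure_B", "Gentiana lutea"),
]


def distinct_structures(pairs):
    """Collapse (structure, organism) pairs to the set of distinct structures."""
    return {structure for structure, _organism in pairs}


def organisms_per_structure(pairs):
    """Group organisms by structure, so each structure appears only once."""
    grouped = {}
    for structure, organism in pairs:
        grouped.setdefault(structure, set()).add(organism)
    return grouped


print(len(pairs))                       # number of structure-organism pairs
print(len(distinct_structures(pairs)))  # number of distinct structures
```

Here the same structure reported in two organisms counts twice at the pair level but once at the structure level.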

ap1438 · 2022-06-16T16:57:31Z

Thank you

alrichardbollans · 2023-07-04T10:37:47Z

I'm trying to do something similar, following your examples. When I run:

SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754                                    # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon;                   # Include children taxa
                        wdt:P225 ?organism_name.  # Get organism name
  ?structure wdt:P233 ?structure_smiles;          # Get the SMILES
             (p:P703/ps:P703) ?organism.          # Found in given taxon/taxa

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000

I get 20968 results. However, when I try to include CAS and InChIKey information with the following:

SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754                                    # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon;                   # Include children taxa
                        wdt:P225 ?organism_name.  # Get organism name
  ?structure wdt:P233 ?structure_smiles;          # Get the SMILES
             (p:P703/ps:P703) ?organism;          # Found in given taxon/taxa
             wdt:P231 ?structureCAS;          # Get the CAS
             wdt:P235 ?structureINCHIKEY.          # Get the INCHIKEY

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000

I only get 7967 results. I imagine this might be because the latter query doesn't return instances without a CAS ID or INCHIKEY. Is it possible to return all metabolites found in taxa and leave missing values for the properties as NaN?
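
One possible approach, sketched here and not confirmed anywhere in this thread, is to wrap the CAS and InChIKey triple patterns in SPARQL `OPTIONAL` blocks, so that structures lacking either property are kept and the corresponding variable is simply left unbound. The Python below only builds the query string and a request URL; `build_query` and `build_request_url` are hypothetical helper names, and the use of the public Wikidata endpoint with a `format=json` parameter is an assumption about how one would run it outside the web interface.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (assumed target for this sketch).
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(taxon_qid: str, limit: int = 100000) -> str:
    """Return the second query above, with CAS and InChIKey made optional."""
    return f"""
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {{
  VALUES ?taxon {{ wd:{taxon_qid} }}
  ?organism (wdt:P171*) ?taxon;                  # Include children taxa
            wdt:P225 ?organism_name.             # Get organism name
  ?structure wdt:P233 ?structure_smiles;         # Get the SMILES
             (p:P703/ps:P703) ?organism.         # Found in given taxon/taxa
  OPTIONAL {{ ?structure wdt:P231 ?structureCAS. }}       # CAS, if present
  OPTIONAL {{ ?structure wdt:P235 ?structureINCHIKEY. }}  # InChIKey, if present
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Build a GET URL for the endpoint; no request is sent here."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query("Q21754")
print(query)
```

Unbound optional variables should come back as empty cells in a tabular download, which a tool such as pandas would typically read as NaN. The row count may still differ from the first query if a structure carries more than one CAS or InChIKey value.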

ap1438 closed this as completed Jun 13, 2022

ap1438 reopened this Jun 16, 2022

ap1438 closed this as completed Jun 16, 2022

alrichardbollans mentioned this issue Jul 4, 2023

Returning all metabolites in a given clade, including possibly missing properties #61

Closed
