🌟 We're delighted to have you explore our computational workflow. This guide will walk you through the installation, setup, and execution of the ENPKG full workflow. Interested in the science behind ENPKG? Check out the paper (https://doi.org/10.1021/acscentsci.3c00800)! It's packed with the insights and methodologies that power this workflow.
First, clone the repository to your local machine:
git clone https://github.com/enpkg/enpkg_full.git
Navigate to the newly created folder:
cd enpkg_full
We offer both Mamba and Poetry installation options; see below:
Start your journey by setting up the required environment. It's a breeze (or not) with Mamba! See the Mamba documentation for more details.
mamba env create -f environment.yml
To install with Poetry instead, run the command below; see the Poetry documentation for more details.
poetry install
Once the environment is ready, bring it to life with the command matching your installer (the first for Mamba, the second for Poetry):
conda activate enpkg_full
poetry shell
For details on installing Sirius, see https://boecker-lab.github.io/docs.sirius.github.io/install/
To get the latest version for your platform, run the install_sirius.sh script, specifying the path chosen for the installation. For example, from the root of the repository:
bash src/install_sirius.sh /home/username/sirius
Once Sirius is installed, you will need to specify the path to the executable (see the section Editing config files).
Setting up the environment variables: to log in to Sirius, you will need to set the SIRIUS_USERNAME and SIRIUS_PASSWORD environment variables. You can do so by running the following command:
bash src/setup_sirius_env.sh
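Before launching the workflow, it can be handy to confirm that both variables are actually visible in your current shell. The function below is a minimal sketch, not part of the repository: `check_sirius_env` is a hypothetical helper name, and it only assumes the two variable names given above. How `setup_sirius_env.sh` stores the credentials is not described here, so you may need to open a new shell or source a file before the check passes.

```shell
# Hypothetical helper (not shipped with enpkg_full): report which of the two
# Sirius credential variables are missing, without printing their values.
check_sirius_env() {
    missing=0
    if [ -z "${SIRIUS_USERNAME:-}" ]; then
        echo "missing: SIRIUS_USERNAME" >&2
        missing=1
    fi
    if [ -z "${SIRIUS_PASSWORD:-}" ]; then
        echo "missing: SIRIUS_PASSWORD" >&2
        missing=1
    fi
    # Returns 0 when both variables are set and non-empty, 1 otherwise.
    return "$missing"
}

# Example use:
#   check_sirius_env && echo "Sirius credentials found"
```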
You will need to edit the following parameter file:
- Parameters at user.yaml
All parameters are commented and should be self-explanatory.
For example, you can enter the record_id and record_name of a Zenodo dataset on line. As currently configured, this will download a small test dataset (https://doi.org/10.5281/zenodo.10018590).
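To double-check which Zenodo record is currently configured, you can print the relevant lines of the parameter file. This is a small sketch under two assumptions: the keys appear in plain YAML form (`record_id:` and `record_name:`, possibly indented), and you pass the actual location of user.yaml yourself, since its path is not spelled out here. `show_zenodo_record` is a hypothetical helper name.

```shell
# Hypothetical helper: print the record_id and record_name lines of a YAML
# parameter file. The file path is supplied by the caller.
show_zenodo_record() {
    grep -E '^[[:space:]]*(record_id|record_name)[[:space:]]*:' "$1"
}

# Example use (replace the argument with the real path to user.yaml):
#   show_zenodo_record path/to/user.yaml
```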
From the root of the repository, run:
sh workflow/00_workflow_all.sh
On the test dataset mentioned above, this should take about 10 minutes to run.
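If you would like to keep a log of the run and see how long it actually took on your machine, one option is a small wrapper like the one below. It is only a sketch: `run_logged` and the log file name are hypothetical, and the wrapper knows nothing about the workflow itself; it simply runs whatever command it is given.

```shell
# Hypothetical wrapper: run a command, send its output to a log file, and
# report the exit code and elapsed wall-clock time in seconds.
run_logged() {
    log_file="$1"
    shift
    exit_code=0
    start=$(date +%s)   # seconds since the epoch (GNU and BSD date)
    "$@" >"$log_file" 2>&1 || exit_code=$?
    end=$(date +%s)
    echo "exit code $exit_code after $((end - start)) s (log: $log_file)"
    return "$exit_code"
}

# Example use, from the root of the repository:
#   run_logged workflow_run.log sh workflow/00_workflow_all.sh
```

Because the wrapper returns the exit code of the wrapped command, it can still be chained with `&&` in a larger script.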
You can use GraphDB to explore the generated Knowledge Graph. To do so, you will need to install GraphDB (https://graphdb.ontotext.com/download/) and import the generated .ttl files. Make sure to read the latest GraphDB documentation (https://graphdb.ontotext.com/documentation/) to get started.
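Before importing, you may want a list of the .ttl files the workflow produced. The sketch below simply searches a directory tree for files with that extension; `list_ttl_files` is a hypothetical helper, and since the output location depends on your configuration, the directory to search is left as an argument (defaulting to the current directory).

```shell
# Hypothetical helper: list every .ttl file under a directory, sorted.
# With no argument, the current directory is searched.
list_ttl_files() {
    find "${1:-.}" -type f -name '*.ttl' | sort
}

# Example use, from the root of the repository:
#   list_ttl_files .
```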
Facing an Issue? Encountered a glitch or have a suggestion? Your input is crucial for us. Here’s how you can help:
- Report Issues: Use the 'Issues' tab in our GitHub repo to report any problems or ideas.
- Detailed Descriptions Help: Include as much detail as possible - error messages, steps to reproduce, and screenshots are all super helpful.
- Stay Updated: We’ll keep you in the loop as we work on fixing the issue or considering your suggestion.
A Knowledge Graph has been built on a collection of 1,600 tropical plant extracts (https://doi.org/10.1093/gigascience/giac124). This KG also integrates data from a metabolomics study of 337 medicinal plants of the Korean Pharmacopeia (https://doi.org/10.1038/s41597-022-01662-2). It can be explored through the following links.
- The ENPKG graph is available at https://enpkg.commons-lab.org/graphdb/. No login required!
- The SPARQL query interface can be reached at https://enpkg.commons-lab.org/graphdb/sparql. Make sure to check the paper for examples of queries.
- The ENPKG vocabulary is described at https://enpkg.commons-lab.org/doc/index.html
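The graph can also be queried from the command line over the standard SPARQL protocol. The function below is a hedged sketch rather than an official client: `sparql_select` is a hypothetical name, and the endpoint URL is passed in by the caller. GraphDB normally serves programmatic queries at a URL of the form `<base>/repositories/<repository_id>`; the repository identifier for the public ENPKG instance is not given in this guide, so it appears below only as a placeholder to fill in yourself.

```shell
# Hypothetical helper: send a SPARQL query to an endpoint with curl and
# request JSON results. --data-urlencode makes curl issue a form-encoded
# POST, which the SPARQL 1.1 Protocol accepts for queries.
sparql_select() {
    endpoint="$1"
    query="$2"
    curl --silent --show-error --fail \
        --header 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$query" \
        "$endpoint"
}

# Example use (replace <repository_id> with the real repository name):
#   sparql_select \
#     "https://enpkg.commons-lab.org/graphdb/repositories/<repository_id>" \
#     'SELECT * WHERE { ?s ?p ?o } LIMIT 5'
```

The example query is deliberately generic (any five triples); the paper lists queries that use the ENPKG vocabulary.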
This set of scripts represents a pipeline calling multiple tools. Please make sure to cite the original authors of the tools used in this workflow.
Allard et al. 2016, Analytical Chemistry
Davies et al. 2015, Nucleic Acids Research
Djoumbou et al. 2016, Journal of Cheminformatics
Dührkop et al. 2015, Nature Methods
Dührkop et al. 2019, Nature Methods
Dührkop et al. 2020, Nature Biotechnology
Gaudry et al. 2022, Frontiers in Bioinformatics
Hoffmann et al. 2022, Nature Biotechnology
Huber et al. 2020, Journal of Open Source Software
Kim et al. 2021, Journal of Natural Products
Ludwig et al. 2020, Nature Machine Intelligence
McTavish et al. 2021, Systematic Biology
Rutz et al. 2016, Frontiers in Plant Science
Rutz et al. 2022, eLife
To cite the ENPKG workflow, please use the following reference: Gaudry et al. 2023.
This workflow is a pilot application that aims to move classical metabolomics datasets toward Linked Open Data. It is currently being ported to the more generic EMIKG framework (https://github.com/earth-metabolome-initiative/emikg) that we are developing within the Earth Metabolome Initiative (http://www.earthmetabolome.org/). Stay tuned.