Processing your own referenced structure-organism pairs
Any user can add referenced structure-organism pairs to Wikidata. We also provide ways to upload data in batches; these are detailed at https://github.com/lotusnprod/lotus-wikidata-interact.
Before doing so, we strongly suggest running your data through our processing pipeline so that it is formatted correctly. An example file showing the minimal required input is provided in data/example/custom_example.tsv.
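Before running the pipeline, it can help to check that your file starts with the same header line as the example. The snippet below is a minimal sketch, not part of the LOTUS tooling: `same_header` is a hypothetical helper and `my_pairs.tsv` is a placeholder for your own file. It assumes your file should carry the same column names, in the same order, as the example; the pipeline itself may be more lenient.

```shell
# Hypothetical helper (not part of LOTUS): succeed only if two TSV files
# share exactly the same header line.
same_header() {
  # Read the first line of each file, dropping any Windows carriage return.
  expected=$(head -n 1 "$1" | tr -d '\r')
  actual=$(head -n 1 "$2" | tr -d '\r')
  [ "$expected" = "$actual" ]
}

# Usage, from the repository root (my_pairs.tsv is a placeholder):
# same_header data/example/custom_example.tsv my_pairs.tsv && echo "headers match"
```

A mismatch only tells you the header lines differ; open both files to see which column is missing or misnamed.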
Once you have collected your referenced pairs in the same format, create and activate the conda environment:
conda env create --file environment.yml
conda activate lotus_env
Then, standardize the format of your data:
cd src
Rscript 1_gathering/db/custom/standardizing.R <path_to_your_file>
# Rscript 1_gathering/db/custom/standardizing.R ../data/example/custom_example.tsv
Finally, process your data to improve its quality and add the additional metadata required by Wikidata.
# cd .. (if you continue directly from the previous step)
make MODE=custom lotus-bloom
You'll then find your Wikidata-ready file in data/interim/tables_custom/4_analyzed/validated_referenced_structure_organism_pairs.tsv.gz. This is the file you will then use as input for the Wikidata import step.
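To take a quick look at the result before importing it, you can decompress the file on the fly. This is a minimal sketch: `count_rows` is a hypothetical helper, not part of LOTUS, and it assumes the output starts with a single header line.

```shell
# Hypothetical helper (not part of LOTUS): count the data rows of a gzipped
# TSV, assuming the first line is a header.
count_rows() {
  gzip -dc "$1" | tail -n +2 | wc -l | tr -d ' '
}

# Usage, from the repository root:
# out=data/interim/tables_custom/4_analyzed/validated_referenced_structure_organism_pairs.tsv.gz
# gzip -dc "$out" | head -n 5   # preview the header and the first rows
# count_rows "$out"             # number of data rows in the file
```

An unexpectedly low row count suggests that some of your pairs were filtered out during processing.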
This wiki and the LOTUS README contain a lot of information. Please take your time and read these instructions carefully.
We strongly advise new users to start with the Getting started page.
If you have any questions about using LOTUS or want to share information with the LOTUS community, please go to one of the following places: