
Processing your own referenced structure organism pairs

Adafede edited this page Dec 23, 2021 · 5 revisions

Any user can add referenced structure-organism pairs to Wikidata. We also provide ways to upload data in batches, described in https://github.com/lotusnprod/lotus-wikidata-interact.

Before doing so, we strongly suggest running your data through our processing pipeline so that it is formatted correctly. An example file showing the minimal required input is provided in data/example/custom_example.tsv.
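Before running the pipeline, it can help to confirm that your file starts with the same header line as the example. The snippet below is a minimal sketch, assuming the header must match the example exactly, including column order. The helper `check_header` and the file name `my_pairs.tsv` are hypothetical and not part of the repository.

```shell
# Hypothetical helper: compare the first line (header) of your file with the
# first line of the example file. Prints "header OK" on an exact match;
# otherwise prints both headers and returns 1.
check_header() {
  local example="$1" mine="$2"
  local h_example h_mine
  h_example="$(head -n 1 "$example")"
  h_mine="$(head -n 1 "$mine")"
  if [ "$h_example" = "$h_mine" ]; then
    echo "header OK"
  else
    echo "header mismatch:"
    echo "  expected: $h_example"
    echo "  found:    $h_mine"
    return 1
  fi
}

# Usage, from the repository root (my_pairs.tsv is a placeholder name):
# check_header data/example/custom_example.tsv my_pairs.tsv
```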

Once your referenced pairs are collected in the same format as this example, create and activate the conda environment:

conda env create --file environment.yml
conda activate lotus_env
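Every later step assumes `lotus_env` is active. A small guard such as the one below can catch a forgotten `conda activate`. It is a sketch: `require_lotus_env` is a hypothetical name, and it relies on conda setting `CONDA_DEFAULT_ENV` to the name of the active environment.

```shell
# Hypothetical guard: succeed only if the active conda environment is lotus_env.
# conda sets CONDA_DEFAULT_ENV to the name of the currently active environment.
require_lotus_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "lotus_env" ]; then
    echo "lotus_env is active"
  else
    echo "lotus_env is not active (current: ${CONDA_DEFAULT_ENV:-none}); run: conda activate lotus_env" >&2
    return 1
  fi
}

# Usage: require_lotus_env && echo "ready"
```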

Then, standardize the format of your data, starting from the repository root:

cd src
Rscript 1_gathering/db/custom/standardizing.R <path_to_your_file>
# Example (paths are relative to src/):
# Rscript 1_gathering/db/custom/standardizing.R ../data/example/custom_example.tsv
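Because the script is run from `src/`, a relative path to your file must be written relative to `src/`, as in the `../data/...` example. The sketch below avoids this by resolving the path first and entering `src/` in a subshell. Both helpers, `to_abs_path` and `standardize_custom`, are hypothetical, and the sketch assumes `standardizing.R` accepts an absolute path.

```shell
# Hypothetical helper: print an absolute version of a path whose directory
# exists, so that a path given relative to the repository root still resolves
# after entering src/. Returns 1 if the directory cannot be entered.
to_abs_path() {
  local dir
  case "$1" in
    /*) printf '%s\n' "$1"; return ;;
  esac
  dir="$(cd "$(dirname "$1")" && pwd)" || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Hypothetical wrapper: run the standardizing step from the repository root.
# The subshell enters src/ without changing the caller's working directory.
standardize_custom() {
  local input
  input="$(to_abs_path "$1")" || return 1
  (cd src && Rscript 1_gathering/db/custom/standardizing.R "$input")
}

# Usage, from the repository root (my_pairs.tsv is a placeholder name):
# standardize_custom my_pairs.tsv
```

Since the subshell leaves your working directory unchanged, you remain in the repository root afterwards.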

Finally, process the standardized data to improve its quality and add the additional metadata required by Wikidata:

# cd ..  (only if you continue directly from the previous step, which left you in src/)
make MODE=custom lotus-bloom

Your Wikidata-ready file will then be in data/interim/tables_custom/4_analyzed/validated_referenced_structure_organism_pairs.tsv.gz. Use this file as input for the Wikidata import step.
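To take a quick look at the result, a sketch like the following prints the first lines of the gzipped table and counts its data rows. It assumes the first line is a header and that you run it from the repository root; `summarize_output` is a hypothetical helper.

```shell
# Hypothetical helper: show the header and first rows of the gzipped output
# table, then count its data rows (assumes exactly one header line).
summarize_output() {
  local f="${1:-data/interim/tables_custom/4_analyzed/validated_referenced_structure_organism_pairs.tsv.gz}"
  if [ ! -f "$f" ]; then
    echo "not found: $f" >&2
    return 1
  fi
  # Header plus up to three data rows.
  gzip -dc "$f" | head -n 4
  # Skip the header, count remaining lines; tr strips the padding some wc versions add.
  echo "data rows: $(gzip -dc "$f" | tail -n +2 | wc -l | tr -d ' ')"
}

# Usage, from the repository root: summarize_output
```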

⚠️ Data quality is of utmost importance. You alone are responsible for the quality of your initial data: if your original referenced pairs are wrong, the results will be wrong too.
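As one cheap check on your input before processing, the sketch below lists lines that occur more than once in your file. It assumes a single header line and detects exact duplicates only, so it does not replace checking the pairs themselves; `report_duplicates` is a hypothetical helper.

```shell
# Hypothetical helper: print every data line (header excluded) that appears
# more than once in the given file. Only exact, byte-identical duplicates are found.
report_duplicates() {
  tail -n +2 "$1" | sort | uniq -d
}

# Usage (my_pairs.tsv is a placeholder name): report_duplicates my_pairs.tsv
```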