ORCID metadata enrichment pipeline - it fetches claims from the API and enriches the ADS storage/index.
How it works:
1. periodically check the ADS API (using a special OAuth token that grants access to ORCID updates)
1. fetch the claims and put them into a RabbitMQ queue
1. a worker picks up each claim and enriches it with information about the author (querying both
   the public ORCID API for the author's name and the ADS API for variants of the author's name)
1. given the information above, the worker updates MongoDB (collection `orcid_claims`) - it marks the claim
   either as 'verified' (if it comes from a user with an account in BBB) or 'unverified'
   (it is the responsibility of the ADS Import pipeline to pick up the ORCID claims and send them to
   SOLR for indexing)
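The verified/unverified decision in the last step can be sketched roughly as follows. The function name, claim fields, and the `bbb_accounts` lookup are illustrative assumptions for this sketch, not the pipeline's actual API:

```python
# Rough sketch of the status decision above. The function name, claim
# fields, and 'bbb_accounts' lookup are illustrative assumptions, not
# the pipeline's real code.

def classify_claim(claim, bbb_accounts):
    """Mark a claim 'verified' when its ORCID iD belongs to a user
    with a BBB account, 'unverified' otherwise."""
    status = 'verified' if claim['orcid'] in bbb_accounts else 'unverified'
    return dict(claim, status=status)

claim = {'orcid': '0000-0002-1825-0097', 'bibcode': '2015ApJ...800....1X'}
print(classify_claim(claim, {'0000-0002-1825-0097'})['status'])  # verified
print(classify_claim(claim, set())['status'])                    # unverified
```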
- check-orcidid: compares our stored version of the ORCID works against that of ORCID; records claims with appropriate statuses; sends tasks (individual claims) to record-claim queue
- record-claim: receives single claim; checks existence of bibcode, updates claim with more author information; passes claim to match-claim queue
- match-claim: verifies (or rejects) claims from record-claim, records approved claims; passes approved claims to output-results queue
- output-results: sends results to another pipeline to be incorporated into the record
- check-updates: checks ORCID microservice for updated profiles; if it finds any, sends them to check-orcidid
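The comparison done by check-orcidid can be thought of as a set difference between the bibcodes we have stored and those currently in the ORCID profile; claims present only on the ORCID side become tasks for the record-claim queue. A minimal sketch, with names that are assumptions rather than the worker's actual code:

```python
# Minimal sketch of the check-orcidid comparison: a set difference
# between our stored bibcodes and the ORCID profile's bibcodes.
# Function and variable names are assumptions, not the real worker code.

def diff_claims(stored_bibcodes, orcid_bibcodes):
    """Return (to_claim, to_remove): bibcodes newly present in the
    ORCID profile, and bibcodes that have disappeared from it."""
    stored, remote = set(stored_bibcodes), set(orcid_bibcodes)
    return sorted(remote - stored), sorted(stored - remote)

to_claim, to_remove = diff_claims(
    ['2015ApJ...800....1X'],
    ['2015ApJ...800....1X', '2016AJ....151...98B'])
print(to_claim)   # ['2016AJ....151...98B']
print(to_remove)  # []
```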
- vim ADSOrcid/local_config.py # edit, edit...
- vagrant up db rabbitmq app
- vagrant ssh app
- cd /vagrant

This will start the pipeline inside the app container; if you have configured the endpoints and access tokens correctly, it starts fetching data from ORCID.
We are using the 'docker' Vagrant provider (i.e. instead of a VirtualBox VM, you run the processes in Docker containers). On some systems it is necessary to do `export VAGRANT_DEFAULT_PROVIDER=docker`, or to always specify `--provider docker` when you run vagrant. The project directory is synced to /vagrant/ on the guest.
If you (also) hate when stuff is unnecessarily complicated, you can run/develop locally as well (using whatever editor/IDE/debugger you like):

- virtualenv python
- source python/bin/activate
- pip install -r requirements.txt
- pip install -r dev-requirements.txt
- vagrant up db rabbitmq

This will set up a Python virtualenv and start the database and RabbitMQ. You can then run the pipeline and the tests locally.
vagrant up rabbitmq
RabbitMQ will be on localhost:6672; the administrative interface on localhost:25672.
vagrant up db
MongoDB is on localhost:37017, PostgreSQL on localhost:6432.
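With the db and rabbitmq containers up, a local_config.py along these lines points the pipeline at them. The key names below are guesses for illustration only; check the defaults shipped with the project for the settings it actually reads:

```python
# Hypothetical local_config.py fragment for local development.
# Key names are illustrative assumptions; consult the project's own
# default config for the actual settings.
RABBITMQ_URL = 'amqp://guest:guest@localhost:6672/'
MONGODB_URL = 'mongodb://localhost:37017/orcid'
SQLALCHEMY_URL = 'postgres://postgres@localhost:6432/orcid_pipeline'
```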
vagrant up prod
It will automatically download and install the latest release from GitHub (no, not your local changes - only what has been released on GitHub).
If /ADSOrcid/prod_config.py is available, it will be copied and used in place of local_config.py.
No ports are exposed and no SSH access is possible. New releases will be deployed automatically.
Typical installation:
- vim ADSOrcid/prod_config.py # edit, edit...
- vagrant up prod
Or, to build and run the container manually:

- cd manifests/production/app
- docker build -t adsorcid .
- cd ../../..
- vim prod_config.py # edit, edit...
- docker run -d -v $(pwd):/vagrant --name ADSOrcid adsorcid /sbin/my_init
Here are some useful commands:
- restart the service: docker exec ADSOrcid sv restart app
- tail the log of one of the workers: docker exec ADSOrcid tail -f /app/logs/ClaimsImporter.log
Kelly, Roman