This is a one-off script to compare content between two deployed versions of EPP after content ingestion has been automated.
If you stumble upon this, it's probably only useful as a reference of what we did and how we did it.
To create your reference images run:
yarn generate-reference
To compare the production instance of EPP with the reference images:
yarn run-test
To generate references and test image comparisons against all the manuscripts, run:
yarn content-migration-run
Once that has run, check how many tests passed and failed in each batch, then dig into the batch reports:
for file in $(ls ./backstop_data_run/**/**/jsonReport.json | sort -V); do
  fail_count=$(jq '[.tests[] | select((.pair.diff.rawMisMatchPercentage // 0) > 0)] | length' "$file")
  pass_count=$(jq '[.tests[] | select((.pair.diff.rawMisMatchPercentage // 0) == 0)] | length' "$file")
  fail_no_diff=$(jq '[.tests[] | select(.status == "fail" and (.pair.diff.rawMisMatchPercentage // 0) == 0)] | length' "$file")
  match=$(echo "$file" | grep -oP '\d+-of-\d+' | head -n 1)
  echo "$match: $pass_count passes, $fail_count fails, $fail_no_diff false fails"
done
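The loop above extracts the batch label with grep -oP, which relies on GNU grep's Perl-regex mode and is not available in the BSD grep shipped with macOS. A minimal sketch of a portable alternative, assuming the batch label always has the form N-of-M somewhere in the report path (the helper name batch_label and the example path are hypothetical):

```shell
# batch_label PATH
# Prints the first "N-of-M" label found in PATH, using extended regex (-E)
# instead of the GNU-only Perl mode (-P). Prints nothing if no label is found.
batch_label() {
  printf '%s\n' "$1" | grep -oE '[0-9]+-of-[0-9]+' | head -n 1
}

# Example (hypothetical path layout):
#   batch_label './backstop_data_run/3-of-12/report/jsonReport.json'
```

Inside the loop, match=$(batch_label "$file") could stand in for the grep -oP line.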
To shorten the feedback loop by focusing on specific manuscripts, create a manuscripts.json file in the root of this project. For example:
{
  "manuscripts": {
    "86961v2": {}
  }
}
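As a convenience, the file can be generated from the command line. This is only a sketch: the helper name write_manuscripts_json is hypothetical, and it assumes a single versioned manuscript ID with an empty object as its value, as in the example above.

```shell
# write_manuscripts_json ID [OUTPUT_PATH]
# Writes a manuscripts.json that restricts the run to one manuscript.
# OUTPUT_PATH defaults to manuscripts.json in the current directory.
write_manuscripts_json() {
  out="${2:-manuscripts.json}"
  printf '{\n  "manuscripts": {\n    "%s": {}\n  }\n}\n' "$1" > "$out"
}

# Example, run from the root of this project:
#   write_manuscripts_json 86961v2
# To go back to running against all manuscripts:
#   rm manuscripts.json
```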
To simply monitor whether the expected manuscripts are available without performing a visual comparison:
yarn content-status
This process can take a while, so it is best to write the output to a file and analyse it afterwards:
yarn content-status > content-status.txt
Then find the URLs of all the failed entries in the output file using jq:
jq '.log|select(.|length>0)[].path' content-status.txt
- Visit Data Hub DocMaps API in lookerstudio.google.com
- Expand the menu
- Choose export
- Rename to docmap-mecas.csv and confirm the export
- Move the downloaded docmap-mecas.csv to the root of this repo. You will need to repeat this step to compare against the latest values in DataHub
To run this on a subset of manuscripts, make sure a manuscripts.json file is in the root of this repo; otherwise, remove it.
- Run
yarn meca-status > meca-status.txt
- You can monitor progress with
tail -f meca-status.txt
- To see how many MECAs match in docmaps and enhanced-preprints-data:
cat meca-status.txt | grep -E ',match,'
cat meca-status.txt | grep -E ',match,' | wc -l
- To see how many MECAs differ between docmaps and enhanced-preprints-data:
Published only:
cat meca-status.txt | grep -E ',published,.+,different,'
cat meca-status.txt | grep -E ',published,.+,different,' | wc -l
All:
cat meca-status.txt | grep -E ',different,'
cat meca-status.txt | grep -E ',different,' | wc -l
- To see how many MECAs are missing in Docmaps:
Published only:
cat meca-status.txt | grep -E ',published,.+,missing,'
cat meca-status.txt | grep -E ',published,.+,missing,' | wc -l
All:
cat meca-status.txt | grep -E ',missing,'
cat meca-status.txt | grep -E ',missing,' | wc -l
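The counts above can also be collected in one go. The following is a sketch rather than part of the tooling: the function names are hypothetical, and it assumes only what the grep patterns above imply, namely that each line of meca-status.txt is comma-separated, with the result (match, different or missing) as its own field and published appearing in an earlier field for published manuscripts.

```shell
# count_status FILE PATTERN
# Prints how many lines of FILE match the extended regex PATTERN.
count_status() {
  # grep -c prints 0 but exits non-zero when nothing matches, so ignore the exit status.
  grep -cE "$2" "$1" || true
}

# summarise_meca_status FILE
# Prints one line per result type with the overall and published-only counts.
summarise_meca_status() {
  file="$1"
  for status in match different missing; do
    all=$(count_status "$file" ",${status},")
    published=$(count_status "$file" ",published,.+,${status},")
    echo "${status}: ${all} total, ${published} published"
  done
}

# Example:
#   summarise_meca_status meca-status.txt
```

Using grep -c avoids the cat | grep | wc -l pipeline while counting the same lines.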
To monitor whether the reviewed-preprints API that serves Journal displays all expected results:
yarn journal-api-status