
Monthly URI Testing

Karen Majewicz edited this page Apr 24, 2023 · 1 revision

Background

On a monthly basis, we perform URI link-checking to ensure the data in the Geoportal stays current. Each record or document in the Geoportal can contain multiple URIs: for metadata representations, for object downloads, for IIIF tiles, etc. Currently, ~16,000 docs contain ~48,000 URIs.

Process

  1. Check prior states

RAILS_ENV=production bundle exec rake geoportal:uri_states

  2. Purge URIs

RAILS_ENV=production bundle exec rake geoportal:uri_purge

  3. Process ALL URIs

RAILS_ENV=production bundle exec rake geoportal:uri_process_all

  4. Check run states

RAILS_ENV=production bundle exec rake geoportal:uri_states

  5. Re-run incomplete states

RAILS_ENV=production bundle exec rake geoportal:uri_queue_incomplete_states

Check that all background jobs have completed. When the enqueued count is 0, everything has been processed.

RAILS_ENV=production bundle exec rake geoportal:sidekiq_stats

  6. Final run states

RAILS_ENV=production bundle exec rake geoportal:uri_states

  7. Produce report

RAILS_ENV=production bundle exec rake geoportal:uri_report

Download the report at a URL like: http://geo.btaa.org/2018-08-22_09-57-37.uri_report.csv
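The steps above can be chained in a small wrapper script. This is only a minimal sketch, not part of the Geoportal codebase: the function names (`run_task`, `enqueued_count`, `wait_for_queue`, `monthly_uri_check`) are hypothetical, the 60-second polling interval is arbitrary, and it assumes that `geoportal:sidekiq_stats` prints a line containing the word "enqueued" followed by a number. Check the real output of that task before relying on the parsing.

```shell
#!/usr/bin/env bash
set -e  # stop at the first failing rake task

# Run one geoportal rake task in production, as in the steps above.
run_task() {
  RAILS_ENV=production bundle exec rake "geoportal:$1"
}

# Read stats text on stdin and print the first number on the "enqueued" line.
# ASSUMPTION: the stats output has a line such as "enqueued: 12".
# Adjust the pattern if the real format differs.
enqueued_count() {
  grep -i 'enqueued' | head -n 1 | grep -oE '[0-9]+' | head -n 1
}

# Poll until no jobs remain enqueued.
# Note: if no "enqueued" line is ever printed, this loops forever.
wait_for_queue() {
  while [ "$(run_task sidekiq_stats | enqueued_count)" != "0" ]; do
    sleep 60
  done
}

monthly_uri_check() {
  run_task uri_states                    # check prior states
  run_task uri_purge                     # purge URIs
  run_task uri_process_all               # process ALL URIs
  wait_for_queue                         # ASSUMPTION: wait here before checking
  run_task uri_states                    # check run states
  run_task uri_queue_incomplete_states   # re-run incomplete states
  wait_for_queue                         # wait until enqueued is 0
  run_task uri_states                    # final run states
  run_task uri_report                    # produce report
}

# Uncomment to run the whole sequence:
# monthly_uri_check
```

Running the steps by hand remains the safer choice when the state counts need a human look between steps; the script is mainly useful for the waiting.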

Import results to Google Spreadsheet

At the completion of the task, you'll have a "results.csv" file containing ~15,000 URIs and their result status. Each time, I create a new Google Spreadsheet and import this CSV file to share the data with the BTAA Geoportal folk.
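Before importing, a quick per-status tally on the command line can serve as a sanity check on the CSV. The sketch below is an illustration only: `status_counts` is a hypothetical helper, and it assumes the file has a header row and that the result status is the last comma-separated field on each row. Neither assumption is confirmed by the report format, so adjust the field reference to match the real columns.

```shell
# Tally rows per status value from CSV text on stdin.
# ASSUMPTIONS: the first line is a header, and the status is the last
# comma-separated field on each row.
status_counts() {
  tr -d '\r' |                                   # tolerate Windows line endings
    awk -F',' 'NR > 1 && NF > 0 { counts[$NF]++ }
               END { for (s in counts) print counts[s], s }' |
    sort -k1,1nr -k2                             # most frequent status first
}

# Usage (file name taken from the text above):
# status_counts < results.csv
```

Each output line is a count followed by the status value, which is the same breakdown a one-dimensional pivot on the status column would give.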

They prefer to see the results in a pivot table. Here are some past example spreadsheets: