Troubleshooting issues when ingesting items via etd_loader or ingest_etd

Dan Kerchner edited this page Aug 21, 2019 · 4 revisions

General tip:

  • If the stack trace from etd_loader doesn't give you enough information, copy the rake command from the console output (look for the line containing "Command is:") and run it by hand.

As an example:

INFO:__main__:Importing etdadmin_upload_686240.zip
INFO:__main__:Unzipping /data/etd-loader/etd_to_be_imported/etdadmin_upload_686240.zip to /tmp/tmp78c6eu7c
INFO:__main__:Importing 686240. ETD file is /tmp/tmp78c6eu7c/Hardy_gwu_0075M_14840.pdf and attachements are []
INFO:__main__:Command is: rake RAILS_ENV=production gwss:ingest_etd -- --manifest=/tmp/tmp78c6eu7c/metadata.json --primaryfile=/tmp/tmp78c6eu7c/Hardy_gwu_0075M_14840.pdf --depositor=openaccess@gwu.edu
DEPRECATION WARNING: human_readable_type= is deprecated and will be removed from a future release (human_readable_type is deprecated. Set the i18n key for activefedora.models.#{model_name.i18n_key} instead. This will be removed in Hyrax 3). (called from <class:GwEtd> at /opt/scholarspace/scholarspace-hyrax/app/models/gw_etd.rb:11)
DEPRECATION WARNING: human_readable_type= is deprecated and will be removed from a future release (human_readable_type is deprecated. Set the i18n key for activefedora.models.#{model_name.i18n_key} instead. This will be removed in Hyrax 3). (called from <class:GwWork> at /opt/scholarspace/scholarspace-hyrax/app/models/gw_work.rb:11)
INFO:__main__:Repository id for 686240 is dr26xz169
INFO:__main__:Adding 686240 with dr26xz169
INFO:__main__:Importing etdadmin_upload_686809.zip
INFO:__main__:Unzipping /data/etd-loader/etd_to_be_imported/etdadmin_upload_686809.zip to /tmp/tmpuaehg2gy
INFO:__main__:Importing 686809. ETD file is /tmp/tmpuaehg2gy/Stiegler_gwu_0075A_14848.pdf and attachements are []
INFO:__main__:Command is: rake RAILS_ENV=production gwss:ingest_etd -- --manifest=/tmp/tmpuaehg2gy/metadata.json --primaryfile=/tmp/tmpuaehg2gy/Stiegler_gwu_0075A_14848.pdf --depositor=openaccess@gwu.edu

and the rest of the stack trace follows. You would then change to the application directory and re-run the last command that was logged:

cd /opt/scholarspace/scholarspace-hyrax

rake RAILS_ENV=production gwss:ingest_etd -- --manifest=/tmp/tmpuaehg2gy/metadata.json --primaryfile=/tmp/tmpuaehg2gy/Stiegler_gwu_0075A_14848.pdf --depositor=openaccess@gwu.edu

and you'll often get a more informative stack trace.
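
If the console output is long, finding the command can be scripted. The following is a minimal Python sketch, assuming the output has been captured as text and that every command is logged on a single line containing "Command is:", as in the example above. The helper name extract_commands is hypothetical and is not part of etd_loader.

```python
import re

# Matches the part of a log line that follows "Command is: ".
COMMAND_RE = re.compile(r"Command is: (.+)$")


def extract_commands(log_text):
    """Return every logged rake command, in the order it appears."""
    commands = []
    for line in log_text.splitlines():
        match = COMMAND_RE.search(line.strip())
        if match:
            commands.append(match.group(1))
    return commands


# Shortened sample based on the console output shown above.
SAMPLE_LOG = """\
INFO:__main__:Importing etdadmin_upload_686809.zip
INFO:__main__:Command is: rake RAILS_ENV=production gwss:ingest_etd -- --manifest=/tmp/tmpuaehg2gy/metadata.json --primaryfile=/tmp/tmpuaehg2gy/Stiegler_gwu_0075A_14848.pdf --depositor=openaccess@gwu.edu
"""

commands = extract_commands(SAMPLE_LOG)
if commands:
    # Assumption: the loader stopped at the failure, so the last
    # logged command is the one to re-run by hand.
    print(commands[-1])
```

Note that the temporary directory named in the command (for example /tmp/tmpuaehg2gy) must still exist for the re-run to work.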

Specific Scenarios:

Symptom: rake aborted! Ldp::BadRequest: RDF was not parsable: [line: 8, col: 1 ] Broken token (newline): ;
Probable root cause: Control characters (tabs, for example) in metadata fields in the XML. See https://github.com/gwu-libraries/scholarspace-hyrax/issues/96
Recommended solution: Unzip the zip file, remove the control characters from the XML, update the zip with the edited XML (use zip -u), then re-run etd_loader.py.

Symptom: TBD
Probable root cause: TBD
Recommended solution: TBD
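
For the control-character scenario, the manual edit could be automated along these lines. This is a rough Python sketch and not part of etd_loader. It assumes the metadata lives in members whose names end in .xml, that the XML uses an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, and that replacing control characters inside element text with a space is acceptable. Rather than updating the archive in place with zip -u, it writes a cleaned copy. The names strip_control_bytes and clean_zip are hypothetical.

```python
import re
import zipfile

# Runs of ASCII control bytes: tab, CR, LF, the rest of 0x00-0x1F, and DEL.
CONTROL_BYTES = re.compile(rb"[\x00-\x1f\x7f]+")
# Text that sits between the end of one tag and the start of the next.
TEXT_BETWEEN_TAGS = re.compile(rb"(?<=>)[^<>]+(?=<)")


def _clean_segment(match):
    text = match.group(0)
    if not text.strip():
        # Pure indentation between elements: leave it untouched.
        return text
    return CONTROL_BYTES.sub(b" ", text)


def strip_control_bytes(xml_bytes):
    """Replace runs of control bytes inside element text with one space."""
    return TEXT_BETWEEN_TAGS.sub(_clean_segment, xml_bytes)


def clean_zip(src_path, dst_path):
    """Copy a zip to dst_path, cleaning each .xml member.

    Returns the names of the members that were changed.
    """
    changed = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith(".xml"):
                cleaned = strip_control_bytes(data)
                if cleaned != data:
                    changed.append(info.filename)
                data = cleaned
            dst.writestr(info.filename, data)
    return changed
```

After checking the cleaned copy, put it in place of the original zip and re-run etd_loader.py. The sketch does not handle CDATA sections or attribute values, so inspect the result by hand if the XML uses them.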