Uploading content using etd loader

Dan Kerchner edited this page Jun 10, 2018 · 16 revisions
  • Ensure that the base_path directory configured in config.py, and everything under it, is owned by the scholarspace user
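
One way to check the ownership requirement is to list anything under base_path that the scholarspace user does not own. This is a minimal sketch for a POSIX shell; `list_not_owned` is a hypothetical helper name, and the path in the usage note is a placeholder for the base_path value in config.py.

```shell
# list_not_owned DIR USER
# Print every file or directory under DIR (including DIR itself)
# that is not owned by USER. No output means ownership is correct.
list_not_owned() {
    find "$1" ! -user "$2" -print
}
```

For example, `list_not_owned /path/to/base_path scholarspace` should print nothing. If it lists any paths, `sudo chown -R scholarspace /path/to/base_path` is one way to correct them.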

  • Import content with etd-loader

    % sudo su - scholarspace
    % cd /opt/etd-loader
    % source ENV/bin/activate
    % python etd_loader.py --only retrieve
    

If you get an error similar to paramiko.ssh_exception.SSHException: No hostkey for host <HOST> found, you can usually resolve it by connecting to the host once with sftp as the scholarspace user and accepting its host key.
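
Whether a host key has already been recorded can be checked before running the retrieve step. The sketch below uses `ssh-keygen -F`, which searches a known_hosts file for a host. It assumes the loader's SSH library consults the scholarspace user's ~/.ssh/known_hosts, which this page does not state; `host_key_known` is a hypothetical helper name.

```shell
# host_key_known HOST [KNOWN_HOSTS_FILE]
# Succeed (exit 0) if HOST has an entry in the known_hosts file,
# fail (non-zero) otherwise. Defaults to the current user's file.
host_key_known() {
    ssh-keygen -F "$1" -f "${2:-$HOME/.ssh/known_hosts}" > /dev/null 2>&1
}
```

Run it as the scholarspace user, for example `host_key_known <HOST> && echo "host key present"`. If it fails, connect once with sftp and accept the host key when prompted.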

  • Run the import step to ingest the retrieved ETDs

    % sudo su - scholarspace
    % cd /opt/etd-loader
    % source ENV/bin/activate
    % nohup python etd_loader.py --only import &>etdload.out  &
    

    Monitor etdload.out for problems. If an ETD cannot be loaded successfully, remove its .zip file from the etd_to_be_imported directory and re-run nohup python etd_loader.py --only import &>etdload.out &. The loader will continue loading ETDs from where it left off.
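
    The recovery step above can be sketched as a small helper that moves a failing .zip out of the queue instead of deleting it, so it can be inspected later. This is a hedged sketch and not part of etd-loader: `set_aside_etd` and the holding directory are hypothetical, and the location of etd_to_be_imported depends on your configuration.

```shell
# set_aside_etd QUEUE_DIR HOLD_DIR ZIP_NAME
# Move ZIP_NAME out of the import queue (the etd_to_be_imported directory)
# into HOLD_DIR, so it is no longer in the queue on the next import run.
set_aside_etd() {
    queue_dir=$1
    hold_dir=$2
    zip_name=$3
    if [ ! -f "$queue_dir/$zip_name" ]; then
        echo "not found: $queue_dir/$zip_name" >&2
        return 1
    fi
    mkdir -p "$hold_dir" && mv "$queue_dir/$zip_name" "$hold_dir/"
}
```

    After setting the file aside, re-run the import command shown above.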

  • Create MARC file

    % sudo su - scholarspace
    % cd /opt/etd-loader
    % source ENV/bin/activate
    % python etd_loader.py --only marc

  • Run the rake task to reindex everything. This job can take a while; prefixing the command with nohup lets it keep running even if you close your shell session.

    % cd /opt/scholarspace/scholarspace-hyrax
    % rvmsudo rake gwss:reindex_everything RAILS_ENV=production &
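
The nohup pattern mentioned for the reindex step is the same one used for the import step earlier. A minimal sketch follows; `run_detached` is a hypothetical helper and the log file name in the usage note is a placeholder. It assumes the command can run without prompting for input.

```shell
# run_detached LOGFILE COMMAND [ARGS...]
# Start COMMAND in the background under nohup so it survives the shell
# session closing, sending stdout and stderr to LOGFILE.
run_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
}
```

From /opt/scholarspace/scholarspace-hyrax this could be called as `run_detached reindex.out rvmsudo rake gwss:reindex_everything RAILS_ENV=production`, with progress followed using `tail -f reindex.out`.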

Issues with item uploads

  • Some errors are recoverable. If an item upload fails with EOFError: end of file reached, retrying the upload occasionally succeeds. This error almost always crashes Solr, which must then be restarted on the Fedora/Solr server.
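
To spot this failure while monitoring an import, the log can be searched for the error text. This is a minimal sketch that assumes the message is written to etdload.out exactly as quoted above; `find_eof_errors` is a hypothetical helper.

```shell
# find_eof_errors LOGFILE
# Print each line of LOGFILE containing the EOFError message, prefixed
# with its line number. Exit status is non-zero if there are no matches.
find_eof_errors() {
    grep -n -F 'EOFError: end of file reached' "$1"
}
```

For example, `find_eof_errors etdload.out` run from /opt/etd-loader lists the matching lines, which helps locate the item that was being uploaded when the error occurred.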

How to update items already in ScholarSpace

TBD