A web application for ingest, curation, search, and display of digital assets. Powered by Hydra technologies (Rails, Hydra-head, Blacklight, Solr, Fedora Commons, etc.)

ScholarSphere

ScholarSphere is Penn State's self- and proxy-deposit repository for access to and preservation of scholarly works and data. It is built atop Sufia, a Hydra/Rails-based component.

ScholarSphere is being developed as part of Penn State's Digital Stewardship Program. Development on ScholarSphere began as part of the prototype CAPS project. Code and documentation are freely available via GitHub.

For more information, read the ScholarSphere development docs.

License

ScholarSphere is available under the Apache 2.0 license. Read the copyright statement and license.

Install

Infrastructural components

  • Ruby 2.3.3 (see .ruby-version; we use RVM or rbenv to manage our Rubies)
  • Fedora (if you don't have access to an instance, use the built-in hydra-jetty)
  • Solr (if you don't have access to an instance, use the built-in hydra-jetty)
  • A relational database (SQLite and MySQL have been tested)
  • Redis (for activity streams and background jobs)

Install system dependencies

  • libmysqlclient-dev (if running MySQL as RDBMS)
  • libsqlite3-dev (if running SQLite as RDBMS)
  • libmagick-dev (or libmagickcore-dev on Ubuntu 12.10)
  • libmagickwand-dev
  • clamav
  • clamav-daemon
  • libclamav-dev
  • ghostscript (required to create thumbnails from pdfs)
  • FITS -- put it in a directory on your PATH, or just use the included git submodule
  • phantomjs -- if you're running the test suite, you'll need phantomjs (headless webkit browser) on your PATH for the feature specs

Get the ScholarSphere code

git clone https://github.com/psu-stewardship/scholarsphere.git

Install gems

bundle install

Copy config samples

cp config/devise.yml.sample config/devise.yml
cp config/database.yml.sample config/database.yml
cp config/fedora.yml.sample config/fedora.yml
cp config/solr.yml.sample config/solr.yml
cp config/redis.yml.sample config/redis.yml
cp config/hydra-ldap.yml.sample config/hydra-ldap.yml
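The copies above can also be done in one pass. This sketch (assuming a POSIX shell) copies every *.yml.sample under config/ to its .yml counterpart, skipping any file you have already customized:

```shell
# Copy each config/*.yml.sample to config/*.yml unless the target already
# exists, so previously customized configs are left untouched.
copy_sample_configs() {
  for sample in config/*.yml.sample; do
    [ -e "$sample" ] || continue          # glob matched nothing
    target="${sample%.sample}"
    [ -e "$target" ] || cp "$sample" "$target"
  done
}
copy_sample_configs
```

This also picks up any additional samples, such as config/arkivo.yml.sample if you later enable Zotero integration.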

If you're using SQLite, a vanilla Redis installation, and the Hydra-Jetty Solr and Fedora components (see below), you should not need to tweak the database.yml, fedora.yml, solr.yml, or redis.yml files.

If you're planning to use LDAP for user account information and groups, you will need to know some information about your LDAP service, which will go into hydra-ldap.yml.

Create database

rake db:create

Generate a new secret token

rake scholarsphere:generate_secret

Migrate relational database

rake db:migrate

To use the built-in Fedora and Solr instances, get the bundled hydra-jetty, configure it, and fire it up

rake jetty:clean
rake sufia:jetty:config
rake jetty:start

Start the resque-pool workers (needed for characterization, audit, and resolrization services)

resque-pool --daemon --environment development start

Run the app server (the bundled app server is Unicorn)

rails server

Browse to http://localhost:3000/ and you should see ScholarSphere!

Usage Notes

Enabling Zotero integration

To enable integration with Zotero, here are the required steps:

  1. Register an OAuth client for ScholarSphere with Zotero. Note the client key and secret for later.
  2. Install and start arkivo-sufia in a server environment. Note the server hostname and IP address arkivo-sufia is running on for later.
  3. Create Arkivo tokens for all existing users in your application database (assuming a production environment): RAILS_ENV=production rake sufia:user:tokens
  4. Set environment variables for the Zotero OAuth client key and secret you generated in step 1 above, called ZOTERO_CLIENT_KEY and ZOTERO_CLIENT_SECRET. The config/zotero.yml file depends on these environment variables. (If you'd prefer not to manage these via env vars, you are also welcome to handle zotero.yml in a different way, e.g., the way we handle our other configs that are hidden from version control.) Make sure that these variables are available to the user running the ScholarSphere rails server.
  5. Copy config/arkivo.yml.sample to config/arkivo.yml and set it up with the hostname and IP address you noted in step 2 above. (This is how we handle most of our production configs already, so this just mirrors current practice.)
  6. Edit config/initializers/arkivo_constraint.rb to allow connections to the Arkivo API. If you're just testing, you can have the matches? method return true, but do not do this in production! This effectively allows any client unauthenticated access to an API that permits adding, modifying, and removing content. In production, you can use the routing constraint to ensure that the API is accessible only to the specific IP address of the host running the Arkivo service. You can do that by having the matches? method return something like request.remote_ip == '10.0.0.3'. (If you're not comfortable having a back-end IP address stored in a file that is under version control, you can also make use of an environment variable here, e.g., request.remote_ip == ENV['ARKIVO_HOST_IP'], in which case you'll need to make sure the server environment has that variable set to the proper value.)
  7. Restart the Rails server and all background jobs, and you should now be able to OAuth to Zotero via the Edit Profile screen, at which point the magic should start happening.
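For step 4, a minimal sketch of exporting the two variables in the shell that launches the Rails server (the key and secret values below are placeholders for the credentials Zotero issued in step 1):

```shell
# Placeholder credentials -- substitute the real values from your Zotero
# OAuth client registration. config/zotero.yml reads these environment
# variables at boot.
export ZOTERO_CLIENT_KEY="abc123placeholder"
export ZOTERO_CLIENT_SECRET="s3cretplaceholder"
```

For a long-running deployment you would instead set these in whatever mechanism starts the app server (an init script or the deploy user's profile) so they survive restarts.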

Auditing All Datastreams

To audit the digital signatures of every version of every object in the repository, run the following command

script/audit_repository

You'll probably want to schedule this regularly (e.g., via cron) in production environments. Note that this does not force an audit -- it respects the value of max_days_between_audits in application.rb. Also note that if you want to run this on any environment other than development, you will need to call the script with RAILS_ENV=environment in front.
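As a sketch, a crontab entry along these lines (the install path is a placeholder for wherever ScholarSphere is deployed) would run the audit nightly at 2 AM against the production environment:

```
# min hour dom mon dow  command
0 2 * * * cd /path/to/scholarsphere && RAILS_ENV=production script/audit_repository
```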

Re-solrize All Objects

If for some reason you need to force all objects to be re-solrized, perhaps because you have updated which fields are facetable and which are not, ScholarSphere contains a rake task that kicks off a re-solrization asynchronously via a Resque job.

rake scholarsphere:resolrize

Note that if you want to run this on any environment other than development, you will need to call the script with RAILS_ENV=environment in front.

Characterize All Uncharacterized Datastreams

In the event that some objects have not undergone characterization (for whatever reason), there is a rake task that sweeps through the entire repository looking for objects that lack a characterization datastream. For each object that lacks this datastream, a CharacterizationJob that will characterize and thumbnailize the object is queued up.

rake scholarsphere:characterize

Note that if you want to run this on any environment other than development, you will need to call the script with RAILS_ENV=environment in front.

Export Metadata as RDF/XML

There is a rake task that exports the metadata of every object that is readable by the public to the RDF/XML format. This might be useful as an export mechanism, e.g., to Summon or a similar discovery system.

rake scholarsphere:export:rdfxml

Note that if you want to run this on any environment other than development, you will need to call the script with RAILS_ENV=environment in front.

Harvesting Authorities Locally

ScholarSphere supports "authority suggestion," a feature that links controlled vocabularies to descriptive metadata elements. This provides functionality both for mapping string values to URIs and for populating dropdowns in metadata form fields, e.g., if a user types "Cro" into a subject field, they might see a list that includes "Croatian independence," the subject they were going to type out.

In order to avoid network latency, these vocabularies are harvested in advance and stuffed into the ScholarSphere relational database for easy and quick lookups.

To get a sense for how this works, pull in database fixtures containing pre-harvested authorities

rake db:data:load

To harvest more authorities:

  1. Harvest the authority (See available harvest tasks via rake -T scholarsphere:harvest) -- N.B. depending on the size of the vocabulary, this may take a very long time, especially if you're using a slower database such as SQLite.
  2. (OPTIONAL) Generate fixtures so other instances don't need to re-harvest (See available database tasks via rake -T db)
  3. Register the vocabulary with a domain term in generic_file_rdf_datastream.rb (See the bottom of the file for examples)

Contribute