Libris XL
niklasl Fix fix-rda-links whelktool script
- Correct base for rda terms
- Use new findCanonicalId utility function
- Factor out caching of correctId
Latest commit 48dd0c1 Oct 19, 2018
apix_export Factor out embellish boilerplate to Whelk Apr 13, 2018
apix_server Revert "Remove left over symlink." Oct 4, 2018
batchimport Make batch import integration tests less dangerous to run by using a … Oct 17, 2018
gradle/wrapper Update Gradle to 4.5.1 and adapt subprojects Feb 7, 2018
harvesters Set up whelk-core as subproject Feb 7, 2018
importers The Java Date constructor takes milliseconds since EPOCH not seconds … Oct 10, 2018
librisxl-tools Fix recreate_storage to actually drop all existing tables and indexes. Oct 11, 2018
marc_export Fix a marc_export bug, where the itemOf link to a deleted bib record … Oct 17, 2018
oaipmh Fix XML apis versioning everywhere else where it could (hopefully) po… Oct 4, 2018
rest Remove reference to dynamic version of isbn-tools. Oct 10, 2018
transform Make global changes not save/write records that have not been changed. Oct 8, 2018
whelk-core Merge pull request #234 from libris/feature/remodel-termComponentList Oct 19, 2018
whelktool Fix fix-rda-links whelktool script Oct 19, 2018
.gitignore Add Session.vim to .gitignore Oct 19, 2018
GRAPHSTORE.md Add example of querying using curl to GRAPHSTORE.md Oct 19, 2018
LDDB.md Add snippets of output to example commands Sep 4, 2018
LICENSE.txt Added license file Jun 3, 2016
README.md Update README. Mar 29, 2018
gradlew Add gradle wrapper and update .gitignore + README.md Dec 22, 2017
gradlew.bat Add gradle wrapper and update .gitignore + README.md Dec 22, 2017
gretty.plugin Change deprecated gradle plugin jetty to gretty Dec 22, 2017
secret.properties.in Change ES port to the "web" port 9200 in .in file Feb 21, 2018
system_overview.dot Edit labels and dots in system overview Jan 18, 2018


Libris XL



Parts

The project consists of:

  • Applications
    • apix_export/ Exports data from Libris XL back to Voyager (the old system).
    • harvesters/ An OAI-PMH harvester. Servlet web application.
    • importers/ Java application to load or reindex data into the system.
    • oaipmh/ Servlet web application. OAI-PMH service for Libris XL.
    • rest/ A servlet web application. The REST and other HTTP APIs
  • Tools
    • librisxl-tools/ Configuration and scripts used for setup, maintenance and operations.

Related external repositories:

  • The applications above depend on the Whelk Core repository.

  • Core metadata to be loaded is managed in the definitions repository.

  • Also see LXLViewer, our application for viewing and editing the datasets through the REST API.

Dependencies

  1. Gradle

    No setup required. Just use the checked-in Gradle wrapper (gradlew, or gradlew.bat on Windows) to automatically get the specified version of Gradle and Groovy.

  2. Elasticsearch

    For OS X:

    $ brew install elasticsearch
    

    For Debian, follow the instructions on https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-repositories.html first, then:

    $ apt-get install elasticsearch
    

    For Windows, download and install: https://www.elastic.co/downloads/past-releases/elasticsearch-2-4-1

    NOTE: You will also need to set cluster.name in /etc/elasticsearch/elasticsearch.yml to something unique on the network. This name is later specified when you configure the system. Don't forget to restart Elasticsearch after the change.

    For Elasticsearch version 2.2 and later, you must also install the delete-by-query plugin. This functionality was removed from the Elasticsearch core in version 2.0 and needs to be added as a plugin:

    $ /path/to/elasticsearch/bin/plugin install delete-by-query
    

    NOTE: You will need to reinstall the plugin whenever you upgrade Elasticsearch.

  3. PostgreSQL (version 9.4 or later)

    # OS X
    $ brew install postgresql
    # Debian
    $ apt-get install postgresql postgresql-client
    

    Windows: Download and install https://www.postgresql.org/download/windows/
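Setting cluster.name (see the NOTE in the Elasticsearch step above) can also be scripted. The following is a minimal POSIX shell sketch; the function name set_cluster_name is hypothetical and not part of librisxl-tools. It replaces the first cluster.name line (commented out or not) or appends one if none exists, and writes the result back into the same file so its owner and permissions are kept.

```shell
#!/bin/sh
# Hypothetical helper (not part of librisxl-tools): set cluster.name in an
# elasticsearch.yml file.
# Usage: set_cluster_name FILE NAME
set_cluster_name() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Replace the first line that sets cluster.name (commented out or not);
    # if there is no such line, append one at the end of the file.
    awk -v name="$name" '
        !done && /^[# \t]*cluster\.name:/ { print "cluster.name: " name; done = 1; next }
        { print }
        END { if (!done) print "cluster.name: " name }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the contents back (instead of mv) so the original file keeps
    # its owner and permissions.
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, from a root shell on Debian you could run set_cluster_name /etc/elasticsearch/elasticsearch.yml "libris-xl-$(hostname)" and then restart Elasticsearch. Appending the host name is only one way of getting a name that is unique on the network; whichever name you pick is the one to specify later when you configure the system.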

Setup

Configuring secrets

Use librisxl/secret.properties.in as a starting point:

$ cd $LIBRISXL
$ cp secret.properties.in secret.properties
$ vim secret.properties

Setting up PostgreSQL

  1. Ensure PostgreSQL is started

    E.g.:

    $ postgres -D /usr/local/var/postgres
    
  2. Create database

    # You might need to become the postgres user (e.g. sudo -u postgres bash) first
    $ createdb whelk_dev
    

    (Optionally, create a schema and a database user:)

    $ psql whelk_dev
    psql (9.5.4)
    Type "help" for help.
    
    whelk_dev=# CREATE SCHEMA whelk_dev;
    CREATE SCHEMA
    whelk_dev=# CREATE USER whelk PASSWORD 'whelk';
    CREATE ROLE
    whelk_dev=# GRANT ALL ON SCHEMA whelk_dev TO whelk;
    GRANT
    whelk_dev=# GRANT ALL ON ALL TABLES IN SCHEMA whelk_dev TO whelk;
    GRANT
    whelk_dev=# \q
    
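The optional psql session in step 2 can also be run non-interactively. This sketch uses a hypothetical helper, whelk_db_sql, which prints the same four statements for a given schema, user and password so they can be piped into psql. It does no quoting or escaping, so only use plain identifiers and a password without single quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL from the psql session above for the
# given schema, user and password. Values are interpolated without any
# escaping, so only use plain identifiers and a password without quotes.
# Usage: whelk_db_sql SCHEMA USER PASSWORD
whelk_db_sql() {
    schema=$1
    user=$2
    password=$3
    cat <<EOF
CREATE SCHEMA $schema;
CREATE USER $user PASSWORD '$password';
GRANT ALL ON SCHEMA $schema TO $user;
GRANT ALL ON ALL TABLES IN SCHEMA $schema TO $user;
EOF
}
```

For example: whelk_db_sql whelk_dev whelk whelk | psql -v ON_ERROR_STOP=1 whelk_dev. Setting ON_ERROR_STOP makes psql stop at the first failing statement instead of running the rest.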

Setting up Elasticsearch

TODO: The index and mappings are now generated, so this step can probably be omitted (see the devops repo or the setup-dev-whelk.sh script for details).

Create the index and mappings:

$ cd $LIBRISXL
$ curl -XPOST http://localhost:9200/whelk_dev -d@librisxl-tools/elasticsearch/libris_config.json

NOTE: Windows users can install curl with Chocolatey:

$ choco install curl

Import test data

Check out the devops repository into the same parent directory as the librisxl repository (the devops repository is private, so ask a team member for access), then load the example data:

For *NIX:

$ cd devops
$ pip install -r requirements.txt
$ fab conf.xl_local app.whelk.reload_example_data:force=True

Running

To start the CRUD part of the whelk, run the following commands:

*NIX-systems:

$ cd $LIBRISXL/rest
$ export JAVA_OPTS="-Dfile.encoding=utf-8"
$ ../gradlew -Dxl.secret.properties=../secret.properties appRun

Windows (cmd.exe):

$ cd %LIBRISXL%\rest
$ set JAVA_OPTS=-Dfile.encoding=utf-8
$ ..\gradlew.bat -Dxl.secret.properties=..\secret.properties appRun

The system is then available on http://localhost:8180.
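Since appRun keeps running in the foreground, another terminal or script may need to wait until the server answers before using it. A minimal sketch, assuming curl is installed; the function name wait_for_whelk and its default number of tries and delay are illustrative. It treats any HTTP response as "up", since this README does not say which paths return 200.

```shell
#!/bin/sh
# Hypothetical helper: poll URL until the server answers, or give up.
# Usage: wait_for_whelk [URL] [TRIES] [DELAY_SECONDS]
wait_for_whelk() {
    url=${1:-http://localhost:8180}
    tries=${2:-60}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Without -f, curl exits 0 on any HTTP response and non-zero if it
        # cannot connect, e.g. while the server is still starting up.
        if curl -s -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up waiting for $url" >&2
    return 1
}
```

With the defaults it tries http://localhost:8180 up to 60 times, two seconds apart, and returns non-zero if the server never answers.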

Maintenance

Automated setup using setup-dev-whelk.sh (DEPRECATED: use the devops repository method above instead)

For very specific purposes, where you only want to run certain parts of the data loading process, this script is sometimes useful, but it is neither the correct nor the "official" way of loading example data. It is used like this:

$ ./librisxl-tools/scripts/setup-dev-whelk.sh -n <database name> \
    [-C <createdb user>] [-D <database user>] [-F]

Here <database name> is used for both PostgreSQL and Elasticsearch, <createdb user> (optional) is the user to run createdb/dropdb as using sudo, and <database user> (optional) is the PostgreSQL user. The -F flag (also optional) tells the script to rebuild everything, which is handy if the different parts have become stale.

E.g.:

$ ./librisxl-tools/scripts/setup-dev-whelk.sh -n whelk_dev \
     -C postgres -D whelk -F

Development Workflow

If you need to work locally (e.g. in this or the "definitions" repo) and perform specific tests, you can use this workflow:

$ (cd ../definitions && .venv/bin/python datasets.py -l)
$ (cd importers/ && ../gradlew jar -DuseLocalDeps)
$ ./librisxl-tools/scripts/setup-dev-whelk.sh -n whelk_dev

Important: ensure the name of the whelk (here "whelk_dev") is the same as the one configured in your local ./secret.properties config file.

Explanation: Since -F (which forces a rebuild of the data and the importer) is not used here, the first two commands do those rebuilds instead. Depending on what you're doing, you can omit either one (or both, if you're developing in this repo, librisxl).

Clearing out existing definitions

To clear out any existing definitions (before reloading them), run this script (or see the source for details):

$ ./librisxl-tools/scripts/manage-whelk-storage.sh -n whelk_dev --nuke-definitions

New Elasticsearch config

Unless you are running locally in a pristine setup or using the recommended devops method for loading data, setting up a new index requires PUTting the config to it, like:

$ curl -XPUT http://localhost:9200/indexname_versionnumber \
    -d @librisxl-tools/elasticsearch/libris_config.json

Create an alias for your index:

$ curl -XPOST http://localhost:9200/_aliases \
    -d  '{"actions":[{"add":{"index":"indexname_versionnumber","alias":"indexname"}}]}'
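The two steps above (PUT the config to a versioned index, then add an alias pointing at it) can be combined into one function. A sketch assuming curl and the indexname_versionnumber naming used above; the function name create_versioned_index is hypothetical. curl's -f flag makes the first call fail on an HTTP error, so the alias is only added if the index was actually created.

```shell
#!/bin/sh
# Hypothetical helper: create NAME_VERSION from a config file and add the
# alias NAME pointing at it.
# Usage: create_versioned_index ES_URL NAME VERSION CONFIG_FILE
create_versioned_index() {
    es=$1
    name=$2
    version=$3
    config=$4
    index="${name}_${version}"
    # -sS: no progress meter, but still report errors; -f: exit non-zero on
    # HTTP errors, so the alias step is skipped if index creation failed.
    curl -sSf -XPUT "$es/$index" -d @"$config" &&
        curl -sSf -XPOST "$es/_aliases" \
            -d '{"actions":[{"add":{"index":"'"$index"'","alias":"'"$name"'"}}]}'
}
```

For example, from $LIBRISXL: create_versioned_index http://localhost:9200 whelk_dev 1 librisxl-tools/elasticsearch/libris_config.json.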

(To replace an existing setup with an entirely new configuration, you need to delete the index with curl -XDELETE http://localhost:9200/<indexname>/ and reload all data, even locally.)

Format updates

If the MARC conversion process has been updated and needs to be run anew, the only option is to reload the data from vcopy using the importers application.

Statistics

Produce a MARC usage statistics file (here for bib records) by running:

$ cd importers && ../gradlew build
$ RECTYPE=bib
$ time java -Dxl.secret.properties=../secret.properties \
    -Dxl.mysql.properties=../mysql.properties \
    -jar build/libs/vcopyImporter.jar vcopyjsondump $RECTYPE \
    | grep '^{' \
    | pypy ../librisxl-tools/scripts/get_marc_usage_stats.py $RECTYPE /tmp/usage-stats-$RECTYPE.json