Merge pull request #78 from scossu/development
Development
scossu committed Oct 8, 2018
2 parents 75fcf0e + df69b87 commit f2edbb5
Showing 86 changed files with 88,471 additions and 2,244 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -103,5 +103,11 @@ venv.bak/
# mypy
.mypy_cache/

# Pytest
.pytest_cache/

# Default LAKEsuperior data directories
/data
#/lakesuperior/store/base_lmdb_store.c
#/lakesuperior/store/ldp_rs/lmdb_triplestore.c
!ext/lib
13 changes: 9 additions & 4 deletions .travis.yml
@@ -1,7 +1,12 @@
sudo: false
language: python
python:
- "3.5"
- "3.6"
matrix:
include:
- python: 3.6
- python: 3.7
dist: xenial
sudo: true

install:
- pip install -e .
script:
@@ -15,6 +20,6 @@ deploy:
on:
tags: true
branch: master
python: "3.5"
python: "3.6"
distributions: "bdist_wheel"

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,5 +1,6 @@
include README.rst
include LICENSE
include fcrepo
graft lakesuperior/data/bootstrap
graft lakesuperior/endpoints/templates
graft lakesuperior/etc.defaults
19 changes: 12 additions & 7 deletions README.rst
@@ -1,9 +1,9 @@
LAKEsuperior
Lakesuperior
============

|build status| |docs| |pypi|
|build status| |docs| |pypi| |codecov|

LAKEsuperior is an alternative `Fedora
Lakesuperior is an alternative `Fedora
Repository <http://fedorarepository.org>`__ implementation.

Fedora is a mature repository software system historically adopted by
@@ -14,7 +14,7 @@ any type of binary files and their metadata in Linked Data format.
Guiding Principles
------------------

LAKEsuperior aims at being an uncomplicated, efficient Fedora 4
Lakesuperior aims at being an uncomplicated, efficient Fedora 4
implementation.

Its main goals are:
@@ -33,9 +33,9 @@ Key features
- Very stable persistence layer based on
`LMDB <https://symas.com/lmdb/>`__ and filesystem. Fully
ACID-compliant writes guarantee consistency of data.
- Term-based search (*planned*) and SPARQL Query API + UI
- Term-based search and SPARQL Query API + UI
- No performance penalty for storing many resources under the same
container
container, or having one resource link to many URIs
- Extensible provenance metadata tracking
- Multi-modal access: HTTP (REST), command line interface and native Python
API.
@@ -65,7 +65,12 @@ including installation instructions.
.. |docs| image:: https://readthedocs.org/projects/lakesuperior/badge/
:alt: Documentation Status
:target: https://lakesuperior.readthedocs.io/en/latest/?badge=latest

.. |pypi| image:: https://badge.fury.io/py/lakesuperior.svg
:alt: PyPI Package
:target: https://badge.fury.io/py/lakesuperior

.. |codecov| image:: https://codecov.io/gh/scossu/lakesuperior/branch/master/graph/badge.svg
:alt: Code coverage
:target: https://codecov.io/gh/scossu/lakesuperior

11 changes: 9 additions & 2 deletions conftest.py
@@ -1,3 +1,4 @@
import logging
import pytest

from os import makedirs, path
@@ -35,17 +36,19 @@ def db(app):
'''
Set up and tear down test triplestore.
'''
makedirs(data_dir, exist_ok=True)
env.app_globals.rdfly.bootstrap()
env.app_globals.nonrdfly.bootstrap()
print('Initialized data store.')
env.app_globals.rdf_store.open_env(
env.app_globals.rdf_store.env_path)

yield env.app_globals.rdfly

# TODO improve this by using tempfile.TemporaryDirectory as a context
# manager.
print('Removing fixture data directory.')
rmtree(data_dir)
env.app_globals.rdf_store.close_env()
env.app_globals.rdf_store.destroy()


@pytest.fixture
@@ -56,3 +59,7 @@ def rnd_img():
return random_image(8, 256)


@pytest.fixture(autouse=True)
def disable_logging():
"""Disable logging in all tests."""
logging.disable(logging.INFO)
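
The autouse fixture above calls `logging.disable(logging.INFO)`, which suppresses records at `INFO` severity and below while letting `WARNING` and higher through. The following is a minimal sketch of that behavior, not part of the commit; the logger name `lsup_logging_demo` and the helper `capture_levels` are hypothetical and exist only for illustration.

```python
import io
import logging


def capture_levels(disable_at=None):
    """Log one INFO and one WARNING record; return the level names emitted."""
    stream = io.StringIO()
    logger = logging.getLogger("lsup_logging_demo")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)
    try:
        if disable_at is not None:
            # Same call as in the fixture: drops this level and everything below.
            logging.disable(disable_at)
        logger.info("suppressed when logging is disabled at INFO")
        logger.warning("emitted in both cases")
    finally:
        logging.disable(logging.NOTSET)  # restore the process-wide default
        logger.removeHandler(handler)
    return stream.getvalue().split()


print(capture_levels())              # both records pass
print(capture_levels(logging.INFO))  # only the WARNING record passes
```

So, despite the fixture's docstring, warnings and errors raised during tests would still be logged; only `INFO` and `DEBUG` output is silenced.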
31 changes: 16 additions & 15 deletions docs/about.rst
@@ -1,7 +1,7 @@
About LAKEsuperior
About Lakesuperior
==================

LAKEsuperior is an alternative `Fedora
Lakesuperior is an alternative `Fedora
Repository <http://fedorarepository.org>`__ implementation.

Fedora is a mature repository software system historically adopted by
@@ -12,7 +12,7 @@ any type of binary files and their metadata in Linked Data format.
Guiding Principles
------------------

LAKEsuperior aims at being an uncomplicated, efficient Fedora 4
Lakesuperior aims at being an uncomplicated, efficient Fedora 4
implementation.

Its main goals are:
@@ -33,54 +33,55 @@ Key features
- Very stable persistence layer based on
`LMDB <https://symas.com/lmdb/>`__ and filesystem. Fully
ACID-compliant writes guarantee consistency of data.
- Term-based search (*planned*) and SPARQL Query API + UI
- Term-based search and SPARQL Query API + UI
- No performance penalty for storing many resources under the same
container; no
`kudzu <https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml>`__
container; no `kudzu
<https://www.nature.org/ourinitiatives/urgentissues/land-conservation/forests/kudzu.xml>`__
pairtree segmentation [#]_
- Extensible :doc:`provenance metadata <model>` tracking
- :doc:`Multi-modal access <architecture>`: HTTP
(REST), command line interface and native Python API.
- Fits in a pocket: you can carry 50M triples in an 8Gb memory stick.
- Fits in a pocket: you can carry 50M triples in an 8Gb memory stick [#]_.

Implementation of the official `Fedora API
specs <https://fedora.info/spec/>`__ (Fedora 5.x and beyond) is not
foreseen in the short term, however it would be a natural evolution of
this project if it gains support.

Please make sure you read the :doc:`Delta
document <fcrepo4_deltas>` for divergences with the
official Fedora4 implementation.
Please make sure you read the :doc:`Delta document <fcrepo4_deltas>` for
divergences with the official Fedora4 implementation.

Target Audience
---------------

LAKEsuperior is for anybody who cares about preserving data in the long
Lakesuperior is for anybody who cares about preserving data in the long
term.

Less vaguely, LAKEsuperior is targeted at who needs to store large
Less vaguely, Lakesuperior is targeted at who needs to store large
quantities of highly linked metadata and documents.

Its Python/C environment and API make it particularly well suited for
academic and scientific environments who would be able to embed it in a
Python application as a library or extend it via plug-ins.

LAKEsuperior is able to be exposed to the Web as a `Linked Data
Lakesuperior is able to be exposed to the Web as a `Linked Data
Platform <https://www.w3.org/TR/ldp-primer/>`__ server. It also acts as
a SPARQL query (read-only) endpoint, however it is not meant to be used
as a full-fledged triplestore at the moment.

In its current status, LAKEsuperior is aimed at developers and hands-on
In its current status, Lakesuperior is aimed at developers and hands-on
managers who are interested in evaluating this project.

Status and development
----------------------

LAKEsuperior is in **alpha** status. Please see the `project
Lakesuperior is in **alpha** status. Please see the `project
issues <https://github.com/scossu/lakesuperior/issues>`__ list for a
rudimentary road map.

--------------

.. [#] However if your client splits pairtrees upstream, such as Hyrax does,
that obviously needs to change to get rid of the path segments.
.. [#] Your mileage may vary depending on the variety of your triples.
2 changes: 1 addition & 1 deletion docs/api.rst
@@ -4,7 +4,7 @@ API Documentation
Main Interface
--------------

The LAKEsuperior API modules of most interest for a client are:
The Lakesuperior API modules of most interest for a client are:

- :mod:`lakesuperior.api.resource`
- :mod:`lakesuperior.api.query`
14 changes: 7 additions & 7 deletions docs/architecture.rst
@@ -1,14 +1,14 @@
LAKEsuperior Architecture
Lakesuperior Architecture
=========================

LAKEsuperior is written in Python. It is not excluded that parts of the
Lakesuperior is written in Python. It is not excluded that parts of the
code may be rewritten in `Cython <http://cython.readthedocs.io/>`__ for
performance.

Multi-Modal Access
------------------

LAKEsuperior services and data are accessible in multiple ways:
Lakesuperior services and data are accessible in multiple ways:

- Via HTTP. This is the canonical way to interact with LDP resources
and conforms quite closely to the Fedora specs (currently v4).
@@ -17,18 +17,18 @@ LAKEsuperior services and data are accessible in multiple ways:
- Via a Python API. This method allows to use Python scripts to access
the same methods available to the two methods above in a programmatic
way. It is possible to write Python plugins or even to embed
LAKEsuperior in a Python application, even without running a web
Lakesuperior in a Python application, even without running a web
server.

Architecture Overview
---------------------

.. figure:: assets/lakesuperior_arch.png
:alt: LAKEsuperior Architecture
:alt: Lakesuperior Architecture

LAKEsuperior Architecture
Lakesuperior Architecture

The LAKEsuperior REST API provides access to the underlying Python API.
The Lakesuperior REST API provides access to the underlying Python API.
All REST and CLI operations can be replicated by a Python program
accessing this API.

Binary file modified docs/assets/lakesuperior_arch.png
78 changes: 57 additions & 21 deletions docs/cli.rst
@@ -1,22 +1,29 @@
Command Line Reference
======================

LAKEsuperior comes with some command-line tools aimed at several purposes.
Lakesuperior comes with some command-line tools aimed at several purposes.

If LAKEsuperior is installed via ``pip``, all tools can be invoked as normal
If Lakesuperior is installed via ``pip``, all tools can be invoked as normal
commands (i.e. they are in the virtualenv ``PATH``).

The tools are currently not directly available on Docker instances (*TODO add
instructions and/or code changes to access them*).

``lsup-server``
---------------

Single-threaded server. Use this for testing, debugging, or to multiplex via
WSGI in a Windows environment. For non-Windows production environments, use
``fcrepo``.

``fcrepo``
----------

This is the main server command. It has no parameters. The command spawns
Gunicorn workers (as many as set up in the configuration) and can be sent in
the background, or started via init script.

The tool must be run in the same virtual environment LAKEsuperior
The tool must be run in the same virtual environment Lakesuperior
was installed in (if it was)—i.e.::

source <virtualenv root>/bin/activate
@@ -27,25 +34,34 @@ In the case an init script is used, ``coilmq`` (belonging to a 3rd party
package) needs to be launched as well; unless a message broker is already set
up, or if messaging is disabled in the configuration.

**Note:** This command is not available in Windows because Gunicorn is not
available in Windows. Windows users should look for alternative WSGI servers
to run the single-threaded service (``lsup-server``) over multiple processes
and/or threads.

**Note:** This is the only command line tool that is not added to the ``PATH``
environment variable in Unix systems (because it is not cross-platform). It
must be invoked by using its full path.

``lsup-admin``
--------------

``lsup-admin`` is the principal repository management tool. It is
self-documented, so this is just a redundant overview::

$ lsup-admin
Usage: lsup-admin [OPTIONS] COMMAND [ARGS]...
$ lsup-admin
Usage: lsup-admin [OPTIONS] COMMAND [ARGS]...

Options:
--help Show this message and exit.
Options:
--help Show this message and exit.

Commands:
bootstrap Bootstrap binary and graph stores.
check_fixity [STUB] Check fixity of a resource.
check_refint Check referential integrity.
cleanup [STUB] Clean up orphan database items.
migrate Migrate an LDP repository to LAKEsuperior.
stats Print repository statistics.
Commands:
bootstrap Bootstrap binary and graph stores.
check_fixity [STUB] Check fixity of a resource.
check_refint Check referential integrity.
cleanup [STUB] Clean up orphan database items.
migrate Migrate an LDP repository to Lakesuperior.
stats Print repository statistics.

All entries marked ``[STUB]`` are not yet implemented, however the
``lsup-admin <command> --help`` command will issue a description of what
@@ -59,18 +75,38 @@ native Python API.
``lsup-benchmark``
------------------

``lsup-benchmark`` is used to run performance tests in a predictable way.

The command has no options but prompts the user for a few settings
interactively (N.B. this may change in favor of parameters).
This command is used to run performance tests in a predictable way.

The command line options can be queried with the ``--help`` option::

Usage: lsup-benchmark [OPTIONS]

Options:
-e, --endpoint TEXT LDP endpoint. Default: http://localhost:8000/ldp
-c, --count INTEGER Number of resources to ingest. Default: {def_ct}
-p, --parent TEXT Path to the container resource under which the new
resources will be created. It must begin with a
slash (`/`) character. Default: /pomegranate
-d, --delete-container Delete container resource and its children if
already existing. By default, the container is not
deleted and new resources are added to it.
-m, --method TEXT HTTP method to use. Case insensitive. Either PUT
or POST. Default: PUT
-s, --graph-size INTEGER Number of triples in each graph. Default: 200
-t, --resource-type TEXT Type of resources to ingest. One of `r` (only LDP-
RS, i.e. RDF), `n` (only LDP-NR, i.e. binaries),
or `b` (50/50% of both). Default: r
-p, --plot Plot a graph of ingest timings. The graph figure
is displayed on screen with basic manipulation and
save options.
--help Show this message and exit.

The benchmark tool is able to create RDF sources, or non-RDF, or an equal mix
of them, via POST or PUT, in the currently running LAKEsuperior server. It
runs single-threaded.
of them, via POST or PUT, in a given LDP endpoint. It runs single-threaded.

The RDF sources are randomly generated graphs of consistent size and
complexity. They include a mix of in-repository references, literals, and
external URIs. Each graph has 200 triples.
external URIs. Each graph has 200 triples by default.

The non-RDF sources are randomly generated 1024x1024 pixel PNG images.
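
To make the ingest described above concrete, here is a rough sketch of how one benchmark-style request could be assembled with the Python standard library. It is not the `lsup-benchmark` implementation: the helpers `random_graph` and `build_request`, the `example.org` vocabulary, and the choice to serialize as N-Triples sent with a `text/turtle` content type are all assumptions made for illustration. Only the endpoint, parent path, PUT method, and 200-triple size are taken from the defaults listed in the help text.

```python
import random
import urllib.request
import uuid

ENDPOINT = "http://localhost:8000/ldp"  # default endpoint from the help text
PARENT = "/pomegranate"                 # default parent container from the help text


def random_graph(subject, size=200, seed=None):
    """Build `size` distinct triples about `subject` as N-Triples text.

    Objects are a random mix of literals and URIs (illustrative only).
    """
    rng = random.Random(seed)
    lines = []
    for i in range(size):
        # A distinct predicate per triple guarantees `size` unique triples.
        pred = "<http://example.org/vocab#p{}>".format(i)
        if rng.random() < 0.5:
            obj = '"{}"'.format(rng.getrandbits(64))
        else:
            obj = "<http://example.org/res/{}>".format(rng.getrandbits(64))
        lines.append("<{}> {} {} .".format(subject, pred, obj))
    return "\n".join(lines) + "\n"


def build_request(endpoint=ENDPOINT, parent=PARENT, size=200, seed=None):
    """Prepare (but do not send) a PUT request creating one RDF resource."""
    uri = "{}{}/{}".format(endpoint, parent, uuid.uuid4())
    body = random_graph(uri, size, seed).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/turtle"},  # N-Triples is valid Turtle
    )


req = build_request(seed=1)
print(req.get_method(), req.full_url)
```

Against a running server, the request would be sent with `urllib.request.urlopen(req)`; the sketch stops short of that so it can run without a live endpoint.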
