ShEx interpreter for ShEx 2.0

Python implementation of ShEx 2.0

This package is a reasonably literal implementation of the Shape Expressions Language 2.0. It can parse and "execute" ShExC and ShExJ source.

Revisions

  • 0.2.dev3 -- added SchemaEvaluator and other tweaks. There are still some unit tests that fail -- beware
  • 0.3.0 -- Fix several issues. Still does not pass all unit tests -- see test_manifest.py for details
  • 0.4.0 -- Added sparql_slurper capabilities.
  • 0.4.1 -- Resolves several issues with reactome and disease test cases
  • 0.4.2 -- Fix issues #13 (missing start) and #14 (Inconsistent shape causes loop)
  • 0.4.3 -- Fix issues #16 and #15 and some refactoring
  • 0.5.0 -- First cut at returning fail reasons... some work still needed
  • 0.5.1 -- Update shexc parser to include multi-line comments and bug fixes
  • 0.5.2 -- Issue with installer - missed the parse_tree package
  • 0.5.3 -- make sparql_slurper a dependency
  • 0.5.4 -- Fixed long recursion issue with blood pressure example
  • 0.5.5 -- Fixed zero cardinality issue (#20)
  • 0.5.6 -- Added CLI entry point and cleaned up error reporting
  • 0.5.7 -- Throw an error on an invalid focus node (#23)
  • 0.5.9 -- Candidate for ShEx 2.1
  • 0.5.10 -- Fixed evaluator to load files, strings, etc. as ShEx
  • 0.5.11 -- Added Collections Flattening graph option to evaluator.

Installation

pip install PyShEx

Note: If you need to escape single quotes in RDF literals, you will need to install the bleeding edge of rdflib:

pip uninstall rdflib
pip install git+https://github.com/rdflib/rdflib

Unfortunately, rdflib-jsonld is NOT compatible with the bleeding-edge rdflib, so you can't use a JSON-LD parser in this situation.

shexeval CLI

> shexeval -h
usage: shexeval [-h] [-f FORMAT] [-s START] [-fn FOCUS] [-d] [-ss] [-cf]
                rdf shex

positional arguments:
  rdf                   Input RDF file or SPARQL endpoint if slurper option
                        set
  shex                  ShEx specification

optional arguments:
  -h, --help            show this help message and exit
  -f FORMAT, --format FORMAT
                        Input RDF Format
  -s START, --start START
                        Start shape
  -fn FOCUS, --focus FOCUS
                        RDF focus node
  -d, --debug           Add debug output
  -ss, --slurper        Use SPARQL slurper graph
  -cf, --flattener      Use RDF Collections flattener graph
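
As a rough illustration of how the options above fit together, the following sketch rebuilds an equivalent parser with Python's standard argparse module and parses one sample command line. It is a reconstruction from the help text only, not the package's actual entry point, and the file names and IRIs are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the shexeval help text (a reconstruction)."""
    parser = argparse.ArgumentParser(prog="shexeval")
    parser.add_argument("rdf", help="Input RDF file or SPARQL endpoint if slurper option set")
    parser.add_argument("shex", help="ShEx specification")
    parser.add_argument("-f", "--format", help="Input RDF Format")
    parser.add_argument("-s", "--start", help="Start shape")
    parser.add_argument("-fn", "--focus", help="RDF focus node")
    parser.add_argument("-d", "--debug", action="store_true", help="Add debug output")
    parser.add_argument("-ss", "--slurper", action="store_true", help="Use SPARQL slurper graph")
    parser.add_argument("-cf", "--flattener", action="store_true",
                        help="Use RDF Collections flattener graph")
    return parser


# Hypothetical inputs: validate one focus node in data.ttl against one start shape.
args = build_parser().parse_args([
    "-f", "turtle",
    "-fn", "http://example.org/node1",
    "-s", "http://example.org/Shape1",
    "data.ttl", "schema.shex",
])
print(args.rdf, args.shex, args.format, args.focus, args.start)
```

The two positional arguments are always required; the flags `-d`, `-ss` and `-cf` are simple switches that default to off.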

General Layout

The root pyshex package is subdivided into:

The ShEx schema definitions for this package come from ShExJSG.

We are trying to keep the python as close as possible to the (semi-)formal specification. As an example, the statement:

se is a ShapeAnd and for every shape expression se2 in shapeExprs, satisfies(n, se2, G, m)

is implemented in Python as:

    ...
    if isinstance(se, ShExJ.ShapeAnd):
        return satisfiesShapeAnd(cntxt, n, se)
    ...

def satisfiesShapeAnd(cntxt: Context, n: nodeSelector, se: ShExJ.ShapeAnd) -> bool:
    return all(satisfies(cntxt, n, se2) for se2 in se.shapeExprs)
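
To make the pattern concrete, here is a minimal, self-contained sketch of the same "one clause, one function" style. The classes below (`NodeTest`, `ShapeAnd`, `ShapeOr`) are hypothetical stand-ins and not the ShExJ classes used by pyshex. The graph G and the shape map m are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class NodeTest:
    """Hypothetical leaf expression: a plain predicate over a node."""
    test: Callable[[object], bool]


@dataclass
class ShapeAnd:
    shapeExprs: List["ShapeExpr"]


@dataclass
class ShapeOr:
    shapeExprs: List["ShapeExpr"]


ShapeExpr = Union[NodeTest, ShapeAnd, ShapeOr]


def satisfies(n: object, se: ShapeExpr) -> bool:
    # One branch per clause of the definition, dispatching on the expression type.
    if isinstance(se, ShapeAnd):
        return satisfies_shape_and(n, se)
    if isinstance(se, ShapeOr):
        return satisfies_shape_or(n, se)
    return se.test(n)


def satisfies_shape_and(n: object, se: ShapeAnd) -> bool:
    # Every nested expression must be satisfied.
    return all(satisfies(n, se2) for se2 in se.shapeExprs)


def satisfies_shape_or(n: object, se: ShapeOr) -> bool:
    # At least one nested expression must be satisfied.
    return any(satisfies(n, se2) for se2 in se.shapeExprs)


is_int = NodeTest(lambda n: isinstance(n, int))
is_positive = NodeTest(lambda n: isinstance(n, int) and n > 0)
is_str = NodeTest(lambda n: isinstance(n, str))

positive_int = ShapeAnd([is_int, is_positive])
positive_int_or_str = ShapeOr([positive_int, is_str])

print(satisfies(5, positive_int))
print(satisfies(-5, positive_int_or_str))
print(satisfies("five", positive_int_or_str))
```

Each function body reads almost like the sentence it implements, which is the property the package is aiming for.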

Dependencies

This package is built using:

Current status

Performance has been improved, but our current implementation of the sparql_slurper is entirely too fine-grained. Our next steps include:

  1. Get non-conformance reasons into the responses
  2. Improve diagnostic and debugging tools
  3. Add a time-out to catch really long evaluations
  4. Adjust the slurper to pull larger chunks as needed and then refine on the retrieval end
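
One possible shape for the time-out in step 3 is sketched below using only the standard library. It is an assumption about how such a guard could look, not something the package provides; `evaluate_with_timeout` and `slow_evaluation` are hypothetical names. Note that a Python thread cannot be forcibly stopped, so the abandoned evaluation keeps running in the background until it finishes on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional


def evaluate_with_timeout(evaluate: Callable[[], bool], seconds: float) -> Optional[bool]:
    """Run evaluate() and return its result, or None if it exceeds the time limit."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(evaluate)
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        return None
    finally:
        # Do not block on a still-running worker; it cannot be force-stopped.
        executor.shutdown(wait=False)


def slow_evaluation() -> bool:
    """Hypothetical stand-in for a very long validation run."""
    time.sleep(0.5)
    return True


quick_result = evaluate_with_timeout(lambda: True, seconds=1.0)
slow_result = evaluate_with_timeout(slow_evaluation, seconds=0.05)
print(quick_result, slow_result)
```

A process-based variant would allow the runaway evaluation to be terminated outright, at the cost of having to pass the graph and schema across a process boundary.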

This implementation passes all of the tests in the master branch of validation/manifest.ttl except those skipped for the reasons listed below.

At the moment, there are 1077 tests, of which:

  • 970 pass
  • 107 are skipped, for the following reasons:
      1. (52) sht:toldBNode, sht:LexicalBNode and sht:BNodeShapeLabel test non-blank blank nodes (rdflib does not preserve bnode "identity")
      2. (24) sht:OutsideBMP -- test uses multi-byte Unicode (two aren't tagged)
      3. (16) Uses ShEx 2.1 IMPORT feature -- not yet implemented (three aren't tagged)
      4. (3) Focus is a Literal -- not yet implemented
      5. (5) Uses ShEx 2.1 INCLUDE feature -- not yet implemented
      6. (3) Uses manifest shapemap feature -- not yet implemented
      7. (2) sht:relativeIRI -- this isn't a real problem, but we haven't taken time to deal with this in the test harness
      8. (2) rdflib has a parsing error when escaping single quotes. (Issue submitted, awaiting release)

As mentioned above, at the moment this is as literal an implementation of the specification as was sensible. This means, in particular, that we are less than clever when it comes to partition management.
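
To show why literal partition handling gets expensive, the sketch below enumerates every way of splitting a set of triples into k ordered blocks, which is the search space a naive reading of "there is some partition such that..." implies. It is an illustration only and not the code pyshex uses; the `partitions` helper and the sample triples are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def partitions(triples: Sequence[Triple], k: int) -> Iterator[List[List[Triple]]]:
    """Yield every way of assigning each triple to one of k ordered blocks."""
    for assignment in product(range(k), repeat=len(triples)):
        blocks: List[List[Triple]] = [[] for _ in range(k)]
        for triple, block_index in zip(triples, assignment):
            blocks[block_index].append(triple)
        yield blocks


# Four hypothetical triples about a single focus node.
triples = [(":n", ":p", f'"{i}"') for i in range(4)]

two_way = list(partitions(triples, 2))    # e.g. (matched, remainder) splits
three_way = list(partitions(triples, 3))  # e.g. one block per sub-expression
print(len(two_way), len(three_way))
```

The number of candidates is k to the power of the number of triples, so even modest neighbourhoods produce a very large search space unless the candidates are pruned early.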

Notes

manifest_tester.py is the current testing tool. Once we get through the complete set of tests, we'll create a command-line tool and a UI.

Note: At the moment we're just returning pass/fail. We need to find documentation about what the return document should look like before we start returning detailed reports.