PEP 426: Define a JSON-LD context as part of the proposal #31

Open
ncoghlan opened this Issue Sep 8, 2015 · 25 comments

@ncoghlan
Member

ncoghlan commented Sep 8, 2015

I finally found time to investigate JSON-LD as Wes Turner has regularly suggested. It does look like a good fit for what I want to achieve with the metadata 2.0 spec: http://www.w3.org/TR/json-ld/#basic-concepts

Also useful to me was this blog post from the JSON-LD lead editor: http://manu.sporny.org/2014/json-ld-origins-2/

I've long ignored the semantic web people because they tend to design overengineered solutions that are completely impractical for real-world use. Sporny's post persuaded me that JSON-LD wasn't like that, and hence worth investigating further.

@westurner

westurner commented Sep 8, 2015

So, this is somewhat of a frequent documentation need,
and an opportunity for linked requirements traceability (#LinkedData (EDIT: #LinkedReproducibility #PEP426JSONLD)):

| Homepage: ...
| Src: git https://bitbucket.org/./.
| Download: .../download/
| Issues: bitbucket.org/././issues
| Docs: `<https://containsparens_(disambiguation)>`__
[... add'l ad-hoc attributes]

Before writing this as (most minimal, ordered) inline blocks, I wrote 'bobcat' (which requires FuXi for OWL schema reasoning) and one day drafted some thoughts for a 'sphinxcontrib-rdf' extension to add roles and directives.

  • bobcat -> RST -> Sphinx (Use case: add an Appendix listing component RDF attributes to system docs)
  • sphinxcontrib-rdf <-> Sphinx

More practically, how do I simulate pip install without running any setup.py files (i.e. traverse and solve from the given requirements rules)?

And then there are positive externalities of exposing JSON[-LD] that is schema.org-compatible:

  • It's possible that search engines could index schema.org/SoftwareApplication (from JSON-LD and/or RDFa in crawled pages)
  • Other tools can retrieve metadata in a structured way (e.g. version checking; pip-tools' pip-sync)
  • Upstream and downstream packages could be linked with URIs (and provenance metadata, with signatures)

A broader discussion of RDF tools in any language, for/with RDFJS: https://text.allmende.io/p/rdfjs (see "Classes")


@ncoghlan

Member

ncoghlan commented Sep 8, 2015

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

@westurner

westurner commented Sep 11, 2015

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

Do they have URNs that could be the object of a (pypi:projectname, ex:, urn:x-tagvault:xyz) triple?


@westurner

westurner commented Sep 11, 2015

The install_requires and extras_require edges need to be in the JSON[-LD]:

https://github.com/ipython/ipython/blob/master/setup.py#L182

  • Note that here these variables are conditional based upon e.g. platformstr parameters.
    • Is it possible to serialize these edges to JSON at next build / release time?

The total graph of install_requires and extras_require
is the sum of each of the built eggs' JSON[-LD]
representations of runtime setup.py state.

  • { } Generate a separate JSON-LD
  • { } Aggregate all JSON-LD metadata sets for each instance of each version of each package
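As a sketch of the first checkbox above, and assuming the conditional (marker-based) requirements have already been evaluated for one platform at build/release time, a generator for such a JSON[-LD] document might look like the following. The @context URL, the urn:x-pythonpkg: scheme, and the property names are all invented for illustration; none of them are part of PEP 426.

```python
import json
import re

# Hypothetical sketch: serialize install_requires / extras_require edges
# as a JSON-LD document, one edge per requirement, keyed by a package URN.
def requirements_to_jsonld(name, version, install_requires, extras_require):
    def edge(req):
        # split "simplegeneric>0.8" into a bare project name + full specifier
        pkg = re.split(r"[=<>!~\[;]", req, maxsplit=1)[0].strip()
        return {"@id": "urn:x-pythonpkg:" + pkg.lower(), "specifier": req}

    return {
        "@context": "https://example.org/pep426-context.jsonld",  # hypothetical
        "@id": "urn:x-pythonpkg:" + name.lower(),
        "name": name,
        "version": version,
        "install_requires": [edge(r) for r in install_requires],
        "extras_require": {extra: [edge(r) for r in reqs]
                           for extra, reqs in extras_require.items()},
    }

doc = requirements_to_jsonld(
    "ipython", "4.0.0",
    ["decorator", "pickleshare", "simplegeneric>0.8"],
    {"notebook": ["jupyter"]})
print(json.dumps(doc, indent=2))
```

Aggregating the per-build documents (the second checkbox) would then be a matter of merging graphs that share these URN keys.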

@westurner

westurner commented Sep 11, 2015

  • { } Generate a separate JSON-LD
@westurner

westurner commented Sep 11, 2015

  • I think some level of schema.org interoperability should be a goal:

  • Also of interest: "[Distutils] pip/warehouse feature idea: 'help needed'"
    https://mail.python.org/pipermail/distutils-sig/2015-April/026108.html

    On Sat, Apr 11, 2015 at 1:14 PM, Wes Turner <wes.turner at gmail.com> wrote:

    On Sat, Apr 11, 2015 at 12:29 PM, Marc Abramowitz
    wrote:

    Interesting. One of the things that would help with getting people to
    help and is in the PEPs but last I checked wasn't yet implemented is the
    metadata that allows putting in all kinds of URLs and the ones I'm
    primarily thinking of here are the source code repository URL and the issue
    tracker URL.

    http://legacy.python.org/dev/peps/pep-0459/:

    [...]

    A JSON-LD context would be outstanding.

    I personally sigh when I see a PyPI page that lists its URL as said PyPI
    page, as this seems redundant and not useful; I'd rather see a GitHub or
    Bitbucket URL (or maybe a foo-project.org or readthedocs URL, but the
    repo URL is usually what I'm most interested in).

    If we had the metadata with all the different kinds of URLs and the tools
    to show it and search it, then it would be clearer what to put where and
    would make it easier for consumers to find what they're looking for.

    Another thought I had while reading your email was the OpenHatch project
    and if there could be some tie-in with that.

    It also would be interesting if package maintainers had a channel to
    communicate with their user base. Back when I was at Yahoo, our proprietary
    package tool kept track of all installs of packages and stored the
    information in a centralized database. As a result, a package maintainer
    could see how many people had installed each version of their package and
    could send emails to folks who had installed a particular version or folks
    who had installed any version. A lot of folks used this to warn user bases
    about security issues, bugs, deprecations, etc. and to encourage folks to
    upgrade to newer versions and monitor the progress of such efforts.

    Links to e.g. cvedetails, lists, and RSS feeds would be super helpful.

    Links to e.g. IRC, Slack, Gitter would be super helpful.

    Where Links == {edges, predicates, new metadata properties}

    Links to downstream packages (and their RSS feeds) would also be helpful.

    • Debian has RDF (and also more structured link types that would be useful
      for project metadata)

      What URI should pypi:readme or warehouse:readme expand to?

      @prefix pypi: <https://pypi.python.org/pypi/> ;
      @prefix warehouse: <https://warehouse.python.org/project/> ;
      @prefix github: <https://github.com/> ;
      * pypi:json["info"]["name"]                ( + ".json" )
      * warehouse:json["info"]["name"]
      * github:json["info"]["name"]
      
      @prefix doap: <http://usefulinc.com/ns/doap#> ;
      * http://lov.okfn.org/dataset/lov/vocabs/doap
      
      @prefix schema: <http://schema.org/> ;
      
    • schema:SoftwareApplication -> https://schema.org/SoftwareApplication

    • schema:Code -> https://schema.org/Code

    • schema:Project -> TODO (new framework for extension vocabularies)

      Should/could there be a pypa: namespace?

      @prefix pypa: <https://pypa.github.io/ns/pypa/#> ;
      

    working thoughts:

    am working on adding Schema.org RDFa metadata to project detail pages
    for the next-gen PyPI (http://warehouse.python.org) [1]

    There are structured fields for Python Packaging metadata [2][3] and
    there are tables in warehouse [4].

    Challenges:
    (a) Mapping Author / Maintainer to schema.org properties:
    (author/creator, editor, contributor, accountablePerson)
    (publisher, sourceOrganization may not be feasible)

    (b) Picking a canonical [URI] for a Package:

       warehouse.python.org/project/<name>
       warehouse.python.org/project/<version>
       pypi.python.org/pypi/<name>
       pypi.python.org/pypi/<name>/<version>
       release.homepage
       release.project_url
    

    (c) Expressing softwareVersion[s]

    • How to express the project <--- release[.version] relation? [5]

    (d) documentationUrl, bugtrackerUrl, [repositoryUrl]

    • I think it could be helpful to amend SoftwareApplication with
      these properties.
    • There is not yet an analogue of repositoryUrl in Python
      Packaging Metadata.
      • DOAP Project and Versions [6]
        • DOAP schema includes typed *Repository properties [5]
          • {CVS, SVN, Darcs, Bk, Git, Hg; BazaarBranch }

    (e) Should there be a SoftwareRelease?

  • mozillascience/code-research-object#15 "(JSON-LD) Metadata for software discovery"

[EDIT] ~fulltext cc here, emphasis added, markdown
[EDIT] warehouse pkg detail template is now at https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html
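To make the prefix discussion above concrete, here is a minimal sketch of what a JSON-LD @context mapping a few packaging-metadata fields onto schema.org and DOAP terms might look like. Every property choice here is an assumption for illustration (including using doap:repository for the missing repository-URL analogue); none of it is part of an accepted spec.

```python
import json

# Candidate @context: short metadata keys expand to schema.org / DOAP IRIs.
context = {
    "@context": {
        "schema": "http://schema.org/",
        "doap": "http://usefulinc.com/ns/doap#",
        "name": "schema:name",
        "version": "schema:softwareVersion",
        "home_page": {"@id": "schema:url", "@type": "@id"},
        "repository": {"@id": "doap:repository", "@type": "@id"},
    }
}

# Plain metadata keys become linked-data properties once the context applies.
metadata = {
    "@type": "schema:SoftwareApplication",
    "name": "pip",
    "version": "7.1.2",
    "home_page": "https://pip.pypa.io/",
    "repository": "https://github.com/pypa/pip",
}

doc = {**context, **metadata}
print(json.dumps(doc, indent=2))
```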


@westurner

westurner commented Sep 11, 2015

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

Do they have URNs that could be the object of a (pypi:projectname, ex:, urn:x-tagvault:xyz) triple?

Here is the XSD schema for "[ISO/IEC 19770-2:2009 Software Identification Tag Standard]" from http://tagvault.org/standards/swid_tagstandard/:

AFAIU, there is not yet support for ISO/IEC 19770-2:2009 "Software Identification (SWID) Tag Standard" tags in schema.org (e.g. schema.org/SoftwareApplication).

  • A. add a schema.org/swid property to schema.org/SoftwareApplication
  • B. create an extension vocabulary (in RDFa), generate the TTL and JSON-LD context, and host those:
    • https://schema.org/MedicalCode shows how to create a flexible reified edge (one that can be used with many codingSystem values).
    • This could then be defined in a JSON-LD context and referenced as properties ("predicates") from package metadata RDF (as represented in JSON-LD)
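A hedged sketch of option B, mirroring the schema.org/MedicalCode pattern: a reified edge linking a package to a SWID tag identifier. The type and property names (pypa:SoftwareCode, pypa:codeValue, pypa:codingSystem) are invented here for illustration and are not defined anywhere; the urn:x-tagvault:xyz URN is the placeholder from the question above.

```python
import json

# Reified edge: instead of a bare string, the code reference is a node
# that records both the identifier and the coding system it belongs to,
# so the same pattern works for SWID, CPE, etc.
edge = {
    "@context": {"pypa": "https://pypa.github.io/ns/pypa/#"},
    "@id": "urn:x-pythonpkg:pip",
    "pypa:softwareCode": {
        "@type": "pypa:SoftwareCode",
        "pypa:codeValue": "urn:x-tagvault:xyz",  # placeholder SWID URN
        "pypa:codingSystem": "ISO/IEC 19770-2:2009",
    },
}
print(json.dumps(edge, indent=2))
```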

@westurner

westurner commented Sep 11, 2015

  • B. create an extension vocabulary (in RDFa), generate the TTL and JSON-LD context, and host those:

Possible prefix URIs (these don't have to resolve as dereferenceable URLs; they are URIs), but it's helpful if there is an HTML(+RDFa) representation there, for reference, which links to the source vocabs:

Docs on creating schema.org extension vocabulary for [Python] packages:

[EDIT] Links
[EDIT] schema.org 2.1 -> 2.2 links


@qwcode qwcode added the PEP426 label Sep 13, 2015

@qwcode qwcode added the Metadata label Nov 3, 2015

@westurner

westurner commented Nov 10, 2015

#PEP426JSONLD


@sigmavirus24

Member

sigmavirus24 commented Nov 10, 2015

@westurner #WhatDoHashTagsMean?

@westurner

westurner commented Nov 23, 2015

Should there be / would it be useful to have:

[
 {'distro': '...'},
 {'distro': 'Ubuntu',
  'pkgname': 'python-pip',
  'url': 'http://packages.ubuntu.com/trusty/python-pip',
  # ... may also be present in e.g. downstream DOAP RDF records
  'maintainers': [{
    'name': 'Ubuntu MOTU Developers',
    'url': 'http://lists.ubuntu.com/archives/ubuntu-motu/',
    'emailAddress': 'ubuntu-motu@lists.ubuntu.com',
  }],
 },
]

@westurner

westurner commented Nov 23, 2015

... So, in Linked Data terminology, the package URN URI (urn:x-pythonpkg:pip) is resolved to a dereferenceable URL at install time, given the distutils/setuptools/pip configuration (~index_servers and find-links)

@ncoghlan

Member

ncoghlan commented Nov 24, 2015

For the distro metadata question, that's the main reason the draft metadata 2.0 proposal moves project details out to a metadata extension: https://www.python.org/dev/peps/pep-0459/#the-python-project-extension

Having the project metadata in an extension means it is then trivial to re-use the same format for redistributor metadata: https://www.python.org/dev/peps/pep-0459/#the-python-integrator-extension

For the pkgname to URI question: what practical problem will that solve for Python developers? What will they be able to do if metadata 2.0 defines that mapping that they won't be able to do if we don't define it?

@westurner

westurner commented Nov 24, 2015

For the distro metadata question, that's the main reason the draft metadata 2.0 proposal moves project container details out to a metadata extension: https://www.python.org/dev/peps/pep-0459/#the-python-project-extension

Got it, thanks; I hadn't been aware of this draft spec.

Having the project metadata in an extension means it is then trivial to re-use the same format for redistributor metadata: https://www.python.org/dev/peps/pep-0459/#the-python-integrator-extension

For the pkgname to URI question: what practical problem will that solve for Python developers? What will they be able to do if metadata 2.0 defines that mapping that they won't be able to do if we don't define it?

Linked Data names things with namespaced URIs for many of the same reasons that Python uses namespaces.

  • Build a graph of package metadata that more completely describes the actual installation / build requirements for a given package
  • JOIN with other sources of metadata using a canonical [URI] key
  • A procedure for resolving / expanding (with context), such that all of the package specifiers in the following pip requirements file describe the same package resource (given the state of index_servers, pip configuration, PyPI):
pip
pip==7.1.2
https://pypi.python.org/packages/source/p/pip/pip-7.1.2.tar.gz#md5=3823d2343d9f3aaab21cf9c917710196
https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl#md5=5ff9fec0be479e4e36df467556deed4d


-e git+https://github.com/pypa/pip#egg=pip
-e git+ssh://git@github.com/pypa/pip#egg=pip
-e git+ssh://git@github.com/pypa/pip@7.1.2#egg=pip

Practical utility of this:

  • Do I already have the metadata for this package?
  • Do I already have the metadata for this [installed] package in my journaled, append-only, JSON-LD log of (system/VIRTUAL_ENV) pip operations?
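The normalization step for the requirements file above can be sketched as follows: every specifier form (bare name, pinned version, sdist/wheel URL, VCS editable) keys to one canonical URN. The urn:x-pythonpkg: scheme is this thread's running example, not a registered namespace, and the parsing here is a deliberately naive illustration, not a PEP 508 parser.

```python
import re

# Naive sketch: map each pip specifier form to urn:x-pythonpkg:<name>.
def canonical_urn(specifier):
    s = specifier.strip()
    if s.startswith("-e "):              # editable installs
        s = s[3:]
    m = re.search(r"#egg=([A-Za-z0-9._-]+)", s)
    if m:                                # VCS URLs carry the name in #egg=
        name = m.group(1)
    elif "://" in s:                     # sdist / wheel download URLs
        filename = s.split("/")[-1].split("#")[0]
        name = re.split(r"-\d", filename, maxsplit=1)[0]
    else:                                # "pip" or "pip==7.1.2" etc.
        name = re.split(r"[=<>!~\[;]", s, maxsplit=1)[0]
    return "urn:x-pythonpkg:" + name.strip().lower()
```

With a mapping like this, the metadata for all six specifier lines above would JOIN on the same key.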

@westurner

This comment has been minimized.

Show comment
Hide comment
@westurner

westurner Nov 24, 2015

If, in the future, I want to store checksums for each and every file in a package (so that they can be later reviewed), what do I key that auxiliary document to? Should I be able to just ingest 1+ JSON-LD documents into an [in-memory, ..., RDF] graph datastore?

This is a graph of packages which happened to have fit a given set of constraints on a given date and time, with a given index_servers, pip configuration... At present, pip.log and pip freeze are not sufficient to recreate / reproduce / CRC a given environment.

What I would like is:

  • (pkgname, version, install_date, installed_from_URI, installed_for=[])
  • (pkgname, version, filename, file checksum)

IIUC, currently, the suggested solution is "just rebuild [in a venv [in a Docker container named 'distro']] and re-run the comprehensive test suite".
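A minimal sketch of the second tuple above, i.e. the auxiliary per-file record keyed to (pkgname, version): SHA-256 checksums of every file under an installed package's tree, serializable as JSON that could later be given a JSON-LD @context. The field names are assumptions chosen to match the tuples above, not a spec.

```python
import hashlib
import os

# Walk an installed package's directory and record one checksum row per file.
def package_file_checksums(pkgname, version, root):
    records = []
    for dirpath, _, filenames in os.walk(root):
        for fn in sorted(filenames):
            path = os.path.join(dirpath, fn)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            records.append({
                "pkgname": pkgname,
                "version": version,
                "filename": os.path.relpath(path, root),
                "sha256": digest,
            })
    return records
```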


@ncoghlan

Member

ncoghlan commented Nov 24, 2015

The currently suggested solution for cryptographic assurance of repeated installations is to use peep to capture the hash of the Python components in the requirements.txt file: https://pypi.python.org/pypi/peep

If you want full traceability, then Nix is a better fit than any other current packaging system: http://nixos.org/nix/about.html

Offering these kinds of capabilities by default isn't a current design goal for the upstream Python ecosystem, since they can already be added by the folks that need them, and providing them by default doesn't help lower barriers to entry for new users.

@dstufft

Member

dstufft commented Nov 24, 2015

FWIW pip 8.0 will include peep’s functionality built into pip (though it is opt in by adding hashes to your requirements file).

@westurner

westurner commented Dec 1, 2015

FWIW pip 8.0 will include peep’s functionality built into pip (though it is opt in by adding hashes to your requirements file).

Should this also be defined in "PEP 508 -- Dependency specification for Python Software Packages" (https://www.python.org/dev/peps/pep-0508/)? ... 👍


@westurner

westurner commented Dec 1, 2015

Do I already have the metadata for this [installed] package in my journaled, append-only, JSON-LD log of (system/VIRTUAL_ENV) pip operations?

{
  "@graph": {
    "actions": [
      {"@type": "InstallAction",
       "command": "pip install -U pip",
       "description": "log message",
       "packages": [
         {"name": "pip", "version": "7.1.2", "versionwas": "7.1.0",
          "versionspec_constraint": ">=7.0.0",
          # ... pypi/pip/json metadata ...
         }
       ]}
    ]
  }
}

Then indexing on actions[*]["packages"][*][("name", "version"[, PEP 508 specifier])] would get the current snapshot off the top of the journaled history of the env (according to [pip, ])
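That indexing step can be sketched as a replay over the journal: walk the InstallAction entries newest-first and keep the first (name -> version) seen per package, which yields the env's current snapshot. The journal shape follows the example above; none of the field names come from a spec.

```python
# Derive the current env snapshot from a journaled, append-only action log.
def snapshot(journal):
    current = {}
    for action in reversed(journal["@graph"]["actions"]):
        for pkg in action.get("packages", []):
            # newest action wins; versions from older actions are shadowed
            current.setdefault(pkg["name"], pkg["version"])
    return current
```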


@westurner

westurner commented Dec 2, 2015

A JSON-LD journal of package Actions [and inlined-metadata.json] would be an improvement over (PEP376 .dist-info directories) and (pip-log.txt, pip.log) because:

  • It would then be possible to differentiate between pip environment changes and system package environment changes (as compared with the outputs from pip freeze or pip-ls )
    • Each VIRTUAL_ENV would then have something like a pip-log.jsonld JSONLD w/ an inlined @context

https://github.com/pypa/interoperability-peps/blob/master/pep-0376-installation-db.rst
https://www.python.org/dev/peps/pep-0376/

pip log


@westurner

westurner commented Jul 17, 2016

A JSON-LD context for the current JSON would need an "index map" to skip over the version keys; but in JSON-LD 2.0, we would need the ability to not skip the key but apply it to each nested record.

... https://github.com/json-ld/tests
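For illustration, an "index map" in JSON-LD terms is a term with "@container": "@index" in the context, under which the object's keys (here, version strings like those in PyPI's /pypi/&lt;name&gt;/json "releases" object) are treated as data-free indexes rather than properties. The context and the schema.org property choice below are assumptions for the sketch.

```python
import json

# Sketch of an index-mapped "releases" term: the version-string keys are
# indexes, not predicates, so a JSON-LD processor skips over them.
context = {
    "@context": {
        "schema": "http://schema.org/",
        "releases": {"@id": "schema:softwareVersion", "@container": "@index"},
    }
}
doc = {**context,
       "releases": {
           "7.1.0": [{"filename": "pip-7.1.0.tar.gz"}],
           "7.1.2": [{"filename": "pip-7.1.2.tar.gz"}],
       }}
print(json.dumps(doc, indent=2))
```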


@westurner

westurner commented Jul 17, 2016

This discussion indicates that there may be a need to add reified edges for packages which, according to maintainers and/or index maintainers, supersede existing packages (e.g. PIL -> Pillow)
