Add draft conventions documentation for AboutCode Data (i.e. ABCD) #2

pombredanne · 2017-02-27T22:58:39Z

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne · 2017-02-27T22:59:59Z

@tdruez @jdaguil @chinyeungli @mjherzog @DennisClark @MaJuRG @johnmhoran @sschuberth @JonoYang @pkunz @nspsjsu and others: This is a draft and feedback is welcome.

sschuberth · 2017-02-28T09:07:07Z

aboutcode-data/README.rst

+technology-specific.
+
+Recently there have been efforts to collect and expose more data such
+as:


Maybe add https://dependencyci.com/ to the list?

sschuberth · 2017-02-28T09:09:12Z

aboutcode-data/README.rst

+dependencies. It only provides information (metadata) about the code.
+
+The vision for the ABC Data structure is to provide a common way to
+exchange data about code between all nexB  tools, such that these tools


Nit: Superfluous (double) space in "nexB  tools".

sschuberth · 2017-02-28T09:11:18Z

aboutcode-data/README.rst

+   underscores. Names cannot start with a number. Names cannot contain
+   spaces nor other punctuation, not even a dot or period.
+
+-  Names are NOT case sensitive:  upper or lowercase does not matter and


Nit: Superfluous (double) space in ":  upper".

sschuberth · 2017-02-28T09:24:29Z

I haven't read it fully yet, but looks good so far 👍

I'd be eager to implement some sort of "meta-dependency-manager" that is able to query various native dependency managers like Maven, npm / yarn, pip etc. to output the dependency tree in a common format, namely, ABCD, instead of each using their own data format.

This intermediate representation of the dependency tree could then be passed to another tool (a new one, or an extended version of extractcode?) to download and extract the source code of the dependencies, which would then finally get scanned by scancode (or any other scanner) to get the full picture of what really goes into a specific piece of software.

pombredanne · 2017-02-28T10:07:49Z

@sschuberth you wrote:

I'd be eager to implement some sort of "meta-dependency-manager" that is able to query various native dependency managers like Maven, npm / yarn, pip etc. to output the dependency tree in a common format, namely, ABCD, instead of each using their own data format.

That would be awesomely great! I have thought about it too. Ultimately, the difficulty is that each package manager may implement its own arcane and specific way to resolve a version of a package... So the only correct way to do this is to rely on the dependency resolution algorithms of each package manager directly, meaning eventually running code.

On the ABCD side, the thing is to capture:

- the tree of potential deps with the constraints, structured in a normalized fashion
- the tree of resolved deps as computed by the package manager, also structured in a normalized fashion (think of something like a common-denominator lockfile, such as a Gemfile.lock and similar)
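To make the two trees above a bit more concrete, here is a minimal Python sketch of how declared (constrained) and resolved dependencies could be modelled side by side. The class, field and function names (`DeclaredDependency`, `ResolvedDependency`, `requirement`, `scope`, `flatten`) are hypothetical and are not an ABCD schema; `scope` is only a placeholder for things like a Maven scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeclaredDependency:
    """A dependency as declared in a manifest, before resolution (hypothetical)."""
    type: str                     # package type, e.g. "npm", "maven", "pypi"
    name: str
    requirement: str              # raw constraint, e.g. "^2.9" or ">=2.9,<3"
    scope: Optional[str] = None   # e.g. a Maven scope or Gradle configuration
    children: List["DeclaredDependency"] = field(default_factory=list)


@dataclass
class ResolvedDependency:
    """A dependency pinned by the package manager, lockfile style (hypothetical)."""
    type: str
    name: str
    version: str                         # exact version chosen by the resolver
    download_url: Optional[str] = None   # e.g. an SPDX-style download location
    children: List["ResolvedDependency"] = field(default_factory=list)


def flatten(dep: ResolvedDependency) -> List[str]:
    """Return every node of a resolved tree as "type/name@version", depth first."""
    items = [f"{dep.type}/{dep.name}@{dep.version}"]
    for child in dep.children:
        items.extend(flatten(child))
    return items
```

The resolved tree plays the role of the common-denominator lockfile: flattening it yields one pinned `type/name@version` entry per node, while the declared tree keeps the original constraints for reference.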

On the tooling side, what is needed is a way to either compute resolved deps or run the package managers to get them, possibly using a container that has all the required package managers installed, and possibly adding or creating some package manager plugins (e.g. for Maven or similar) where needed. Or have parsers (like in packagedcode) for things that are already resolved (e.g. lockfiles).
Note that some tools like LicenseFinder, libraries.io and VersionEye implement some of this and could maybe be readily reused or integrated.

And re:

This intermediate representation of the dependency tree could then be passed to another tool (a new one, or an extended version of extractcode?) to download and extract the source code of the dependencies, which would then finally get scanned by scancode (or any other scanner) to get the full picture of what really goes into a specific piece of software.

I would see two things there:

- something I could call fetchcode that knows how to fetch packages from a package repo given some essential metadata (e.g. in most cases a type, name, and resolved version and/or VCS pointer). I already have a bunch of code that deals with some of this and that I should push soon.
- and then injecting that into a scan pipeline. On the extractcode side, with "Transparent extraction of archives" (scancode-toolkit#14) this could become a non-issue. Note that we have been accepted as a mentoring org for GSoC 2017 and some students are showing interest in implementing the transparent archive extraction in ScanCode.
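The fetchcode idea, fetching a package given only a type, name and resolved version, boils down to mapping those few fields to a repository download URL. In this minimal sketch the function name `download_url` is hypothetical; the URL layouts are the public conventions of the npm registry (package tarballs, including scoped names) and Maven Central (`-sources.jar`, which not every artifact publishes). Other package types are deliberately left unsupported.

```python
def download_url(pkg_type: str, name: str, version: str) -> str:
    """Return a download URL for a package (hypothetical fetchcode helper)."""
    if pkg_type == "npm":
        # Scoped names look like "@scope/pkg"; the tarball file name drops the scope.
        base = name.split("/", 1)[1] if name.startswith("@") else name
        return f"https://registry.npmjs.org/{name}/-/{base}-{version}.tgz"
    if pkg_type == "maven":
        # Maven coordinates are "groupId:artifactId"; dots in groupId become path segments.
        group_id, artifact_id = name.split(":", 1)
        group_path = group_id.replace(".", "/")
        return (f"https://repo1.maven.org/maven2/{group_path}/{artifact_id}/"
                f"{version}/{artifact_id}-{version}-sources.jar")
    raise ValueError(f"unsupported package type: {pkg_type}")
```

Keeping the mapping a pure function means the actual fetching (HTTP or VCS clone) can be a separate step that feeds the scan pipeline.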

pombredanne · 2017-02-28T10:14:29Z

Actually let's start a repo and project for this: https://github.com/nexB/dependentcode :)

@sschuberth


sschuberth · 2017-02-28T10:24:58Z

So the only correct way to do this is to rely on the dependency resolution algorithms of each package manager directly, meaning eventually running code.

Correct, and that matches what we already discussed by mail once. I'd just clarify that "running code" could mean "running the native dependency manager executable" (and capturing / parsing its output).
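Running the native tool and parsing what it prints could look like this minimal sketch for npm. It assumes the JSON printed by `npm ls --json`, where each package is an entry under a nested `dependencies` object keyed by name and carrying a `version`; `run_npm_ls` and `walk_npm_tree` are hypothetical helper names. The parser works on captured text, so it can be exercised without npm installed.

```python
import json
import subprocess
from typing import Iterator, Tuple


def run_npm_ls(project_dir: str) -> str:
    """Run the native npm resolver and capture its JSON dependency tree."""
    # npm may exit non-zero on problems such as missing packages yet still
    # print JSON, so the return code is deliberately not checked here.
    result = subprocess.run(["npm", "ls", "--json"], cwd=project_dir,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def walk_npm_tree(npm_ls_json: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, name, version) for every resolved package, depth first."""
    def visit(deps: dict, depth: int) -> Iterator[Tuple[int, str, str]]:
        for name, info in sorted(deps.items()):
            yield depth, name, info.get("version", "")
            yield from visit(info.get("dependencies", {}), depth + 1)

    root = json.loads(npm_ls_json)
    yield from visit(root.get("dependencies", {}), 1)
```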

the tree of potential deps with the constraints

If by "constraints" you mean e.g. Maven scope, or Gradle configurations, I completely agree.

For our use case, we'd also need to capture the download / clone URL for the dependency's source code. In the case of an SCM, maybe we could re-use Maven's SCM URL format even for non-Maven projects, simply to standardize on something.

Actually let's start a repo and project for this: https://github.com/nexB/dependentcode :)

Hold yer horses! ;-) As you know, we already have this implemented internally for several major dependency managers, and I'm looking into going over the code to make it publicly available.

pombredanne · 2017-02-28T11:11:26Z

@sschuberth

maybe we could re-use Maven's SCM URL format even for non-Maven projects, simply to standardize on something.

Actually, this part has been spec'ed by yours truly in SPDX, in section 3.7 "Package Download Location", and is loosely modelled after pip rather than Maven.

Hold yer horses

😆 I need some place to spec/design this in any case... so there is no conflict.

pombredanne · 2017-02-28T11:13:13Z

FWIW, the advantage of using a pip-like URI/URL scheme is that it comes with a readily available and well-tested implementation used for millions of downloads every day ;)

sschuberth · 2017-02-28T11:20:19Z

Well, the latter basically is also the case for Maven SCM (and the accompanying plugin), but I have no strong opinion about pip vs mvn as long as we try to avoid reinventing the wheel.

pombredanne · 2017-02-28T11:21:37Z

as long as we try to avoid reinventing the wheel.

Agreed. Ultimately, converting from one scheme to the other should be easy-peasy and supported here.
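A conversion in one direction could be as small as this sketch. It assumes Maven's documented SCM URL layout, `scm<d><provider><d><provider-specific part>` with `:` or `|` as the delimiter, and only handles the `git`, `hg` and `svn` providers whose provider-specific part is already a plain URL; `maven_scm_to_pip_url` is a hypothetical name.

```python
# SCM providers handled here, spelled the same way in Maven and in pip-style URLs.
PIP_VCS = {"git", "hg", "svn"}


def maven_scm_to_pip_url(scm_url: str) -> str:
    """Convert a Maven SCM URL into a pip-style "<vcs>+<url>" location (sketch)."""
    # Maven SCM URLs read "scm<d><provider><d><provider-specific part>", where the
    # delimiter <d> is ":" or "|", e.g. "scm:git:https://..." or "scm|svn|http://...".
    if len(scm_url) < 5 or not scm_url.startswith("scm") or scm_url[3] not in ":|":
        raise ValueError(f"not a Maven SCM URL: {scm_url!r}")
    delimiter = scm_url[3]
    provider, sep, location = scm_url[4:].partition(delimiter)
    if not sep or provider not in PIP_VCS:
        raise ValueError(f"unsupported SCM provider in {scm_url!r}")
    # Only plain URLs are handled; scp-like "git@host:path" forms are rejected.
    if "://" not in location:
        raise ValueError(f"provider-specific part is not a URL: {location!r}")
    return f"{provider}+{location}"
```

The reverse direction is a matter of swapping the `+` prefix for `scm:<provider>:`, so neither scheme needs to be reinvented.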

sschuberth · 2017-03-01T11:06:19Z

Sorry for going a bit off topic, but speaking of the pip SCM URL scheme, would you happen to know how to get from a package name and version, for example Jinja2 2.9.5, to the SCM URL to get the source code? I could not find any metadata on the PyPI page pointing to https://github.com/pallets/jinja.git. So, where is pip using the SCM URL scheme you've mentioned?

pombredanne · 2017-03-01T11:49:12Z

how to get from a package name and version, like for example Jinja2 2.9.5 to the SCM URL to get the source code

Unfortunately, you cannot get this consistently: it may be in the homepage URL field or elsewhere.

The way it is used in pip is when you call either of these:

    pip install -e git+https://github.com/pallets/jinja.git@09f8b2b2d1cfca7e5b231cf3773bef2a952b6312#egg=jinja
    pip install -e git+https://github.com/pallets/jinja.git@2.9.5#egg=jinja2

where the part after @ can be any "commitish".

It could also be in a requirements file.

The part we care about in SPDX and here would be: git+https://github.com/pallets/jinja.git@09f8b2b2d1cfca7e5b231cf3773bef2a952b6312
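Splitting such a location into its parts could look like this sketch. It mirrors the way pip reads these URLs (the revision is only looked for after the last `@` of the path, so `git@host` user info in the network location is left alone), but it is not pip's own code; `VcsLocation` and `parse_vcs_url` are hypothetical names.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit


class VcsLocation(NamedTuple):
    vcs: str                 # "git", "hg", "svn" or "bzr"
    repo_url: str            # plain URL of the repository
    revision: Optional[str]  # the "commitish" after "@", if any
    egg: Optional[str]       # the "#egg=" project name, if any


def parse_vcs_url(url: str) -> VcsLocation:
    """Split a pip/SPDX-style "<vcs>+<url>[@<rev>][#egg=<name>]" location."""
    vcs, plus, _ = url.partition("+")
    if not plus or vcs not in {"git", "hg", "svn", "bzr"}:
        raise ValueError(f"not a pip-style VCS URL: {url!r}")
    scheme, netloc, path, query, fragment = urlsplit(url[len(vcs) + 1:])
    revision = None
    # Only look for "@" in the path so "git@host" user info is not mistaken for a revision.
    if "@" in path:
        path, revision = path.rsplit("@", 1)
    egg = parse_qs(fragment).get("egg", [None])[0]
    return VcsLocation(vcs, urlunsplit((scheme, netloc, path, query, "")), revision, egg)
```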

sschuberth · 2017-03-01T11:52:55Z

Unfortunately, you cannot get this consistently: it may be in the homepage URL field or elsewhere.

Ugh. Or nowhere, it seems. For Jinja2, I was only able to get to the source code location by manually navigating to the homepage and, from there, clicking on the GitHub banner.

If you ever want to work on PEP stuff again, try establishing a meta-data field for the SCM URL ;-)
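Until such a field exists, a best-effort guess could scan the free-form URL fields for well-known code hosts. This sketch assumes the `info` dictionary returned by the PyPI JSON API, with its `home_page`, `download_url` and `project_urls` keys; `guess_repo_url` and the `CODE_HOSTS` set are hypothetical, and the result is a guess rather than authoritative metadata. When the homepage is a project site rather than a code host, it returns nothing, which matches the experience described above.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts whose project URLs point straight at a repository (an assumption, not exhaustive).
CODE_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def guess_repo_url(info: dict) -> Optional[str]:
    """Best-effort guess of a repository URL from PyPI "info" metadata."""
    candidates = [info.get("home_page"), info.get("download_url")]
    candidates.extend((info.get("project_urls") or {}).values())
    for url in candidates:
        if not url:
            continue
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        parts = [p for p in parsed.path.split("/") if p]
        # A repository URL needs at least an owner and a project name.
        if host in CODE_HOSTS and len(parts) >= 2:
            return f"https://{host}/{parts[0]}/{parts[1]}"
    return None
```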

pombredanne · 2017-03-01T11:56:04Z

Yep... on the other hand, the only package manifests that are clear about always having a VCS location are Go deps/vendor/workspace (many variants) and Packagist for PHP, where fetching is primarily or only done directly from VCS... :|

sschuberth · 2017-03-01T11:59:30Z

Also NPM almost always fetches directly from SCM / GitHub, and Maven POMs also often contain the scm tag. So it's not that bad.

pombredanne · 2017-03-01T12:10:01Z

Also NPM almost always fetches directly from SCM / GitHub

This has not been my experience with npm fetches, but the VCS info is often readily there.

And Maven POMs contain the scm tag, but only sometimes.

pombredanne · 2017-03-07T11:14:47Z

Merging this. The conversation can continue in tickets or elsewhere. Thanks for the feedback.

Temp

* Add PEP 517/518 pyproject.toml file
* Add setuptools_scm to handle versioning
* Add setup.py content to setup.cfg
* Update setup.py to act as a shim (so pip install -e works)

Addresses: #2

Signed-off-by: Steven Esser <sesser@nexb.com>

Add draft conventions documentation for AboutCode Data (i.e. ABCD)

53ffddc

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

sschuberth reviewed Feb 28, 2017

Fix typos

a84e207

Thank you to @sschuberth for the review Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

pombredanne merged commit 9dac746 into master Mar 7, 2017

pombredanne deleted the aboutcode-data branch March 7, 2017 11:14

pombredanne mentioned this pull request Jun 24, 2017

Instructions for Identifying Dependencies Unclear aboutcode-org/scancode-toolkit#631

Open

DennisClark pushed a commit that referenced this pull request Jun 11, 2019

Merge pull request #2 from TG1999/temp

fc3a62b

Temp
