Add draft conventions documentation for AboutCode Data (i.e. ABCD) #2

Merged
merged 2 commits into from
Mar 7, 2017

pombredanne
Member

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member Author

@tdruez @jdaguil @chinyeungli @mjherzog @DennisClark @MaJuRG @johnmhoran @sschuberth @JonoYang @pkunz @nspsjsu and others: This is a draft and feedback is welcome.

technology-specific.

Recently there have been efforts to collect and expose more data such
as:

Maybe add https://dependencyci.com/ to the list?

Member Author

done

dependencies. It only provides information (metadata) about the code.

The vision for the ABC Data structure is to provide a common way to
exchange data about code between all nexB  tools, such that these tools

Nit: Superfluous space in "nexB tools".

Member Author

done

underscores. Names cannot start with a number. Names cannot contain
spaces nor other punctuation, not even a dot or period.

- Names are NOT case sensitive:  upper or lowercase does not matter and

Nit: Superfluous space in ": upper".

Member Author

done

@sschuberth

I haven't read it fully yet, but looks good so far 👍

I'd be eager to implement some sort of "meta-dependency-manager" that is able to query various native dependency managers like Maven, npm / yarn, pip etc. to output the dependency tree in a common format, namely, ABCD, instead of each using their own data format.

This intermediate representation of the dependency tree could then be passed to another tool (a new one, or an extended version of extractcode?) to download and extract the source code of the dependencies, which then finally get scanned by scancode (or any other scanner) to get the full picture of what really goes into a specific piece of software.

@pombredanne
Member Author

@sschuberth you wrote:

I'd be eager to implement some sort of "meta-dependency-manager" that is able to query various native dependency managers like Maven, npm / yarn, pip etc. to output the dependency tree in a common format, namely, ABCD, instead of each using their own data format.

That would be awesomely great, and I have thought about it too! The difficulty is that each package manager may implement some arcane and specific way to resolve a version of a package. So the only correct way to do this is to rely on the dependency resolution algorithms of each package manager directly, meaning eventually running code.

On the ABCD side, the thing is to capture:

  1. the tree of potential deps with the constraints structured in a normalized fashion
  2. the tree of resolved deps as computed by the package manager also structured in a normalized fashion (think of something like a common denominator lockfile like a Gemfile.lock and similar)
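
As a rough illustration of the two trees listed above, here is a minimal Python sketch. The class names, field names and the `type:name@version` string are hypothetical and are not part of any ABCD draft; the MarkupSafe entry is only sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeclaredDependency:
    """A potential dependency as declared in a manifest (hypothetical shape)."""
    type: str                      # package type, e.g. "maven", "npm", "pypi"
    name: str
    constraint: str                # version constraint, kept as written in the manifest
    scope: Optional[str] = None    # e.g. a Maven scope or a Gradle configuration
    dependencies: List["DeclaredDependency"] = field(default_factory=list)


@dataclass
class ResolvedDependency:
    """A dependency pinned by the package manager, as in a lockfile."""
    type: str
    name: str
    version: str                   # exact version picked by the native resolver
    dependencies: List["ResolvedDependency"] = field(default_factory=list)


def flatten(dep: ResolvedDependency) -> List[str]:
    """Return "type:name@version" strings for a resolved tree, depth first."""
    result = [f"{dep.type}:{dep.name}@{dep.version}"]
    for child in dep.dependencies:
        result.extend(flatten(child))
    return result


# Sample data only: a declared constraint and one possible resolution of it.
declared = DeclaredDependency(type="pypi", name="jinja2", constraint=">=2.9", scope="runtime")
resolved = ResolvedDependency(
    type="pypi", name="jinja2", version="2.9.5",
    dependencies=[ResolvedDependency(type="pypi", name="markupsafe", version="0.23")],
)
print(flatten(resolved))
```

Keeping the declared and resolved trees as separate types makes it harder to confuse a constraint with a pinned version, which is the distinction the two list items draw.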

On the tooling side, the things needed would be a way to either compute resolved deps or run package managers to get them, eventually using a container that would have all the required package managers installed. Possibly adding/creating some package manager plugins (e.g. Maven or similar) to help where needed. Or have parsers (like in packagedcode) for things that are already resolved (e.g. lockfiles).
Note that some tools like LicenseFinder, libraries.io and versioneye implement some of this and could maybe be readily reused or integrated.

And re:

This intermediate representation of the dependency tree could then be passed to another tool (a new one, or an extended version of extractcode?) to download and extract the source code of the dependencies, which then finally get scanned by scancode (or any other scanner) to get the full picture of what really goes into a specific piece of software.

I would see two things there:

  1. something I could call fetchcode that knows how to handle fetching packages from a package repo given some essential metadata (e.g. in most cases a type, name, and resolved version and/or VCS pointer). I already have a bunch of code that deals with some of it and that I should push soon.
  2. and then inject that in a scan pipeline. On the extractcode side, with "Transparent extraction of archives" (scancode-toolkit#14) this could become a non-issue. Note that we have been accepted as a mentoring org for GSoC 2017 and there are some students showing interest in implementing the transparent archive extraction in ScanCode.

@pombredanne
Member Author

Actually let's start a repo and project for this: https://github.com/nexB/dependentcode :)

Thank you to @sschuberth for the review

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@sschuberth

So the only correct way to do this is to rely on the dependency resolution algorithms of each package manager directly, meaning eventually running code.

Correct, and that matches what we already discussed by mail once. I'd just clarify that "running code" could mean "running the native dependency manager executable" (and capturing / parsing its output).
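
To make "running the native dependency manager executable" concrete, here is a minimal sketch that uses `pip freeze` as a stand-in. The function names are hypothetical, and `pip freeze` only reports a flat list of installed pins rather than a tree, so this shows the capture-and-parse idea and not a full resolver integration.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse "name==version" lines; skip comments, editable installs and other lines."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def resolved_with_pip(command: Optional[List[str]] = None) -> Dict[str, str]:
    """Run the native tool and parse its output (the default command is an assumption)."""
    command = command or [sys.executable, "-m", "pip", "freeze"]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_freeze(completed.stdout)
```

Calling `resolved_with_pip()` in an environment where pip is available would return a mapping of lowercased package names to pinned versions. Other dependency managers would need their own command and parser.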

the tree of potential deps with the constraints

If by "constraints" you mean e.g. Maven scope, or Gradle configurations, I completely agree.

For our use case, we'd also need to capture the download / clone URL for the dependency's source code. In case of an SCM, maybe we could re-use Maven's SCM URL format even for non-Maven projects, simply to standardize on something.

Actually let's start a repo and project for this: https://github.com/nexB/dependentcode :)

Hold yer horses! ;-) As you know, we already have this implemented internally for several major dependency managers, and I'm looking into going over the code to make it publicly available.

@pombredanne
Member Author

pombredanne commented Feb 28, 2017

@sschuberth

maybe we could re-use Maven's SCM URL format even for non-Maven project, simply to standardize on something.

Actually this part has been spec'ed by yours truly in SPDX, in section 3.7 "Package Download Location", and is loosely modelled after pip rather than Maven.

Hold yer horses

😆 I need some place to spec/design this in any case... so there is no conflict.

@pombredanne
Member Author

FWIW, the advantage of using a pip-like URI/URL scheme is that it comes with a readily available and well-tested implementation used for millions of downloads every day ;)

@sschuberth

Well, the latter basically is also the case for Maven SCM (and the accompanying plugin), but I have no strong opinion about pip vs mvn as long as we try to avoid reinventing the wheel.

@pombredanne
Member Author

as long as we try to avoid reinventing the wheel.

Agreed. Eventually converting from one scheme to the other should be easy-peasy and supported here.

@sschuberth

Sorry for getting a bit off topic, but speaking of the pip SCM URL scheme, would you happen to know how to get from a package name and version, like for example Jinja2 2.9.5, to the SCM URL to get the source code? I could not find any meta-data on the PyPI page pointing to https://github.com/pallets/jinja.git. So, where is pip using the SCM URL scheme you've mentioned?

@pombredanne
Member Author

pombredanne commented Mar 1, 2017

how to get from a package name and version, like for example Jinja2 2.9.5, to the SCM URL to get the source code

You cannot get this consistently unfortunately: it may be in homepage url field or elsewhere.

The way it is used in pip is when you call this:
pip install -e git+https://github.com/pallets/jinja.git@09f8b2b2d1cfca7e5b231cf3773bef2a952b6312#egg=jinja
or
pip install -e git+https://github.com/pallets/jinja.git@2.9.5#egg=jinja2
where the @ part would be any "commitish".

It could also be in a requirements file.

The part we care for in SPDX and here would be: git+https://github.com/pallets/jinja.git@09f8b2b2d1cfca7e5b231cf3773bef2a952b6312
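
For illustration, here is a minimal sketch of splitting such a pip-style VCS URL into its parts. `VcsUrl` and `parse_vcs_url` are hypothetical names, and this is not how pip or SPDX tooling implements it; it only handles the `vcs+transport://...@commitish#egg=name` shape shown above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class VcsUrl(NamedTuple):
    vcs: str                 # e.g. "git"
    repo_url: str            # transport URL without the "git+" prefix
    revision: Optional[str]  # the "commitish" after "@", if any
    egg: Optional[str]       # name from the "#egg=" fragment, if any


def parse_vcs_url(url: str) -> VcsUrl:
    """Split a "vcs+scheme://host/path@commitish#egg=name" URL into its parts."""
    vcs, sep, rest = url.partition("+")
    if not sep or not vcs.isalpha():
        raise ValueError(f"not a VCS-prefixed URL: {url!r}")
    parts = urlsplit(rest)
    # The commitish is whatever follows the last "@" in the path, if present.
    path, at, revision = parts.path.rpartition("@")
    if not at:
        path, revision = parts.path, ""
    egg = None
    if parts.fragment.startswith("egg="):
        egg = parts.fragment[len("egg="):]
    repo_url = f"{parts.scheme}://{parts.netloc}{path}"
    return VcsUrl(vcs, repo_url, revision or None, egg)


parsed = parse_vcs_url(
    "git+https://github.com/pallets/jinja.git@09f8b2b2d1cfca7e5b231cf3773bef2a952b6312#egg=jinja"
)
print(parsed.repo_url, parsed.revision)
```

The `egg` fragment is kept separate because, as noted above, only the part up to the commitish matters for the download location.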

@sschuberth

You cannot get this consistently unfortunately: it may be in homepage url field or elsewhere.

Ugh. Or nowhere, it seems. In the example of Jinja2 I was only able to get to the source code location by manually navigating to the homepage, and from there clicking on the GitHub banner.

If you ever want to work on PEP stuff again, try establishing a meta-data field for the SCM URL ;-)

@pombredanne
Member Author

Yep... the only package manifests OTH that are clear wrt always having a VCS location are Go deps/vendor/workspace (many variants) and Packagist for PHP, where the fetching is primarily or only done from VCS directly... :|

@sschuberth

Also NPM almost always fetches directly from SCM / GitHub, and Maven POMs also often contain the scm tag. So it's not that bad.

@pombredanne
Member Author

Also NPM almost always fetches directly from SCM / GitHub

This has not been my experience with npm fetches, but the VCS info is often readily there.

And Maven POMs contain the scm tag, but only sometimes.

@pombredanne
Member Author

Merging this. The conversation can continue in tickets or elsewhere. Thanks for the feedback.

@pombredanne pombredanne merged commit 9dac746 into master Mar 7, 2017
@pombredanne pombredanne deleted the aboutcode-data branch March 7, 2017 11:14
DennisClark pushed a commit that referenced this pull request Jun 11, 2019
AyanSinhaMahapatra pushed a commit that referenced this pull request Mar 30, 2022
* Add PEP 517/518 pyproject.toml file
* Add setuptools_scm to handle versioning
* Add setup.py content to setup.cfg
* Update setup.py to act as a shim (so pip install -e works)

Addresses: #2

Signed-off-by: Steven Esser <sesser@nexb.com>