Feat/compute dependency hashes #143

Merged (6 commits) on Jun 7, 2018

Conversation

xizhao
Contributor

@xizhao xizhao commented Jun 1, 2018

WIP, proposal to support using dependency hashes as an alternative resolution method for builds that heavily depend on vendorized dependencies.

@xizhao xizhao requested a review from elldritch June 1, 2018 22:39
package builderutil

import (
"crypto/md5"

G501: Blacklisted import crypto/md5: weak cryptographic primitive

}
hashes.SHA1 = string(sha1Hash.Sum(nil))

md5Hash := md5.New()

G401: Use of weak cryptographic primitive

}
hashes.SHA1 = hex.EncodeToString(sha1Hash.Sum(nil))

md5Hash := md5.New()

G401: Use of weak cryptographic primitive

package builderutil

import (
"crypto/md5" // #nosec

G501: Blacklisted import crypto/md5: weak cryptographic primitive

}
hashes.SHA1 = hex.EncodeToString(sha1Hash.Sum(nil))

md5Hash := md5.New() // #nosec

G401: Use of weak cryptographic primitive

@elldritch
Member

I'm not convinced that dependency hashing is the right way forward for detecting vendored dependencies. I'm especially not convinced that using standard MD5/etc. hashing is the correct way to implement dependency hashing. I suspect this implementation might be JAR-specific.

Here are two other examples where hashing might help us:

  1. Vendored Go dependencies. They're generally unmodified, but the vendor/ folder doesn't contain information about which revision of a package was used. Go has its own weird quirks about which parts within a repository are and are not considered part of a package, and each tool has its own idiosyncrasies on top of that. In this case, I think that we would want to do a tree hash of only *.go files.
  2. Vendored JavaScript dependencies. If somebody drops jquery.min.js into a folder, we want to detect that. The issue here is that jquery.js comes in many forms: source, minified, uglified, compressed, comments stripped, etc. We might be able to hash each form and then match many hashes to a single revision. Alternative strategies range from looking exclusively at file names (at one extreme) to doing AST parsing (at the other extreme).

I don't think it's clear that this is a good design for hashing dependencies in general, but I'm happy to LGTM for JARs specifically if that is the intention and you move this into the Ant builder package.

@xizhao
Contributor Author

xizhao commented Jun 5, 2018

We need a narrative to address a package that we truly don't have access to but that is considered third-party, like an old vendorized JAR from who knows where.

This doesn't need to be relied upon by all languages, but there are some environments where this is just common.

FOSSA should have a large variety of ways to resolve a dependency; a SHA hash is a good, generic way to represent "here's a bit of code". The CLI should just support this as part of the spec, but not necessarily implement it across everything if there are better methods available.

This seems to be particularly useful for Java and C#, see VersionEye's approach here:

https://blog.versioneye.com/tag/sha/

https://github.com/versioneye/veye-checker

An alternative method is to create a "hash" locator spec that resolves to a known artifact, but my hesitation is that hashes are not very semantic. It's hard to understand what a hash represents, which would render an unresolved hash locator useless to a user.
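To make the trade-off concrete, here is a rough sketch of what a "hash" locator lookup might look like. Everything in it is hypothetical: the `Locator` type, the `fetcher+package$revision` string form, the `hash` fetcher name, the in-memory index, and the `com.example:legacy-lib` coordinate are assumptions for illustration, not the actual FOSSA locator spec.

```go
package main

// Locator is a hypothetical dependency identifier.
type Locator struct {
	Fetcher  string
	Package  string
	Revision string
}

// String renders the assumed "fetcher+package$revision" form, omitting the
// revision when it is empty.
func (l Locator) String() string {
	if l.Revision == "" {
		return l.Fetcher + "+" + l.Package
	}
	return l.Fetcher + "+" + l.Package + "$" + l.Revision
}

// ResolveHash looks up a hex digest in an index of known artifacts. On a miss
// it falls back to an opaque hash locator and reports false.
func ResolveHash(index map[string]Locator, algo, digest string) (Locator, bool) {
	key := algo + ":" + digest
	if loc, ok := index[key]; ok {
		return loc, true
	}
	return Locator{Fetcher: "hash", Package: key}, false
}
```

A hit yields something readable such as `mvn+com.example:legacy-lib$1.0.0`, while a miss yields only `hash+sha1:<digest>`, which tells the user nothing about what the code is.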

Member

@elldritch elldritch left a comment

LGTM for a v0, but we'll have to revisit this.

package module

// Hashes contains hexadecimal checksums of code libraries brought in by running a Build
type Hashes struct {
Member

Why are we computing all of these? Isn't there just one hash that's relevant per dependency?

@xizhao xizhao merged commit c86fa51 into master Jun 7, 2018
@xizhao xizhao deleted the feat/compute-dependency-hashes branch June 7, 2018 02:19
meghfossa pushed a commit that referenced this pull request Nov 12, 2021
* Differentiate between locators with and without revisions
* Retrieve and store project scan filters