Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concerns about metrics from @ljharb #1826

Open
justaugustus opened this issue Apr 7, 2022 · 17 comments
Open

Concerns about metrics from @ljharb #1826

justaugustus opened this issue Apr 7, 2022 · 17 comments
Labels
kind/bug Something isn't working kind/new-check New check for scorecard needs discussion question Further information is requested
Projects

Comments

@justaugustus
Copy link
Member

From @ljharb in https://openssf.slack.com/archives/C0235AR8N2C/p1649348249786279:

hi! i have some concerns about some of the metrics used in the scorecard. for example:

  • "binary-artifacts" likely doesn't often apply to ecosystems that don't compile (JS, php, ruby, python)
  • more than one contributor is nice, but the existence of a lone contributor is usually because nobody else wants to help (that's why i'm the sole maintainer of over 10% of npm's traffic - not because i don't want help, but because nobody offers it). penalizing projects for something that's not in their control doesn't seem like it'll help security.
  • similarly, on a 1-maintainer project, it shouldn't matter if changes are landed via a PR or not, because there's nobody to do code review.
  • Pinned-Dependencies: the best practice for an npm project is to never pin dependencies, and to always use a ^ semver range. apps should always pin deps with a lockfile, and thus pinning deps in package.json provides no value (because transitive deps exist)
  • does SAST take into account eslint usage? a linter is a static analysis tool just like codeql, depending on how it's configured.
  • publishing from CI is not something i'd call a best practice, at least in the npm ecosystem - it's impossible to do it with 2FA, since using an authentication token is just a single factor.
  • Signed-Releases: what's the benefit of signing releases in an ecosystem where the majority of users (or their tooling) don't verify them by default? Measuring whether releases are signed, before making that actually a beneficial action, seems like putting the cart before the horse.
  • Fuzzing: given that CI logs are not persistent (GHA deletes them in 30 days, i believe, and travis deletes them on a rerun) a fuzzing tool that doesn't eternally persist previously failed input seems like it doesn't add much value. Why is this an objectively cross-ecosystem best practice?
  • Dependency-Update-Tool: for renovate, does it check the project's owner's .github repo for renovate config as well? i use this, altho it's not that common to do so.

I'd love to chat more about this 🙂 metrics that add value are great, but simply assigning a number to something brings a risk of unintended consequences/perverse effects, like incentivizing me to make a sock puppet account solely to code review all my changes, to satisfy the "2 contributor" requirement.


@ossf/scorecard-maintainers -- Some of these points are familiar, so wherever possible, let's reference existing discussions/issues/PRs. I just want to make sure we capture all of @ljharb's questions as a first step! :)

@justaugustus justaugustus added kind/bug Something isn't working question Further information is requested kind/new-check New check for scorecard needs discussion labels Apr 7, 2022
@justaugustus justaugustus added this to Backlog in Scorecard via automation Apr 7, 2022
@azeemshaikh38
Copy link
Contributor

@brianrussell2 fyi.

@laurentsimon
Copy link
Contributor

laurentsimon commented Apr 7, 2022

From @ljharb in https://openssf.slack.com/archives/C0235AR8N2C/p1649348249786279:

hi! i have some concerns about some of the metrics used in the scorecard. for example:

  • "binary-artifacts" likely doesn't often apply to ecosystems that don't compile (JS, php, ruby, python)
  • more than one contributor is nice, but the existence of a lone contributor is usually because nobody else wants to help (that's why i'm the sole maintainer of over 10% of npm's traffic - not because i don't want help, but because nobody offers it). penalizing projects for something that's not in their control doesn't seem like it'll help security.
  • similarly, on a 1-maintainer project, it shouldn't matter if changes are landed via a PR or not, because there's nobody to do code review.
  • Pinned-Dependencies: the best practice for an npm project is to never pin dependencies, and to always use a ^ semver range. apps should always pin deps with a lockfile, and thus pinning deps in package.json provides no value (because transitive deps exist)

The check does not ask users to pin their dependencies in the manifest. It only verifies that there's a lock file when it's used in CI, so you get "reproducible" installation and don't suddenly fall victim to "npm color" attacks if someone pushed a malicious package. If there's a false positive you found, can you provide a link?

  • does SAST take into account eslint usage? a linter is a static analysis tool just like codeql, depending on how it's configured.

We want to improve this check #1487 I envisage adding support for commands as well

  • publishing from CI is not something i'd call a best practice, at least in the npm ecosystem - it's impossible to do it with 2FA, since using an authentication token is just a single factor.
  • Signed-Releases: what's the benefit of signing releases in an ecosystem where the majority of users (or their tooling) don't verify them by default? Measuring whether releases are signed, before making that actually a beneficial action, seems like putting the cart before the horse.

yes, we'd like to have checks be more ecosystem-aware. This is on our radar, but needs work.

  • Fuzzing: given that CI logs are not persistent (GHA deletes them in 30 days, i believe, and travis deletes them on a rerun) a fuzzing tool that doesn't eternally persist previously failed input seems like it doesn't add much value. Why is this an objectively cross-ecosystem best practice?

/cc @oliverchang

I'd love to chat more about this 🙂 metrics that add value are great, but simply assigning a number to something brings a risk of unintended consequences/perverse effects, like incentivizing me to make a sock puppet account solely to code review all my changes, to satisfy the "2 contributor" requirement.

good point.

@ossf/scorecard-maintainers -- Some of these points are familiar, so wherever possible, let's reference existing discussions/issues/PRs. I just want to make sure we capture all of @ljharb's questions as a first step! :)

@ljharb
Copy link
Member

ljharb commented Apr 7, 2022

It only verifies that there's a lock file when it's used in CI, so you get "reproducible" installation and don't suddenly fall victim to "npm color" attacks if someone pushed a malicious package

I don't think this is actually a best practice - I think that a package should fall victim to the same attacks and bugs that their users will on a fresh installation, and that encouraging lockfile use in packages actively harms users by hiding problems from maintainers. The false positive imo is that this is considered at all.

@laurentsimon
Copy link
Contributor

I don't entirely follow. How would a maintainer of package X control how users will use their package? Even if the maintainers of package X want to protect themselves in their CI tests (I think they should), they would not be able to control what their users will do or when installing X, right?

@ljharb
Copy link
Member

ljharb commented Apr 7, 2022

Speaking for the npm ecosystem, most lockfiles are "dev only", which means that a project with a lockfile has a pinned graph, but their consumers have nothing pinned. There's also one lockfile format that is published, and that is respected by consumers - but it's considered highly user-hostile to use this, because it prevents the consumer from deduping, and from seamlessly updating transitive deps.

The belief I'm professing here is that the CI on a public project isn't worth protecting in this fashion in general terms, but it's definitely less important than ensuring that maintainers are notified as early as possible about issues in their dep graph. With a lockfile, the project will continue passing CI even as users who newly install it get bugs or security vulnerabilities - without one, the project's success or failure matches that of the users, and the problems can be fixed much more rapidly.

@laurentsimon
Copy link
Contributor

The belief I'm professing here is that the CI on a public project isn't worth protecting in this fashion in general terms, but it's definitely less important than ensuring that maintainers are notified as early as possible about issues in their dep graph. With a lockfile, the project will continue passing CI even as users who newly install it get bugs or security vulnerabilities - without one, the project's success or failure matches that of the users, and the problems can be fixed much more rapidly.

There are two different classes of problems:

  1. Security vulnerabilities, i.e non-intentional problems. Good dependency patches/alerts are definitely really useful.
  2. Intentional, malicious packages. Dependency alerts don't help that much here, right? Pinning can limit the spread of problem. Remediation and incidence response are expensive, as we've seen in log4j

@ljharb
Copy link
Member

ljharb commented Apr 7, 2022

Right - but a dev-only lockfile (the almost exclusive kind used in the npm ecosystem) does not pin dependencies for consumers, only for the maintainers. The best practice in the npm ecosystem is to always use ^ on dependencies and never, ever pin them - that's always only the job of the top-level app (with a lockfile).

I'm not fully versed on log4j, but wasn't the issue with log4j a problem because of a ton of applications that were pinned, and thus when the long-standing vulnerability was discovered, they weren't able to upgrade to the fixed version?

@laurentsimon
Copy link
Contributor

laurentsimon commented Apr 8, 2022

Right - but a dev-only lockfile (the almost exclusive kind used in the npm ecosystem) does not pin dependencies for consumers, only for the maintainers. The best practice in the npm ecosystem is to always use ^ on dependencies and never, ever pin them - that's always only the job of the top-level app (with a lockfile).

I think we're saying the same thing... The scorecard check does this, i.e., checks that the final top-level app uses a lock file. If someone tests an API in their CI, scorecard will also check they've added a lockfile for the CI check, but not the overall project.

I'm not fully versed on log4j, but wasn't the issue with log4j a problem because of a ton of applications that were pinned, and thus when the long-standing vulnerability was discovered, they weren't able to upgrade to the fixed version?

log4j was an example of how remediation takes time and limiting the blast radius helps.
The npm color package attack is one that affected may people due to lack of pinning. Would the changes been malicious, it would have created a lot of work for remediation.

@ljharb
Copy link
Member

ljharb commented Apr 8, 2022

The scorecard runs on projects tho - a published project isn't a top-level app, generally (altho sometimes it can be).

@lucasgonze
Copy link

lucasgonze commented Apr 17, 2022

About the lockfile, is your thought that:

  1. A library should not use a lockfile, because that will conceal dependency-induced failures that might be caught by tests.
  2. The scorecard should distinguish between libraries and apps

I want to echo Laurent's point about how lockfiles and pinning protected many projects from colors.js , or failed to protect because they weren't present.

@ljharb
Copy link
Member

ljharb commented Apr 18, 2022

Sure, in this one case - less than a dozen over the lifetime of npm, by my reckoning - this is true. However, not pinning has allowed the seamless flow of bug and security fixes, and also allowed apps to dedupe, resulting in significantly smaller bundle sizes, saving the web uncountable amounts of bandwidth.

I’m saying that the benefits of pinning in a package are infinitesimally rare, and the harm of pinning is massive.

@ljharb
Copy link
Member

ljharb commented Aug 24, 2022

Some additional feedback: pinning an action to a tag is perfectly fine when the author is trusted, and is far better for security because security fixes can freely flow downstream.

@naveensrinivasan
Copy link
Member

Some additional feedback: pinning an action to a tag is perfectly fine when the author is trusted, and is far better for security because security fixes can freely flow downstream.

What happens if the author account is compromised? How do you deal with that?

@ljharb
Copy link
Member

ljharb commented Aug 24, 2022

The same thing that happens if the author account is compromised, they update the action to have well-hidden malicious code, and then someone rubberstamps a dependency update PR - be compromised.

In practice, this simply is not a thing that actually happens. Additionally, this is on open source published packages - there's nothing to compromise because nothing is done in the actions. In most cases, they don't even have permissions to do anything (via actions config).

This level of paranoia is appropriate for an application - but it's actively harmful for a published package.

@lucasgonze
Copy link

This would be a good panel discussion at something like https://events.linuxfoundation.org/openssf-day-europe/.

@laurentsimon
Copy link
Contributor

laurentsimon commented Aug 25, 2022

In cases where there are no permissions, I agree that pinning does not help and incur additional work for maintainer to maintain / update. @ljharb is this what you meant or you had a different example in mind?

The point on developer trust is interesting. It's hard for scorecard to do out of the box, but allowing consumers to tweak the results based on developer's identity would be interesting. What signals would you use to assess a "developer's trustworthiness" (2FA, projects they contribute to, ?)?

@ljharb
Copy link
Member

ljharb commented Aug 25, 2022

That’s up to each maintainer. GitHub’s official recommendations, for example, say that pinning an action to a tag is sufficient for an action author the maintainer trusts.

The scorecards system fails to be useful if it tries to editorialize too aggressively - when something is sometimes safe and reasonable, if the scorecard penalizes maintainers for it, the scorecards themselves stop being well-considered and useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working kind/new-check New check for scorecard needs discussion question Further information is requested
Projects
Scorecard
Backlog
Status: Backlog - Bugs
Development

No branches or pull requests

6 participants