Triage Buckets and Prioritization #604

Closed
lirantal opened this issue Dec 13, 2019 · 50 comments
Comments

@lirantal
Member

lirantal commented Dec 13, 2019

Background

We think the current bucket order helps and is a step in the right direction, but we want to improve it further. Right now we have 2 buckets: Low priority (<100 weekly downloads) and High priority (>100 weekly downloads), and the current stats are 30 vs 36 reports respectively. So it helps, but we'd also like to make sure we don't starve the reports in the >100 bucket that affect a large part of the ecosystem (i.e. packages with >100,000 downloads).

Bucket Segmentation

Segmenting the buckets further so that we have:

  1. Unmaintained - bucket for <100 weekly downloads, but also for packages that haven't had a release/commit in the last 12 months. For this bucket we could possibly consider automatic disclosure once triage has confirmed the vulnerability. Requires discussion/agreement.
  2. Low (and active) - bucket for <10,000 downloads.
  3. High - TBD based on stats.
  4. Critical - TBD based on stats, but generally speaking this is where we want the significantly impactful modules, probably roughly the top 1000 modules on npm, whose security issues would have serious impact and reach. (A rough classification sketch follows below.)
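
To make the segmentation concrete, here is a minimal sketch of how a report could be sorted into these buckets automatically, using the public npm downloads API and registry metadata. The thresholds, the 12-month activity check, and the "critical" cutoff are illustrative assumptions drawn from this proposal, not the WG's actual tooling.

```ts
// Illustrative only: thresholds and the "unmaintained" check are assumptions
// from this proposal, not the security-wg's actual process. Node 18+ (fetch).
type Bucket = "unmaintained" | "low" | "high" | "critical";

async function classifyReport(pkg: string): Promise<Bucket> {
  // Weekly download count from the public npm downloads API.
  const dlRes = await fetch(`https://api.npmjs.org/downloads/point/last-week/${pkg}`);
  const { downloads } = (await dlRes.json()) as { downloads: number };

  // Approximate "last activity" via the registry packument's time.modified field.
  const metaRes = await fetch(`https://registry.npmjs.org/${pkg}`);
  const meta = (await metaRes.json()) as { time: { modified: string } };
  const monthsSinceActivity =
    (Date.now() - new Date(meta.time.modified).getTime()) / (30 * 24 * 60 * 60 * 1000);

  if (downloads < 100 || monthsSinceActivity > 12) return "unmaintained";
  if (downloads < 10_000) return "low";       // "Low (and active)"
  if (downloads < 1_000_000) return "high";   // assumed cutoff, TBD per stats
  return "critical";                          // assumed cutoff, TBD per stats
}

classifyReport("express").then((bucket) => console.log(bucket));
```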

Bucket SLA and Leads

For each of the proposed buckets above we'd like to have defined SLAs, so that, for example, the Critical bucket has a 24-hour triage SLA and reports there get addressed in a timely manner.

Moreover, we'd like to have defined Leads who act as the contact person for each bucket, and we can then assign members to each of these bucket groups to focus their work there.

@lirantal lirantal added the process processes and documentation label Dec 13, 2019
@lirantal lirantal self-assigned this Dec 13, 2019
@sam-github
Contributor

While I don't object even a little to any of the above, I think an issue with just having different SLAs for different levels is that it doesn't decrease the overall amount of work to be done by sec-wg volunteers, it just allows more time... but the work still needs to be done. And there is more work than people, so the result might be that buckets 3 and 4 meet their SLAs, but buckets 1 and 2 essentially go onto a queue and never come off it, because no one will have had time.

I'd suggest that for buckets 1-3 above, we make it clear that it is the reporter's responsibility to contact the package maintainers and get a response from them, or to post a comment explaining that they have attempted to contact the maintainer and got no response.

X days after the maintainers have been notified (or failed to respond), the vuln can be published, and the process no longer blocks on sec-wg activity.

The value of X could vary depending on the bucket; bucket 1 packages are essentially unused, so the amount of energy the sec-wg puts into them should tend towards zero.

@lirantal
Member Author

it doesn't decrease the overall amount of work to be done by sec-wg volunteers, it just allows more time

Not only that: it allows us to focus on reports with real impact, which would otherwise get lost in the clutter of tens or hundreds of reports. The work will always remain, but also note that if we have a bucket we agree to disclose right after triage, that still helps shorten the queue and the time to resolution.

I'd suggest that for buckets 1-3 above, we make it clear that it is the reporter's responsibility to contact the package maintainers

Respectfully disagree here, and definitely for all of those buckets. We want to make it easier for researchers to report, not harder. Plus, they may not even be familiar with the ecosystem, so it would make their lives much harder.

X days after the maintainers have been notified (or failed to respond), the vuln can be published, and the process no longer blocks on sec-wg activity.

This is the current process anyway.

@sam-github
Contributor

sam-github commented Dec 16, 2019

I'll respond more fully, but just to make sure we agree on the current state of things:

I see Low Priority (< 100) vulns in H1 from over 6 months ago that have not been disclosed. This leads me to think that the vulnerabilities are being reported faster than the WG has capacity to handle the reports to completion.

Am I misunderstanding the state? I may not be following the flow, and maybe the vulns move slowly through the process not because of WG capacity, but because the reporters aren't responding, or something like that.

@lirantal
Member Author

the vulnerabilities are being reported faster than the WG has capacity to handle the reports to completion

It means we aren't doing a good job of catching up with them, as we indeed don't seem to have the resource capacity to handle all of those reports. The issue isn't reporters or maintainers not responding (it may be in some cases, but we would still go forward with disclosure 45 days after triage), so the bottom line is sadly just us not being attentive to these reports.

@MarcinHoppe
Contributor

I agree the bottleneck is definitely on the WG side right now.

@sam-github
Contributor

I don't at all think that anyone is "not doing a good job" or "not being attentive". I think the entire triage is being done by only 2 or 3 volunteers, pretty much on their own time, and that they are effective with the time they spend.

I want it abundantly clear that I have absolutely no criticisms of how people spend their time working on triage, and if I could arrange it, you'd all get bottles of champagne quarterly (too bad that's not covered by the bug bounty).

That said, if the time WG members have available is a bottleneck, then something should be done. I don't think it's helpful to anybody for the WG to commit to doing things that it doesn't have the capacity to get done. It's frustrating to researchers (witness the pinging in the H1 issues for status updates), and having a multi-month backlog of vulnerabilities that are known to some but not yet public isn't helping ecosystem security.

So, back to #604 (comment)

Bucketing things so issues are handled in priority order is great, no objections, but if there is too much work, then either more triagers have to be onboarded (would be great, but forcing volunteers to appear isn't possible!), or the commitments have to be scaled back so there are no bottlenecks.

Respectfully disagree here, and definitely for all of those buckets. We want to make it easier for researchers to report, not harder. Plus, they may not even be familiar with the ecosystem, so it would make their lives much harder.

@lirantal That is exactly the point, to make things harder, or to put it more positively, to distribute the work to the people who are motivated to do it. Reporters are motivated by H1 rankings, so they will be motivated to contact the projects. It's standard practice, IMO, for vulnerability researchers to contact the projects they have found vulnerabilities in. I think this program setup is unusual, in that the sec WG has committed to doing this legwork for the researchers, even though we don't have the capacity.

To be clear, for high priority projects, I think it is within the WG's capacity to contact the projects and help move things along.

Other, more radical, proposals for decreasing time commitments:

If the project is of low enough priority by downloads, I am personally OK with simply verifying that the vulnerability is valid, and publishing. The publish on H1 will trigger scraping by npm, Inc., and npm audit will start to warn the (very, very few) users of the package, and those users can then move away from it or contact the package maintainer to get it resolved. Since triage and reproduction are handled by the H1 team, the amount of time required from WG members is pretty much a couple of button pushes to promote to publish.

An even more radical proposal would be to stop maintaining the JSON database. It predates H1. H1 itself is a DB of vulnerabilities, so reformatting as JSON isn't required, though we assume it makes the data easier to consume. Do we know if there are any users of the JSON DB? IBM doesn't use it directly (to my knowledge). npm pulls directly from H1. Any other known users? If no WG member is supported by a company or org pulling the JSON, perhaps we should stop updating the JSON... this would encourage people who actually pull the JSON and find value in it to get involved in keeping it up to date (or perhaps we'd find there isn't that much interest).

@lirantal
Member Author

lirantal commented Dec 19, 2019

I want it abundantly clear that I have absolutely no criticisms of how people spend their time working on triage, and if I could arrange it, you'd all get bottles of champagne quarterly (too bad that's not covered by the bug bounty).

cracked me up @sam-github 😆
❤️🤗

Reporters are motivated by H1 rankings, so they will be motivated to contact the projects.

I understand that, but AFAIK even if they wanted to contact the maintainers and add them to the triage, they couldn't. Instead, what we could do here is ask the H1 support team to reach out on our behalf for those low buckets.

If the project is of low enough priority by downloads, I am personally OK with simply verifying that the vulnerability is valid, and publishing.

this is what we indeed want to do with bucket number 1 (see Unmaintained here #604 (comment))

An even more radical proposal would be to stop maintaining the JSON database.

That's an interesting take which I hadn't considered, but it's not directly related to this issue, so if you want to pursue it we can talk about it in a new issue and on an agenda call.

@sam-github
Contributor

This is on the sec WG agenda, and the next meeting is Monday, Dec 30th. I can make it; I wonder how many other people can?

I guess the WG agenda will be auto-published next week and we'll see.

@lirantal
Member Author

Yeah, let's skip that due to holidays and time off for everyone.

@sam-github
Contributor

OK, then we'll discuss in 6 weeks' time; no desperate hurry. And all your prioritization suggestions look OK to me in the immediate term. If you and the other people doing triage agree on a bucketing/SLA system, I'm good with whatever you want to do.

@mralekzandr

Hi friends! Just revisiting this thread that I previously missed. @sam-github - I also wish the bounty budget covered quarterly champagne parties. Good shout! Who do we talk to to make this happen? ;-)

Here are some suggestions I made with @MarcinHoppe on our last monthly sync. They don't quite align with what was discussed here, but I'll post them regardless to get feedback from the rest of the group:

  1. Priority Buckets: I noticed the High Priority bucket has grown beyond 40 tickets. When we first implemented these buckets, there were discussions around potentially raising the threshold for what is considered High (right now, it's >100 weekly downloads). Marcin and I discussed the option of expanding the views into High Priority, Medium Priority, and Low Priority. You all will know best, but my off-the-cuff suggestion was: High = >1000 weekly downloads, Medium = 100-999 weekly downloads, Low = <100 weekly downloads. Thoughts?

  2. Disclosure Buckets: I recently implemented some new inbox views for the Node.js core team to help them move tickets through different inbox views as they make the decision to publicly disclose or not disclose reports. It's my understanding that there's perhaps an unwritten rule for the ecosystem program that ALL reports get disclosed. But if there are ever edge cases, I can help create inbox views to navigate the decision. For Core, there's Review 4 Disclosure, Requested Disclosure, and Disclosed Reports. Additionally, there's a Decided Not To Disclose group, where the team can assign reports they've decided aren't eligible or worth disclosing.
    Depending on your current workflows, suggestion 2 might not be as helpful to you as it was for Core. But I'm happy to start a conversation around it either way!


Some questions/concerns/ideas raised in this thread are more in-depth than what I was considering when I made the initial suggestion. There's obviously a bit more to consider than criticality à la weekly downloads. My main goal is to make the inbox more manageable for you all, so I'll defer to you on best practice/what makes the most sense for the team.

-Alek

@lirantal
Member Author

I'm agnostic on the disclosure buckets. I'm happy to have them if others think it helps.
On priority buckets, I think anything around 1000 weekly downloads still counts as low and should move to that bucket, not high. We need to keep the critical modules in the ecosystem from getting lost among the rest, so my personal take is to start with Low covering roughly <=10k downloads, plus High and Critical buckets, along with the Unmaintained bucket as suggested in the original issue.

@MarcinHoppe
Contributor

I also think we could mark specific assets as high priority. For example: if there is a report against jQuery or Express, I think we should treat it as high priority because of how widespread the usage is.

@lirantal
Member Author

Yes, but that requires someone actually reviewing the report, or at least looking at it, which (at least for me) doesn't always happen at first glance. Some items we might not even recognize; for example, I wouldn't know by heart that a less prominent dependency like event-stream has millions of downloads.

If, however, we triage it with the H1 team on the spot and, based on popularity, put it into a CRITICAL bucket, then we can also attach an SLA to that bucket, and preferably some automatic alerts that ping us.

@MarcinHoppe
Contributor

I was looking at it from a slightly different angle: my goal is to have the download count as one factor (this could be determined quickly by the H1 triage team) and perhaps give priority to a selected group of projects (e.g. https://openjsf.org/projects/). I think those projects would meet the download count criteria anyways (I need to do a bit of research here, too).

I like the idea of automated alerting (even a Slack ping) for High priority reports.
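
To illustrate the automated alerting idea, here's a minimal sketch that looks up a package's weekly downloads and pings a Slack incoming webhook when the count crosses a high-priority bar. The webhook URL, the threshold, and the function itself are assumptions for illustration, not existing WG infrastructure.

```ts
// Sketch only: the webhook URL and threshold are placeholders. Node 18+ (fetch).
const SLACK_WEBHOOK = process.env.SLACK_WEBHOOK_URL ?? ""; // hypothetical webhook
const HIGH_PRIORITY_THRESHOLD = 10_000;                    // assumed weekly-download cutoff

async function alertIfHighPriority(pkg: string, reportUrl: string): Promise<void> {
  // Weekly download count from the public npm downloads API.
  const res = await fetch(`https://api.npmjs.org/downloads/point/last-week/${pkg}`);
  const { downloads } = (await res.json()) as { downloads: number };
  if (downloads < HIGH_PRIORITY_THRESHOLD) return;

  // Post a simple text alert to a Slack incoming webhook.
  await fetch(SLACK_WEBHOOK, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `High priority report on ${pkg} (${downloads} weekly downloads): ${reportUrl}`,
    }),
  });
}
```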

@mhdawson
Member

I'm +1 for raising the cutoff for high priority

@mralekzandr

Let's take disclosure buckets off the table for now; it sounds like that wouldn't be as helpful to your team as it was to node.js core.

Keep me informed as to what you all decide on raising the cutoffs / introducing new priority buckets. I'll want to be sure our Triage team is fully aware of any changes before we officially implement them. Thanks!

@mcollina
Member

mcollina commented May 9, 2020

I would raise the high priority cutoff to 10,000 weekly downloads, and update the template to explain what the process is for anything below that.

@MarcinHoppe
Contributor

I would support that. Currently we have far more submissions in the High priority bucket than we can really handle.

@lirantal
Member Author

Aligns with the original plans we had 😉

Hey @mgalexander - can you make sure that only packages with >=10000 weekly downloads make it into the high priority bucket?

@sam-github
Contributor

I continue to think that the prioritization doesn't help with the problem of "there are too many reports, and not enough triagers".

Prioritization is just that: it changes the order in which things are looked at, but the process, https://github.com/nodejs/security-wg/blob/master/processes/third_party_vuln_process.md#handling-vulnerability-reports, is identical for every prioritization level. And there aren't enough people, so the low priority issues will never make it through the process. See #654

I think reports on modules at the Low priority level should be either arbitrarily rejected from H1 without bounty (and redirected to the package's GitHub issue tracker) or insta-published after verification (without bounty).

@mralekzandr

Hi team! Apologies for my latency here.

It sounds like 3 actions need to be taken on my end:

  1. update the HIGH PRIORITY bucket to only contain packages with >10000 downloads (i.e. update internal triage notes)
  2. instruct triage team to mark any valid (triaged) report with <10000 downloads as INELIGIBLE FOR BOUNTY before moving them to LOW PRIORITY bucket
  3. update the policy to reflect that reports on packages with <10000 downloads are automatically ineligible for bounty

Does this sound correct?

@lirantal
Member Author

I think reports on modules at the Low priority level should be either arbitrarily rejected from H1

I disagree here; I wouldn't reject reports. That just seems wrong and doesn't communicate a helpful message. Is it saying we don't care about a module that has 1000 downloads?

We should communicate clearly that the SLA will be significantly longer, and advise them to consider posting the report elsewhere, such as with npmjs, Snyk, or others.

@lirantal
Member Author

@mralekzandr yep, all 1-3 are correct.

@mcollina
Member

I would not spend effort on anything with fewer than 10000 weekly downloads :(. It seems we are currently overwhelmed, and it's unlikely that we will be able to handle them.

I think we should update the SLA.

@MarcinHoppe
Contributor

Perhaps we should rethink how we manage scope in the first place? Currently the program is open for submissions on anything. One of the negative side effects of that is spending time and energy chasing maintainers who may no longer be active or willing to spend the time fixing the reported issue.

Perhaps the scope should only cover packages where maintainers opt in, making our lives easier and focusing researchers on those packages. Of course, we would accept all packages that come to us and request being added. This way we would always have proper maintainer contacts and could encourage each of the joining projects to have a security policy that points to our program on H1.

Onboarding OpenJSF projects might be a good start in this direction.

Just a thought, I am not saying we definitely should do this but it would alleviate some of the concerns.

@mcollina
Member

I agree with @MarcinHoppe.

@lirantal
Member Author

It's great validation to get maintainers to opt in, but realistically I believe it won't extend beyond more than a few. We tried it with bounty-eligible modules - how many do we have, and how many have we added since we put that policy in place? None. As further evidence, after a recent security report that was fixed in a well-known plugin (7M weekly downloads), I reached out to the maintainers involved (who also maintain even more popular projects) to ask whether they would agree to have that module added to the bounty list so we could reward the researcher, and I've heard nothing back (3 weeks now).

@mcollina
Member

Overall I think we should reshape the program. I'm happy to say that anything with millions of downloads is eligible.

@lirantal
Member Author

I feel so too, but previous discussion on this leaned towards getting confirmation from the specific maintainers first before we specify eligibility.

@sam-github
Contributor

I think this conversation is mixing discussion of opting in to bounties with whether to accept reports at all.

Not addressing bounties for the moment:

@lirantal

We should communicate clearly that the SLA will be significantly longer

For the lower priority, what appears to be happening is that the delay is infinite, because there aren't enough people to work through the high priority reports.

Is it saying we don't care about a module that has 1000 downloads?

No, it isn't saying "we don't care", it's saying "sorry, but we don't have the capacity to handle these". Those are very different.

I feel it's more caring, because instead of telling people "wait, we'll get to it eventually" (and then perhaps not) we say "I'm sorry, we don't have capacity to address this report".

Which leads to...

, and advise them to consider posting the report elsewhere, such as with npmjs, Snyk, or others.

I think this is a great idea. For packages under a certain number of downloads, they can report to organizations that have the capacity and interest to handle reports against (relatively) unpopular packages.

Even if we agree to that, it leaves the question: do we close as "sorry, no capacity to accept" or do we auto-disclose? I tend to the former.

@mcollina
Member

I agree on the former

@MarcinHoppe
Contributor

I feel an auto-disclosure policy might lead to backlash when consumers of those reports (e.g. through GitHub alerts or npm audit) come back to us reporting false positives, etc.

I am also in favor of the former.

@phra

phra commented May 16, 2020

Hello 👋

I was redirected to this issue from #654.

I think we are trying to implicitly solve two distinct problems at once here:

  1. noise generated by low-risk reports
  2. shortage of resources to triage reports and follow resolutions

Regarding 1

IMHO it is also OK to outsource low-risk reports to third parties that may have more capacity to handle them, given the guarantee of a deterministic disclosure timeline (e.g. 90 days by default, with possible extensions upon strong motivation).

Otherwise, I think it is still acceptable for low-risk reports to be forwarded directly to the repository on GitHub, Bitbucket, etc.

In any case, it's very important to avoid having stale reports for multiple reasons:

  1. they don't provide any tangible security benefit to the community
  2. they demotivate security researchers because they don't see actionable outcomes from their research (e.g. a patch release, bounty assignment, etc.)

Regarding 2

I think the solution for this is easier to implement: increase the number of people triaging and following the resolution of the issues. I have some high/critical reports going stale that affect packages with more than 10k weekly downloads, but I haven't received any comments except from H1 staff. This is completely unrelated to the prioritization discussion, since those should already be considered high risk. I am not aware of the current number of (active) members working on H1 reports, but I suggest doubling or tripling it (or more) to make the process more effective, especially because we are dealing with a pro bono activity.

General considerations

Once the resource shortage is solved, in the medium/long term it may be worth thinking about improvements that would make the whole resolution process smoother, starting with making it easier for the triage team to contact the maintainers of affected packages. Some ideas:

  1. (for npm) ensure the presence of a dedicated security property in package.json, or require a SECURITY.md file in order to be able to publish a package to npm, so that developers have to declare contact info that can be used if needed (a lookup sketch follows below).
  2. (for GitHub) it would be nice to have a dedicated section for security issues on GitHub, since in my experience it's usually the fastest medium for contacting developers. This section and the specific vulnerabilities should only be visible to the maintainers and the vulnerability reporters, not to the general audience like regular issues are.
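
As a rough illustration of idea 1, here is a minimal sketch of a maintainer-contact lookup. The top-level security field is the proposal above and does not exist in npm today; the maintainers list used as a fallback is real registry metadata. Names and shapes are assumptions for illustration only.

```ts
// Sketch only: "security" is the proposed (hypothetical) manifest field;
// "maintainers" is existing npm registry metadata. Node 18+ (fetch).
interface Packument {
  security?: { contact?: string };                   // hypothetical, per idea 1
  maintainers?: { name: string; email?: string }[];  // real registry metadata
}

async function securityContacts(pkg: string): Promise<string[]> {
  const res = await fetch(`https://registry.npmjs.org/${pkg}`);
  const meta = (await res.json()) as Packument;

  // Prefer the proposed dedicated security contact, fall back to maintainer emails.
  if (meta.security?.contact) return [meta.security.contact];
  return (meta.maintainers ?? [])
    .map((m) => m.email)
    .filter((e): e is string => Boolean(e));
}

securityContacts("express").then((contacts) => console.log(contacts));
```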

PS: I am available, if needed, to contribute to the H1 program in my spare time.

@lirantal
Member Author

@sam-github thanks for your thoughts.
I agree with closing existing reports as "sorry, no capacity" too, in terms of how we should act if we agree on this, but from that point on we should also make clear what the download threshold is for us to actually accept reports.

@lirantal
Member Author

Taking actions on this:

  1. Update the policies (triage process, program page, security.md for the nodejs website, github.com/nodejs/node/security.md) for the updated threshold.
  2. Are we all agreeing on the threshold of 10000 monthly downloads as an eligible module for us?

Thumbs up so we can move on with this?

@mcollina
Member

Are we all agreeing on the threshold of 10000 monthly downloads as an eligible module for us?

Maybe we can raise that a bit more: 100k monthly.

@lirantal
Member Author

Regardless of whether we raise it to 10k or 100k, what do we do about existing "top priority bucket" items? Should we still hold them to the new threshold and then close with 'sorry, no capacity'?

@MarcinHoppe
Contributor

How about reprioritizing them, then attending first to the "new high priority" items and working on the tail end as time permits? It looks like we will soon have new people on the triage team, so that sounds doable.

Also, raising the monthly downloads bar will stem the influx of new reports, leaving some time to work through the backlog.

@mralekzandr

just a heads up that I haven't updated any internal notes or processes on my end yet! Let me know when there's mutual agreement :)

@lirantal
Member Author

@reedloden in relation to our bug bounty program (and maybe prioritization too) - is it possible to use the H1 platform to reward maintainers as well, for when they engage and fix an issue?

@reedloden
Contributor

@lirantal The platform isn't entirely built for that, but you could have the maintainer submit a new report and then award it a bounty.

@lirantal
Member Author

@reedloden A bit cumbersome. Maybe we could create a specific kind of report type? It would be odd if it were a vulnerability report - both confusing and likely to skew the stats.

@mralekzandr

mralekzandr commented Jun 15, 2020 via email

@lirantal
Member Author

Gotcha, that might also work; I guess it depends on how much we can automate it with the API.
What would these fake imported reports look like? Would they be searchable and open to the public, or would they be internal reports that only the H1 team and we can see?

@mralekzandr

mralekzandr commented Jun 16, 2020 via email

@DanielRuf
Contributor

Is this still relevant, or should we clean up some open issues @lirantal?

@lirantal
Member Author

lirantal commented May 8, 2022

This is no longer relevant, as we've exported and moved all the issues from HackerOne's queue into Snyk.

@lirantal lirantal closed this as completed May 8, 2022