Proposal: Lock module version in lookup.json #407

Open
BethGriggs opened this issue May 12, 2017 · 24 comments

@BethGriggs (Member)

As we test more modules in CitGM, triaging failures does not seem to be scaling.

With any failure we need to work out whether it was caused by:

  • the change in Node we're testing
  • the module/module test suite being updated
  • flaky tests

To make life easier I would suggest that we:

  • Lock the modules to specific (known working) versions in the lookup.json (with appropriate flaky tags if necessary).
  • Allow module authors to raise PRs to update the versions (when they feel like it). We can update modules ourselves in batches too (perhaps at weekly or monthly intervals).

This means that instead of constantly having to stay on top of every module and module-test-suite update, and regularly having to mark some of these tests as flaky, we could handle updates in batches at a regular interval.
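
For illustration, a pinned entry could look something like the sketch below. This is only a rough idea of the shape it might take: the version field does not exist in today's lookup schema, and the module name and version number are made-up examples.

```js
// Hypothetical sketch only: a "version" field like this is not part of the
// current lookup schema, and "lodash"/"4.17.4" are made-up examples.
const lockedEntry = {
  lodash: {
    npm: true,         // install from the npm registry, as today
    version: '4.17.4'  // pinned, known-working version for this module
  }
};
```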

@jasnell (Member) commented May 12, 2017

Yeah, moving targets are never fun. I'm +1 on this.

@gibfahn (Member) commented May 12, 2017

@MylesBorins I'd be interested to know your thoughts on this.

I'm +1 on this. If we can encourage module authors to update their modules here then that would be an added bonus.

@MylesBorins (Contributor) commented May 12, 2017 via email

@jasnell (Member) commented May 12, 2017

@MylesBorins ... how would you feel about an approach that used two lookup tables in CI... one with a last-known-good configuration of modules, and one with the current-set? Running differentials between those could help us track down regressions quite easily.
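
Something like this rough sketch, say (not an existing citgm feature, and the { moduleName: 'pass' | 'fail' } result shape is just an assumption):

```js
// Rough sketch of the two-table differential idea; not an existing citgm
// feature. The { moduleName: 'pass' | 'fail' } result shape is assumed.
function diffRuns(lastKnownGood, current) {
  const regressions = [];
  for (const moduleName of Object.keys(current)) {
    if (current[moduleName] === 'fail' && lastKnownGood[moduleName] === 'pass') {
      regressions.push(moduleName); // only broke against the current set
    }
  }
  return regressions;
}

// e.g. diffRuns({ lodash: 'pass' }, { lodash: 'fail' }) => ['lodash']
```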

@MylesBorins (Contributor) commented May 12, 2017 via email

@gdams (Member) commented May 12, 2017

I am kind of with @MylesBorins on this; the whole point of citgm is that we catch module failures as soon as they are released. I agree that having two lookup tables could perhaps work, but it could get messy, as you may have different versions being compatible with different platforms.

@MylesBorins (Contributor) commented May 13, 2017

OK, so I'm thinking about this a bit more, specifically about the high-level problem: people don't want to use CITGM because of the number of false negatives, and because it is not intuitive to read the results.

In the past I had been more on top of keeping the flakiness tagging in the lookup table up to date. I would run CITGM on each release line, review the results, and update the lookup as appropriate. An obvious solution would be to make this a task that people sign up for... (or possibly an expected part of using CITGM if you experience flakiness).

Another problem is knowing whether something is truly flaky. @gdams had started work on a stress-test feature for CITGM that would make it easier for us to run a module multiple times to figure out just how flaky it is.

Perhaps we can even find a way to automate the above process, or at least make it far easier to verify whether a module is flaky or not (multiple failures auto-update the lookup, for example).

@BethGriggs thanks for bringing this up! Obviously there is pushback from the collaborators on using this tool regularly, and we should definitely work on figuring out how to make it more user friendly.

@gibfahn (Member) commented May 13, 2017

The whole point of citgm is to find failures

the whole point of citgm is that we catch module failures as soon as they are released

I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.

It is inconvenient when they break, but isn't that kind of the point?

CitGM is not useful unless it's green. I don't think most breaking changes will be discovered only on the absolute latest module versions, so being a month or two (at most) behind the latest module should be fine (if anything, older module versions probably use more deprecated features).

and it sounds like quite a bit of work to implement / maintain.

Implementing this should just mean adding a version to the lookup.json; maintaining it means running with the latest versions and PRing the update. We could have a tool that auto-updates the lookup to the latest version if it passes.
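
A minimal sketch of what such a tool could look like, assuming a hypothetical version field in the lookup and assuming citgm accepts an npm spec like lodash@4.17.4 (both assumptions worth verifying before building on this):

```js
// Sketch only: assumes a hypothetical "version" field in lookup.json and
// that `citgm name@version` is a valid invocation; verify both assumptions.
'use strict';
const { execSync } = require('child_process');
const fs = require('fs');

const lookupPath = 'lookup.json'; // path is an assumption
const lookup = JSON.parse(fs.readFileSync(lookupPath, 'utf8'));

for (const name of Object.keys(lookup)) {
  const latest = execSync(`npm view ${name} version`).toString().trim();
  if (lookup[name].version === latest) continue; // already up to date
  try {
    execSync(`citgm ${name}@${latest}`, { stdio: 'inherit' });
    lookup[name].version = latest; // passed: bump the pinned version
  } catch (err) {
    console.error(`${name}@${latest} failed; keeping ${lookup[name].version}`);
  }
}

// A human (or a bot) would then open a PR with the updated lookup.json.
fs.writeFileSync(lookupPath, JSON.stringify(lookup, null, 2) + '\n');
```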

We should likely focus our efforts on finding out why our CI infra has false negatives that don't exist on other machines

Sure, but we have to do that anyway; this is just one thing we can do to try to insulate people running CitGM on their core PRs from having to deal with transient module issues.

What I tend to see is someone running CitGM, getting a bunch of failures, and then pinging @nodejs/citgm to ask whether they're expected. And I don't think we've been very good at answering those people.

IMO the possible reasons we might not want to do this are:

  • Updating the versions will be more effort than fixing issues
    • Locking the versions means that CitGM stays green while we resolve any issues.
  • There aren't enough module issues for it to be worth it
    • I'd say there are.

@MylesBorins (Contributor)

I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.

When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is. If it is broken, it is broken

@gibfahn (Member) commented May 14, 2017

When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is.

I would assume the vast majority of installs come from a module range specified in a package.json. Have any of the issues that CitGM has turned up come from the most recent version of a module?

If it is broken, it is broken

Okay, but if it is broken we just skip it in the lookup. It's not like we hold the release until all the modules are fixed.

If we had a mechanism to automatically update the lookup, raise a PR, and run CI on the latest release (e.g. v7.10.0 right now) every week, would that ease the issue? It could even be triggered before a release.

@MylesBorins (Contributor)

@gibfahn I think we can definitely introduce a more aggressive policy for skipping modules. Technically, at this point any collaborator can make changes to the lookup without requiring any sign-off.

What we really need are regular audits of the release lines with CITGM, where we ignore flakes and keep the lookup up to date. This could potentially be automated.

@gdams (Member) commented May 22, 2017

#321 is a start to keeping our lookup up to date

@refack (Contributor) commented May 22, 2017

IMHO multiple tables make sense: a sparse one for the general case, and a locked-version one for each release line. This would allow us to differentiate between our regressions and the modules' regressions.
A version-specific lookup.json could sit in https://github.com/nodejs/node/tree/master/test and get peer-reviewed PRs and semantic versioning...

@MylesBorins (Contributor)

I'm a huge -1 on multiple tables. The lookup is the baseline and it is what we run off of. A separate table does not solve my concerns above regarding not keeping up to date... it simply doesn't enforce them in the main citgm.

I think the wiki has been a really good first take fwiw

@refack (Contributor) commented May 22, 2017

I think the wiki has been a really good first take fwiw

Thanks!
The only caveat is that you still need to know what's what, so I get people asking me "is my CITGM run OK?" and then I essentially do a second manual lookup filter (the difference is that I have version context).

@BridgeAR (Member)

I personally think that CITGM would have much higher value if it were green on average, even if the versions tested are not all up to date. The point is that they should still reflect most of the user base even if they are older. Let us get more modules in and get a better guarantee from that, rather than from the newest version.

We should still use a lookup table for the newest versions though, e.g. once every two months.
Ideally CITGM could use something like greenkeeper.io, but I do not see a trivial way to implement that, and it would also not help against flaky test suites.

@mhdawson (Member)

I agree we need to figure out how to let more people run this usefully. To be useful it needs to be green most of the time when people need/want to run it. It would be interesting to see if a version with fixed module versions would be more stable or not.

Running something like a "stable" and a "canary" set, where new versions of modules get promoted from "canary" to "stable" once they have passed for some period of time, is a model that might balance the need to test new versions of modules against allowing the tool to be more broadly used.
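
A rough sketch of the promotion rule (the entry shape and the pass threshold are assumptions; nothing like this exists in citgm today):

```js
// Sketch of a "canary" -> "stable" promotion rule. The entry shape and the
// three-consecutive-passes threshold are assumptions, not existing behaviour.
const REQUIRED_PASSES = 3;

function promote(entry) {
  // entry: { stableVersion, canaryVersion, canaryConsecutivePasses }
  if (entry.canaryConsecutivePasses >= REQUIRED_PASSES) {
    return Object.assign({}, entry, {
      stableVersion: entry.canaryVersion, // the canary has proven itself
      canaryConsecutivePasses: 0
    });
  }
  return entry; // keep testing the canary version
}
```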

@MylesBorins (Contributor)

I once again am going to push back on this.

As the person doing the majority of the work with CITGM and the majority of the updates to the lookup.json, and as one of the primary people using the tool day to day, I am not convinced that we would keep the lookup table up to date, and I am concerned that it would fall back on me to maintain it.

We do get failures, and it is a mixed bag of reasons.

Currently the majority of failures have been due to flakes or infra issues on our side. In fact, the 6.x line is for the most part green on citgm.

All of that being said, if someone is willing to do the work to implement installing specific versions, I am willing to give it a try to see if it improves things; landing commits and testing the workflow is not a massive burden. One thing to keep in mind is that we should likely support tagged majors only, as a non-trivial number of npm modules are installed in a way that auto-updates to the latest minor.

@gibfahn (Member) commented Oct 24, 2017

I am not convinced that we would keep the lookup table up to date and am concerned that it would fall back on me to maintain it.

Yeah, this is definitely the core issue. I would say the maintenance burden is likely to be the same either way (modules break pretty frequently in my experience), especially if we could get some automation.

However we might also fall into a false sense of security, where we think everything's fine because some ancient version of a module is green, but actually latest is broken (and we just never update the lookup version because the latest one is broken).

Alternative proposal

How about we do something more along the lines of:

Have an 'lkgr' (last known good) version in the lookup. First run the tests on latest, and if they fail there, fall back to the lkgr. That way we should be able to quickly see that:

  • Fail, Pass => Module broken by module update
  • Fail, Fail => Module broken by node update

Obviously it won't work if the module is flaky, but nothing works if the module is flaky, so 🤷‍♂️ .
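
In code the fallback would be something like this sketch, where runCitgm is a hypothetical helper that returns true for a passing run (not an existing API):

```js
// Sketch of the latest-then-lkgr fallback. runCitgm() is a hypothetical
// helper that returns true when a citgm run passes; not an existing API.
function classify(moduleName, lkgrVersion, runCitgm) {
  if (runCitgm(`${moduleName}@latest`)) {
    return 'pass';                          // latest is fine, nothing to do
  }
  return runCitgm(`${moduleName}@${lkgrVersion}`)
    ? 'module broken by module update'      // Fail, Pass
    : 'module broken by node update';       // Fail, Fail
}
```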

@MylesBorins (Contributor) commented Oct 24, 2017 via email

@gibfahn (Member) commented Oct 24, 2017

would you propose we keep the lookup updated?

Ideally we'd have

How do we handle lkgr being different across release lines and platforms

I can definitely see that snowballing out of control. I think for platforms we should just control it with "lkgr for all non-flaky platforms".

For release lines we might need multiple versions sometimes.

@MylesBorins (Contributor) commented Oct 24, 2017

For release lines we might need multiple versions sometimes.

this leaves me with a feeling that keeping track of this stuff is going to explode in complexity

gibfahn self-assigned this Oct 26, 2017
@al-k21 (Contributor) commented Feb 23, 2018

Proposal:

  1. Leave CITGM as it is now, defaulting to getting the version from npm.
  2. Add an option --run-lkgr:
    • this would use the lkgr version from the lookup instead of getting it from npm
  3. Add an option --update-lkgr:
    • check if the latest version from npm is the same as the lkgr in the lookup:
      same version -> no need to run the test
      new version -> if the module succeeds, update lkgr; if it fails, save its name
    • Raise a PR with all changes to the lookup and print a list of failing modules.

1. We would care about:

testing against what people will get when they run npm install. That is whatever the latest version of a module is.

2. We should see citgm green while doing the release if node is not breaking any module; otherwise this would be an indicator of a potential issue with node.
3. If a newer version of a module fails on the same version of node on which the older version passes, this would be an indicator of a problem with the module.

There is much more to consider (lkgr per platform/node version, etc.), but having citgm green when doing the release would be ideal.
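
The per-module decision for the proposed --update-lkgr option could be sketched like this (latestFromNpm and testPasses are hypothetical helpers, and the action names are only illustrative):

```js
// Sketch of the per-module decision in a hypothetical --update-lkgr mode.
// latestFromNpm() and testPasses() are assumed helpers, not existing APIs.
function decide(name, lkgrVersion, latestFromNpm, testPasses) {
  const latest = latestFromNpm(name);
  if (latest === lkgrVersion) {
    return { action: 'skip' };                    // same version -> no need to run
  }
  if (testPasses(`${name}@${latest}`)) {
    return { action: 'update-lkgr', to: latest }; // new version passed -> bump lkgr
  }
  return { action: 'report-failure', version: latest }; // collected for the PR
}
```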

@MylesBorins (Contributor)

So fwiw I'm still not 100% sold on this, but I really like the approach. The LKGR idea is particularly interesting, but I do fear the complexity of adding platforms / versions and the lookup exploding in size.

One idea I was kicking around was updating the CI job to run both the No-Build job and the Latest job and compare the differences.

TBH most of the failures we see are flakes / timeouts that are specific to infra... it is really hard to lock those down
