Proposal: Lock module version in lookup.json #407

Open
BethGriggs opened this issue May 12, 2017 · 24 comments

@BethGriggs (Member)

As we test more modules in CitGM, triaging failures does not seem to be scaling.

With any failure we need to work out whether it was caused by:

  • the change in Node we're testing
  • the module/module test suite being updated
  • flaky tests

To make life easier I would suggest that we:

  • Lock the modules to specific (known working) versions in the lookup.json (with appropriate flaky tags if necessary).
  • Allow module authors to raise PRs to update the versions (when they feel like it). We can update modules ourselves in batches too (perhaps at weekly or monthly intervals).

This means that instead of constantly having to stay on top of every module and module-test-suite update, and regularly having to mark some of these tests as flaky, we could handle updates in batches at a regular interval.
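
For illustration, a pinned entry could look something like the sketch below. This is only a rough idea of the shape it might take: the version field does not exist in today's lookup schema, and the module name and version number are made-up examples.

```js
// Hypothetical sketch only: a "version" field like this is not part of the
// current lookup schema, and "lodash"/"4.17.4" are made-up examples.
const lockedEntry = {
  lodash: {
    npm: true,         // install from the npm registry, as today
    version: '4.17.4'  // pinned, known-working version for this module
  }
};
```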

@jasnell (Member) commented May 12, 2017

Yeah, moving targets are never fun. I'm +1 on this.

@gibfahn (Member) commented May 12, 2017

@MylesBorins I'd be interested to know your thoughts on this.

I'm +1 on this. If we can encourage module authors to update their modules here then that would be an added bonus.

@MylesBorins (Contributor) commented May 12, 2017 via email

@jasnell (Member) commented May 12, 2017

@MylesBorins ... how would you feel about an approach that used two lookup tables in CI... one with a last-known-good configuration of modules, and one with the current-set? Running differentials between those could help us track down regressions quite easily.
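
Something like this rough sketch, say (not an existing citgm feature, and the { moduleName: 'pass' | 'fail' } result shape is just an assumption):

```js
// Rough sketch of the two-table differential idea; not an existing citgm
// feature. The { moduleName: 'pass' | 'fail' } result shape is assumed.
function diffRuns(lastKnownGood, current) {
  const regressions = [];
  for (const moduleName of Object.keys(current)) {
    if (current[moduleName] === 'fail' && lastKnownGood[moduleName] === 'pass') {
      regressions.push(moduleName); // only broke against the current set
    }
  }
  return regressions;
}

// e.g. diffRuns({ lodash: 'pass' }, { lodash: 'fail' }) => ['lodash']
```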

@MylesBorins (Contributor) commented May 12, 2017 via email

@gdams (Member) commented May 12, 2017

I am kind of with @MylesBorins on this; the whole point of citgm is that we catch module failures as soon as they are released. I agree that having two lookup tables could perhaps work, but it could get messy, as you may have different versions being compatible with different platforms.

@MylesBorins (Contributor) commented May 13, 2017

OK, so I'm thinking about this a bit more, specifically about the high-level problem: people don't want to use CITGM because of the number of false negatives, and because it is not intuitive to read the results.

In the past I had been more on top of keeping the flakiness tagging in the lookup table up to date. I would run CITGM on each release line, review the results, and update the lookup as appropriate. An obvious solution would be to make this a task that people sign up for... (or possibly an expected part of using CITGM if you experience flakiness).

Another problem is knowing whether something is truly flaky. @gdams had started work on a stress-test feature for CITGM that would make it easier for us to run a module multiple times to figure out just how flaky it is.

Perhaps we can even find a way to automate the above process, or at least make it far easier to verify whether a module is flaky or not (multiple failures auto-update the lookup, for example).

@BethGriggs thanks for bringing this up! Obviously there is pushback from the collaborators on using this tool regularly, and we should definitely work on figuring out how to make it more user friendly.

@gibfahn (Member) commented May 13, 2017

The whole point of citgm is to find failures

the whole point of citgm is that we catch module failures as soon as they are released

I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.

It is inconvenient when they break, but isn't that kind of the point?

CitGM is not useful unless it's green. I don't think most breaking changes will be discovered only on the absolute latest module versions, so being a month or two (at most) behind the latest module should be fine (if anything, older module versions probably use more deprecated features).

and it sounds like quite a bit of work to implement / maintain.

Implementing this should just mean adding a version to the lookup.json; maintaining it means running with the latest versions and PRing the update. We could have a tool that auto-updates the lookup to the latest version if it passes.
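
A minimal sketch of what such a tool could look like, assuming a hypothetical version field in the lookup and assuming citgm accepts an npm spec like lodash@4.17.4 (both assumptions worth verifying before building on this):

```js
// Sketch only: assumes a hypothetical "version" field in lookup.json and
// that `citgm name@version` is a valid invocation; verify both assumptions.
'use strict';
const { execSync } = require('child_process');
const fs = require('fs');

const lookupPath = 'lookup.json'; // path is an assumption
const lookup = JSON.parse(fs.readFileSync(lookupPath, 'utf8'));

for (const name of Object.keys(lookup)) {
  const latest = execSync(`npm view ${name} version`).toString().trim();
  if (lookup[name].version === latest) continue; // already up to date
  try {
    execSync(`citgm ${name}@${latest}`, { stdio: 'inherit' });
    lookup[name].version = latest; // passed: bump the pinned version
  } catch (err) {
    console.error(`${name}@${latest} failed; keeping ${lookup[name].version}`);
  }
}

// A human (or a bot) would then open a PR with the updated lookup.json.
fs.writeFileSync(lookupPath, JSON.stringify(lookup, null, 2) + '\n');
```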

We should likely focus our efforts on finding out why our CI infra has false negatives that don't exist on other machines

Sure, but we have to do that anyway; this is just one thing we can do to try to insulate people running CitGM on their core PRs from having to deal with transient module issues.

What I tend to see is someone running CitGM, getting a bunch of failures, and then pinging @nodejs/citgm to ask whether they're expected. And I don't think we've been very good at answering those people.

IMO the possible reasons we might not want to do this are:

  • Updating the versions will be more effort than fixing issues
    • Locking the versions means that CitGM stays green while we resolve any issues.
  • There aren't enough module issues for it to be worth it
    • I'd say there are.

@MylesBorins (Contributor)

I disagree pretty strongly with this. The whole point of CitGM is to smoke-test whether Node.js core changes break the community. The purpose of CitGM isn't to fix module issues, the modules we test have their own CI.

When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is. If it is broken, it is broken

@gibfahn (Member) commented May 14, 2017

When I run CITGM on a release I care that we are testing against what people will get when they run npm install. That is whatever the latest version of a module is.

I would assume the vast majority of installs come from a module range specified in a package.json. Have any of the issues that CitGM has turned up come from the most recent version of a module?

If it is broken, it is broken

Okay, but if it is broken we just skip it in the lookup. It's not like we hold the release until all the modules are fixed.

If we had a mechanism to automatically update the lookup, raise a PR, and run CI on the latest release (e.g. v7.10.0 right now) every week, would that ease the issue? It could even be triggered before a release.

@MylesBorins (Contributor)

@gibfahn I think we can definitely introduce a more aggressive policy for skipping modules. Technically, at this point any collaborator can make changes to the lookup without requiring any sign-off.

What we really need are regular audits of the release lines with CITGM, where we ignore flakes and keep the lookup up to date. This could potentially be automated.

@gdams (Member) commented May 22, 2017

#321 is a start to keeping our lookup up to date

@refack (Contributor) commented May 22, 2017

IMHO multiple tables make sense: a sparse one for the general case, and a locked-version one for each release line. This would allow us to differentiate between our regressions and the modules' regressions.
A version-specific lookup.json could sit in https://github.com/nodejs/node/tree/master/test and get peer-reviewed PRs and semantic versioning...

@MylesBorins (Contributor)

I'm a huge -1 on multiple tables. The lookup is the baseline and it is what we run off of. A separate table does not solve my concerns above regarding not keeping up to date... it simply doesn't enforce them in the main citgm.

I think the wiki has been a really good first take fwiw

@refack (Contributor) commented May 22, 2017

I think the wiki has been a really good first take fwiw

Thanks!
The only caveat is that you still need to know what's what, so I get people asking me "is my CITGM run OK?" and then I essentially do a second manual lookup filter (the difference is that I have version context).

@BridgeAR (Member)

I personally think that CITGM would have much higher value if it were green on average, even if the versions tested are not all up to date. The point is that they should still reflect most of the user base even if they are older. Let us get more modules in and get a better guarantee from that, rather than from the newest version.

We should still use a lookup table for the newest versions though, e.g. once every two months.
Ideally CITGM could use something like greenkeeper.io, but I do not see a trivial way to implement that, and it would also not help against flaky test suites.

@mhdawson (Member)

I agree we need to figure out how to let more people run this usefully. To be useful it needs to be green most of the time when people need/want to run it. It would be interesting to see if a version with fixed module versions would be more stable or not.

Running something like a "stable" and a "canary" set, where new versions of modules get promoted from "canary" to "stable" once they have passed for some period of time, is a model that might balance the need to test new versions of modules against allowing the tool to be more broadly used.
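
A rough sketch of the promotion rule (the entry shape and the pass threshold are assumptions; nothing like this exists in citgm today):

```js
// Sketch of a "canary" -> "stable" promotion rule. The entry shape and the
// three-consecutive-passes threshold are assumptions, not existing behaviour.
const REQUIRED_PASSES = 3;

function promote(entry) {
  // entry: { stableVersion, canaryVersion, canaryConsecutivePasses }
  if (entry.canaryConsecutivePasses >= REQUIRED_PASSES) {
    return Object.assign({}, entry, {
      stableVersion: entry.canaryVersion, // the canary has proven itself
      canaryConsecutivePasses: 0
    });
  }
  return entry; // keep testing the canary version
}
```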

@MylesBorins (Contributor)

I once again am going to push back on this.

As the person doing the majority of the work with CITGM and the majority of the updates to the lookup.json, and as one of the primary people using the tool day to day, I am not convinced that we would keep the lookup table up to date, and I am concerned that it would fall back on me to maintain it.

We do get failures, and it is a mixed bag of reasons.

Currently the majority of failures have been due to flakes or infra issues on our side. In fact, the 6.x line is for the most part green on citgm.

All of that being said, if someone is willing to do the work to implement installing specific versions, I am willing to give it a try to see if it improves things; landing commits and testing the workflow is not a massive burden. One thing to keep in mind is that we should likely support tagged majors only, as a non-trivial number of npm modules are installed in a way that auto-updates to the latest minor.

@gibfahn (Member) commented Oct 24, 2017

I am not convinced that we would keep the lookup table up to date and am concerned that it would fall back on me to maintain it.

Yeah, this is definitely the core issue. I would say the maintenance burden is likely to be the same either way (modules break pretty frequently in my experience), especially if we could get some automation.

However we might also fall into a false sense of security, where we think everything's fine because some ancient version of a module is green, but actually latest is broken (and we just never update the lookup version because the latest one is broken).

Alternative proposal

How about we do something more along the lines of:

Have an 'lkgr' (last known good) version in the lookup. First run the tests on latest, and if they fail there, fall back to the lkgr. That way we should be able to quickly see that:

  • Fail, Pass => Module broken by module update
  • Fail, Fail => Module broken by node update

Obviously it won't work if the module is flaky, but nothing works if the module is flaky, so 🤷‍♂️ .
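
In code the fallback would be something like this sketch, where runCitgm is a hypothetical helper that returns true for a passing run (not an existing API):

```js
// Sketch of the latest-then-lkgr fallback. runCitgm() is a hypothetical
// helper that returns true when a citgm run passes; not an existing API.
function classify(moduleName, lkgrVersion, runCitgm) {
  if (runCitgm(`${moduleName}@latest`)) {
    return 'pass';                          // latest is fine, nothing to do
  }
  return runCitgm(`${moduleName}@${lkgrVersion}`)
    ? 'module broken by module update'      // Fail, Pass
    : 'module broken by node update';       // Fail, Fail
}
```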

@MylesBorins (Contributor) commented Oct 24, 2017 via email

@gibfahn (Member) commented Oct 24, 2017

would you propose we keep the lookup updated?

Ideally we'd have

How do we handle lkgr being different across release lines and platforms

I can definitely see that snowballing out of control. I think for platforms we should just control it with "lkgr for all non-flaky platforms".

For release lines we might need multiple versions sometimes.

@MylesBorins (Contributor) commented Oct 24, 2017

For release lines we might need multiple versions sometimes.

this leaves me with a feeling that keeping track of this stuff is going to explode in complexity

gibfahn self-assigned this Oct 26, 2017
@al-k21 (Contributor) commented Feb 23, 2018

Proposal:

  1. Leave CITGM as it is now, defaulting to getting the version from npm.
  2. Add an option --run-lkgr:
    • this would use the lkgr version from the lookup instead of getting it from npm
  3. Add an option --update-lkgr:
    • check if the latest version from npm is the same as the lkgr in the lookup:
      same version -> no need to run the test
      new version -> if the module succeeds, update lkgr; if it fails, save its name
    • Raise a PR with all changes to the lookup and print a list of failing modules.

1. We would care about:

testing against what people will get when they run npm install. That is whatever the latest version of a module is.

2. We should see citgm green while doing the release if node is not breaking any module; otherwise this would be an indicator of a potential issue with node.
3. If a newer version of a module fails on the same version of node on which the older version passes, this would be an indicator of a problem with the module.

There is much more to consider (lkgr per platform/node version, etc.), but having citgm green when doing the release would be ideal.
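
The per-module decision for the proposed --update-lkgr option could be sketched like this (latestFromNpm and testPasses are hypothetical helpers, and the action names are only illustrative):

```js
// Sketch of the per-module decision in a hypothetical --update-lkgr mode.
// latestFromNpm() and testPasses() are assumed helpers, not existing APIs.
function decide(name, lkgrVersion, latestFromNpm, testPasses) {
  const latest = latestFromNpm(name);
  if (latest === lkgrVersion) {
    return { action: 'skip' };                    // same version -> no need to run
  }
  if (testPasses(`${name}@${latest}`)) {
    return { action: 'update-lkgr', to: latest }; // new version passed -> bump lkgr
  }
  return { action: 'report-failure', version: latest }; // collected for the PR
}
```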

@MylesBorins (Contributor)

So fwiw I'm still not 100% sold on this, but I really like the approach. The LKGR idea is particularly interesting, but I do fear the complexity of adding platforms / versions and the lookup exploding in size.

One idea I was kicking around was updating the CI job to run both the No-Build job and the Latest job and compare the differences.

TBH most of the failures we see are flakes / timeouts that are specific to infra... it is really hard to lock those down
