Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flakey Couch DB test #4462

Closed
akhenry opened this issue Nov 15, 2021 · 8 comments
Closed

Investigate flakey Couch DB test #4462

akhenry opened this issue Nov 15, 2021 · 8 comments
Labels
type:maintenance tests, chores, or project maintenance
Milestone

Comments

@akhenry
Copy link
Contributor

akhenry commented Nov 15, 2021

Summary

A test that was recently modified for the Couch DB plugin has started failing randomly. Although it passed automated tests when originally merged, it subsequently failed in master, and then failed in other pull requests after then merged from master. Some investigation is needed to understand why it's failing.

Steps to Reproduce

  1. git checkout master
  2. Re-enable the flakey test
  3. npm run test
  4. Observe the test failure
  5. If the test does not fail, repeat step 3 until it does.
@akhenry
Copy link
Contributor Author

akhenry commented Nov 15, 2021

@scottbell Do you mind taking a look?

@scottbell
Copy link
Contributor

I'll take a look and see if I can work out what's going on. I was able to replicate it locally before the CouchDB change by checking out a previous commit:
d10561f
and run npm run test:debug until the test fails:

      ✗ updates an object (14ms)
	Error: Expected spy update to have been called.
	    at <Jasmine>
	    at eval (webpack-internal:///./src/plugins/persistence/couch/pluginSpec.js:124:45)

I suspect there's a timing issue going on here, but unclear where.

@scottbell
Copy link
Contributor

I think I found a clue. It seems OpenMCT thinks the object is already persisted between tests, even though we're resetting OpenMCT between tests:

afterEach(async () => {
        await resetApplicationState(openmct);
    });

Perhaps resetApplicationState isn't waiting for all the objects to be cleared before returning...

@scottbell
Copy link
Contributor

scottbell commented Nov 16, 2021

One save in OpenMCT kicks off a big chain of saves & mutates:

    at ObjectAPI.save (webpack-internal:///./src/api/objects/ObjectAPI.js:351:21)
    at ObjectAPI.mutate (webpack-internal:///./src/api/objects/ObjectAPI.js:491:14)
    at ObjectAPI.save (webpack-internal:///./src/api/objects/ObjectAPI.js:369:18)
    at ObjectAPI.mutate (webpack-internal:///./src/api/objects/ObjectAPI.js:491:14)
    at ObjectAPI.save (webpack-internal:///./src/api/objects/ObjectAPI.js:361:22)
    at async UserContext.eval (webpack-internal:///./src/plugins/persistence/couch/pluginSpec.js:136:28)

save can call mutate, which in turn calls save, but neither is awaiting the other, so I think this may be the source of the issue. E.g., in mutate:

if (this.isTransactionActive()) {
        this.transaction.add(domainObject);
    } else {
        this.save(domainObject);
    }

and in save:

this.mutate(domainObject, 'persisted', persistedTime);

These save calls ground out into a asynchronous queue in the CouchDB adapter.

@scottbell
Copy link
Contributor

scottbell commented Nov 17, 2021

Ok, I think I know what's going on. Per the above 👆 comment, whenever one calls save or update, this actually causes a chain of save & mutate events due to the persistedTime being mutated, which in turns calls another save, etc.:
https://github.com/nasa/openmct/blob/master/src/api/objects/ObjectAPI.js#L352
https://github.com/nasa/openmct/blob/master/src/api/objects/ObjectAPI.js#L362
Depending on how fast the Javascript engine processes everything, the second order save commands to persist the persistedTime (woof, very meta), can cause the hasAlreadyBeenPersisted to return true:
https://github.com/nasa/openmct/blob/master/src/api/objects/ObjectAPI.js#L669
as persistedTime is now greater than the original modifiedTime passed by the original save call:
https://github.com/nasa/openmct/blob/master/src/plugins/persistence/couch/pluginSpec.js#L116
OpenMCT now thinks the Object has already been persisted, and thus the original save returns without calling the provider.update function.
A quick and dirty fix just for the test is to set this modified time well in the future, e.g.:

mockDomainObject.modified = Date.now() + 1000;

But I'm wondering if we've discovered a bug. Namely, are we doing the right thing with persistedTime? Shouldn't this be an atomic transaction? Or maybe just something the provider handles so the subsequent save/mutate cycles doesn't start?

What do you think @akhenry?

@scottbell
Copy link
Contributor

@akhenry If we're happy with the dirty fix, I have a PR for it here:
#4473

@scottbell
Copy link
Contributor

@shefalijoshi this can be closed per our discussion.

@shefalijoshi
Copy link
Contributor

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:maintenance tests, chores, or project maintenance
Projects
None yet
Development

No branches or pull requests

5 participants