New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky failure mode when running tests #1874

Open
maherbeg opened this Issue Oct 5, 2016 · 38 comments

Comments

Projects
None yet
@maherbeg

maherbeg commented Oct 5, 2016

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
Very rarely (as in we've only seen this happen once so far), we see a failure when running a test like so

FAIL __tests/test-file.js
   ● Test suite failed to run

     b.js:94
                         /* istanbul ignore next */_react2.def
                                                           ^^^
     SyntaxError: missing ) after argument list

       at transformAndBuildScript (node_modules/jest-runtime/build/transform.js:284:10)
       at Object.<anonymous> (a.js:5:47)

Where the line number for a.js is an ES6 import for b.js. We have no instanbul ignore comments in our code base.

What is the expected behavior?
I don't expect this strange type of failure. Because it's only happened once so far, I have nothing useful for reproduction unfortunately. I presume there's a strange race condition with the transform and test execution, but I'm not sure what.

Happy to provide any more info!

@maherbeg

This comment has been minimized.

maherbeg commented Oct 7, 2016

Also another sporadic, random test failure in transform with something that usually works

     /vendor/jquery-2.1.0.min.js:4
     }});
     ^
     SyntaxError: Unexpected token }

       at transformAndBuildScript (node_modules/jest-runtime/build/transform.js:284:10)
       at Object.<anonymous> (src/compliance/modules/subjects.js:281:739)
@cpojer

This comment has been minimized.

Contributor

cpojer commented Oct 7, 2016

seems like @Daniel15 ran into this at FB too.

@Daniel15

This comment has been minimized.

Member

Daniel15 commented Oct 7, 2016

Yeah I encountered the same thing, but very sporadically (maybe 1 in 30 or 40 test runs). It's failing at the exact same location (transform.js:284:10) every time

    /..../html/shared/react/renderers/dom/client/eventPlugins/SimpleEventPlugin.js:274
    }});
      ^
    SyntaxError: Unexpected token )

      at transformAndBuildScript (scripts/jest-config/node_modules/jest-runtime/build/transform.js:284:10)
      at Object.inject (html/shared/react/renderers/dom/shared/ReactDefaultInjection-core.js:61:19)
      at Object.<anonymous> (html/shared/react/renderers/dom/ReactDOM-core.js:28:34)
@aaronabramov

This comment has been minimized.

Member

aaronabramov commented Oct 13, 2016

the only race condition that might take place i can think of is writing the same transformed file to cache at the same time from two different child processes.

fs.WriteFileSync(cacheFilename, transformedSource)

can that be the case?

@neoziro

This comment has been minimized.

Contributor

neoziro commented Dec 24, 2016

I have the issue on my repository, it happens very very often:

https://travis-ci.org/neoziro/recompact/builds/186529118

Not in local, but on travis, a race condition. I will try to fix it in jest, hints are welcome.

@neoziro

This comment has been minimized.

Contributor

neoziro commented Dec 27, 2016

I found a workaround to fix it. Force execution in the same thread using -i or --runInBand option.

https://facebook.github.io/jest/docs/troubleshooting.html#tests-are-extremely-slow-on-docker-and-or-continuous-integration-server

sapegin added a commit to styleguidist/react-styleguidist that referenced this issue Jan 6, 2017

@michaelAndersonCampingWorld

This comment has been minimized.

michaelAndersonCampingWorld commented Feb 2, 2017

I get this same error when in CI. I believe it's exacerbated by slow CPUs. I only see the problem locally when my laptop is being thermally throttled, but I see it quite a bit on codeship where run times are orders of magnitude longer than they are locally.

Test suite failed to run

/home/rof/src/github.com/GoodSamEnterprises/RVCare/node_modules/asn1/lib/ber/writer.js:210
encodeOctet(bytes,parseIn
^^^^^^^
SyntaxError: missing ) after argument list

at transformAndBuildScript (node_modules/jest-runtime/build/transform.js:320:12)
at Object.<anonymous> (node_modules/asn1/lib/ber/index.js:7:12)
at Object.<anonymous> (node_modules/asn1/lib/index.js:6:9)
@jakubzitny

This comment has been minimized.

jakubzitny commented Apr 7, 2017

Any update on this, anyone? We're still getting this on CI, it's quite annoying.. 😭

@aaronabramov

This comment has been minimized.

Member

aaronabramov commented Apr 7, 2017

i tried to debug it and stress test with multiple concurrent writes and wasn't able to reproduce. What i noticed is that it almost never happens on hot cache. What happens on our CI servers is that we get this failure on the first day after we check in a new test, and then it goes away after a few days.

@jakubzitny

This comment has been minimized.

jakubzitny commented Apr 8, 2017

Thanks for the info @dmitriiabramov. How do you cache on CI though? We have a fresh docker image for each test job on (GitLab) CI. I am not sure how to cache there, do you have any suggestions on that, please?

We deal with this by manually retrying the test that fails on this. It happens roughly 1 in 10 CI pipelines. Not sure if it's related to CI runner load.

I saw that you added logTransformErrorsconfig in #2159. Did it reveal anything interesting or not at all? I am open to help with further debugging this if you point me in the right direction.

targos added a commit to image-js/image-js that referenced this issue Apr 26, 2017

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 8, 2017

I have the following flaky error:

SyntaxError: Unexpected token ILLEGAL

at transformAndBuildScript (node_modules/jest-runtime/build/transform.js:320:12)
at Object.<anonymous> (node_modules/lodash-es/lodash.js:156:62)

So might it be related to using the --coverage flag?

@thymikee

This comment has been minimized.

Collaborator

thymikee commented May 8, 2017

This is unrelated. It seems like lodash-es is not compiled to ES5, therefore you need to either import regular lodash or whitelist lodash-es so Jest can compile it with babel or other compiler of your choosing, look here for details http://facebook.github.io/jest/docs/en/tutorial-react-native.html#transformignorepatterns-customization

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 8, 2017

@thymikee This doesn't explain why it's flaky.

I have the following babel transform:

var babel = require('babel-jest').createTransformer({
  babelrc: false,
  presets: [
    'react',
    ['es2015', {
       // import statements are supported since node 7
       // TODO switch to false after upgrade
       'modules': 'commonjs'
    }],
    'stage-1'
  ],
  plugins: ['transform-decorators-legacy']
});
module.exports = babel;

And it is added to exceptions:

"transformIgnorePatterns": ["/node_modules\/(?!(react-tree-menu|lodash-es)).*/"],

And it is not specific to lodash. I have this in ./src folder too.

@thymikee

This comment has been minimized.

Collaborator

thymikee commented May 8, 2017

Oh, sorry then!

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 8, 2017

BTW the line numbers are quite strange:

node_modules/lodash-es/lodash.js:156:62

Does it do transform in memory?

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 11, 2017

@dmitriiabramov

the only race condition that might take place i can think of is writing the same transformed file to cache at the same time from two different child processes.

fs.WriteFileSync(cacheFilename, transformedSource)
can that be the case?

Oh, so does it write to file from multiple processes without any mutexes? This will definitely cause corruption!

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 11, 2017

Maybe no-cache will help?

Retry with --no-cache. Jest caches transformed module files to speed up test execution. If you are using your own custom transformer, consider adding a getCacheKey function to it: getCacheKey in Relay.

I'm using require('babel-jest').createTransformer. Maybe that one has the bug.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 11, 2017

Looks like the default cache directory is /tmp:
https://facebook.github.io/jest/docs/configuration.html#cachedirectory-string

That would explain why it is reproducible on CI much frequently. So to reproduce you should run two Jest processes with exactly the same settings. And don't forget to clean cache before that.

@jakubzitny

This comment has been minimized.

jakubzitny commented May 11, 2017

@Vanuan We have Docker on CI — everything runs in its own container so there are not multiple processes writing to the same file. And we do see this on CI frequently.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 11, 2017

@jakubzitny Each jest run creates multiple processes unless you specify --runInBand.

I'm just pointing out why you can't reproduce this locally: the default cache directory is in /tmp. So unless you clean it up before each run you wouldn't be able reproduce.

I'm commenting this:

What happens on our CI servers is that we get this failure on the first day after we check in a new test, and then it goes away after a few days.

Looks like they use a global cache directory at facebook's CI servers. I.e. /tmp or whatever cache directory used is never cleaned. That would speed up things but hide such race condition bugs.

I also use docker. But I mount container's /tmp to host /tmp. Maybe you do this as well?

In either case there's a race condition when cache doesn't exist.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 12, 2017

Unfortunately, there's no file locking in node, so the use of thirdparty libraries is required. The cross-platform alternative is TCP port locking but that would be more troublesome for user.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 12, 2017

Probably the best thing we can do now is --no-cache. But there should be a comparison whether --runInBand is slower.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented May 12, 2017

Comparison for my set up:

  1. flaky: 50 seconds
  2. no cache: 110 seconds
  3. run in band: out of memory (looks like my tests are leaking memory)
@zamotany

This comment has been minimized.

Contributor

zamotany commented Jul 6, 2017

I'm also having similar issue with the exception being this it is run in monorepo (lerna). If jest is installed in each package separately, it's working fine, but if run from parent directory (monorepo directory) it fails when transforming files. You can see the output from CI here: https://circleci.com/gh/TrueCar/gluestick/2112 (I'm running it with --no-cache and -i).

Possible workaround:
It works with a custom transformer.
In jest config add:

"transform": {
  "\\.js$": "<rootDir>/jestTransform.js"
},

and in jestTransform.js:

const babel = require('babel-core');
const fs = require('fs');

function getBabelRc(path) {
  const pathToBabelRc = '<path-to-babel-rc>';
  return Object.assign({ retainLines: true }, JSON.parse(fs.readFileSync(pathToBabelRc)));
}

module.exports = {
  process(src, path) {
    if (/\.js$/.test(path)) {
      return babel.transform(src, getBabelRc(path));
    }
    return src;
  },
};

@jgreubel

This comment has been minimized.

jgreubel commented Jul 14, 2017

Just had my first apparent flaky failure (first in 6~ months). Here's the error I received:

Summary of all failing tests
FAIL  <test-name-here.spec.js>
● Test suite failed to run

 var /* istanbul ignore
      ^^^^^^^^^^^^^^^^^^
  
  SyntaxError: Invalid or unexpected token
    
    at ScriptTransformer._transformAndBuildScript (node_modules/jest-runtime/build/ScriptTransformer.js:289:17)
    at Object.<anonymous> (src/b.js:5:46)
    at Object.<anonymous> (src/a.js:50:600)

I would also guess that the line on a.js in the stack trace would be an ES6 import for an export in b.js

My jest config:

{
  "collectCoverage" : true,
  "collectCoverageFrom" : ["src/**/*.js", "!src/index.js", "!src/common/Colors.js", "!src/polyfills.js"],
  "coverageReporters" : ["html"],
  "setupTestFrameworkScriptFile": "<rootDir>/src/polyfills.js",
  "moduleNameMapper": {
    "\\.png$": "<rootDir>/src/common/FileMock.js"
  }
}

jest: v20.0.4
babel-jest: v20.0.3

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 20, 2017

the only race condition that might take place i can think of is writing the same transformed file to cache at the same time from two different child processes.

fs.WriteFileSync(cacheFilename, transformedSource)

can that be the case?

Yes. Or more like, if a process B is reading a file while A is writing, then B will get a corrupted/partial file in memory. That could easily explain syntax errors that are observed.

There is no need for a mutex to read/write to the cache, because we can leverage rename(3) (and equivalents) as being an atomic operation. Writes should be atomic, and reads should verify the validity of the data against a hash.

Using a lock like in #3561 can work too but being more complicated, is susceptible to additional concurrency issues. Besides, reading the data also needs to lock the same mutex, because you want to make sure you read fully-written files. Regardless, a checksum needs to be added to verify the validity of the data.

@Villar74

This comment has been minimized.

Villar74 commented Jul 24, 2017

tried to fix this, but i cant understand what's wrong
`
Test suite failed to run

/home/makswork/is74/lkapp/node_modules/react-native/Libraries/BatchedBridge/BatchedBridgedModules/NativeModules.js: Unexpected token, expected , (17:35)
    15 | const RemoteModules = BatchedBridge.RemoteModules;
    16 | 
  > 17 | function normalizePrefix(moduleName: string): string {
       |                                    ^
    18 |   return moduleName.replace(/^(RCT|RK)/, '');
    19 | }
@Vanuan

This comment has been minimized.

Contributor

Vanuan commented Jul 24, 2017

@Villar74 Doesn't look like related to this issue. It looks like you're trying to run jest without babel transformation.

@Villar74

This comment has been minimized.

Villar74 commented Jul 25, 2017

@Vanuan Thx, i just need some help with setup jest

@cpojer

This comment has been minimized.

Contributor

cpojer commented Jul 26, 2017

Please feel free to upgrade to jest@20.1.0-delta.3 which should contain a fix for this issue thanks to @jeanlauliac.

@cpojer cpojer closed this Jul 26, 2017

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 26, 2017

I think we could release a new revision for jest-runtime@20.0 so that we get the fix everywhere. I'm a little stressed this could cause regression, but at worst we can always revert.

@Daniel15

This comment has been minimized.

Member

Daniel15 commented Jul 27, 2017

@jeanlauliac - Thank you so much for fixing this! It's been broken for a while and I still sporadically hit the same problem, so I really appreciate it 😃

@aaronabramov

This comment has been minimized.

Member

aaronabramov commented Aug 8, 2017

i'm still getting a lot of these errors at fb after the latest update. Seems like it's still not fixed :(

@onetrickwolf

This comment has been minimized.

onetrickwolf commented Aug 29, 2017

Any update on this? I am getting intermittent failures on circleci (seems to be during times of load and always when max workers are set to 1). Tried updating to 20.1.0-echo.1 but does not seem to be working.

Have to rebuild several times to get tests to work even though they work every time on laptop. Any help would be appreciated.

@aaronabramov aaronabramov reopened this Aug 29, 2017

@MattCopenhaver

This comment has been minimized.

MattCopenhaver commented Aug 30, 2017

I am also getting intermittent test failures. My team first noticed this at roughly the same time as we upgraded react, react-dom, and react-addons-test-utils from 0.14.3 to 15.4.0 and enzyme from 2.4.1 to 2.9.1 (maybe a few days after). I'm not sure if this is related or not.

We have a test suite of about 2800 tests (not all are testing react components though). At least one test fails very rarely on local workstations, but fairly often on our CI server. It does seem to be related to load on the workstation or server.


Here is an example of an error message:
Method “simulate” is only meant to be run on a single node. 0 found instead.
Which is an enzyme error thrown when you try to run .simulate('click') on something that doesn't exist.

In this particular example, we run:
wrapper.find('li').simulate('click');
where wrapper is an enzyme.mount object, and get that error.

However, when I console.log(mountedDOM.debug());, the li is clearly there.

Most, if not all, of the tests that fail intermittently are trying to interact the an enzyme.mount object.


Hopefully this provides some more data that will help diagnose this. We aren't really sure if it's react, jest, enzyme, something in our project config, or something environment related that is causing this problem.

@Vanuan

This comment has been minimized.

Contributor

Vanuan commented Aug 31, 2017

That doesn't look like the original issue. It's something different, related to jsdom.
The original issue was due to filesystem write race condition. I suggest creating another issue and renaming this one to "Corrupted compile cache".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment