New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jest-runtime: atomic cache write, and check validity of data #4088

Merged
merged 1 commit into from Jul 25, 2017

Conversation

Projects
None yet
9 participants
@jeanlauliac
Contributor

jeanlauliac commented Jul 20, 2017

This change tries to address what may be a cause for #1874, where I posted some details on the approach. By doing atomic writes we ensure there's no cache file ending up being a mix of two transformed code files, and we limit the concurrency issues of having a file read and written at the same time. A prior PR #3561 tries to address the problem using a lock, but locks bring additional complexity that this change tries to avoid (ex. deadlocks).

This change also adds a checksum, because there can be other processes still writing non-atomically to cache files. Additional, not all filesystems may support atomic renames (what atomic-write-file relies on).

This change incurs a slight performance cost that is the additional I/O call to rename the files. However I believe it is less costly than a lock solution. I don't think this should have much effect even on processing several thousands of files.

Test plan

It is quite challenging to test for concurrency issues, so this change relies on the knowledge that writeFileSync is not atomic and that corruption could have the observed effects. I rely on the existing automated testing to ensure that the caching behaviour and jest-runtime in general is working as expected.

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 20, 2017

Tests are failing, I'll fix!

@jeanlauliac jeanlauliac requested a review from cpojer Jul 20, 2017

@jeanlauliac jeanlauliac force-pushed the jeanlauliac:atomic-cache-writes branch from fc68a1d to bef5880 Jul 20, 2017

@cpojer cpojer referenced this pull request Jul 20, 2017

Closed

Fix #1874 race condition #3561

@jeanlauliac jeanlauliac force-pushed the jeanlauliac:atomic-cache-writes branch from bef5880 to f102dba Jul 20, 2017

@cpojer

cpojer approved these changes Jul 20, 2017

Approving, pending the issues with CI.

@@ -43,6 +44,9 @@ const configToJsonMap = new Map();
// Cache regular expressions to test whether the file needs to be preprocessed
const ignoreCache: WeakMap<ProjectConfig, ?RegExp> = new WeakMap();
// To reset the cache for specific changesets (rather than package version).
const CACHE_BREAKER = '1';

This comment has been minimized.

@BYK

BYK Jul 21, 2017

Member

CACHE_VERSION may be a better name

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 24, 2017

Looking into this today. Not sure why it fails CI still

@jeanlauliac jeanlauliac force-pushed the jeanlauliac:atomic-cache-writes branch from 9752fe4 to 91a5f31 Jul 24, 2017

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 24, 2017

I don't know how it's possible, but apparently adding write-file-atomic as a dependency affects other, unrelated dependencies, making the test to break because the coverage file is written a bit differently. Maybe it's an unsafe test in a sense it reads "private" data? I'm not familiar with the coverage-final.json file. @cpojer normally this version should now pass CI. But, I had to fix the unrelated test, do you have context about it by any chance?

@@ -29,7 +29,7 @@ it('maps code coverage against original source', () => {
// reduce absolute paths embedded in the coverage map to just filenames
Object.keys(coverageMap).forEach(filename => {
coverageMap[filename].path = path.basename(coverageMap[filename].path);
coverageMap[filename].data.path = path.basename(coverageMap[filename].data.path);

This comment has been minimized.

@jeanlauliac

jeanlauliac Jul 24, 2017

Contributor

That's the shady change. It's like we were using a completely different version of the "coverage module", whichever it is; except that this changeset doesn't make any change related to coverage. See also how the snap file is pretty different.

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 24, 2017

I changed my mind, I think my code genuinely breaks something. Trying to figure it out.

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 24, 2017

Got it, it's because we use return paths from inside the jest cache itself for the source maps. When code outside tries to read these files, naturally it fails because of the hash header. To fix this we can write hashes to different files. The problem is that it'll cause the number of files to double, something I'm not too sure is acceptable for some of our big repos.

EDIT: actually it's not acceptable because when reading source maps we should also check the hash.

@cpojer

This comment has been minimized.

Contributor

cpojer commented Jul 25, 2017

What's the action plan there?

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 25, 2017

Ideally, get rid of the code that return a path, and return a memory-based hash table instead. But, this will force us to publish a new major release of jest-runtime I think, because that'll be a breaking change.

To avoid a breaking change, we can just write the source map to a temporary file, but this will clutter more and more the temporary file and it never gets cleaned.

@cpojer

This comment has been minimized.

Contributor

cpojer commented Jul 25, 2017

I'm fine with the breaking change. Since the transformer in jest-runtime includes the version number, it shouldn't really matter from any one version to the next what the transform cache looks like. Do you think you could work on this tomorrow so we can tag a release and unblock shipping a bunch of fixes? :)

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 25, 2017

Yes, I'll ship that tomorrow. I'll reduce this PR's scope so that it only has the atomic cache write, that is urgent to ship soon, and I'll do the checksums and the source map fixes in another PR.

@jeanlauliac jeanlauliac force-pushed the jeanlauliac:atomic-cache-writes branch from 91a5f31 to c5f9768 Jul 25, 2017

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 25, 2017

I figured out a way to do it without breaking change. The source maps will not be covered by the checksum verification, but to make them covered we'd have to change the model all across the stack I think, that'd take some more research and time. The transformed code itself is the most important thing that I hope this changeset gets right.

I'll land once CircleCI shows green.

@jeanlauliac jeanlauliac force-pushed the jeanlauliac:atomic-cache-writes branch from c5f9768 to edde90d Jul 25, 2017

@codecov-io

This comment has been minimized.

codecov-io commented Jul 25, 2017

Codecov Report

Merging #4088 into master will increase coverage by 0.04%.
The diff coverage is 88.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4088      +/-   ##
==========================================
+ Coverage   60.32%   60.37%   +0.04%     
==========================================
  Files         195      195              
  Lines        6757     6768      +11     
  Branches        6        6              
==========================================
+ Hits         4076     4086      +10     
- Misses       2678     2679       +1     
  Partials        3        3
Impacted Files Coverage Δ
packages/jest-runtime/src/script_transformer.js 89.92% <88.23%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffd43f7...edde90d. Read the comment docs.

@jeanlauliac jeanlauliac merged commit 2607f2b into facebook:master Jul 25, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@screendriver

This comment has been minimized.

Contributor

screendriver commented Jul 26, 2017

@jeanlauliac thank you for the great work! Any chance to make a hotfix patch release v20.0.5 for that? We are encountering flaky unit tests (#1874) all the time in our CI and can't really rely on Jest at the moment 😕

@thymikee

This comment has been minimized.

Collaborator

thymikee commented Jul 26, 2017

@screendriver if I'm not mistaken, it should already be available under jest@test tag (there may be some breaking changes for you)

@jeanlauliac

This comment has been minimized.

Contributor

jeanlauliac commented Jul 28, 2017

@screendriver did you try to upgrade to jest@test, and did it work for you?

@screendriver

This comment has been minimized.

Contributor

screendriver commented Jul 28, 2017

Unfortunately I could not test it until yet because it is not welcome to use unstable dependencies in our product. I try to test it in a separate branch and give you feedback about it.

@screendriver

This comment has been minimized.

Contributor

screendriver commented Jul 28, 2017

Just a little update: I couldn't test it because ts-jest does not allow the test version as peerDependency. I opened an issue here kulshekhar/ts-jest#282

tushardhole pushed a commit to tushardhole/jest that referenced this pull request Aug 21, 2017

@gdborton

This comment has been minimized.

Contributor

gdborton commented Aug 29, 2017

@cpojer Is there any eta on when a release will be cut that includes this fix?

@cpojer

This comment has been minimized.

Contributor

cpojer commented Aug 29, 2017

You can use jest@test for now.

@adamdicarlo

This comment has been minimized.

adamdicarlo commented Nov 1, 2017

Thank you so much for this fix!

We have a project that was manifesting this issue extremely often on CodeShip under docker. It was behaving as if input source code from node_modules was truncated on a multiple of 4096-bytes boundary, but only on a first run (no transform cache). The problem only seemed to start happening once we introduced a custom transformer, so tracking it down to finally realizing "hey wait jest has multiple processes and it's acting like it's reading a partly written cache file..." was an odyssey. (Spent an entire day on it with a coworker.) We finally came to the issue tracker with the right search terms... such a relief.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment