Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test-fs-utimes failing locally #36591

Closed
targos opened this issue Dec 21, 2020 · 28 comments
Closed

test-fs-utimes failing locally #36591

targos opened this issue Dec 21, 2020 · 28 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. fs Issues and PRs related to the fs subsystem / file system.

Comments

@targos
Copy link
Member

targos commented Dec 21, 2020

  • Version: master, v15.x
  • Platform: Linux 4.18.0-240.1.1.el8_3.x86_64 #1 SMP Thu Nov 19 17:20:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: fs

What steps will reproduce the bug?

Run node test/parallel/test-fs-utimes.js

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

The test should pass

What do you see instead?

$ node test/parallel/test-fs-utimes.js 
node:assert:119
  throw new AssertionError(obj);
  ^

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 2147483647
- 2147483648
           ^
    at Object.<anonymous> (/home/mzasso/git/nodejs/node/test/parallel/test-fs-utimes.js:169:10)
    at Module._compile (node:internal/modules/cjs/loader:1108:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
    at Module.load (node:internal/modules/cjs/loader:973:32)
    at Function.Module._load (node:internal/modules/cjs/loader:813:14)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)
    at node:internal/main/run_main_module:17:47 {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: 2147483647,
  expected: 2147483648,
  operator: 'strictEqual'
}

Additional information

I recently updated to CentOS Stream, but I don't remember if it was already failing before that.

@targos targos added the fs Issues and PRs related to the fs subsystem / file system. label Dec 21, 2020
@richardlau
Copy link
Member

FWIW I saw the same failure on RHEL 8. I don't see it on Fedora 33.

@ntedgi
Copy link
Contributor

ntedgi commented Dec 22, 2020

@targos i would like to take this opportunity

@targos
Copy link
Member Author

targos commented Dec 22, 2020

Sure, feel free to take it!

@PoojaDurgad PoojaDurgad added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Dec 23, 2020
@ntedgi
Copy link
Contributor

ntedgi commented Dec 23, 2020

hi @targos i installed cantos streams 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
and can't reproduce the bug i test master and v15.x branches
and i also cant manage to boot load the same kernel you describe here
this is the next version available x86_64 4.18.0-257.el8

any ideas how to proceed?

@mhdawson

This comment has been minimized.

@Trott

This comment has been minimized.

@Trott
Copy link
Member

Trott commented Mar 10, 2021

@targos Does this still fail for you every time locally? If so, I might open a different issue for the CI failures as they seem to be different and trying to resolve both in one place may be confusing.

@targos
Copy link
Member Author

targos commented Mar 10, 2021

Yes, it still fails everytime locally. I agree that the CI failures seem to be different.

@Trott
Copy link
Member

Trott commented Mar 10, 2021

I've opened #37692 for the CI failures and hidden the comments here (by @mhdawson and me) about them.

@mhdawson
Copy link
Member

@targos I see the same failure on my local RHEL boxes. I'd done some investigation and even opened a Bugzilla to get help. I was particularly confused why our CENTOS system in the CI were fine, while my local instances failed.

The answer was that not all file systems support Y2K38 (ie they will roll over versus being able to handle times beyond that point). Therefore,the test will fail if the file system used for the system does not support Y2K38 and that includes xfs which is the default for some RHEL/CENTOS installs.

A quick google does not show an easy way to test what type the file system is, so I did not have a good idea of how to fix the test and then it got lost in my backlog.

Don't have any ideas on how to address at this point, other than excluding the test which I'm not sure is the right answer.

@Trott
Copy link
Member

Trott commented Mar 11, 2021

The answer was that not all file systems support Y2K38 (ie they will roll over versus being able to handle times beyond that point). Therefore,the test will fail if the file system used for the system does not support Y2K38 and that includes xfs which is the default for some RHEL/CENTOS installs.

If that's the explanation, it's surprising that it's always off-by-one and not something bigger?

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 2147483647
- 2147483648

But if that's the explanation and this is not a bug (off by one second?), we can perhaps permit the off-by-one value as valid. That may also mean that perhaps we can add back in a bunch of the platforms that we skip:

// Test Y2K38 for all platforms [except 'arm', 'OpenBSD', 'SunOS' and 'IBMi']
if (!process.arch.includes('arm') &&
  !common.isOpenBSD && !common.isSunOS && !common.isIBMi) {

@targos
Copy link
Member Author

targos commented Mar 12, 2021

If that's the explanation, it's surprising that it's always off-by-one and not something bigger?

Yeah, I don't know why it's like that, but the behavior is that any value greater than 2**31-1 is converted to 2**31-1.

I found https://www.phoronix.com/scan.php?page=news_item&px=XFS-Linux-5.10

CentOS Stream has kernel version 4.18.

@targos
Copy link
Member Author

targos commented Mar 12, 2021

I also don't understand how it could have worked before. The test started to fail when I upgraded from CentOS 8 to CentOS Stream.

@mhdawson
Copy link
Member

@targos is the same file system being used between the CentOS 8 and CentOS Stream installations ?

@mhdawson
Copy link
Member

@Trott I think allowing off by one is equivalent to disabling the test for a platform since the test sets the value to be 1 more than is possible on a non Y2K38 compliant system. I think the skip is a better indication that the test is not applicable for a platform versus allowing it to be one off.

@Trott
Copy link
Member

Trott commented Mar 12, 2021

@Trott I think allowing off by one is equivalent to disabling the test for a platform since the test sets the value to be 1 more than is possible on a non Y2K38 compliant system. I think the skip is a better indication that the test is not applicable for a platform versus allowing it to be one off.

The problem with that approach is that we would need to skip all Linux tests just because it fails on one Linux host. I think the thing to do is allow the max - 1 value and keep the skip on those other platforms.

Trott added a commit to Trott/io.js that referenced this issue Mar 13, 2021
@targos
Copy link
Member Author

targos commented Mar 14, 2021

@targos is the same file system being used between the CentOS 8 and CentOS Stream installations ?

I suppose so. I doubt that the switch from CentOS 8 to CentOS Stream could have changed the file system.

@mhdawson
Copy link
Member

mhdawson commented Mar 15, 2021

@Trott I must be missing something if we "allow the max - 1" then the test will always pass, even in the case of a new failure....

@mhdawson
Copy link
Member

@targos if its the same file system then I'm not sure how it ever worked too.

@Trott
Copy link
Member

Trott commented Mar 15, 2021

@targos Can you run this code with a recent node and then run it again with something from the 8.x line or earlier and report the output?

const fs = require('fs');
fs.writeFileSync('/var/tmp/trott', 'fhqwhgads');
const path = '/var/tmp/trott'
const Y2K38_mtime = 2147483648;
fs.utimesSync(path, Y2K38_mtime, Y2K38_mtime);
const Y2K38_stats = fs.statSync(path);
console.log(Y2K38_mtime, Y2K38_stats.mtime.getTime() / 1000);
fs.unlinkSync('/var/tmp/trott');

@targos
Copy link
Member Author

targos commented Mar 16, 2021

@Trott

$ node -v
v15.11.0

$ node test.js
2147483648 2147483647

$ volta run --node=8 node -v
v8.17.0

$ volta run --node=8 node test.js
2147483648 2147483647

@Trott
Copy link
Member

Trott commented Mar 16, 2021

@targos Hmmm... OK, so that makes me second-guess my "allow 2*31-1" suggested hack for the test.

@mhdawson had previously comented:

A quick google does not show an easy way to test what type the file system is,

I wonder, though, if that's the path we want to go down, at least for the POSIX-like operating systems.

I would propose something like this:

  • Where TMPDIR is the value of tmpdir.path, use spawnSync() to run df -T TMPDIR
  • If the command fails (as it will on macOS and probably elsewhere), ignore the results.
  • If the command succeeded, check the file system type. If it is a file system that does not support Y2K38, skip the Y2K38 tests. (Alternative: If we know what the non-Y2K38 results are supposed to be, then assert for those rather than the "correct" values. But that's only if we want to define behavior for non-Y2K38 file systems. I don't think we want to do that. I believe we basically just report whatever libuv/the operating system tells us.)

Does that sound like a reasonable plan, @targos and @mhdawson? If so, I'll update #37707 to do something like that instead of what it's doing right now.

Of course, all this assumes that the problem is file system type support for timestamps beyond 2038 and not something else. My naive and superficial reading of a few things suggests that ext2 and ext3 do not support Y2K38 but that ext4 does. We'll need to confirm that on CI and in a few other places. (Or someone who knows better can just tell me I'm wrong.) @targos, can you share the output of df -T on your machine?

@richardlau
Copy link
Member

@Trott Maybe another solution would be to touch a file in the test tmpdir with a timestamp after 2038 (e.g. touch -t 203901020304 y2k38-test) and check the result via ls to see if the filesystem supports y2k38? If I'm understanding https://www.phoronix.com/scan.php?page=news_item&px=XFS-Linux-5.10 correctly, even after kernel 5.10 an XFS formatted filesystem may not support y2k38 unless it was formatted with "bigtime" enabled so it's not just enough to test df -T.

Adopting such a scheme could also allow us to drop some of the platform specific skips in favour of detecting errors from touch or ls -l not reporting a matching date,
e.g. on IBM i (currently skipped in the test):

-bash-4.4$ touch -t 203901020304 y2k38-test
touch: setting times of 'y2k38-test': A system call received a parameter that is not valid.
-bash-4.4$ touch -t 203801020304 y2k38-test
-bash-4.4$ ls -l y2k38-test
-rw-r--r-- 1 nodejs 0 0 Jan  2  2038 y2k38-test
-bash-4.4$

@Trott
Copy link
Member

Trott commented Mar 16, 2021

@richardlau That sounds both simpler and more reliable than what I suggested.

@mhdawson
Copy link
Member

I like @richardlau suggestion as well. They key for me is that it checks if Y2K38 is supported using a test that does not depend on Node.js and then we only run the Node.js tests if we know from the os level check that it should be able to pass the tests.

@Trott
Copy link
Member

Trott commented Mar 17, 2021

The issue I'm running into is that there's no guarantee that ls will show the date in English. And there doesn't seem to be an obvious portable way to get the timestamp in bash.

@Trott
Copy link
Member

Trott commented Mar 17, 2021

Hmmm...I wonder if I can use the date part of the output of ls -l as input to new Date() and go from there. But this is starting to feel more and more brittle.

@Trott
Copy link
Member

Trott commented Mar 17, 2021

date -r FILENAME +%s might be portable.

EDIT: It works everywhere on CI except Windows (of course) and AIX.

Trott added a commit to Trott/io.js that referenced this issue Mar 17, 2021
Move Y2K38-specific parts of test-fs-utimes to test-fs-utimes-y2K38.js.
On non-Windows, check for Y2K38 support and skip if it is unsupported.

Fixes: nodejs#36591
@Trott Trott closed this as completed in ab6c7dd Mar 18, 2021
ruyadorno pushed a commit that referenced this issue Mar 20, 2021
Move Y2K38-specific parts of test-fs-utimes to test-fs-utimes-y2K38.js.
On non-Windows, check for Y2K38 support and skip if it is unsupported.

Fixes: #36591

PR-URL: #37707
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit that referenced this issue May 1, 2021
Move Y2K38-specific parts of test-fs-utimes to test-fs-utimes-y2K38.js.
On non-Windows, check for Y2K38 support and skip if it is unsupported.

Fixes: #36591

PR-URL: #37707
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
jkleinsc added a commit to electron/electron that referenced this issue Jan 10, 2022
Resolves nodejs/node#36591, which is currently happening on 13-x-y
jkleinsc added a commit to electron/electron that referenced this issue Jan 11, 2022
Resolves nodejs/node#36591, which is currently happening on 13-x-y
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants