Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

negative datetime::delta exception #155

Open
brooksdavis opened this issue Feb 7, 2016 · 26 comments
Open

negative datetime::delta exception #155

brooksdavis opened this issue Feb 7, 2016 · 26 comments
Labels

Comments

@brooksdavis
Copy link
Contributor

I've started trying to run the FreeBSD test suite on mips64 under qemu. The run inevitably fails (after an indeterminate number of tests) with:

kyua: E: Cannot subtract 1454883115560690us from 1454883116245587us as it would result in a negative datetime::delta, which are not supported.

The check in datetime::timestamp::operator- appears to be correct, but the exception string has the variables swapped.

I'm using the kyua-0.12,2 package. The version of qemu is the amd64 version of emulators/qemu-cheri. I can provide images if that would be useful. I'll try to get this tested on real hardware as well to try and rule out a qemu bug.

@staceyson
Copy link

On an edgerouter lite (running FreeBSD 11.0-CURRENT (ERL) #0 4cf4c54(master)) I am seeing a lot of the tests fail but nothing related to having a negative time delta (suggesting time travel). Some of the test that fail include:

bin/pkill/pgrep-j_test:main  ->  failed: 3 of 3 tests failed  [2.513s]

bin/pkill/pkill-j_test:main  ->  failed: 1 of 3 tests failed  [4.145s]

lib/atf/libatf-c++/atf_c++_test:include  ->  failed: Header check failed; atf-c++.hpp is not self-contained  [1.011s]

lib/atf/libatf-c++/build_test:include  ->  failed: Header check failed; atf-c++/build.hpp is not self-contained  [0.622s]

sbin/growfs/legacy_test:main  ->  broken: TAP test program yielded invalid data: Load of '/tmp/kyua.lDzHmM/510/stdout.txt' failed: Reported plan differs from actual executed tests  [4.141s]

sbin/mdconfig/legacy_test:main  ->  failed: 1 of 1 tests failed  [4.961s]

sys/kern/unix_seqpacket_test:pipe_128k_8k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.072s]
sys/kern/unix_seqpacket_test:pipe_8k_128k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.078s]
sys/kern/unix_seqpacket_test:pipe_8k_8k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:347: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.075s]
sys/kern/unix_seqpacket_test:pipe_simulator_128k_128k  ->  failed: /usr/src/tests/sys/kern/unix_seqpacket_test.c:221: 0 != setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbufsize, sizeof(sndbufsize))  [0.077s]
[...]

I may be running a rather old test suite, however. I haven't updated this edgerouter with a newer image in some time. Attached are the complete test suite output and results db file.
results.usr_tests.20160208-051013-160329.db.zip
testsuite_output.txt

jmmv added a commit that referenced this issue Feb 8, 2016
The check to ensure the results of a datetime subtraction are postive
is correct, but the error message printed when that happens had the
values swapped.  Fix this.

Addresses part of issue #155.
@jmmv
Copy link
Member

jmmv commented Feb 8, 2016

Good catch on the error message. I'm fixing it but I first need to get the Travis setup back in working order so I can properly validate the fix... (Something has changed in the environment there that's causing builds to fail.)

Could you rerun with --loglevel=debug and then attach the log file from ~/.kyua/logs/ for the corresponding run? (Or use --logfile=/foo to specify where to save it.)

Thanks

@brooksdavis
Copy link
Contributor Author

@jmmv
Copy link
Member

jmmv commented Feb 10, 2016

Bleh. The log file does not provide much insight on the problem. (I wonder why I didn't make that error just an assertion to trigger a core dump, which could be more useful here...)

Is the failure deterministic?

Have you been able to run the Kyua self-tests alone (those in /usr/local/tests/kyua/), without the rest of FreeBSD's test suite? Any failures reported by it would hopefully point quite well to the problem.

@brooksdavis
Copy link
Contributor Author

It's not deterministic. Some number of tests run, but I've had failures occur between 30min and 3 hours into the test run. To be honest, I suspect a qemu bug since we can't trigger this anywhere else.

I've spotted a couple seemingly unrelated test failures in the kyua engine tests:

atf_test:test__body_only__crashes  ->  failed: Line 154: exp_result != test_result_handle->test_result() (model::test_result{type='broken', reason='Premature exit; test case received signal 6 (core dumped)'} != model::test_result{type='broken', reason='Premature exit; test case received signal 6'})  [2.989s]
scheduler_test:integration__stacktrace  ->  failed: Line 986: atf::utils::grep_file("attempting to gather stack trace", result_handle->stderr_file().str()) not met  [1.411s]

Some failures in utils, also probably unrelated (the core dump ones are probably test bugs due to us sysctl kern.coredump=0 in our default images due to very slow SDCards):

stacktrace_test:unlimit_core_size  ->  failed: Core not dumped as expected  [1.161s]
process/executor_test:integration__output_files_always_exist  ->  broken: Premature exit; test case received signal 14  [7.331s]
process/isolation_test:isolate_child__enable_core_dumps  ->  failed: Line 421: status.coredump() not met  [1.108s]
process/operations_test:terminate_self_with__termsig_and_core  ->  failed: Line 375: WCOREDUMP(status) not met  [0.875s]
process/status_test:integration__coredump  ->  failed: Line 206: coredump.coredump() not met  [0.810s]
signals/timer_test:multiprogram_reorder_next_activations  ->  failed: Line 215: exp_items != items ([2, 1, 4, 3] != [1, 2, 4, 3])  [1.158s]

jmmv added a commit that referenced this issue Feb 11, 2016
The check to ensure the results of a datetime subtraction are postive
is correct, but the error message printed when that happens had the
values swapped.  Fix this.

Addresses part of issue #155.
@jmmv
Copy link
Member

jmmv commented Feb 11, 2016

Well, the failure of signals/timer_test:multiprogram_reorder_next_activations is suspicious in this context, but the test itself could be flaky considering that it plays with time.

If I provide you with a patch plumbing additional logging for the failure you are experiencing, could you apply it and retest?

@brooksdavis
Copy link
Contributor Author

I think either I or @staceyson could get it built for mips. I don't currently have a build environment, but do have a machine that I could set one up on.

@staceyson
Copy link

Point me to the patch for this and I can build a mips64 package.

@brooksdavis
Copy link
Contributor Author

@staceyson: as a hack to see if there are any other issues, I might try replacing the exception in datetime::timestamp::operator- with returning 0 since I don't care all that much how long the test takes.

@brooksdavis
Copy link
Contributor Author

On another note, I've done some testing on our FPGA platform and thus far haven't hit this issue, but have managed to wedge the hardware before completing any runs (I've run quite a lot of tests).

@staceyson
Copy link

@brooksdavis I created a version that gives abs() of the difference of the two timestamps. It is available at: https://people.freebsd.org/~sson/mips/packages/ You can install the new package or just kyua into /usr/local/bin.

@jmmv
Copy link
Member

jmmv commented Feb 23, 2016

Disabling the check seems fair, though I can't remember operator- is only used to compute the test's length. It could also be being used to compute test deadlines.

I also find the failure in signals/timer_test:multiprogram_reorder_next_activations to be suspicious, but that test could certainly be flaky.

Would be good to have the list of failures with the hacked binary.

@brooksdavis
Copy link
Contributor Author

With @staceyson's package I'm still seeing seeing these exceptions. I've made very sure that I've installed that one and not another one, but something isn't working. The SHA256 of the package is 633f49f98fac6f4c74f4d48b18afcb0d644df0d31709b864306773073b0dff87 and the kyua executable is be91bd0ab394a4a4485c5c19fc396a4f7729a3fabf92cc9c25ecd0ed5dc578e8

@brooksdavis
Copy link
Contributor Author

I'm no longer able to trigger this so hopefully it's all fixed where ever the bug was.

@brooksdavis
Copy link
Contributor Author

I'm seeing this again with the 0.13,3 package on FreeBSD.

date/format_string_test:C_test  ->  kyua: E: Cannot subtract 1476919548082338us from 1476919547556756us as it would result in a negative datetime::delta, which are not supported.

@brooksdavis brooksdavis reopened this Oct 20, 2016
@brooksdavis
Copy link
Contributor Author

After further reflection, the core assumption this exception is asserting is false. gettimeofday() is not guaranteed to be monotonic. If you want something that is, clock_gettime(CLOCK_MONOTONIC, &ts); would be more appropriate. Otherwise, this code should handle time travel in some sensible manner.

@brooksdavis
Copy link
Contributor Author

With this patch applied to the port I get a usable kyua for qemu. It's worth noting that I need the patch to use the results database as well since it contains the backwards timestamps. I chose to make "negative" deltas 1us rather than 0us in case there was code that assumed positive deltas.

+--- utils/datetime.cpp.orig
++++ utils/datetime.cpp
+@@ -590,11 +590,12 @@
+ datetime::delta
+ datetime::timestamp::operator-(const datetime::timestamp& other) const
+ {
+-    if ((*this) < other) {
+-        throw std::runtime_error(
+-            F("Cannot subtract %s from %s as it would result in a negative "
+-              "datetime::delta, which are not supported") % other % (*this));
+-    }
++    /*
++     * XXX-BD: gettimeofday isn't necessicarily monotonic so return the
++     * smallest non-zero delta if time went backwards.
++     */
++    if ((*this) < other)
++        return datetime::delta::from_microseconds(1);
+     return datetime::delta::from_microseconds(to_microseconds() -
+                                               other.to_microseconds());
+ }

I'd like to apply this change to the port.

@jmmv
Copy link
Member

jmmv commented Nov 5, 2016

Thanks for tracking this down!

This change is insufficient, if only because it will break tests, and smells like a hack after all. Let me dig and see why "negative deltas" are claimed to be unsupported and how to make use of the monotonic clock, which would be much better to compute deltas.

@brd
Copy link
Member

brd commented Nov 15, 2016

Hey @jmmv, any progress on this yet?

@jmmv
Copy link
Member

jmmv commented Nov 15, 2016

So... I think the way we are tracking time in Kyua is broken. Time deltas are not going to work with the current timing mechanisms if the underlying time can change, and it can as you discovered.

The basic question is: what should happen if the current time changes while Kyua is running? Time deltas should work "as you expect", but what about timestamps? Should the "end time" of the run be allowed to be before the "start time", for example?

I had two different thoughts:

  • Make each timestamp object carry the current real time and the monotonic time. Use the former to display timestamps, and the later to compute deltas. This is "hard" to do, would result in incompatible changes in the database, and may be quite confusing. (Timestamps shifting backwards without matching deltas.)
  • Use the real time clock to get the current timestamp the very first time we query the time, and store that as the internal view of what the "base time" is. Then, use the monotonic clock to compute deltas against that time and add them to the "base time" to calculate specific timestamps. This would require minimal changes I think. It's still a bit weird from the user's perspective, because if the time shifts back, the "end time" of the test run will be in the future. But at least it'd offer a consistent view of time throughout a test run.

There of course is the possibility of ignoring this altogether and truncating the timestamps, as your change proposes, but I fear this will not resolve other problems that may exist.

Thoughts?

@ngie-eign
Copy link
Contributor

ngie-eign commented Dec 14, 2016

The timer creation (setitimer...) should be replaced with timer_create (see #164 for more details).

@jmmv
Copy link
Member

jmmv commented Dec 14, 2016

Unfortunately, macOS does not have timer_create et. al.

@asomers
Copy link
Member

asomers commented Feb 17, 2017

@jmmv any of your suggestions would work, and any of them would be much better than crashing, which is what Kyua does now. Personally, I don't think it's a big problem if Kyua thinks a test had a negative duration. It shouldn't happen very often, and the cause will be fairly obvious.

The best solution is obviously to store both RTC and monotonic times in each timestamp, though it's the most work. I wouldn't worry about database compatibility issues too much. Most people only care about the most recent run, and I would think that most people who do keep historical data keep it in the form of the junit.xml files.

@brooksdavis
Copy link
Contributor Author

We're well past the first anniversary of this bug without progress so I plan to commit my hack which makes negative deltas positive 1ms as a patch to the port to allow us to start testing MIPS in ci.freebsd.org and to let me stop rolling my own package in the CheriBSD project.

@jmmv
Copy link
Member

jmmv commented Mar 16, 2017

Please go ahead with that plan. I've only recently retaken Kyua-related work after finding a way to fit it into my schedule, and there are many things to go through... :-/

@brooksdavis
Copy link
Contributor Author

Thanks!

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Mar 16, 2017
As reported in freebsd/kyua#155, the wall clock
time can go backwards resulting in an apparent negative delta. As
a workaround, convert such deltas to 1us. This allows tests to run
successfully in MIPS64 qemu.

Approved by:	jmmv (maintainer)
Sponsored by:	DARPA, AFRL


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@436312 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Mar 16, 2017
As reported in freebsd/kyua#155, the wall clock
time can go backwards resulting in an apparent negative delta. As
a workaround, convert such deltas to 1us. This allows tests to run
successfully in MIPS64 qemu.

Approved by:	jmmv (maintainer)
Sponsored by:	DARPA, AFRL
brooksdavis added a commit to brooksdavis/kyua that referenced this issue Sep 18, 2020
The computer's concept of "now" can go backwards (and does so with
regularity in QEMU MIPS). If two timestamps are mis-ordered, just return
a 1ms delta.

For more discussion of this issue see:
freebsd#155
brooksdavis added a commit to brooksdavis/kyua that referenced this issue Sep 18, 2020
The computer's concept of "now" can go backwards (and does so with
regularity in QEMU MIPS). If two timestamps are mis-ordered, just
return a 1ms delta.

See also: freebsd#155

Test fix sumitted by: @ngie-eign
brooksdavis added a commit to brooksdavis/kyua that referenced this issue Oct 26, 2020
The computer's concept of "now" can go backwards (and does so with
regularity in QEMU MIPS). If two timestamps are mis-ordered, just
return a 1ms delta.

See also: freebsd#155

Test fix sumitted by: @ngie-eign
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants