Tests non-deterministically fail due to unreliable timing #100

Closed
mejo- opened this issue Dec 26, 2017 · 8 comments

mejo- commented Dec 26, 2017

Hi, I'm forwarding a bug report that was filed against the python-blessed Debian package:

python-blessed's test suite appears to use method timing/benchmarking in such a way that it will non-deterministically FTBFS (fail to build from source):

[…]
________________________ test_esc_delay_cbreak_timout_0 ________________________

  def test_esc_delay_cbreak_timout_0():
      """esc_delay still in effect with timeout of 0 ("nonblocking")."""
      pid, master_fd = pty.fork()
      if pid == 0:  # child
          cov = init_subproc_coverage('test_esc_delay_cbreak_timout_0')
          term = TestTerminal()
          os.write(sys.__stdout__.fileno(), SEMAPHORE)
          with term.cbreak():
              stime = time.time()
              inp = term.inkey(timeout=0)
              measured_time = (time.time() - stime) * 100
              os.write(sys.__stdout__.fileno(), (
                  '%s %i' % (inp.name, measured_time,)).encode('ascii'))
              sys.stdout.flush()
          if cov is not None:
              cov.stop()
              cov.save()
          os._exit(0)
  
      with echo_off(master_fd):
          os.write(master_fd, u'\x1b'.encode('ascii'))
          read_until_semaphore(master_fd)
          stime = time.time()
          key_name, duration_ms = read_until_eof(master_fd).split()
  
      pid, status = os.waitpid(pid, 0)
      assert key_name == u'KEY_ESCAPE'
      assert os.WEXITSTATUS(status) == 0
      assert math.floor(time.time() - stime) == 0.0
>     assert 34 <= int(duration_ms) <= 45, int(duration_ms)
E     AssertionError: 71
E     assert 71 <= 45
E      +  where 71 = int('71')

blessed/tests/test_keyboard.py:528: AssertionError
==================== 1 failed, 305 passed in 75.27 seconds =====================
E: pybuild pybuild:283: test: plugin distutils failed with: exit code=1: cd /build/1st/python-blessed-1.14.1/.pybuild/pythonX.Y_3.5/build; python3.5 -m pytest
dh_auto_test: pybuild --test --test-pytest -i python{version} -p 3.5 returned exit code 13
debian/rules:7: recipe for target 'build' failed
make: *** [build] Error 25
dpkg-buildpackage: error: debian/rules build gave error exit status 2
I: copying local configuration
E: Failed autobuilding of package

[…]

The full build log is attached or can be viewed here:

https://tests.reproducible-builds.org/debian/logs/unstable/amd64/python-blessed_1.14.1-1.build1.log.gz

P-EB commented Dec 31, 2017

The assertion that fails is not relevant to this test. This test runs with no minimal duration and no timeout, so there is no way to predict how long it will take; it depends on the computer, the architecture, and whatever else.

The assertion above it,

assert math.floor(time.time() - stime) == 0.0

already makes sure that the test takes less than one second, which is reasonable enough.

I suggest removing the offending assertion.
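
For illustration only, a minimal sketch of what a more tolerant check could look like; the helper name and the 250 ms ceiling below are assumptions, not values taken from blessed's test suite:

```python
import math
import time


def assert_inkey_returned_promptly(stime, duration_ms, ceiling_ms=250):
    """Hypothetical replacement for the tight 34 <= duration_ms <= 45 window.

    Keeps the coarse guarantee that the whole exchange stayed well under one
    second, plus a generous ceiling, instead of an exact timing window.
    """
    # Coarse check already present in the test: less than one second elapsed.
    assert math.floor(time.time() - stime) == 0.0
    # Generous ceiling (assumed value); no lower bound, since returning early
    # on a fast machine is not a failure.
    assert int(duration_ms) <= ceiling_ms, int(duration_ms)
```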

jquast commented Jun 20, 2018

We will shortly delete/disable any such tests.

We can delete all tests, honestly. I don't know why you folks run them. The code isn't changing. We run tests when we change code. Why do you bother me about my tests?

jquast commented Jun 20, 2018

Tests are commented out in the next release.

jquast closed this as completed Jun 20, 2018

mejo- commented Jun 20, 2018

@jquast: Tests are there for quality assurance, i.e. to detect regressions. So it's actually a good thing that distributions which ship python-blessed (like Debian) run the test suite at build time.

jquast commented Jun 20, 2018

I am the one who wrote over 3,000 LOC in tests, which is more than the package itself, so I feel like I know better than you what the tests are there for.

We have successfully disabled a large swath of very useful tests. I hope that is helpful to you; it certainly is a detriment to me and all future contributors. Thanks for the help.

mejo- commented Jun 20, 2018

I don't want to question your authority on declaring the purpose of tests written by you 😁

I'm just stating what is becoming more and more common (and, in my view, for good reason): that test suites are run by distributions at build time in order to detect regressions.

If some of the tests are not intended to be run by anyone but upstream (yourself), what do you think about separating them from the ones that should succeed on all systems?
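
One possible shape for such a split, sketched under the assumption of a custom `timing_sensitive` pytest marker (the marker name and the example below are illustrative, not part of blessed's test suite):

```python
# conftest.py -- register a hypothetical marker for timing-dependent tests
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "timing_sensitive: asserts on wall-clock timing; may fail on slow "
        "or heavily loaded build machines",
    )
```

```python
# blessed/tests/test_keyboard.py -- hypothetical usage on such a test
import pytest


@pytest.mark.timing_sensitive
def test_esc_delay_cbreak_timout_0():
    ...  # existing test body
```

A distribution build could then deselect those tests with `pytest -m "not timing_sensitive"`, while upstream CI keeps running the full suite.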

P-EB commented Jun 25, 2018

Dear @jquast,

We can delete all tests, honestly. I don't know why you folks run them. The code isn't changing. We run tests when we change code. Why do you bother me about my tests?

As @mejo- pointed out, we run these because it gives us better assurance that no regression happens, and not only regressions from your own code and changes, but also from the dependencies of your code. For example, tornado was recently updated and all of its reverse dependencies broke in Debian. Without this continuous-integration setup, which runs the tests periodically, we would have taken longer to notice.

Hence, it's a matter of "your tests are useful to yourself as a developer, but also useful for the distributions that rely on your software". Maybe some tests are not suitable for integration in GNU/Linux distributions. To tackle this issue there are two solutions: either you are willing to help us and split your tests between those specific to you as the upstream author and those useful for anyone who ships your software, or you consider that this is not your job (and honestly, I would understand), and we, as the maintainers, will disable or alter the tests that are not suited to our infrastructure.

Please keep in mind that we are not calling your work into question; we are only notifying you about a usage of your tests that you do not seem to have thought about before.

Cheers. :)
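
For the second option (maintainers disabling the timing-dependent tests themselves), here is a minimal sketch assuming a hypothetical `SKIP_TIMING_TESTS` environment variable that the packaging could export at build time; neither the variable nor this decorator exists in blessed today:

```python
# Hypothetical opt-out switch for timing assertions, e.g. a packager could
# export SKIP_TIMING_TESTS=1 from debian/rules before running pytest.
import os

import pytest

timing_sensitive = pytest.mark.skipif(
    os.environ.get("SKIP_TIMING_TESTS") == "1",
    reason="wall-clock timing is unreliable on busy build machines",
)


@timing_sensitive
def test_esc_delay_cbreak_timout_0():
    ...  # body as in blessed/tests/test_keyboard.py
```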
