Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_faulthandler.py::test_timeout[True] occasionally crashes #7022

Open
bluetech opened this issue Apr 5, 2020 · 14 comments · Fixed by #7030
Open

test_faulthandler.py::test_timeout[True] occasionally crashes #7022

bluetech opened this issue Apr 5, 2020 · 14 comments · Fixed by #7030
Labels
type: bug problem that needs to be addressed

Comments

@bluetech
Copy link
Member

bluetech commented Apr 5, 2020

This tests the faulthandler_timeout functionality which dumps stacktraces of all threads of tests that run for more than a specified amount of times.

The test sometimes crashes with Windows fatal exception: access violation while dumping the threads. This is probably a cpython bug.

Noticed on both windows-py35 and windows-py37 jobs at least. (Update: also on pypy3-ubuntu).

For now I think we should just skip it on Windows, unless anyone has better ideas or ability to investigate. I'll send a PR doing that.

@bluetech
Copy link
Member Author

bluetech commented Apr 5, 2020

@blueyed also found that it happens on pypy3-ubuntu at least: https://github.com/pytest-dev/pytest/runs/562142405

So it's neither CPython or Windows only... Will edit the title.

@blueyed

I think it might be worth trying to bump the timeout - although the behavior on/with timeout appears to be buggy at least on pypy (no real/full stacktrace).

It seems clear to me that the timeout does trigger so I'm not sure that increasing it will help. For example in the run above:

Timeout (0:00:00.010000)!
Thread 0x7F320BF2BDC0 (most recent call first, approximate line numbers):

@bluetech bluetech changed the title test_faulthandler.py::test_timeout[True] occasionally crashes on Windows test_faulthandler.py::test_timeout[True] occasionally crashes Apr 5, 2020
@blueyed
Copy link
Contributor

blueyed commented Apr 5, 2020

Note that here it looks like it was just "slow on CI", apart from the message where it looks like the timeout is not 1s (but that might be a Python issue).
I.e. I would try bumping the timeout to 10s first on CI.

2020-04-05T15:06:38.7502748Z ================================== FAILURES ===================================
2020-04-05T15:06:38.7503011Z ________________________________ test_timeout _________________________________
2020-04-05T15:06:38.7503415Z TypeError: expected bytes, str found
2020-04-05T15:06:38.7503571Z 
2020-04-05T15:06:38.7503815Z During handling of the above exception, another exception occurred:
2020-04-05T15:06:38.7503978Z 
2020-04-05T15:06:38.7504199Z     """
2020-04-05T15:06:38.7504431Z     
2020-04-05T15:06:38.7504642Z     __all__ = ['TestResult', 'TestCase', 'TestSuite',
2020-04-05T15:06:38.7504859Z                'TextTestRunner', 'TestLoader', 'FunctionTestCase', 'main',
2020-04-05T15:06:38.7505092Z                'defaultTestLoader', 'SkipTest', 'skip', 'skipIf', 'skipUnless',
2020-04-05T15:06:38.7505318Z                'expectedFailure', 'TextTestResult', 'installHandler',
2020-04-05T15:06:38.7505540Z                'registerResult', 'removeResult', 'removeHandler']
2020-04-05T15:06:38.7505782Z     
2020-04-05T15:06:38.7506022Z     # Expose obsolete functions for backwards compatibility
2020-04-05T15:06:38.7506767Z     __all__.extend(['getTestCaseNames', 'makeSuite', 'findTestCases'])
2020-04-05T15:06:38.7507051Z     
2020-04-05T15:06:38.7507286Z     __unittest = True
2020-04-05T15:06:38.7507506Z     
2020-04-05T15:06:38.7507738Z     from .result import TestResult
2020-04-05T15:06:38.7507941Z     from .case import (TestCase, FunctionTestCase, SkipTest, skip, skipIf,
2020-04-05T15:06:38.7508190Z                        skipUnless, expectedFailure)
2020-04-05T15:06:38.7508433Z >   from .suite import BaseTestSuite, TestSuite
2020-04-05T15:06:38.7508681Z E   SystemError: PyEval_EvalFrameEx returned a result with an error set
2020-04-05T15:06:38.7508844Z 
2020-04-05T15:06:38.7509094Z c:\hostedtoolcache\windows\python\3.5.4\x64\lib\unittest\__init__.py:61: SystemError
2020-04-05T15:06:38.7509372Z =========================== short test summary info ===========================
2020-04-05T15:06:38.7509616Z FAILED test_timeout.py::test_timeout - SystemError: PyEval_EvalFrameEx return...
2020-04-05T15:06:38.7509867Z ============================== 1 failed in 0.09s ==============================
2020-04-05T15:06:38.7510133Z ---------------------------- Captured stderr call -----------------------------
2020-04-05T15:06:38.7510367Z Timeout (0:00:00.010000)!
2020-04-05T15:06:38.7510599Z Current thread 0x00001470 (most recent call first):
2020-04-05T15:06:38.7511029Z   File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\collections\__init__.py", line 446 in namedtuple
2020-04-05T15:06:38.7511338Z   File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\unittest\case.py", line 1 in <module>
2020-04-05T15:06:38.7511668Z   File "<frozen importlib._bootstrap>", line 222 in _call_with_frames_removed
2020-04-05T15:06:38.7512031Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\reports.py", line 294 in from_item_and_call
2020-04-05T15:06:38.7512411Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\runner.py", line 285 in pytest_runtest_makereport
2020-04-05T15:06:38.7512801Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\callers.py", line 187 in _multicall
2020-04-05T15:06:38.7513157Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 87 in <lambda>
2020-04-05T15:06:38.7513474Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
2020-04-05T15:06:38.7513786Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\hooks.py", line 286 in __call__
2020-04-05T15:06:38.7514132Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\runner.py", line 192 in call_and_report
2020-04-05T15:06:38.7514493Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\runner.py", line 104 in runtestprotocol
2020-04-05T15:06:38.7514853Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\runner.py", line 89 in pytest_runtest_protocol
2020-04-05T15:06:38.7515207Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\callers.py", line 187 in _multicall
2020-04-05T15:06:38.7515525Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 87 in <lambda>
2020-04-05T15:06:38.7515942Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
2020-04-05T15:06:38.7516251Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\hooks.py", line 286 in __call__
2020-04-05T15:06:38.7516578Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\main.py", line 272 in pytest_runtestloop
2020-04-05T15:06:38.7516896Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\callers.py", line 187 in _multicall
2020-04-05T15:06:38.7517211Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 87 in <lambda>
2020-04-05T15:06:38.7517572Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
2020-04-05T15:06:38.7518236Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\hooks.py", line 286 in __call__
2020-04-05T15:06:38.7518630Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\main.py", line 247 in _main
2020-04-05T15:06:38.7518982Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\main.py", line 191 in wrap_session
2020-04-05T15:06:38.7519335Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\main.py", line 240 in pytest_cmdline_main
2020-04-05T15:06:38.7519684Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\callers.py", line 187 in _multicall
2020-04-05T15:06:38.7520056Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 87 in <lambda>
2020-04-05T15:06:38.7520392Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\manager.py", line 93 in _hookexec
2020-04-05T15:06:38.7520708Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pluggy\hooks.py", line 286 in __call__
2020-04-05T15:06:38.7521029Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\_pytest\config\__init__.py", line 125 in main
2020-04-05T15:06:38.7521340Z   File "d:\a\pytest\pytest\.tox\py35-xdist\lib\site-packages\pytest\__main__.py", line 7 in <module>
2020-04-05T15:06:38.7521639Z   File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\runpy.py", line 85 in _run_code
2020-04-05T15:06:38.7521973Z   File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\runpy.py", line 193 in _run_module_as_main

https://github.com/pytest-dev/pytest/pull/7020/checks?check_run_id=562177373

@bluetech
Copy link
Member Author

bluetech commented Apr 5, 2020

@blueyed OK, worth a try at least. Closing this, will send a new PR just increasing the timeout.

@bluetech bluetech closed this as completed Apr 5, 2020
@blueyed
Copy link
Contributor

blueyed commented Apr 5, 2020

@bluetech
No, wait - I've only read/got your comment later.

I've actually looked at the test now: the timeout is faulthandler_timeout = 0.01, and it gets triggered - good.

But it appears to cause some issue then (SystemError above), which is not handled by pytest / the test.
It uses runpytest_subprocess though, so I would expect it to not having failed in the above case.

@blueyed blueyed reopened this Apr 5, 2020
@blueyed
Copy link
Contributor

blueyed commented Apr 5, 2020

The ubuntu-pypy failure:

[gw1] linux -- Python 3.6.9 /home/runner/work/pytest/pytest/.tox/pypy3-xdist/bin/python

testdir = <Testdir local('/tmp/pytest-of-runner/pytest-0/test_timeout0')>
enabled = True

    @pytest.mark.parametrize("enabled", [True, False])
    def test_timeout(testdir, enabled):
        """Test option to dump tracebacks after a certain timeout.
        If faulthandler is disabled, no traceback will be dumped.
        """
        testdir.makepyfile(
            """
        import os, time
        def test_timeout():
            time.sleep(1 if "CI" in os.environ else 0.1)
        """
        )
        testdir.makeini(
            """
            [pytest]
            faulthandler_timeout = 0.01
            """
        )
        args = ["-p", "no:faulthandler"] if not enabled else []
    
        result = testdir.runpytest_subprocess(*args)
        tb_output = "most recent call first"
        if sys.version_info[:2] == (3, 3):
            tb_output = "Thread"
        if enabled:
            result.stderr.fnmatch_lines(["*%s*" % tb_output])
        else:
            assert tb_output not in result.stderr.str()
>       result.stdout.fnmatch_lines(["*1 passed*"])
E       Failed: nomatch: '*1 passed*'
E           and: '============================= test session starts =============================='
E           and: 'platform linux -- Python 3.6.9[pypy-7.3.0-final], pytest-5.4.1.dev73+gcbdb54d26, py-1.8.1, pluggy-0.13.1'
E           and: 'rootdir: /tmp/pytest-of-runner/pytest-0/test_timeout0, inifile: tox.ini'
E           and: 'collected 1 item'
E           and: ''
E           and: 'test_timeout.py '
E       remains unmatched: '*1 passed*'

/home/runner/work/pytest/pytest/testing/test_faulthandler.py:80: Failed
----------------------------- Captured stdout call -----------------------------
running: /home/runner/work/pytest/pytest/.tox/pypy3-xdist/bin/python -mpytest --basetemp=/tmp/pytest-of-runner/pytest-0/test_timeout0/runpytest-0
     in: /tmp/pytest-of-runner/pytest-0/test_timeout0
============================= test session starts ==============================
platform linux -- Python 3.6.9[pypy-7.3.0-final], pytest-5.4.1.dev73+gcbdb54d26, py-1.8.1, pluggy-0.13.1
rootdir: /tmp/pytest-of-runner/pytest-0/test_timeout0, inifile: tox.ini
collected 1 item

test_timeout.py 
----------------------------- Captured stderr call -----------------------------
Timeout (0:00:00.010000)!
Thread 0x7F320BF2BDC0 (most recent call first, approximate line numbers):
=============================== warnings summary ===============================

@bluetech
Copy link
Member Author

bluetech commented Apr 5, 2020

When the subprocess crashes, the "1 passed" output is not matched and the test fails -- this seems intended to me?

blueyed added a commit to blueyed/pytest that referenced this issue Apr 5, 2020
It should happen as late as possible before the test runs.

Ref: pytest-dev#7022
@blueyed
Copy link
Contributor

blueyed commented Apr 5, 2020

The test passing is another issue - like you've mentioned before the faulthandler does not actually timeout anything, but only dumps.

In the pypy case above it appears to crash while trying to dump the traceback, and therefore the inner pytest output is missing (which would have the "passed").

@blueyed
Copy link
Contributor

blueyed commented Apr 5, 2020

@bluetech

@blueyed OK, worth a try at least. Closing this, will send a new PR just increasing the timeout.

Maybe worth a try still: maybe pytest exits while the traceback is being dumped?
(To really fix this the test should probably be rewritten to use a stamp file or similar - to not cause unnecessary waiting.)

@bluetech
Copy link
Member Author

bluetech commented Apr 7, 2020

Increasing the timeout didn't work.

@bluetech bluetech reopened this Apr 7, 2020
blueyed added a commit that referenced this issue Apr 9, 2020
It should happen as late as possible before the test runs.

Ref: #7022
@bluetech
Copy link
Member Author

bluetech commented Apr 9, 2020

I'm afraid 413ca8a didn't help either.

Unless anyone has any other idea, I propose we just skip this test for now for being flaky, as the broken CI is quite annoying. I'll send a PR.

@blueyed
Copy link
Contributor

blueyed commented Apr 9, 2020

How about retrying it for now?
Using e.g. https://github.com/pytest-dev/pytest-rerunfailures might help with catching regressions with it also.. :)

@bluetech
Copy link
Member Author

bluetech commented Apr 9, 2020

I don't like rerunning on failure (as a concept), personally.

@blueyed
Copy link
Contributor

blueyed commented Apr 9, 2020

Me neither. But not running it anymore will not even show when it starts working again then.
FWIW, I've never seen this on my fork, so it is likely not due to updated requirements, but some code change.
I agree though that it is annoying, so I'm -0 for disabling it - although it might just hide a bug in code after all (something that users will see then occasionally anyway).

@bluetech
Copy link
Member Author

bluetech commented Apr 9, 2020

Me neither. But not running it anymore will not even show when it starts working again then.

The issue will remain open, if anyone will want to look at it ;)

If users start seeing it then we will be able to bisect or debug more hopefully.

In any case, whether some code change caused it or not, it's a native code crash, which a Python-level code change can't bring about on its own -- there must be a bug somewhere outside of pytest proper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug problem that needs to be addressed
Projects
None yet
3 participants