Dash-testing: Mock synchronisation issues on Windows only (Linux/OSX fine) #855

Open
cormacc opened this issue Aug 7, 2019 · 7 comments


cormacc commented Aug 7, 2019

Describe your context

dash                                    1.1.1
dash-bootstrap-components               0.7.0
dash-core-components                    1.1.1
dash-html-components                    1.0.0
dash-renderer                           1.0.0
dash-table                              4.1.0
selenium                                3.141.0

Previously tested with dash 1.0.2 -- same behaviour observed.

  • Browser, Version and OS

    • OS: Windows 10 Pro
    • Browser: Chrome
    • Version: 76.0.3809.87, 75.0.3770.80 (matched chromedriver version used in both cases)
    • Issue does NOT occur using same Chrome versions on Manjaro Linux or OSX
    • Reproduced on two different machines/Windows 10 installations

Describe the bug

Unit tests using dash-testing / selenium clear_input() / send_keys() methods drive browser input as expected (verified visually -- inserting artificial time.sleep() statements on occasion to be sure). Mocked functions capture some but not all invocations. In some cases, inserting additional calls to clear_input() appears to cause some internal (?inter-process?) synchronisation of the mock to occur, capturing the outstanding invocations. However this is inconsistent -- the workaround shown in the self-contained example below works consistently in that example, but not in other very similar tests in our codebase.

Expected behavior

Mocks record every callback invocation without any additional attempts to flush/synchronise, i.e. behaviour consistent with what we have verified on Linux and OSX.

Self-contained example

import re
import random
import time
import pytest

import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output, State

import platform

DEFAULT_VALUE = 20
NEW_VALUE = -20

def _build_app_layout():
    app = dash.Dash(__name__)
    app.layout = html.Div(id='top',children=[
      dcc.Input(id='some_input', value=str(DEFAULT_VALUE)),
      html.Div(id='alert_container')
    ])
    @app.callback(Output('alert_container', 'children'),
                  [Input('some_input', 'value')])
    def validate_input(new_value: str):
        if re.match(r'[+-]?\d+',new_value):
            #Nonsense call so we can count num invocations
            _throwaway = random.randint(int(new_value),DEFAULT_VALUE)
            return 'Unmodified' if new_value == str(DEFAULT_VALUE) else 'Modified'
        #Nonsense call so we can count num invocations
        _throwaway = random.randint(0,DEFAULT_VALUE)
        return 'Invalid'
    return app


@pytest.mark.webtest
def test_should_validate_inputs_when_modified(mocker, dash_duo):
    """This is a self-contained test intended to reproduce a problem we've been
    having, where mocked functions invoked from dash callbacks can be verified on
    Linux and OSX, but not on Windows.
    """
    # Given
    mocker.patch('random.randint', return_value=DEFAULT_VALUE)
    dash_duo.start_server(_build_app_layout())
    input_element = dash_duo.find_element('input#some_input')
    initial_value = input_element.get_attribute('value')

    # When
    dash_duo.clear_input(input_element)
    input_element.send_keys(f'{NEW_VALUE}')
    dash_duo.wait_for_contains_text('input#some_input', str(NEW_VALUE), timeout=5)
    # Uncommenting the next two lines causes test to pass on Windows
    #if platform.system() == 'Windows':
    #    dash_duo.clear_input(input_element)

    # Then
    # ... over-elaborated assertions to see error detail
    random.randint.assert_has_calls([
        mocker.call(DEFAULT_VALUE, DEFAULT_VALUE),
        mocker.call(0, DEFAULT_VALUE),# <clear>
        mocker.call(0, DEFAULT_VALUE),# -
        mocker.call(-2, DEFAULT_VALUE),
        mocker.call(-20, DEFAULT_VALUE)
    ])
    # ... actual assertion if we resolve this issue
    random.randint.assert_called_with(NEW_VALUE, DEFAULT_VALUE)
Author

cormacc commented Aug 7, 2019

N.B. Uncommenting the lines beginning if platform.system() == 'Windows': causes the test to pass on Windows, but it doesn't appear to sync state completely -- i.e. the mocked function call history doesn't include a final call with arguments (0, DEFAULT_VALUE) after the clear.
Uncommenting and omitting the if guard causes the test to fail on Linux and OSX as you'd expect, as the mocked function call history DOES include the call resulting from the final clear_input.

Contributor

byronz commented Aug 8, 2019

@cormacc thanks for the detailed issue. It's interesting that you are using dash.testing for unit tests, and across OSes. I will try to reproduce it when I get a Windows machine.

In the meantime, I was wondering if you can get more insightful details from pytest --pdb or anything else in http://doc.pytest.org/en/latest/usage.html

Author

cormacc commented Aug 9, 2019

That's handy @byronz -- hadn't known about the --pdb arg to pytest (I'm mainly a C/embedded guy).

So it does seem to be some sort of time-dependent / synchronisation issue. With that self-contained example, the test is still failing, but by the time I get to inspect the mock object at the pdb prompt, the mock state has caught up with expectations....

E       AssertionError: Calls not found.
E       Expected: [call(20, 20), call(0, 20), call(0, 20), call(-2, 20), call(-20, 20)]
E       Actual: [call(20, 20), call(0, 20)]
<--- snip-->
(Pdb) random.randint.call_args_list
[call(20, 20), call(0, 20), call(0, 20), call(-2, 20), call(-20, 20)]

And it turns out that if I inject a time.sleep(2) just before the assertion, the test passes (time.sleep(1) isn't enough, on my test machine anyway).

This result is pretty repeatable, but doesn't appear to be deterministic -- I did get one failure at time.sleep(5).
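
A fixed time.sleep() is fragile here, as the one failure at time.sleep(5) suggests. One alternative is to poll the mock until the expected calls show up or a deadline passes. The following is only a minimal sketch, assuming the mock does eventually catch up: wait_for_calls is a hypothetical helper (not part of dash.testing or pytest-mock), and the timeout and interval values are arbitrary.

```python
import time
from unittest import mock


def wait_for_calls(mocked, expected_calls, timeout=10.0, interval=0.1):
    """Poll until `mocked` has recorded `expected_calls`, or re-raise on timeout.

    Hypothetical helper for illustration only. Returns the number of
    seconds waited, which also gives a rough measure of the lag.
    """
    start = time.monotonic()
    while True:
        try:
            # Same check as in the test: the expected calls must appear
            # as a contiguous sequence in the mock's call history.
            mocked.assert_has_calls(expected_calls)
            return time.monotonic() - start
        except AssertionError:
            if time.monotonic() - start >= timeout:
                raise  # Surface the original "Calls not found" detail.
            time.sleep(interval)
```

In the self-contained example, the first assertion could then become something like wait_for_calls(random.randint, [mocker.call(DEFAULT_VALUE, DEFAULT_VALUE), ...], timeout=10). This only hides the delay rather than explaining it, but the returned wait time may help characterise it.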

Author

cormacc commented Aug 9, 2019

Another interesting (but unrelated) bit of platform-specific weirdness that's cropped up in one of my actual tests -- almost identical to this one, but with a more complex app layout and mocking our own rather than a library function -- is that I need to call the .click() method on the input element before clearing it, but only on Windows.

Author

cormacc commented Aug 9, 2019

Ugh -- behaviour on Windows seems highly non-deterministic -- waits and click events work for one test and not another, or work once, then not again. Vaguely stateful behaviour? @byronz do you reckon this is more likely to be an upstream issue with selenium on Windows than a dash-testing/selenium integration issue? Behaviour on Linux and OSX seems consistent and reliable.

Worst case I can just set up a CI build on a dedicated webtest branch and have my Windows-based colleagues commit to that to run their tests. Though I need to figure out #829 first.

Contributor

byronz commented Aug 9, 2019

TBH, my first impression, without digging into your details, was: Oh, Windows!

I have roughly these scenarios in mind:

  1. a bug in pytest-mock on Windows; you can try replacing it with the native Python unittest.mock https://docs.python.org/3/library/unittest.mock.html
  2. from your comments, it looks like it's not a call-ordering issue, but rather a question of whether you can get the call traces from the mocked object. Regarding the time.sleep(5) case, have you tried setting up a polling mechanism to see where the boundary is? E.g. can you get the result if the sleep is 10 minutes?
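
To explore both scenarios at once, the mock could be built with plain unittest.mock and made to record when each call arrives, so the boundary can be read off rather than guessed with sleeps. This is a rough sketch only: timestamping_mock is a hypothetical helper name, and it assumes the callback runs in the same Python process as the test, so that the recorded list is visible to it.

```python
import time
from unittest import mock


def timestamping_mock(return_value):
    """Build a Mock that logs (seconds_since_creation, args) for every call.

    Hypothetical helper for illustration; uses only unittest.mock.
    """
    arrivals = []
    start = time.monotonic()

    def _record(*args, **kwargs):
        # Note the arrival time of this call relative to mock creation.
        arrivals.append((round(time.monotonic() - start, 3), args))
        return return_value

    return mock.Mock(side_effect=_record), arrivals


# Usage sketch inside a test (random.randint as in the example above):
#     recorder, arrivals = timestamping_mock(20)
#     with mock.patch('random.randint', recorder):
#         ...drive the browser, then inspect `arrivals`...
```

Comparing the timestamps in arrivals with the time at which the assertion runs should show whether the calls are genuinely late or are recorded on time but not yet visible to the test.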

Agreed that #829 has higher priority for us. The remote scenario was not used internally at Plotly, though. I will try testing with a Selenium Grid Docker cluster and fix it.

good luck!

Author

cormacc commented Aug 9, 2019

Thanks @byronz
Re. 1 -- the issue is still in evidence after replacing pytest-mock with unittest.mock
Re. 2 -- the issue illustrated by my self-contained example seems to relate purely to getting the call traces, as you say.

However the similar example using my more complex actual layout behaves differently -- again robust/deterministic on the two POSIX platforms, but call traces AND selenium-driven browser behaviour are erratic on Windows -- e.g. sometimes the call to clear_input() does nothing while the subsequent call to send_keys() does take effect, and the call traces (after an artificial wait) don't match expectations, but partially match the observed behaviour in the browser.

This may be another manifestation of the root cause of my mock call tracing woes, or it may be compounded by another platform-specific issue -- e.g. (off the top of my head) werkzeug on Windows struggling to keep up with callbacks. We had some issues with this early on (we used pywebview/trident and werkzeug at first, and observed some callback pre-emption issues with a dcc.Interval, which we resolved by moving to pywebview/cef and cheroot).

Anyway, I'm going OT - thanks for the help - I'll keep an eye on #829
