
Pytest-friendly Test API #5683

Open
leycec opened this issue Nov 10, 2022 · 4 comments
Assignees
Labels
area:debugging · status:in-progress ("We're on it!") · type:enhancement (Requests for feature enhancements or new features)

Comments


leycec commented Nov 10, 2022

Streamlit is phenomenal. Everyone knows this. This is why my beautiful science wife 😻 and I are building out our next open-source multiphysics bioelectricity simulator in Streamlit.

Streamlit: yup, it even does bioelectricity.

Streamlit Testing: A Chink in the Armour

But all is not quite so phenomenal on the testing front. Sadly, Streamlit currently offers no official means of testing Streamlit apps from a pytest test suite. Technically, there does exist:

  • A third-party streamlit-mock package. Of course, this package is effectively unmaintained. While we applaud the one lone GitHub user brave enough to star streamlit-mock, we are not that user.
  • Browser-based automation à la Cypress and Selenium. Of course, then we would have to develop, debug, and maintain non-trivial end-to-end WebDriver tests renowned for behaving non-deterministically in perpetuity. Grant funding only extends so far. It doesn't extend that far. Which is probably good, because we value our sanity and marital stability.

streamlit-mock: Visions of a Better Future

Nobody should use streamlit-mock, because nobody should use unmaintained things. But everybody should be inspired by streamlit-mock to create something similar that actually works and is well-maintained, because the core conceits behind streamlit-mock are sound. To quote their README.md:

Our simple streamlit application calling a REST API backend grew over time, became not so simple and needed a test suite. We wrote some Selenium tests, but these are tricky to get right and run relatively slowly. This package "mocks" most streamlit classes to allow pytest to be used for testing.

Goals:

  • Allow tests to be written using pytest
  • Tests should run quickly so that multiple edge cases and variants can be tested
  • Tests should be concise and easy to write

Non-Goals:

  • Testing streamlit itself (the package removes all dependencies on the real streamlit)
  • Testing that the app uses Streamlit correctly (the package fakes input and records outputs)
  • To be as complete as an end-to-end Selenium test (Streamlit/web server are out of the loop)

Goals seem reasonable. Non-goals seem reasonable, too. We applaud these reasonable things.
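To make the streamlit-mock concept concrete, here is a minimal, self-contained sketch of the idea using only the standard library: swap a recording `MagicMock` into `sys.modules` before the app's source runs, then assert on the calls the app made. Everything here (the helper name, the toy app) is invented for illustration and is not part of streamlit-mock's actual API.

```python
import sys
from unittest import mock


def run_app_with_fake_streamlit(app_source: str) -> mock.MagicMock:
    """Exec an app's source against a MagicMock standing in for streamlit."""
    fake_st = mock.MagicMock(name="streamlit")
    # Make "import streamlit as st" inside the app resolve to the fake,
    # restoring sys.modules afterwards.
    with mock.patch.dict(sys.modules, {"streamlit": fake_st}):
        exec(compile(app_source, "<app>", "exec"), {"__name__": "__main__"})
    return fake_st


# A toy "app" standing in for a real Streamlit script.
APP_SOURCE = """
import streamlit as st
st.title("Bioelectricity!")
st.write("membrane potential (mV):", -70)
"""

fake_st = run_app_with_fake_streamlit(APP_SOURCE)
fake_st.title.assert_called_once_with("Bioelectricity!")
fake_st.write.assert_called_once_with("membrane potential (mV):", -70)
```

Because the fake records every call, tests stay fast and never touch a browser or a web server – exactly the trade-off the non-goals above acknowledge.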

What Do You Want Us to Do About Your Problems, Bro?

Okay. Let's get down to brass data science tacks. It would be phenomenal if Streamlit itself offered an official pytest-friendly test API – lightly inspired by streamlit-mock but actually well-maintained and working. An official github.com/streamlit/pytest-streamlit plugin trivially enabling Streamlit apps to be tested under pytest would be especially phenomenal. pytest-streamlit would hypothetically provide support – presumably in the form of one or more pytest fixtures – for programmatically running any Streamlit app as an integration test under a Streamlit mock API.
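A conftest.py sketch of what such a fixture might look like. To be clear: "pytest-streamlit" does not exist, and the `st_runner` fixture, `FakeAppRunner` class, and its `run()`/`get_elements()` methods are all invented here to illustrate the ergonomics, not a real Streamlit or plugin API.

```python
import sys
from unittest import mock

import pytest


class FakeAppRunner:
    """Toy stand-in for a plugin-provided runner: records st.* calls."""

    def __init__(self) -> None:
        self.fake_st = mock.MagicMock(name="streamlit")

    def run(self, app_source: str) -> None:
        # Run the app's source with the recording fake in place of streamlit.
        with mock.patch.dict(sys.modules, {"streamlit": self.fake_st}):
            exec(compile(app_source, "<app>", "exec"), {"__name__": "__main__"})

    def get_elements(self, kind: str) -> list:
        # Return the positional args of every recorded st.<kind>(...) call.
        return [c.args for c in getattr(self.fake_st, kind).call_args_list]


@pytest.fixture
def st_runner() -> FakeAppRunner:
    return FakeAppRunner()


# What a downstream app test might then look like:
def test_app_renders_title(st_runner: FakeAppRunner) -> None:
    st_runner.run('import streamlit as st\nst.title("Hello")')
    assert st_runner.get_elements("title") == [("Hello",)]
```

The point of the fixture is that downstream test suites never touch `sys.modules` or mocking machinery themselves; they just request `st_runner` and assert on elements.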

Indeed, we see that Streamlit is itself tested with a pytest test suite. This sorta suggests that the requisite functionality for testing downstream Streamlit apps with pytest already exists... within Streamlit itself! Shock twist is shocking. 😮

Indeed, we see that the existing streamlit/lib/tests/streamlit/runtime/scriptrunner/script_runner_test.py integration test appears to already be doing most – maybe even all – of the heavy lifting we'd expect from an official github.com/streamlit/pytest-streamlit plugin.

Copy-and-pasting the deeply nested functionality of script_runner_test.py into our own pytest test suite justifiably frightens us, however. If something resembling that could be published as a stand-alone API for reuse by the general public, we will happily dance with our Maine Coon cats on the frozen lake outside our cottage in a TikTok video generally regarded as deplorable.

What We Used to Do No Longer Works, Which Is Sad

Previously, we exercised our entire Streamlit app with a trivial integration test in our pytest test suite resembling:

def test_app_lifecycle(capsys) -> None:
    '''
    Integration test exercising the **maximally trivial app lifecycle** (i.e.,
    workflow spinning up, running, and then immediately shutting down this app).

    Parameters
    ----------
    capsys
        Builtin fixture object permitting pytest-specific capturing (i.e.,
        hiding) of both standard output and error to be temporarily disabled.
    '''

    # Temporarily disable pytest-specific capturing (i.e., hiding) of both
    # standard output and error for the duration of this integration test. Why?
    # Because this test often takes an excessive amount of time and can,
    # moreover, fail to halt in various edge cases outside our control.
    # Capturing output unhelpfully obscures these concerns during local testing.
    with capsys.disabled():
        # Defer test-specific imports.
        #
        # Note that:
        # * Importing "hot_bioelectricity_app.main" has the beneficial side
        #   effect of running this Streamlit-based web app to completion.
        # * Attempting to run this app via the "streamlit run" subcommand does
        #   *NOT* halt as expected, as that subcommand understandably runs the
        #   passed app indefinitely.
        from hot_bioelectricity_app import main  # noqa: F401

        # The import above already ran the app to completion; there is
        # nothing further to assert.

Tragically, that test now fails with a non-human-readable CPython error resembling:

test/a90_func/lifecycle/test_lifecycle.py::test_app_lifecycle 2022-11-10 03:15:48.662 
  Warning: to view this Streamlit app on a browser, run it with the following
  command:

    streamlit run /usr/lib/python3.10/site-packages/pytest/__main__.py [ARGUMENTS]
Fatal Python error: Illegal instruction

Current thread 0x00007f3733aca740 (most recent call first):
  File "/usr/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 590 in convert_column
  File "/usr/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 609 in <listcomp>
  File "/usr/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 609 in dataframe_to_arrays
  File "/usr/lib/python3.10/site-packages/streamlit/type_util.py", line 634 in data_frame_to_bytes
  File "/usr/lib/python3.10/site-packages/streamlit/elements/arrow.py", line 399 in _marshall_display_values
  File "/usr/lib/python3.10/site-packages/streamlit/elements/arrow.py", line 215 in _marshall_styler
  File "/usr/lib/python3.10/site-packages/streamlit/elements/arrow.py", line 174 in marshall
  File "/usr/lib/python3.10/site-packages/streamlit/elements/arrow.py", line 110 in _arrow_dataframe
  File "/usr/lib/python3.10/site-packages/streamlit/runtime/metrics_util.py", line 311 in wrapped_func
  File "/usr/lib/python3.10/site-packages/streamlit/elements/dataframe_selector.py", line 105 in dataframe
  File "/usr/lib/python3.10/site-packages/streamlit/runtime/metrics_util.py", line 311 in wrapped_func
  File "/home/leycec/py/calculion/calculion/main.py", line 460 in main
  File "/home/leycec/py/calculion/calculion/main.py", line 489 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1078 in _handle_fromlist
  File "/home/leycec/py/calculion/calculion_test/a90_func/lifecycle/test_lifecycle.py", line 57 in test_app_lifecycle
  File "/usr/lib/python3.10/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
  File "/usr/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.10/site-packages/_pytest/python.py", line 1761 in runtest
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
  File "/usr/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 259 in <lambda>
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 219 in call_and_report
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 130 in runtestprotocol
  File "/usr/lib/python3.10/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
  File "/usr/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.10/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/usr/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.10/site-packages/_pytest/main.py", line 322 in _main
  File "/usr/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/usr/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/usr/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/usr/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/usr/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
  File "/usr/lib/python3.10/site-packages/coverage/execfile.py", line 199 in run
  File "/usr/lib/python3.10/site-packages/coverage/cmdline.py", line 830 in do_run
  File "/usr/lib/python3.10/site-packages/coverage/cmdline.py", line 659 in command_line
  File "/usr/lib/python3.10/site-packages/coverage/cmdline.py", line 943 in main
  File "/usr/lib/python3.10/site-packages/coverage/__main__.py", line 8 in <module>
  File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
  File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, 
scipy.optimize._direct, gmpy2.gmpy2, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, numexpr.interpreter, pyarrow._compute, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, PIL._imaging, _brotli, simplejson._speedups, google.protobuf.pyext._message, _testcapi, markupsafe._speedups, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, regex._regex, pvectorc (total: 132)
./pytest: line 185: 19640 Illegal instruction     "${PYTHON_ARGS[@]}" -m coverage run -m "${PYTEST_ARGS[@]}" .

Needless to say, we no longer run that test.

Thanks for all the escalating UI/UX brilliance, Streamlit devs! You make the data science world go round. 🌎


Community voting on feature requests enables the Streamlit team to understand which features are most important to our users.

If you'd like the Streamlit team to prioritize this feature request, please use the 👍 (thumbs up emoji) reaction in response to the initial post.

@willhuang1997 (Collaborator)

Hi @leycec,
Thanks for posting this. I'm sorry that your tests are not currently working and that there's no supported API for testing. I think @AnOctopus is actually working on something like this that will hopefully be pushed out next quarter at Snowflake, or possibly earlier. She can give you some of her thoughts.

Again, thanks for the well thought out post and hopefully we can see this coming to fruition soon!

@willhuang1997 willhuang1997 added type:enhancement Requests for feature enhancements or new features status:in-progress We're on it! type: testing labels Nov 10, 2022
@carolinedlu carolinedlu changed the title [Feature Request] Pytest-friendly Test API Pytest-friendly Test API Nov 10, 2022
@AnOctopus (Contributor)

Hey @leycec , thanks for opening this issue.

You are absolutely right that Streamlit's current public/external-dev support for testing apps is lacking. In fact, it is somewhat lacking for internal use too. As @willhuang1997 mentioned, I have a project in the works for unit testing interactions and more ergonomically querying the state of an app, because writing e2e tests with Cypress just to test functionality involving multiple script runs is pretty painful.

We only have concrete plans for an initial internal version, but there's a lot of excitement for then making it a public API, and I'm hoping to have it polished and stable enough to make it officially available early next year. It will take a slightly different approach from the one streamlit-mock seems to, but it looks like it will largely cover the same testing needs (I'm not familiar with streamlit-mock, so I don't know exactly what it is used for testing).

At a high level, the upcoming API will actually build on the approach script_runner_test.py takes (which you correctly noticed does most of what you need for testing apps), but will provide a query API for the elements produced by the script, will have a more ergonomic interface (than the raw protobuf messages) for inspecting those elements, and will let the test interact with them to run the script with modified widget state.
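The described workflow – run a script, query the elements it produced, then rerun it with modified widget state – can be modeled in a few dozen lines of plain Python. The `TinyAppTest` class and its methods below are invented for illustration only; they are not the upcoming Streamlit API, just a sketch of its shape.

```python
import sys
import types


class TinyAppTest:
    """Toy runner: fakes streamlit, records outputs, replays widget state."""

    def __init__(self, app_source: str) -> None:
        self.app_source = app_source
        self.widget_state: dict = {}
        self.elements: list = []

    def _fake_streamlit(self) -> types.ModuleType:
        st = types.ModuleType("streamlit")
        st.write = lambda value: self.elements.append(("write", value))
        # slider() returns the injected widget state, else its default value.
        st.slider = lambda label, value=0: self.widget_state.get(label, value)
        return st

    def run(self) -> "TinyAppTest":
        self.elements = []
        # Note: clobbers any cached real streamlit module; fine for a sketch.
        sys.modules["streamlit"] = self._fake_streamlit()
        try:
            exec(compile(self.app_source, "<app>", "exec"),
                 {"__name__": "__main__"})
        finally:
            sys.modules.pop("streamlit", None)
        return self

    def set_widget(self, label: str, value) -> "TinyAppTest":
        self.widget_state[label] = value
        return self


APP = """
import streamlit as st
mv = st.slider("membrane potential (mV)", value=-70)
st.write(mv * 2)
"""

at = TinyAppTest(APP).run()
assert at.elements == [("write", -140)]
# Rerun with modified widget state, as the upcoming API is described to allow.
at.set_widget("membrane potential (mV)", -35).run()
assert at.elements == [("write", -70)]
```

The key design point is the rerun loop: setting widget state and executing the script again mirrors how Streamlit itself reruns an app on every interaction.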

This is eliding a lot of detail so this might be hard to answer, but I'm curious if this sounds like it would cover your testing needs, or if there's something missing (so I can account for that when planning future projects).

I'd also be happy to ping you once we have an internal implementation in case you want to try it out and give feedback prior to the public release. Just let me know if that interests you.


leycec commented Nov 11, 2022

O frabjous day. Thanks so much for that deep dive into the applause-worthy future of Streamlit QA. Excitement intensifies. That's exactly what we were grepping for:

...the upcoming API will actually build on the approach script_runner_test.py takes...

😏

...but will provide a query api for the elements produced by the script

🍾 🥂 🍻

...will have a more ergonomic interface (than the raw protobuf messages) for inspecting those elements...

By Thor, yes. I hadn't actually realized that the script_runner_test.py API validates raw ProtoBuf messages rather than Pythonic objects. That... absolutely makes sense for a first-pass implementation. That also cements our decision to quietly wait for an API that no longer frightens us.

Google's protoc compiler is surprisingly heavyweight and non-agile compared with the standard modus operandi of "anything goes, albeit slowly, because this is Python." Most Pythonistas would have a keyboard fit. This includes my wife. If I suggest that we inject a compilation stage into our pytest workflows, I may have to sleep on the floor with the cats.

...and will let the test interact with them to run the script with modified widget state.

🥳 🎈 🎂

...I'm curious if this sounds like it would cover your testing needs...

Absolutely. You are speaking of a future utopia that the world needs.

That would cover almost everyone else's testing needs, too. StackOverflow is overflowing with questions on this very topic. Okay. Technically, I could only find one. Pragmatically, your proposed API seems to satisfy that question's need for automated inspection à la:

...a mechanism which can check if the app is working fine by entering the input values and verifying all the tabs that my app has in an automated way.

The only answer to that question suggests Selenium – probably because that's the only thing that works. Thus, I facepalm.

I'd also be happy to ping you once we have an internal implementation in case you want to try it out and give feedback prior to the public release.

That sounds great! I'm obsessed with QA in Python and would ❤️ to gently prod Streamlit forward there.


dpinol commented Jul 5, 2023

Hi @AnOctopus, any news about the unit testing project? Thanks!

paulovcmedeiros added a commit to paulovcmedeiros/pyRobBot that referenced this issue Nov 6, 2023
Streamlit is notoriously hard to py-test at the moment.
See, e.g.,
<streamlit/streamlit#5683>