
[Meta] Research: what can we test with Hypothesis? #107862

Open
sobolevn opened this issue Aug 11, 2023 · 10 comments
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@sobolevn
Member

sobolevn commented Aug 11, 2023

We now have both the code and a CI job to run Hypothesis tests.

In this issue I invite everyone to propose ideas: what can we test with it?
Criteria:

  1. Tests should be suitable for the property-based nature of Hypothesis
  2. The action under test should be fast and reliable (since Hypothesis will generate a lot of cases to test, we cannot do anything slow or network-related)
  3. Strategies for data generation should be rather simple (I think we can get to more complex strategies later, but for now, for the sake of simplicity and maintainability, I propose not to get too crazy with them)

What else?

Known applications

Property-based tests are a great fit when certain patterns are present:

  • decode and encode
  • dumps and loads

There are also some hidden patterns, for example:

  • if 'a' in s, then s.index('a') must return an integer
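
For illustration, here is a minimal sketch of both kinds of property, assuming only the standard library plus Hypothesis; the test names are made up:

```python
import json

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
def test_json_roundtrip(d):
    # dumps and loads: decoding an encoded value returns the original
    assert json.loads(json.dumps(d)) == d


@given(st.text(), st.text())
def test_index_agrees_with_in(needle, haystack):
    # hidden pattern: if `needle in s`, then `s.index(needle)` is an int
    if needle in haystack:
        assert isinstance(haystack.index(needle), int)
```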

Existing work

Good examples of existing work:

Linked PRs

@sobolevn sobolevn added type-feature A feature request or enhancement tests Tests in the Lib/test dir labels Aug 11, 2023
@sobolevn
Member Author

Right now I've ported and modified several binascii tests from https://github.com/Zac-HD/stdlib-property-tests/blob/master/tests/test_encode_decode.py :)
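
For context, a hedged sketch of what such a round-trip test looks like (this is illustrative only, not the exact code from the PR):

```python
import binascii

from hypothesis import given, strategies as st


@given(st.binary())
def test_base64_roundtrip(payload):
    # encoding then decoding must reproduce the original bytes exactly
    assert binascii.a2b_base64(binascii.b2a_base64(payload)) == payload
```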

@Zac-HD
Contributor

Zac-HD commented Aug 12, 2023

I'd suggest looking at PyPy's property-based tests, since they're known to be useful for developers of a Python implementation.

I'd also reconsider (3). While complicated strategies are obviously more work to develop and use, in my experience they're also disproportionately likely to find bugs - precisely because code working with complicated data structures is even more difficult to test without Hypothesis, and so there are usually weird edge cases.

Examples of this effect include #84838, #86384, #89901, and #83134. The hypothesmith source-code strategies are at a pre-alpha / proof-of-concept stage and have more third-party dependencies so I'm not sure they'd be a good fit for CPython at this time, but you get the idea. I don't have capacity to implement it myself, but could supervise a volunteer or contractor to build a production-ready strategy if there's interest.
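
To make the point concrete, here is a sketch of a moderately "complicated" strategy: recursively generated JSON-like data. It is illustrative only and unrelated to hypothesmith:

```python
from hypothesis import strategies as st

# JSON-like values: scalars at the leaves, lists and dicts as branches.
json_values = st.recursive(
    st.none() | st.booleans() | st.floats(allow_nan=False) | st.text(),
    lambda children: st.lists(children)
    | st.dictionaries(st.text(), children),
    max_leaves=20,
)
```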

@rhettinger
Contributor

Personally, I don't think we should go down this path. Hypothesis has a certain amount of randomness to it. In general, our tests should be specifically designed to cover every path of interest.

When someone is designing new code, it is perfectly reasonable to use Hypothesis. However, for existing code, a "let's use Hypothesis" effort isn't directed at a specific problem. One issue that we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, so the invariants they mentally invent aren't the actual invariants and reflect bugs in their own understanding rather than actual bugs in the code. For example, we got reports on colorsys conversions not being exactly invertible; however, due to color gamut limitations they can't always be inverted (information is lost), and the conversion tables were dictated by published standards. We also got false reports on the random module methods from people who just let Hypothesis plug in extreme values without any thought of what the method was actually supposed to do in real use cases. There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and not explicitly write out all the cases being covered.


@sobolevn
Member Author

@rhettinger

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and not explicitly write out all the cases being covered.

But we already have Hypothesis as part of our test suite and CI, and it already has some tests. Here is the workflow:

```yaml
test_hypothesis:
  name: "Hypothesis tests on Ubuntu"
  runs-on: ubuntu-20.04
  timeout-minutes: 60
  needs: check_source
  if: needs.check_source.outputs.run_tests == 'true' && needs.check_source.outputs.run_hypothesis == 'true'
  env:
    OPENSSL_VER: 1.1.1v
    PYTHONSTRICTEXTENSIONBUILD: 1
  steps:
    - uses: actions/checkout@v3
    - name: Register gcc problem matcher
      run: echo "::add-matcher::.github/problem-matchers/gcc.json"
    - name: Install Dependencies
      run: sudo ./.github/workflows/posix-deps-apt.sh
    - name: Configure OpenSSL env vars
      run: |
        echo "MULTISSL_DIR=${GITHUB_WORKSPACE}/multissl" >> $GITHUB_ENV
        echo "OPENSSL_DIR=${GITHUB_WORKSPACE}/multissl/openssl/${OPENSSL_VER}" >> $GITHUB_ENV
        echo "LD_LIBRARY_PATH=${GITHUB_WORKSPACE}/multissl/openssl/${OPENSSL_VER}/lib" >> $GITHUB_ENV
    - name: 'Restore OpenSSL build'
      id: cache-openssl
      uses: actions/cache@v3
      with:
        path: ./multissl/openssl/${{ env.OPENSSL_VER }}
        key: ${{ runner.os }}-multissl-openssl-${{ env.OPENSSL_VER }}
    - name: Install OpenSSL
      if: steps.cache-openssl.outputs.cache-hit != 'true'
      run: python3 Tools/ssl/multissltests.py --steps=library --base-directory $MULTISSL_DIR --openssl $OPENSSL_VER --system Linux
    - name: Add ccache to PATH
      run: |
        echo "PATH=/usr/lib/ccache:$PATH" >> $GITHUB_ENV
    - name: Configure ccache action
      uses: hendrikmuhs/ccache-action@v1.2
    - name: Setup directory envs for out-of-tree builds
      run: |
        echo "CPYTHON_RO_SRCDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-ro-srcdir)" >> $GITHUB_ENV
        echo "CPYTHON_BUILDDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-builddir)" >> $GITHUB_ENV
    - name: Create directories for read-only out-of-tree builds
      run: mkdir -p $CPYTHON_RO_SRCDIR $CPYTHON_BUILDDIR
    - name: Bind mount sources read-only
      run: sudo mount --bind -o ro $GITHUB_WORKSPACE $CPYTHON_RO_SRCDIR
    - name: Restore config.cache
      uses: actions/cache@v3
      with:
        path: ${{ env.CPYTHON_BUILDDIR }}/config.cache
        key: ${{ github.job }}-${{ runner.os }}-${{ needs.check_source.outputs.config_hash }}
    - name: Configure CPython out-of-tree
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: |
        ../cpython-ro-srcdir/configure \
          --config-cache \
          --with-pydebug \
          --with-openssl=$OPENSSL_DIR
    - name: Build CPython out-of-tree
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: make -j4
    - name: Display build info
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: make pythoninfo
    - name: Remount sources writable for tests
      # some tests write to srcdir, lack of pyc files slows down testing
      run: sudo mount $CPYTHON_RO_SRCDIR -oremount,rw
    - name: Setup directory envs for out-of-tree builds
      run: |
        echo "CPYTHON_BUILDDIR=$(realpath -m ${GITHUB_WORKSPACE}/../cpython-builddir)" >> $GITHUB_ENV
    - name: "Create hypothesis venv"
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: |
        VENV_LOC=$(realpath -m .)/hypovenv
        VENV_PYTHON=$VENV_LOC/bin/python
        echo "HYPOVENV=${VENV_LOC}" >> $GITHUB_ENV
        echo "VENV_PYTHON=${VENV_PYTHON}" >> $GITHUB_ENV
        ./python -m venv $VENV_LOC && $VENV_PYTHON -m pip install -U hypothesis
    - name: 'Restore Hypothesis database'
      id: cache-hypothesis-database
      uses: actions/cache@v3
      with:
        path: ./hypothesis
        key: hypothesis-database-${{ github.head_ref || github.run_id }}
        restore-keys: |
          - hypothesis-database-
    - name: "Run tests"
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: |
        # Most of the excluded tests are slow test suites with no property tests
        #
        # (GH-104097) test_sysconfig is skipped because it has tests that are
        # failing when executed from inside a virtual environment.
        ${{ env.VENV_PYTHON }} -m test \
          -W \
          -o \
          -j4 \
          -x test_asyncio \
          -x test_multiprocessing_fork \
          -x test_multiprocessing_forkserver \
          -x test_multiprocessing_spawn \
          -x test_concurrent_futures \
          -x test_socket \
          -x test_subprocess \
          -x test_signal \
          -x test_sysconfig
    - uses: actions/upload-artifact@v3
      if: always()
      with:
        name: hypothesis-example-db
        path: .hypothesis/examples/
```

I propose adding more cases where it makes sense.

In general, our tests should be specifically designed to cover every path of interest.

I agree that our regular tests should cover all paths, but there is more to it. Path coverage tops out at 100%, while the space of possible inputs does not, and we simply cannot write out lots of data by hand; our current test suites prove that point.

But Hypothesis can. This is exactly what it is good at: producing lots of correctly structured data.

Hypothesis has a certain amount of randomness to it

There are different ways in which we control the randomness. First, we use a database of known examples:

```python
import hypothesis
from hypothesis.database import (
    GitHubArtifactDatabase,
    MultiplexedDatabase,
    ReadOnlyDatabase,
)

hypothesis.settings.register_profile(
    "cpython-local-dev",
    database=MultiplexedDatabase(
        hypothesis.settings.default.database,
        ReadOnlyDatabase(GitHubArtifactDatabase("python", "cpython")),
    ),
)
hypothesis.settings.load_profile("cpython-local-dev")
```
Plus, Hypothesis itself controls how data is generated, so runs are repeatable.
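
As a sketch of that second mechanism (using only Hypothesis's public settings API; the profile name here is made up), determinism can also be turned on explicitly:

```python
import hypothesis

# With derandomize=True, inputs are derived deterministically from the
# test function itself rather than from a global random seed.
hypothesis.settings.register_profile("cpython-ci", derandomize=True)
hypothesis.settings.load_profile("cpython-ci")
```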

One issue that we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, so the invariants they mentally invent aren't the actual invariants and reflect bugs in their own understanding rather than actual bugs in the code

This is a valid concern; I hope that collaboration among developers can address it.
And this is exactly why I created this issue: to figure out what is worth testing and what is not.

There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.

I don't think that this is actually correct. First, we cannot know which bugs were found by Hypothesis, because people might just post the simplest repro without naming the tool.

Second, we have these bugs that do mention Hypothesis:

Refs #86275

@blaisep
Contributor

blaisep commented May 21, 2024

By the way, I'm new to Hypothesis, and the test_zoneinfo suite from @pganssle has lots of different use cases; it lives at
https://github.com/pganssle/zoneinfo/blob/master/tests/test_zoneinfo_property.py


Edit: Thanks for all your kind warnings; I didn't actually try to use it. I just stared into the abyss and it stared back at me. It's fascinating in an academic sense, the way some people read about https://en.wikipedia.org/wiki/Australian_funnel-web_spider

@pganssle
Member

pganssle commented May 21, 2024

I do think we should be expanding Hypothesis tests. There was no stipulation in the original acceptance that they would only be used for zoneinfo; the idea was to use property tests more widely.

We should at least start by migrating whichever tests in Zac's repo still make sense (probably most of them).

Since they don't run on all CI, they should always come with @example decorators, in which case they are essentially parametrized tests (the stubs run the examples); a sketch follows below.
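
A minimal sketch of that pattern, assuming CPython's test.support.hypothesis_helper shim (which falls back to a stub that only replays the pinned @example cases when Hypothesis isn't installed); the test itself is a made-up placeholder:

```python
from test.support.hypothesis_helper import hypothesis

st = hypothesis.strategies


@hypothesis.given(st.integers())
@hypothesis.example(0)
@hypothesis.example(-1)
def test_abs_is_nonnegative(value):
    # without Hypothesis installed, only the two @example cases run,
    # so the test behaves like a small parametrized test
    assert abs(value) >= 0
```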

@Zac-HD
Contributor

Zac-HD commented May 22, 2024

@encukou and I have been working on adding some more tests + adding a devguide section about Hypothesis. #119345 will also be nice to have.
