
CI checks are slow, especially pytest and coverage #6211

Closed
pavoljuhas opened this issue Jul 19, 2023 · 20 comments · Fixed by #6331
Labels
kind/health For CI/testing/release process/refactoring/technical debt items triage/accepted there is consensus amongst maintainers that this is a real bug or a reasonable feature to add

Comments

@pavoljuhas
Collaborator

pavoljuhas commented Jul 19, 2023

Description of the issue

Lately the CI coverage and pytest checks have slowed down and can take up to 1 hour on GitHub Actions.
The durations fluctuate and may depend on the random seed used in the tests.
pytest already runs in parallel and distributes test cases over the available CPUs.

Example durations from check/pytest --durations=50 --durations-min=10

============================================ slowest 50 durations ============================================
173.11s call     cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[3-0.003]
167.28s call     cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[7-0.008]
40.77s call     cirq-ft/cirq_ft/algos/qubitization_walk_operator_test.py::test_qubitization_walk_operator[4-0.2]
40.45s call     cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[4-0.005]
39.47s call     cirq-ft/cirq_ft/algos/qrom_test.py::test_qrom_multi_dim[2-data0]
37.43s call     cirq-ft/cirq_ft/algos/qubitization_walk_operator_test.py::test_qubitization_walk_operator[3-0.1]
31.06s call     cirq-ft/cirq_ft/algos/select_swap_qrom_test.py::test_select_swap_qrom[3-data1]
25.74s call     cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[2-0.003]
24.89s call     cirq-ft/cirq_ft/algos/qrom_test.py::test_qrom_1d[2-data2]
14.36s call     cirq-ft/cirq_ft/algos/unary_iteration_gate_test.py::test_multi_dimensional_unary_iteration_gate[target_shape1]
14.26s call     cirq-ft/cirq_ft/algos/unary_iteration_gate_test.py::test_multi_dimensional_unary_iteration_gate[target_shape0]
14.16s call     cirq-ft/cirq_ft/algos/hubbard_model_test.py::test_notebook
13.95s call     cirq-ft/cirq_ft/algos/qubitization_walk_operator_test.py::test_notebook
11.07s call     cirq-ft/cirq_ft/algos/qrom_test.py::test_qrom_multi_dim[1-data0]
10.58s call     cirq-core/cirq/transformers/analytical_decompositions/quantum_shannon_decomposition_test.py::test_random_qsd_n_qubit[7]
10.34s call     cirq-ft/cirq_ft/algos/swap_network_test.py::test_notebook

(34 durations < 10s hidden.  Use -vv to show these durations.)

Cirq version

1.2 at e16fbf4

Ideas

  • identify slow tests and review them - can they be simplified to run faster?
  • increase the number of CPUs available on GitHub Actions runners.
  • check/pytest uses -n auto by default. Does this use all cores on GHA or only 50%?
    This was tested in a withdrawn PR, which showed no difference between -n $(nproc) and -n auto.
    Use all cores in CI-run pytest jobs #6215 (comment).
  • move slow but necessary tests to CI-daily and skip them in PR checks
@pavoljuhas pavoljuhas added the kind/health For CI/testing/release process/refactoring/technical debt items label Jul 19, 2023
@pavoljuhas
Collaborator Author

cc: @nafaynajam

@nafaynajam
Contributor

nafaynajam commented Jul 24, 2023

@pavoljuhas Hi, I forked the repo recently and am slowly learning about the project. I ran the cirq-ft tests with different values of n. It appears that parallelization is not working for my local unit tests, since changing n does not change the execution time; I had expected it to distribute the load over multiple CPUs. For n=0 it took 413.17 seconds, and for n=8 it took 413.65 seconds. Am I missing a dependency such as xdist (is the problem on my side?), or is parallelization not implemented at all?
If it is not, is there a limitation preventing it, such as a need for state isolation between tests?

@pavoljuhas
Collaborator Author

pavoljuhas commented Jul 24, 2023

In my case I see different durations: 344s for -n0 and 130s for -n6.
This is on Linux with Python 3.9.16 at 43d0372, using

check/pytest --randomly-seed=173790869 -n VALUE cirq-ft/cirq_ft

The option -n count is provided by the xdist plugin, so pytest -n ... would not even start if xdist were not installed.
The times you observed are effectively the same, so it looks like your tests are running on a single CPU for some reason. Note that the start of the pytest output shows how many workers are running in parallel mode.
Also, to be able to compare different test runs, you need to provide a fixed --randomly-seed=VALUE, because test durations and ordering depend on the seed.

To be sure you are using dependencies comparable to those in the CI, I'd recommend doing the test in a dedicated Python 3.9 virtual environment and installing dependencies with

$ pip install -r dev_tools/requirements/dev.env.txt

PS: I am not sure why the 130s total time for pytest ... cirq-ft/cirq_ft is smaller than the longest individual test in #6211 (comment). Those tests ran on the entire cirq repository, so perhaps their durations were more affected by the resource usage of parallel workers.

@pavoljuhas pavoljuhas added the triage/accepted there is consensus amongst maintainers that this is a real bug or a reasonable feature to add label Jul 26, 2023
@jackreeceejini
Contributor

As a temporary measure, I would suggest moving the code coverage check to CI-daily.

@pavoljuhas
Collaborator Author

pavoljuhas commented Jul 27, 2023

As a temporary measure, I would suggest moving the code coverage check to CI-daily.

Coverage is actually helpful in a PR to see whether the changed lines are tested.

@tanujkhattar - the first 2 slowest tests in #6211 (comment) happen for a few parameters of state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling -
can you please check if that test can be optimized or limited to parameters that run quickly?

I feel that common unit tests should only exercise all code lines (and branches) and collect coverage in the cheapest way possible. If we need to test the correctness of some expensive calculations, we should move such tests to ci-daily
(we can add a new daily test marker for this purpose).

@tanujkhattar
Collaborator

If we need to test the correctness of some expensive calculations, we should move such tests to ci-daily

This sounds like a good plan to me. We currently don't have a test marker daily to mark expensive tests that should be excluded from the CI, right?

@nafaynajam
Contributor

@pavoljuhas Can I reach you through your email to discuss the issue?

@tanujkhattar
Collaborator

@nafaynajam Pavol is currently out of office but will be back in a couple of days. Feel free to ask your questions on the issue in the meantime.

@nafaynajam
Contributor

nafaynajam commented Aug 8, 2023

@tanujkhattar Allow me some basic questions as I am getting to know the framework. I see in the issue that Pavol is only showing cirq-ft tests in the screenshot; why not the others as well? I have seen that we have tests for every algorithm in all directories. Do we only intend to cut the cirq-ft tests?
Secondly, I see in the CI file that we are excluding cirq-core tests, since we have "--ignore=cirq-core/cirq/contrib --rigetti-integration" in the yml file. Why is that?
Thirdly, I have been running the cirq-ft tests locally with "pytest ../cirq-ft/cirq_ft" from inside the check directory. What is the command to run the whole project's tests? If I try to pick up all directories I get errors. I am on Windows 11 with a PyCharm venv environment.

@pavoljuhas
Collaborator Author

If we need to test the correctness of some expensive calculations, we should move such tests to ci-daily

This sounds like a good plan to me. We currently don't have a test marker daily to mark expensive tests that should be excluded from the CI, right?

Yes, that is correct.

@pavoljuhas
Collaborator Author

... I see in the issue that Pavol is only showing cirq-ft tests in the screenshot; why not the others as well? I have seen that we have tests for every algorithm in all directories. Do we only intend to cut the cirq-ft tests?

@nafaynajam - the output in my initial comment shows the 16 tests with the longest durations. All but one are in the cirq-ft package. check/pytest ran all tests in the cirq repository; they are just sorted by execution time.

Secondly, I see in the CI file that we are excluding cirq-core tests, since we have "--ignore=cirq-core/cirq/contrib --rigetti-integration" in the yml file. Why is that?

--ignore=cirq-core/cirq/contrib excludes tests in the cirq.contrib subpackage. That package is for experimental functions that are not production-ready, so their failures should not block PRs.
On the other hand, --rigetti-integration activates tests decorated with @pytest.mark.rigetti_integration, which are otherwise skipped. The option is only effective for the cirq-rigetti package. The CI Python environment is set up with the required dependencies so these tests can run.

Thirdly, I have been running the cirq-ft tests locally with "pytest ../cirq-ft/cirq_ft" from inside the check directory. What is the command to run the whole project's tests? If I try to pick up all directories I get errors. I am on Windows 11 with a PyCharm venv environment.

I would recommend using a Linux-like development environment, for example WSL, and then running tests with the check/pytest shell script.

@nafaynajam
Contributor

@pavoljuhas Thank you :-)

@nafaynajam
Contributor

nafaynajam commented Aug 13, 2023

@pavoljuhas @tanujkhattar
I propose the following solution to the issue. I will add the tests to ignore to an excluded_tests.txt file, so they are only excluded at the local level; with echo, the excluded tests will also be printed out before test execution. The following code can be added to the check/pytest file:

# Fetch excluded tests from the file if IGNORE_EXCLUDED_TESTS is not set
if [ -z "$IGNORE_EXCLUDED_TESTS" ]; then
  excluded_tests_file="${topdir}/check/excluded_tests.txt"
  excluded_tests=()
  while IFS= read -r line; do
    excluded_tests+=("$line")
  done < "$excluded_tests_file"

  # Print excluded tests
  echo "Excluded tests:"
  for test in "${excluded_tests[@]}"; do
    echo "- $test"
  done

  # Generate a string of --ignore arguments for excluded tests
  ignore_args=""
  for test in "${excluded_tests[@]}"; do
    ignore_args+="--ignore=\"$test\" "
  done

  # Exclude specified tests from execution
  PYTEST_ARGS+=($ignore_args)
fi

The daily CI file can be edited as follows to reflect the changes in check/pytest.

- name: Setup Excluded Tests
  run: |
    # Fetch excluded tests from the file if IGNORE_EXCLUDED_TESTS is not set
    if [ -z "$IGNORE_EXCLUDED_TESTS" ]; then
      excluded_tests_file="excluded_tests.txt"
      excluded_tests=()
      while IFS= read -r line; do
        excluded_tests+=("$line")
      done < "$excluded_tests_file"

      # Generate a string of --ignore arguments for excluded tests
      ignore_args=""
      for test in "${excluded_tests[@]}"; do
        ignore_args+="--ignore=\"$test\" "
      done

      # Store the ignore_args in an environment variable for later use
      echo "IGNORE_ARGS=$ignore_args" >> $GITHUB_ENV
    fi

- name: Pytest check
  run: |
    # Use the IGNORE_ARGS environment variable if set
    if [ -n "$IGNORE_ARGS" ]; then
      check/pytest $IGNORE_ARGS --rigetti-integration
    else
      check/pytest --rigetti-integration
    fi

I am debugging some problems I am having; let me know whether the route I am taking is correct and if you have any insights.

@pavoljuhas
Collaborator Author

pavoljuhas commented Aug 16, 2023

We already have a slow marker defined here -

config.addinivalue_line("markers", "slow: mark tests as slow")

It is only provided for the dev_tools subpackage.

TODO

  • let's make the slow marker available in all cirq packages
  • adjust the CI scripts so that slow tests are excluded from PR checks, but they are executed in the daily CI runs
  • @tanujkhattar will review the cirq-ft tests and decorate them with the slow marker as needed
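
For illustration, decorating an expensive test with the proposed marker could look like this (the test name and body here are made up, not taken from the cirq-ft code):

```python
# Hypothetical example of an expensive test carrying the slow marker;
# the name and body are illustrative stand-ins for a long simulation.
import pytest

@pytest.mark.slow
def test_expensive_computation():
    # Stands in for a long-running simulation.
    total = sum(i * i for i in range(10_000))
    assert total == 333283335000
```

With the plan above, such a test would be skipped in PR checks and exercised only in the daily CI runs.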

@nafaynajam
Contributor

nafaynajam commented Sep 4, 2023

Hi @pavoljuhas @tanujkhattar , I propose the following solution:

Make the following changes to ci.yml line 170-171:
- name: Pytest check (without slow tests)
run: check/pytest -n auto --ignore=cirq-core/cirq/contrib -m "not slow"

Make the following changes to ci-daily.yml line 38-39:
- name: Pytest check (Include slow tests)
run: check/pytest -n auto --rigetti-integration

Exclude the tests with the following marker:
@pytest.mark.slow # Add the slow marker before the test function

To run locally without the slow tests:
check/pytest -m "not slow"

Let me know whether this solution is feasible or there is an issue with it in your experience.

@pavoljuhas
Collaborator Author

Hi @nafaynajam, apologies for the slow response. First, the -m "not slow" expression is not necessary, because slow-marked tests are already skipped. To enable both slow and other tests one could do -m "slow or not slow", but that looks confusing. A better way is to add a custom option to pytest, say --enable-slow-tests, which would tell pytest to run (i.e., not skip) the slow-marked tests; this can act similarly to the --rigetti-integration option here.

The pytest setup could be cleaned up. As it is, the custom markers rigetti_integration, slow, and weekly are defined in 2 different conftest.py files, as are the pytest_collection_modifyitems hooks that select/deselect marked tests.
I suggest to

  1. convert all config.addinivalue_line("markers", ...) statements to equivalent lines in pyproject.toml. This will let pytest --markers show our markers too (currently not displayed; I guess the --markers output happens before the conftest.py files are loaded).
  2. add a new --enable-slow-tests option to the toplevel conftest.py
  3. combine and move the pytest_collection_modifyitems functions to the top-level conftest.py so that test selection/deselection based on markers and the --rigetti-integration / --enable-slow-tests options happens in one place.
  4. use the new --enable-slow-tests option with pytest in ci-daily.yml and
  5. mark the 2 slowest tests from the first comment as slow. I'd suggest rerunning check/pytest --durations=50 ... to make sure they are still significantly slower than the others.
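
Steps 2 and 3 above could be sketched in the top-level conftest.py roughly as follows (a minimal sketch, not the actual Cirq implementation; details such as the skip reason text are assumptions):

```python
# conftest.py -- hypothetical sketch of a custom --enable-slow-tests
# option plus a single pytest_collection_modifyitems hook.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--enable-slow-tests",
        action="store_true",
        default=False,
        help="run tests marked as slow (skipped by default)",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--enable-slow-tests"):
        return  # run everything, including slow-marked tests
    skip_slow = pytest.mark.skip(reason="slow; use --enable-slow-tests to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

ci-daily.yml would then invoke check/pytest --enable-slow-tests, while regular PR checks keep the default behavior of skipping slow-marked tests.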

@nafaynajam
Contributor

@pavoljuhas I have the following tests coming up as slowest, which seem to differ from your list. Which ones should I mark slow for test purposes?

(screenshot of slowest test durations)

@pavoljuhas
Collaborator Author

which ones should I mark slow for test purposes?

Test durations may depend on the random seed, which differs from run to run because we use pytest-randomly.
I ran all tests on my box several times in single-CPU mode (with -n 0) and then picked the tests that took over 20 seconds in some of the runs.

Please see the list below. You may want to check https://docs.pytest.org/en/stable/example/markers.html#marking-individual-tests-when-using-parametrize if only some of the parametrized tests below need to be marked slow.

@tanujkhattar - are you OK with skipping these in PR checks?
The tests will be included in the daily-scheduled CI runs.

time (std)      test name
------------------------------------------------------------------------------------------------------------------------
500 (100)       cirq-ft/cirq_ft/algos/selected_majorana_fermion_test.py::test_selected_majorana_fermion_gate[target_gate0-4-9]
490 (44)        cirq-ft/cirq_ft/algos/selected_majorana_fermion_test.py::test_selected_majorana_fermion_gate[target_gate1-4-9]
40 (1)          cirq-ft/cirq_ft/algos/selected_majorana_fermion_test.py::test_selected_majorana_fermion_gate[target_gate0-3-8]
40 (1)          cirq-ft/cirq_ft/algos/selected_majorana_fermion_test.py::test_selected_majorana_fermion_gate[target_gate1-3-8]

97 (3)          cirq-ft/cirq_ft/algos/reflection_using_prepare_test.py::test_reflection_using_prepare[0.01-5]
97 (3)          cirq-ft/cirq_ft/algos/reflection_using_prepare_test.py::test_reflection_using_prepare[0.01-6]
32 (0.6)        cirq-ft/cirq_ft/algos/reflection_using_prepare_test.py::test_reflection_using_prepare[0.01-7]
27 (0.5)        cirq-ft/cirq_ft/algos/reflection_using_prepare_test.py::test_reflection_using_prepare[0.01-8]

53 (16)         cirq-ft/cirq_ft/algos/qrom_test.py::test_qrom_multi_dim[2-data0]

39 (2)          cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[4-0.005]
27 (1)          cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[2-0.003]

28 (10)         cirq-ft/cirq_ft/algos/qubitization_walk_operator_test.py::test_qubitization_walk_operator[3-0.1]
26 (0.2)        cirq-ft/cirq_ft/algos/qubitization_walk_operator_test.py::test_qubitization_walk_operator[4-0.2]

25 (0.4)        cirq-ft/cirq_ft/algos/select_swap_qrom_test.py::test_select_swap_qrom[3-data1]

22 (4)          cirq-ft/cirq_ft/algos/qrom_test.py::test_qrom_1d[2-data2]
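
Following the linked pytest docs, only the heavy parameter combinations could carry the mark, roughly like this (the parameter names and values below are illustrative, not the actual cirq-ft signatures):

```python
# Hypothetical sketch: mark only the expensive parameter sets of a
# parametrized test as slow, leaving the fast ones in PR checks.
import pytest

@pytest.mark.parametrize(
    "select_bitsize, target_bitsize",
    [
        (3, 8),  # fast case, stays in regular PR checks
        pytest.param(4, 9, marks=pytest.mark.slow),  # slow case, daily CI only
    ],
)
def test_selected_gate(select_bitsize, target_bitsize):
    assert target_bitsize > select_bitsize
```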

pavoljuhas added a commit that referenced this issue Oct 27, 2023
#6328)

* Move all pytest_collection_modifyitems hooks to the top conftest.py and
  apply mark-based test-selection there
* Move custom marks definitions to pyproject.toml
  so they show in `pytest --markers`
* Add custom pytest option `--enable-slow-tests` to run slow-marked tests.
  These are otherwise skipped by default.
* Support a quick check of test selection using
  `pytest --co -m skip` and `pytest --co -m "not skip"`

Partially implements #6211
---------

Co-authored-by: Pavol Juhas <juhas@google.com>
@tanujkhattar
Collaborator

Yes, this is a good list, although these tests will go away soon since we've migrated all the Cirq-FT code to Qualtran. But as a proof of concept, let's mark these as slow for now. Thanks!

@pavoljuhas
Collaborator Author

Found 3 more tests that took well over 20 seconds (depending on the random seed):

time (std)      test name
------------------------------------------------------------------------------------------------------------------------
1050 (13)       cirq-ft/cirq_ft/algos/unary_iteration_gate_test.py::test_multi_dimensional_unary_iteration_gate[target_shape0]

119 (2)         cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[3-0.003]
167 (1)         cirq-ft/cirq_ft/algos/state_preparation_test.py::test_state_preparation_via_coherent_alias_sampling[7-0.008]

pavoljuhas added a commit that referenced this issue Oct 30, 2023
* Add slow marks to the unit tests with over 20 second duration.
* Update the ci-daily job so it runs all slow tests.
* Add ci-daily jobs for Windows and MacOS operating systems
  so that slow tests are tested on these platforms too.

Refs: 
* #6211 (comment)
* #6211 (comment)

Resolves #6211
pavoljuhas added a commit to pavoljuhas/Cirq that referenced this issue Oct 30, 2023
No change in the test execution, just a bit more output for the
`Pytest Ubuntu`, `Pytest Windows`, and `Pytest MacOS` jobs.

Related to quantumlib#6211
pavoljuhas added a commit that referenced this issue Oct 30, 2023
No change in the test execution, just a bit more output for the
`Pytest Ubuntu`, `Pytest Windows`, and `Pytest MacOS` jobs.

Related to #6211
4 participants