Failing Unit Test Bugfixes and Performance Related Testing Refactors #372

coreyostrove · 2023-11-14T18:12:30Z

This PR includes two main changes:

Bugfixes for the failing notebook regression tests uncovered after merging PR "Feature globally germ aware fpr" #350. The details of exactly what broke are complicated, but in a nutshell the failures are related to new numerical stability problems in some of the older parts of the germ selection code (why this started randomly breaking now, who knows?). For more details, see the thread at "Nondeterministic Unit Test Failures" #369.
A combination of unit test performance tweaks and refactors to address issue "test_fiducialpairreduction takes a very long time" #360. Speedups are primarily due to tweaks to the algorithms and drivers testing modules. This includes switching from legacy modelpacks to modern ones in a number of places.

It looks like all of the unit tests, including the extras and the notebook regression tests (which I manually dispatched on this branch), are passing now. Let me know if you have any questions.

One thing worth noting: I had originally merged in the branch bugfix-for-0.9.12.0, but found that this caused a number of unit tests in the extras package to fail. Check out 'build and run test extras' 211 for more details on that. So I opted to revert that merge on this branch to keep the changes separate (and save myself the effort of tracking that down).

rileyjmurray

Mostly looks good! I have two substantive comments for the eig vs Schur issue. I also have a question about the level of user-exposure for code in pygsti/algorithms/directx.py.

Somewhat tangentially, I stumbled across this comment block for the Label class, which says that it's supposed to be a base class only. Based on how the Label class is used in practice, it seems like this comment is wrong. Corey, what's your take?

rileyjmurray · 2023-11-14T18:28:39Z

pygsti/algorithms/directx.py

+                sigma, 
+                Label('GsigmaLbl') if sigma.line_labels == ('*',) else Label('GsigmaLbl', sigma.line_labels), 
+                dataset, prep_fiducials, meas_fiducials, target_model,


Interesting that you've made a change to this file, since Erik just marked it as minimal-exposure. @coreyostrove any chance you'd disagree with Erik's take on this file's user exposure? Here are the conventions we agreed to, for reference:

High exposure: API changes are likely to break workflows for some users, including "casual" users.

Low exposure: API changes are unlikely to affect casual users of pyGSTi but might affect some of pyGSTi's advanced users.

Minimal exposure: while it's conceivable that API changes could break user code, the possibility of this is sufficiently remote that we can change the API at will.

Good question. I agree with Erik's marking of this as minimal-exposure. This change arose as a bugfix for some failures encountered in the corresponding test package (test_directx.py) as part of the refactor moving from legacy modelpacks to the modern implementation in testing. This particular line of code turned out to be fragile when it came to circuit line labeling, and so it broke when using circuits from the new modelpacks, which have different default circuit line labeling behavior than the legacy modelpacks.

In reality I suspect this is due to a separate bug in the circuit line label handling code itself, in the circuits module, that I've run into in other contexts and mentioned to @sserita before, but haven't ever properly documented. (This is a bug related to properly reconciling line labels between circuits with a placeholder '*' and those with fully specified labels.) In the moment it was easier to patch in a workaround in directx than to context switch and try to track down the circuit bug. That said, I am in the correct headspace for this sort of thing now, so if nothing else I should be able to get a minimal reproduction of the aforementioned bug documented pretty soon.

While I was working in the directx code I did wonder to myself whether we even need any of this anymore. I.e., insofar as it is minimal exposure, I suspect that is because few (possibly 0) people use it in practice and there are now generally better (more pygstithonic) ways to produce the same results. I've personally never interacted with this code in practical usage, and don't know of anyone who does use it. Given that, maybe this would be a decent candidate for a module we can retire to legacy as part of our spring cleaning process. @enielse and @sserita, thoughts on this last question?

Update: I just read through @enielse's latest commit on the code marking branch and saw that he has marked directx as:

user-exposure: minimal, forsaken (EGN - we don't use this anymore and never really did)

So that is two votes in favor of decommissioning this part of the codebase. @sserita, how do you feel about formally marking this as deprecated in 0.9.12.0 and then removing it (remove = move to the legacy folder) either in 0.9.13.0 or some point thereafter?

I agree with the minimal tag and we can formally mark this as deprecated. I will do so in the bugfix-for-0.9.12 branch before I merge that in.
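For readers following the directx thread, the workaround in the diff above boils down to dropping the line labels when they are only the '*' placeholder. A minimal sketch of that branching in plain Python follows; `build_label_args` and `PLACEHOLDER` are hypothetical names used for illustration, and the function returns constructor arguments rather than calling pyGSTi's `Label`, so it is not the library's actual code.

```python
# Hypothetical stand-in for the placeholder check in the directx workaround.
PLACEHOLDER = ('*',)


def build_label_args(name, line_labels):
    """Return the positional arguments to pass to a label constructor.

    If the circuit only carries the '*' placeholder, the line labels are
    omitted so the label is built from the name alone. Otherwise the
    fully specified line labels are passed through.
    """
    if tuple(line_labels) == PLACEHOLDER:
        return (name,)
    return (name, tuple(line_labels))


# Placeholder labels are dropped; explicit labels are kept.
args_placeholder = build_label_args('GsigmaLbl', ('*',))
args_explicit = build_label_args('GsigmaLbl', (0, 1))
```

With real pyGSTi objects the equivalent call would presumably be `Label(*build_label_args('GsigmaLbl', sigma.line_labels))`, which matches the conditional expression in the diff.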

test/unit/algorithms/test_fiducialselection.py

pygsti/algorithms/germselection.py

This commit adds a new kwarg for specifying the tolerance to use in determining whether a matrix is normal for the purposes of selecting the correct decomposition algorithm. Also adds a proper docstring explaining the various function arguments and outputs and removes some holdover print statements from debugging.
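To make the eig vs Schur discussion concrete, here is a rough sketch of a tolerance-based normality check using the commutator of a matrix with its conjugate transpose. The names `is_normal`, `choose_decomposition`, and the default `tol` are assumptions for illustration, not the signature added in this commit. The mapping of normal matrices to Schur is also an assumption: it is motivated by the fact that the Schur form of a normal matrix is diagonal with a unitary eigenvector matrix, but the commit message does not say which branch goes where.

```python
import numpy as np


def is_normal(mat, tol=1e-9):
    """Return True if mat commutes with its conjugate transpose within tol.

    Normality is tested via the Frobenius norm of A A^H - A^H A.
    """
    mat = np.asarray(mat, dtype=complex)
    adjoint = mat.conj().T
    commutator = mat @ adjoint - adjoint @ mat
    return bool(np.linalg.norm(commutator) <= tol)


def choose_decomposition(mat, tol=1e-9):
    """Name the decomposition to use (assumed mapping, see the lead-in).

    Normal matrices are routed to a Schur decomposition, everything else
    to a general eigendecomposition.
    """
    return "schur" if is_normal(mat, tol) else "eig"


rotation = np.array([[0.0, -1.0], [1.0, 0.0]])   # orthogonal, hence normal
shear = np.array([[1.0, 1.0], [0.0, 1.0]])       # not normal
```

In a full implementation the "schur" branch would likely call `scipy.linalg.schur(mat, output='complex')` and the "eig" branch `numpy.linalg.eig(mat)`; both are left out here to keep the sketch focused on the branching test.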

coreyostrove · 2023-11-14T20:09:39Z

RE: Label. I use the base Label class directly pretty regularly, for what that is worth. I suspect that I am not alone in that regard, and that the comment block you mentioned probably accurately describes the initial design intentions, but not present behavior.

sserita

This all looks good, thanks for your hard work on this!

sserita and others added 20 commits November 9, 2023 14:52

Fix for #365 and #367

344be66

Fix #368

5828b15

Remote notebook debugging

a765aa1

I am unable to reproduce the failure scenario locally on my windows machine, so try some remote debugging on the github runners.

Fix #363.

4bfecfe

Not the most graceful, but only import qibo if the correct version is installed. Will be removed next major release due to growing divergence between actual and desired interface.

Speed up runtime for algorithms test module

60d0395

This streamlines the tests in the algorithms test module and does a refactor of the test fixtures to use a more modern modelpack format. Also improves coverage on fiducial selection codebase.

Additional notebook regression debugging

319d7be

Add the failing line of the germ selection notebook as a unit test to try and get more verbose output on the failure from the runner. Also temporarily tweak main.yml to only test against windows for faster turnaround.

More Notebook Regression debugging

61fc6cf

Additional debugging output

3e267de

Some more verbosity for debugging

Typo fix

d068180

minor typo fix

Try using Schur Decomposition

38ca39d

Try using the schur decomposition in the twirling superoperator construction instead. Also some unrelated fixes for the fiducial pair reduction unit tests.

Merge remote-tracking branch 'origin/bugfix-for-0.9.12.0' into featur…

b4325dc

…e-faster-algorithm-tests

Refactor drivers test package

90a2e02

To use a more modern modelpack, and also a smaller one which speeds up the testing a bit.

Minor RB Test Speedups

654578b

Streamlined the RB unit tests to test on fewer qubits for a modest reduction in the runtime.

Update options in extras workflow

cdda5c1

Change the distribution CLI option for xdist to ensure tests remain properly grouped across workers.

Reduce QutritGST mode count

7b87b64

Reduce the number of parameterizations fit in the QutritGST demo notebook to reduce runtime (it was periodically timing out on certain cells on the github runners).

Update schur decomposition implementation

4dd35f8

Update the implementation of the twirling superoperator to check the commutator of the input matrix and branch off of that to determine the decomposition to use. Also rename the new germ selection unit test.

Resume testing against full suite

fa92a5f

Resume testing against the entire module.

Disable checkpointing on testing

7a5f81e

Now that the evotype-dependent serialization bug is resolved, go ahead with disabling checkpointing on these tests.

Revert "Merge remote-tracking branch 'origin/bugfix-for-0.9.12.0' int…

8f46b6a

…o feature-faster-algorithm-tests" This reverts commit b4325dc, reversing changes made to 38ca39d.

Auto opening reports is annoying when doing automated testing.

8404431

See above title.
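As an aside on commit 4bfecfe in the list above, a version-gated import can be sketched with the standard library alone. This is only an illustration of the general pattern: `import_if_version` is a hypothetical helper, the exact-match comparison is an assumption, and the qibo version that pyGSTi actually checks for is not stated in this thread.

```python
import importlib
from importlib import metadata


def import_if_version(package, required_version):
    """Import and return `package` only if its installed version matches.

    Returns None when the package is missing or when the installed
    version string differs from `required_version`, so callers can
    treat the dependency as optional.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    if installed != required_version:
        return None
    return importlib.import_module(package)


# A package that is not installed simply yields None.
missing = import_if_version("surely_not_installed_pkg_xyz", "1.0")
```

Code that depends on the optional package can then check for `None` and skip the relevant feature instead of failing at import time.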

coreyostrove requested review from rileyjmurray and sserita November 14, 2023 18:12

rileyjmurray reviewed Nov 14, 2023

rileyjmurray approved these changes Nov 14, 2023

sserita approved these changes Nov 16, 2023

sserita merged commit f2c28b5 into develop Nov 16, 2023
13 checks passed

sserita deleted the feature-faster-algorithm-tests branch November 16, 2023 18:58

coreyostrove mentioned this pull request Nov 17, 2023

Bugfix stricter line label enforcement #373

Merged

sserita mentioned this pull request Dec 2, 2023

Faster unit tests/notebooks #380

Closed

86 tasks
