abandoned; see 671 #670
This resolves issue #655. There's a change to the public API for Report.write_notebook and changes to the code in the notebook it generates. Report.write_notebook only saves data as a pickle if it's called with kwarg ``use_pickle=True``. There's a new Report.results_dict_from_dir function that can accept _either_ a filename with a ``.pkl`` extension or a directory name. If the ``.pkl`` extension is present then we try to load from a pickle (to ensure backward compatibility). Otherwise, we try to load from a directory generated by the new Report.write_results_dict function. I introduced a little testing infrastructure so our write_notebook test actually runs the notebooks that it generates. I also added a test that calls Report.write_notebook with kwarg ``use_pickle=True``.
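A minimal sketch, not pyGSTi's actual implementation, of the dispatch behavior described above for Report.results_dict_from_dir: a path ending in ``.pkl`` is loaded as a legacy pickle, anything else is treated as a directory written by Report.write_results_dict. The `load_dir` callable is a hypothetical stand-in for the directory loader.

```python
import pickle
from pathlib import Path

def results_dict_from_dir(path, load_dir):
    """Load a results dict from either a legacy .pkl file or a results directory.

    `load_dir` is a caller-supplied function (hypothetical here) that knows how
    to read a directory produced by Report.write_results_dict.
    """
    path = Path(path)
    if path.suffix == ".pkl":
        # Backward-compatible branch: older write_notebook calls pickled the data.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Otherwise treat `path` as a directory written by Report.write_results_dict.
    return load_dir(path)
```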
Note: For some reason the automatic merge from bugfix into develop failed on PR #660, so this PR is just manually performing that merge.

While playing around with MPI parallelization across parameters (instead of circuits, which is the default for the map simulator as of 0.9.13) I ran into a bug, which I tracked down to a shared-memory bug inadvertently introduced with PR #515. Attached is a script which, if run using MPI (I was using 4 ranks), runs a single iteration of two-qubit GST on a single thread and on all four threads and logs the results for comparison: [debug_dist_objective_fn.py](https://github.com/user-attachments/files/22715098/debug_dist_objective_fn.py). Running this script on 0.9.14.1 yields the following output:

```
Number of Ranks is 4

Fit Log Single Rank
-----------------------------
Precomputing CircuitOutcomeProbabilityArray layouts for each iteration.
  Layout for iteration 0
  Num Param Processors (1,)
MapLayout: 1 processors divided into 1 x 1 (= 1) grid along circuit and parameter directions.
  1 atoms, parameter block size limits (None,)
*** Distributing 1 atoms to 1 atom-processing groups (1 cores) ***
  More atom-processors than hosts: each host gets ~1 atom-processors
  Atom-processors already occupy a single node, dividing atom-processor into 1 param-processors.
*** Divided 1-host atom-processor (~1 procs) into 1 param-processing groups ***
--- Iterative GST: Iter 1 of 1  555 circuits ---:
  --- chi2 GST ---
  --- Outer Iter 0: norm_f = 4.5053e+09, mu=1, |x|=0, |J|=3.86694e+08
    - Inner Loop: mu=1.542e+12, norm_dx=4.69768e-06
      (cont): norm_new_f=1289.29, dL=4.42172e+09, dF=4.5053e+09, reldL=0.981449, reldF=1
    WARNING: Treating result as *converged* after maximum iterations (1) were exceeded.
      Accepted! gain ratio=1.0189  mu * 0.333333 => 5.14e+11
    Least squares message = Maximum iterations (1) exceeded
  _objfn = 1289.29 (1665 data params - 1440 (approx) model params = expected mean of 225; p-value = 0)
  Completed in 1.4s
  Iteration 1 took 1.5s

  Last iteration:
  --- logl GST ---
  --- Outer Iter 0: norm_f = 644.825, mu=1, |x|=0.00216741, |J|=556063
    - Inner Loop: mu=3.08599e+06, norm_dx=8.62909e-06
      (cont): norm_new_f=549.175, dL=95.681, dF=95.65, reldL=0.148383, reldF=0.148335
    WARNING: Treating result as *converged* after maximum iterations (1) were exceeded.
      Accepted! gain ratio=0.999676  mu * 0.333333 => 1.02866e+06
    Least squares message = Maximum iterations (1) exceeded
  _objfn = 1098.35 (1665 data params - 1440 (approx) model params = expected mean of 225; p-value = 0)
  Completed in 1.3s
  Final optimization took 1.3s

Fit Log Four Ranks
-----------------------------
Precomputing CircuitOutcomeProbabilityArray layouts for each iteration.
  Layout for iteration 0
  Num Param Processors (4,)
MapLayout: 4 processors divided into 1 x 4 (= 4) grid along circuit and parameter directions.
  1 atoms, parameter block size limits (None,)
*** Distributing 1 atoms to 1 atom-processing groups (4 cores) ***
*** Using shared memory to communicate between the 4 cores on each of 1 hosts ***
  More atom-processors than hosts: each host gets ~1 atom-processors
  Atom-processors already occupy a single node, dividing atom-processor into 4 param-processors.
*** Divided 1-host atom-processor (~4 procs) into 4 param-processing groups ***
--- Iterative GST: Iter 1 of 1  555 circuits ---:
  --- chi2 GST ---
  --- Outer Iter 0: norm_f = 6381.58, mu=1, |x|=0, |J|=3.66935e+11
    - Inner Loop: mu=1.44141e+18, norm_dx=4.27565e-18
      (cont): norm_new_f=6177.24, dL=4871.93, dF=204.333, reldL=0.763437, reldF=0.0320192
      Accepted! gain ratio=0.0419408  mu * 0.3 => 4.32423e+17
    WARNING: Treating result as *converged* after maximum iterations (1) were exceeded.
    Least squares message = Maximum iterations (1) exceeded
  _objfn = 6177.24 (1665 data params - 1440 (approx) model params = expected mean of 225; p-value = 0)
  Completed in 0.5s
  Iteration 1 took 0.6s

  Last iteration:
  --- logl GST ---
C:\Users\ciostro\Documents\pyGSTi_automated_model_selection\pygsti\objectivefns\objectivefns.py:4471: RuntimeWarning: divide by zero encountered in divide
  p5over_lsvec = 0.5/lsvec
C:\Users\ciostro\Documents\pyGSTi_automated_model_selection\pygsti\objectivefns\objectivefns.py:4471: RuntimeWarning: divide by zero encountered in divide
  p5over_lsvec = 0.5/lsvec
C:\Users\ciostro\Documents\pyGSTi_automated_model_selection\pygsti\objectivefns\objectivefns.py:4471: RuntimeWarning: divide by zero encountered in divide
  p5over_lsvec = 0.5/lsvec
C:\Users\ciostro\Documents\pyGSTi_automated_model_selection\pygsti\objectivefns\objectivefns.py:4471: RuntimeWarning: divide by zero encountered in divide
  p5over_lsvec = 0.5/lsvec
  --- Outer Iter 0: norm_f = 2617.65, mu=1, |x|=2.06777e-09, |J|=6.23454e+11
    - Inner Loop: mu=4.05554e+18, norm_dx=3.83592e-19
      (cont): norm_new_f=2605.54, dL=1399.13, dF=12.1112, reldL=0.534499, reldF=0.00462674
    WARNING: Treating result as *converged* after maximum iterations (1) were exceeded.
      Accepted! gain ratio=0.00865623  mu * 0.3 => 1.21666e+18
    Least squares message = Maximum iterations (1) exceeded
  _objfn = 5211.07 (1665 data params - 1440 (approx) model params = expected mean of 225; p-value = 0)
  Completed in 0.5s
  Final optimization took 0.5s
```

Obviously, looking at the final objective function values, the results are quite different, but what caught my eye is that this was true even at the very first step, before the optimizer had moved at all:

Single threaded: norm_f = 4.5053e+09
Multi-threaded: norm_f = 6381.58

Shared memory is a real mind-bender, so I'll spare folks the details of how I tracked this down, but ultimately I traced it to [this line](https://github.com/sandialabs/pyGSTi/blob/339c8a3df6e84c9a154e04d42e090ab9d27f78b2/pygsti/objectivefns/objectivefns.py#L4415). The bug arises because in the multi-threaded setting `lsvec` is an instance of `LocalNumpyArray`, which generally points to an array that lives in shared memory. The square root on that line was not being guarded, and since each rank was pointing to the same memory, the square root was applied as many as `num_ranks` times (in my testing this was actually nondeterministic, though usually 3 or 4; probably a race condition in play). I pattern-matched off of other parts of `objectivefns.py` where shared memory is used (mainly `terms` in `TimeIndependentMDCObjectiveFunction`) and think I got the correct guards added in (the patch seemed to work in my testing). I also added back a few comments related to the use of shared memory in this part of the code that were removed in #515, which I found useful in debugging this.

P.S. As for why this wasn't caught sooner: the default forward simulator since 0.9.13 is `MapForwardSimulator`, and the default parallelization strategy for map is across circuits. It happens that when parallelizing only across circuits the portions of the shared memory indexed into by each rank are non-overlapping, but that isn't true when parallelizing across parameters. So you'd have had to have manually changed the parallelization strategy to see this behavior, which most users don't do unless they have a specific reason to.
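A minimal, self-contained sketch (not pyGSTi's actual code) of the guard pattern described above: when several MPI ranks on one node view the *same* shared-memory buffer, an in-place operation like the square root must be applied by exactly one rank per buffer, with a barrier before the others read the result. Assumes mpi4py and NumPy; run with, e.g., `mpiexec -n 4 python guard_sketch.py`.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Group together the ranks that can actually share memory (same node).
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)

n = 8
itemsize = MPI.DOUBLE.Get_size()
# Only the node-local leader (rank 0 of node_comm) allocates the buffer.
nbytes = n * itemsize if node_comm.rank == 0 else 0
win = MPI.Win.Allocate_shared(nbytes, itemsize, comm=node_comm)
buf, _ = win.Shared_query(0)          # every rank gets a view of the same memory
lsvec = np.ndarray(buffer=buf, dtype='d', shape=(n,))

if node_comm.rank == 0:
    lsvec[:] = 4.0                    # stand-in for values written by the objective fn
node_comm.Barrier()

# BUG: if every rank runs the in-place update, the sqrt gets applied to the
# same memory up to node_comm.size times (and races with concurrent readers):
#   np.sqrt(lsvec, out=lsvec)

# FIX: guard the in-place update so only the shared-memory leader applies it,
# then synchronize before anyone reads lsvec.
if node_comm.rank == 0:
    np.sqrt(lsvec, out=lsvec)
node_comm.Barrier()

assert np.allclose(lsvec, 2.0)        # correct regardless of the number of ranks
win.Free()
```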
…obust to different labelings of the identity gate, and make the returned model's default_gauge_group have a direct sum structure; make the default leakage gaugeopt suite 2-stage, where the second stage is over the default gauge group in the target model)
I like option 1. Having this in 0.9.14.2 would be preferable.
Closing to start from a different branch.
This PR is split off from #664. It fixes how wildcard error is computed in leakage modeling. When merged it will resolve #652. It also includes changes to the default leakage-aware gauge optimization suite. The latter changes are needed to prevent gauge optimization from spuriously "spreading out" relational errors across the available gates.
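A hedged sketch, not the PR's actual code, of the two-stage gauge optimization described above and in the commit message. It assumes pyGSTi's gaugeopt_to_target and the default_gauge_group model attribute; exact keyword names may differ between versions. Stage 1 optimizes over the gauge group attached to the leakage model, stage 2 re-optimizes over the target model's default gauge group.

```python
from pygsti.algorithms import gaugeopt_to_target

def two_stage_leakage_gaugeopt(leakage_model, target_model):
    # Stage 1: optimize over the gauge group carried by the leakage model itself
    # (per the commit message, this now has a direct-sum structure).
    stage1 = gaugeopt_to_target(
        leakage_model, target_model,
        gauge_group=leakage_model.default_gauge_group)

    # Stage 2: refine over the target model's default gauge group, which keeps
    # gauge optimization from "spreading" relational error across gates.
    stage2 = gaugeopt_to_target(
        stage1, target_model,
        gauge_group=target_model.default_gauge_group)
    return stage2
```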
Ping @coreyostrove. I made this PR by starting from develop. It seems that targeting bugfix after branching from develop carries those extra commits along (which is the expected behavior, but I hadn't considered it when branching from develop). How would you like to proceed? Option 1 is that I replay these commits on top of a new branch that starts from bugfix (which I think requires closing this PR and opening a new one). Option 2 is that I change the target branch to develop. If we go with option 2, then we'd be asking leakage modeling collaborators to use develop until the release of 0.9.15.