Conversation

@nkanazawa1989
Collaborator

Summary

This PR adds a more flexible execution chain for experiment runs.

close #151

Details and comments

The execution chain can differ slightly from experiment to experiment, and it seems difficult to implement perfect base class logic that covers every experiment.

For example:

  • Currently, scheduling constraints (alignment) are set for all experiments. However, this is unnecessary for experiments that have no intentional delay instructions. The constraint inserts an extra hook into the transpiler and slightly degrades performance, so only T1 and similar experiments should have this configuration.
  • _postprocess_transpiled_circuits is only used by RB to count circuit operations, and the method name suggests a much broader context than that. For example, if we want to implement an experiment that needs operation counting together with some other feature (and is not an RB subclass), we cannot reuse the counting code.
  • The calibration experiments in #251 (Improve calibration experiments) need an extra hook to update the calibrations object.

In this PR, the execution chain is divided into single pieces of processing, and a developer can combine them to create a full chain. Each processing step (pre-process/transpile/execution/post-process) is stored in a class attribute as a list of function names for serialization.
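As a rough sketch of the idea (the attribute and function names here are illustrative, not the exact API of this PR):

class BaseExperiment:
    # Each stage of the execution chain is declared as a list of
    # function names, resolved and called in order at run time.
    __pre_processing__ = []
    __transpile_processing__ = ["transpile_circuits"]
    __post_processing__ = ["set_analysis", "add_job_metadata"]

class T1(BaseExperiment):
    # T1 needs delay alignment, so it prepends an extra transpile step.
    __transpile_processing__ = ["set_scheduling_constraints", "transpile_circuits"]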

Contributor

@eggerdj eggerdj left a comment


I wonder if it would not make more sense to introduce more structure into what these hooks can be? Right now they seem like a loose collection of functions whose signatures often vary. Looking at them a little closer, I also see that the order of the arguments is inconsistent, for example job and experiment_data below:

def set_analysis(experiment, analysis, job, experiment_data, **kwargs):

def add_job_metadata(experiment, experiment_data, job, run_options, **kwargs):

[Diff excerpt under review:]

timing_constraints["acquire_alignment"] = getattr(
    timing_constraints, "acquire_alignment", 16
)
Contributor


Is this 16 hard-coded here? This should be configurable, e.g. in an option.

Collaborator Author


Yes, agreed. I just copied it from the original code.

@nkanazawa1989 nkanazawa1989 requested a review from wshanks August 19, 2021 13:41
@nkanazawa1989
Collaborator Author

Thanks Daniel for the comment. In the current implementation I assume no function takes *args, i.e. arguments are always given as keywords, so their order doesn't matter. The returned value (a dict) is used to update the variable dictionary in the event hook's namespace, so every function receives all the variables generated so far during the execution chain.
https://github.com/Qiskit/qiskit-experiments/blob/cecdec35dfc53c6db5c5af2977eba7bfb973cf11/qiskit_experiments/framework/events/runner.py#L62-L64

So we could remove all the explicit arguments and instead write

def set_analysis(experiment, **kwargs):
    analysis = kwargs.get("analysis")
    job = kwargs.get("job")
    experiment_data = kwargs.get("experiment_data")

but I guess there is a much better way to pass arguments (perhaps using some function decorator).
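For reference, the pattern described above, in which each hook receives the accumulated variables as keywords and any returned dict is merged back into the shared namespace, might look roughly like this (a sketch, not the actual runner code):

def run_chain(experiment, hooks, **namespace):
    # Call each hook with every variable produced so far; merge any
    # returned dict back into the namespace for later hooks to use.
    for hook in hooks:
        result = hook(experiment, **namespace)
        if result:
            namespace.update(result)
    return namespace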

@wshanks
Collaborator

wshanks commented Aug 19, 2021

Hmm, here are some things I don't like about this:

  1. It splits up the logic of BaseExperiment and moves it into several different files. This makes it harder to piece together what the execution order is.
  2. It passes arguments around as one global dictionary which makes it harder to reason about what is being modified by each execution step.
  3. It seems to encourage composing execution from functions in the qiskit_experiments.framework.events namespace rather than custom functions (methods) in the class (though maybe with importlib you could specify a different full import path to the module with your experiment class).

All of those complaints could be mitigated, but at a cost and whether or not the cost is worth it depends on what the benefit of using this abstraction layer is.

One alternative approach is to continue to add more empty _ methods to BaseExperiment that subclasses can override to perform operations at different points of the execution. For example, the scheduling constraints could be something like a _pre_transpile(self, backend) method and the calibrations requirement could be something like _post_submit(self, experiment_data). Using more methods like this addresses the three points above by keeping the method calls in the base class definition and giving them specific inputs and outputs. The downside you noted for this approach is that it could be hard to anticipate what methods and what signatures should be added to give subclasses what they need for the required customization of experiment execution.
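In code, that alternative might look something like the following (the hook names come from the comment above; _transpile and _create_experiment_data stand in for the existing base class logic):

class BaseExperiment:
    def _pre_transpile(self, backend):
        """Empty hook; e.g. a T1 subclass sets scheduling constraints here."""
        pass

    def _post_submit(self, experiment_data):
        """Empty hook; e.g. a calibration subclass updates calibrations here."""
        pass

    def run(self, backend, **run_options):
        self._pre_transpile(backend)
        circuits = self._transpile(backend)
        job = backend.run(circuits, **run_options)
        experiment_data = self._create_experiment_data(job)
        self._post_submit(experiment_data)
        return experiment_data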

@nkanazawa1989
Collaborator Author

Thanks @wshanks, I agree with your points. Actually, this is a similar approach to the Qiskit transpiler, in which each step is registered as a class that is initialized in advance with arbitrary variables in the pass manager's namespace. This class approach may improve visibility, but it still cannot return arbitrary new variables (the return value is limited to the circuit).

I see another downside of the method approach, namely code reusability. For example, T1, T2, and Tphi characterization all need timing constraints, but they derive directly from BaseExperiment, so we would need to write the same code for each of them. I think creating another superclass just for adding timing constraints (e.g. something like ExperimentWithDelay) is overkill (if we define some macro to set the constraint, this is effectively the same as what is done in this PR). In addition, if we need multiple steps in a method hook, it might be hard to reuse the code even if it is defined in the superclass. Function-wise unit testing also becomes harder.

@eliarbel
Contributor

eliarbel commented Aug 23, 2021

Hi Naoki, while the suggested runner and events framework allows a high degree of flexibility, I do think it's overkill and adds considerable complexity to the existing experiments framework design. Here are a few points that crossed my mind when reviewing the suggested code:

  1. The suggested approach is white-box in nature, meaning that a developer adding a new experiment has to know a lot about the logic of ExperimentRunner, the corresponding event attributes in BaseExperiment, and what the various handlers look like and do. This makes the job of adding a new experiment harder.
  2. Debuggability: already today there is a ping-pong of method calls between BaseExperiment and its derived experiments. The new events framework adds many more hops to track and debug through.
  3. Maintainability and readability: this is a general comment that applies to any non-trivial code, but something specific to this one: even with a modern IDE it's hard to follow the code, since the IDE cannot jump to a handler definition based on some string in the code (e.g. backend_run). So having this framework largely based on strings is a downside.

So, as a big fan of the KISS principle, I think that what Will suggested in his comment (which also goes in line with what issue #151 suggests) is the right approach. Breaking the run flow into smaller pieces which are part of the BaseExperiment interface seems a good way to address the current and foreseeable challenges without adding too much complexity to the design and run flow.

@nkanazawa1989
Collaborator Author

Thanks Eli. I agree the event handler approach has a readability issue, and it seems we prefer the approach of adding more methods. Though I'm not attached to the code proposed here, what do you think about reusability? For example, if we need

Experiment A
   def _post_transpile():
       - step a
       - step b
       - step d

Experiment B
   def _post_transpile():
       - step a
       - step c
       - step d

we cannot reuse the code. Perhaps we can avoid this situation by very carefully designing the methods, i.e. providing as many methods as possible, one per atomic procedure; however, this also makes the base class messy and makes it harder to track the execution order. Another problematic situation would be

Experiment A
   def _post_transpile():
       - step a

Experiment B
   def _post_transpile():
       - step a

given that step a is just 5 lines of code, do we make a superclass for A and B?

@eliarbel
Contributor

What about making step a, step b, etc. external methods that the various experiments can (re)use? Then the flow can be dictated by calling those building blocks in the _post_transpile method and the like. This level of reuse is the same as the event-method reuse you suggested in this PR (with set_scheduling_contraints and count_transpiled_ops), but by calling those events directly in the various flow methods (e.g. _post_transpile) we keep the code simpler and avoid registering methods by name as strings, as the ExperimentRunner approach requires.
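Concretely, that suggestion reads something like this (a sketch, using the step names from the discussion):

# Reusable building blocks defined once, outside any experiment class.
def step_a(circuits):
    # e.g. set scheduling constraints, count operations, ...
    return circuits

def step_c(circuits):
    # another atomic, reusable procedure
    return circuits

class ExperimentB(BaseExperiment):
    def _post_transpile(self, circuits):
        # The flow is dictated by explicit calls; no string registration.
        circuits = step_a(circuits)
        circuits = step_c(circuits)
        return circuits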

@nkanazawa1989
Collaborator Author

nkanazawa1989 commented Aug 25, 2021

I think that is a reasonable approach. However, if we allow the methods to call externally defined functions, this becomes much the same as the event handler approach, if that took callbacks instead of strings. The only difference would be

what do you think?

@nkanazawa1989
Collaborator Author

An alternative solution would be allowing each method to take pre/post callbacks, like

self._transpile(..., pre_callback=[step_a, step_b], post_callback=[step_c])

if external functions are allowed.
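For this to work, the base method would have to run the callbacks itself, along these lines (a sketch, assuming each callback takes and returns the circuit list):

from qiskit import transpile

def _transpile(self, circuits, backend, pre_callback=(), post_callback=()):
    # Pre-callbacks may adjust the input circuits before transpilation.
    for func in pre_callback:
        circuits = func(circuits)
    transpiled = transpile(circuits, backend)
    # Post-callbacks operate on the transpiled circuits.
    for func in post_callback:
        transpiled = func(transpiled)
    return transpiled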

@eliarbel
Contributor

If I understand correctly, for the alternative solution you are basically suggesting to have this code:

class SomeExp(BaseExperiment):
    def _transpile(self):
        super()._transpile(..., pre_callback=[step_a, step_b], post_callback=[step_c])

instead of:

class SomeExp(BaseExperiment):
    def _transpile(self):
        step_a()
        step_b()
        super()._transpile(...)
        step_c()

I may be missing some benefits of the callback approach, but not seeing them, I personally prefer the second option (i.e. explicit calls in the derived class). Maybe other developers can chime in as well, as I think we have kind of agreed on the general approach and now it's down to style (again, unless I'm missing some fundamental advantage of the callback approach).

Actually, now that I'm thinking about it a bit more: with the explicit method call approach you can take the results of _transpile and modify them in the derived class, or even send them to post-transpile events. Something like this:

class SomeExp(BaseExperiment):
    def _transpile(self):
        step_a()
        step_b()
        circuits = super()._transpile(...)
        transform_circuits_somehow(circuits)
        step_c(circuits)

I'm not sure how you would get this level of flexibility with the callback approach.

@nkanazawa1989
Collaborator Author

nkanazawa1989 commented Aug 25, 2021

I think this is just a preference of style, as you said. I'm fine with the second one.
(I'm just trying to explore as many possibilities as possible.)

However, I see an issue with this style:

class SomeExp(BaseExperiment):
    def _transpile(self):
        step_a()
        step_b()
        circuits = super()._transpile(...)
        transform_circuits_somehow(circuits)
        step_c(circuits)

In this example we need to check what has been done in the superclass to know the full event chain. If the class is subclassed multiple times, tracking all the events might be hard. I still prefer having explicit hook methods at the expense of simplicity...

@eliarbel
Contributor

eliarbel commented Aug 26, 2021

I think Oliver made a good point yesterday about the entry barrier for users when it comes to adding new experiments. Since deep class hierarchies will probably not be very common (I think it's reasonable to assume that most experiments will derive from BaseExperiment, or maybe from a common experiment class deriving from BaseExperiment, e.g. for calibrations), I think we should lean towards simplicity and user friendliness.

@nkanazawa1989
Collaborator Author

nkanazawa1989 commented Aug 26, 2021

Okay, that makes sense. Given that we don't have deep class hierarchies, the hook methods have another advantage. For example, if we want to add some extra feature (like an add-on) to an experiment, perhaps a mix-in class is a useful approach, like we are discussing in #251 (still not sure if we'll adopt it).

class MyExperiment1(BaseExperiment):
    pass

class AddOnFeature(BaseExperiment):
    def post_transpile(self, circuits):
        add_custom_gate_definition(circuits)
        return circuits

class MyExperiment2(MyExperiment1, AddOnFeature):
    pass

This allows us to define characterization and calibration experiments separately, i.e. append parameter update logic to the base characterization logic to create a calibration experiment. Then the characterization vs. calibration discussion goes away. Though this is just one possibility for now, if we remove the hook methods, we cannot realize this framework; i.e. even if we provide the add_custom_gate_definition feature in the transpile method of the AddOnFeature class, it cannot override the method provided by the first base class, MyExperiment1.

So something like

class BaseExperiment:

    def run(self, *args, **kwargs):
        ...
        # run pre transpile
        pre_transpile = getattr(self, "_pre_transpile", None)
        if pre_transpile:
            pre_transpile(*args, **kwargs)

        # run standard transpile
        self._transpile(*args, **kwargs)

        # run post transpile
        post_transpile = getattr(self, "_post_transpile", None)
        if post_transpile:
            post_transpile(*args, **kwargs)
        ...

works. But perhaps this is not acceptable in terms of user (contributor) friendliness.

@eliarbel
Contributor

I get your point, it makes sense. But why not simply override the run method in MyExperiment2? This gives you full control over the experiment flow and keeps the code simple and resident in one place. The downside is some duplication of the run method from BaseExperiment, but if we rewrite the run method of BaseExperiment to only call other methods and not do any (or almost any) work by itself, that code duplication is minimal.

In addition, I think that with your approach you are bound to how the run method of BaseExperiment is written and the assumptions it makes. For example, in your pseudocode above you're not passing the transpiled circuits to post_transpile; what if you need to do that? You could of course change the code of run to send the circuits to post_transpile. But then what if some other experiment has a post_transpile hook that does not need to look at the transpiled circuits? That hook will have to take a circuits param anyway, just because the part of run that calls the post_transpile hook is fixed. So again, by overriding the run method you can control exactly what to do and how, without having to deal with the API commitments the run code in BaseExperiment makes.

@nkanazawa1989
Collaborator Author

"if we rewrite the run method of BaseExperiment to only call other methods and not do any (or almost any) work by itself, that code duplication is minimal"

I agree. However, this doesn't fit with parallel experiments.

Consider the situation where we run RB and T1 experiments simultaneously on different qubits (e.g. RB on q0 and T1 on q1). These experiments need different execution logic:

  • RB: count the number of operations after transpilation, and add this to the metadata
  • T1: supply scheduling options (alap) to the transpile options

If we control the execution with a single run method, we cannot combine the two experiments in the same job. However, if we separate run into at least three pieces,

  1. create target program (i.e. transpile)
  2. run job and get future object
  3. create experiment data (i.e. add analysis class to callback)

we can cope with this situation. Namely, we can execute 1 and 3 for each sub-experiment, while 2 is provided by the base class and is not overridable. Within each piece (1 and 3), perhaps calling externally defined methods is the easiest-to-understand approach, as you mentioned.
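A sketch of how that three-piece split could look, with a composite experiment delegating pieces 1 and 3 to its components (ParallelExperiment internals and merge_circuits are hypothetical here):

class BaseExperiment:
    def run(self, backend, **run_options):
        circuits = self._transpile(backend)         # piece 1: overridable
        job = backend.run(circuits, **run_options)  # piece 2: fixed
        return self._create_experiment_data(job)    # piece 3: overridable

class ParallelExperiment(BaseExperiment):
    def _transpile(self, backend):
        # Each sub-experiment applies its own logic (RB counts ops,
        # T1 sets scheduling options), then the circuits are merged
        # so that everything runs in a single job.
        return merge_circuits(
            [exp._transpile(backend) for exp in self._experiments]
        )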

Or, perhaps we can define the entire processing routine externally and attach it to a class attribute, something like

class RB(BaseExperiment):
    __sub_routine_transpile__ = transpile_with_count
    __sub_routine_expdata__ = base_analysis
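Presumably the base class would then dispatch through these attributes, roughly like this (a sketch; staticmethod keeps the externally defined functions from being bound as instance methods):

def default_transpile(experiment, backend):
    # Default routine; experiments like RB swap in transpile_with_count.
    ...

def default_expdata(experiment, job):
    # Default routine for building the experiment data.
    ...

class BaseExperiment:
    # Subclasses override these with externally defined routines.
    __sub_routine_transpile__ = staticmethod(default_transpile)
    __sub_routine_expdata__ = staticmethod(default_expdata)

    def run(self, backend, **run_options):
        circuits = self.__sub_routine_transpile__(self, backend)
        job = backend.run(circuits, **run_options)
        return self.__sub_routine_expdata__(self, job)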

@nkanazawa1989
Collaborator Author

We have had enough discussion, and the PR is finalized in #380.
