Data processing #22
Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
@eggerdj Can you post some comments on the design, along with code examples of how this is intended to be used, to make this easier to parse?
* Removed the node_type and root node.
* Amended tests accordingly.
nkanazawa1989
left a comment
Thanks @eggerdj, this looks really great. A few comments on implementation details.
* Adapted unit tests accordingly.
nkanazawa1989
left a comment
LGTM! thanks for addressing all of my concerns and comments.
Co-authored-by: Will Shanks <wshaos@posteo.net>
chriseclectic
left a comment
Looks good. I have several suggestions that we can discuss more. My main ones are to do with making this perhaps a little more Functional (as in programming), so that individual nodes or the whole data processor could be used interchangeably with regular functions / callables in the code base. (Maybe this is similar to how numpy ufuncs are actually classes that store metadata about their input and output dimensions and dtypes.)
nkanazawa1989
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comments before approval, overall looking good.
Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
Summary
This PR introduces the data processing package which was carved out from PR #20.
Details and comments
Data processing comprises the steps required to prepare the measured data for analysis. This is done using a `DataProcessor`, which is a chain of `DataAction`s, i.e. transformations applied to the data in place. A user can specify the actions to apply to the data in order to process it. For example, one data processor could take level 0 data, apply a kernel to create IQ data, and then take the real part of this IQ data while scaling it by a factor of 1e-3. Similarly, another data processor could take IQ data as input, discriminate it into counts, and then convert these counts to a population.

An instance of `DataProcessor` is then used to process data taken from `exp_data`, an instance of `ExperimentData`. The data processor will modify `data`, an instance of `Dict[str, Any]`, in place. Each node in the processor looks in `data` for the type of data it uses as input. For example, the `Population()` node will use `data['counts']` to create populations, which it will insert into `data` by doing `data['populations'] = ...`. This makes an instance of `DataProcessor` reusable on different input data. Furthermore, since the results of the different steps are all kept in `data`, we can easily check the outcome of each processing step. Finally, each node in the `DataProcessor` defines the key under which it stores its output data, so that we can easily retrieve the output data from the processor. Indeed, `processor.output_key()` returns the data output key of the last node in the processing chain. For the two example processors above, this output key would be `memory_real` and `populations`, respectively.
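To make the design described above more concrete, here is a minimal, self-contained sketch of a chain of nodes that each read one key of a dictionary and write another. It is not the code from this PR. The class bodies, the `append` method, the use of `__call__`, the `memory` input key, and the single-qubit definition of a population (fraction of `'1'` outcomes) are all assumptions made for illustration; only the names `DataProcessor`, `DataAction`, `Population`, `output_key()`, and the keys `counts`, `populations`, and `memory_real` come from the description.

```python
from typing import Any, Dict, List, Optional


class DataAction:
    """Hypothetical base node: reads data[input_key], writes data[output_key]."""

    input_key: str = ""
    output_key: str = ""

    def process(self, value: Any) -> Any:
        raise NotImplementedError

    def __call__(self, data: Dict[str, Any]) -> None:
        # Each node looks for the type of data it needs in the dictionary.
        if self.input_key not in data:
            raise KeyError(f"{type(self).__name__} requires data['{self.input_key}']")
        # The result is stored in place, next to the earlier processing steps.
        data[self.output_key] = self.process(data[self.input_key])


class ToReal(DataAction):
    """Takes the real part of IQ points and scales it (input key is assumed)."""

    input_key = "memory"
    output_key = "memory_real"

    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def process(self, value: List[complex]) -> List[float]:
        return [self.scale * complex(point).real for point in value]


class Population(DataAction):
    """Converts counts to a population (assumed: fraction of '1' outcomes)."""

    input_key = "counts"
    output_key = "populations"

    def process(self, value: Dict[str, int]) -> float:
        shots = sum(value.values())
        return value.get("1", 0) / shots


class DataProcessor:
    """Hypothetical chain of DataAction nodes applied in order."""

    def __init__(self, nodes: Optional[List[DataAction]] = None):
        self._nodes: List[DataAction] = list(nodes or [])

    def append(self, node: DataAction) -> None:
        self._nodes.append(node)

    def output_key(self) -> Optional[str]:
        # The output key of the chain is the output key of its last node.
        return self._nodes[-1].output_key if self._nodes else None

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for node in self._nodes:
            node(data)
        return data


processor = DataProcessor()
processor.append(Population())

data = {"counts": {"0": 750, "1": 250}}
processor(data)  # modifies `data` in place
print(data[processor.output_key()])  # 0.25
```

Because every intermediate result stays in `data`, the same `processor` instance can be reused on another dictionary, and each step can be inspected by its key after the chain has run.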