Binary output #37

rikigigi · 2020-10-02T08:40:29Z

@lorisercole
Right now, the default binary output is a pickled blob that I think is difficult for a first-time user to understand. Its content is:

['KAPPA_SCALE',
 'TEMPERATURE',
 'TSKIP',
 'UNITS',
 'VOLUME',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'cepstral_log',
 'j_DT_FS',
 'j_Nyquist_f_THz',
 'j_PSD_FILTER_W_THz',
 'j_cospectrum',
 'j_fcospectrum',
 'j_flogpsd',
 'j_fpsd',
 'j_freqs_THz',
 'j_logpsd',
 'j_psd',
 'jf_DT_FS',
 'jf_Nyquist_f_THz',
 'jf_dct_Kmin_corrfactor',
 'jf_dct_aic_Kmin',
 'jf_dct_kappa',
 'jf_dct_kappa_THEORY_std',
 'jf_dct_logpsd',
 'jf_dct_logpsdK',
 'jf_dct_logpsdK_THEORY_std',
 'jf_dct_logtau',
 'jf_dct_logtau_THEORY_std',
 'jf_dct_psd',
 'jf_flogpsd',
 'jf_fpsd',
 'jf_freqs_THz',
 'jf_logpsd',
 'jf_psd',
 'jf_resample_log',
 'kappa_Kmin',
 'kappa_Kmin_std',
 'units',
 'write_old_binary']

Is it used by anyone, or anywhere in the code? Is it safe to change the default binary output to the equivalent of the human-readable one, but with numpy arrays?

lorisercole · 2020-10-02T13:58:46Z

The content of the default bin format is simply an object with those attributes.
However, I would also avoid splitting the binary output into many files: it does not make sense.

I think we can simplify this by saving many arrays/variables in a numpy or json file (we need to test this). Like this:

tc_dict = {
    'j': {
        'DT_FS': j.DT_FS,
        'KAPPA_SCALE': j.KAPPA_SCALE,
        'psd': j.psd,
        ...
    },
    'jf': {
        'DT_FS': jf.DT_FS,
        'KAPPA_SCALE': jf.KAPPA_SCALE,
        'psd': jf.psd,
        ...
    },
    ...
}

Or with less-readable code:

tc_dict = {
    'j': {},
    'jf': {},
    ...
}
attrs_to_save = ['DT_FS', 'KAPPA_SCALE', 'psd', ...]
for key in tc_dict.keys():
    for attr in attrs_to_save:
        tc_dict[key][attr] = getattr(locals()[key], attr)

(we should find a smarter solution if the dictionary is more deeply-nested)
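One possible way to handle deeper nesting is to flatten the dictionary into dotted keys and rebuild it on load. This is only a sketch: the helper names flatten and unflatten, the '.' separator, and the sample values are assumptions made for illustration, not part of the existing code.

```python
import numpy as np


def flatten(nested, prefix="", sep="."):
    """Flatten a nested dict into {'a.b.c': value} form."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key path.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def unflatten(flat, sep="."):
    """Rebuild the nested dict from the flat {'a.b.c': value} form."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


# Made-up values, standing in for the real attributes.
tc_dict = {
    "j": {"DT_FS": 1.0, "psd": np.array([1.0, 2.0])},
    "jf": {"DT_FS": 2.0, "dct": {"kappa": 0.5}},
}
flat = flatten(tc_dict)
restored = unflatten(flat)
```

A flat mapping of this kind has one array or scalar per key, which is the shape that an archive of named arrays expects.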

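Then save it using numpy.savez('binary_output.npz', **tc_dict) or json.dump(tc_dict, open('binary_output.json', 'w')). (numpy.save writes a single array and does not take keyword arrays, so numpy.savez is the call that fits here; the nested sub-dictionaries would have to be flattened first, and arrays converted to lists for json.)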
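As a rough illustration of the two options, here is a sketch that writes a flat dictionary of arrays to an .npz archive and to JSON, then reads the archive back. The dictionary content and the helper to_jsonable are hypothetical, and an in-memory buffer replaces the real output file so the example does not touch the disk.

```python
import io
import json

import numpy as np

# Hypothetical flat dictionary; the real one would hold the j/jf attributes.
flat = {"j.DT_FS": 1.0, "j.psd": np.array([1.0, 2.0, 4.0])}

# numpy.savez stores one named array per keyword argument.
buffer = io.BytesIO()
np.savez(buffer, **flat)

# Reading back: scalars come back as 0-d arrays.
buffer.seek(0)
with np.load(buffer) as archive:
    loaded = {name: archive[name] for name in archive.files}


def to_jsonable(value):
    """Convert numpy values to plain Python types that json can serialize."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        return value.item()
    return value


text = json.dumps({key: to_jsonable(val) for key, val in flat.items()})
```

The .npz route keeps dtypes and is compact; the JSON route is readable but loses the array type, so arrays come back as lists.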

We will then need functions to reconstruct the Currents objects, etc, from this binary file...
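A reconstruction function could look roughly like the sketch below. It uses types.SimpleNamespace as a stand-in for the real objects, because how the actual classes should be rebuilt is still open; restore_objects and the sample data are hypothetical names and values.

```python
from types import SimpleNamespace

import numpy as np


def restore_objects(flat, sep="."):
    """Group 'name.attr' entries into one attribute container per name."""
    objects = {}
    for path, value in flat.items():
        name, attr = path.split(sep, 1)
        # SimpleNamespace is only a placeholder for the real class.
        obj = objects.setdefault(name, SimpleNamespace())
        setattr(obj, attr, value)
    return objects


# Made-up data, as it might come back from the saved file.
flat = {"j.DT_FS": 1.0, "j.psd": np.array([1.0, 2.0]), "jf.DT_FS": 2.0}
restored = restore_objects(flat)
```

With this, restored['j'].psd gives attribute-style access again; a real implementation would have to call the proper constructors instead.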

What do you think?

Working on #37. A first draft of SportranBinaryFile. The useful storable attributes (input_parameters, settings, current, current_resampled, output_results) should be further defined/corrected.

TODO:
- Define proper data structures: SportranInput, SportranSettings, SportranOutput, ... these can be seen as input/outputs of a "Workflow". A workflow for example is defined in analysis.py
- Define functions that collect all the data useful to save the current calculation/namespace and dumps it into a SportranBinaryFile.
- Define functions that extract and use the data of a SportranBinaryFile to restore a calculation/data namespace.

rikigigi assigned lorisercole and rikigigi Oct 2, 2020

lorisercole added the type/refactoring label Oct 2, 2020

lorisercole added this to the New core API milestone Oct 2, 2020

lorisercole mentioned this issue Nov 10, 2020

analysis CLI: update #42

Closed

4 tasks

lorisercole added the requires discussion label Dec 21, 2021
