Series handling #11

ScottSoren · 2021-06-30T14:37:43Z

Hi Kenneth,

This PR replaces #6 .

It implements most of what we agreed on in our workshops last month:

Make __getitem__ king
Use reader-provided aliases for essential series which need to be accessed under another name
Use a ConstantValue referring to a tseries as a stand-in for missing but known essential series (such as current in OCP)
Implement a cache (_cached_series) for __getitem__
Assign the job of calibration to a separate Calibration object (this became calibrations, derived from _calibration_list, and a new Saveable class family)
Use a MemoryBackend to keep track of objects that might need to be passed via a dictionary representation. The methods cut, select, as_cv and __add__ all make use of this (via Saveable.as_dict(), fill_object_list() and PlaceHolderObject) to ensure the returned Measurement can find the calibrations, etc. of the first Measurement.
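
As a rough illustration of how the alias, stand-in and cache points fit together, here is a toy sketch. It is not the code in this PR: `ToySeries`, `ToyConstantValue`, `ToyMeasurement`, `lookup_count` and the series names are hypothetical, and only `_cached_series` and the general idea are taken from the list above.

```python
class ToySeries:
    """Hypothetical stand-in for a data series: a name plus a list of values."""

    def __init__(self, name, data):
        self.name = name
        self.data = list(data)


class ToyConstantValue:
    """Hypothetical stand-in for a missing but known series, tied to a time series."""

    def __init__(self, name, value, tseries):
        self.name = name
        self.value = value
        self.tseries = tseries

    @property
    def data(self):
        # One constant value for every point of the time series it refers to
        return [self.value] * len(self.tseries.data)


class ToyMeasurement:
    def __init__(self, series_list, aliases=None):
        self._series = {series.name: series for series in series_list}
        self.aliases = aliases or {}  # reader-provided: alias -> real name
        self._cached_series = {}
        self.lookup_count = 0  # only here to make the caching visible

    def __getitem__(self, key):
        if key in self._cached_series:
            return self._cached_series[key]
        self.lookup_count += 1
        name = self.aliases.get(key, key)  # follow the alias, if there is one
        series = self._series[name]  # raises KeyError for unknown names
        self._cached_series[key] = series
        return series


t = ToySeries("time/s", [0.0, 1.0, 2.0])
potential = ToySeries("Ewe/V", [0.50, 0.51, 0.52])
current = ToyConstantValue("I/mA", 0.0, tseries=t)  # e.g. current during OCP
meas = ToyMeasurement([t, potential, current], aliases={"raw_potential": "Ewe/V"})

print(meas["raw_potential"].data)  # found via the alias
print(meas["I/mA"].data)  # constant stand-in, same length as the time series
```

The second access to any key is served from `_cached_series` without touching the search again.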

The robustness of ixdat in this PR is ensured by the test test_biologic_ec_measurement.py, which uses example data contained in the repository, and tools.rst is updated to help other developer noobs like myself get that working with a pre-push hook.

This PR does not address a fix of the table definitions needed to implement an SQL backend. That will be handled in a subsequent PR. There are FIXMEs and TODOs in comments to indicate where code will change for that.

There are some fields in .ix files (saved outside of version control) which are not back-compatible after improvements made on [ec_tools] in response to Kenneth's review of #5.

So that a PR of series_handling to master is a simple fast-forward merge. The commits in question are mainly just to get the README which points to user_ready.

KennethNielsen

First part of the review. Looks good. Mostly smallish stuff. Will finish after vacation.

TOOLS.rst

development_scripts/ec_tools.py

src/ixdat/backends/__init__.py

src/ixdat/backends/backend_base.py

src/ixdat/backends/directory_backend.py

src/ixdat/db.py

src/ixdat/exporters/csv_exporter.py

ScottSoren · 2021-07-06T14:21:02Z

Awesome! Just looked through the comments, can't wait to get to work on it. :D
I will push the updates to the parts you reviewed here while you're on vacation, unless you have a better workflow in mind?

KennethNielsen

GREAT. Finally done with this. I think this looks wonderful with only a few non-trivial things to discuss, which I will write about in a general comment.

src/ixdat/measurements.py

src/ixdat/plotters/ec_plotter.py

src/ixdat/techniques/cv.py

KennethNielsen · 2021-07-28T08:31:13Z

> Awesome! Just looked through the comments, can't wait to get to work on it. :D
> I will push the updates to the parts you reviewed here while you're on vacation, unless you have a better workflow in mind?

Sorry. Didn't see this before vacation. It would not really have mattered, since I went through the files in order, so if the comments got obsoleted due to changed code, that would have been fine. HOWEVER, I annoyingly had to go back and comment on a few more things in the first part of the review, so I think there is no way around you having to quickly go through the comments for the first part (down to, but not including, measurements.py) again to check for updates before pushing the changes. Sorry about that.

KennethNielsen · 2021-07-28T08:37:33Z

Ok. This PR is great and it really shows that we spent all that time designing it. That being said, it is probably only natural that one of the non-trivial comments concerns one of the new items, i.e. making getitem king.

__getitem__ and to some extent select_value and select_values

When I read through __getitem__ I thought it was generally difficult to determine whether the method does as intended, because I couldn't quite figure out what is intended. Specifically, questions like:

For which types of keys should values be cached?
For which types of keys should the -t suffix be possible?
What is the order of trying different things?

Since this is such an important piece of code for usability, I think it would be useful to consider these questions disconnected from the code. I.e. if you were doing TDD you would write the tests first; maybe not that, but at least consider what should be valid keys and how and when they should be cached. Then, after considering, put it in the docstring as examples; everyone loves examples. This is the part of the comment that I think applies to select_value and select_values as well.
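
A sketch of what such docstring examples could look like, written as doctests so that they double as tests. `ToyMeasurement`, its `_cache` attribute and the keys are made up for illustration; which keys are valid and which get cached are exactly the open questions above.

```python
import doctest


class ToyMeasurement:
    """Toy container used only to illustrate docstring examples."""

    def __init__(self, data):
        self._data = dict(data)
        self._cache = {}

    def __getitem__(self, key):
        """Return the values stored under `key`, caching successful lookups.

        Examples:

        >>> meas = ToyMeasurement({"potential": [0.1, 0.2]})
        >>> meas["potential"]
        [0.1, 0.2]
        >>> "potential" in meas._cache
        True
        >>> meas["unknown"]
        Traceback (most recent call last):
            ...
        KeyError: 'unknown'
        """
        if key not in self._cache:
            self._cache[key] = self._data[key]  # KeyError for unknown keys
        return self._cache[key]


# Run the examples in the docstring above as tests
parser = doctest.DocTestParser()
test = parser.get_doctest(
    ToyMeasurement.__getitem__.__doc__,
    globs={"ToyMeasurement": ToyMeasurement},
    name="ToyMeasurement.__getitem__",
    filename=None,
    lineno=0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

Writing the examples first forces a decision on each question (valid keys, caching, failure mode) before the search logic is touched.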

Second, as an input to the implementation, I see that you maybe ran into a problem that I have encountered before. That is, how do you implement a complex search with a lot of exits when you need to do something with the result afterwards (caching), which prevents just using return as a convenient escape mechanism?

An idea for fixing that would be to put parts of the search algorithm into a sub-method, which can then use return, because you still have the option of caching in the main method. If everything is to be cached, you could also put the caching mechanism in a decorator and forget about it altogether, but I feel like that may be a little heavy-handed for this use case. I think that (depending on what you decide about which key type lookups should be cached) you might refactor into a _get_item_to_cache sub-method and put the parts of the code that search for items which should be cached in there. This could use return-escape and you could cache in the main method. If I'm not making sense in text, here is the code that illustrates:

def __getitem__(self, key):
    # pre-processing stuff

    # Complex search algorithm
    if ...:
        for item in items:
            if ...:
                # You can't return e.g. here, because that would prevent caching
                result = ...
            else:
                # You can't return e.g. here, because that would prevent caching
                result = ...
        else:
            # You can't return e.g. here, because that would prevent caching
            result = ...
    else:
        ...

    # Caching
    return result

turns into

def __getitem__(self, key):
    # pre-processing stuff

    result = self._get_item_to_cache(key)

    # Caching
    return result

def _get_item_to_cache(self, key):
    # Complex search algorithm
    if ...:
        for item in items:
            if ...:
                return ...  # You can return here
            else:
                return ...  # Here
        else:
            return ...  # And everywhere
    else:
        ...
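
For reference, a small runnable version of the same pattern. This is a toy sketch under stated assumptions, not ixdat code: `ToyMeasurement`, its `series` and `aliases` attributes and the series names are hypothetical, while `_cached_series` and `_get_item_to_cache` are the names used in this thread.

```python
class ToyMeasurement:
    """Toy class; the attributes are illustrative, not ixdat's."""

    def __init__(self, series, aliases):
        self.series = series  # dict: name -> data
        self.aliases = aliases  # dict: alias -> list of candidate names
        self._cached_series = {}

    def __getitem__(self, key):
        if key in self._cached_series:
            return self._cached_series[key]
        result = self._get_item_to_cache(key)  # raises KeyError if not found
        self._cached_series[key] = result  # the one place where caching happens
        return result

    def _get_item_to_cache(self, key):
        """Search for `key`, returning as soon as something is found."""
        if key in self.series:
            return self.series[key]
        for name in self.aliases.get(key, []):
            if name in self.series:
                return self.series[name]
        raise KeyError(f"No series found for {key!r}")


meas = ToyMeasurement(
    series={"Ewe/V": [0.1, 0.2]},
    aliases={"potential": ["E/V", "Ewe/V"]},
)
print(meas["potential"])
print(sorted(meas._cached_series))
```

Failed lookups raise before the caching line is reached, so only successful results end up in the cache.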

The second thing is about the -t and -s suffixes: it is a little difficult to decipher which keys they work for. How about deciding that they apply to all keys and then making it a pure pre- and post-processing thing, thus separating it from the search algorithm:

# Cache check

# Pre-process selector suffixes
if key[-2:] in ("-t", "-s"):
    pure_key = key[:-2]
else:
    pure_key = key

# Search for pure_key (probably not a good name, but you get the point)
item = search(pure_key)

# Post-process
if key != pure_key:
    item_to_cache = ...  # Do the .time or .value extraction depending on the suffix
else:
    item_to_cache = item

# Cache on key (not pure_key)
self.cache[key] = item_to_cache
return item_to_cache
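
A self-contained sketch of the same idea, for illustration only. `ToySeries`, `ToyMeasurement` and `_search` are hypothetical, and the mapping of "-t" to `.time` and "-s" to `.value` is an assumption made for this example rather than something the review spells out.

```python
from collections import namedtuple

# Hypothetical series type with the two attributes the suffixes select from
ToySeries = namedtuple("ToySeries", ["time", "value"])

# Assumed mapping from suffix to attribute (not confirmed by the review)
SUFFIX_TO_ATTRIBUTE = {"-t": "time", "-s": "value"}


class ToyMeasurement:
    def __init__(self, series):
        self._series = series  # dict: name -> ToySeries
        self.cache = {}

    def _search(self, pure_key):
        # Stand-in for the real search algorithm, which knows nothing of suffixes
        return self._series[pure_key]

    def __getitem__(self, key):
        # Cache check
        if key in self.cache:
            return self.cache[key]

        # Pre-process selector suffixes
        suffix = key[-2:]
        if suffix in SUFFIX_TO_ATTRIBUTE:
            pure_key = key[:-2]
        else:
            pure_key, suffix = key, None

        item = self._search(pure_key)

        # Post-process: extract the requested attribute, if any
        if suffix is not None:
            item_to_cache = getattr(item, SUFFIX_TO_ATTRIBUTE[suffix])
        else:
            item_to_cache = item

        # Cache on key (not pure_key)
        self.cache[key] = item_to_cache
        return item_to_cache


meas = ToyMeasurement({"potential": ToySeries(time=[0, 1], value=[0.5, 0.6])})
print(meas["potential-t"])
print(meas["potential-s"])
print(sorted(meas.cache))
```

Because the suffix handling wraps the search, every key the search understands automatically supports both suffixes.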

Names

We have already talked about names of things and improved stuff a lot, but I think there are still a few things worth considering.

I have, maybe not completely consistently, commented with NAME in the comments to lead to these places.

In general it falls into three categories:

Is the name really appropriate and descriptive and all the right stuff

That might be stuff like address or identity and full_identity in backend. It seems like address is actually sort of an identity, but then what are the other ones? I'm not saying it is wrong, just something to think about.

Use user oriented names instead of programmer oriented names

This reduces pretty much to the use of the _str suffix, which I really think should be _name, and there are a few variable names where I think _name should be suffixed to disambiguate them.

Unhelpful and unnecessary abbreviations

This one I think is likely going to be a point of discussion, but here goes. I acknowledge the need to be practical and deviate from the normal "names should always be descriptive and be written out in full" dogma in some cases, because I recognize that abbreviations are really useful for things that get used a lot, both by the user and internally. These are things like time->t and value->v (not I, J etc., because they are just their natural scientific letters, so they are fine on their own). And then there are internal things which are generally recognized programming abbreviations like "attrs", "spec" etc. But apart from these categories there are also things which I think shouldn't be abbreviated:

sel_str, e.g., really should be something like default_selector_name. It isn't used a lot, but when it is, it would really be nice to have a descriptive name.

The abbreviation of measurement->m, calibration->c and series->s in e.g. m_ids, c_ids, s_ids I think is also not good. At least, being the reader, I would really prefer that they were written out in full.

There may be more of these around. For myself, I would say that there should be no more non-trivial abbreviations around than absolutely necessary.

Prefix with or without underscore

This may seem kind of like a nit-pick, but it can actually be useful when you are writing code (either as user or programmer): knowing the prefix and what it is that you are after, you can sort of guess the name without having to try both with and without underscore. The one place where I know that I have this is with t, so it is t_id, t_ids but tseries. But there may be more around. Now it may be that a convention here is impractical to introduce, then ok, but I thought that I would mention it.

Backwards compatibility

It may seem difficult to change this naming stuff if it is user facing, but if you agree, now is the time to do it (before 1.0), and you can always do it in a backwards compatible way with a property and a deprecation warning.
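
A minimal sketch of that approach, using the sel_str to default_selector_name rename suggested above as the example. `ToyMeasurement` is a hypothetical class for illustration; only the two attribute names come from this review.

```python
import warnings


class ToyMeasurement:
    def __init__(self, default_selector_name):
        self.default_selector_name = default_selector_name

    @property
    def sel_str(self):
        """Deprecated alias for default_selector_name."""
        warnings.warn(
            "sel_str is deprecated, use default_selector_name instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.default_selector_name

    @sel_str.setter
    def sel_str(self, value):
        warnings.warn(
            "sel_str is deprecated, use default_selector_name instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.default_selector_name = value


meas = ToyMeasurement(default_selector_name="selector")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_value = meas.sel_str  # old name still works, but warns
print(old_value, caught[0].category.__name__)
```

Old user code keeps working through the property while the warning tells users which name to switch to.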

It should also be mentioned that all of this naming stuff, if you agree with any of it, can of course be handled in a separate PR, to get this big one landed.

ScottSoren

Thanks for the review! Much learning and coming improvement from this. I completely agree with the top-level comments, though may come to you for clarification while implementing.

I've replied to all the in-line comments that I think may require a discussion, (plus a few where I couldn't stop myself).
I think every in-line comment that I didn't reply to is something I agree to and can easily implement.
Excepting those discussions, my next steps are:

__getitem__
select_values() and related functions
do and resolve comments for all the renaming and easy code improvements

src/ixdat/backends/memory_backend.py

src/ixdat/db.py

src/ixdat/measurements.py

ScottSoren

Woohoo, done with this great big review!

Only two comments left unresolved:

Series handling #11 (comment), because I have no idea how to write tests and would like for us to get that right another time.
Series handling #11 (comment), which I think gets at something fundamental about keeping track of data from potentially multiple sources, i.e. working with multiple backends. It would be nice to think hard and solve that the best way for ixdat.

I think those should be part of future PRs. There are a bunch of TODOs and FIXMEs added as well in response to other comments. But the vast majority could be implemented right away, to huge benefit to the readability and sometimes functionality of the code. I've responded to every comment.
Thanks for the review. Hope you agree this PR is ready to merge so we can get to the next one!

ScottSoren · 2021-11-11T12:14:28Z

NOTE: last commit going with above comment (878b578) just pushed.

KennethNielsen · 2021-11-11T21:50:36Z

> Woohoo, done with this great big review!
>
> Only two comments left unresolved:
>
> * [Series handling #11 (comment)](https://github.com/ixdat/ixdat/pull/11#discussion_r677967807), because I have no idea how to write tests and would like for us to get that right another time.
> * [Series handling #11 (comment)](https://github.com/ixdat/ixdat/pull/11#discussion_r663478498), which I think gets at something fundamental about keeping track of data from potentially multiple sources, i.e. working with multiple backends. It would be nice to think hard and solve that the best way for ixdat.
>
> I think those should be part of future PRs. There are a bunch of TODOs and FIXMEs added as well in response to other comments. But the vast majority could be implemented right away, to huge benefit to the readability and sometimes functionality of the code. I've responded to every comment. Thanks for the review. Hope you agree this PR is ready to merge so we can get to the next one!

I agree with all of that, especially getting it merged fast :) The only thing I'm missing now, is that I would like to give the new getitem a proper read tomorrow when my head is fresh.

Besides that, I clicked through all the resolved comments above and replied to a few of them (I guess you get notifications for those). I also unresolved a few of them, where you asked for some information, just to make sure you see them before you merge and close. But all in all I think this is extremely close to merge. Will do a new github "review" tomorrow, just for the getitem stuff, and then done.

KennethNielsen

I read the new __getitem__ and the helper method, and it looks excellent. Everything else is in the resolved/unresolved status of the previous comments, so provided they are good, this is good to go 🤸

ScottSoren added 30 commits January 26, 2021 18:09

ECMeasurement properties: potential and current

a424abf

first working EC plotter

2947325

current=0 during OCP periods

b9f5ae2

file_number and selector in ECMeasurement

23ae425

Merge branch 'ec_tools' into ec_features

01eff7a

There are some fields in .ix files (saved outside of version control) which are not back-compatible after improvements made on [ec_tools] in response to Kenneth's review of #5.

tspan cutting and selection in Measurement

48381a4

ohmic drop correction, debug save&load

550ef7f

colored axes and plot_vs_potential for ec!

b0c185e

first use of (id, backend_name) combo in Measurement

0f8dffa

start CyclicVoltammagram technique

1ff49ba

CyclicVoltammagram cycle indexing and slicing

1953d9c

working exporter for EC data

da0df21

docstrings for cv.py, plotters and exporters

af60510

docstrings in ec and new Measurement methods; debug time zero-ing

ad2b64f

debugging for ipython notebook tutorial

430d2b3

add TODOs from design WS3 and 1st rev of PR #6

953d577

update tools.rst

20a7966

update tools

9a05ab0

write test for coming fixes

6443968

THE CORRECT __getitem__. MEASUREMENT BROKEN.

6db74c8

move build functions to data_series, add new meas attributes

451fb33

refactoring in prep for EC series handling

33a13b1

aliases in ec measurements

a5f130b

recursive lookup in __getitem__ for alias->calibration->alias

e3b5a8e

Systematic series constructors. No ECMeasurement__getitem__. Working!

50b9ac2

fix selector and sel_str handling

cb82ae9

ec calibration working with new __getitem__

3132636

debug tox

4d903c5

clean up after debugging tox

8fb8997

debug to run ec_tools_dev.py

cce47c6

ScottSoren added 7 commits June 29, 2021 22:37

vseries saves child tseries, ec_toos_dev passes

2c4fbf0

problematic attempt to use memory backend

839d508

better memory backend use via Saveable.as_dict()

bcd4840

Merge branch 'master' into series_handling

a17b388

So that a PR of series_handling to master is a simple fast-forward merge. The commits in question are mainly just to get the README which points to user_ready.

write test for save-load cycle and write comments

0796611

minor cleanup docu and comments in backends and db

47036bf

minor cleanup docu and comments in measurement and ec

d8e25b2

ScottSoren requested a review from KennethNielsen June 30, 2021 14:37

ScottSoren mentioned this pull request Jun 30, 2021

basic EC features #6

Closed

KennethNielsen requested changes Jul 4, 2021

KennethNielsen requested changes Jul 28, 2021

ScottSoren commented Jul 29, 2021

ScottSoren and others added 7 commits November 9, 2021 13:52

preparing for implementation of Kenneth's review

cab5a65

rewrite __getitem__

8f06e28

implement Kenneth's review comments: backends

d0f8f43

implement kni's comments to #11: db

3ba19a7

differing returns back in Savable.short_identity

4ec3209

implementation of #11 review: measurement

2a2266e

continue implementation of #11 review: ec

399290e

ScottSoren commented Nov 11, 2021

finish implementation of review to #11

878b578

KennethNielsen approved these changes Nov 12, 2021

t_zero in cut, add TODO comments, black format

f16247d

ScottSoren merged commit 3689c2a into master Nov 12, 2021

ScottSoren deleted the series_handling branch December 2, 2021 09:01
