New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Series handling #11
Series handling #11
Conversation
There are some fields in .ix files (saved outside of version control) which are are not back-compatible after improvements made on [ec_tools] in response to Kenneth's review of #5
So that a PR of series_handling to master is a simple fast-forward merge. The commits in question are mainly just to get the README which points to user_ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First part of the review. Looks good. Mostly smallish stuff. Will finish after vacation.
Awesome! Just looked through the comments, can't wait to get to work on it. :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GREAT. Finally done with this. I think this looks wonderful with only a few non-trivial things to discuss, which I will write about in a general comment.
Sorry. Didn't see this before vacation. I would not really have mattered since I went through the files in order, and so if the comments got obsoleted due to changed code, that would have been fine. HOWEVER, I annoyingly had to go back and comment on a few more things in the first part of the review, so I think there is no way around you having to quickly go through the comments for the first part (down to, but not including, measurements.py) again to check for updates before pushing the changes. Sorry about that. |
Ok. This PR is great and it really shows that we spent all that time designing it. That being said, it is probably only natural that one of the non-trivial comments concern one of the new items i.e. making getitem king.
When I read through
Since this is such an important piece of code for usability I think it would be useful to consider these questions disconnected from the code. I.e. of you were doing TDD you would write the tests first, maybe not that, but at least consider what should be valid keys and how and when should they be cached. Then after considering, put it in the docstring as examples, everyone loves examples. This is the part of the comment that I thinks applies to Second, as an input to the implementation, I see that you maybe ran into a problem that I have encountered before. That is, how do you implement a complex search, with a lot of exits, when you need to do something with the result afterwards (caching), which prevents just using An idea for fixing that, would be to put parts of the search algorithm into a sub-method, which can then use the return, because you still have the option of caching in the main method. If everything is to be cached, you could also put the caching mechanism in a decorator and forget about it altogether, but I fell like that may be a little heavy handed for this use case. I think, that (depending on what you decide about which key type lookups should be cached), you might refactor into a def __getitem__(self, key):
# pre-processing stuff
# Complex search algorithm
if ...:
for item in item:
if ...:
# You can't return e.g. here, because that would prevent caching
else:
# You can't return e.g. here, because that would prevent caching
else:
# You can't return e.g. here, because that would prevent caching
else:
...
# Caching turn into def __getitem__(self, key):
# pre-processing stuff
result = self._get_item_to_cache(key)
# Caching
def _get_item_to_cache(key):
# Complex search algorithm
if ...:
for item in item:
if ...:
# You can return here
else:
# Here
else:
# And everywhere
else:
... The second thing, is about the -t and -s suffix, which it is a little difficult to decipher which keys that might work for. How about deciding that it applies to all keys and then making it a pure pre- and post-processing thing and thus separating it from the search algorithm: # Cache check
# Pre-process selector suffixes
if key[-2:] in ("-t", "-s"):
pure_key = key[:-2]
else:
pure_key = key
# Search for pure_key (probably not a good name, but you get the point)
item = search(pure_key)
# Post process
if key != pure_key:
item_to_cache = ... # Do the .time of .value extraction depending on the suffix
else:
item_to_cache = item
# Cache on key (not pure_key)
self.cache[key] = item_to_cache
return item_to_cache Names We have already talked about names of things and improved stuff a lot, but I think there is still a few things worth considering. I have, maybe not completely consistently, commented with NAME in the comments to lead to these places. In general it falls into three categories:
That might be stuff like
This reduces pretty much to the use of the
This one I think likely is going to be a point of discussion, but here goes. I acknowledge the need to be practical and deviates from the normal "names should always be description and be written out in full" dogma, in some cases, because I recognize that it is really useful for things that get's used a lot both by the user and internally. These are things like the time->t and value->v (not the I, J etc. because they are just their natural scientific letter, so that is fine on its own). And then there are internal things which are generally recognized programming abbreviations like "attrs", "spec" etc. But besides from these categories there are also things which I think shouldn't be abbreviated:
The abbreviation of measurement->m, calibration->c and series->s in e.g. m_ids, c_ids, s_ids I think is also not good. At least, being the reader, I would really prefer that they were written out in full. There maybe more of these around. For my self I would say that there should be no more non-trivial abbreviations around than absolutely necessary.
This may seem kind of like a not-pick, but it actually can be useful when you are writing code (either as user or programmer) and knowing the prefix and what it is that you are after, you can sort of guess the name with having to try both with and without underscore. The one place where I know that I have this is with t, so it is Backwards compatability It may seem difficult to change this naming stuff if it is user facing, but if you agree, now is the time to do it (before 1.0), and you can always do it in a backwards compatible way with a property and a deprecation warning. It should also be mentioned that all of this naming stuff, if you agree with any of it, can of course be handled in a separate PR, to get this big one landed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Much learning and coming improvement from this. I completely agree with the top-level comments, though may come to you for clarification while implementing.
I've replied to all the in-line comments that I think may require a discussion, (plus a few where I couldn't stop myself).
I think every in-line comment that I didn't reply to is something I agree to and can easily implement.
Excepting those discussions, my next steps are:
__getitem__
select_values()
and related functions- do and resolve comments for all the renaming and easy code improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo, done with this great big review!
Only two comments left unresolved:
- Series handling #11 (comment), because I have no idea how to write tests and would for us to get that right another time.
- Series handling #11 (comment), which I think gets at something fundamental about keeping track of data from potentially multiple sources, i.e. working with multiple backends. It would be nice to think hard solve that the best way for ixdat.
I think those should be part of future PR's. There are a bunch of TODO's and FIXME's added as well in response to other comments. But the vast majority could be implemented right away, to huge benifit to the readability and sometimes functionality of the code. I've responded to every comment.
Thanks for the review. Hope you agree this PR is ready to merge so we can get to the next one!
NOTE: last commit going with above comment (878b578) just pushed. |
I agree with all of that, especially getting it merged fast :) The only thing I'm missing now, is that I would like to give the new getitem a proper read tomorrow when my head is fresh. Besides from that, I clicked through all the resolved comments above and replied a few of them (I guess you get notifications for those). I also unresolved a few of them, where you asked for some information, just to make sure you see them before you merge and close. But all in all I think this is extremely close to merge. Will do a new github "review" tomorrow, just for the getitem stuff and then done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the new __getitem__
and the helper method and ok looks excellent. Everything else is in the resolved/unresolved status of the previous comments, so provided they are good this is good to go 🤸
Hi Kenneth,
This PR replaces #6 .
It implements most of what we agreed on in our workshops last month :
__getitem__
kingaliases
for essential series which need to be accessed under another nameConstantValue
refering to atseries
as a stand-in for missing but known essential series (such as current in OCP)_cached_series
) for__getitem__
calibrations
, derived from_calibration_list
, and a new Saveable class family.)MemoryBackend
to keep track of objects that might need to be passed via a dictionary representation. The methodscut
,select
,as_cv
and__add__
all make use of this (viaSaveable.as_dict()
,fill_object_list()
andPlaceHolderObject
to ensure the returned Measurement can find the calibrations, etc of the first Measurement.The robustness of the ixdat in this PR is ensured by the test
test_biologic_ec_measurement.py
using example data contained the repository, andtools.rst
is updated to help other developer noobs like myself get that working with a pre-push hook.This PR does not address a fix of the table definitions needed to implement an SQL backend. That will be handled in a subsequent PR. There are FIXME's and TODO's in comments to indicate where code will chage for that.