Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series handling #11

Merged
merged 51 commits into from Nov 12, 2021
Merged

Series handling #11

merged 51 commits into from Nov 12, 2021

Conversation

ScottSoren
Copy link
Member

Hi Kenneth,

This PR replaces #6 .

It implements most of what we agreed on in our workshops last month :

  • Make __getitem__ king
  • Use reader-provided aliases for essential series which need to be accessed under another name
  • Use a ConstantValue refering to a tseries as a stand-in for missing but known essential series (such as current in OCP)
  • Implement a cach (_cached_series) for __getitem__
  • Assign the job of calibration to a separate Calibration object (this became calibrations, derived from _calibration_list, and a new Saveable class family.)
  • Use a MemoryBackend to keep track of objects that might need to be passed via a dictionary representation. The methods cut, select, as_cv and __add__ all make use of this (via Saveable.as_dict(), fill_object_list() and PlaceHolderObject to ensure the returned Measurement can find the calibrations, etc of the first Measurement.

The robustness of the ixdat in this PR is ensured by the test test_biologic_ec_measurement.py using example data contained the repository, and tools.rst is updated to help other developer noobs like myself get that working with a pre-push hook.

This PR does not address a fix of the table definitions needed to implement an SQL backend. That will be handled in a subsequent PR. There are FIXME's and TODO's in comments to indicate where code will chage for that.

There are some fields in .ix files (saved outside of version control)
which are are not back-compatible after improvements made on [ec_tools]
in response to Kenneth's review of #5
Copy link

@KennethNielsen KennethNielsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First part of the review. Looks good. Mostly smallish stuff. Will finish after vacation.

TOOLS.rst Outdated Show resolved Hide resolved
development_scripts/ec_tools.py Outdated Show resolved Hide resolved
src/ixdat/backends/__init__.py Show resolved Hide resolved
src/ixdat/backends/backend_base.py Outdated Show resolved Hide resolved
src/ixdat/backends/directory_backend.py Show resolved Hide resolved
src/ixdat/db.py Outdated Show resolved Hide resolved
src/ixdat/db.py Show resolved Hide resolved
src/ixdat/db.py Show resolved Hide resolved
src/ixdat/exporters/csv_exporter.py Outdated Show resolved Hide resolved
src/ixdat/exporters/csv_exporter.py Show resolved Hide resolved
@ScottSoren
Copy link
Member Author

Awesome! Just looked through the comments, can't wait to get to work on it. :D
I will push the updates to the parts you reviewed here while you're on vacation, unless you have a better workflow in mind?

Copy link

@KennethNielsen KennethNielsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GREAT. Finally done with this. I think this looks wonderful with only a few non-trivial things to discuss, which I will write about in a general comment.

src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/plotters/ec_plotter.py Show resolved Hide resolved
src/ixdat/techniques/cv.py Outdated Show resolved Hide resolved
@KennethNielsen
Copy link

Awesome! Just looked through the comments, can't wait to get to work on it. :D
I will push the updates to the parts you reviewed here while you're on vacation, unless you have a better workflow in mind?

Sorry. Didn't see this before vacation. I would not really have mattered since I went through the files in order, and so if the comments got obsoleted due to changed code, that would have been fine. HOWEVER, I annoyingly had to go back and comment on a few more things in the first part of the review, so I think there is no way around you having to quickly go through the comments for the first part (down to, but not including, measurements.py) again to check for updates before pushing the changes. Sorry about that.

@KennethNielsen
Copy link

KennethNielsen commented Jul 28, 2021

Ok. This PR is great and it really shows that we spent all that time designing it. That being said, it is probably only natural that one of the non-trivial comments concern one of the new items i.e. making getitem king.

__getitem__ and to some extent select_value and select_values

When I read through __getitem__ I thought it was generally difficult to determine if the method does as intended, because I couldn't quite figure out what is intended. Specifically questions like:

  1. For which types of keys should values be cached
  2. For which types of keys should the -t suffix be possible
  3. What is the order of trying different things

Since this is such an important piece of code for usability I think it would be useful to consider these questions disconnected from the code. I.e. of you were doing TDD you would write the tests first, maybe not that, but at least consider what should be valid keys and how and when should they be cached. Then after considering, put it in the docstring as examples, everyone loves examples. This is the part of the comment that I thinks applies to select_value and select_values as well.

Second, as an input to the implementation, I see that you maybe ran into a problem that I have encountered before. That is, how do you implement a complex search, with a lot of exits, when you need to do something with the result afterwards (caching), which prevents just using return as a convenient escape mechanism.

An idea for fixing that, would be to put parts of the search algorithm into a sub-method, which can then use the return, because you still have the option of caching in the main method. If everything is to be cached, you could also put the caching mechanism in a decorator and forget about it altogether, but I fell like that may be a little heavy handed for this use case. I think, that (depending on what you decide about which key type lookups should be cached), you might refactor into a _get_item_to_cache sub-method and put the parts of the code that searches for items which should be cached in there. This could use return-escape and you could cache in the main method. If I'm not making sense in text, here is the code the illustrates:

def __getitem__(self, key):
    # pre-processing stuff

    # Complex search algorithm
    if ...:
        for item in item:
            if ...:
                # You can't return e.g. here, because that would prevent caching
            else:
                # You can't return e.g. here, because that would prevent caching
        else:
            # You can't return e.g. here, because that would prevent caching
    else:
        ...

    # Caching

turn into

def __getitem__(self, key):
    # pre-processing stuff

    result = self._get_item_to_cache(key)

    # Caching

def _get_item_to_cache(key):
    # Complex search algorithm
    if ...:
        for item in item:
            if ...:
                # You can return here
            else:
                # Here
        else:
            # And everywhere
    else:
        ...

The second thing, is about the -t and -s suffix, which it is a little difficult to decipher which keys that might work for. How about deciding that it applies to all keys and then making it a pure pre- and post-processing thing and thus separating it from the search algorithm:

# Cache check

# Pre-process selector suffixes
if key[-2:] in ("-t", "-s"):
    pure_key = key[:-2]
else:
    pure_key = key

# Search for pure_key (probably not a good name, but you get the point)
item = search(pure_key)

# Post process 
if key != pure_key:
    item_to_cache = ...  # Do the .time of .value extraction depending on the suffix
else:
    item_to_cache = item

# Cache on key (not pure_key)
self.cache[key] = item_to_cache
return item_to_cache

Names

We have already talked about names of things and improved stuff a lot, but I think there is still a few things worth considering.

I have, maybe not completely consistently, commented with NAME in the comments to lead to these places.

In general it falls into three categories:

  1. Is the name really appropriate and descriptive and all the right stuff

That might be stuff like address or identity and full_identity in backend. It seems like address is actually sort of an identity, but then what are the other ones? I'm not saying it is wrong, just something to think about.

  1. Use user oriented names instead of programmer oriented names

This reduces pretty much to the use of the _str suffix, which I really think should be _name and there are a few variable names where I think _name should be suffixed to disambiguate it.

  1. Unhelpful and unnecessary abbreviations

This one I think likely is going to be a point of discussion, but here goes. I acknowledge the need to be practical and deviates from the normal "names should always be description and be written out in full" dogma, in some cases, because I recognize that it is really useful for things that get's used a lot both by the user and internally. These are things like the time->t and value->v (not the I, J etc. because they are just their natural scientific letter, so that is fine on its own). And then there are internal things which are generally recognized programming abbreviations like "attrs", "spec" etc. But besides from these categories there are also things which I think shouldn't be abbreviated:

sel_str e.g. really should be something like default_selector_name. It isn't used a lot, but when it is, it really would be nice with a descriptive name.

The abbreviation of measurement->m, calibration->c and series->s in e.g. m_ids, c_ids, s_ids I think is also not good. At least, being the reader, I would really prefer that they were written out in full.

There maybe more of these around. For my self I would say that there should be no more non-trivial abbreviations around than absolutely necessary.

  1. Prefix with or without underscore

This may seem kind of like a not-pick, but it actually can be useful when you are writing code (either as user or programmer) and knowing the prefix and what it is that you are after, you can sort of guess the name with having to try both with and without underscore. The one place where I know that I have this is with t, so it is t_id, t_ids but tseries. But there may be more around. Now it may be that a convention here is impractical to introduce, then ok, but I thought that I would mention it.

Backwards compatability

It may seem difficult to change this naming stuff if it is user facing, but if you agree, now is the time to do it (before 1.0), and you can always do it in a backwards compatible way with a property and a deprecation warning.

It should also be mentioned that all of this naming stuff, if you agree with any of it, can of course be handled in a separate PR, to get this big one landed.

Copy link
Member Author

@ScottSoren ScottSoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Much learning and coming improvement from this. I completely agree with the top-level comments, though may come to you for clarification while implementing.

I've replied to all the in-line comments that I think may require a discussion, (plus a few where I couldn't stop myself).
I think every in-line comment that I didn't reply to is something I agree to and can easily implement.
Excepting those discussions, my next steps are:

  • __getitem__
  • select_values() and related functions
  • do and resolve comments for all the renaming and easy code improvements

src/ixdat/backends/memory_backend.py Show resolved Hide resolved
src/ixdat/db.py Show resolved Hide resolved
src/ixdat/db.py Outdated Show resolved Hide resolved
src/ixdat/db.py Outdated Show resolved Hide resolved
src/ixdat/db.py Show resolved Hide resolved
src/ixdat/measurements.py Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Outdated Show resolved Hide resolved
src/ixdat/measurements.py Show resolved Hide resolved
src/ixdat/measurements.py Show resolved Hide resolved
Copy link
Member Author

@ScottSoren ScottSoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo, done with this great big review!

Only two comments left unresolved:

  • Series handling #11 (comment), because I have no idea how to write tests and would for us to get that right another time.
  • Series handling #11 (comment), which I think gets at something fundamental about keeping track of data from potentially multiple sources, i.e. working with multiple backends. It would be nice to think hard solve that the best way for ixdat.

I think those should be part of future PR's. There are a bunch of TODO's and FIXME's added as well in response to other comments. But the vast majority could be implemented right away, to huge benifit to the readability and sometimes functionality of the code. I've responded to every comment.
Thanks for the review. Hope you agree this PR is ready to merge so we can get to the next one!

@ScottSoren
Copy link
Member Author

ScottSoren commented Nov 11, 2021

NOTE: last commit going with above comment (878b578) just pushed.

@KennethNielsen
Copy link

Woohoo, done with this great big review!

Only two comments left unresolved:

* [Series handling #11 (comment)](https://github.com/ixdat/ixdat/pull/11#discussion_r677967807), because I have no idea how to write tests and would for us to get that right another time.

* [Series handling #11 (comment)](https://github.com/ixdat/ixdat/pull/11#discussion_r663478498), which I think gets at something fundamental about keeping track of data from potentially multiple sources, i.e. working with multiple backends. It would be nice to think hard solve that the best way for ixdat.

I think those should be part of future PR's. There are a bunch of TODO's and FIXME's added as well in response to other comments. But the vast majority could be implemented right away, to huge benifit to the readability and sometimes functionality of the code. I've responded to every comment. Thanks for the review. Hope you agree this PR is ready to merge so we can get to the next one!

I agree with all of that, especially getting it merged fast :) The only thing I'm missing now, is that I would like to give the new getitem a proper read tomorrow when my head is fresh.

Besides from that, I clicked through all the resolved comments above and replied a few of them (I guess you get notifications for those). I also unresolved a few of them, where you asked for some information, just to make sure you see them before you merge and close. But all in all I think this is extremely close to merge. Will do a new github "review" tomorrow, just for the getitem stuff and then done.

Copy link

@KennethNielsen KennethNielsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the new __getitem__ and the helper method and ok looks excellent. Everything else is in the resolved/unresolved status of the previous comments, so provided they are good this is good to go 🤸

@ScottSoren ScottSoren merged commit 3689c2a into master Nov 12, 2021
@ScottSoren ScottSoren deleted the series_handling branch December 2, 2021 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants