-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LabelRowV2: support bundled operations #310
Conversation
f26eef8
to
a0b96d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice so far. I'm stopping the review as I'm getting a bit confused to follow it. Maybe we can have a chat about the code or you can clarify some questions.
I'm pretty sure this is approvable but I haven't wrapped my head around it fully and I'd like to see at least a few integration tests. Are there available somewhere?
|
||
def __exit__(self, exc_type, exc_value, exc_traceback): | ||
if exc_type is not None: | ||
log.warning(f"Cancelling operation due to exception: {exc_type.__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love context managers until I need to think about exceptions. Here I'm not sure what exactly will happen, but it seems to me like you'd be swallowing the exception that arises within the context which might not be what you want. Can you double check this behaviour and be aware of what we want here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree.
Context manager is kind of convenience feature here though, for people who really need it.
It's possible to go without them, by explicitly calling bundle.execute()
and this is where all exceptions occur.
I'll add this to the documentation and list caveats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make the error more explicit by raising it here from the arguments that you get instead of just finishing with a warning
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception is re-raised here implicitly, according to the documentation: https://docs.python.org/3/library/stdtypes.html#contextmanager.__exit__
The exception passed in should never be reraised explicitly - instead, this method should return a false value to indicate that the method completed successfully and does not want to suppress the raised exception.
Since this method return None
, which is equivalent to False
, all exceptions will be re-raised.
So it only the matter of either we want to add the warning or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. Thanks for sharing the link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice tests! Thank you. We could also check the context manager but tbh it's not really important.
|
||
class Bundle: | ||
""" | ||
This class allows to perform operations in bundles to improve performance by reducing number of network calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment saying that this class should not be created by the clients directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder whether we should make the users aware that this bundle exists from the "initialise_labels" and "save" method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are docs for the "bundle" parameters referencing this one.
But perhaps it's worth to add a separate how-to explaining all the caveats with bundled approach, will do that
) | ||
|
||
@staticmethod | ||
def __batch_save_rows_reducer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small thing: you're slightly inconsistent with the user of _
and __
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated to "_" everywhere for now. Don't have any preference though, so if you think "__" would be better - totally happy to go that way too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice so far. I'm stopping the review as I'm getting a bit confused to follow it. Maybe we can have a chat about the code or you can clarify some questions.
I'm pretty sure this is approvable but I haven't wrapped my head around it fully and I'd like to see at least a few integration tests. Are there available somewhere?
def _bundle_create_rows_reducer( | ||
bundle_payload: BundledCreateRowsPayload, payload: BundledCreateRowsPayload | ||
) -> BundledCreateRowsPayload: | ||
bundle_payload.uids += payload.uids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you have corrected this bug in our call. However, if this was not caught by any of the tests I'd like to make sure that we revisit the tests to make them a bit more complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, added additional checks to the unit tests
|
||
|
||
@dataclass | ||
class BatchSaveRowsPayload: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the lists have to have an equal length, you may want to consider having a list[BatchSaveRowsPayload]
where each dataclass has a single uid and payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the reason to not to do that is BundledGetRowsPayload, that contains both list of uids
, and general settings, like get_signed_url
that are set for the whole batch.
And this is why we need all these reducers rather than able to just operate on list of "stuff".
I think current approach makes sense as it provides enough control over how exactly payloads are getting constructed. But yeah, definitely has its downsides
encord/http/bundle.py
Outdated
@dataclass | ||
class BundledOperation(Generic[T, R]): | ||
operation: Callable[..., List[R]] | ||
reducer: Callable[[T, T], T] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might also want to call this result_mapper
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the below line of course :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some more comments. Now that I understand it, I'm happy with it (after considering the comments at least).
Not sure if it can be simplified much further, but I think the more explicit naming is definitely helping me.
encord/http/bundle.py
Outdated
@dataclass | ||
class BundledOperation(Generic[T, R]): | ||
operation: Callable[..., List[R]] | ||
reducer: Callable[[T, T], T] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the below line of course :)
|
||
def __exit__(self, exc_type, exc_value, exc_traceback): | ||
if exc_type is not None: | ||
log.warning(f"Cancelling operation due to exception: {exc_type.__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make the error more explicit by raising it here from the arguments that you get instead of just finishing with a warning
.
request_reducer=self._bundle_get_rows_reducer, | ||
result_mapper=BundleResultMapper( | ||
mapping_function=self._bundle_get_rows_mapper, | ||
result_handler=BundleResultHandler(predicate=self.label_hash, handler=self.from_labels_dict), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what is going on here, but I think this part confused me the most. Maybe I find it unusual that we have the predicate
and the mapping_function
in two different places and I'd have expected them to be one place. But maybe it's also just hard to simplify this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
|
||
def __exit__(self, exc_type, exc_value, exc_traceback): | ||
if exc_type is not None: | ||
log.warning(f"Cancelling operation due to exception: {exc_type.__name__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. Thanks for sharing the link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
Support bundled
initialise_labels()
andsave()
operations for LabelRowV2Tests
Added unit tests
Added integration tests