LabelRowV2: support bundled operations #310

Merged
sergei-encord merged 16 commits into master from label-row-bundled-operations on May 15, 2023


sergei-encord
Collaborator

Support bundled initialise_labels() and save() operations for LabelRowV2

Tests

Added unit tests
Added integration tests

@github-actions

github-actions bot commented May 11, 2023

Unit test report

100 tests: 89 passed ✔️, 11 skipped 💤, 0 failed
1 suite, 1 file, run time 4s ⏱️

Results for commit 16e4b34.

♻️ This comment has been updated with latest results.

Contributor

@denis-cord denis-cord left a comment


Very nice so far. I'm pausing the review because I'm finding it a bit hard to follow. Maybe we can have a chat about the code, or you could answer a few questions.
I'm pretty sure this is approvable, but I haven't fully wrapped my head around it, and I'd like to see at least a few integration tests. Are they available somewhere?

encord/http/bundle.py (resolved)
encord/http/bundle.py (outdated, resolved)
encord/objects/ontology_labels_impl.py (resolved)
encord/objects/ontology_labels_impl.py (outdated, resolved)

def __exit__(self, exc_type, exc_value, exc_traceback):
    if exc_type is not None:
        log.warning(f"Cancelling operation due to exception: {exc_type.__name__}")
Contributor


I love context managers until I need to think about exceptions. Here I'm not sure exactly what will happen, but it seems to me that you'd be swallowing the exception raised within the context, which might not be what you want. Can you double-check this behaviour and make sure it's what we want here?

Collaborator Author


Yeah, I agree.
The context manager is more of a convenience feature here, for people who really need it.
It's possible to do without it by calling bundle.execute() explicitly, which is where all the exceptions occur.
I'll add this to the documentation and list the caveats.
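The two usage styles can be sketched as follows. This is a minimal illustration, not the encord implementation: the `Bundle` class, its `add()` method and the `failing_request` helper are hypothetical; only the name `execute()` comes from the discussion. Without the context manager, an error from the bundled work surfaces at the explicit `execute()` call, where the caller can catch it.

```python
from typing import Callable, List


class Bundle:
    """Hypothetical bundle: queues operations and runs them together on execute()."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def add(self, operation: Callable[[], None]) -> None:
        # Nothing is sent yet; the operation is only queued.
        self._pending.append(operation)

    def execute(self) -> None:
        # All queued work runs here, so this is where errors are raised.
        pending, self._pending = self._pending, []
        for operation in pending:
            operation()

    def __enter__(self) -> "Bundle":
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback) -> None:
        # Convenience wrapper: execute only if the block finished cleanly.
        # Returning None means any exception from the block still propagates.
        if exc_type is None:
            self.execute()


def failing_request() -> None:
    raise ConnectionError("batched request failed")


# Explicit style: the error surfaces at the execute() call site.
bundle = Bundle()
bundle.add(failing_request)
try:
    bundle.execute()
except ConnectionError as error:
    print(f"Handled: {error}")  # prints: Handled: batched request failed
```

With the `with` form, the same error is raised from `__exit__`, i.e. at the end of the block, which is less obvious when reading the calling code.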

Contributor


I think you could make the error more explicit by raising it here from the arguments you receive, instead of just finishing with a warning.

Collaborator Author


The exception is re-raised here implicitly, according to the documentation: https://docs.python.org/3/library/stdtypes.html#contextmanager.__exit__
"The exception passed in should never be reraised explicitly - instead, this method should return a false value to indicate that the method completed successfully and does not want to suppress the raised exception."
Since this method returns None, which is falsy, all exceptions will be re-raised.
So the only question is whether or not we want to log the warning.
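The re-raise behaviour described above can be checked with a small runnable sketch. Only the body of the `exc_type is not None` branch mirrors the snippet under review; the surrounding `Bundle` class, the `executed` flag and the assumption that a clean exit calls `execute()` are illustrative.

```python
import logging

log = logging.getLogger(__name__)


class Bundle:
    """Illustrative stand-in; only the warning branch mirrors the reviewed code."""

    def __init__(self) -> None:
        self.executed = False

    def execute(self) -> None:
        self.executed = True

    def __enter__(self) -> "Bundle":
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is not None:
            log.warning(f"Cancelling operation due to exception: {exc_type.__name__}")
            # Falls through to an implicit None (falsy): Python re-raises the exception.
            return None
        # Assumption for this sketch: a clean exit runs the bundled work.
        self.execute()


bundle = Bundle()
try:
    with bundle:
        raise ValueError("boom")
except ValueError as error:
    print(f"Propagated: {error!r}, executed={bundle.executed}")
# prints: Propagated: ValueError('boom'), executed=False
```

Only a truthy return value from `__exit__` would suppress the exception, so the warning is purely informational.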

Contributor


Got it. Thanks for sharing the link.

Contributor


Very nice tests, thank you! We could also test the context manager, but to be honest it's not that important.


class Bundle:
    """
    This class lets you perform operations in bundles, improving performance by reducing the number of network calls.
    """
Contributor


Maybe add a comment saying that clients should not create this class directly.

Contributor


I also wonder whether we should make users aware that this bundle exists from the "initialise_labels" and "save" methods.

Collaborator Author


The docs for the "bundle" parameters already reference this class.
But it's perhaps worth adding a separate how-to explaining all the caveats of the bundled approach; I'll do that.
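A minimal sketch of the idea in this docstring, assuming that payloads queued for the same operation can be merged with a reducer into one request. `Bundle.add()`, `create_bundle()` and `fetch_rows()` are hypothetical names, not the SDK's API; the factory function reflects the suggestion that clients should not construct the class directly.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical aliases: an operation performs one network call for a merged payload.
Operation = Callable[[Any], Any]
Reducer = Callable[[Any, Any], Any]


class Bundle:
    """Queues payloads per operation and sends one request per operation.

    Not meant to be constructed by clients directly; see create_bundle().
    """

    def __init__(self) -> None:
        self._pending: Dict[Operation, Tuple[Reducer, List[Any]]] = {}

    def add(self, operation: Operation, reducer: Reducer, payload: Any) -> None:
        # The first reducer registered for an operation is the one used.
        _, payloads = self._pending.setdefault(operation, (reducer, []))
        payloads.append(payload)

    def execute(self) -> Dict[Operation, Any]:
        results = {}
        for operation, (reducer, payloads) in self._pending.items():
            # Merge every queued payload into a single request for this operation.
            results[operation] = operation(reduce(reducer, payloads))
        self._pending.clear()
        return results


def create_bundle() -> Bundle:
    """Hypothetical factory standing in for the SDK's public entry point."""
    return Bundle()


network_calls: List[List[str]] = []


def fetch_rows(uids: List[str]) -> Dict[str, str]:
    # Stand-in for one network round trip.
    network_calls.append(list(uids))
    return {uid: f"labels for {uid}" for uid in uids}


bundle = create_bundle()
for uid in ["a", "b", "c"]:
    bundle.add(fetch_rows, lambda merged, payload: merged + payload, [uid])
results = bundle.execute()
print(network_calls)  # [['a', 'b', 'c']]
```

Three `add()` calls produce a single call to `fetch_rows`, which is the performance gain the docstring describes.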


@staticmethod
def __batch_save_rows_reducer(
Contributor


Small thing: you're slightly inconsistent in your use of _ and __.

Collaborator Author


Migrated to "_" everywhere for now. I don't have a strong preference, though, so if you think "__" would be better, I'm totally happy to go that way too.
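The choice between `_` and `__` is not purely cosmetic in Python: an identifier with two leading underscores (and no trailing ones) inside a class body is name-mangled to `_ClassName__name`. The sketch below uses hypothetical method names on a bare `Bundle` class to show the difference.

```python
class Bundle:
    @staticmethod
    def __double_underscore_reducer(left: int, right: int) -> int:
        return left + right

    @staticmethod
    def _single_underscore_reducer(left: int, right: int) -> int:
        return left + right

    def reduce_both(self, left: int, right: int) -> tuple:
        # Inside the class body both spellings work: __name is mangled consistently.
        return self.__double_underscore_reducer(left, right), self._single_underscore_reducer(left, right)


print(hasattr(Bundle, "_single_underscore_reducer"))          # True
print(hasattr(Bundle, "__double_underscore_reducer"))         # False
print(hasattr(Bundle, "_Bundle__double_underscore_reducer"))  # True
```

Mangled names are awkward to reach from tests, mocks or subclasses, which is one practical argument for settling on the single underscore.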

encord/http/bundle.py (outdated, resolved)
encord/http/bundle.py (outdated, resolved)

def _bundle_create_rows_reducer(
    bundle_payload: BundledCreateRowsPayload, payload: BundledCreateRowsPayload
) -> BundledCreateRowsPayload:
    bundle_payload.uids += payload.uids
Contributor


I know you corrected this bug during our call. However, since it wasn't caught by any of the tests, I'd like us to revisit the tests and make them a bit more complete.

Collaborator Author


Agreed; I've added additional checks to the unit tests.



@dataclass
class BatchSaveRowsPayload:
Contributor


Given that the lists have to have an equal length, you may want to consider having a list[BatchSaveRowsPayload] where each dataclass has a single uid and payload.

Collaborator Author


Basically, the reason not to do that is BundledGetRowsPayload, which contains both a list of uids and general settings, like get_signed_url, that apply to the whole batch.
That is why we need all these reducers rather than being able to just operate on a list of "stuff".

I think the current approach makes sense, as it provides enough control over exactly how payloads are constructed. But yeah, it definitely has its downsides.
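The trade-off above can be sketched with a payload that carries both a list of uids and a batch-wide flag. The field names follow the comment (`uids`, `get_signed_url`), but the class body and the choice to reject mixed settings are assumptions made for illustration, not the PR's code.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class BundledGetRowsPayload:
    uids: List[str] = field(default_factory=list)
    # Batch-wide setting: applies to every uid in the request.
    get_signed_url: bool = False


def bundle_get_rows_reducer(
    bundle_payload: BundledGetRowsPayload, payload: BundledGetRowsPayload
) -> BundledGetRowsPayload:
    if bundle_payload.get_signed_url != payload.get_signed_url:
        # A per-batch flag cannot hold two values; refuse to merge silently.
        raise ValueError("Cannot bundle requests with different get_signed_url settings")
    return BundledGetRowsPayload(
        uids=bundle_payload.uids + payload.uids,
        get_signed_url=bundle_payload.get_signed_url,
    )


merged = reduce(
    bundle_get_rows_reducer,
    [BundledGetRowsPayload(uids=[uid], get_signed_url=True) for uid in ["r1", "r2"]],
)
print(merged)  # BundledGetRowsPayload(uids=['r1', 'r2'], get_signed_url=True)
```

A plain list of one-uid payloads would have no single place to store `get_signed_url`, which is why a reducer has to decide how two payloads combine. Rejecting mismatched flags is only one policy; a bundle could instead group requests by their settings.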

@dataclass
class BundledOperation(Generic[T, R]):
    operation: Callable[..., List[R]]
    reducer: Callable[[T, T], T]
Contributor


You might also want to call this result_mapper here.

Contributor


I mean the line below, of course :)
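A minimal sketch of `BundledOperation` with the naming suggested here, using `request_reducer` and `result_mapper` to match the keyword arguments that appear later in the PR. The simplified `result_mapper` signature (one result to a string key) and the `run()` helper are hypothetical and not part of the reviewed code.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")  # request payload type
R = TypeVar("R")  # result item type


@dataclass
class BundledOperation(Generic[T, R]):
    operation: Callable[[T], List[R]]     # one network call for a merged payload
    request_reducer: Callable[[T, T], T]  # merges two request payloads
    result_mapper: Callable[[R], str]     # hypothetical: key of the caller that asked for R

    def run(self, payloads: List[T]) -> Dict[str, R]:
        merged = reduce(self.request_reducer, payloads)
        return {self.result_mapper(result): result for result in self.operation(merged)}


op = BundledOperation(
    operation=lambda uids: [{"uid": uid, "status": "ok"} for uid in uids],
    request_reducer=lambda left, right: left + right,
    result_mapper=lambda result: result["uid"],
)
print(op.run([["x"], ["y"]]))
# {'x': {'uid': 'x', 'status': 'ok'}, 'y': {'uid': 'y', 'status': 'ok'}}
```

Naming both sides explicitly makes it clear that one function shapes the outgoing request while the other routes the incoming results.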

Contributor

@denis-cord denis-cord left a comment


I've left some more comments. Now that I understand it, I'm happy with it (after considering the comments at least).

Not sure if it can be simplified much further, but I think the more explicit naming is definitely helping me.


request_reducer=self._bundle_get_rows_reducer,
result_mapper=BundleResultMapper(
    mapping_function=self._bundle_get_rows_mapper,
    result_handler=BundleResultHandler(predicate=self.label_hash, handler=self.from_labels_dict),
Contributor


I understand what's going on here, but this is the part that confused me the most. Maybe it's just that I find it unusual to have the predicate and the mapping_function in two different places; I'd have expected them to be in one place. But maybe this problem is just hard to simplify.
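One way to read the split discussed here: `mapping_function` belongs to the operation and says how to extract a key from any result, while each `BundleResultHandler` belongs to one caller and says which key it is waiting for (`predicate`) and what to do with the match (`handler`). The sketch below uses simplified stand-in classes with the same field names; `dispatch()` and `FakeLabelRow` are hypothetical, and the real classes may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class BundleResultHandler:
    predicate: str                  # key this caller is waiting for, e.g. a label hash
    handler: Callable[[Any], None]  # what to do with the matching result


@dataclass
class BundleResultMapper:
    mapping_function: Callable[[Any], str]  # operation-level: extracts the key from a result
    result_handler: BundleResultHandler     # caller-level: which key, and what to do with it


def dispatch(results: List[Any], mappers: List[BundleResultMapper]) -> None:
    # Route every result of one bundled call to the caller whose predicate matches.
    for result in results:
        for mapper in mappers:
            if mapper.mapping_function(result) == mapper.result_handler.predicate:
                mapper.result_handler.handler(result)


class FakeLabelRow:
    """Hypothetical stand-in for a label row."""

    def __init__(self, label_hash: str) -> None:
        self.label_hash = label_hash
        self.labels: Dict[str, Any] = {}

    def from_labels_dict(self, labels_dict: Dict[str, Any]) -> None:
        self.labels = labels_dict


rows = [FakeLabelRow("h1"), FakeLabelRow("h2")]
mappers = [
    BundleResultMapper(
        mapping_function=lambda result: result["label_hash"],
        result_handler=BundleResultHandler(predicate=row.label_hash, handler=row.from_labels_dict),
    )
    for row in rows
]
# One bundled response covering both rows, in arbitrary order.
dispatch([{"label_hash": "h2", "objects": []}, {"label_hash": "h1", "objects": []}], mappers)
print([row.labels["label_hash"] for row in rows])  # ['h1', 'h2']
```

Keeping the two apart lets every label row reuse the same mapping function while registering only its own hash and callback.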

Contributor

@denis-cord denis-cord left a comment


Well done!



@sergei-encord sergei-encord merged commit 52c0f81 into master May 15, 2023
3 checks passed
@sergei-encord sergei-encord deleted the label-row-bundled-operations branch May 15, 2023 16:29

2 participants