-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
88: add entities methods create_many, delete, merge #93
Conversation
@yanokwa @tobiasmcnulty does it seem like this new |
a192675
to
531cb2a
Compare
@lindsay-stevens This looks great! Thanks for adding support for these methods directly in the library. I might not be able to test right away, but if real-world feedback would be helpful and is not too urgent, I can try to prioritize testing it and report back. My only initial comment is that single-threaded deletes and updates can be quite slow (think 6 minutes instead of 6 seconds for ~5000 rows, when comparing def delete_entities(
client,
project_id,
entity_list_name,
uuids_to_delete,
):
def delete_entity(uuid):
delete_resp = client.delete(
f"/projects/{project_id}/datasets/{entity_list_name}/entities/{uuid}",
)
maybe_fail_loudly_for_odk_resp(delete_resp)
pool = ThreadPool(20)
print(f"Deleting {len(uuids_to_delete)} UUIDs...")
pool.map(delete_entity, uuids_to_delete)
print(f"Done.") |
@tobiasmcnulty thanks! Testing would not be urgent, but appreciated if you have a chance. About the use cases:
Incorporating some kind of async might be out of scope for this PR. Since pyodk is a library, if it opens threadpools, and user app code then opens threadpools to invoke pyodk, we can end up with unexpectedly a lot of threads. Also a flood of requests to Central. I don't know if there are plans for it, but maybe Central could have a batch update and batch delete endpoint, like there is for create. Another approach could be make pyodk use aiohttp or similar, but that's a large refactor. With the current code, an approach could be to monkey-patch |
We may have upwards of 3M entities to update (don't worry, not all in one list), so 6 minutes -> 6 seconds is a big improvement. I agree that long term it would be better to implement these changes server side, so take take the feedback with a grain of salt. Perhaps just a note in the docs so warn users before they can't figure out why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the create_many
and merge
signatures should be more similar. I've made a few suggestions to that effect and am interested to hear what you think, @lindsay-stevens
How about also adding a brief note about performance in the merge
and delete
doc comments, maybe also adding one to update
? Something to the effect of "this method sends requests one at a time and may be slow for larger numbers of Entities. Consider making parallel calls and describing your use case on the ODK forum."
I added a note to |
- chg: move insert data reshaping to create_many so that it's easier to use and accepts same kind of data as merge. - add: create_source/source_size parameter for both methods to allow specifying this info separately to the data. - chg: docstring improvements
- from mkdocstrings-python dependency on griffe>=0.37 - griffe public API changed in v1.0.0 released 2 weeks ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those updates!
Closes #88
What has been done to verify that this works as intended?
Unit tests for the data processor method
_prep_data_for_merge
, and integration tests intest_client.py
.Why is this the best possible solution? Were any other approaches considered?
The original ticket called for an example script but 1) this seemed like functionality that would be widely useful enough to be a part of the library itself, and 2) it was somewhat complicated to implement and test (lots of options and edge cases) so as part of the library it should be easier to actively maintain.
Also, I added the use case to the existing example script
create_entities_from_submissions.py
because there was an almost complete overlap in the use case - it didn't involve updating data before, but you can do the same thing withmerge
. The tests contain examples of numerous other usage / call patterns.How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Provides some handy new ways to interact with Entities
Do we need any specific form for testing your changes? If so, please attach one.
No, the tests include all necessary data.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Before submitting this PR, please make sure you have:
tests
python -m unittest
and verified all tests passruff format pyodk tests
andruff check pyodk tests
to lint code