Add pagination and bulk update support for User (Discord user) serializer. #378
Pagination is done via `PageNumberPagination`, i.e. each page contains a specific number of `user` objects.
Implemented a method to handle bulk updates on the user model via a new endpoint: `/bot/users/bulk_patch`
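As a rough illustration of page-number pagination semantics, here is a minimal plain-Python sketch. It is not the PR's code and does not use DRF; the `paginate` helper, its defaults, and the sample users are hypothetical. DRF's `PageNumberPagination` returns URLs in `next` and `previous`, whereas this sketch uses bare page numbers.

```python
from math import ceil


def paginate(items, page=1, page_size=3):
    """Return one page of `items` plus paging metadata (plain-Python stand-in)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = max(1, ceil(len(items) / page_size))
    if page > total_pages:
        raise LookupError(f"page {page} does not exist")
    start = (page - 1) * page_size
    return {
        "count": len(items),
        # Page numbers here; DRF would put full URLs in these two keys.
        "next": page + 1 if page < total_pages else None,
        "previous": page - 1 if page > 1 else None,
        "results": items[start:start + page_size],
    }


users = [{"id": n} for n in range(1, 8)]  # seven hypothetical users
first = paginate(users, page=1)
last = paginate(users, page=3)
```

With seven users and a page size of three, the third page holds only the final user and has no `next` page.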
|
…ser object from database.
|
Okay, before I really jump into this review, I need you to take a look at the |
|
To be clear, I'm not saying you should necessarily be using |
|
I managed to simplify the code in the bulk update PR, I no longer need to override the Coming to the So, I would say that we can get rid of |
|
Okay @RohanJnr - Then I think you should get rid of the bulk plugin. |
…mplify UserListSerializer The `to_internal_value()` function is no longer overridden in UserListSerializer; this is due to explicitly stating the `id` field in UserSerializer, as mentioned in the documentation. Override the `create()` method in UserListSerializer and the `get_serializer()` method in `UserViewSet` to support bulk creation.
…creation of User Models.
|
This PR is ready for review. |
c1442bb to 982ef75
the Pipfile.lock conflict was resolved by re-locking the pipfile.
|
The bulk update method needs optimization. It calls the database on every instance that needs to be updated. |
…thod The Model.objects.bulk_update() method greatly reduces the number of SQL queries by updating all required instances in 1 query.
|
Here are some rough benchmarks of the API changes coming in this PR:
- Creating 10000 users (bulk).
- Updating 1000 users all in 1 SQL query.
- Updating 10000 users all in 1 SQL query.
- Updating 10000 users by updating 1000 users in 1 SQL query.
- Updating 10000 users by updating 2000 users in 1 SQL query.
Looks like 2000 users in 1 SQL query is the sweet spot.
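The batching idea behind those numbers can be sketched in plain Python. The `batched` helper below is hypothetical and only shows how 10000 rows split into groups of 2000. Django's `bulk_update()` does accept a `batch_size` argument for this purpose, but the thread does not say whether the PR uses it.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


users = list(range(10_000))           # stand-ins for 10000 user rows
batches = list(batched(users, 2000))  # each batch would map to one UPDATE query
```

Under these assumptions, updating 10000 users in batches of 2000 comes to five queries instead of one very large query or 10000 small ones.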
MrHemlock left a comment
Everything seems to be in order.
MarkKoz left a comment
Apologies for so many incremental reviews. I've finally done a comprehensive review and test of the API. Along with my requested changes, here are the issues I encountered:
- Posting a single user (i.e. no `bulk_create`) that already exists results in a 500 response because of an `IntegrityError`. This should be re-raised as a 400 error instead.
- Posting multiple users which exist with `bulk_create` works, but the response still includes objects for users which were duplicates. It's just a copy of the data given in the request rather than what's in the database.
  For example, if you try to set `in_guild` from false to true for an existing user, the response will return the object with `in_guild: true` even though it never changed and is still false in the DB.
  This isn't too important since we never check the return value, but if you know of an easy fix then you should go for it. In fact, you can respond with nothing since we don't rely on the response. This was profiled not too long ago, and it turns out that serialisation takes a significant amount of time. Granted, it won't have as much of a performance benefit at this scale (IIRC we benchmarked serialisation of all users in a single response).
- `bulk_patch` returns duplicate objects in the response. More specifically, the number of duplicates it returns depends on the number of fields that were given in the request. Again, not too important, but fix it if you can.
- Checking for duplicate IDs in the request doesn't work because `filtered_instances = queryset.filter(id__in=object_ids)` effectively filters out duplicates before they ever reach the serialiser. This is also indicative of a hole in your tests; there should be a test that catches this issue.
  I suppose you could merge the duplicate check with the creation of `[item["id"] for item in request.data]` if you use a normal loop instead of a comprehension and use the same concept with a set as before.
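The last point can be sketched without Django or DRF. In the sketch below, `RequestError` is a hypothetical stand-in for DRF's `ValidationError`, and the `database` dict stands in for the queryset. It shows why a membership filter hides a repeated ID and how a plain loop with a set catches it.

```python
class RequestError(Exception):
    """Hypothetical stand-in for DRF's ValidationError; carries a detail dict."""

    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


def collect_ids(request_data):
    """Return the set of IDs in the payload, rejecting missing or repeated IDs."""
    seen = set()
    for item in request_data:
        if "id" not in item:
            raise RequestError({"id": ["This field is required."]})
        if item["id"] in seen:
            raise RequestError(
                {"id": [f"User with ID {item['id']} given multiple times."]}
            )
        seen.add(item["id"])
    return seen


payload = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "c"}]

# A membership filter (similar in spirit to id__in) yields each stored row once,
# so the repeated ID 1 in the payload goes unnoticed.
database = {1: "user 1", 2: "user 2", 3: "user 3"}
ids = [item["id"] for item in payload]
matched = [pk for pk in database if pk in ids]

# Checking the raw payload while building the ID set catches the repeat.
try:
    collect_ids(payload)
    error = None
except RequestError as exc:
    error = exc.detail
```

A regression test for the hole mentioned above could send a payload like `payload` and assert on the 400 response.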
| """ | ||
| Update multiple User objects in a single request. | ||
|
|
||
| ## Route |
This detailed endpoint info should be in the class docstring rather than in the method's docstring.
| ## Route | ||
| ### PATCH /bot/users/bulk_patch | ||
| Update all users with the IDs. | ||
| `id` field is mandatory, rest are optional. |
Not quite accurate because it raises a validation error (400) if there are no fields besides the ID. Therefore, the ID and at least one other field are mandatory.
| ### POST /bot/users | ||
| Adds a single or multiple new users. | ||
| The roles attached to the user(s) must be roles known by the site. | ||
| User creation process will be skipped if user is already present in the database. |
The way you worded it is unclear in the sense that it may imply the entire operation will abort.
| User creation process will be skipped if user is already present in the database. | |
| Users that already exist in the database will be skipped. |
| - page_size: Number of Users in one page. | ||
| - page: Page number |
| - page_size: Number of Users in one page. | |
| - page: Page number | |
| - page_size: number of Users in one page | |
| - page: page number |
It should also be noted these are optional parameters and that the default page size is 10k.
| #### Status codes | ||
| - 201: returned on success | ||
| - 400: if one of the given roles does not exist, or one of the given fields is invalid | ||
| - 400: if multiple user objects with the same id are given. |
| - 400: if multiple user objects with the same id are given. | |
| - 400: if multiple user objects with the same id are given |
| - 200: returned on success. | ||
| - 400: if the request body was invalid, see response body for details. | ||
| - 404: if the user with the given id does not exist. |
| - 200: returned on success. | |
| - 400: if the request body was invalid, see response body for details. | |
| - 404: if the user with the given id does not exist. | |
| - 200: returned on success | |
| - 400: if the request body was invalid, see response body for details | |
| - 404: if the user with the given id does not exist |
| except KeyError: | ||
| # user ID not provided in request body. | ||
| resp = { | ||
| "Error": "User ID not provided." |
For consistency with the rest of DRF, it should be named "detail".
| "Error": "User ID not provided." | |
| "detail": "The id field is missing from at least one object." |
Alternatively, you can raise a validation error with `raise ValidationError({"id": ["This field is required."]})`, which may or may not be more idiomatic here. It's in a list because that's how DRF normally formats such errors. I am leaning more towards the `ValidationError` approach.
|
|
||
| if not fields_to_update: | ||
| # Raise ValidationError when only id field is given. | ||
| raise ValidationError({"data": "Insufficient data provided."}) |
By convention, using "data" as a key name implies that there's something wrong with a field named data. If you need to raise ValidationErrors that aren't specific to a field, use the configured NON_FIELD_ERRORS_KEY as the key for consistency.
Furthermore, maybe the value should be in a list since this seems to be DRF convention. I think the idea is that there could be several errors for the same field/key, hence the list.
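To make the key and list conventions concrete, here is a small framework-free sketch. The `build_error_detail` helper is hypothetical. The constant mirrors DRF's default value, `"non_field_errors"`; in a real DRF project the configured key is read from `api_settings.NON_FIELD_ERRORS_KEY` in `rest_framework.settings`.

```python
from collections import defaultdict

# DRF's default value for this setting; a project may configure another key.
NON_FIELD_ERRORS_KEY = "non_field_errors"


def build_error_detail(errors):
    """Group (field, message) pairs into {key: [messages]}.

    A field of None marks an error that is not tied to any single field.
    """
    detail = defaultdict(list)
    for field, message in errors:
        key = field if field is not None else NON_FIELD_ERRORS_KEY
        detail[key].append(message)
    return dict(detail)


detail = build_error_detail([
    (None, "Insufficient data provided."),
    ("id", "This field is required."),
    ("id", "A valid integer is required."),
])
```

Because each key maps to a list, two separate problems with `id` can be reported together, which is the motivation guessed at in the comment above.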
| try: | ||
| user = instance_mapping[user_data["id"]] | ||
| except KeyError: | ||
| raise NotFound({"id": f"User with id {user_data['id']} not found."}) |
Only ValidationErrors use the convention of the field name as the key. 404s raised by DRF elsewhere use detail as the key.
| raise NotFound({"id": f"User with id {user_data['id']} not found."}) | |
| raise NotFound({"detail": f"User with id {user_data['id']} not found."}) |
I'm adding this comment for posterity: the reason it's okay to raise a 404 here and risk failing the entire request is because the syncer shouldn't ever run into a race condition where it's trying to update a user that has been deleted (or not created yet). The former is impossible because it never deletes users; it only sets in_guild as false. The latter is impossible because if it sees the user doesn't exist in the DB, it will try to create rather than update.
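A plain-Python sketch of the lookup-and-fail behaviour discussed here, assuming users are simple dicts. `NotFoundError` is a hypothetical stand-in for DRF's `NotFound`, and `apply_updates` is not the PR's implementation. This sketch checks every ID before changing anything, which is one possible way to keep a failed request from leaving partial updates behind.

```python
class NotFoundError(LookupError):
    """Hypothetical stand-in for DRF's NotFound (HTTP 404)."""

    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


def apply_updates(instances, validated_data):
    """Apply each update to the stored user with the same ID.

    Raises NotFoundError before changing anything if any ID is unknown.
    """
    instance_mapping = {user["id"]: user for user in instances}
    for user_data in validated_data:
        if user_data["id"] not in instance_mapping:
            raise NotFoundError(
                {"detail": f"User with id {user_data['id']} not found."}
            )
    for user_data in validated_data:
        # The stored dicts are mutated in place.
        instance_mapping[user_data["id"]].update(user_data)
    return list(instance_mapping.values())


stored = [{"id": 1, "in_guild": True}, {"id": 2, "in_guild": True}]
updated = apply_updates(stored, [{"id": 2, "in_guild": False}])

try:
    apply_updates(stored, [{"id": 1, "in_guild": False}, {"id": 99, "in_guild": False}])
    missing = None
except NotFoundError as exc:
    missing = exc.detail
```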
| ... 'id': int, | ||
| ... 'name': str, | ||
| ... 'discriminator': int, | ||
| ... 'roles': List[int], | ||
| ... 'in_guild': bool | ||
| ... }, | ||
| ... { | ||
| ... 'id': int, | ||
| ... 'name': str, | ||
| ... 'discriminator': int, | ||
| ... 'roles': List[int], | ||
| ... 'in_guild': bool |
The dict items should be indented.
| ... 'id': int, | |
| ... 'name': str, | |
| ... 'discriminator': int, | |
| ... 'roles': List[int], | |
| ... 'in_guild': bool | |
| ... }, | |
| ... { | |
| ... 'id': int, | |
| ... 'name': str, | |
| ... 'discriminator': int, | |
| ... 'roles': List[int], | |
| ... 'in_guild': bool | |
| ...     'id': int, | |
| ...     'name': str, | |
| ...     'discriminator': int, | |
| ...     'roles': List[int], | |
| ...     'in_guild': bool | |
| ... }, | |
| ... { | |
| ...     'id': int, | |
| ...     'name': str, | |
| ...     'discriminator': int, | |
| ...     'roles': List[int], | |
| ...     'in_guild': bool |
| user = instance_mapping[user_data["id"]] | ||
| except KeyError: | ||
| raise NotFound({"id": f"User with id {user_data['id']} not found."}) | ||
| raise NotFound({"detail": [f"User with id {user_data['id']} not found."]}) |
Using a list is a ValidationError convention only
| raise NotFound({"detail": [f"User with id {user_data['id']} not found."]}) | |
| raise NotFound({"detail": f"User with id {user_data['id']} not found."}) |
|
|
||
| return User.objects.bulk_create(new_users, ignore_conflicts=True) | ||
| users = User.objects.bulk_create(new_users, ignore_conflicts=True) | ||
| return User.objects.filter(id__in=[user.id for user in users]) |
You misunderstood what I meant by duplicates. I mean that it still returns objects which weren't created because the ID is already in the database. What you're trying to solve is removing duplicate objects in the response, which is impossible.
To fix this, you'd have to know which users are already in the DB before you do a `bulk_create`. This sounds relatively slow, so it may not be worth fixing (or you could return nothing), since we don't even read the response.
Alright, I will return an empty list then.
Okay, don't forget to document that.
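For posterity, the trade-off in this thread can be sketched with an in-memory dict standing in for the users table. `bulk_create_ignoring_conflicts` is a hypothetical helper, not Django's `bulk_create`. It shows that reporting only the users that were really created means looking up the existing IDs first, which is the extra work the thread chose to avoid by returning an empty list.

```python
def bulk_create_ignoring_conflicts(database, new_users):
    """Insert users whose ID is not yet in `database`; skip the rest.

    Returns only the users that were actually inserted, which requires
    knowing the existing IDs up front.
    """
    existing_ids = set(database)
    created = []
    for user in new_users:
        if user["id"] in existing_ids:
            continue  # conflict: leave the stored row as it is
        database[user["id"]] = user
        existing_ids.add(user["id"])
        created.append(user)
    return created


database = {1: {"id": 1, "in_guild": False}}
request_users = [{"id": 1, "in_guild": True}, {"id": 2, "in_guild": True}]
created = bulk_create_ignoring_conflicts(database, request_users)
```

Echoing `request_users` back instead would claim `in_guild: true` for user 1 even though the stored row is unchanged.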
| for user_dict in validated_data: | ||
| if user_dict["id"] in seen: | ||
| raise ValidationError( | ||
| {"id": f"User with ID {user_dict['id']} given multiple times."} |
| {"id": f"User with ID {user_dict['id']} given multiple times."} | |
| {"id": [f"User with ID {user_dict['id']} given multiple times."]} |
| object_ids = set() | ||
| for data in request.data: | ||
| try: | ||
| if data["id"] in object_ids: | ||
| # If request data contains users with same ID. | ||
| raise ValidationError( | ||
| {"id": [f"User with ID {data['id']} given multiple times."]} | ||
| ) | ||
| except KeyError: | ||
| # If user ID not provided in request body. | ||
| raise ValidationError( | ||
| {"id": ["This field is required."]} | ||
| ) | ||
| object_ids.add(data["id"]) | ||
|
|
||
| filtered_instances = queryset.filter(id__in=object_ids) |
Since it's performing validation, I think it makes more sense for this code to be in the serialiser. In the hypothetical case that the serialiser needs to be used elsewhere, this code would have to be duplicated.
|
I also got this warning: |
…onsistent results with an unordered object_list: <class 'pydis_site.apps.api.models.bot.user.User'> QuerySet.
MarkKoz left a comment
Excellent work. Thanks for putting up with my scrutiny.
closes #375
Implementation
Pagination
Bulk Update
`UserSerializer` has been implemented to support bulk updates (ref to docs: DRF, "Customizing multiple update").
Additional
Tests for new features.