Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/collection_versions/all/ endpoint is now streamed #562

Merged
merged 1 commit into from May 14, 2021

Conversation

gerrod3
Copy link
Contributor

@gerrod3 gerrod3 commented Apr 16, 2021

This is super fast I think. Potentially fast enough to rethink when or if we will need publications.

@pulpbot
Copy link
Member

pulpbot commented Apr 16, 2021

WARNING!!! This PR is not attached to an issue. In most cases this is not advisable. Please see our PR docs for more information about how to attach this PR to an issue.

@daviddavis
Copy link
Contributor

Is it correct that the /collection_versions/all/ no longer accepts a page option? If so, we probably need a removal changelog entry.

@gerrod3
Copy link
Contributor Author

gerrod3 commented Apr 16, 2021

Is it correct that the /collection_versions/all/ no longer accepts a page option? If so, we probably need a removal changelog entry.

Pretty sure it never accepted a page option. This endpoint is suppose to be unpaginated, but the code still had remnants of the paginated code, probably from copying and pasting.

@gerrod3
Copy link
Contributor Author

gerrod3 commented Apr 16, 2021

@rochacbruno Still doing some testing to see if I can get it even faster. With this change I no longer get timed-out when accessing /collection_versions/all/ with 3000 collection_versions. Going to test it with 6000 collection versions and check that it still works, but I'm feeling that this might be the best current solution we can go with. Here are some results using django-silk and a distribution with 94 collection versions.

Current
Screenshot from 2021-04-15 17-46-17
Streamed
Screenshot from 2021-04-16 09-00-18

One other thing is that the streamed response doesn't use DRF's fancy styling, it looks pretty bad in the browser. Maybe I could fix this, but I don't think it's such a big deal.

Copy link
Member

@fao89 fao89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good, why does it still as draft?

@@ -6,10 +6,12 @@
from django.db.models import F
from django.db.models.expressions import Window
from django.db.models.functions.window import FirstValue
from django.http import StreamingHttpResponse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

Comment on lines 452 to 456
cvs_template_string = "[{{ versions|map('tojson')|join(',') }}]"
cvs_template = Template(cvs_template_string)
serialized_map = map(lambda x: self.get_serializer(x, context=context).data, queryset)
streamed = map(lambda x: x.encode("utf-8"), cvs_template.stream(versions=serialized_map))
return StreamingHttpResponse(streamed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we take these to UnpaginatedCollectionViewSet also? Or the timeout only happens here?

@pep8speaks
Copy link

pep8speaks commented Apr 21, 2021

Hello @gerrod3! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-13 20:45:20 UTC

@gerrod3 gerrod3 force-pushed the streamed_metadata branch 3 times, most recently from 84aef9a to 753a820 Compare April 21, 2021 16:39
@gerrod3 gerrod3 marked this pull request as ready for review May 13, 2021 18:43
@gerrod3 gerrod3 requested a review from a team as a code owner May 13, 2021 18:43
@gerrod3
Copy link
Contributor Author

gerrod3 commented May 13, 2021

@fao89 Should I tag this PR with the original issue for the unpaginated endpoint timeout https://pulp.plan.io/issues/8439?

@fao89
Copy link
Member

fao89 commented May 13, 2021

@fao89 Should I tag this PR with the original issue for the unpaginated endpoint timeout https://pulp.plan.io/issues/8439?

I think so!

Copy link
Member

@fao89 fao89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@gerrod3 gerrod3 force-pushed the streamed_metadata branch 2 times, most recently from ed3d0cd to 8f79a04 Compare May 13, 2021 20:40
Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice! Great job!

I believe we still need publications for 2 reasons:

  • to resolve "the deprecated field issues"
  • pre-computing something this large once and saving it is more efficient and will speed up the sync too. This is more of a nice-to-have really.

@gerrod3 gerrod3 merged commit c52a47d into pulp:master May 14, 2021
@gerrod3 gerrod3 deleted the streamed_metadata branch May 14, 2021 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants