Search: return relatives URLS #7376

Merged
merged 1 commit into from Aug 13, 2020

Conversation

Member

@stsewd stsewd commented Aug 10, 2020

Closes #7311

Member

@ericholscher ericholscher left a comment

This looks good except for the hacky logic to get the path & domain :)

highlights = PageHighlightSerializer(source='meta.highlight', default=dict)
blocks = serializers.SerializerMethodField()

def get_link(self, obj):
def get_domain(self, obj):
Member

@ericholscher ericholscher Aug 13, 2020

This definitely seems like it's adding a bunch of queries on both of these functions to do the full resolve and then ignore parts of it (e.g. we don't care about subprojects for the domain). Is there a reason not to just call the resolver's resolve_path and resolve_domain directly here?

Member Author

@stsewd stsewd Aug 13, 2020

The result from _get_full_path is cached, and we already pass projects_data into the context, so this won't generate any extra queries.

Member Author

@stsewd stsewd Aug 13, 2020

def get_serializer_context(self):
    context = super().get_serializer_context()
    context['projects_data'] = self._get_all_projects_data()
    return context

Member Author

@stsewd stsewd Aug 13, 2020

Calling resolve_path and resolve_domain here will generate extra queries.

Member

@ericholscher ericholscher Aug 13, 2020

It seems like projects_data is only used in this one place, so I don't understand why we're caching it prior to calling this code? Seems like we could just remove all the pre-setting and only set it here when we actually use it?

Member Author

@stsewd stsewd Aug 13, 2020

The serializer only knows about one object, not about all of them. But the caller of this class has the list of all objects that the serializer is going to use, so it can retrieve all the data in one query.
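
The pattern described here can be sketched in plain Python, without Django REST framework. This is only an illustration under stated assumptions: `FakeDB`, `build_context`, and `serialize_result` are hypothetical names, and only the `projects_data` context key comes from the PR. The caller sees every result, so it fetches the per-project data once; the serializer sees one result and reads from the context.

```python
class FakeDB:
    """Hypothetical stand-in for the database; counts how many queries run."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def domains_for(self, slugs):
        # One "query" returning data for every requested project at once.
        self.queries += 1
        return {slug: self.rows[slug] for slug in slugs}


def build_context(db, results):
    """Caller side: it has the full list of results, so it can batch the lookup."""
    slugs = {result["project"] for result in results}
    return {"projects_data": db.domains_for(slugs)}


def serialize_result(result, context):
    """Serializer side: it handles one result and only reads from the context."""
    domain = context["projects_data"][result["project"]]
    return {"domain": domain, "path": result["path"]}


db = FakeDB({
    "docs": "https://docs.example.com",
    "sub": "https://docs.example.com",
})
results = [
    {"project": "docs", "path": "/en/latest/index.html"},
    {"project": "sub", "path": "/projects/sub/en/latest/api.html"},
    {"project": "docs", "path": "/en/latest/faq.html"},
]
context = build_context(db, results)
serialized = [serialize_result(result, context) for result in results]
```

In this sketch the number of queries stays at one regardless of how many results are serialized, which is the trade-off being argued for: extra setup in the caller in exchange for no per-object lookups.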

Member

@ericholscher ericholscher Aug 13, 2020

Sure, but you're doing queries for every Domain for every subproject with this approach, instead of querying the doctype for every Version, which will lead to the same number of queries?

Member Author

@stsewd stsewd Aug 13, 2020

Ok, I see what you mean, yeah, that can be optimized to query the domain only once.
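
One possible reading of "query the domain only once", sketched in plain Python. It assumes subprojects are served under their parent project's domain, so the lookup can be keyed on the parent and reused. `make_domain_getter`, `parent_of`, and `lookup_domain` are hypothetical names for illustration and are not taken from the PR or the resolver.

```python
def make_domain_getter(parent_of, lookup_domain):
    """Return a function that resolves each canonical project's domain once."""
    cache = {}

    def get_domain(project_slug):
        # Assumption: a subproject shares the domain of its parent project.
        canonical = parent_of.get(project_slug, project_slug)
        if canonical not in cache:
            cache[canonical] = lookup_domain(canonical)
        return cache[canonical]

    return get_domain


calls = []


def lookup_domain(slug):
    """Hypothetical expensive lookup; records each call it receives."""
    calls.append(slug)
    return f"https://{slug}.example.com"


parent_of = {"sub-a": "docs", "sub-b": "docs"}
get_domain = make_domain_getter(parent_of, lookup_domain)
domains = [get_domain(slug) for slug in ["docs", "sub-a", "sub-b", "docs"]]
```

Here four results across one project and two subprojects trigger a single lookup. The cache lives only as long as the returned function, so it would not go stale across requests.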

Member

@ericholscher ericholscher Aug 13, 2020

I don't think we need to worry too much about making this super efficient -- I'm actually saying we should make the code simpler rather than try and make it super fast. We don't do that many searches, so having simple code is probably better. I guess it might matter for projects with a lot of subprojects.

Member

@ericholscher ericholscher Aug 13, 2020

I also think we need to think a bit more deeply about how to make this stuff faster at the resolver level, rather than trying to optimize specific areas. We've done this a few times, and really the solution should be "calling the resolver is always fast".

Member

@ericholscher ericholscher left a comment

This seems fine for now, though a little complicated :)

@stsewd stsewd merged commit 393c7ed into master Aug 13, 2020
2 checks passed
@stsewd stsewd deleted the search-relative-urls branch Aug 13, 2020
Successfully merging this pull request may close these issues.

Return relative URLs in indoc search results
2 participants