Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index path with original path name #5785

Merged
merged 12 commits into from Jun 18, 2019

Conversation

@stsewd
Copy link
Member

@stsewd stsewd commented Jun 11, 2019

Wen we index to es from the ImportedFiles models, we don't save the path from the model, but instead the page name from the fjson file. The page name doesn't include the extension, we used to rely on the doctype in the resolver, but that got removed.

We talked about reindexing all with path being the real path from the model, but that would require more changes, and we are not sure if that's the solution (removing page name = path).

So, for now I'm just adding a new field with the original path value.

Fix #5397

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 12, 2019

This is going to break all projects, and they will need to trigger a build to be fixed... Guessing we don't want that. So, my thoughts:

I think we should serve this file from our servers
https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/static-src/core/js/doc-embed/search.js

so, if we change something in the search api, we have control on how we represen it
Currently search relies in https://github.com/rtfd/readthedocs.org/blob/8258a3c98eae25175cafbdab70ad671cf40445a2/readthedocs/core/static-src/core/js/doc-embed/search.js#L41-L41
but we already know the full file path (with the extension!)
Or we can just keep serving that file from each build, but don't depend on anything else from the local js
like DOCUMENTATION_OPTIONS
So, another way to not break existing projects, create a new attribute doc.docurl
that returns the full correct link
and keep around doc.link with the current old behavior till we figure out how to migrate that

@ericholscher
Copy link
Member

@ericholscher ericholscher commented Jun 12, 2019

Yea, I think we should start indexing the full filename in a new field in ES, then add it as a new endpoint in the docsearch API. Our new code can use this, and the old stuff will continue to work.

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 17, 2019

Let me know if you want me to upload the minified files here

if 'current_page_name' in data:
path = data['current_page_name']
else:
log.info('Unable to index file due to no name %s', filename)
return None
Copy link
Member Author

@stsewd stsewd Jun 18, 2019

If we return none from this function, get_processed_json is going to return None too, we always expect a dict from here. And get_processed_json has a default https://github.com/stsewd/readthedocs.org/blob/a2e0e3f5e442b0072a4b7cbac9efadbe2a41c224/readthedocs/projects/models.py#L1254-L1261

@stsewd stsewd requested review from ericholscher and Jun 18, 2019
@humitos
Copy link
Member

@humitos humitos commented Jun 18, 2019

We already have this information from the model

Could you edit the description of the PR to include a summary of the problem and what's the solution proposed by the PR? It's hard to review without this context to me.

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 18, 2019

@humitos updated ^

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 18, 2019

Hold the review for a moment, I think there is a bettter way to put the new field

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 18, 2019

Done!

Copy link
Member

@ericholscher ericholscher left a comment

This is a much simpler change, and lets us slowly roll out the change across all the places we use search. 👍

log.info('Unable to read file: %s', filename)
return None
log.info('Unable to read file: %s', fjson_filename)
raise
Copy link
Member

@ericholscher ericholscher Jun 18, 2019

Won't this cause all indexing to fail if a single file is missing?

Copy link
Member

@ericholscher ericholscher Jun 18, 2019

Ah, I see we're catching it at a higher level, 👍

Copy link
Member Author

@stsewd stsewd Jun 18, 2019

We were returning None, but we expect to always have a dict.

This is caught by https://github.com/stsewd/readthedocs.org/blob/96a85fa8af3cac8b139bdf99598d119eae0e0163/readthedocs/projects/models.py#L1244-L1260

Which returns a default dict

@stsewd stsewd merged commit a908684 into readthedocs:master Jun 18, 2019
1 check passed
@stsewd stsewd deleted the index-with-real-path-name branch Jun 18, 2019
@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 18, 2019

Added to the deploy card that we need to do a full re-index to see this taking effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants