Refresh search index after pages have been (re)moved #2013

Closed
prashanthpai opened this issue Feb 25, 2016 · 12 comments

@prashanthpai

Details

Gluster uses RTD to host its documentation. We noticed that search results point to old pages that have been removed or moved. How can the search index be rebuilt to reflect the actual pages in the repo?

Example search query:
https://readthedocs.org//search/?q=DHT&check_keywords=yes&area=default&project=gluster&version=latest&type=file

The results of the above search query contain links that are dead because the pages have been removed.

Thanks.

@agjohnson
Contributor

There is some code that should be updating the indexes -- including deleting removed pages -- but there should be a nuclear option here as well, to rebuild the index.

@agjohnson agjohnson added the Improvement Minor improvement to code label Feb 26, 2016
@agjohnson agjohnson added this to the Search milestone Feb 26, 2016
@prashanthpai
Author

@agjohnson If I understand correctly, the code that should update the index is broken (a bug), and the nuclear option of rebuilding the index from scratch is an enhancement (a workaround for the bug) targeted for the future?

Search is very important to users. Documents change all the time and the index should reflect that, at least eventually if rebuilding the index is an expensive backend operation.

Thanks 👍

@agjohnson
Contributor

The index is updated as expected -- that is, all updated files get updated in the search index -- but we need to make some effort to detect deleted files in the repo and remove them from the index. This is a missing feature currently.

I said rebuild the index, but I meant wiping the index for the project + version build and then updating the index with the new build. This might be the most resilient way around this, since deletion detection might be hard to ensure.
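
To make the wipe-and-update idea concrete, here is a minimal sketch. It uses a plain in-memory dict as a stand-in for the search index, and every name in it (`Index`, `rebuild_version_index`, the sample paths) is hypothetical; it is not the Read the Docs indexing code.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a search index:
# (project, version, path) -> page content.
Index = Dict[Tuple[str, str, str], str]


def rebuild_version_index(index: Index, project: str, version: str,
                          pages: Dict[str, str]) -> None:
    """Wipe every entry for one project + version, then index the new build."""
    # Collect keys first so the dict is not mutated while iterating over it.
    stale_keys = [key for key in index if key[:2] == (project, version)]
    for key in stale_keys:
        del index[key]
    # Index only what the new build actually produced.
    for path, content in pages.items():
        index[(project, version, path)] = content


index: Index = {
    ("gluster", "latest", "old/dht.md"): "DHT (page later removed)",
    ("gluster", "latest", "intro.md"): "Intro v1",
    ("gluster", "stable", "old/dht.md"): "DHT (still present in stable)",
}
rebuild_version_index(index, "gluster", "latest",
                      {"intro.md": "Intro v2", "features/dht.md": "DHT"})
```

Because nothing from the previous build of that version survives the wipe, removed or moved pages cannot linger, and other versions of the project are left untouched. The trade-off is that every build re-indexes all pages, not only the changed ones.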

@shaunix

shaunix commented Mar 4, 2016

There seems to be code to do this here:

https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/restapi/utils.py#L157

Along with a TODO that indicates it's untested. But my reading is that delete is set to False here:

https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/management/commands/reindex_elasticsearch.py#L50

It was set to False in this commit, without much explanation of why:

1d422dc
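
As an illustration of why that flag matters, here is a rough sketch of an indexing step with an optional prune. It is an assumption about the general shape of such code, not the function in `restapi/utils.py`; `index_build` and its parameters are made-up names, and the index is just a dict keyed by page path.

```python
from typing import Dict, Set


def index_build(index: Dict[str, str], pages: Dict[str, str],
                delete: bool = False) -> Set[str]:
    """Upsert pages from the current build; optionally prune missing ones.

    Returns the set of paths that were removed from the index.
    """
    # New and changed pages are always written to the index.
    index.update(pages)
    removed: Set[str] = set()
    if delete:
        # Anything indexed but absent from the current build is stale.
        removed = set(index) - set(pages)
        for path in removed:
            del index[path]
    return removed


index = {"old/dht.md": "DHT", "intro.md": "Intro v1"}
build = {"intro.md": "Intro v2", "features/dht.md": "DHT"}

index_build(index, build, delete=False)
stale_kept = "old/dht.md" in index      # the removed page is still searchable

removed = index_build(index, build, delete=True)
```

With `delete=False` the index only ever grows, which would match the symptom reported above: updated pages are refreshed, but removed pages keep showing up in results.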

Any tips on how I can help get this working correctly?

@prashanthpai
Author

Hi all, any update on this?

Search is very important functionality for all Gluster users. Users have repeatedly brought this up (most recently @monotek).

We're even contemplating converting all our docs from Markdown to .rst so that we can drop MkDocs and use Sphinx, which I believe has search built in. But this conversion is a humongous task that will need manual inspection, despite the tools available for such conversions.

RTD has been working well for GlusterFS mini-project libgfapi-python which uses .rst and sphinx.

It would be really helpful to have an estimate of when the broken search can be fixed, or whether it will be fixed at all.

@ericholscher
Member

Just want to make sure that we address this issue with our implementation. I believe our prototype using Elasticsearch 6 (#4183) will fix this issue, but I want to confirm that it will, so I'm bringing it up here.

@safwanrahman safwanrahman self-assigned this Jun 15, 2018
@safwanrahman
Member

I strongly bet that the search index entry gets removed automatically as soon as the file is removed.
I will add a test to make sure it works perfectly.

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jun 20, 2018
@safwanrahman safwanrahman moved this from Up next to In progress in Search update Jun 20, 2018
ericholscher added a commit that referenced this issue Jun 21, 2018
[Fix #2328 #2013] Refresh search index and test for case insensitive search
@ericholscher
Member

This has been fixed in our new search code. It will be deployed in the next month or so, so closing this issue as it's been addressed.

Search update automation moved this from In progress to Done Jun 21, 2018
@safwanrahman
Member

There was a bug in removing the index entry after a file is removed. Fixed it in #4277.
Thanks @prashanthpai for filing the issue.

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[Fix readthedocs#2328 readthedocs#2013] Refresh search index and test for case insensitive search
@jcampbell

I continue to observe this behavior (search results including deleted pages) in our hosted docs. Is there something that might account for that and/or could I provide useful diagnostic information to help identify/resolve the issue?

@stsewd
Member

stsewd commented Sep 13, 2019

@jcampbell please see #6069

@jcampbell

Thanks for flagging that, @stsewd. I have tried wiping and rebuilding, but to no effect (moved pages still show up twice in search results, with one link being broken). I will comment on that issue.

7 participants