New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh search index after pages have been (re)moved #2013

Closed
prashanthpai opened this Issue Feb 25, 2016 · 9 comments

Comments

5 participants
@prashanthpai
Copy link

prashanthpai commented Feb 25, 2016

Details

Gluster uses RTD to host it's documentation. We noticed that search results points to old pages that have been removed or moved. How can the search index be rebuilt to reflect actual pages in repo ?

Example search query:
https://readthedocs.org//search/?q=DHT&check_keywords=yes&area=default&project=gluster&version=latest&type=file

The results of the above search query contains links that are dead because pages have been removed.

Thanks.

@agjohnson

This comment has been minimized.

Copy link
Contributor

agjohnson commented Feb 26, 2016

There is some code that should be updating the indexes -- including deleting removed pages -- but there should be a nuclear option here as well, to rebuild the index.

@agjohnson agjohnson added this to the Search milestone Feb 26, 2016

@prashanthpai

This comment has been minimized.

Copy link
Author

prashanthpai commented Feb 26, 2016

@agjohnson If I understand correctly, the part of code that should update the index is broken (bug) and the nuclear option to rebuild the index from scratch is an enhancement (workaround for the bug) targeted for the future ?

Search is very important to users and documents keep changing all the time and index should reflect it, at least eventually if rebuilding index is an expensive backend operation.

Thanks 👍

@agjohnson

This comment has been minimized.

Copy link
Contributor

agjohnson commented Feb 26, 2016

The index is updated as expected -- that is, all updated files get updated in the search index -- but we need to make some effort to detect deleted files in the repo and remove them from the index. This is a missing feature currently.

I say rebuild the index, but I meant wiping the index of the project + version build, and updating the index with the new build. This might be the most resilient way around this, deletion deletion might be hard to ensure.

@shaunix

This comment has been minimized.

Copy link

shaunix commented Mar 4, 2016

There seems to be code to do this here:

https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/restapi/utils.py#L157

Along with a TODO that indicates it's untested. But my reading is that delete is set to False here:

https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/management/commands/reindex_elasticsearch.py#L50

Set to False in this commit, but without much explanation on why:

1d422dc

Any tips on how I can help getting this working correctly?

@prashanthpai

This comment has been minimized.

Copy link
Author

prashanthpai commented Jun 23, 2016

Hi all, any update on this ?

Search is a very important functionality to all gluster users. We've had users (recently by @monotek) repeatably bring this up.

We're even contemplating converting all our docs from markdown to .rst to get rid of mkdocs and use sphinx which I believe has search built in. But this conversion is a humongous task that will need manual inspection despite tools available for such conversion.

RTD has been working well for GlusterFS mini-project libgfapi-python which uses .rst and sphinx.

It would really be helpful if an estimate can be provided when the broken search can be fixed or if it'll be fixed at all.

@agjohnson agjohnson modified the milestones: Search, Search improvements Mar 29, 2018

@safwanrahman safwanrahman added this to Backlog in Search update via automation May 21, 2018

@ericholscher ericholscher moved this from Backlog to Up next in Search update Jun 15, 2018

@ericholscher

This comment has been minimized.

Copy link
Member

ericholscher commented Jun 15, 2018

Just want to make sure that we address this issue with our implementation. I believe our prototype using Elastic Search 6 (#4183) will fix this issue, but want to confirm that it will so bringing it up here.

@safwanrahman safwanrahman self-assigned this Jun 15, 2018

@safwanrahman

This comment has been minimized.

Copy link
Member

safwanrahman commented Jun 15, 2018

I strongly bet that the search index get automatically removed as soon as the file is removed.
I will add a test to make sure it works perfectly

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jun 20, 2018

@safwanrahman safwanrahman moved this from Up next to In progress in Search update Jun 20, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jun 20, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jun 20, 2018

ericholscher added a commit that referenced this issue Jun 21, 2018

Merge pull request #4277 from safwanrahman/insensitive
[Fix #2328 #2013] Refresh search index and test for case insensitive search
@ericholscher

This comment has been minimized.

Copy link
Member

ericholscher commented Jun 21, 2018

This has been fixed in our new search code. It will be deployed in the next month or so, so closing this issue as it's been addressed.

Search update automation moved this from In progress to Done Jun 21, 2018

@safwanrahman

This comment has been minimized.

Copy link
Member

safwanrahman commented Jun 21, 2018

There was a bug in removing the index after file is removed. Fixed it in #4277.
Thanks @prashanthpai for filling the issue.

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

Merge pull request rtfd#4277 from safwanrahman/insensitive
[Fix rtfd#2328 rtfd#2013] Refresh search index and test for case insensitive search

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

Merge pull request rtfd#4277 from safwanrahman/insensitive
[Fix rtfd#2328 rtfd#2013] Refresh search index and test for case insensitive search
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment