Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify es indexing #5798

Merged
merged 12 commits into from Jun 13, 2019
Merged

Simplify es indexing #5798

merged 12 commits into from Jun 13, 2019

Conversation

@stsewd
Copy link
Member

@stsewd stsewd commented Jun 12, 2019

  • Replace signals with function calls (we don't use them in any other part of the code)
  • Catch common exceptions to not stop the index process
  • Recreate and reindex search objects on each build, so we are safe from weird edge cases.
  • I'm leaving the commit and md5 attributes, could be used for something else or debugging... Or we can removet hem in another PR, this is already big :)

Close #5781
Fix #5668

@@ -61,6 +60,7 @@
from readthedocs.doc_builder.loader import get_builder_class
from readthedocs.doc_builder.python_environments import Conda, Virtualenv
from readthedocs.projects.models import APIProject, Feature
from readthedocs.search.signals import index_new_files, remove_indexed_files
Copy link
Member Author

@stsewd stsewd Jun 12, 2019

Those should be moved to another module, but another PR or after review, so is easy to see the diff :)

)
return

path = project.rtd_build_path(version.slug)
if path:
Copy link
Member Author

@stsewd stsewd Jun 12, 2019

path is always different from None or empty

@stsewd stsewd requested review from ericholscher and Jun 12, 2019
domain_verify,
files_changed,
Copy link
Member

@ericholscher ericholscher Jun 13, 2019

I believe this was actually used in our CDN app, so we might want to keep it around.

Copy link
Member Author

@stsewd stsewd Jun 13, 2019

Forgot about to check the rtd-ext, checking now

Copy link
Member

@ericholscher ericholscher Jun 13, 2019

In particular, we need this to be able to purge old files I believe. Perhaps we can just try another approach if we need CDN again though.

Copy link
Member

@ericholscher ericholscher left a comment

This is a good simplification, but it doesn't quite work. We need to not have time where the ES index is empty.


# Objects from the db must exists before calling this function,
# because the task will query the DB for the objects before deleting
remove_indexed_files(model=SphinxDomain, version=version)
Copy link
Member

@ericholscher ericholscher Jun 13, 2019

This is going to cause any search that happens after this and before indexing to fail. That isn't a good UX, that's the reason we do the entire delete by commit approach.

@@ -1191,6 +1191,7 @@ class ImportedFile(models.Model):
path = models.CharField(_('Path'), max_length=4096)
md5 = models.CharField(_('MD5 checksum'), max_length=255)
commit = models.CharField(_('Commit'), max_length=255)
build = models.IntegerField(_('Build id'), null=True)
Copy link
Member Author

@stsewd stsewd Jun 13, 2019

I put this as a integer field instead of a foreign key, to keep it simple, we don't require access to the object at all, just the unique number

Copy link
Member

@ericholscher ericholscher Jun 13, 2019

I wonder if it should be called build_id to clarify that.

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 13, 2019

I didn't see any error in ES related to the new field, so I guess we are fine there :)

Copy link
Member

@ericholscher ericholscher left a comment

Like this approach much better. It looks great to me. I don't care too much about the model name, so feel free to ignore that suggestion 👍

@@ -1191,6 +1191,7 @@ class ImportedFile(models.Model):
path = models.CharField(_('Path'), max_length=4096)
md5 = models.CharField(_('MD5 checksum'), max_length=255)
commit = models.CharField(_('Commit'), max_length=255)
build = models.IntegerField(_('Build id'), null=True)
Copy link
Member

@ericholscher ericholscher Jun 13, 2019

I wonder if it should be called build_id to clarify that.

readthedocs/projects/tasks.py Show resolved Hide resolved
readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 13, 2019

I already did the migration locally, so I'm keeping the name, bc I'm lazy :/. Tests fixed!

@stsewd stsewd merged commit 130e62e into readthedocs:master Jun 13, 2019
1 check passed
@stsewd stsewd deleted the simplify-es-indexing branch Jun 13, 2019
stsewd added a commit to stsewd/readthedocs.org that referenced this issue Jun 13, 2019
@stsewd stsewd mentioned this pull request Jun 13, 2019
saadmk11 added a commit to saadmk11/readthedocs.org that referenced this issue Jun 17, 2019
saadmk11 added a commit to saadmk11/readthedocs.org that referenced this issue Jun 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants