Upgrade elastic search to 7.x #5620

Closed
stsewd opened this issue Apr 22, 2019 · 20 comments · Fixed by #7582
Labels
Accepted Accepted issue on our roadmap Improvement Minor improvement to code

Comments

@stsewd
Member

stsewd commented Apr 22, 2019

https://www.elastic.co/blog/elasticsearch-7-0-0-released

Changelog https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html

@stsewd stsewd added Improvement Minor improvement to code Priority: low Low priority labels Apr 22, 2019
@dojutsu-user
Member

dojutsu-user commented Apr 23, 2019

We are using django-elasticsearch-dsl and unfortunately it is not actively maintained (last commit was on 8th Nov 2018).

What should we do in this case? I can see three options.

  • Wait for the update of django-elasticsearch-dsl.
  • Find another library.
  • Switch to using only the official low-level library (elasticsearch-py), which is up to date, but this involves a lot of work.

Edit: django-elasticsearch-dsl is updated. 🎉

@dojutsu-user
Member

Just a note.
My elasticsearch version is:

$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "2f32220",
    "build_date" : "2019-04-02T15:59:27.961366Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

And all the tests pass.
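
The version check above can also be scripted. The following is a minimal sketch, assuming only the shape of the JSON body shown in the `curl` output; the helper name `parse_es_version` and the trimmed sample payload are hypothetical and not part of the project.

```python
import json


def parse_es_version(root_response):
    """Return (major, minor, patch) from the JSON body of `GET /`."""
    info = json.loads(root_response)
    number = info["version"]["number"]
    # Drop any pre-release suffix (e.g. "-beta1") before splitting.
    core = number.split("-")[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return major, minor, patch


# Trimmed copy of the response shown above (hypothetical sample).
sample = '{"version": {"number": "6.7.1", "lucene_version": "7.7.0"}}'
version = parse_es_version(sample)
print(version)          # (6, 7, 1)
print(version[0] >= 7)  # False: this node is still on 6.x
```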

@stsewd
Member Author

stsewd commented Apr 25, 2019

To run the Elasticsearch tests, you need to pass an extra option to tox:
tox -r -e py36 --including-search

@dojutsu-user
Member

@stsewd
Yes... I am aware of that.
All tests are passing including the search tests. 😄

@humitos humitos added this to the Search improvements milestone Apr 27, 2019
@stale

stale bot commented Jul 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Jul 17, 2019
@dojutsu-user dojutsu-user removed the Status: stale Issue will be considered inactive soon label Jul 17, 2019
@humitos
Member

humitos commented Jul 18, 2019

@dojutsu-user is there anything actionable on this issue now? I'm not sure, but I think it's not possible to upgrade now. In that case, we should document why it's not possible to upgrade and what the problems are so we can track them, and propose a plan, or close it instead of leaving it open without adding value.

@dojutsu-user
Member

@humitos
I don't think upgrading should pose any problems.
During the whole GSoC period, I have been using Elasticsearch 6.7:

$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "56c6e48",
    "build_date" : "2019-04-29T09:05:50.290371Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

I just read that django-elasticsearch-dsl is going to have a new release pretty soon: django-es/django-elasticsearch-dsl#177 (comment) (but not for ES version 7).
I think we have to wait a few days until django-elasticsearch-dsl starts supporting ES v7.

@humitos
Member

humitos commented Jul 22, 2019

This is blocked on django-es/django-elasticsearch-dsl#170

@humitos humitos added the Status: blocked Issue is blocked on another issue label Jul 22, 2019
@dojutsu-user
Member

I am unblocking this, as django-es/django-elasticsearch-dsl#170 is closed and django-elasticsearch-dsl now supports Elasticsearch version 7 (https://pypi.org/project/django-elasticsearch-dsl/).

@dojutsu-user dojutsu-user removed the Status: blocked Issue is blocked on another issue label Aug 31, 2019
@stale

stale bot commented Oct 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Oct 15, 2019
@dojutsu-user dojutsu-user removed the Status: stale Issue will be considered inactive soon label Oct 15, 2019
@stsewd stsewd added the Accepted Accepted issue on our roadmap label Oct 15, 2019
@stsewd stsewd removed the Priority: low Low priority label Jun 25, 2020
@stsewd
Member Author

stsewd commented Jun 25, 2020

We received an email from ES saying that we need to migrate to a recent version since 6.x is EOL, so this isn't low priority anymore.

@stsewd
Member Author

stsewd commented Aug 24, 2020

Just checked: we are running v6.5.4 in production, so we need to update to 6.8.12 before updating to a major version.
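
The ordering constraint in this comment (move to the last 6.x minor first, then cross the major boundary) can be sketched as a small planner. This is only an illustration: the `LAST_MINOR` table, the function names, and the `7.0.0` target are assumptions for the example, and only the 6.5.4 and 6.8 figures come from the thread.

```python
# Assumption: last minor release line per major version.
# Only the 6 -> 8 entry is taken from this thread.
LAST_MINOR = {6: 8}


def parse(version):
    """Turn "6.5.4" into (6, 5, 4)."""
    return tuple(int(part) for part in version.split("."))


def plan_upgrade(current, target):
    """Return the ordered upgrade steps from `current` to `target`."""
    cur, tgt = parse(current), parse(target)
    if tgt[0] - cur[0] > 1:
        raise ValueError("this sketch only handles one major step at a time")
    steps = []
    if tgt[0] > cur[0]:
        last_minor = LAST_MINOR[cur[0]]
        if cur[1] < last_minor:
            # Minor-to-minor moves can be done node by node.
            steps.append(f"rolling upgrade to {cur[0]}.{last_minor}.x")
        steps.append(f"major upgrade to {target}")
    elif tgt > cur:
        steps.append(f"rolling upgrade to {target}")
    return steps


print(plan_upgrade("6.5.4", "7.0.0"))
# ['rolling upgrade to 6.8.x', 'major upgrade to 7.0.0']
```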

@stsewd
Member Author

stsewd commented Aug 26, 2020

Changelog for 6.6, 6.7 and 6.8

We are good to upgrade from 6.5 to 6.8. And we don't need a re-index or downtime.

Migration between minor versions — e.g. 6.x to 6.y — can be performed by upgrading one node at a time.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/breaking-changes.html

A rolling upgrade allows an Elasticsearch cluster to be upgraded one node at a time so upgrading does not interrupt service.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/rolling-upgrades.html

@stsewd
Member Author

stsewd commented Oct 26, 2020

How to deploy avoiding downtime

Before the deploy

This can be done a day or two before the deploy

  • Create new deploy in ES cloud with ES 7.x
  • Change the ops repo to point to the new ES host
  • Deploy web-extra or a new instance with the code from ES 7.x
  • Trigger a re-index to the new deploy

During/after the deploy

  • Deploy the new instances using 7.x. Here we will have two instances running, one for 6.x and the other one with 7.x (but each one will be pointing to a different deploy in ES cloud)
  • When only the 7.x instances are running, trigger a re-index.
    Here we only need to re-index the projects with new builds from the last 24/48 hours,
    we can use the script from Upgrade elastic search to 7.x #5620 (comment)
  • Make sure everything is working
  • Delete the old deploy

This won't cause downtime, but search will return outdated results for a period of time
(while we deploy the new instances and re-index).
We could communicate this to users beforehand if we want.

@ericholscher
Member

@stsewd on the re-index during "deploy", we should only need to reindex the past 1 day of data, right? So that should be pretty quick. I think this plan sounds good to me. The full reindex might take somewhere around 8-10 hours tho, so we should plan ahead for that.

@stsewd
Member Author

stsewd commented Oct 26, 2020

on the re-index during "deploy", we should only need to reindex the past 1 day of data, right?

Yes, I'll see if I can change the management command to accept that argument or just write a script

@ericholscher
Member

Pretty sure it already supports this, or we have some kind of code that can handle it already.

@ericholscher
Member

Yea, I have this in my notes:

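from datetime import datetime, timedelta

# Assumed import path: the original notes used Project and HTMLFile without importing them.
from readthedocs.projects.models import HTMLFile, Project
from readthedocs.search.utils import index_new_files

# Re-index only the projects that had a build in the last 48 hours.
kwargs = {'hours': 48}
since = datetime.now() - timedelta(**kwargs)

ps = Project.objects.filter(versions__builds__date__gte=since).distinct()
print("Indexing %s" % len(ps))
for project_obj in ps:
  # Only active, built versions are indexed, using the files from their latest build.
  for version_obj in project_obj.versions.filter(active=True, built=True):
    index_new_files(HTMLFile, version_obj, build=version_obj.builds.latest().pk)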

Something similar should work.

@stsewd
Member Author

stsewd commented Oct 26, 2020

Great, I have updated my comment with that.

@ericholscher
Member

Great -- the only other thing we should consider is what QA will look like on the new vs old cluster. We've had issues in the past with reindexing, so it would be good to have 5-10 queries that we want to test to make sure the results look similar. In particular, the number of results for broad searches, and also the range of versions.

Part of this is that we don't do a great job of cleaning up our indexes. So the current index certainly has some invalid/old/deleted data, but we also need to make sure we aren't missing important things.
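
One way to make that QA pass repeatable is to compare hit counts per query between the two clusters and flag large differences. The sketch below is hypothetical: `compare_counts`, the 10% tolerance, and the sample numbers are illustrative assumptions, and the two count functions stand in for whatever call returns the number of hits from the old and new cluster.

```python
def compare_counts(queries, count_old, count_new, tolerance=0.1):
    """Return (query, old, new) for queries whose hit counts drift too much.

    `count_old` and `count_new` are callables that take a query string and
    return the number of hits on the old and new cluster respectively.
    """
    suspicious = []
    for query in queries:
        old, new = count_old(query), count_new(query)
        # Relative drift against the old cluster; avoid dividing by zero.
        drift = abs(new - old) / max(old, 1)
        if drift > tolerance:
            suspicious.append((query, old, new))
    return suspicious


# Stand-in data instead of real clusters (made-up numbers).
old_cluster = {"install": 1200, "api": 5000, "webhook": 40}
new_cluster = {"install": 1180, "api": 3100, "webhook": 41}

report = compare_counts(old_cluster, old_cluster.get, new_cluster.get)
print(report)  # [('api', 5000, 3100)]
```

Some drift is expected because the old index contains stale data, so the tolerance is a knob to tune rather than a pass/fail threshold.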


4 participants