Add migrations for mapping changes #60
Conversation
```go
log.Info("Task incomplete, waiting before polling again", zap.String("taskId", taskCreationResponse.Task))
timeSleep(time.Second * 10)
```
Initially, we arbitrarily picked how long to wait and how many attempts to make.
To validate that more, we did some load testing. The reindex operation processes documents in batches of 1000, so there's no real difference in duration until an index has tens of thousands of documents:
| # occurrences | run 1 | run 2 | run 3 |
|---|---|---|---|
| 10 | 10.372776324s | 10.390366305s | 10.449805181s |
| 100 | 10.369810367s | 10.436173865s | 10.386952243s |
| 1000 | 10.392087759s | 10.452114159s | 10.457958686s |
| 10,000 | 10.531893704s | 10.421936475s | 10.436905281s |
| 100,000 | 20.535902343s | 20.458385589s | 20.463723459s |
We ran these tests against Docker Desktop for Mac with a cumulative 6 cores and 8GB of RAM assigned to the Linux VM, which might be more than the typical ES node would have.
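For context, a bounded version of that polling loop might look like the sketch below. This is not the code from this PR: the helper name, the raw `net/http` usage, and the attempt/interval values are illustrative, and the real code presumably goes through the injected `timeSleep` and the project's ES client instead.

```go
package migration

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// taskStatus captures the "completed" flag returned by GET /_tasks/{task_id}.
type taskStatus struct {
	Completed bool `json:"completed"`
}

// waitForTask polls the Elasticsearch tasks API until the reindex task
// finishes or maxAttempts runs out.
func waitForTask(baseURL, taskID string, maxAttempts int, interval time.Duration) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		res, err := http.Get(fmt.Sprintf("%s/_tasks/%s", baseURL, taskID))
		if err != nil {
			return err
		}

		var status taskStatus
		decodeErr := json.NewDecoder(res.Body).Decode(&status)
		res.Body.Close()
		if decodeErr != nil {
			return decodeErr
		}

		if status.Completed {
			return nil
		}

		// Task incomplete, wait before polling again (mirrors the 10s sleep above).
		time.Sleep(interval)
	}

	return errors.New("reindex task did not complete within the allotted attempts")
}
```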
Yeah, definitely something for us to consider as we expand out the responsibility of Rode. For another time though.
lgtm
This is the second part of #58 after splitting it out.
Originally we tried to implement migrations using a lock in order to prevent multiple instances of `grafeas-elasticsearch` from attempting to run migrations all at once. However, as we started to dive into error cases, we realized that approach would require manual intervention if the app crashed or was terminated in the middle of a migration. After that, we stumbled on this migration RFC for Kibana, which outlines a lockless migration that can happen in parallel across multiple instances. There's potentially some duplicate work, but the end result should be the same and shouldn't require manual intervention. We deviated some from what's outlined there, since it's also concerned with data migrations, but it seems like a good template to follow if we want to add those in the future. We could certainly still go down the lock route as well, as long as we document what needs to be done if a migration is interrupted.
Some assumptions:

- indices created by `grafeas-elasticsearch` start with `grafeas` and have a value of `grafeas` in the field `mappings._meta.type` on the index
- `grafeas-elasticsearch` indices have a version, and it's specified in the index name (i.e., `grafeas-$VERSION`); both checks are sketched after this list
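As a rough illustration of those assumptions (not code from this PR), the two checks might look like the sketch below. The `indexMetadata` struct, the regex, and the function names are hypothetical, and in practice the `_meta.type` value would come from `GET /{index}/_mapping`.

```go
package migration

import (
	"fmt"
	"regexp"
)

// indexVersionPattern encodes the naming assumption: grafeas indices carry
// their version right after the prefix, e.g. "grafeas-v1beta1-projects".
var indexVersionPattern = regexp.MustCompile(`^grafeas-([^-]+)`)

// indexMetadata is a hypothetical stand-in for the subset of index mapping
// metadata we care about (mappings._meta.type).
type indexMetadata struct {
	MetaType string
}

// isGrafeasIndex applies both assumptions: the grafeas name prefix and the
// grafeas value in mappings._meta.type.
func isGrafeasIndex(name string, meta indexMetadata) bool {
	return indexVersionPattern.MatchString(name) && meta.MetaType == "grafeas"
}

// indexVersion extracts $VERSION from an index named grafeas-$VERSION-...
func indexVersion(name string) (string, error) {
	matches := indexVersionPattern.FindStringSubmatch(name)
	if matches == nil {
		return "", fmt.Errorf("index %q does not follow the grafeas naming convention", name)
	}
	return matches[1], nil
}
```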
Migration steps:

- put a write block on the source index (we could use a `read_only` block as well, but that would prevent us from switching the alias name over); a sketch of this step follows the list
- create the new index with the updated mappings and reindex the source index into it
- poll the reindex task until it completes
- switch the alias over to the new index
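For the write block step, a sketch against the index settings API could look like the following; the helper name and raw HTTP usage are illustrative, not this PR's implementation. The point of `index.blocks.write` is that it rejects document writes while still allowing metadata changes like the later alias swap, which a `read_only` block would not.

```go
package migration

import (
	"bytes"
	"fmt"
	"net/http"
)

// blockWrites sets index.blocks.write on the source index so no documents
// change underneath the reindex, while alias updates remain possible.
func blockWrites(baseURL, index string) error {
	body := bytes.NewBufferString(`{"index.blocks.write": true}`)
	req, err := http.NewRequest(http.MethodPut, fmt.Sprintf("%s/%s/_settings", baseURL, index), body)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status blocking writes: %d", res.StatusCode)
	}
	return nil
}
```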
We ran a few manual tests for this; the setup for those is in the `migration-tests` branch. The first test runs a migration, the next starts multiple instances of `grafeas-elasticsearch` that all attempt to migrate, and the final one fails the migration after every step to check that the process is reentrant. If we were to extract this into its own library, we could potentially flesh those tests out more.
Lastly, some of this is too specific to `grafeas-elasticsearch`, but if we wanted to extract it for use in Rode, I think we could make the `IndexManager` implementation more generic and pass in a function that could be used to filter indices.
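As a rough illustration of that idea (a hypothetical, trimmed-down shape, not the actual `IndexManager` in this PR), the filter wiring might look like:

```go
package migration

// IndexFilter decides whether an index should be managed/migrated. Passing it
// in would let Rode supply its own naming rules instead of the
// grafeas-specific ones baked in today.
type IndexFilter func(indexName string) bool

// IndexManager is a hypothetical, trimmed-down version of the real type;
// only the filter wiring is the point here.
type IndexManager struct {
	filter IndexFilter
}

func NewIndexManager(filter IndexFilter) *IndexManager {
	return &IndexManager{filter: filter}
}

// indicesToMigrate keeps only the indices the caller's filter accepts.
func (m *IndexManager) indicesToMigrate(allIndices []string) []string {
	var matches []string
	for _, name := range allIndices {
		if m.filter(name) {
			matches = append(matches, name)
		}
	}
	return matches
}
```

A caller could then pass something like `func(name string) bool { return strings.HasPrefix(name, "grafeas-") }` to keep today's behavior, while Rode could plug in its own rules.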