
Add migrations for mapping changes #60

Merged · 6 commits · Mar 18, 2021

Conversation

alexashley (Contributor)

This is the second part of #58 after splitting it out.

Originally we tried to implement migrations using a lock in order to prevent multiple instances of grafeas-elasticsearch from attempting to do migrations all at once. However, as we started to dive into error cases, we realized that would require some manual intervention if the app crashed or was terminated in the middle of a migration.

After that, we stumbled on this migration RFC for Kibana, which outlines a lockless migration that can run in parallel across multiple instances. There's potentially some duplicate work, but the end result should be the same and shouldn't require manual intervention. We deviated somewhat from what's outlined there, since it's also concerned with data migrations, but it seems like a good template to follow if we want to add those in the future. We could still go down the lock route instead, as long as we document what needs to be done if a migration is interrupted.

Some assumptions:

  • indices that belong to grafeas-elasticsearch start with grafeas and have a value of grafeas in the field mappings._meta.type on the index
  • all grafeas-elasticsearch indices have a version and it's specified in the index name (i.e., grafeas-$VERSION).
  • each index only has a single document kind/type (e.g., occurrences, notes, or projects)
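The assumptions above boil down to two checks: the index name matches the `grafeas-$VERSION` scheme, and the index metadata marks it as a grafeas index. A minimal sketch (the helper names and regex are hypothetical, not the actual IndexManager code):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// indexVersionPattern is an assumption based on the naming scheme described
// above: grafeas indices are named grafeas-$VERSION.
var indexVersionPattern = regexp.MustCompile(`^grafeas-(?P<version>[^-]+)`)

// parseIndexVersion extracts the version segment from an index name,
// returning an empty string when the name doesn't match the scheme.
func parseIndexVersion(indexName string) string {
	matches := indexVersionPattern.FindStringSubmatch(indexName)
	if matches == nil {
		return ""
	}
	return matches[1]
}

// belongsToGrafeas mirrors the first assumption: the name starts with
// "grafeas" and mappings._meta.type on the index is "grafeas".
func belongsToGrafeas(indexName, metaType string) bool {
	return strings.HasPrefix(indexName, "grafeas") && metaType == "grafeas"
}

func main() {
	fmt.Println(parseIndexVersion("grafeas-v1beta1"))           // v1beta1
	fmt.Println(belongsToGrafeas("grafeas-v1beta1", "grafeas")) // true
	fmt.Println(belongsToGrafeas("kibana-7.10", ""))            // false
}
```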

Migration steps:

  • load the latest mappings for all document kinds
  • query all indices
  • if the index belongs to grafeas, determine its version by parsing the index name
  • if the version doesn't match the latest version for its document kind, add it to a list of indices to migrate
  • for each index that should be migrated:
  • check if the index already has a write block; if not, place the block on it (there's a read_only block as well, but that would prevent us from switching the alias name over)
    • create the new index, continuing if it already exists
  • start a reindex from the old index to the new one, configured so that we only move documents that aren't already in the new index
    • poll the reindex task until it's complete or we run out of attempts (right now that's set to 10 attempts with a 10-second sleep in between -- we may want to do some load testing to determine the right numbers)
    • delete the reindex task document
    • cut the alias over to the new index
    • delete the old index

We ran a few manual tests for this; the setup for them is in the migration-tests branch. The first test runs a migration, the next starts multiple instances of grafeas-elasticsearch that all attempt to migrate, and the final one fails the migration after every step to check that the process is reentrant. If we were to extract this into its own library, we could potentially flesh those tests out more.

Lastly, some of this is too specific to grafeas-elasticsearch, but if we wanted to extract this out for use in Rode, then I think we could make the IndexManager implementation more generic and pass in a function that could be used to filter indices.

}

log.Info("Task incomplete, waiting before polling again", zap.String("taskId", taskCreationResponse.Task))
timeSleep(time.Second * 10)
Contributor Author


Initially, we picked the wait duration and number of attempts arbitrarily.

To validate that, we did some load testing. The reindex operation processes documents in batches of 1000, so there's no performance difference until an index has tens of thousands of documents:

| # occurrences | run 1         | run 2         | run 3         |
| ------------- | ------------- | ------------- | ------------- |
| 10            | 10.372776324s | 10.390366305s | 10.449805181s |
| 100           | 10.369810367s | 10.436173865s | 10.386952243s |
| 1000          | 10.392087759s | 10.452114159s | 10.457958686s |
| 10,000        | 10.531893704s | 10.421936475s | 10.436905281s |
| 100,000       | 20.535902343s | 20.458385589s | 20.463723459s |

We ran these tests against Docker Desktop for Mac with a cumulative 6 cores and 8GB of RAM assigned to the Linux VM, which might be more than the typical ES node would have.

@BFisch14 BFisch14 linked an issue Mar 18, 2021 that may be closed by this pull request
@aalsabag aalsabag self-requested a review March 18, 2021 16:14
@aalsabag (Contributor)

> Lastly, some of this is too specific to grafeas-elasticsearch, but if we wanted to extract this out for use in Rode, then I think we could make the IndexManager implementation more generic and pass in a function that could be used to filter indices.

Yeah definitely something for us to consider as we expand out the responsibility of Rode. For another time though

@aalsabag (Contributor) left a comment

lgtm


Successfully merging this pull request may close these issues.

Elasticsearch mapping migrations
3 participants