Add migrations for mapping changes #60
Conversation
```go
log.Info("Task incomplete, waiting before polling again", zap.String("taskId", taskCreationResponse.Task))
timeSleep(time.Second * 10)
```
Initially, we arbitrarily picked how long to wait and how many attempts to make.
To validate that more, we did some load testing. The reindex operation processes documents in batches of 1000, so there's no real difference in duration until an index has tens of thousands of documents:
| # occurrences | run 1 | run 2 | run 3 |
|---|---|---|---|
| 10 | 10.372776324s | 10.390366305s | 10.449805181s |
| 100 | 10.369810367s | 10.436173865s | 10.386952243s |
| 1000 | 10.392087759s | 10.452114159s | 10.457958686s |
| 10,000 | 10.531893704s | 10.421936475s | 10.436905281s |
| 100,000 | 20.535902343s | 20.458385589s | 20.463723459s |
We ran these tests against Docker Desktop for Mac with a cumulative 6 cores and 8GB of RAM assigned to the Linux VM, which might be more than the typical ES node would have.
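For context, a bounded version of that polling loop might look like the sketch below. This is not the code from this PR: the helper name, the raw `net/http` usage, and the attempt/interval values are illustrative, and the real code presumably goes through the injected `timeSleep` and the project's ES client instead.

```go
package migration

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// taskStatus captures the "completed" flag returned by GET /_tasks/{task_id}.
type taskStatus struct {
	Completed bool `json:"completed"`
}

// waitForTask polls the Elasticsearch tasks API until the reindex task
// finishes or maxAttempts runs out.
func waitForTask(baseURL, taskID string, maxAttempts int, interval time.Duration) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		res, err := http.Get(fmt.Sprintf("%s/_tasks/%s", baseURL, taskID))
		if err != nil {
			return err
		}

		var status taskStatus
		decodeErr := json.NewDecoder(res.Body).Decode(&status)
		res.Body.Close()
		if decodeErr != nil {
			return decodeErr
		}

		if status.Completed {
			return nil
		}

		// Task incomplete, wait before polling again (mirrors the 10s sleep above).
		time.Sleep(interval)
	}

	return errors.New("reindex task did not complete within the allotted attempts")
}
```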
Yeah, definitely something for us to consider as we expand out the responsibility of Rode. For another time though.
lgtm
This is the second part of #58 after splitting it out.
Originally we tried to implement migrations using a lock in order to prevent multiple instances of `grafeas-elasticsearch` from attempting to run migrations all at once. However, as we started to dive into error cases, we realized that approach would require manual intervention if the app crashed or was terminated in the middle of a migration. After that, we stumbled on this migration RFC for Kibana, which outlines a lockless migration that can happen in parallel across multiple instances. There's potentially some duplicate work, but the end result should be the same and shouldn't require manual intervention. We deviated some from what's outlined there, since it's also concerned with data migrations, but it seems like a good template to follow if we want to add those in the future. We could certainly still go down the lock route as well, as long as we document what needs to be done if a migration is interrupted.
Some assumptions:

- indices created by `grafeas-elasticsearch` start with `grafeas` and have a value of `grafeas` in the field `mappings._meta.type` on the index
- `grafeas-elasticsearch` indices have a version, and it's specified in the index name (i.e., `grafeas-$VERSION`); both checks are sketched after this list
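As a rough illustration of those assumptions (not code from this PR), the two checks might look like the sketch below. The `indexMetadata` struct, the regex, and the function names are hypothetical, and in practice the `_meta.type` value would come from `GET /{index}/_mapping`.

```go
package migration

import (
	"fmt"
	"regexp"
)

// indexVersionPattern encodes the naming assumption: grafeas indices carry
// their version right after the prefix, e.g. "grafeas-v1beta1-projects".
var indexVersionPattern = regexp.MustCompile(`^grafeas-([^-]+)`)

// indexMetadata is a hypothetical stand-in for the subset of index mapping
// metadata we care about (mappings._meta.type).
type indexMetadata struct {
	MetaType string
}

// isGrafeasIndex applies both assumptions: the grafeas name prefix and the
// grafeas value in mappings._meta.type.
func isGrafeasIndex(name string, meta indexMetadata) bool {
	return indexVersionPattern.MatchString(name) && meta.MetaType == "grafeas"
}

// indexVersion extracts $VERSION from an index named grafeas-$VERSION-...
func indexVersion(name string) (string, error) {
	matches := indexVersionPattern.FindStringSubmatch(name)
	if matches == nil {
		return "", fmt.Errorf("index %q does not follow the grafeas naming convention", name)
	}
	return matches[1], nil
}
```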
Migration steps:

- put a write block on the source index (we could use a `read_only` block as well, but that would prevent us from switching the alias name over); a sketch of this step follows the list
- create the new index with the updated mappings and reindex the source index into it
- poll the reindex task until it completes
- switch the alias over to the new index
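For the write block step, a sketch against the index settings API could look like the following; the helper name and raw HTTP usage are illustrative, not this PR's implementation. The point of `index.blocks.write` is that it rejects document writes while still allowing metadata changes like the later alias swap, which a `read_only` block would not.

```go
package migration

import (
	"bytes"
	"fmt"
	"net/http"
)

// blockWrites sets index.blocks.write on the source index so no documents
// change underneath the reindex, while alias updates remain possible.
func blockWrites(baseURL, index string) error {
	body := bytes.NewBufferString(`{"index.blocks.write": true}`)
	req, err := http.NewRequest(http.MethodPut, fmt.Sprintf("%s/%s/_settings", baseURL, index), body)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status blocking writes: %d", res.StatusCode)
	}
	return nil
}
```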
We ran a few manual tests for this; the setup for those is in the `migration-tests` branch. The first test runs a migration, the next starts multiple instances of `grafeas-elasticsearch` that all attempt to migrate, and the final one fails the migration after every step to check that the process is reentrant. If we were to extract this into its own library, we could potentially flesh those tests out more.
Lastly, some of this is too specific to `grafeas-elasticsearch`, but if we wanted to extract it for use in Rode, I think we could make the `IndexManager` implementation more generic and pass in a function that could be used to filter indices.
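As a rough illustration of that idea (a hypothetical, trimmed-down shape, not the actual `IndexManager` in this PR), the filter wiring might look like:

```go
package migration

// IndexFilter decides whether an index should be managed/migrated. Passing it
// in would let Rode supply its own naming rules instead of the
// grafeas-specific ones baked in today.
type IndexFilter func(indexName string) bool

// IndexManager is a hypothetical, trimmed-down version of the real type;
// only the filter wiring is the point here.
type IndexManager struct {
	filter IndexFilter
}

func NewIndexManager(filter IndexFilter) *IndexManager {
	return &IndexManager{filter: filter}
}

// indicesToMigrate keeps only the indices the caller's filter accepts.
func (m *IndexManager) indicesToMigrate(allIndices []string) []string {
	var matches []string
	for _, name := range allIndices {
		if m.filter(name) {
			matches = append(matches, name)
		}
	}
	return matches
}
```

A caller could then pass something like `func(name string) bool { return strings.HasPrefix(name, "grafeas-") }` to keep today's behavior, while Rode could plug in its own rules.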