Reindex Wizard #46755

Open
cjcenizal opened this issue Sep 27, 2019 · 8 comments
Labels
  • enhancement (New value added to drive a business result)
  • Feature:Index Management (Index and index templates UI)
  • Team:Kibana Management (Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more)

Comments

@cjcenizal
Contributor

cjcenizal commented Sep 27, 2019

Replaces #8110

Summary

The Reindex API is powerful but complex. A UI will make this tool easier to use by guiding and educating the user on its features.

Users will navigate to Index Management, select one or multiple indices, and execute the “Reindex” action. This will take the user to the Reindex Wizard, which breaks the reindex process into multiple steps. Once the user has started a reindex, they'll be immediately shown the progress of the reindex task.

Implementation details

The Reindex API doesn’t create the destination index for you. That’s a separate step that is abstracted by the Reindex Wizard. The Upgrade Assistant server-side logic provides a similar abstraction. We can extract this logic into a separate index_operations plugin and reuse it in both the Upgrade Assistant and the Reindex Wizard.
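As a rough sketch of the two-step abstraction described above, the server-side logic could plan the requests in order before sending them. This is not the Upgrade Assistant's actual code: `EsRequest` and `planReindex` are hypothetical names, and the objects only describe the Elasticsearch calls (`PUT /<index>` followed by `POST /_reindex`) rather than executing them.

```typescript
// Hypothetical description of one Elasticsearch request; not a Kibana type.
interface EsRequest {
  method: 'PUT' | 'POST';
  path: string;
  body: Record<string, unknown>;
}

// Returns the requests in the order they need to run:
// create the destination index first, then reindex into it.
function planReindex(
  sourceIndices: string[],
  destIndex: string,
  destConfig: { settings?: Record<string, unknown>; mappings?: Record<string, unknown> }
): EsRequest[] {
  return [
    // Step 1: create the destination index with the chosen settings and mappings.
    { method: 'PUT', path: `/${encodeURIComponent(destIndex)}`, body: { ...destConfig } },
    // Step 2: copy documents from the source indices into the new index.
    {
      method: 'POST',
      path: '/_reindex',
      body: { source: { index: sourceIndices }, dest: { index: destIndex } },
    },
  ];
}
```

Keeping the plan as plain data would also make it easy to render the same requests in the wizard's request preview.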

Similarly, if there is an alias pointing to the source index, it's reasonable to assume that users may want to redirect it to the destination index once the reindex is complete to avoid any downtime. This is also its own API call which we can abstract away. Upgrade Assistant has already implemented a similar abstraction which we can extract into the index_operations plugin.

Plan of attack

Note that implementing the Reindex Wizard is a superset of implementing most of the logic needed for "Create index" and "Clone index" features. If we doubt we'll be able to complete the Reindex Wizard in a release cycle, we should consider shipping one or both of these features first, and then build the Reindex Wizard upon them in another release.

Index Management UX

Users should be able to select multiple indices and have the option to "Reindex into another index" or to select a single index and have the option to "Reindex other indices into this index". Selecting the first option will open the Reindex Wizard with these indices pre-selected as the source indices and selecting the second option will open the Reindex Wizard with this index pre-selected as the destination index.

Users will also be able to click a "Reindex" button which opens the Reindex Wizard with all fields blank.

If reindex tasks are active, the user should be able to view their status within Index Management beneath a "Reindex tasks" tab. We would need to create a special endpoint for retrieving these tasks, since a reindex task created by the Reindex Wizard is an abstraction over both index creation and reindexing. The Upgrade Assistant does something similar so it might be a useful reference for this kind of behavior.
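For the progress display, Elasticsearch's task management API (`GET _tasks?detailed=true&actions=*reindex`) reports a status object for each running reindex, and progress can be estimated by comparing the sum of `created`, `updated`, and `deleted` against `total`. The sketch below only covers that calculation; `ReindexTaskStatus` and `reindexProgress` are hypothetical names, and the Kibana endpoint that would wrap this is not shown.

```typescript
// Subset of the status object Elasticsearch reports for a running reindex task.
interface ReindexTaskStatus {
  total: number;
  created: number;
  updated: number;
  deleted: number;
}

// Estimated fraction of documents processed so far, in the range 0..1.
function reindexProgress(taskStatus: ReindexTaskStatus): number {
  // A total of 0 means there is nothing to measure against yet; report 0.
  if (taskStatus.total <= 0) return 0;
  const processed = taskStatus.created + taskStatus.updated + taskStatus.deleted;
  // Cap at 1 so the UI never shows more than 100%.
  return Math.min(1, processed / taskStatus.total);
}
```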

Nice-to-have: It makes sense to defer the "Reindex tasks" tab to the end of the release cycle, since it's not core to the value that the Reindex Wizard provides.

image

Reindex Wizard UX

Step 1: Select data to reindex

The user defines the source index or indices to reindex. The user can also specify a reindex from a remote cluster and make index alias adjustments to eliminate downtime.

Scope-reduction opportunity: We can reduce scope by deferring "reindex from remote" functionality to a separate scope of work.

Selecting indices

In both this step and the second step, the user has the ability to specify indices. We could implement this in many ways:

  • A simple option would be a text input that verifies the specified indices exist.
  • A more advanced option would be a typeahead which retrieves indices matching your search input, which you can then select.
  • The most advanced option would be a full-blown table that lets you search, sort, and multi-select indices.

Alias redirection

Nice-to-have: @jethr0null can verify for us whether this feature is valuable. We can defer this feature to the end of the release cycle as a "nice-to-have".

If there are index aliases associated with the source indices they'll be listed here and the user can select to redirect any or all of them to the destination index. This can be useful for preventing downtime during a reindex. We should surface some help text to that effect.
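One way to build the redirect is a single `POST /_aliases` request, since the actions in one such request are applied atomically. The sketch below is an assumption about how the wizard could assemble that body; `buildAliasRedirect` and its parameters are hypothetical names.

```typescript
type AliasAction =
  | { remove: { index: string; alias: string } }
  | { add: { index: string; alias: string } };

// Builds the body for POST /_aliases that moves the selected aliases
// from their source indices to the destination index.
function buildAliasRedirect(
  aliasesBySource: Record<string, string[]>,
  selectedAliases: string[],
  destIndex: string
): { actions: AliasAction[] } {
  const actions: AliasAction[] = [];
  const alreadyAdded: string[] = [];
  for (const index of Object.keys(aliasesBySource)) {
    for (const alias of aliasesBySource[index] || []) {
      // Aliases the user did not select stay where they are.
      if (selectedAliases.indexOf(alias) === -1) continue;
      actions.push({ remove: { index, alias } });
      // Add each alias to the destination only once, even if several sources had it.
      if (alreadyAdded.indexOf(alias) === -1) {
        alreadyAdded.push(alias);
        actions.push({ add: { index: destIndex, alias } });
      }
    }
  }
  return { actions };
}
```

This request would be sent only after the reindex has completed, so the alias never points at a partially populated index.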

Suggest remote clusters

Maybe-nice-to-have: @sebelga suggested that if the user selects "Remote", then we suggest the remote clusters the user has registered for the "Host" field. This may or may not make sense depending on whether users of CCR want to both replicate from a remote cluster and reindex from it. @jethr0null will look into this.

image

Step 2: Configure destination index

The user defines the destination index, which can be existing or a new index.

  • If a new index, we'll automatically use the merged result of the source indices' mappings, settings, and aliases by default. The user will have the option to edit these. Note that this essentially comprises the implementation of the "Create Index" and "Clone index" features.
  • If an existing index, the user will be able to review mappings, settings, and aliases but can't make changes.
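Merging the source mappings could surface conflicts when the same field has different types in different source indices. The sketch below is one possible approach under simplifying assumptions: it only looks at top-level fields, the first definition of a field wins, and `mergeSourceMappings` is a hypothetical helper rather than existing Kibana code.

```typescript
interface FieldMapping {
  type: string;
  [param: string]: unknown;
}

interface MergeResult {
  properties: Record<string, FieldMapping>;
  conflicts: string[]; // fields mapped to different types across source indices
}

// Merges the top-level "properties" of several source indices.
function mergeSourceMappings(sources: Array<Record<string, FieldMapping>>): MergeResult {
  const properties: Record<string, FieldMapping> = {};
  const conflicts: string[] = [];
  for (const props of sources) {
    for (const field of Object.keys(props)) {
      const incoming = props[field];
      if (incoming === undefined) continue;
      const existing = properties[field];
      if (existing === undefined) {
        // First time this field is seen: take its definition as-is.
        properties[field] = incoming;
      } else if (existing.type !== incoming.type && conflicts.indexOf(field) === -1) {
        // Same field, different type: keep the first definition and flag it for the user.
        conflicts.push(field);
      }
    }
  }
  return { properties, conflicts };
}
```

The `conflicts` list is what the wizard could highlight when it asks the user to review the merged mappings.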

Scope-reduction opportunity: We can reduce scope by deferring the "create index" option to a separate scope of work. We will only give the user the option to reindex into an existing index, so they'd have to create this index before-hand.

Overwrite destination by default

A document conflict occurs when a document in the destination index has the same ID as a document in a source index. By default, conflicting documents in the destination will be overwritten by those from the source. If the destination index already contains documents, we'll need to show a danger callout to warn the user of this potential consequence. We'll use this configuration to achieve this behavior:

{
  "conflicts": "proceed",
  "dest": {
    "version_type": "internal"
  }
}
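To show where that fragment sits in a full request, here is a minimal sketch that builds the `_reindex` body. The `overwrite: false` branch is not part of the proposal above; it illustrates the Reindex API's `op_type: "create"` option, which only creates documents missing from the destination, with `conflicts: "proceed"` letting the task continue past the resulting version conflicts. `buildReindexBody` is a hypothetical name.

```typescript
interface ReindexBody {
  conflicts: 'proceed';
  source: { index: string[] };
  dest: { index: string; version_type?: 'internal'; op_type?: 'create' };
}

// overwrite = true  : conflicting destination documents are replaced (the proposed default).
// overwrite = false : existing destination documents are left untouched.
function buildReindexBody(sourceIndices: string[], destIndex: string, overwrite: boolean): ReindexBody {
  return {
    conflicts: 'proceed',
    source: { index: sourceIndices },
    dest: overwrite
      ? { index: destIndex, version_type: 'internal' }
      : { index: destIndex, op_type: 'create' },
  };
}
```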

Leveraging index templates

Index templates are a useful way to store references to mappings, settings, etc. As a future improvement, we could allow the user to browse their index templates and copy their configurations to their destination index.

image

image

Step 3: Document transformations

The user can specify an ingest node pipeline to transform the source data before it is indexed, or define conditions that prevent documents from being indexed or delete them from the destination.
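In the Reindex API an ingest pipeline is attached through `dest.pipeline`, and the documents read from the source can be limited with `source.query`. The sketch below assumes the wizard would express its conditions as a source query, which is only one option; `buildTransformedReindex` and `TransformOptions` are hypothetical names.

```typescript
interface TransformOptions {
  pipelineId?: string; // ID of an existing ingest pipeline
  sourceQuery?: Record<string, unknown>; // query DSL limiting which documents are read
}

// Builds a _reindex body that optionally filters the source and
// runs each document through an ingest pipeline before indexing.
function buildTransformedReindex(sourceIndex: string, destIndex: string, opts: TransformOptions) {
  return {
    source: {
      index: sourceIndex,
      ...(opts.sourceQuery !== undefined ? { query: opts.sourceQuery } : {}),
    },
    dest: {
      index: destIndex,
      ...(opts.pipelineId !== undefined ? { pipeline: opts.pipelineId } : {}),
    },
  };
}
```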

A future improvement would be to allow the user to click a "Test" button to try out the transformation using the Simulate Pipeline API.
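A "Test" button could wrap a handful of sample documents in the shape the Simulate Pipeline API expects (`POST /_ingest/pipeline/<id>/_simulate` with a `docs` array of `_source` objects). This is a minimal sketch; `buildSimulateRequest` is a hypothetical name, and how the sample documents are chosen is left open.

```typescript
// Wraps sample documents in the request shape of the Simulate Pipeline API.
function buildSimulateRequest(pipelineId: string, sampleDocs: Array<Record<string, unknown>>) {
  return {
    method: 'POST',
    path: `/_ingest/pipeline/${encodeURIComponent(pipelineId)}/_simulate`,
    // Each sample becomes one entry in "docs", with the document under "_source".
    body: { docs: sampleDocs.map((doc) => ({ _source: doc })) },
  };
}
```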

Another future improvement would be to cross-link to the ingest node pipeline builder so you can create a new pipeline and return to your reindex job without losing your work.

image

Step 4: Start reindex

At the last step, the user has the opportunity to review their reindex configuration, tweak the performance of the task itself, and preview the ES request(s) to create the index and reindex which will be executed under the hood. This request preview will be a useful "escape hatch" for users who need functionality not provided by the wizard -- they can still use the wizard to build these requests up, and then edit them in Console or an editor.

Once the user clicks the "Start reindexing" button they'll immediately be shown the progress of the reindex task within the context of the wizard. Setting up a reindex isn't easy and it can be gratifying to see the result of your hard work with some instant feedback. We should see if this feels rewarding and keep it if it makes the UX pleasant. If it turns out to be a poor UX we can remove it and redirect the user back to the "Reindex tasks" tab in Index Management instead.

Scope-reduction opportunity: We can reduce scope by deferring "performance tweaking" functionality to a separate scope of work. We will just use the defaults out of the box. If the user really wants to get their hands dirty they can copy the JSON into Console.

Tweaking performance

Advanced users may want to control how the reindex task is executed. We'll need plenty of help text to explain the role of each of these options. These options are supported using the wait_for_active_shards, timeout, and requests_per_second query parameters, as well as the size field on the request body.
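The three query parameters could be appended to the `_reindex` path only when the user has changed them, so the defaults stay in effect otherwise. This is a sketch with hypothetical names (`ReindexPerformanceOptions`, `buildReindexPath`); the request-body field is not covered here.

```typescript
interface ReindexPerformanceOptions {
  waitForActiveShards?: number | 'all';
  timeout?: string; // a time value such as '1m'
  requestsPerSecond?: number; // -1 disables throttling
}

// Builds the _reindex path, adding only the parameters the user has set.
function buildReindexPath(opts: ReindexPerformanceOptions): string {
  const params: string[] = [];
  if (opts.waitForActiveShards !== undefined) {
    params.push(`wait_for_active_shards=${opts.waitForActiveShards}`);
  }
  if (opts.timeout !== undefined) {
    params.push(`timeout=${encodeURIComponent(opts.timeout)}`);
  }
  if (opts.requestsPerSecond !== undefined) {
    params.push(`requests_per_second=${opts.requestsPerSecond}`);
  }
  return params.length > 0 ? `/_reindex?${params.join('&')}` : '/_reindex';
}
```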

Telemetry

Per @zuketo we'll want to measure whether people are using this wizard from start to finish, or using it to build up a request which they then manually tweak and execute via Console later. It's impossible to get concrete measurements of this usage, but we can approximate this usage by tracking:

  • Clicks on the "Show request" button
  • ...or clicks on the "Copy to clipboard" button
  • ...or events in which people complete the wizard but then navigate away

Clicking the "View task" button takes you back to the "Reindex tasks" tab with the reindex task pre-selected.

Clicking the "Stop reindexing" button will cancel the reindex. The screen will show you a "Reindex canceled" message along with the summary of the reindex configuration and the "Start reindexing" button again.
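Cancelling maps onto the task management API's `POST _tasks/<task_id>/_cancel`, where task IDs have the form `<node_id>:<task_number>`. A minimal sketch, with `buildCancelRequest` as a hypothetical helper:

```typescript
// Builds the request that cancels a running reindex task.
function buildCancelRequest(taskId: string): { method: 'POST'; path: string } {
  // Task IDs look like "<node_id>:<task_number>"; reject anything else
  // before it is placed into a URL path.
  if (!/^[^:\/\s]+:\d+$/.test(taskId)) {
    throw new Error(`Unexpected task ID format: ${taskId}`);
  }
  return { method: 'POST', path: `/_tasks/${taskId}/_cancel` };
}
```

If the wizard also created the destination index, it would need to decide separately whether to keep or delete that partially populated index after a cancel.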

image

image

image

@elasticmachine
Contributor

Pinging @elastic/es-ui

@zuketo

zuketo commented Oct 2, 2019

Hey @cjcenizal this looks great!

A few notes/comments from the team below.

Index Management UX -> Reindex Tasks:

  • How long will this history last?
  • Are Kibana or ES endpoints used to create the list? E.g. we're thinking about if a user creates an "advanced" reindex task via dev tools, will it show up in this list?
  • What would clicking the trash icon do?

Step 2: Configure destination index:

  • Auto configuring the destination index based on the source index will be one of the largest wins for the project, could we show how this will look (maybe it takes the shape of the tabs under the "existing index" view for step 2?)

Step 3: Document transformations:

  • Very happy about leaving out painless and putting ingest pipelines as an option
  • Test button -> see below on simulation idea

Step 4: Performance:

  • Remove this step and add it as an advanced input group for the "Step 5: Start reindex" screen
  • Remove Slices as an option

Reindex in Progress:

  • Could we remove this screen and put them back in the first Reindex Tasks view? And show progress there?

Simulate endpoint for Reindex:

  • The team thinks we don't need a separate endpoint for this, we can use the existing endpoints and name the destination something specific to the simulation (e.g. reindex-indexa-indexb-test) and specify number of documents, such as 10. This presents some UI challenges, I'm not sure if "Step 5: Start reindex" would have a "test with 10 documents button", how the user would use the same settings to complete the full reindex, remove the temp index, etc. Curious on any thoughts you may have here?
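A sketch of the trial run described in this bullet, using the Reindex API's `max_docs` field to cap the number of documents and the suggested naming scheme for the temporary destination. `planTrialReindex` and `TrialReindexPlan` are hypothetical names, and the cleanup step is only described, not executed.

```typescript
interface TrialReindexPlan {
  trialIndex: string;
  reindexBody: { max_docs: number; source: { index: string[] }; dest: { index: string } };
  cleanupPath: string; // send DELETE to this path once the trial has been inspected
}

// Plans a small test reindex into a throwaway index named after the real source and destination.
function planTrialReindex(sourceIndices: string[], destIndex: string, maxDocs: number = 10): TrialReindexPlan {
  const trialIndex = `reindex-${sourceIndices.join('-')}-${destIndex}-test`;
  return {
    trialIndex,
    reindexBody: {
      max_docs: maxDocs,
      source: { index: sourceIndices },
      dest: { index: trialIndex },
    },
    cleanupPath: `/${encodeURIComponent(trialIndex)}`,
  };
}
```

Because only `max_docs` and the destination name differ from the real request, the full reindex could reuse the same settings by dropping the cap and swapping in the real destination.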

@cjcenizal
Contributor Author

cjcenizal commented Oct 3, 2019

Thanks @zuketo! I've updated the description with deets around the reindex wizard tasks API we would have to build in Kibana and I moved the performance options into the "Start reindex" step (great idea!). I would like to keep the progress screen in-place and see if it feels like a small reward (as I expect/hope it to). If it's a dud we can get rid of it and redirect to the tasks tab as you suggest. More thoughts below.

How long will this history last?

I don't have an answer to this yet. I'll defer to Seb and Alison when they work through the implementation and see what's possible.

What would clicking the trash icon do?

Cancel the reindex -- a better icon might be necessary or just a button that says "Cancel" or "Stop". :)

Auto configuring the destination index based on the source index will be one of the largest wins for the project, could we show how this will look

The simplest version of this will be blocks of editable JSON for the mappings, settings, and aliases. We might model this UI off our Index Template creation form (below), but I defer to Alison, Seb, and any designers who assist us on this point. Imagine the screenshot below prepopulated with JSON of course.

image

Simulate endpoint for Reindex

This does sound complicated for us to implement. That said I also don't think it's necessary for us to implement it in this iteration. I think we should punt for now.

@ppf2
Member

ppf2 commented Oct 24, 2019

I believe this feature is dependent on some of the work we are doing on the Elasticsearch side for reindex API v2. It may help to cross-reference them here for tracking/visibility.

@cjcenizal
Contributor Author

Warn users about dangers of reindexing large data sets

Per conversation with @aleph-zero, reindexing could be unfeasible for real-world datasets which tend to be quite large. There may also not be a huge need for fixing mappings on large datasets. It would be useful for smaller datasets. The Reindex Wizard should warn users about potential problems (e.g. the operation could be very expensive) if they are attempting to reindex large sets of data.
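One possible shape for that warning is a simple threshold check on the combined size of the selected source indices, using document counts and store sizes such as those reported by the index stats API. The thresholds below are purely hypothetical placeholders, as are the names `IndexSizeInfo` and `shouldWarnAboutSize`.

```typescript
interface IndexSizeInfo {
  docsCount: number;
  storeSizeBytes: number;
}

// Hypothetical thresholds; sensible limits would depend on the cluster.
const DEFAULT_SIZE_LIMITS = {
  maxDocs: 10 * 1000 * 1000, // 10 million documents
  maxBytes: 50 * 1024 * 1024 * 1024, // 50 GiB
};

// Returns true when the combined source data exceeds either limit.
function shouldWarnAboutSize(indices: IndexSizeInfo[], limits = DEFAULT_SIZE_LIMITS): boolean {
  let totalDocs = 0;
  let totalBytes = 0;
  for (const info of indices) {
    totalDocs += info.docsCount;
    totalBytes += info.storeSizeBytes;
  }
  return totalDocs > limits.maxDocs || totalBytes > limits.maxBytes;
}
```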

@cjcenizal
Contributor Author

@111andre111 Would you mind sharing some info on your use case for changing the mappings of an existing index on the fly? Are you using update_by_query?

@111andre111

111andre111 commented Jan 20, 2020

Hi @cjcenizal

Maybe I was not concrete enough.
What I meant by changing mapping parameters is that you could edit the existing mapping of an existing index, with all editable subparameters not greyed out.
For instance, you would not be able to change an existing mapping type from text to keyword.
That results in an error like this:
mapper [......] of different type, current_type [text], merged_type [keyword]
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java#L359

What is possible, for instance, is changing the ignore_above parameter.
https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#keyword-params
In my test this was also possible with the coerce parameter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#number-params
It is possible for other parameters as well.

I am not sure, as I did not test all combinations. Maybe it is even possible to change data types under certain circumstances, although I have not found an example so far.

Here is an example of changing type parameters:

PUT /testindex234/_doc/1
{
  "number": 5,
  "testtext": "test"
}

# defaults to ignore_above=256 for testtext.keyword and to coerce=true for number

PUT /testindex234/_mapping
{
  "properties": {
    "number":  { "type": "long", "coerce": false},
    "testtext": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 1024 } } }
  }
}

When a type parameter could not be changed, I got a valid error message as well.

Does that make sense?

@cjcenizal
Contributor Author

@111andre111 Thanks for clarifying! That makes sense. There are some parameters that can be changed on an existing index which don't require reindexing. In this case, I think we'd need to update the Index Management UI with an "Edit Index" page that allows users to edit specific aspects of the index's mappings.
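An "Edit Index" page would need to tell apart edits that can be applied in place from those that require a reindex. The sketch below only covers the case established in this thread, namely that an existing field's type cannot be changed. It compares top-level fields only, and `fieldsRequiringReindex` is a hypothetical helper.

```typescript
interface TypedField {
  type: string;
}

// Returns the existing fields whose type differs between the current and desired mappings.
// New fields and parameter-only changes are not reported here.
function fieldsRequiringReindex(
  current: Record<string, TypedField>,
  desired: Record<string, TypedField>
): string[] {
  return Object.keys(desired).filter((field) => {
    const existing = current[field];
    const wanted = desired[field];
    return existing !== undefined && wanted !== undefined && existing.type !== wanted.type;
  });
}
```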
