Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML-DataFrame] version data frame transform internal index #45375

Merged
merged 14 commits into from
Aug 22, 2019

Conversation

hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Aug 9, 2019

  • version the internal data frame transform index to allow smooth upgrades (adding/change mappings)
  • add mappings for config::version, config::create_time, checkpoint::timestamp, checkpoint::time_upper_bound
  • increment internal version to 2

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@hendrikmuhs hendrikmuhs marked this pull request as ready for review August 9, 2019 15:07
…mestamp,

checkpoint::time_upper_bound and increment version
@hendrikmuhs hendrikmuhs changed the title [ML-DataFrame] POC: version data frame transform internal index [ML-DataFrame] version data frame transform internal index Aug 9, 2019
@benwtrent benwtrent self-requested a review August 9, 2019 16:04
@benwtrent
Copy link
Member

bwc tests are failing, this is interesting as 7.4 transforms are being put, but then they cannot be found. Looking into it.

@benwtrent
Copy link
Member

Ah, I see why bwc tests are failing.

TransportGetDataFrameTransformsAction does not take into account multiple indices. Need some refactoring

@hendrikmuhs
Copy link
Author

I forgot that there is access outside of the config manager. The change I pushed might help but isn't enough. We need the same de-duplication as in the config manager to do this properly.

@benwtrent
Copy link
Member

@hendrikmuhs I am fixing the bug now :)

@@ -110,7 +110,7 @@ static void getStatisticSummations(Client client, ActionListener<DataFrameIndexe
.filter(QueryBuilders.termQuery(DataFrameField.INDEX_DOC_TYPE.getPreferredName(),
DataFrameTransformStoredDoc.NAME)));

SearchRequestBuilder requestBuilder = client.prepareSearch(DataFrameInternalIndex.INDEX_NAME)
SearchRequestBuilder requestBuilder = client.prepareSearch(DataFrameInternalIndex.LATEST_INDEX_NAME)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not provide accurate data. Stats documents for various transforms could exist between various index versions. Since we use aggregations and a transform could have multiple stored docs between index versions, I am not sure we can even provide accurate stats.

@droberts195 What do you think?

Copy link
Member

@benwtrent benwtrent Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and DataFrameUsageTransportAction.java, these don't seem like critical pieces data and SOME information drift is OK. How we tackle this seems predicated on how we deal with old indices and how quickly they are cleaned up.

@@ -114,7 +114,7 @@ protected void masterOperation(Task task, XPackUsageRequest request, ClusterStat
}
);

SearchRequest totalTransformCount = client.prepareSearch(DataFrameInternalIndex.INDEX_NAME)
SearchRequest totalTransformCount = client.prepareSearch(DataFrameInternalIndex.LATEST_INDEX_NAME)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a transform exists in the old index and has never been updated (thus NOT in the latest index) ?

public void testUpdateDeletesOldTransformConfig() throws Exception {
TestRestHighLevelClient client = new TestRestHighLevelClient();
String transformIndex = "transform-index-deletes-old";
createSourceIndex(transformIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to create the source index when you PUT directly to the index bypassing any validations or this for the update?

@@ -133,6 +145,8 @@ private static XContentBuilder mappings() throws IOException {
addDataFrameTransformsConfigMappings(builder);
// add the schema for transform stats
addDataFrameTransformStoredDocMappings(builder);
// add the schema for checkpoints
addDataFrameCheckpointMappings(builder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -834,7 +835,18 @@ protected void doSaveState(IndexerState indexerState, DataFrameIndexerPosition p
onStop();
transformTask.shutdown();
}
next.run();
transformsConfigManager.deleteOldTransformStoredDocuments(transformId, ActionListener.wrap(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be called everytime doSaveState is called? Is it possible to add something to the response to say whether or not the deletion is necessary

* @param transformId The configuration ID potentially referencing configurations stored in the old indices
* @param listener listener to alert on completion
*/
public void deleteOldTransformConfigurations(String transformId, ActionListener<Void> listener) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to DataFrameTransformsConfigManagerTests

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -834,7 +837,23 @@ protected void doSaveState(IndexerState indexerState, DataFrameIndexerPosition p
onStop();
transformTask.shutdown();
}
next.run();
// Only do this clean up once, if it succeeded, no reason to do the query again.
if (oldStatsCleanedUp.compareAndExchange(false, true) == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: compareAndSet is cleaner as you don't need the negation.

if (oldStatsCleanedUp.compareAndSet(false, true) == false)

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent
Copy link
Member

@elasticmachine update branch

@benwtrent
Copy link
Member

run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro
run elasticsearch-ci/1

@benwtrent
Copy link
Member

@elasticmachine update branch

@benwtrent
Copy link
Member

run elasticsearch-ci/1

@benwtrent benwtrent merged commit f989beb into elastic:master Aug 22, 2019
benwtrent pushed a commit to benwtrent/elasticsearch that referenced this pull request Aug 22, 2019
…5375)

Adds index versioning for the internal data frame transform index. Allows for new indices to be created and referenced, `GET` requests now query over the index pattern and takes the latest doc (based on INDEX name).
benwtrent added a commit that referenced this pull request Aug 22, 2019
…45837)

Adds index versioning for the internal data frame transform index. Allows for new indices to be created and referenced, `GET` requests now query over the index pattern and takes the latest doc (based on INDEX name).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants