-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML-DataFrame] version data frame transform internal index #45375
Conversation
hendrikmuhs
commented
Aug 9, 2019
•
edited
Loading
edited
- version the internal data frame transform index to allow smooth upgrades (adding/change mappings)
- add mappings for config::version, config::create_time, checkpoint::timestamp, checkpoint::time_upper_bound
- increment internal version to 2
Pinging @elastic/ml-core |
…mestamp, checkpoint::time_upper_bound and increment version
bwc tests are failing, this is interesting as 7.4 transforms are being put, but then they cannot be found. Looking into it. |
Ah, I see why bwc tests are failing.
|
I forgot that there is access outside of the config manager. The change I pushed might help but isn't enough. We need the same de-duplication as in the config manager to do this properly. |
@hendrikmuhs I am fixing the bug now :) |
…source with a given id
@@ -110,7 +110,7 @@ static void getStatisticSummations(Client client, ActionListener<DataFrameIndexe | |||
.filter(QueryBuilders.termQuery(DataFrameField.INDEX_DOC_TYPE.getPreferredName(), | |||
DataFrameTransformStoredDoc.NAME))); | |||
|
|||
SearchRequestBuilder requestBuilder = client.prepareSearch(DataFrameInternalIndex.INDEX_NAME) | |||
SearchRequestBuilder requestBuilder = client.prepareSearch(DataFrameInternalIndex.LATEST_INDEX_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not provide accurate data. Stats documents for various transforms could exist between various index versions. Since we use aggregations and a transform could have multiple stored docs between index versions, I am not sure we can even provide accurate stats.
@droberts195 What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and DataFrameUsageTransportAction.java
, these don't seem like critical pieces data and SOME information drift is OK. How we tackle this seems predicated on how we deal with old indices and how quickly they are cleaned up.
@@ -114,7 +114,7 @@ protected void masterOperation(Task task, XPackUsageRequest request, ClusterStat | |||
} | |||
); | |||
|
|||
SearchRequest totalTransformCount = client.prepareSearch(DataFrameInternalIndex.INDEX_NAME) | |||
SearchRequest totalTransformCount = client.prepareSearch(DataFrameInternalIndex.LATEST_INDEX_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a transform exists in the old index and has never been updated (thus NOT in the latest index) ?
...ain/java/org/elasticsearch/xpack/dataframe/persistence/DataFrameTransformsConfigManager.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/dataframe/persistence/DataFrameTransformsConfigManager.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/dataframe/persistence/DataFrameTransformsConfigManager.java
Outdated
Show resolved
Hide resolved
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/xpack/core/action/AbstractTransportGetResourcesAction.java
Outdated
Show resolved
Hide resolved
...n/java/org/elasticsearch/xpack/dataframe/action/TransportUpdateDataFrameTransformAction.java
Outdated
Show resolved
Hide resolved
public void testUpdateDeletesOldTransformConfig() throws Exception { | ||
TestRestHighLevelClient client = new TestRestHighLevelClient(); | ||
String transformIndex = "transform-index-deletes-old"; | ||
createSourceIndex(transformIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to create the source index when you PUT directly to the index bypassing any validations or this for the update?
@@ -133,6 +145,8 @@ private static XContentBuilder mappings() throws IOException { | |||
addDataFrameTransformsConfigMappings(builder); | |||
// add the schema for transform stats | |||
addDataFrameTransformStoredDocMappings(builder); | |||
// add the schema for checkpoints | |||
addDataFrameCheckpointMappings(builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -834,7 +835,18 @@ protected void doSaveState(IndexerState indexerState, DataFrameIndexerPosition p | |||
onStop(); | |||
transformTask.shutdown(); | |||
} | |||
next.run(); | |||
transformsConfigManager.deleteOldTransformStoredDocuments(transformId, ActionListener.wrap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be called everytime doSaveState
is called? Is it possible to add something to the response to say whether or not the deletion is necessary
* @param transformId The configuration ID potentially referencing configurations stored in the old indices | ||
* @param listener listener to alert on completion | ||
*/ | ||
public void deleteOldTransformConfigurations(String transformId, ActionListener<Void> listener) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test to DataFrameTransformsConfigManagerTests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -834,7 +837,23 @@ protected void doSaveState(IndexerState indexerState, DataFrameIndexerPosition p | |||
onStop(); | |||
transformTask.shutdown(); | |||
} | |||
next.run(); | |||
// Only do this clean up once, if it succeeded, no reason to do the query again. | |||
if (oldStatsCleanedUp.compareAndExchange(false, true) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: compareAndSet is cleaner as you don't need the negation.
if (oldStatsCleanedUp.compareAndSet(false, true) == false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
run elasticsearch-ci/bwc |
@elasticmachine update branch |
run elasticsearch-ci/1 |
…5375) Adds index versioning for the internal data frame transform index. Allows for new indices to be created and referenced, `GET` requests now query over the index pattern and takes the latest doc (based on INDEX name).
…45837) Adds index versioning for the internal data frame transform index. Allows for new indices to be created and referenced, `GET` requests now query over the index pattern and takes the latest doc (based on INDEX name).