Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Behavioral Analytics] Analytics collections use DSL instead of ILM for data retention management #100033

Merged
merged 20 commits into from Oct 13, 2023

Conversation

kderusso
Copy link
Member

@kderusso kderusso commented Sep 28, 2023

Behavioral analytics has traditionally used ILM to manage data retention. Starting with 8.11.0, this will change. Analytics collections created prior to 8.11.0 will continue to use their existing ILM policies, but new analytics collections will be managed using DSL.

How this changes the datastream:

Create an analytics collection using the following command:

PUT /_application/analytics/all-about-puggles

Next, get the backing data stream. For the above example, the command would be:

GET _data_stream/behavioral_analytics-events-all-about-puggles

Versions prior to 8.11 would return the following information:

{
  "data_streams": [
    {
      "name": "behavioral_analytics-events-all-about-puggles",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": ...
      "generation": 1,
      "_meta": {
        "description": "Built-in template applied by default to behavioral analytics event data streams.",
        "managed": true
      },
      "status": "GREEN",
      "template": "behavioral_analytics-events-default",
      "ilm_policy": "behavioral_analytics-events-default_policy",
      "hidden": false,
      "system": false,
      "allow_custom_routing": true,
      "replicated": false
    }
  ]
}

Note that the ilm_policy is set to behavioral_analytics-events-default_policy. You can see details about this policy with the call GET _ilm/policy/behavioral_analytics-events-default_policy.

Now, the following information is returned:

{
  "data_streams": [
    {
      "name": "behavioral_analytics-events-all-about-puggles",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": ...
      "generation": 1,
      "_meta": {
        "description": "Built-in template applied by default to behavioral analytics event data streams.",
        "managed": true
      },
      "status": "GREEN",
      "template": "behavioral_analytics-events-default",
      "lifecycle": {
        "enabled": true,
        "data_retention": "180d"
      },
      "hidden": false,
      "system": false,
      "allow_custom_routing": true,
      "replicated": false
    }
  ]
}

Note the lifecycle block with enabled set to true and data_retention set to 180d. Additionally note that ilm_policy is no longer specified.

Note: This PR did not change the default data retention period, only the system with which data retention is managed.

@kderusso kderusso changed the title [Behavioral Analytics] Analytics collections created with DLM instead of ILM [Behavioral Analytics] Analytics collections use DLM instead of ILM for data retention management Sep 29, 2023
@kderusso kderusso marked this pull request as ready for review September 29, 2023 17:07
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 29, 2023
@kderusso kderusso added >feature :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team and removed needs:triage Requires assignment of a team area label labels Sep 29, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ent-search-eng (Team:Enterprise Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @kderusso, I've created a changelog YAML for you.

@kderusso kderusso requested a review from a team September 29, 2023 17:11
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into Data stream lifecycle Kathleen.

Exciting you're looking to use DSL as a default ❤️

I left a few minor suggestions and questions.

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Should we add some BWC tests to check this keeps working on older versions?

* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
package org.elasticsearch.entsearch;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we should use org.elasticsearch.xpack.application instead of org.elasticsearch.entsearch.
Also I would prefer to organize those tests by application (analytics here).

Suggested change
package org.elasticsearch.entsearch;
package org.elasticsearch.xpack.application.analytics;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it as entsearch assuming that we'll have additional cases in the module outside of BA in the future that we want to test a full restart with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is two different points here.

First org.elasticsearch.entsearch has been abandoned since a while and your package name should at least be org.elasticsearch.xpack.application.

Second, I would like to keep the code (including tests) of each application separated since it ease refactoring. If you want to add a test for another application (let's say for search application), I would recommend that is goes into it own class which would be in the right package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I have refactored the package name, but as discussed elsewhere I'd like to keep the pattern now of using FullClusterRestartIT. We can refactor this at a later date if it becomes burdensome.

@kderusso
Copy link
Member Author

kderusso commented Oct 2, 2023

@carlosdelest

Overall LGTM. Should we add some BWC tests to check this keeps working on older versions?

The FullClusterRestartIT should handle backward compatibility testing, what else did you have in mind?

Copy link
Contributor

@afoucret afoucret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good but I think we can benefits from some minor changes in tests.

Also, I will look more carefully at the behavior described in @andreidan comment later this today.

import java.io.IOException;
import java.util.List;

public class FullClusterRestartIT extends ParameterizedFullClusterRestartTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a more explicit test naming:

Suggested change
public class FullClusterRestartIT extends ParameterizedFullClusterRestartTestCase {
public class DataStreamLifecycleMigrationIT extends ParameterizedFullClusterRestartTestCase {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, this is a very lightweight test right now and there's precedent for FullClusterRestartIT in other modules. I think it lowers the barrier of entry to adding more tests to keep it as-is until we deem otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @afoucret as this currently holds tests for analytics. But I think @kderusso is aiming for a pattern that is present in other areas of the codebase - doing a specific test for cluster restarting, where we can add new tests for migrations.

I'd stick with a common codebase pattern even if my first gut feeling is to create a more specific, per-feature level test class.

Copy link
Contributor

@afoucret afoucret Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both patterns exist into the codebase (eg org.elasticsearch.xpack.restart.WatcherMappingUpdateIT.
I prefer using one class per test with an explicit naming.

BTW, each tests has it own requirements before being executed (cluster settings, ...) and having separate class per concern makes it easier to manage this intentionally without accidentally modifying other tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm being worn down. 🙂 If you both feel strongly that I change it I will, otherwise I'll keep it as is.

@carlosdelest
Copy link
Member

The FullClusterRestartIT should handle backward compatibility testing, what else did you have in mind?

I'm not familiar with that kind of integration testing. Does it handle a mixed cluster setup, in which different nodes have different versions? Would that be needed for this use case?

@kderusso kderusso changed the title [Behavioral Analytics] Analytics collections use DLM instead of ILM for data retention management [Behavioral Analytics] Analytics collections use DSL instead of ILM for data retention management Oct 2, 2023
@mattc58 mattc58 removed the v8.11.0 label Oct 4, 2023
@mattc58 mattc58 added the v8.12.0 label Oct 4, 2023
@kderusso
Copy link
Member Author

kderusso commented Oct 5, 2023

I'm not familiar with that kind of integration testing. Does it handle a mixed cluster setup, in which different nodes have different versions? Would that be needed for this use case?

@carlosdelest this handles migrating/updating to a new version. We do have BWC tests on the API classes, but I'm not sure we have anything pluggable for template registry. I'll take a look though!

@carlosdelest
Copy link
Member

@carlosdelest this handles migrating/updating to a new version. We do have BWC tests on the API classes, but I'm not sure we have anything pluggable for template registry. I'll take a look though!

I'm taking a closer look and the restart test seems to do the same thing that a BwC mixed test should do. LGTM, thanks for checking!

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this Kathleen

@kderusso
Copy link
Member Author

kderusso commented Oct 9, 2023

@afoucret I cannot merge this until you re-review, can you please review this? Thank you.

assert Version.fromString(VersionProperties.getVersions().get("elasticsearch")).getMajor() == 8:
"If we are targeting a branch other than 8, we should enable migration tests"

BuildParams.bwcVersions.withWireCompatible(v -> v.after("8.8.0") && v.before("8.12.0")) { bwcVersion, baseName ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand v.after("8.8.0") since the entsearch module was not existing before but I think v.before("8.12.0") should be part of the test class. Indeed we need to make sure we will be able to test future migrations (eg. between 8.12.0 and 8.13.0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to do this is to use assumeTrue as the first line of your test:

public void testBehavioralAnalyticsDataRetention() throws Exception {
    assumeTrue('No need to run',  getOldClusterVersion().before(V_8_12_0))

    // ... Your test code
}

assumeTrue will cause the test to be skipped if the condition version is not met (there is also a assumeFalse version)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback - changed this, but would appreciate a second set of eyes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check in the CI runs if everything behave as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ok for me

@afoucret afoucret dismissed their stale review October 10, 2023 16:49

Unblocking the PR to be merged when you considered it ready

@kderusso kderusso merged commit 1dbe928 into elastic:main Oct 13, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants