boxplot support for transform 52189 #96515

Merged
przemekwitek merged 5 commits into elastic:main from boxplot_support_for_transform on Jul 24, 2023

Conversation

Kiriakos1998
Contributor

Add support for boxplot aggregation in transform.

Closes #52189

@elasticsearchmachine added the v8.9.0, needs:triage and external-contributor labels Jun 2, 2023
@pxsalehi added the :ml/Transform label and removed the needs:triage label Jun 2, 2023
@elasticsearchmachine added the Team:ML label Jun 2, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@przemekwitek self-requested a review June 2, 2023 09:33
@hendrikmuhs
Contributor

@Kiriakos1998 Thanks for your contribution. To make reviewing easier, could you add an example of what this looks like with the POST _transform/_preview API (input and output)?

(The preview API returns the deduced mappings; I would like to confirm they are OK.)

We also need a small docs change, so that e.g. https://www.elastic.co/guide/en/elasticsearch/reference/current/preview-transform.html lists boxplot.

See

https://github.com/elastic/elasticsearch/blob/main/docs/reference/rest-api/common-parms.asciidoc?plain=1#LL765C37-L765C37

It should be straightforward to add boxplot there (alphabetic order).

@hendrikmuhs
Contributor

@elasticmachine test this please

@Kiriakos1998
Contributor Author

Hi @hendrikmuhs,
This is the mapping for the index I created:

{
  "mappings": {
    "properties": {
      "reviewer": {
        "type": "keyword"
      },
      "stars": {
        "type": "float"
      }
    }
  }
}

This is the data I indexed (11 documents):
{"reviewer" : "some" , "stars" : 1.2}
{"reviewer" : "some" , "stars" : 1.3}
{"reviewer" : "some" , "stars" : 1.5}
{"reviewer" : "some" , "stars" : 2.0}
{"reviewer" : "some" , "stars" : 3.2}
{"reviewer" : "some" , "stars" : 3.4}
{"reviewer" : "some" , "stars" : 3.7}
{"reviewer" : "some" , "stars" : 4.3}
{"reviewer" : "some" , "stars" : 4.7}
{"reviewer" : "some" , "stars" : 4.9}
{"reviewer" : "some" , "stars" : 5.0}

This is the body of my preview request:

{
  "source": {
    "index": "test1"
  },
  "pivot": {
    "group_by": {
      "reviewer": {
        "terms": {
          "field": "reviewer"
        }
      }
    },
    "aggregations": {
      "stars_boxplot": {
        "boxplot": {
          "field": "stars"
        }
      }
    }
  }
}

And this is the result I got:

{
  "preview": [
    {
      "reviewer": "some",
      "stars_boxplot": {
        "q1": 1.625,
        "q2": 3.4000000953674316,
        "q3": 4.599999904632568,
        "min": 1.2000000476837158,
        "max": 5.0,
        "lower": 1.2000000476837158,
        "upper": 5.0
      }
    }
  ],
  "generated_dest_index": {
    "mappings": {
      "_meta": {
        "_transform": {
          "transform": "transform-preview",
          "version": {
            "created": "8.9.0"
          },
          "creation_date_in_millis": 1686685379258
        },
        "created_by": "transform"
      },
      "properties": {
        "reviewer": {
          "type": "keyword"
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "1",
        "auto_expand_replicas": "0-1"
      }
    },
    "aliases": {}
  }
}

@Kiriakos1998
Contributor Author

I also updated the docs and rebased onto main (I think that's why the 2 tests failed).

@hendrikmuhs
Contributor

Thank you @Kiriakos1998.

There is a problem with the mappings. Look at the generated mappings:

{
  "mappings": {
    "_meta": {
      "_transform": {
        "transform": "transform-preview",
        "version": {
          "created": "8.9.0"
        },
        "creation_date_in_millis": 1686685379258
      },
      "created_by": "transform"
    },
    "properties": {
      "reviewer": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "auto_expand_replicas": "0-1"
    }
  },
  "aliases": {}
}

There are no mappings for stars_boxplot. I think the reason is this line in your change:

BOXPLOT("boxplot", SOURCE);

It tells transform to take the mapping from the source field (in this case float); however, for some reason that doesn't work. If you want to debug this, have a look at SchemaUtil.resolveMappings. With the current PR you should see warnings in the logs. Can you check that?
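The gap can also be spotted mechanically. The following is a minimal Python sketch, not part of the PR: it flattens each preview document into dotted field paths and reports the ones that have no entry under generated_dest_index.mappings.properties. The helper names (flatten, unmapped_fields) are hypothetical, and the sketch assumes that properties is keyed by dotted paths; the response is trimmed from the preview output posted above.

```python
def flatten(doc, prefix=""):
    """Yield the dotted path of every leaf value in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{path}.")
        else:
            yield path


def unmapped_fields(preview_response):
    """Return the sorted leaf paths in the preview docs that lack a mapping.

    Assumption: 'properties' is keyed by dotted field paths.
    """
    props = preview_response["generated_dest_index"]["mappings"]["properties"]
    missing = set()
    for doc in preview_response["preview"]:
        missing.update(path for path in flatten(doc) if path not in props)
    return sorted(missing)


# Trimmed copy of the first preview response in this thread.
first_preview = {
    "preview": [
        {
            "reviewer": "some",
            "stars_boxplot": {
                "q1": 1.625,
                "q2": 3.4000000953674316,
                "q3": 4.599999904632568,
                "min": 1.2000000476837158,
                "max": 5.0,
                "lower": 1.2000000476837158,
                "upper": 5.0,
            },
        }
    ],
    "generated_dest_index": {
        "mappings": {"properties": {"reviewer": {"type": "keyword"}}}
    },
}

# Every boxplot leaf is reported, because only 'reviewer' is mapped.
print(unmapped_fields(first_preview))
```

An empty list would mean that every previewed field has a deduced mapping.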

However, I think we can map the boxplot output to a default type, as we do for percentiles, using:

BOXPLOT("boxplot", DOUBLE);

If you change it like this, mappings should be created and all boxplot fields should be mapped to double. I think that's a reasonable default, because that's what we use for a lot of other aggs.
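To illustrate what "all boxplot fields mapped to double" would look like, here is a rough Python sketch. It is not the transform implementation: boxplot_mappings is a hypothetical helper, the list of output names is copied from the preview result earlier in this thread, and the parent "object" entry is an assumption about how the nested output is represented.

```python
# Output names taken from the boxplot result shown earlier in this thread.
BOXPLOT_OUTPUTS = ("min", "max", "q1", "q2", "q3", "lower", "upper")


def boxplot_mappings(agg_name, leaf_type="double"):
    """Build dotted-path mappings for one boxplot aggregation.

    Hypothetical helper: the parent 'object' entry is an assumption,
    and every leaf gets the same default type.
    """
    mappings = {agg_name: {"type": "object"}}
    for output in BOXPLOT_OUTPUTS:
        mappings[f"{agg_name}.{output}"] = {"type": leaf_type}
    return mappings


for field, mapping in sorted(boxplot_mappings("stars_boxplot").items()):
    print(field, mapping)
```

A fixed default type avoids the lookup of the source field's mapping, which is the step that appears to fail with SOURCE.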

@Kiriakos1998
Contributor Author

I changed it and now I get this result:

{
  "preview": [
    {
      "reviewer": "some",
      "stars_boxplot": {
        "q1": 1.625,
        "q2": 3.4,
        "q3": 4.6000000000000005,
        "min": 1.2,
        "max": 5.0,
        "lower": 1.2,
        "upper": 5.0
      }
    }
  ],
  "generated_dest_index": {
    "mappings": {
      "_meta": {
        "_transform": {
          "transform": "transform-preview",
          "version": {
            "created": "8.9.0"
          },
          "creation_date_in_millis": 1686762943780
        },
        "created_by": "transform"
      },
      "properties": {
        "stars_boxplot.max": {
          "type": "double"
        },
        "stars_boxplot.upper": {
          "type": "double"
        },
        "stars_boxplot.lower": {
          "type": "double"
        },
        "stars_boxplot.min": {
          "type": "double"
        },
        "reviewer": {
          "type": "keyword"
        },
        "stars_boxplot.q3": {
          "type": "double"
        },
        "stars_boxplot.q1": {
          "type": "double"
        },
        "stars_boxplot": {
          "type": "object"
        },
        "stars_boxplot.q2": {
          "type": "double"
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "1",
        "auto_expand_replicas": "0-1"
      }
    },
    "aliases": {}
  }
}
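As a sanity check on the quartile values, the Python sketch below recomputes q1 to q3 from the 11 sample documents. The interpolation rule (position = q * n - 0.5 on the sorted values) is an assumption, chosen only because it reproduces the numbers in this preview. It does not describe how the boxplot aggregation is implemented, and results on larger data sets may differ.

```python
import math


def interpolated_quantile(values, q):
    """Linear interpolation at position q * n - 0.5 on the sorted values.

    Assumed rule for illustration only; it happens to match the preview
    output for this small data set.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Clamp the position so it always falls inside the list.
    pos = min(max(q * n - 0.5, 0.0), n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# The "stars" values of the 11 documents indexed earlier in this thread.
stars = [1.2, 1.3, 1.5, 2.0, 3.2, 3.4, 3.7, 4.3, 4.7, 4.9, 5.0]

for name, q in (("q1", 0.25), ("q2", 0.5), ("q3", 0.75)):
    print(name, round(interpolated_quantile(stars, q), 3))
```

With this rule the values agree with the preview: 1.625, 3.4 and 4.6 (up to floating-point rounding).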

@hendrikmuhs
Contributor

Thank you @Kiriakos1998, LGTM

@przemekwitek will take a second look. He is currently on vacation and will pick this up when he is back.

@elasticmachine test this please

@Kiriakos1998
Contributor Author

> Thank you @Kiriakos1998, LGTM
>
> @przemekwitek will have a 2nd look, he is currently on vacation and will pick this up, when he is back
>
> @elasticmachine test this please

Thanks for the feedback, @hendrikmuhs. I again see two tests failing. For the docs suite, I can't quite understand why it's happening. As for :x-pack:plugin:transform:test (cit/part-2 suite), it succeeds locally, so I am not sure how to investigate the failing task.

@przemekwitek
Contributor

run elasticsearch-ci/part-2
run elasticsearch-ci/docs

@Kiriakos1998
Contributor Author

Hi @przemekwitek, I ran some of the failing test cases, and I think the reason was that the release tag had moved to 8.10.0 while my branch was still at 8.9.0. After I rebased, the tests succeeded.

@hendrikmuhs
Contributor

@elasticmachine test this please

@Kiriakos1998
Contributor Author

Hi @hendrikmuhs, @przemekwitek. It fails on the test that I wrote, at line 1903. The q1 is actually 3 and not 2.75, but I remember running the test and it passing with q1 expected to be 2.75. Is it possible that different test seeds produce a different q1? To be honest, I ran the test locally again multiple times (with q1 expected to be 3) and it never failed, so I think I was sloppy with this test in the first place.

@hendrikmuhs
Contributor

> Is it possible that for different test seeds a different q1 is produced?

Yes, it is an approximate algorithm based on TDigest.

Please try to run the failing test locally:

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformPivotRestIT.testPivotWithBoxplot" -Dtests.seed=8FB7FFC6DF4859C5 -Dtests.locale=vi -Dtests.timezone=Africa/Gaborone

It should fail on your local machine given this seed.

I think you have to test some of your outputs (q1-q3) using a range instead of an equality check. I am not sure how to get exactly the right values, but I think trial and error is OK here, as the purpose of this test is the boxplot integration, not the algorithm itself. Still, it would be good to have values that at least roughly make sense, so I would simply relax the test to check that q1 is within the range 2.75 to 3.

Once you have relaxed it, you can run many test iterations using this command:
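The relaxed check amounts to an inclusive range assertion. The real test is Java (TransformPivotRestIT.testPivotWithBoxplot); the snippet below is only a language-neutral illustration in Python, and assert_in_range is a hypothetical helper, not an Elasticsearch test utility.

```python
def assert_in_range(name, value, low, high):
    """Fail unless low <= value <= high (both bounds inclusive)."""
    if not (low <= value <= high):
        raise AssertionError(f"{name}={value} is outside [{low}, {high}]")


# Both q1 values reported in this thread pass the relaxed check.
for observed_q1 in (2.75, 3.0):
    assert_in_range("q1", observed_q1, 2.75, 3.0)
```

Keeping the bounds tight around the observed values means the test still catches results that make no sense, while tolerating seed-dependent variation.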

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformPivotRestIT.testPivotWithBoxplot" -Dtests.iters=10000

I removed the seed. This command runs 10000 iterations. If this produces failures, relax the test accordingly and repeat.

If no further adjustments are necessary, can you please paste the output of this command here?

@Kiriakos1998
Contributor Author

Kiriakos1998 commented Jul 3, 2023

> Please try to run the failing test locally:

Indeed it failed.

> This command runs 10000 iterations.

I ran 1000 iterations locally (10000 caused a timeout exception in the test suite) with q1 expected to be 3, and the output was:
BUILD SUCCESSFUL in 39m 44s
514 actionable tasks: 5 executed, 509 up-to-date

@droberts195
Contributor

@elasticmachine update branch

@droberts195
Contributor

@elasticmachine test this please

@Kiriakos1998 force-pushed the boxplot_support_for_transform branch from 3842236 to 5c46b0f July 21, 2023 16:21
@Kiriakos1998
Contributor Author

Hello @droberts195, I just checked again with multiple test repetitions with q1 set to 3, and the test is passing. I also rebased the branch, so if you run the suite again I believe it will succeed this time.

@droberts195
Contributor

@elasticmachine test this please

@przemekwitek merged commit ea42c2e into elastic:main Jul 24, 2023
14 checks passed
@Kiriakos1998 deleted the boxplot_support_for_transform branch July 25, 2023 20:18
felixbarny pushed a commit to felixbarny/elasticsearch that referenced this pull request Aug 3, 2023
Labels
>enhancement, external-contributor, :ml/Transform, Team:ML, v8.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants