Add builders for percentile and median accumulators/window functions #1139

vbabanin · 2023-06-06T19:45:22Z

This pull request (PR) adds builders for calculating the median and percentiles of numeric values in the MongoDB aggregation pipeline, for use in the $group and $setWindowFields stages.
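As a rough illustration of what a per-group percentile means, here is a minimal plain-Java sketch. Rows are grouped by a key, as {$group: {_id: "$z"}} would do, and one percentile of the numeric values is computed per group. It uses the exact nearest-rank definition, which is an assumption made for illustration. It is not the server's "approximate" method and does not use the driver API. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: exact nearest-rank percentile per group, in plain Java.
// NOT the server's "approximate" method; only shows the concept of a per-group percentile.
public class GroupPercentileSketch {

    /** Nearest-rank percentile of already-sorted values; p must be in [0.0, 1.0]. */
    static double nearestRank(List<Double> sorted, double p) {
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        // rank = ceil(p * n), clamped to at least 1; rank is 1-based
        int rank = Math.max(1, (int) Math.ceil(p * sorted.size()));
        return sorted.get(rank - 1);
    }

    /** Groups (key, value) rows by key and computes one percentile per group. */
    static Map<Boolean, Double> percentileByGroup(List<Map.Entry<Boolean, Double>> rows, double p) {
        Map<Boolean, List<Double>> groups = new TreeMap<>();
        for (Map.Entry<Boolean, Double> row : rows) {
            groups.computeIfAbsent(row.getKey(), k -> new ArrayList<>()).add(row.getValue());
        }
        Map<Boolean, Double> result = new TreeMap<>();
        for (Map.Entry<Boolean, List<Double>> group : groups.entrySet()) {
            List<Double> values = group.getValue();
            Collections.sort(values);
            result.put(group.getKey(), nearestRank(values, p));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Boolean, Double>> rows = List.of(
                Map.entry(true, 1.0), Map.entry(true, 3.0), Map.entry(true, 2.0),
                Map.entry(false, 1.0), Map.entry(false, 1.0));
        // p = 0.5 is the median under this definition; p = 0.95 is a high percentile
        System.out.println(percentileByGroup(rows, 0.5));
        System.out.println(percentileByGroup(rows, 0.95));
    }
}
```

With an even number of values, nearest rank at p = 0.5 picks the lower of the two middle values rather than averaging them; other percentile definitions differ in exactly such details.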

JAVA-3860

katcharov

(This is a partial review.)

katcharov · 2023-06-20T18:21:04Z

driver-core/src/main/com/mongodb/client/model/Accumulators.java

+     * Returns a combination of a computed field and an accumulator that generates a BSON {@link org.bson.BsonType#DOUBLE Double }
+     * representing the median value computed from the given {@code inExpression} within a group.


I am finding these docs surprisingly hard to follow. There appear to be 3 levels of abstraction, and at least 6 complex relations in the same sentence:

this combines and returns: computed field + accumulator

which generates: BSON

which represents: median (which, in turn: computed, from-given, within-group)

I think we should focus on what this method is used for, and break up the relevant pieces of information, starting with the most relevant:

Used to determine the median value for the group. The result is emitted under the specified field name. Each item in a group is a document, so the "input expression" is used to specify the numeric value for each document. This is typically a numeric field on each document, but any expression may be specified.

I think we should also specify what happens to non-numeric values, what the result is when there are no documents, and what MQL value represents the document in cases where a plain field is not used (usually you can just use fields, but it should be possible to get, for example, the median number of fields per document, and I don't see how to do this; would we use $$CURRENT?). We should also have tests that cover these cases.
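To make the questions above concrete, here is a minimal sketch that models the input expression as a function from a document to a value. A field lookup covers the usual case, and counting a document's fields stands in for an MQL expression such as { $size: { $objectToArray: "$$ROOT" } }. Skipping non-numeric values, returning null for a group with no numeric values, and averaging the two middle values for an even count are all choices made in this sketch, not statements of how the server behaves. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: the "input expression" is modelled as a function from a document to a value.
// Handling of non-numeric values (skipped) and empty input (null) is this sketch's choice only.
public class MedianInputExpressionSketch {

    static Double median(List<Map<String, Object>> group, Function<Map<String, Object>, Object> inExpression) {
        List<Double> numbers = new ArrayList<>();
        for (Map<String, Object> document : group) {
            Object value = inExpression.apply(document);
            if (value instanceof Number) { // non-numeric or missing values are skipped
                numbers.add(((Number) value).doubleValue());
            }
        }
        if (numbers.isEmpty()) {
            return null; // no numeric input: no median
        }
        Collections.sort(numbers);
        int mid = numbers.size() / 2;
        // even count: mean of the two middle values; odd count: the middle value
        return numbers.size() % 2 == 0
                ? (numbers.get(mid - 1) + numbers.get(mid)) / 2.0
                : numbers.get(mid);
    }

    /** Builds a document from alternating keys and values. */
    static Map<String, Object> doc(Object... keysAndValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> group = List.of(
                doc("_id", 1, "x", 4),
                doc("_id", 2, "x", "n/a", "y", true),
                doc("_id", 3, "x", 10));
        // typical case: a plain field, analogous to "$x"
        System.out.println(median(group, d -> d.get("x")));
        // any expression: number of fields per document,
        // analogous to { $size: { $objectToArray: "$$ROOT" } }
        System.out.println(median(group, d -> d.size()));
    }
}
```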

If we still want to focus on (or include) certain technical details rather than usage, I think we should state them more clearly, and still break up the information so it is easier to understand.

(I think the same applies to the other docs.)

Discussed: we can address this later, in a future PR (or, decide not to).

katcharov · 2023-06-20T18:25:33Z

driver-core/src/main/com/mongodb/client/model/Accumulators.java

+     * @param fieldName The field computed by the accumulator.
+     * @param inExpression The input expression.
+     * @param method The method to be used for computing the median.
+     * @param <InExpression> The type of the input expression.


I wonder if we should link the "type" (or the word "expression" on the param itself) of any new docs to MqlValue (which is in Beta).

I think it is a good idea. However, should we instead state in the class-level documentation that the Mql API can be used wherever an expression is required? MqlValue can be used in almost all methods of the Accumulators class.

This is fine, this would be inconsistent with the rest of the class.

Discussed: we can address this later, in a future PR (or, decide not to).

katcharov · 2023-06-20T20:01:06Z

driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java

+
+        //when
+        List<Document> results = getCollectionHelper().aggregate(Collections.singletonList(
+                group(new Document("gid", "$z"), quantileAccumulator)), DOCUMENT_DECODER);


I suspect you can get rid of DOCUMENT_DECODER everywhere by doing:

List<BsonDocument> results = ...

Why use new Document("gid", "$z") instead of $z?

Discussed: we can address this (DOCUMENT_DECODER) later, in a future PR. ($z fixed)

katcharov · 2023-06-20T20:07:23Z

driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java

+        List<Document> results = getCollectionHelper().aggregate(Collections.singletonList(
+                group(new Document("gid", "$z"), quantileAccumulator)), DOCUMENT_DECODER);
+
+        //then


This test does not check the generated BSON, but it could, as the existing tests in this class do, via:

List<Bson> pipeline = assertPipeline(...
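As a rough illustration of checking the generated BSON, the sketch below compares an expected $group stage with a rendered one while ignoring formatting whitespace. The expected text follows the server's $percentile syntax ({ input, p, method }). The "actual" string is a hand-written stand-in, not output captured from the driver, and the helper names are hypothetical.

```java
// Hypothetical helper: compares two JSON texts while ignoring whitespace outside string literals.
// Key order is still significant, which suits BSON documents (they are ordered).
public class PipelineAssertionSketch {

    static String stripInsignificantWhitespace(String json) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        boolean escaped = false;
        for (char c : json.toCharArray()) {
            if (inString) {
                out.append(c); // everything inside a string literal is kept
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c); // whitespace between tokens is dropped
            }
        }
        return out.toString();
    }

    static boolean sameStage(String expected, String actual) {
        return stripInsignificantWhitespace(expected).equals(stripInsignificantWhitespace(actual));
    }

    public static void main(String[] args) {
        // Expected MQL for a $group stage using $percentile
        String expected = "{ \"$group\": { \"_id\": \"$z\", \"sat_95\": { \"$percentile\": "
                + "{ \"input\": \"$x\", \"p\": [0.95], \"method\": \"approximate\" } } } }";
        // Stand-in for JSON rendered from the builder-generated stage (hand-written here)
        String actual = "{\"$group\":{\"_id\":\"$z\",\"sat_95\":{\"$percentile\":"
                + "{\"input\":\"$x\",\"p\":[0.95],\"method\":\"approximate\"}}}}";
        System.out.println(sameStage(expected, actual));
    }
}
```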

Discussed: we can address this later, in a future PR.

katcharov · 2023-06-20T21:11:56Z

driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java

+        Object result = results.stream()
+                .filter(document -> document.get("_id").equals(new Document("gid", true)))
+                .findFirst().map(document -> document.get("sat_95")).get();

+        assertEquals(expectedGroup1, result);
+
+        result = results.stream()
+                .filter(document -> document.get("_id").equals(new Document("gid", false)))
+                .findFirst().map(document -> document.get("sat_95")).get();
+
+        assertEquals(expectedGroup2, result);


It seems that the above boilerplate could be replaced with, for example:

assertResults(pipeline, "[\n"
        + " { _id: true, sat_95: [3, 2] },\n"
        + " { _id: false, sat_95: [1, 2] }\n"
        + "]");

This seems easier to follow than using a stream to pick out the ids, and then manually matching on the particular field.
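Here is a minimal sketch of the single-assertion style, assuming results are plain Java maps rather than the driver's Document or BsonDocument. The whole result set is compared at once, ignoring order, because $group does not guarantee the order of its output documents. The expected values mirror the asList(3.0) and asList(1.0) expectations in the existing parameterized test. The helper names are hypothetical and are not the test class's actual assertResults.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one order-insensitive assertion over a whole result set,
// instead of filtering by _id with streams and checking one field at a time.
public class ResultSetAssertionSketch {

    // Compares results as multisets of documents ($group output order is unspecified).
    static boolean sameResultsIgnoringOrder(List<Map<String, Object>> expected,
                                            List<Map<String, Object>> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        Map<Map<String, Object>, Integer> counts = new HashMap<>();
        for (Map<String, Object> doc : expected) {
            counts.merge(doc, 1, Integer::sum);
        }
        for (Map<String, Object> doc : actual) {
            Integer remaining = counts.get(doc);
            if (remaining == null || remaining == 0) {
                return false;
            }
            counts.put(doc, remaining - 1);
        }
        return true;
    }

    static void assertResults(List<Map<String, Object>> expected, List<Map<String, Object>> actual) {
        if (!sameResultsIgnoringOrder(expected, actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> actual = List.of(
                Map.of("_id", false, "sat_95", List.of(1.0)),
                Map.of("_id", true, "sat_95", List.of(3.0)));
        // one assertion over the whole result, in any order
        assertResults(List.of(
                Map.of("_id", true, "sat_95", List.of(3.0)),
                Map.of("_id", false, "sat_95", List.of(1.0))), actual);
        System.out.println("results match");
    }
}
```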

So instead of the ~10 lines of boilerplate, plus these ~3 lines per test:

Arguments.of( percentile("sat_95", "$x", new double[]{0.95}, QuantileMethod.approximate()), asList(3.0), asList(1.0)),

there could be ~6 lines per test:

List<Bson> pipeline = assertPipeline( ...
        group("$z", percentile("sat_95", "$x", ofNumberArray(0.95))));
assertResults(pipeline, "[\n"
        + " { _id: true, sat_95: [3, 2] },\n"
        + " { _id: false, sat_95: [1, 1] }\n"
        + "]");

I count the lines because parameterization is often used to reduce boilerplate and line count, but it seems to me there are often other ways to deal with that problem that ultimately make the code clearer.

I think it is fine to have some duplication in the "data" part of a test (for example, trying to remove it separates percentile from group, and the array values from sat_95, which makes things harder to follow).

Discussed: we can address this later, in a future PR (or, decide not to). We should have a consistent style in the tests.

JAVA-3860

katcharov

LGTM

vbabanin added 5 commits June 6, 2023 08:28

Add builders for percentile and median accumulators/window functions

ce36d94

JAVA-3860

Add builders for percentile and median accumulators/window functions …

cfc2fe9

…for Scala. JAVA-3860

Move builders introduced in version 7.0 to separate test case.

6d57116

JAVA-3860

Add java and scala documentation.

f523d13

JAVA-3860

Revert windowed functions changes.

15f6dc9

JAVA-3860

vbabanin requested review from katcharov and stIncMale June 6, 2023 19:45

vbabanin self-assigned this Jun 6, 2023

vbabanin commented Jun 6, 2023

driver-core/src/main/com/mongodb/client/model/Accumulators.java

vbabanin commented Jun 6, 2023

driver-core/src/main/com/mongodb/client/model/Accumulators.java

vbabanin added 2 commits June 6, 2023 12:53

Revert "Revert windowed functions changes."

8691f57

This reverts commit 15f6dc9.

Add since 4.10 to javadoc.

fbb2839

JAVA-3860

katcharov reviewed Jun 6, 2023

...r-core/src/test/functional/com/mongodb/client/model/AggregatesFunctionalSpecification.groovy (outdated)

vbabanin added 2 commits June 7, 2023 19:39

Substitute spock tests with Junit ones.

d355e92

JAVA-3860

Move decoder to static variable.

0bded47

JAVA-3860

vbabanin requested a review from katcharov June 8, 2023 05:38

stIncMale requested changes Jun 8, 2023

vbabanin added 3 commits June 15, 2023 16:41

- Add interface builder to promote strict typing for quantile methods.

a1a1ffc

- Squash quantile tests into parametrized one. JAVA-3860

Correct javadoc and tests.

5926251

JAVA-4888

Add null checks.

6e37cec

JAVA-4888

vbabanin requested a review from stIncMale June 16, 2023 02:34

stIncMale requested changes Jun 16, 2023

driver-core/src/main/com/mongodb/client/model/ApproximateQuantileMethod.java

driver-core/src/test/functional/com/mongodb/client/model/OperationTest.java (outdated)

Add scala classes.

08cb7cc

JAVA-4888

vbabanin requested a review from stIncMale June 16, 2023 18:32

vbabanin added 2 commits June 16, 2023 13:00

Fix javadoc.

f7354d9

JAVA-4888

Correct javadoc.

5c7822a

JAVA-4888

stIncMale approved these changes Jun 16, 2023

katcharov requested changes Jun 20, 2023

Fix tests.

e8c2c71

JAVA-3860

katcharov approved these changes Jun 21, 2023

vbabanin merged commit c7558f9 into mongodb:master Jun 21, 2023
54 checks passed

vbabanin deleted the JAVA-3860 branch June 21, 2023 22:10
