Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Sum and average aggregation queries #1097

Merged
merged 44 commits into from Sep 1, 2023

Conversation

danieljbruce
Copy link
Contributor

This PR demonstrates how we are going to leverage the current framework to support sum/average queries for later use.

This commit introduces the sum aggregation with a simple test to ensure it works
This change modifies the encoding so that the right data reaches the grpc layer
Adds helper functions and a super class so that average and sum can share the same properties.
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: datastore Issues related to the googleapis/nodejs-datastore API. labels Mar 27, 2023
Add sum and average to the transaction tests here to improve test coverage for transactions.
The return type for this function is wrong. The function returns an Average so we should use Average.
Make alias optional since the query still works without providing an alias.
The transaction tests should not pass if the transaction is rolled back like they do currently. We must not catch errors and instead let the test fail.
Ensure that the addAggregation function works correctly with an additional assertion check.
This test ensures that all the aggregate queries are actually the same no matter how you build them
Write tests to effectively document the toProto output of the various aggregate fields
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Apr 28, 2023
Equivalent tests to count are written for sum in system tests and some more tests are written too to meet requirements outlined by team.
src/aggregate.ts Outdated
*
*/
toProto(): any {
const aggregation = Object.assign(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just

const aggregation = this.property_ ? {property: {name: this.property_}} : {};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/aggregate.ts Outdated
return Object.assign(
{operator: this.operator},
this.alias_ ? {alias: this.alias_} : null,
aggregationObject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also drop the intermediate object and replace this line by

{[this.operator]: aggregation}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/aggregate.ts Outdated
*
* @param {string} property
*/
constructor(property: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

constructor(public property_: string) {

and remove the fields + the assignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

The test for sum and snapshot reads should look at the database before the data is created and run the tests based on the database in that state
Add a test block for a dataset with overflow values and a dataset with NaN values.
Aggregate field should be exported from the client so that it can be used easily by users.
Some idiomatic changes to improve the state of the code in the PR.
@danieljbruce danieljbruce changed the title Sum and average aggregation queries feat: Sum and average aggregation queries Jun 9, 2023
@danieljbruce danieljbruce marked this pull request as ready for review June 9, 2023 18:40
@danieljbruce danieljbruce requested review from a team as code owners June 9, 2023 18:40
Values for sum and average should be different from those of count and these tests provide the right values now.

# Conflicts:
#	system-test/datastore.ts
Average aggregations regarding appearances have been added in tests and correct values have been assigned
At this point the datastore is populated with data about ratings so computations should be done on that instead.
An assertion error should be thrown so that the test doesn’t pass if the request is successful.
The query with the limit will include all data points with the lowest appearance values. This is likely desired, but also important to document.
The rollbacks for the transaction tests ensure that if a test fails then the data gets reset to where it was before.
sharedQueryOpts.readOptions.readTime = {
seconds: Math.floor(seconds),
};
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was taken from

if (options.readTime) {
if (reqOpts.readOptions === undefined) {
reqOpts.readOptions = {};
}
const readTime = options.readTime;
const seconds = readTime / 1000;
reqOpts.readOptions.readTime = {
seconds: Math.floor(seconds),
};
}
and can be refactored after the merge as per the TODO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why we can't refactor this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. It is a choice between having two simpler PRs and one PR that includes both changes. Doing so now is fine though.

src/request.ts Outdated
@@ -891,13 +891,26 @@ class DatastoreRequest {
query: Query,
options: RunQueryStreamOptions = {}
): SharedQueryOptions {
// TODO: This needs to be refactored later:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this to be done in this PR or in a future one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same the above comment. We can do it now if preferred.

sharedQueryOpts.readOptions.readTime = {
seconds: Math.floor(seconds),
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why we can't refactor this now?

@@ -704,6 +710,13 @@ describe('Datastore', () => {
assert.strictEqual(secondEntities!.length, 3);
});

it('should query the datastore with snapshot read', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this related to SUM/AVG changes? If not, let's move it into a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

system-test/datastore.ts Show resolved Hide resolved
after(async () => {
await datastore.delete(keys);
});
it('should run a sum aggregation', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include overflow word here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

danieljbruce and others added 5 commits August 18, 2023 10:06
Introducing getSharedOptionsOnly allows us to use that function in two different places to avoid a repeated block of code.
This test belongs inside another PR because it is not directly related to sum/avg.
A test should be included that looks at multiple aggregations in a single query.
The tests themselves should include the word overflow so that it is clear they are working with an overflow dataset.
src/aggregate.ts Show resolved Hide resolved
src/aggregate.ts Outdated
static average(property: string): Average {
return new Average(property);
}

/**
* Gets a copy of the Count aggregate field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be updated to remove "Count" since it applies to all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description needs to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* Gets a copy of the Count aggregate field.
*
* @param {string} alias The label used in the results to describe this
* aggregate field when a query is run.
* @returns {AggregateField}
*/
alias(alias: string): AggregateField {
alias(alias?: string): AggregateField {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some additional testing on Count now that this is optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Added in commit called "Added tests for when an empty alias is provided".

src/request.ts Outdated
private getSharedQueryOptions(
query: Query,
options: RunQueryStreamOptions = {}
private getSharedOptionsOnly(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between this method and getSharedQueryOptions? The naming is confusing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSharedQueryOptions also accepts a query and calls getSharedOptionsOnly. getSharedOptionsOnly is also called for fetches in createReadStream where options need to be built without considering a query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Maybe getSharedQueryOptions and getSharedRequestOptions, or getQueryOptions and getRequestOptions would be a little more readable, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getQueryOptions and getRequestOptions would work really well. The good thing about these two functions is that they are both private so we can change the names later too if need be.

system-test/datastore.ts Show resolved Hide resolved
system-test/datastore.ts Show resolved Hide resolved
The comment for setting the alias should not make any mention of count since it is agnostic to the aggregation type.
No matter how alias is encoded, the data structures should store the aggregations the same way inside an aggregate field so as not to create any confusion.
Tests for when an empty alias is provided should check that each aggregate query still works.
The sleep function should enable us to test snapshot reads for aggregate queries
Two tests should be explored that evaluate what happens when multiple aggregations are used and when too many aggregations are used.
Shared options functions could be given a better name so that they make more sense to the code that is using them.
Copy link
Contributor

@ruyadorno ruyadorno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions and tweaking a comment but otherwise it's looking good to me 👍

src/index.ts Show resolved Hide resolved
src/index.ts Show resolved Hide resolved
@@ -676,12 +677,18 @@ describe('Datastore', () => {
];

before(async () => {
// This function is used for testing snapshot reads by getting a time and then sleeping before adding data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be important to rephrase this comment so that it explains why we need the sleeping:
is it simulating process time from datastore? is it simulating latency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I actually added some code in the latest commit to ensure the test won't be flakey. The intention of this was to be sure that a recorded read time occurred prior to the time of saving the data we wish to query against. This way a snapshot read at the recorded time would be sure to be against data prior to a save. This is now accomplished by reading data from the database, recording the read time, sleeping and then saving to ensure the save time on the server is later than the recorded read time. This will make sure the test isn't flakey.

The code must explain why the sleep function is needed in the test because its purpose should be clear.
A sum/average test uses snapshot reads for an aggregate query. The key here is write code that will guarantee the read time occurs well before all the data is saved in order to ensure the test isn’t flakey.
timeBeforeDataCreation = Date.now();
await sleep(1000);
// Save for a key so that a read time can be accessed for snapshot reads.
const dummyData = Object.assign(Object.assign({}, keysToSave[0]), {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to "emptyData"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

emptyData is a more accurate variable name for the datastore save data
Copy link
Contributor

@kolea2 kolea2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, please wait for @ruyadorno's approval as well before merging. Thanks for all the hard work on this!

@danieljbruce danieljbruce added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 1, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 1, 2023
@danieljbruce danieljbruce merged commit 44ba6f8 into googleapis:main Sep 1, 2023
15 checks passed
Copy link
Contributor

@ruyadorno ruyadorno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants