feat: Add subquery support in pipeline #2323
| @InternalApi | ||
| public FunctionExpression(String name, List<? extends Expression> params) { |
Why do we make this internal?
| bookDocs = | ||
| ImmutableMap.<String, Map<String, Object>>builder() | ||
| .put( | ||
| "book1", | ||
| ImmutableMap.<String, Object>builder() | ||
| .put("title", "The Hitchhiker's Guide to the Galaxy") | ||
| .put("author", "Douglas Adams") | ||
| .put("genre", "Science Fiction") | ||
| .put("published", 1979) |
It looks like this is a copy of the bookDocs from the ITPipelineTest.java setup. Can you pull this setup code out from both of these places and put it into a function that we can reuse in both places? Maybe we could remove ~150 LOC from here.
I am skeptical about consolidating test setup here. While it removes some duplication, these are tests, and having all the test data available in the same file benefits both humans and agents.
I do want to see the multiple review collections we set up in the individual tests moved in here. That is, we set up the books, the reviews, and any other collections here. The number of documents can be reduced; 3 or 4 books and their reviews might be enough.
Thanks for the feedback! Here are my thoughts:
Cross-SDK Consistency: I'd prefer to keep this doc data shared. The web SDKs reuse the same dataset between the SubqueryTest and the ITPipelineTest. Sharing it here ensures our datasets remain consistent across all SDKs.
Future Improvements: I agree we can keep an eye on this. If we find later that we need different types of datasets, or if we want to save on document creation, we can always revisit this and separate the data in the future.
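As a rough illustration of the shared-fixture idea discussed in this thread, the sketch below keeps the book data in one helper that both ITPipelineTest and the subquery test could call. The class name `PipelineTestData` and the method `bookDocs()` are hypothetical, plain `java.util` maps stand in for the Guava `ImmutableMap` builders used in the diff, and only the `book1` fields visible in the snippet are included.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shared fixture holder; class and method names are illustrative only. */
final class PipelineTestData {
  private PipelineTestData() {}

  /** Returns an unmodifiable map of document ID to field values. */
  static Map<String, Map<String, Object>> bookDocs() {
    Map<String, Object> book1 = new LinkedHashMap<>();
    book1.put("title", "The Hitchhiker's Guide to the Galaxy");
    book1.put("author", "Douglas Adams");
    book1.put("genre", "Science Fiction");
    book1.put("published", 1979);

    Map<String, Map<String, Object>> docs = new LinkedHashMap<>();
    docs.put("book1", Collections.unmodifiableMap(book1));
    // The remaining documents from the existing setup would be added here.
    return Collections.unmodifiableMap(docs);
  }
}
```

Each test class would then call `PipelineTestData.bookDocs()` in its setup instead of rebuilding the map. Whether that indirection is worth losing the data-in-the-same-file readability raised above is the trade-off under discussion.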
| @@ -0,0 +1,36 @@ | |||
| /* | |||
| * Copyright 2024 Google LLC | |||
| @@ -0,0 +1,39 @@ | |||
| /* | |||
| * Copyright 2025 Google LLC | |||
| * @param key The key of the field to access. | ||
| * @return An {@link Expression} representing the value of the field. | ||
| */ | ||
| @BetaApi |
I'm assuming these beta annotations will be removed in a later PR.
I will remove these in this PR :)
**Overview**

This PR introduces support for pipeline subqueries, variable definitions, and joins in the Go Firestore SDK, achieving strict 1:1 feature parity with the Java SDK's pipeline APIs (googleapis/java-firestore#2323).

**What are Subqueries?**

In Firestore pipelines, subqueries allow you to embed an entire pipeline execution as a value within a single stage of an outer pipeline. This is incredibly powerful for performing complex "join-like" operations across different collections. For example, while querying a restaurants collection, you can use a subquery to fetch, filter, and aggregate all documents from a nested reviews subcollection, and embed that aggregated result (e.g., average_rating) directly into the restaurant document being returned. Subqueries can be evaluated into either an array of results (ToArrayExpression()) or a single scalar value (ToScalarExpression()).

**Key Features & API Additions:**

* **Subqueries (Joins):**
  * Implemented the Subcollection(path string) package-level function to instantiate relative-scope pipelines.
  * Added ToScalarExpression() and ToArrayExpression() methods on Pipeline to explicitly convert subqueries into expressions, allowing them to be seamlessly embedded inside stages like AddFields and Where.
* **Variable Definition & References:**
  * Introduced the Define pipeline stage and the AliasedExpressions variadic helper to ergonomically bind values to variables.
  * Added Variable("name") and CurrentDocument() top-level functions to reference bound values in subsequent pipeline stages.
* **Field Access & Overloading:**
  * Implemented GetField (accepting any to support both string and Expression arguments, mirroring Java's method overloading).

**Type Safety & Defensive Constraints:**

* Strict Aliasing: Expression.As() now explicitly returns *AliasedExpression (similar to Java: https://github.com/googleapis/java-firestore/blob/0c8188520dfbada0d3fef0719e4f95fc231306be/google-cloud-firestore/src/main/java/com/google/cloud/firestore/pipeline/expressions/Expression.java#L6608-L6621) rather than a generic Selectable interface, and the Define stage strictly requires []*AliasedExpression.
* Pipeline Scope Validation: Added validation to ensure relative-scope pipelines (e.g., those created via Subcollection) cannot be executed directly or passed into a Union() stage. Attempts to do so now return descriptive errors identical to the Java SDK's IllegalStateException and IllegalArgumentException.
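To make the restaurants/reviews example concrete, here is a minimal in-memory model of what a scalar subquery computes. It deliberately does not call the Firestore SDK: the class `SubquerySemanticsSketch`, its methods, and the `rating` field name are hypothetical, and returning `null` for a restaurant with no reviews is an assumption of this sketch, not documented backend behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration only; none of these names belong to the Firestore SDK. */
final class SubquerySemanticsSketch {
  private SubquerySemanticsSketch() {}

  /** Inner "subquery": average of "rating" over one parent's reviews, or null if there are none. */
  static Double averageRating(List<Map<String, Object>> reviews) {
    if (reviews.isEmpty()) {
      return null; // Assumption of this sketch, not a statement about Firestore.
    }
    double sum = 0;
    for (Map<String, Object> review : reviews) {
      sum += ((Number) review.get("rating")).doubleValue();
    }
    return sum / reviews.size();
  }

  /** Outer "pipeline": copies each restaurant and embeds the scalar result as average_rating. */
  static List<Map<String, Object>> addAverageRating(
      Map<String, Map<String, Object>> restaurants,
      Map<String, List<Map<String, Object>>> reviewsByRestaurant) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> entry : restaurants.entrySet()) {
      Map<String, Object> doc = new LinkedHashMap<>(entry.getValue());
      List<Map<String, Object>> reviews =
          reviewsByRestaurant.getOrDefault(entry.getKey(), List.of());
      // The inner computation is evaluated once per outer document.
      doc.put("average_rating", averageRating(reviews));
      out.add(doc);
    }
    return out;
  }
}
```

The point of the model is the evaluation order: the inner computation runs once per outer document, scoped to that document's own reviews, and its single value is attached as a field. An array-valued subquery would attach the list of matching review documents instead of one number.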
feat: add support for Pipeline subqueries
This PR introduces support for Pipeline Subqueries, allowing users to perform complex data transformations and aggregations on subcollections within a pipeline.
Key Features

* Pipeline.subcollection(String path): Initiate a pipeline on a subcollection of the current document.
* Pipeline.define(...): Bind values to variables for reuse within the pipeline scope.
* Expression.currentDocument(): Reference the current document in a pipeline stage.
* Expression.variable(String name): Access defined variables.
* Pipeline.toArrayExpression() and Pipeline.toScalarExpression(): Convert subquery results into array or scalar expressions for use in parent pipelines.

Testing

* ITPipelineSubqueryTest.java containing integration tests for subquery functionality.