feat(firestore): literals pipeline stage #16028
Linchin wants to merge 23 commits into googleapis:main
Code Review
This pull request introduces a new literals pipeline stage, which allows specifying a fixed set of documents as the starting point of a pipeline. The implementation includes the literals method on the pipeline builder, the Literals stage class, and corresponding unit and end-to-end tests. My review focuses on improving the clarity and correctness of the type hints and docstrings for the new functionality. I've suggested changes to make the API easier to understand and use correctly.
            stages.FindNearest(field, vector, distance_measure, options)
        )

    def literals(self, *documents: str | Selectable) -> "_BasePipeline":
There was a problem hiding this comment.
The type hint for *documents is incomplete. It should include dict as documents are often passed as dictionaries, which is not covered by str | Selectable.
Suggested change:
- def literals(self, *documents: str | Selectable) -> "_BasePipeline":
+ def literals(self, *documents: dict | str | Selectable) -> "_BasePipeline":
There was a problem hiding this comment.
You can probably ignore this, unless that's how other languages handle it
There was a problem hiding this comment.
Actually, looking at Go, it seems like it accepts dicts, but not strings?
I don't know much about this stage, but from what I've seen, it's supposed to deal with maps. So maybe this should be def literals(self, *documents: Map | dict[str, CONSTANT_TYPE] | Selectable)?
There was a problem hiding this comment.
On further thought, I think it should be def literals(self, *documents: dict | Expression):. In this case both Constant and Map are child classes of Expression.
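The signature proposed at this point in the thread can be sketched with stand-in classes. Everything below is hypothetical scaffolding for illustration (the `Expression`, `Constant`, and `Map` classes here are not the library's own); it only shows that a `dict | Expression` annotation admits plain dicts and any `Expression` subclass while rejecting bare strings.

```python
from __future__ import annotations


class Expression:
    """Stand-in base class (hypothetical, not the library's Expression)."""


class Constant(Expression):
    """Stand-in wrapper around a single value."""

    def __init__(self, value):
        self.value = value


class Map(Expression):
    """Stand-in wrapper around a mapping of field names to values."""

    def __init__(self, entries: dict):
        self.entries = entries


def literals(*documents: dict | Expression) -> list:
    """Collect documents, rejecting anything that is not a dict or Expression."""
    for doc in documents:
        if not isinstance(doc, (dict, Expression)):
            raise TypeError(f"unsupported document type: {type(doc).__name__}")
    return list(documents)


# Plain dicts and both Expression subclasses pass the check.
accepted = literals({"name": "alice"}, Map({"age": 40}), Constant(1))
```

Note that `Constant(1)` is accepted under this annotation even though it is not document-shaped, which is the concern raised later in the review.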
...     {"name": "alice", "age": 40}
... ]
>>> pipeline = client.pipeline()
...     .literals(Constant.of(documents))
There was a problem hiding this comment.
Looking at the code, it seems like:
- Constant isn't a Selectable
- Constant doesn't seem like it supports dict types. (We do have a Map, which serves that purpose, but it doesn't seem Selectable either)
There was a problem hiding this comment.
Thanks for catching this! I spent some time checking the internal docs, and I think the Expression class and dict should both be supported, per the following language:
While literal values are the most common, it is also possible to pass in
expressions, which will be evaluated and returned, making it possible to test
out different query / expression behavior without first needing to create some
test data.
    stringValue: "Douglas Adams"
  title:
    stringValue: "The Hitchhiker's Guide to the Galaxy"
name: literals
There was a problem hiding this comment.
We should also have tests here that cover the different input types we support
There was a problem hiding this comment.
Good catch! I added an additional type to the tests.
There was a problem hiding this comment.
I think we still need more examples. Both of these are dict-like, so it seems like a pretty basic test. What if someone passes in Constant(1)? Or Constant("test").byte_length()? We say we support all expressions, so how are non-dict types represented?
It would also be good to add some extra stages, to make sure this works like the others.
You can use Gemini to create a few extra test scenarios. I usually don't include assert_proto on all of them, because it can be excessive.
There was a problem hiding this comment.
Referring to the golang implementation, the only type accepted is a list of key-value pairs. So I don't think Constant(1) or Constant("test").byte_length() are in scope.
But I think it's a good idea to add more tests. I will also use mapValue instead of Constant.
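To make the "different input types" question concrete, here is a minimal sketch of how plain Python values could map onto the `stringValue`-style keys that appear in the test fixtures. The field names follow the Firestore `Value` message in its JSON form (where 64-bit integers are rendered as strings); the `encode_value` helper itself is hypothetical and is not how the client library serializes documents.

```python
from __future__ import annotations

from typing import Any


def encode_value(value: Any) -> dict:
    """Encode a plain Python value as a JSON-style Firestore Value (sketch)."""
    if value is None:
        return {"nullValue": None}
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"booleanValue": value}
    if isinstance(value, int):
        return {"integerValue": str(value)}  # int64 is a string in JSON form
    if isinstance(value, float):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    if isinstance(value, list):
        return {"arrayValue": {"values": [encode_value(v) for v in value]}}
    if isinstance(value, dict):
        fields = {key: encode_value(val) for key, val in value.items()}
        return {"mapValue": {"fields": fields}}
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# A dict-shaped document becomes a mapValue whose fields are encoded in turn.
document = encode_value({"genre": "Science Fiction", "year": 1979})
```

A table of such cases (strings, integers, booleans, nested maps, arrays, nulls) is one way to cover the supported value types without asserting the full proto in every scenario.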
- Constant:
    value:
      genre: "Science Fiction"
      year: 1979
There was a problem hiding this comment.
This looks like it resolves to Constant({"genre": "Science Fiction", "year": 1979}). But our type hints forbid passing dicts as constants like this (mostly because Python's typing system isn't powerful enough to type nested data structures, but still).
We provide a Map, which is supposed to fill that gap. Maybe you should test that out, and see if it would work as a declared type here?
            stages.FindNearest(field, vector, distance_measure, options)
        )

    def literals(self, *documents: Expression | dict) -> "_BasePipeline":
There was a problem hiding this comment.
Are you sure it makes sense to accept Expression here? Maybe it should just be dict | Map?
Expressions can evaluate to anything, and I don't know if it makes sense to have an int or bool Document
Returns documents from a fixed set of predefined document objects.

This stage is commonly used for testing other stages in isolation,
though it can also be used as inputs to join conditions.
There was a problem hiding this comment.
This seems to conflict with the statement later:
The literals(...) stage can only be used as the first stage in a pipeline (or sub-pipeline)
There was a problem hiding this comment.
@daniel-sanche I think what the documentation means is that the Literals stage could serve as one of the inputs to a join operation, which does not contradict it being a first stage. So I think it is reasonable to keep this sentence and stay consistent with the documentation.
There was a problem hiding this comment.
I'd still prefer to take it out since we don't yet support the join stage. But if you think it seems to match the coming feature, we can probably leave it
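The "first stage only" rule quoted in this thread can be illustrated with a toy builder. This is a hypothetical sketch: `ToyPipeline` is not part of the library, and nothing in the thread says the real client enforces the rule on the client side. It only shows what the constraint means for stage ordering.

```python
from __future__ import annotations


class ToyPipeline:
    """Hypothetical immutable pipeline builder used only for illustration."""

    def __init__(self, stages: tuple = ()) -> None:
        self.stages = tuple(stages)

    def literals(self, *documents: dict) -> "ToyPipeline":
        # Assumed rule from the quoted documentation: literals must come first.
        if self.stages:
            raise ValueError("literals(...) must be the first stage")
        return ToyPipeline((("literals", documents),))

    def limit(self, count: int) -> "ToyPipeline":
        return ToyPipeline(self.stages + (("limit", count),))


# Valid: literals starts the pipeline and other stages follow it.
pipeline = ToyPipeline().literals({"name": "alice"}).limit(1)
```

Under the reading given in the reply above, a join would take a second pipeline as input, and that inner pipeline could itself begin with `literals(...)`, so the two statements need not conflict.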
        return self._create_pipeline(stages.Documents.of(*docs))

    def literals(
        self, *documents: Mapping[str, Expression | CONSTANT_TYPE]
There was a problem hiding this comment.
I think we should keep this as dict instead of Mapping, just because Firestore has its own custom Map type, which could get confusing
There was a problem hiding this comment.
Thanks for catching this, I have corrected it.

Args:
    *documents: One or more documents to be returned by this stage. Each can be a `dict`
        or an `Expression`.
There was a problem hiding this comment.
Each can be a `dict` or an `Expression`.
This seems out of date? The annotations only support dicts (which seems to be right, since that's how other languages handle it)
There was a problem hiding this comment.
Thanks for catching this. I think Expression here refers to the values in the dict, which can be of type Expression. I have updated the language to make it more accurate.
args = []
for doc in self.documents:
    if hasattr(doc, "_to_pb"):
        args.append(doc._to_pb())
There was a problem hiding this comment.
is this ever called? doc should be a dict, right? This section seems unreachable
There was a problem hiding this comment.
Thanks so much, I have removed this part, and also removed the related unit test.

    def __init__(self, *documents: Mapping[str, Expression | CONSTANT_TYPE]):
        super().__init__("literals")
        self.documents = documents  # type: ignore
There was a problem hiding this comment.
And why are you ignoring the type warning? What error were you seeing?
There was a problem hiding this comment.
If I remove # type: ignore, I get the following error message:
pipeline_stages.py:351: error: Need type annotation for "documents" [var-annotated]
If I change the type hint to match __init__, like the following:
self.documents: tuple[dict[str, Expression | CONSTANT_TYPE], ...] = documents
I get these mypy errors:
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | str], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | int], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | float], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | bool], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | datetime], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | bytes], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | GeoPoint], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | Vector], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
.nox/mypy-3-12/lib/python3.12/site-packages/google/cloud/firestore_v1/pipeline_stages.py:351: error: Incompatible types in assignment (expression has type "tuple[dict[str, Expression | None], ...]", variable has type "tuple[dict[str, Expression | CONSTANT_TYPE], ...]") [assignment]
It seems mypy couldn't work out that str, int, etc. are subtypes of CONSTANT_TYPE. I wonder if there is a better way to write the type hint here? Otherwise I don't think we lose much coverage with type: ignore.
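One possible reading of the errors above: a separate error for each concrete type on the same line is the pattern mypy produces when a function body is checked once per value of a constrained `TypeVar`. Assuming `CONSTANT_TYPE` is such a type variable (an assumption, not confirmed in this thread), a plain `Union` alias used for both the parameter and the attribute may avoid the mismatch. The names `ConstantValue` and `DocumentDict` below are hypothetical, and the sketch has not been run through mypy against the repository.

```python
from __future__ import annotations

import datetime
from typing import Dict, Tuple, TypeVar, Union


class Expression:
    """Stand-in for the library's Expression base class (hypothetical)."""


# Assumed shape of CONSTANT_TYPE: a TypeVar constrained to concrete types.
# A type variable is resolved per call, so it is awkward on an attribute.
CONSTANT_TYPE = TypeVar(
    "CONSTANT_TYPE", str, int, float, bool, datetime.datetime, bytes, None
)

# Hypothetical alias: a Union is an ordinary type with nothing to resolve.
ConstantValue = Union[str, int, float, bool, datetime.datetime, bytes, None]
DocumentDict = Dict[str, Union[Expression, ConstantValue]]


class Literals:
    """Sketch of a stage whose parameter and attribute share one alias."""

    def __init__(self, *documents: DocumentDict) -> None:
        # Same element type on both sides, so no "type: ignore" is needed here.
        self.documents: Tuple[DocumentDict, ...] = documents


stage = Literals({"title": "The Hitchhiker's Guide to the Galaxy"}, {"year": 1979})
```

Because `dict` is invariant in its value type, a caller holding a variable annotated as `dict[str, int]` would still be rejected by a strict checker; dict literals passed directly are inferred against the expected type and are unaffected.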

    def __init__(self, *documents: Mapping[str, Expression | CONSTANT_TYPE]):
        super().__init__("literals")
        self.documents = documents  # type: ignore
There was a problem hiding this comment.
Consider converting Python-native types into Expressions here, like we do in the other stages. This should make building _pr_args later cleaner, since everything would be an Expression with _to_pb
Something like:
def __init__(self, *documents: Mapping[str, Expression | CONSTANT_TYPE]):
    super().__init__("literals")
    # Wrap plain Python values so every entry is an Expression with _to_pb
    self.documents = [
        {k: v if isinstance(v, Expression) else Constant.of(v) for k, v in doc.items()}
        for doc in documents
    ]
There was a problem hiding this comment.
Thanks, I have fixed it.
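The normalization suggested in this thread can be tried out in isolation. The sketch below uses stand-in `Expression` and `Constant` classes (hypothetical, defined here only so the example runs without the library) and shows plain values being wrapped while existing expressions pass through unchanged.

```python
from __future__ import annotations

from typing import Any


class Expression:
    """Stand-in base class (hypothetical, not the library's Expression)."""


class Constant(Expression):
    """Stand-in constant wrapper exposing an `of` constructor."""

    def __init__(self, value: Any) -> None:
        self.value = value

    @classmethod
    def of(cls, value: Any) -> "Constant":
        return cls(value)


class Literals:
    """Sketch of a stage that normalizes every dict value to an Expression."""

    def __init__(self, *documents: dict[str, Any]) -> None:
        self.name = "literals"
        self.documents = [
            {
                key: value if isinstance(value, Expression) else Constant.of(value)
                for key, value in doc.items()
            }
            for doc in documents
        ]


age = Constant.of(40)
stage = Literals({"name": "alice", "age": age})
```

The comprehension iterates over `doc.items()`; iterating over a dict directly yields only its keys, so unpacking into a key and a value requires the explicit call.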
from google.cloud.firestore_v1.pipeline_expressions import Field

instance = self._make_client().pipeline()
documents = (Field.of("a"), {"name": "joe"})
There was a problem hiding this comment.
Field.of("a") isn't a dict
There was a problem hiding this comment.
Thanks, I have corrected it.
    return stages.Literals(*args, **kwargs)

def test_ctor(self):
    val1 = Constant.of({"a": 1})
There was a problem hiding this comment.
I think you should change this, since Constant doesn't support dict inputs (even if it technically works)
You could make this {"a": Constant.of(1)} to be conformant
It looks like this Constant.of(dict) pattern is used in a few other places in the tests
There was a problem hiding this comment.
Thanks, I have corrected this and also fixed other tests where the input is not a dict.
Makes sense, I have taken out the language.
Succeeding googleapis/python-firestore#1170 for the monorepo migration.