Fetching a document takes very long when it has a large map with a large index #351

Closed
Cvratingen opened this issue May 10, 2021 · 5 comments · Fixed by #458
Labels
api: firestore Issues related to the googleapis/python-firestore API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@Cvratingen
Cvratingen commented May 10, 2021

We have a database where we store information for some IoT devices.
Our devices routinely send out a value every 5, 60, or 300 seconds. I'm using a Firestore (Native mode) database to save these values.

I've got a program that aggregates the messages and updates all the documents every ~10 minutes with all messages received.
The structure of the database is as follows:

- sensors (collection)
    - sensor1 (document)
        -  sensor_readings (subcollection)
            - readings_10-05-2021 (document)
            - readings_11-05-2021 (document)
            - readings_12-05-2021 (document)
    - sensor2 
    - etc. 

The document "readings_10-05-2021" has a map with the values:

- readings_10-05-2021
      - values (map)
             - "00:00:01" : 120
             - "00:01:01" : 121
             - "00:02:01" : 122
             - etc.
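
The layout above can be sketched as plain Python data. This is a minimal illustration, not the aggregation program from this report: the function name `group_readings_by_day` and the sample readings are made up, and the DD-MM-YYYY document ID and "HH:MM:SS" keys are inferred from the examples shown.

```python
from collections import defaultdict
from datetime import datetime


def group_readings_by_day(readings):
    """Group (timestamp, value) pairs into one payload per daily document.

    Hypothetical helper: returns {document_id: {"values": {"HH:MM:SS": value}}}.
    """
    documents = defaultdict(lambda: {"values": {}})
    for timestamp, value in readings:
        # Document IDs follow the "readings_DD-MM-YYYY" pattern shown above.
        doc_id = f"readings_{timestamp:%d-%m-%Y}"
        documents[doc_id]["values"][f"{timestamp:%H:%M:%S}"] = value
    return dict(documents)


# Made-up sample readings, for illustration only.
readings = [
    (datetime(2021, 5, 10, 0, 0, 1), 120),
    (datetime(2021, 5, 10, 0, 1, 1), 121),
    (datetime(2021, 5, 11, 0, 0, 1), 119),
]
payloads = group_readings_by_day(readings)
print(sorted(payloads))
```

Each payload could then be written to its document under `sensors/sensor1/sensor_readings`, for example with `DocumentReference.set(payload, merge=True)`, though the report does not say which write call was actually used.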

I'm trying to retrieve this document with:

db.collection("sensors").document("sensor1").collection("sensor_readings").document("readings_10-05-2021").get()

When the map has only 1 value, it takes about 0.19 seconds.
When the map has 1440 values, it takes about 1.5 seconds.
When the map has 19200 values, it takes about 400 seconds.

I've tested this with documents of roughly the same size in bytes to account for download speeds.

Note: fetching this document with the large map on the Google Cloud terminal takes ~1 second.
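
To put the timings above in perspective, here is a rough back-of-the-envelope calculation. It assumes the two larger measurements are representative single runs and ignores fixed per-request overhead, so treat the result as an estimate only.

```python
import math

# (number of map entries, observed fetch time in seconds), as reported above.
observations = [(1440, 1.5), (19200, 400.0)]

(n1, t1), (n2, t2) = observations
size_ratio = n2 / n1   # how many times more entries
time_ratio = t2 / t1   # how many times slower

# If time grows like n**k, then k = log(time_ratio) / log(size_ratio).
exponent = math.log(time_ratio) / math.log(size_ratio)
print(f"{size_ratio:.1f}x more entries, {time_ratio:.0f}x slower, k = {exponent:.2f}")
```

The estimated exponent comes out a little above 2. That suggests the fetch time grows roughly quadratically with the number of map entries rather than linearly with the payload size.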

As advised by @samtstern, I'm posting this here.

Environment details

Python 3.9.2
pip 21.1.1
google-cloud-firestore 2.1.1

@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/python-firestore API. label May 10, 2021
@tseaver
Contributor

tseaver commented May 10, 2021

@Cvratingen Can you show the full output of pip list in your environment? There have been recent changes to the Python protobuf package which might help with your issue.

@tseaver tseaver added api: clouddebugger Issues related to the Stackdriver Debugger API. needs more info This issue needs more information from the customer to proceed. priority: p2 Moderately-important priority. Fix may not be included in next release. labels May 10, 2021
@Cvratingen
Author

I updated the protobuf package.
My environment:

Package                  Version
------------------------ ---------
CacheControl             0.12.6
cachetools               4.2.2
certifi                  2020.12.5
cffi                     1.14.5
chardet                  4.0.0
cycler                   0.10.0
firebase-admin           4.0.0
google-api-core          1.26.3
google-api-python-client 2.3.0
google-auth              1.30.0
google-auth-httplib2     0.1.0
google-cloud-core        1.6.0
google-cloud-firestore   2.1.1
google-cloud-storage     1.38.0
google-crc32c            1.1.2
google-resumable-media   1.2.0
googleapis-common-protos 1.53.0
grpcio                   1.37.1
httplib2                 0.19.1
idna                     2.10
kiwisolver               1.3.1
matplotlib               3.4.2
msgpack                  1.0.2
numpy                    1.20.2
packaging                20.9
pandas                   1.2.4
Pillow                   8.2.0
pip                      21.1.1
proto-plus               1.18.1
protobuf                 3.16.0
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pycparser                2.20
pyparsing                2.4.7
python-dateutil          2.8.1
pytz                     2021.1
requests                 2.25.1
rsa                      4.7.2
setuptools               56.0.0
six                      1.15.0
uritemplate              3.0.1
urllib3                  1.26.4

I've since changed the structure of my setup, but I created a simple test document with the following map:

{"a": {f"{x}": x for x in range(19200)}}

Fetching this document took 392.7523 seconds.

@product-auto-label product-auto-label bot removed the api: clouddebugger Issues related to the Stackdriver Debugger API. label May 11, 2021
@tseaver
Contributor

tseaver commented Sep 23, 2021

I can confirm what seems an abnormally long time to fetch the document, after only a relatively short interval (a few seconds?) to save it:

>>> import time
>>> from google.cloud.firestore_v1 import Client
>>> client = Client()
>>> collection = client.collection("repro-351")
>>> document = collection.document("doc-id")
>>> data = {"a": {f"{x}": x for x in range(19200)}}
>>> document.set(data)
>>> before = time.time(); snapshot = document.get(); after = time.time()
>>> print(f"Time: {after - before} seconds")
Time: 193.5851743221283 seconds
>>> snapshot.to_dict() == data
True

@tseaver tseaver added external This issue is blocked on a bug with the actual product. and removed needs more info This issue needs more information from the customer to proceed. labels Sep 23, 2021
@tseaver
Contributor

tseaver commented Sep 23, 2021

Hmm, _helpers.decode_dict could be playing a role in this issue, likely due to not stripping away the proto-plus wrappers.
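
As a toy illustration of the general idea of "stripping wrappers", the sketch below compares decoding through a wrapper layer with decoding the underlying data directly. The `Wrapper` class and both decode functions are hypothetical; this is not proto-plus, not `_helpers.decode_dict`, and not the change in any PR. It only counts wrapper objects and does not reproduce the slowdown measured above.

```python
class Wrapper:
    """Hypothetical wrapper layer that wraps every value it hands out."""

    created = 0  # counts how many wrapper objects get built

    def __init__(self, raw):
        Wrapper.created += 1
        self._raw = raw

    def items(self):
        # Every value handed out is wrapped again: a per-access cost.
        for key, value in self._raw.items():
            yield key, Wrapper(value)


def decode_through_wrappers(wrapper):
    """Walk the structure via the wrapper API, wrapping at every step."""
    if not isinstance(wrapper._raw, dict):
        return wrapper._raw
    return {key: decode_through_wrappers(value) for key, value in wrapper.items()}


def _decode_raw(raw):
    """Walk plain data without creating any wrappers."""
    if not isinstance(raw, dict):
        return raw
    return {key: _decode_raw(value) for key, value in raw.items()}


def decode_stripped(wrapper):
    """Strip the wrapper once, then walk the plain underlying data."""
    return _decode_raw(wrapper._raw)


data = {"a": {f"{x}": x for x in range(19200)}}

Wrapper.created = 0
slow = decode_through_wrappers(Wrapper(data))
wrapped_count = Wrapper.created

Wrapper.created = 0
fast = decode_stripped(Wrapper(data))
stripped_count = Wrapper.created

print(f"wrappers built: {wrapped_count} via wrappers, {stripped_count} when stripped")
```

Both paths return the same dictionary. The difference is how much per-value work the wrapper layer adds along the way, which is the kind of overhead suspected here.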

@tseaver tseaver removed the external This issue is blocked on a bug with the actual product. label Sep 23, 2021
@tseaver tseaver self-assigned this Sep 23, 2021
@tseaver tseaver added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Sep 23, 2021
@tseaver
Contributor

tseaver commented Sep 23, 2021

Running the same document.get() with the patch from PR #458:

>>> import time
>>> from google.cloud.firestore_v1 import Client
>>> client = Client()
>>> collection = client.collection("repro-351")
>>> data = {"a": {f"{x}": x for x in range(19200)}}
>>> document = collection.document("doc-id")
>>> before = time.time(); snapshot = document.get(); after = time.time()
>>> print(f"Time: {after - before} seconds")
Time: 0.7956831455230713 seconds
>>> snapshot.to_dict() == data
True

crwilcox pushed a commit that referenced this issue Sep 23, 2021
* chore: remove obsolete skip for old Python 3 versions

* perf: strip proto wrappers in '_helpers.decode_{value,dict}'

Closes #351.