Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firestore listener stops receiving changes after a long time but without throwing an error #18

Closed
NixBiks opened this issue Feb 6, 2020 · 50 comments
Assignees
Labels
api: firestore Issues related to the googleapis/python-firestore API. priority: p2 Moderately-important priority. Fix may not be included in next release. repro needed 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@NixBiks
Copy link

NixBiks commented Feb 6, 2020

Environment details

  • OS type and version: debian-10-buster-v20191210 (GCP image)
  • Python version: 3.8.1
  • pip version: 19.2.3
  • google-cloud-firestore version: 1.6.1

Steps to reproduce

I have a python script that sets up a listener to new documents in a firestore collection using on_snapshot on a query. To keep the script alive I have an infinite while loop in the end. Everything works fine but after several hours it stops receiving new documents (or the callback is not executed at least) - yesterday it was after ~7 hours.

In an earlier version of google-cloud-firestore (1.5.0) there was a related issue where the listener got disconnected after 1 hour. I made my own ugly hack back then but it worked at least:

Class ...
    .
    .
    .
    def _connect(self):
        now = None
        query = self.query
        if self.add_time_restriction:
            now = datetime.datetime.utcnow()
            query = query.where('updated', '>=', now)

        logger.info('Subscribing to query{}'.format(" after {}".format(now) if self.add_time_restriction else ""))
        query_watch = query.on_snapshot(callback=self.callback)
        if self.query_watch is not None:
            logger.info("Unsubscribe to former watch")
            self.query_watch.unsubscribe()

        self.query_watch = query_watch
        threading.Timer(3000, self._connect).start()

Code example (sample)

Since the 1 hour issue was fixed I now do

class Firestore:
    def __init__(self):
        self.client = Client()
        self._unsubscribe_news = lambda: None

    def watch_news(
        self
    ):
        def on_snapshot(col_snapshot, changes, read_time):
            logger.debug(f"Parsing news changes at {read_time}")

        self._unsubscribe_news()
        now = datetime.datetime.utcnow()
        query_watch = self.client.collection("news").where("updated", ">", now).on_snapshot(
            on_snapshot
        )
        logger.info(f"News listener set up at {self.client.project} from {now}")
        self._unsubscribe_news = query_watch.unsubscribe

and then execute a script like

firestore = Firestore()
firestore.watch_news()
while True:
    pass

Stack trace

Unfortunately no stack trace since the listener just stops receiving messages.

I see this as the core feature of Firestore so hopefully it'll get the right attention right away (some of our core business is to trigger python applications on news from firestore so we are extremely dependent on this).

@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/python-firestore API. label Feb 6, 2020
@NixBiks
Copy link
Author

NixBiks commented Feb 6, 2020

Just noticed this which seems like the same issue: firebase/firebase-admin-python#294. It concerns me that is has been open since May last year - isn't this considered a core feature?

@NixBiks
Copy link
Author

NixBiks commented Feb 6, 2020

Btw. I always start two processes running in the background. Both listen to the same news but handle it differently. In the end both processes stopped receiving news at the same time.

@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Feb 7, 2020
@crwilcox crwilcox added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 12, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 12, 2020
@BenWhitehead
Copy link
Collaborator

@crwilcox or @tseaver Any ideas here about what could be going on as far as the python code is concerned? I've not yet heard about this happening in other languages.

@BenWhitehead BenWhitehead added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed 🚨 This issue needs some love. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 14, 2020
@tseaver
Copy link
Contributor

tseaver commented Feb 20, 2020

@mr-bjerre About how frequent are the updates before you stop receiving them? Also, could you try enabling logging in your script and report the output of the grpc-related entries near the time when you cease getting updates? E.g.:

import logging

logging.basicConfig(filename='python-firestore-18.log', level=logging.DEBUG)

@NixBiks
Copy link
Author

NixBiks commented Mar 4, 2020

It seems to be very random when I stop receiving them. It has happened after 20 minutes some times and also after 7 hours some times. I've created a log like you suggested but I don't see any grpc related entries?

@NixBiks
Copy link
Author

NixBiks commented Mar 4, 2020

Oh wait I did find something in my logs where it broke (FYI I have four processes running - not entirely sure if each process writes to the log :-S)

DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:plx_python.util.old_firestore:Parsing news changes at 2020-03-04 09:33:56+00:00
DEBUG:plx_python.util.old_firestore:ADDED: 707ab56fa5f2f6f4
INFO:plx_python.util.thirdparty:Received 707ab56fa5f2f6f4
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f7185ff6940>> caused <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1583314580.826689957","description":"Error received from peer ipv4:108.177.15.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7f7185ff6940>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/token HTTP/1.1" 200 208
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE-ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.

DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:recved response.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7fa6f4ad1340>> caused <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1583314582.251496277","description":"Error received from peer ipv4:74.125.133.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7fa6f4ad1340>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:Empty queue and inactive call, exiting request generator.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>> caused <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1583314582.657813506","description":"Error received from peer ipv4:74.125.140.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"
>.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:Resetting dropped connection: metadata.google.internal
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Re-opening stream from gRPC callback.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/?recursive=true HTTP/1.1" 200 153
INFO:google.api_core.bidi:Re-established stream
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:Call to retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>> caused <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Transport closed"
        debug_error_string = "{"created":"@1583314582.799235377","description":"Error received from peer ipv4:74.125.140.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Transport closed","grpc_status":14}"
>.
DEBUG:google.api_core.bidi:Re-opening stream from retryable <bound method ResumableBidiRpc._recv of <google.api_core.bidi.ResumableBidiRpc object at 0x7ff601c3f760>>.
DEBUG:google.api_core.bidi:Stream was already re-established.
DEBUG:google.api_core.bidi:recved response.
DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/MY-SERVICE_ACCOUNT/token HTTP/1.1" 200 208
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:recved response.
DEBUG:google.api_core.bidi:waiting for recv.
DEBUG:google.api_core.bidi:waiting for recv.

and then the log just continues likes that

@juancruzmartino
Copy link

I am experiencing the same issue! Wasn't able to solve it yet.

@r-hoeve
Copy link

r-hoeve commented Apr 13, 2020

@crwilcox or @tseaver Any ideas here about what could be going on as far as the python code is concerned? I've not yet heard about this happening in other languages.

I was directed to this thread by a google employee through my own Firestore IssueTracker question #152867838. Re: I've not yet heard about this happening in other languages - I am experiencing the same issue with Node.js firebase-admin SDK. My snapshot listener stops listening silently, without throwing any errors after anywhere from 2 - 24 hours.

@MaticConradi
Copy link

Same issue here, still looking for a solution

@crwilcox
Copy link
Contributor

crwilcox commented Jun 3, 2020

@MaticConradi @r-hoeve @juancruzmartino could you share your library versions, python version, and where the code experiencing the failure is executing?

@MaticConradi
Copy link

Python version: 3.7.5
google-cloud-firestore: 1.7.0
google-api-core[grpc]: 1.17.0

Snapshots are created with:

app = App()
app.init_listeners()
def init_listeners(self):
    database.collection("accounts").on_snapshot(self.update_account_properties)
    # and 5 other snapshots
def update_account_properties(self, settings, changes, timestamp):
    for change in changes:
        # processing changes

All code executes just fine for a while, but at some point the update_account_propertiesdoesn't get called anymore, and no exceptions are thrown.

@ikendra
Copy link

ikendra commented Jun 25, 2020

I'm experiencing the same problem with Python 3.8.3 and google-cloud-firestore 1.6.1. The callback method specified as the parameter of on_snapshot() stops being called, without exception thrown or error message, after arbitrary amount of time, usually a few hours.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Aug 4, 2020
@tseaver
Copy link
Contributor

tseaver commented Aug 19, 2020

I'm working this week on a reproducer app for this bug in Python: it will run on "Cloud Run", with a listener process and several different "mutator" process, making writes at different intervals.

@tseaver
Copy link
Contributor

tseaver commented Aug 20, 2020

@mr-bjerre Working on a reproducer script, I've come across a couple of questions:

  • The RST_STREAM frames with error code 0x0 seem to be handled "correctly" in the log you posted (the stream is recreated). Do you see any correlation of RST_STREAM log entries with having your listener stop receiving updates?
  • What kind of architecture is generating the updates? Do you have one, a few, dozens, or hundreds or more processes generating updates to the documents addresses by your query?

@NixBiks
Copy link
Author

NixBiks commented Aug 20, 2020

I haven’t noticed any correlation no with anything really. But I don’t think that I qualify to really answer that - I’m not exactly sure what to look for. Maybe some of the other users here can comment on that.

Regarding the architecture - you are asking how many processes write to the collection that I listen to? If that’s the case then the answer is a few

@tseaver
Copy link
Contributor

tseaver commented Aug 27, 2020

I guess I'll let it run with 1.9.0 for now and see if I see the bug in the next few days.

Thanks! Doing the "bisect" hunt would be helpful once we think we have the issue fixed, given that we don't believe it necessarily ever worked prior to 1.6.1.

@tseaver
Copy link
Contributor

tseaver commented Aug 27, 2020

@AntonioLule What version of google-cloud-firestore are you using when you experience the issue? Can you upgrade to 1.9.0 and see if that helps?

@ikendra, @MaticConradi, @juancruzmartino Can you please try updating to google-cloud-firestore 1.9.0 and see if recent changes have fixed the issue?

@MaticConradi
Copy link

Running 1.9.0. It seems like the issue is (almost?) entirely fixed. I'm think I saw this happen again a few days ago, but not entirely sure, so I'll keep an eye on this

@ikendra
Copy link

ikendra commented Aug 28, 2020

Thanks @tseaver. I just upgraded to google-cloud-firestore==1.9.0 and I'm now running the version of my application which was displaying the problem consistently with google-cloud-firestore==1.6.1. Allow me a day or two please - I will update you on my observations then.

@ikendra
Copy link

ikendra commented Aug 29, 2020

Hi @tseaver, I unfortunately still see the same issue also with google-cloud-firestore==1.9.0: after some time, the callback method specified as the parameter of on_shapshot() method stops being called. Everything worked fine for ca. 20 hours before the problem started.

I checked all I could/knew to check to rule out a mistake at my side:

  • My app is a simple script which listens to changes to collection A and upon any new document added to A it writes a new document to collection B
  • The app is practically idle. During my last test I manually added one or two documents to collection A an hour to see whether the callback method is called and a document in collection B is created
  • The app runs on google cloud, on a kubernetes cluster
  • At the moment when I saw that changes to A are not detected anymore (callback method was not called), I saw that my app was still up and running, there were no items in container or audit logs.
  • I restarted the app and everything works fine again.

I now added some extra logging to try and understand beter what's going on.

@ikendra
Copy link

ikendra commented Aug 30, 2020

The same issue with google-cloud-firestore==1.9.0 again, this time after ca. 30 hours. The only difference compared to google-cloud-firestore==1.6.1 appears to be that the problem in my setup occurs after considerably longer time (<10 hours with 1.6.1 vs. 20/30 hours with 1.9.0 with the same code and very similar amount of data store operations per hour)

@NixBiks
Copy link
Author

NixBiks commented Aug 31, 2020

I've had my application running for 4 days and the listener is still working (no way of telling if it was broken at some point though). I'm on google-cloud-firestore==1.9.0

@tseaver
Copy link
Contributor

tseaver commented Sep 1, 2020

@ikendra Can you post the output from running pip list in your environment? I'm wondering if you might have an older version of a dependency, or something. For comparison, here is the output in the environment where I've been trying to reproduce this bug:

$ venv/bin/pip list
Package                  Version   Location
------------------------ --------- -------------------------------------------------------------
cachetools               4.1.1
certifi                  2020.6.20
chardet                  3.0.4
google-api-core          1.22.1
google-auth              1.20.1
google-cloud-core        1.4.1
google-cloud-firestore   2.0.0.dev1    /path/to/python-firestore
googleapis-common-protos 1.52.0
grpcio                   1.31.0
idna                     2.10
libcst                   0.3.10
mypy-extensions          0.4.3
pip                      20.2.2
proto-plus               1.7.1
protobuf                 3.13.0
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pytz                     2020.1
PyYAML                   5.3.1
requests                 2.24.0
rsa                      4.6
setuptools               49.6.0
six                      1.15.0
typing-extensions        3.7.4.2
typing-inspect           0.6.0
urllib3                  1.25.10
wheel                    0.35.1

@NixBiks
Copy link
Author

NixBiks commented Sep 1, 2020

I'm afraid my listener is broken now as well. Survived 4 days but not 5

@tseaver
Copy link
Contributor

tseaver commented Sep 1, 2020

@mr-bjerre Can you provide output of pip list for that environment?

@NixBiks
Copy link
Author

NixBiks commented Sep 1, 2020

Package                      Version   Location
---------------------------- --------- -------------
attrs                        19.3.0
beautifulsoup4               4.8.2
blis                         0.4.1
CacheControl                 0.12.6
cachetools                   4.1.1
catalogue                    1.0.0
certifi                      2020.6.20
cffi                         1.14.2
chardet                      3.0.4
click                        7.1.2
cymem                        2.0.3
fastapi                      0.61.0
firebase-admin               4.3.0
google-api-core              1.22.1
google-api-python-client     1.10.0
google-auth                  1.20.1
google-auth-httplib2         0.0.4
google-cloud-core            1.4.1
google-cloud-error-reporting 0.34.0
google-cloud-firestore       1.9.0
google-cloud-logging         1.15.1
google-cloud-pubsub          1.7.0
google-cloud-storage         1.31.0
google-crc32c                1.0.0
google-resumable-media       1.0.0
googleapis-common-protos     1.52.0
grpc-google-iam-v1           0.12.3
grpcio                       1.31.0
gunicorn                     20.0.4
h11                          0.9.0
html5lib                     1.0.1
httplib2                     0.18.1
httptools                    0.1.1
idna                         2.10
jsonschema                   3.2.0
lxml                         4.5.0
more-itertools               8.4.0
msgpack                      1.0.0
murmurhash                   1.0.2
numpy                        1.18.1
packaging                    20.4
pandas                       1.0.1
pip                          20.2.2
plac                         1.1.3
pluggy                       0.13.1
plx-util                     0.0.1     /opt/plx_util
preshed                      3.0.2
protobuf                     3.13.0
py                           1.9.0
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pycparser                    2.20
pydantic                     1.4
pyparsing                    2.4.7
pyrsistent                   0.16.0
pytest                       5.3.5
python-dateutil              2.8.1
pytz                         2020.1
requests                     2.24.0
rsa                          4.6
setuptools                   49.3.1
six                          1.15.0
soupsieve                    2.0.1
spacy                        2.3.2
srsly                        1.0.2
starlette                    0.13.6
thinc                        7.4.1
tqdm                         4.48.2
typer                        0.0.8
uritemplate                  3.0.1
urllib3                      1.25.10
uvicorn                      0.11.8
uvloop                       0.14.0
wasabi                       0.7.1
wcwidth                      0.2.5
webencodings                 0.5.1
websockets                   8.1
wheel                        0.34.2
xxhash                       1.4.3

@MaticConradi
Copy link

Yup my listener also breaks, though it does take longer before it happens.

@MaticConradi
Copy link

MaticConradi commented Sep 3, 2020

Pip list:

Package                      Version
---------------------------- -----------
aiodns                       1.1.1
aiohttp                      3.6.2
async-timeout                3.0.1
attrs                        20.1.0
base58                       2.0.1
bitarray                     1.2.2
cachetools                   4.1.1
ccxt                         1.33.93
certifi                      2020.6.20
cffi                         1.14.2
chardet                      3.0.4
click                        7.1.2
cryptography                 3.1
cytoolz                      0.10.1
dblpy                        0.3.4
decorator                    4.4.2
discord.py                   1.4.1
eth-abi                      2.1.1
eth-account                  0.5.3
eth-hash                     0.2.0
eth-keyfile                  0.5.1
eth-keys                     0.3.3
eth-rlp                      0.2.0
eth-typing                   2.2.2
eth-utils                    1.9.5
futures                      3.1.1
google-api-core              1.22.1
google-assistant-grpc        0.2.1
google-assistant-sdk         0.6.0
google-auth                  1.21.0
google-auth-oauthlib         0.4.1
google-cloud-core            1.4.1
google-cloud-error-reporting 1.0.0
google-cloud-firestore       1.9.0
google-cloud-logging         1.15.1
google-cloud-storage         1.31.0
google-cloud-vision          1.0.0
google-crc32c                1.0.0
google-resumable-media       1.0.0
googleapis-common-protos     1.52.0
grpcio                       1.31.0
hexbytes                     0.2.1
idna                         2.10
iexfinance                   0.4.3
ipfshttpclient               0.6.1
jsonschema                   3.2.0
libcst                       0.3.10
lru-dict                     1.1.6
multiaddr                    0.0.9
multidict                    4.7.6
mypy-extensions              0.4.3
netaddr                      0.8.0
numpy                        1.19.1
oauthlib                     3.1.0
pandas                       1.1.1
parsimonious                 0.8.1
pathlib2                     2.3.5
pip                          20.2.2
proto-plus                   1.9.0
protobuf                     3.13.0
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pycares                      3.1.1
pycoingecko                  1.3.0
pycparser                    2.20
pycryptodome                 3.9.8
pyOpenSSL                    19.1.0
pyrsistent                   0.16.0
python-dateutil              2.8.1
python-telegram-bot          12.8
pytz                         2020.1
PyYAML                       5.3.1
pyzmq                        19.0.2
ratelimiter                  1.2.0.post0
requests                     2.24.0
requests-oauthlib            1.3.0
rlp                          2.0.0a1
rsa                          4.6
rusty-rlp                    0.1.15
setuptools                   49.1.2
six                          1.15.0
sounddevice                  0.3.15
stripe                       2.51.0
tenacity                     4.12.0
toolz                        0.10.0
tornado                      6.0.4
typing-extensions            3.7.4.3
typing-inspect               0.6.0
urllib3                      1.25.10
varint                       1.0.2
web3                         5.12.1
websockets                   8.1
wheel                        0.34.2
yarl                         1.5.1

@crwilcox
Copy link
Contributor

May be a tall ask, but do you have logs to match the failure? That has been the sticking point for fixing this: we know something is causing failures for users, but aren't aware of what that failure is.

@MaticConradi
Copy link

@crwilcox the general consensus I believe has been that there are no logs produced by this issue whatsoever. I have all function code enclosed inside try except blocks with Google Error reporting integrated, and there just isn't anything related to it coming up, not even network related errors, nothing.

@MaticConradi
Copy link

It also looks like the issue has been coming up more frequently for me lately, but this could be because of higher number of changes listeners had to have been handling on my end.

@crwilcox
Copy link
Contributor

@tseaver is your repro WIP PR capturing the failure at this point?

@crwilcox
Copy link
Contributor

@MaticConradi if you have the ability could you try 2.0.0dev1? I am hoping to release it in the next ~30 days, but have it in dev as it also brings along some new async surface and wanted a chance to experiment first.

The underlying protobuffer code is generated differently also so it is possible it has an effect.

@MaticConradi
Copy link

@crwilcox sure, I can look into migrating to v2 and seeing if that fixes it. Is there a list of known issues (if any) with existing functionality that I should be on the lookout for?

@crwilcox
Copy link
Contributor

I am not aware of issues, but there are some changes that are breaking. Particularly the way params are taken to some methods to protect against future API changes causing usability issues/breaks.

We have some scripts you can run on a codebase to move to this as well: https://github.com/googleapis/python-firestore/tree/master/scripts

Depending on what paths you use this could vary in the amount of change.

@MaticConradi
Copy link

Doesn't look like the issue is fixed in 2.0.0dev1, I can confirm the snapshot listeners break after a few hours with no error messages

@NixBiks
Copy link
Author

NixBiks commented Oct 2, 2020

Btw. I imagine that this is related; googleapis/nodejs-firestore#1023 (comment)

@tseaver
Copy link
Contributor

tseaver commented Oct 6, 2020

@crwilcox

Is your repro WIP PR capturing the failure at this point?

Nope.

@tseaver
Copy link
Contributor

tseaver commented Oct 9, 2020

@BenWhitehead, @crwilcox I'm not sure what is to be done here, beyond constructing the scripts to (attempt) reproducing this issue in #158.

@crwilcox
Copy link
Contributor

crwilcox commented Nov 4, 2020

@tseaver I also don't know what can be done to get a more firm repro. Anyone who is experiencing this, if you could try installing the candidate builds for v2.0.0 it is possible the work done to overhaul things a bit may have squashed this bug.

Also, if anyone has suggestions on how to reproduce the failure we would be interested to try to track this down.

@dmahugh
Copy link

dmahugh commented Jan 21, 2021

Closing this issue because it looks like v2.0.0 (released Nov 10, 2020) may have fixed it. If anyone experiences this behavior with v2.0 or above going forward, please open a new issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the googleapis/python-firestore API. priority: p2 Moderately-important priority. Fix may not be included in next release. repro needed 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests