Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

samples.snippets.subscriber_test: test_receive_with_delivery_attempts failed #291

Closed
flaky-bot bot opened this issue Feb 12, 2021 · 17 comments
Closed

samples.snippets.subscriber_test: test_receive_with_delivery_attempts failed #291

flaky-bot bot opened this issue Feb 12, 2021 · 17 comments

Comments

@flaky-bot
Copy link

@flaky-bot flaky-bot bot commented Feb 12, 2021

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: a8c8f53
buildURL: Build Status, Sponge
status: failed

Test output
publisher_client = 
topic = 'projects/python-docs-samples-tests/topics/subscription-test-topic-3.7-6de8422ca9fd4a8582af72286c76b407'
dead_letter_topic = 'projects/python-docs-samples-tests/topics/subscription-test-dead-letter-topic-3.7-6de8422ca9fd4a8582af72286c76b407'
subscription_dlq = 'projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.7-6de8422ca9fd4a8582af72286c76b407'
capsys = <_pytest.capture.CaptureFixture object at 0x7f7ccf021f90>
def test_receive_with_delivery_attempts(
    publisher_client, topic, dead_letter_topic, subscription_dlq, capsys
):
    _publish_messages(publisher_client, topic)
  subscriber.receive_messages_with_delivery_attempts(PROJECT_ID, SUBSCRIPTION_DLQ, 90)

subscriber_test.py:236:


subscriber.py:654: in receive_messages_with_delivery_attempts
streaming_pull_future.result(timeout=timeout)


self = <google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture object at 0x7f7ccf019cd0>
timeout = 90

def result(self, timeout=None):
    """Resolve the future and return a value where appropriate.

    Args:
        timeout (Union[int, float]): The number of seconds before this call
            times out and raises TimeoutError.

    Raises:
        concurrent.futures.TimeoutError: If the request times out.
        Exception: For undefined exceptions in the underlying
            call execution.
    """
    # Attempt to get the exception if there is one.
    # If there is not one, then we know everything worked, and we can
    # return an appropriate value.
    err = self.exception(timeout=timeout)
    if err is None:
        return self._result
  raise err

E google.api_core.exceptions.NotFound: 404 Resource not found (resource=projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.7-6de8422ca9fd4a8582af72286c76b407).

../../google/cloud/pubsub_v1/futures.py:105: NotFound

@flaky-bot
Copy link
Author

@flaky-bot flaky-bot bot commented Feb 12, 2021

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


When run at the same commit (a8c8f53), this test passed in one build (Build Status, Sponge) and failed in another build (Build Status, Sponge).

Loading

@plamut
Copy link
Contributor

@plamut plamut commented Feb 22, 2021

FWIW, I had trouble reproducing this locally, but it happened several times in a row on one of the PRs (link). test_receive_with_delivery_attempts failed was not the only failure, but appeared the most common in those Kokoro runs.

Loading

@flaky-bot
Copy link
Author

@flaky-bot flaky-bot bot commented Mar 7, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: f890d2d
buildURL: Build Status, Sponge
status: failed

Test output
publisher_client = 
topic = 'projects/python-docs-samples-tests/topics/subscription-test-topic-3.6-ebaf1382094248e6bc70ce51214a75c3'
dead_letter_topic = 'projects/python-docs-samples-tests/topics/subscription-test-dead-letter-topic-3.6-ebaf1382094248e6bc70ce51214a75c3'
subscription_dlq = 'projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.6-ebaf1382094248e6bc70ce51214a75c3'
capsys = <_pytest.capture.CaptureFixture object at 0x7fe8ef546710>
def test_receive_with_delivery_attempts(
    publisher_client, topic, dead_letter_topic, subscription_dlq, capsys
):
    _publish_messages(publisher_client, topic)
  subscriber.receive_messages_with_delivery_attempts(PROJECT_ID, SUBSCRIPTION_DLQ, 90)

subscriber_test.py:236:


subscriber.py:654: in receive_messages_with_delivery_attempts
streaming_pull_future.result(timeout=timeout)


self = <google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture object at 0x7fe8ef4a9a58>
timeout = 90

def result(self, timeout=None):
    """Resolve the future and return a value where appropriate.

    Args:
        timeout (Union[int, float]): The number of seconds before this call
            times out and raises TimeoutError.

    Raises:
        concurrent.futures.TimeoutError: If the request times out.
        Exception: For undefined exceptions in the underlying
            call execution.
    """
    # Attempt to get the exception if there is one.
    # If there is not one, then we know everything worked, and we can
    # return an appropriate value.
    err = self.exception(timeout=timeout)
    if err is None:
        return self._result
  raise err

E google.api_core.exceptions.NotFound: 404 Resource not found (resource=projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.6-ebaf1382094248e6bc70ce51214a75c3).

../../google/cloud/pubsub_v1/futures.py:105: NotFound

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 8, 2021

The error message google.api_core.exceptions.NotFound: 404 Resource not found is probably a new one.
I suspect that when flaky retries the test, somehow the test is using the same subscription as the first attempt, but that subscription is already deleted.

I'll take a look at the test.

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 8, 2021

Ah, this is the dlq subscription. I think 404 is thrown when the dlq subscription is not yet ready.
Currently the test is retrying NotFound up to 300 seconds. Maybe bumping that max_time helps.

Loading

@flaky-bot
Copy link
Author

@flaky-bot flaky-bot bot commented Mar 12, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 83c85bd
buildURL: Build Status, Sponge
status: failed

Test output
publisher_client = 
topic = 'projects/python-docs-samples-tests/topics/subscription-test-topic-3.7-c278a21861124e49ad4b189cd29bf364'
dead_letter_topic = 'projects/python-docs-samples-tests/topics/subscription-test-dead-letter-topic-3.7-c278a21861124e49ad4b189cd29bf364'
subscription_dlq = 'projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.7-c278a21861124e49ad4b189cd29bf364'
capsys = <_pytest.capture.CaptureFixture object at 0x7f9a3db3ed10>
def test_receive_with_delivery_attempts(
    publisher_client, topic, dead_letter_topic, subscription_dlq, capsys
):
    _publish_messages(publisher_client, topic)
  subscriber.receive_messages_with_delivery_attempts(PROJECT_ID, SUBSCRIPTION_DLQ, 90)

subscriber_test.py:236:


subscriber.py:654: in receive_messages_with_delivery_attempts
streaming_pull_future.result(timeout=timeout)


self = <google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture object at 0x7f9a3db478d0>
timeout = 90

def result(self, timeout=None):
    """Resolve the future and return a value where appropriate.

    Args:
        timeout (Union[int, float]): The number of seconds before this call
            times out and raises TimeoutError.

    Raises:
        concurrent.futures.TimeoutError: If the request times out.
        Exception: For undefined exceptions in the underlying
            call execution.
    """
    # Attempt to get the exception if there is one.
    # If there is not one, then we know everything worked, and we can
    # return an appropriate value.
    err = self.exception(timeout=timeout)
    if err is None:
        return self._result
  raise err

E google.api_core.exceptions.NotFound: 404 Resource not found (resource=projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.7-c278a21861124e49ad4b189cd29bf364).

../../google/cloud/pubsub_v1/futures.py:105: NotFound

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 13, 2021

My suspicion right now is that the test somehow failed to retry the sample invocation. From the last build log, I don't see any signs where the test retrying NotFound exception.

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 13, 2021

For example, the log says:

subscriber_test.py:236:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
subscriber.py:654: in receive_messages_with_delivery_attempts
    streaming_pull_future.result(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  1. The line no is wrong.
  2. There's no backoff._sync.py in the stack trace. Normally we see something like the following:
.nox/py-3-7/lib/python3.7/site-packages/backoff/_sync.py:94: in retry                                                                                              
    ret = target(*args, **kwargs) 

It seems like somehow the build was testing against the code without my backoff change. I don't know why it happened, but that's my interpretation of the logs.

Loading

@flaky-bot flaky-bot bot reopened this Mar 16, 2021
@flaky-bot
Copy link
Author

@flaky-bot flaky-bot bot commented Mar 16, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 34c9b11
buildURL: Build Status, Sponge
status: failed

Test output
publisher_client = 
topic = 'projects/python-docs-samples-tests/topics/subscription-test-topic-3.8-6f1b53aa1e5c4232a6d71bdd459ac06d'
dead_letter_topic = 'projects/python-docs-samples-tests/topics/subscription-test-dead-letter-topic-3.8-6f1b53aa1e5c4232a6d71bdd459ac06d'
subscription_dlq = 'projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.8-6f1b53aa1e5c4232a6d71bdd459ac06d'
capsys = <_pytest.capture.CaptureFixture object at 0x7f154fe3f3d0>
def test_receive_with_delivery_attempts(
    publisher_client, topic, dead_letter_topic, subscription_dlq, capsys
):
    _publish_messages(publisher_client, topic)
  subscriber.receive_messages_with_delivery_attempts(PROJECT_ID, SUBSCRIPTION_DLQ, 90)

subscriber_test.py:236:


subscriber.py:654: in receive_messages_with_delivery_attempts
streaming_pull_future.result(timeout=timeout)


self = <google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture object at 0x7f154fe522e0>
timeout = 90

def result(self, timeout=None):
    """Resolve the future and return a value where appropriate.

    Args:
        timeout (Union[int, float]): The number of seconds before this call
            times out and raises TimeoutError.

    Raises:
        concurrent.futures.TimeoutError: If the request times out.
        Exception: For undefined exceptions in the underlying
            call execution.
    """
    # Attempt to get the exception if there is one.
    # If there is not one, then we know everything worked, and we can
    # return an appropriate value.
    err = self.exception(timeout=timeout)
    if err is None:
        return self._result
  raise err

E google.api_core.exceptions.NotFound: 404 Resource not found (resource=projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.8-6f1b53aa1e5c4232a6d71bdd459ac06d).

../../google/cloud/pubsub_v1/futures.py:105: NotFound

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 16, 2021

Again, the log indicates that it tests without retrying code.

subscriber_test.py:236:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
subscriber.py:654: in receive_messages_with_delivery_attempts
    streaming_pull_future.result(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  1. The line no (236) is wrong
  2. Not seeing stacktrace for backoff

I'm trying to figure out why it's happening.

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 16, 2021

Probably not related, but I noticed that the builds are installing the library from source:

------------------------------------------------------------
- testing samples/snippets
------------------------------------------------------------
nox > Running session py-3.8
nox > Creating virtual environment (virtualenv) using python3.8 in .nox/py-3-8
nox > python -m pip install -r requirements.txt
nox > python -m pip install -r requirements-test.txt
nox > python -m pip install -e /tmpfs/src/github/python-pubsub
nox > pytest --junitxml=sponge_log.xml

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 16, 2021

Reported: googleapis/synthtool#988

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 16, 2021

I think I understand what's happening.

Our test script rewinds the repo to the latest release:

# Run periodic samples tests at latest release
if [[ $KOKORO_BUILD_ARTIFACTS_SUBDIR = *"periodic"* ]]; then
LATEST_RELEASE=$(git describe --abbrev=0 --tags)
git checkout $LATEST_RELEASE
fi

The test is running without the backoff fix.

@busunkim96

We fixed flaky tests for sample, but our periodic builds are running test at the latest release tag.

Should we make a release?

Loading

@busunkim96
Copy link
Collaborator

@busunkim96 busunkim96 commented Mar 16, 2021

@tmatsuo Yep, a release should close the issue. We can chat tomorrow about whether the "rewinding to last release" is the right thing to do or if periodic should just run samples at head.

Loading

@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 18, 2021

It should be fixed now for presubmit-head builds.

For a time being, we may see the same error in presubmit builds, which we can safely ignore because the presubmit builds are using the source code at the latest release so that it doesn't have the fix for the flake.

Loading

@tmatsuo tmatsuo closed this Mar 18, 2021
@flaky-bot
Copy link
Author

@flaky-bot flaky-bot bot commented Mar 27, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: d40d027
buildURL: Build Status, Sponge
status: failed

Test output
publisher_client = 
topic = 'projects/python-docs-samples-tests/topics/subscription-test-topic-3.8-3e3197671e994bb9a911396fd2e6203a'
dead_letter_topic = 'projects/python-docs-samples-tests/topics/subscription-test-dead-letter-topic-3.8-3e3197671e994bb9a911396fd2e6203a'
subscription_dlq = 'projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.8-3e3197671e994bb9a911396fd2e6203a'
capsys = <_pytest.capture.CaptureFixture object at 0x7fa5ed464a60>
def test_receive_with_delivery_attempts(
    publisher_client, topic, dead_letter_topic, subscription_dlq, capsys
):
    _publish_messages(publisher_client, topic)
  subscriber.receive_messages_with_delivery_attempts(PROJECT_ID, SUBSCRIPTION_DLQ, 90)

subscriber_test.py:236:


subscriber.py:654: in receive_messages_with_delivery_attempts
streaming_pull_future.result(timeout=timeout)


self = <google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture object at 0x7fa5ed47aee0>
timeout = 90

def result(self, timeout=None):
    """Resolve the future and return a value where appropriate.

    Args:
        timeout (Union[int, float]): The number of seconds before this call
            times out and raises TimeoutError.

    Raises:
        concurrent.futures.TimeoutError: If the request times out.
        Exception: For undefined exceptions in the underlying
            call execution.
    """
    # Attempt to get the exception if there is one.
    # If there is not one, then we know everything worked, and we can
    # return an appropriate value.
    err = self.exception(timeout=timeout)
    if err is None:
        return self._result
  raise err

E google.api_core.exceptions.NotFound: 404 Resource not found (resource=projects/python-docs-samples-tests/subscriptions/subscription-test-subscription-dlq-3.8-3e3197671e994bb9a911396fd2e6203a).

../../google/cloud/pubsub_v1/futures.py:105: NotFound

Loading

@flaky-bot flaky-bot bot reopened this Mar 27, 2021
@tmatsuo
Copy link
Member

@tmatsuo tmatsuo commented Mar 27, 2021

We fix the flake at head, but the periodic build uses the code without the fix (it rewinds the git to the latest release). It's expected to fail for now. The flake will go away once we release the next version.

We do have periodic builds against head. They seem to be green, closing.

Loading

@tmatsuo tmatsuo closed this Mar 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants