-
Notifications
You must be signed in to change notification settings - Fork 43
Description
My requests to the datastore keep on timing out with 504 errors, despite a very long retry handler. Here is the stack trace:
Traceback (most recent call last):
...
File "/home/vmagent/app/updater/views.py", line 6484, in _get_most_recent_update
doc = list(doc.order_by("-last_get")[:1])
File "/home/vmagent/app/django/db/models/query.py", line 96, in __iter__
self._fetch_all()
File "/home/vmagent/app/django/db/models/query.py", line 854, in _fetch_all
self._result_cache = list(self.iterator())
File "/home/vmagent/app/django/db/models/query.py", line 220, in iterator
for row in compiler.results_iter():
File "/home/vmagent/app/djangotoolbox/db/basecompiler.py", line 380, in results_iter
for entity in results:
File "/home/vmagent/app/djangoappengine/db_gcl/compiler.py", line 256, in fetch
for entity in results:
File "/env/local/lib/python2.7/site-packages/google/api_core/page_iterator.py", line 199, in _items_iter
for page in self._page_iter(increment=False):
File "/env/local/lib/python2.7/site-packages/google/api_core/page_iterator.py", line 230, in _page_iter
page = self._next_page()
File "/env/local/lib/python2.7/site-packages/google/cloud/datastore/query.py", line 517, in _next_page
query=query_pb,
File "/env/local/lib/python2.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", line 294, in run_query
request, retry=retry, timeout=timeout, metadata=metadata)
File "/env/local/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
return wrapped_func(*args, **kwargs)
File "/env/local/lib/python2.7/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "/home/vmagent/app/appengine_config.py", line 250, in new_retry_target
deadline, target), last_exc), last_exc)
File "/app/libs/six.py", line 692, in raise_from
raise value
RetryError: Deadline of 900.0s exceeded while calling <functools.partial object at 0x7fdd6b7e1a48>, last exception: 504 Deadline Exceeded
"
I am using the following versions of the API:
google-cloud-datastore==1.7.0
google-cloud-logging==1.6.0
google-cloud-storage==1.10.0
Running Python 2.7 on Google Cloud Flexible with the following app.yaml:
runtime: custom
api_version: 1
service: vm
env: flex
entrypoint: gunicorn -b :$PORT main:app
beta_settings:
enable_app_engine_apis: true
automatic_scaling:
min_num_instances: 3
max_num_instances: 50
cool_down_period_sec: 240
cpu_utilization:
target_utilization: 0.6
resources:
cpu: 8
memory_gb: 32
disk_size_gb: 20
It seems that requests to the datastore are timing out with 504 errors even after retrying many times. To better debug the issue, I monkey patched the google.api_core.retry_target function which inserts more logging on retry calls. The monkey patch is as follows (compare to original funciton here):
def improve_retry_logging():
'''Improve debug logging for our retry handler.'''
try:
from google.api_core import retry
except AllImportErrors as e:
return
logging.info("Monkey patching retry.retry_target: %s", retry)
def new_retry_target(target, predicate, sleep_generator, deadline,
on_error = None):
if deadline is not None:
deadline_datetime = (
retry.datetime_helpers.utcnow() +
retry.datetime.timedelta(seconds=deadline))
else:
deadline_datetime = None
last_exc = None
total_sleep = 0.0
for sleep_i, sleep in enumerate(sleep_generator):
try:
result = target()
# pylint: disable=broad-except
# This function explicitly must deal with broad exceptions.
except Exception as exc:
if not predicate(exc):
raise
last_exc = exc
if on_error is not None:
on_error(exc)
else:
if sleep_i > 0:
logging.info('retry_target success. Retries: %d. '
'Slept %.2f Secs', sleep_i, total_sleep)
return result
now = retry.datetime_helpers.utcnow()
if deadline_datetime is not None and deadline_datetime < now:
logging.warning('retry_target deadline reached. Retries: %d.'
'Deadline: %.1f S. Slept: %.2f S, Last %.2f S. Err: %s',
sleep_i, deadline, total_sleep, last_exc)
retry.six.raise_from(
retry.exceptions.RetryError('Retry ({}) deadline of '
'{:.1f}s exceeded while calling {}'.format(
sleep_i, deadline, target,), last_exc), last_exc)
logging.info('retry_target retry due to "%s" %s (%d), sleep %.2fs',
last_exc, target, sleep_i, sleep)
retry.time.sleep(sleep)
total_sleep += sleep
raise ValueError('Sleep generator stopped yielding sleep values.')
# Here we do the monkey patch.
retry.retry_target = new_retry_target
improve_retry_logging()
With the logging instrumentation above, you can clearly see the retry handler being hit multiple times, and failing every single time with these 504 errors.
E web deferred_free_update ctqueue 'Connecticut State, Supreme Court', '19966': Deadline of 900.0s exceeded while calling <functools.partial object at 0x7fdd6b7e1a48>, last exception: 504 Deadline Exceeded (50.03% mem, 187 threads) ...
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (7), sleep 3.21s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (7), sleep 3.21s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (6), sleep 13.02s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (5), sleep 6.76s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (4), sleep 4.10s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (3), sleep 2.17s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (2), sleep 0.08s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (1), sleep 0.50s 140588964886272
I retry_target retry due to "504 Deadline Exceeded" <functools.partial object at 0x7fdd6b7e1a48> (0), sleep 0.26s 140588964886272
Also, I have already opened support request 16456733, but they do not seem able to help.