Datastore: infrequent operations always fail first time, requires retry #899

Closed
timanovsky opened this issue Sep 15, 2016 · 17 comments

Comments

@timanovsky

Once we moved to the v1 API we saw a significant slowdown in one particular operation. Investigation suggested that only a particular type of operation is affected: infrequent reads (once every few tens of minutes), and unfortunately the servers performing these ops are not doing any other kind of Datastore operations. I believe it is due to the reason mentioned in grpc/grpc-java#1648: the Google load balancer shuts down inactive TCP connections after 10 minutes. So if the previous operation was further back than that, the new operation fails with an EOF error (I mentioned that in another issue). Effectively, what we see is that when an operation follows shortly after the previous one it takes 120-180 ms, but if a retry is involved it takes 1200 ms.

I think some kind of keep-alive / ping should be configured on the connection to prevent this. I'm not sure gRPC provides such a configuration option, though.

In the worst case, if I have to implement this keepalive myself in a background thread, what would be a good Datastore endpoint to hit so that it does not depend on any data being present?
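A minimal sketch of the background-keepalive workaround asked about above, assuming the google-cloud-datastore gem; the project id, kind name, and 5-minute interval are arbitrary placeholders, and a lookup of a nonexistent key is used only because it is a cheap round trip that does not depend on any stored data:

require "google/cloud/datastore"

datastore = Google::Cloud::Datastore.new project: "my-project"  # hypothetical project id

# Ping the Datastore API periodically so the underlying gRPC channel never
# sits idle long enough for the load balancer to drop it.
Thread.new do
  loop do
    begin
      datastore.find "KeepAlive", "ping"  # returns nil; the round trip is what matters
    rescue => e
      warn "keepalive ping failed: #{e.message}"  # a failed ping still exercises the connection
    end
    sleep 300  # 5 minutes, comfortably under the ~10 minute idle timeout
  end
end

This only keeps the channel busy; a gRPC-level keepalive (see the grpc releases discussed below) would be the cleaner fix.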

@blowmage
Contributor

@murgatroid99 does the GRPC client have a keep alive feature for non-streaming requests?

@timanovsky what is the operation that is affected in V1?

@timanovsky
Copy link
Author

@blowmage It is just a query (with an ancestor).

@quartzmo quartzmo added api: datastore Issues related to the Datastore API. grpc labels Sep 15, 2016
@quartzmo
Member

@timanovsky Is this still an issue? If not, can you close?

@danoscarmike danoscarmike added this to Cloud Datastore in First 4 (GA) Feb 21, 2017
@blowmage
Contributor

@timanovsky Is this still happening with grpc 1.1.2?

@timanovsky
Author

timanovsky commented Feb 22, 2017 via email

@blowmage
Contributor

The grpc 1.1.0 release made improvements to networking. If you install the latest gem you will get that version. Curious if it improves your situation.
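For anyone checking which grpc version is actually loaded at runtime, the gem exposes a version constant:

require "grpc"
puts GRPC::VERSION  # should print 1.1.x or newer after updating the gem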

@blowmage
Contributor

@murgatroid99 can you or someone else comment on keep alive in the GRPC lib?

@timanovsky
Author

timanovsky commented Feb 24, 2017 via email

@blowmage
Contributor

Thanks @timanovsky!

@landrito landrito added priority: p0 Highest priority. Critical issue. P0 implies highest priority. status: blocked Resolving the issue is dependent on other work. release blocking Required feature/issue must be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Mar 1, 2017
@landrito landrito self-assigned this Mar 1, 2017
@Rob117

Rob117 commented Mar 2, 2017

I'm experiencing the same issue in a Rails app with the Vision library.

In my config/initializers folder, I have

# config/initializers/vision.rb
require 'google/cloud/vision'

project_id = '<valid project here>'  # replace with your project id
VisionApi = Google::Cloud::Vision.new project: project_id, timeout: 10  # 10 second timeout

Then in the controller, I simply have:

class Api::Services::OcrController < Api::BaseController
  skip_before_action :verify_authenticity_token

  def generate_text
    # Run OCR on the raw request body and return the detected locale and text.
    text = VisionApi.image(request.body).text
    result = {
      locale: text.locale,
      text: text.text
    }
    respond_json result: result
  end
end

This works, but if I wait 4 minutes, I get:

Google::Cloud::InternalError (13:{"created":"@1488451071.891371320","description":"Transport closed","file":"src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1072}):

And similar errors as a result. Subsequent requests made within that time frame still work.
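A hypothetical workaround sketch (not from this comment): rescue the InternalError and retry the call once, since only the first request after an idle period fails; the body is rewound so the retry re-reads the payload:

class Api::Services::OcrController < Api::BaseController
  skip_before_action :verify_authenticity_token

  def generate_text
    attempts = 0
    begin
      request.body.rewind                      # so a retry re-reads the payload
      text = VisionApi.image(request.body).text
      respond_json result: { locale: text.locale, text: text.text }
    rescue Google::Cloud::InternalError
      attempts += 1
      retry if attempts < 2                    # only the first call after idle tends to fail
      raise
    end
  end
end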

@landrito
Contributor

grpc/grpc#9986 should be able to help.

@swcloud
Contributor

swcloud commented Mar 14, 2017

@landrito can you work with @apolcyn to find out the grpc 1.2 release plan, i.e. whether we can get the 1.2 release before March 17?

@apolcyn

apolcyn commented Mar 23, 2017

The grpc-1.2.1.pre1 pre-release gem was just pushed, which should fix this (it includes grpc/grpc#9986). Can you please test with this pre-release gem and verify?
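Note that pre-release gems are not picked up by default; the version has to be requested explicitly, for example by pinning it in a Gemfile:

# Gemfile
gem "grpc", "1.2.1.pre1"  # pre-release version, as published above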

@swcloud
Contributor

swcloud commented Mar 23, 2017

@Rob117 @timanovsky Can you please give it a try?

@swcloud
Contributor

swcloud commented Mar 29, 2017

@Rob117 @timanovsky This issue is blocking our release. If there are no updates from you, we will close this issue. You may reopen it if you still run into the issue later.

@blowmage
Contributor

FWIW, I have not been able to reproduce the behavior described in this issue. I've left a process idle for hours and it connects again without error.
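For context, a sketch of the kind of idle-then-query loop one might use to try to reproduce this, assuming the google-cloud-datastore gem; the project id, kinds, and interval are arbitrary placeholders:

require "google/cloud/datastore"

datastore = Google::Cloud::Datastore.new project: "my-project"  # hypothetical project id

loop do
  started = Time.now
  begin
    # An ancestor query, similar to the operation reported as affected.
    query = datastore.query("Task").ancestor(datastore.key("List", "default"))
    datastore.run query
    puts format("query ok in %d ms", (Time.now - started) * 1000)
  rescue => e
    puts format("query failed after %d ms: %s", (Time.now - started) * 1000, e.message)
  end
  sleep 15 * 60  # stay idle longer than the ~10 minute load balancer timeout
end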

@swcloud
Contributor

swcloud commented Mar 31, 2017

Closing now since there are no updates from the original reporters. The fix was added in the grpc 1.2.1.pre1 gem.
