High latency #5
We are actively working on reducing the network latency between Compute and Cloud Datastore, in particular by improving the colocation of the two services. But the numbers you are getting are way higher than the ones I'm seeing on average from Compute (<140 ms median for a write of 1x 1k random entity, and <100 ms median for a runQuery of 10 entities). Can you share more details about how you measure the latency and from which location you are making the API calls? Thanks in advance. PS: feel free to open a separate issue for the dashboard feature request.
It's worth noting: if there's not much GCD traffic, there's a significant "cold start" penalty on latency. To get realistic latency numbers, I would recommend having a background traffic process running at a few requests per second (it doesn't matter much what the requests are; we've used a beginTransaction+commit pair for our own testing).
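The background-traffic idea above can be sketched as a small keep-warm loop. This is a minimal illustration, not the tester's actual code: the `do_request` callable is a stand-in for whatever cheap call you use (e.g. a beginTransaction+commit pair), demoed here with a local no-op so the sketch runs anywhere.

```python
import threading
import time

def keep_warm(do_request, interval=0.25):
    """Issue a cheap request a few times per second in the background so
    latency measurements are not skewed by the cold-start penalty."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            do_request()  # e.g. a beginTransaction + commit pair
            stop.wait(interval)

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() to end the background traffic

# Demo with a stand-in request so the sketch is self-contained.
calls = []
stop = keep_warm(lambda: calls.append(time.time()), interval=0.01)
time.sleep(0.1)
stop.set()
```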
I think I've accounted for the cold start, but I made only about 100 requests in sequence. I might have hit it on a particularly 'cold' day, though; I will try again. Also, Google's definition of an active store probably has a lot more zeros on the requests-per-second count. I measured by turning on the HTTP logger that's included in the Ruby example and looking at the output for sequential runs - definitely not a well-thought-out benchmark, but enough to show what's going on.
I tried the Cloud Datastore API from Compute Engine (instance type: f1-micro, zone: us-central1-a) with this code. The code puts a small entity 1000 times and queries the latest 10 entities. In my experiment, it takes about 150-200 ms on average to put an entity, and about 500-700 ms to query 10 entities. By comparison, it takes about 40-50 ms on average to put an entity from App Engine with the low-level API. I hope the Cloud Datastore API's execution speed from Compute Engine will be improved.
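A serial benchmark of the shape described above can be sketched language-agnostically. The Python version below times an arbitrary operation; the Datastore call itself is stubbed out with a 1 ms sleep (an assumption, since the thread doesn't include the actual client code), so the harness runs without credentials.

```python
import statistics
import time

def measure(op, n=100):
    """Call op() n times serially and return per-call latencies in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        op()  # replace with e.g. a Datastore put or a 10-entity query
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Stand-in operation (~1 ms) so the sketch is self-contained.
lat = measure(lambda: time.sleep(0.001), n=20)
mean_ms = statistics.mean(lat)
median_ms = statistics.median(lat)
```

Reporting the median alongside the mean matters here, since the thread repeatedly observes occasional multi-second outliers that would dominate a plain average.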
I'm still hoping for better speed outside of Compute Engine - namely from AWS. The biggest draw for me is that I can now deploy servers all around the world with a multitude of stacks and frameworks and have them share a common high-availability, high-speed distributed datastore. Seems like a pipe dream, but definitely possible if Google can improve the latencies on GCD.
Can you try running your benchmarks again?
Great. Jul 03, 2013 4:01:16 AM - put entities: 117610 milliseconds.
Also, it may be worth running your benchmark again on an n1-standard-4 or n1-highcpu-4 instance. Overall network throughput is higher on higher-CPU instances. I wouldn't expect latency to be largely affected, but it's worth verifying with your workload.
I ran the same benchmark again with GCE's n1-highcpu-4 instance (us-central1-a), for both put and query. I felt that the network speed increased while I was downloading files using wget and yum, but the result of the 'put' benchmark didn't change much. I'll retry the 'query' benchmark tomorrow.
I tried on GCE's n1-highcpu-4 instance. I queried the latest 10 entities after putting entities (14 times). It took 326 ms to query on average. I think queries became about 20-30% faster.
I've inserted 100 blank entities in Python. It took 23 seconds :(
Did you insert them in a single batch request or serially in individual requests?
Serially, on purpose, to see how long a single insert takes on average. 200-250 ms sounds very long to me. I understand that replication is probably why it takes so long, but I expected it to be faster. Most results come in at 5-minute intervals from multiple data sources, so I guess I'll have to queue them up.
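Queuing writes into batches is straightforward to sketch. The chunk size of 500 below reflects the commonly documented per-commit mutation limit for Datastore (an assumption worth verifying for your API version), and the `client.put_multi` call in the comment is the Python client's batch-write method, shown only as a placeholder.

```python
def chunks(items, size=500):
    """Split items into batches of at most `size`, so writes cost one
    round trip per batch instead of one round trip per entity."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

entities = list(range(1200))  # stand-ins for real entities
batches = list(chunks(entities, size=500))
# for batch in batches:
#     client.put_multi(batch)  # batch write; client setup not shown
```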
Are there any updates here?
Not yet. This is still a top priority.
Any news on this?
I'm having the same problem. I've created a small script in Node to test get performance, and no matter whether running on my local machine or on Google Compute Engine (on the cheapest instance), I get on average 400 ms response times for a single get request in a tiny datastore with just a handful of entries. The response times vary from 200 ms up to several seconds (!). I've tried letting a get run every ten seconds or so for a longer period; the times do not improve.

Is this really normal? Would latency improve by running in App Engine (though that seems extremely complicated using NodeJS)? Even response times of 100 ms, mentioned earlier in this thread, would seem to make it impossible to use Datastore for anything remotely time-critical. But there are people actually using Datastore, right? How are others using it? Or am I doing something wrong?

For reference, my tiny timing script:
This yields output like:
Thankful for any help. With these response times I will need to rethink my entire architecture.
This is the same feeling I've been having for a long time now. "Why am I the only one with this problem? Aren't there hundreds or even thousands of others building on the Google platform? How did they solve this problem? Why don't I read anything about these questions on the rest of the internet?" I feel as if the solutions available on the Google Cloud Platform are built for stateless agents/machines that run slowly (200 ms - 10 sec) but at large numbers of operations. I had expected BigQuery to eventually get faster so that we could store chart data in it, but it responds in between 2 and 10 seconds. With GCD I expected the service to behave more or less with the speed of other databases; I would have been quite OK with 100 ms. You will probably have to adjust your design. I hope I'm wrong and that I too have missed something, but so far no enlightenment has come.
We are working hard to solve this problem. We are implementing a new [...] As a stopgap solution, there are things you can do to mitigate the latency. Sorry for the inconvenience, Alfred
AE Frontend instance performance?
Yes, the ah-builtin-datastoreservice version is what is serving the HTTP requests.
I'm measuring the latency between Google Compute Engine and Google Datastore, performing a simple lookup() using the Python client library. The Google performance dashboard says that my requests are consuming around 18 milliseconds. I assume that this is a server-side metric and not a round-trip metric. Can someone refer me to the service level agreements behind the minimum round-trip response times I should currently expect between Compute Engine and Datastore? For a trivial lookup, I'm experiencing around 60 milliseconds; I would expect around 20 milliseconds.

Here's the code; note that some of these files are not strictly correlated - each is intended to be indicative on its own. If someone can show me some code with better latencies, that would be great. https://github.com/peterrham/projects/blob/master/google_cloud/read.py Here's an example output file: https://github.com/peterrham/projects/blob/master/google_cloud/read.out In this example, I'm getting 77 milliseconds. I'm not trying to be statistically significant here, I'm just looking for some indicative guidance. Datastore latency here is under 15 milliseconds.

I also have a sample tcpdump ASCII text output file: https://github.com/peterrham/projects/blob/master/google_cloud/40ms.txt generated using this command line (the timestamps are the delta values between the packet events): /usr/sbin/tcpdump -r tcpdump.out -nnq -ttt > 40ms.txt

For example, the initial TCP SYN is ACKed in 1 millisecond, so network latency does not seem to be a problem. However, the ACK from the lookup() request comes over 40 milliseconds after the request. Server-side latency from the App Engine logs is 9 milliseconds: 2015-02-02 16:15:55.753 /datastore/v1beta2/Lookup 200 9ms 0kb module=default version=ah-builtin-datastoreservice

Any ideas? Are there any buffering configurations to set to get the minimum latency?
Looks like the 40 ms is related to Nagle's algorithm, although I'm not sure it accounts for the delayed response, which is over 40 ms. http://neophob.com/2013/09/rpc-calls-and-mysterious-40ms-delay/
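If Nagle's algorithm really is the culprit, disabling it on the client socket is a one-line fix. A minimal sketch (whether the HTTP client library in use actually exposes its underlying socket is a separate question):

```python
import socket

# Disable Nagle's algorithm so small request payloads are flushed
# immediately instead of waiting to coalesce with later writes.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

Note that the classic 40 ms stall described in the linked article is an interaction between Nagle and delayed ACKs, so TCP_NODELAY on the sending side is the usual mitigation.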
Hi - we are doing some app and backend testing (prior to launch) and are seeing this exact problem (looking at ah-builtin-datastoreservice): we see times from 10-20 ms up to 2000 ms. Do you have any timescale for a fix (or for when you will apply any such fix already made to the current Beta channel)? Otherwise we'll just migrate to Dynamo. Thanks.
I second the motion to move to Dynamo.
I am also disappointed with the performance I'm seeing from GCD. A couple of patterns I've noticed: 1) the first request from a new datastore connection is always slow, and 2) batch mutations (even a batch of 100 entities) are painfully slow.
In my experience it is better than it was when this issue was opened. I used to see a lot more response times between 500 ms and 1000 ms. Most of my requests are now between 200 ms and 500 ms. Perhaps I've segmented my data better?
@cerdmann @gcjc I would advise against moving to DynamoDB if you're concerned about keeping costs as low as possible. I did an evaluation after hitting this issue, and in my opinion DynamoDB has one major drawback: you have to provision your read/write capacity ahead of time. This makes it difficult to adjust to a sudden spike in traffic while keeping costs down. There are some auto-scaling libraries that attempt to help with this, but their adjustment time is not very fast, and Dynamo currently limits you to scaling back down only a few times a day. I'm really hoping this issue gets fixed soon, as there's a huge Node.js community that could really take advantage of GCD.
@obeleh it is hard to predict when there will be problems (and it seems to be any operation). We will likely set up some longer-running tests internally to see if we can spot any patterns. However, we just ran a very short test and it seems to have improved, with most requests between 150-220 ms. Has anyone else seen any improvement? Just wondering if any changes in the App Engine front end have been made. @hbizira we really do want to use GCD; moving to Dynamo will be the last resort.
Just started using gcloud via Node.js and it's a bit disappointing - I'm getting roughly 200-500 ms latency. This is running on RH OpenShift (really AWS) in the US and EU. I tried my home connection too; all roughly the same. I hope I'm doing something wrong :/
I got better results than that, but still not great. What do you mean [...]? By the way, I'm trying out Google "Bigtable", which is in Beta. It promises [...]
I was referring to Amazon Web Services EC2, which is the underlying infrastructure of OpenShift. Isn't Bigtable more expensive, and overkill? I'll check my timings with Wireshark. Also, my datastore is empty, so it's not like it's got millions of items in it.
OK, it's hard to tell exactly how long it's taking via Wireshark due to the TLS, but you can make a rough guess. TLS connection, SYN to FIN: 460 ms total. What is interesting is that a new connection is set up and torn down for every query. Is this correct? Am I doing something wrong? Would it not be better to keep a connection pool and reuse connections? Thanks
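The pooling idea suggested above can be sketched generically (in Python here, purely for illustration, since the Node library's internals aren't shown in the thread). The `factory` argument is a hypothetical stand-in for whatever opens a TCP+TLS connection; the point is that each request reuses an already-handshaken connection instead of paying the ~460 ms setup again.

```python
import queue

class ConnectionPool:
    """Hand out reusable connection objects so each request avoids
    a fresh TCP + TLS handshake."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

# Demo with a counting factory to show no extra connections are opened.
opened = []

def factory():
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(factory, size=2)
conn = pool.acquire()
pool.release(conn)
pool.acquire()  # reuses a pooled connection; nothing new is opened
```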
I realise I'm being impatient, but what's the plan for this? Is this just a Node.js lib problem?
@pcostell Will this be addressed with v1beta3?
That is the goal, but we are still working on benchmarking v1beta3.
@pcostell I am seriously considering this service. Please post an update about this issue, since it has been a while. Thanks for the great work.
We are seeing much better latency numbers with v1beta3. v1beta3 is a complete rewrite of our infrastructure, and as such we are being very cautious about rolling it out. We hope to have it ready early this year.
Great!
@pcostell What numbers are you seeing? Anything below 20 ms? That is our performance goal, and it is easily achievable in DynamoDB. This issue is a dealbreaker for us moving to Google Cloud right now.
I thought I would share some instrumentation, as I found this issue and have been following it with interest. Below are the 95th percentile and mean (in milliseconds) for PUT requests when moving from v1beta2 to v1beta3. Both show roughly a 10x improvement (now ~25 ms p95 and ~10 ms mean) - good work guys! It would also be nice to get it out of the beta endpoints so we can use it in production. One thing worth mentioning is that v1beta3 seems to be missing the transaction endpoint.
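For anyone reproducing this instrumentation, the two summary statistics can be computed as below. The nearest-rank method for p95 is an assumption; the exact percentile definition used in the numbers above isn't stated.

```python
import math
import statistics

def summarize(latencies_ms):
    """Return (mean, p95) for latency samples in ms, nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(len(ordered) * 95 / 100))  # 1-indexed rank
    return statistics.mean(ordered), ordered[rank - 1]

# Synthetic samples for demonstration only.
samples = [float(i) for i in range(1, 101)]
mean_ms, p95_ms = summarize(samples)
```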
Hi @leonardaustin, can you share details about what service(s) you're using? We haven't actually launched v1beta3 yet, so I'm wondering if we're talking about two different things.
@eddavisson Sure, I forked https://github.com/GoogleCloudPlatform/gcloud-golang and changed the URL from v1beta2 to v1beta3.
Is there a list of changes I should expect, or can I just change the URL and I'm set?
The v1beta3 API is not yet available to customers. We will be sure to post an announcement here when it is.
If possible, I would like to sign up for the public beta of the beta.
@faizalkassamalisc, we're busy reticulating splines and hopefully will have an update for you soon. Thanks for your patience! @obeleh, the beta will be public and will just require using the appropriate API clients. There will be no sign-up beyond the normal project creation you do currently.
Is there a more technical page with latencies expressed in milliseconds? Thanks!
We don't have a page with latency numbers: it's a moving target as we continually improve the platform, and it also depends on both the location your Cloud Datastore was set up in and where you are accessing it from. Keep in mind you cannot compare DynamoDB with Cloud Datastore directly, as we're a functionally different service and in most cases are serving customers in a Multi-Regional configuration rather than a merely Regional one.
Why is the first request from a datastore always slow?
Hello guys, this is probably the wrong place to ask this question, but I didn't find any better place. Could Datastore, using local disks, play the same role as Redis for a session cache? Everything would be much easier and cheaper for our company.
Was this ever solved? Can anyone share what latencies we can expect now on Datastore (Firestore in Datastore mode vs. native mode)?
b/8334662
I'm consistently seeing high latency numbers when using GCD - about 1.5 seconds on average for queries (10 items or fewer) and single-item writes (< 1 KB).
My ping time to the API endpoint is about 50 ms, so even discounting a 100 ms round trip, that still leaves more than a second of GCD latency. This simply won't work for a server environment, certainly not one that scales. This is very surprising, because I was expecting latencies much closer to GAE's: https://code.google.com/status/appengine/detail/hr-datastore/2013/06/05#ae-trust-detail-hr-datastore-query-latency
Can we get a dashboard like the GAE HR Datastore status board? Possibly measuring latencies from Google Compute Engine instances and a few different AWS Regions?