# Datastore: Performance across 1000+ rows #1314
Hi @jryd, thanks for bringing this to our attention! I've been doing some testing on my side, and with a data set of 2000 rows (similar, I think, but not identical to yours), I'm seeing a total execution time of under two seconds to read and iterate through the set. Google Cloud PHP doesn't actually send the runQuery request until you start iterating through the results, so I'm curious whether your filters may be causing the query to run slower than it otherwise would. If you temporarily removed the filters, does the execution time improve?
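(As an aside on the lazy-execution point above: a minimal sketch, assuming a configured client and a `TemperatureEvent` kind, showing that no request is sent until iteration begins. The timing code is ours, not from the thread.)

```php
use Google\Cloud\Datastore\DatastoreClient;

// Assumes project and credentials are configured in the environment.
$datastore = new DatastoreClient();
$query = $datastore->query()->kind('TemperatureEvent');

$start = microtime(true);
$results = $datastore->runQuery($query);
// No RPC has been issued yet; runQuery() returns a lazy iterator.
printf("after runQuery(): %.4fs\n", microtime(true) - $start);

$start = microtime(true);
foreach ($results as $entity) {
    // The first loop iteration triggers the actual request; further
    // pages are fetched transparently as the iterator advances.
}
printf("after iteration: %.4fs\n", microtime(true) - $start);
```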
Hi @jdpedrie! Thanks for looking into this with me. I updated my query to simply be:

[query not preserved]

And it is still taking just as long. Could latency be an issue here? I had set up the datastore in 'Cloud Firestore in Datastore mode', and had to pick the us-east1 location (I'm in Australia).
@jdpedrie so I've been testing this over the last few days. I've set the datastore up in the Sydney data centre, and I'm fetching the results from my local PC here in Brisbane. The query currently fetches 714 rows and takes 5 seconds to execute and process in the foreach loop, which is more than double what you were seeing at 2000 rows. What more could you suggest?
I'm using the Laravel framework, and it doesn't look like their cache implementation implements the interface you mentioned. So I've pulled in the Symfony one you suggested, and it's no quicker; it still takes 5 seconds to execute.
I'll work on seeing if I'm able to replicate the higher latency in the Sydney zone. Could you try something, just to satisfy me that it's the request which is taking the bulk of the time, rather than a code problem? The snippet below uses the library's connection to send requests and iterate through the set, but doesn't do any processing of the data.

```php
$projectId = 'MY_PROJECT_ID';

$ds = \Google\Cloud\Core\Testing\TestHelpers::stub(DatastoreClient::class);
$conn = $ds->___getProperty('connection');

$cursor = null;
$done = false;
$now = microtime(true);

do {
    $res = $conn->runQuery([
        'projectId' => $projectId,
        'partitionId' => [
            'projectId' => $projectId
        ],
        'query' => [
            'kind' => [
                [
                    'name' => 'TemperatureEvent'
                ]
            ],
            'limit' => 1200,
            'startCursor' => $cursor
        ]
    ]);

    if ($res['batch']['moreResults'] === 'NOT_FINISHED') {
        $cursor = $res['batch']['endCursor'];
    } else {
        $done = true;
    }

    var_dump('iteration time: ' . (microtime(true) - $now));
    $now = microtime(true);
} while (!$done);
```

Edit: If you have a very large data set, perhaps you should consider creating some indexes as well, to help optimize the queries you're executing. I'm not sure how much of an impact it would have, as it depends heavily on your data set, but it's worth a try and will be useful in the future anyway.
I've been running some tests in the australia-southeast1 zone, and I'm seeing considerably higher latency than in my normal us-east1 zone. Some of that is due to the geographical distance, but it seems there may be something else going on as well.

australia-southeast1:

[timing output not preserved]

us-east1:

[timing output not preserved]
@jdpedrie here are my results:

[timing output not preserved]

I have already created a composite index for the query I am running:

[index definition not preserved]
@tmatsuo have you made any progress looking into this?
@jryd do you see any performance improvement with my datastore-grpc branch? (Be sure you have the gRPC PHP extension installed.)
@jdpedrie there was a tiny performance improvement on the requests (around 20-30ms per request); however, we believe the bottleneck is creating the connection the first time, which took around 200-300ms. We believe this cannot be fixed, as we understand PHP cannot make persistent connections.
@joseph1125 have you implemented an auth token cache?

```php
use Google\Auth\Cache\SysVCacheItemPool;
use Google\Cloud\Datastore\DatastoreClient;

$datastore = new DatastoreClient([
    'authCache' => new SysVCacheItemPool()
]);
```

Any PSR-6 cache implementation will work. If you already use a persistent key/value store such as Redis, that would be a great candidate for this case.
Thanks a lot, it significantly improves our performance.
@jdpedrie sorry for the delay in getting back to you. I went to integrate with gRPC today, but the branch you referenced is no longer there. How do I go about testing this now?
@jryd it has been pushed to the latest version; all you need to do is upgrade.
Thanks @joseph1125. As for actually utilising gRPC as opposed to REST calls: I installed gRPC in my Docker containers today. Does it just automatically detect and use this?
Yes, it should detect automatically.
@joseph1125 is correct. If gRPC is available, Google Cloud PHP will default to using it. You can verify whether gRPC is available by checking that the `grpc` extension is loaded. If you wish to switch between REST and gRPC, you can provide the `transport` option:

```php
use Google\Cloud\Datastore\DatastoreClient;

$datastore = new DatastoreClient([
    'transport' => 'rest', // available options are 'rest' and 'grpc'.
]);
```
Thanks @jdpedrie! I tried to rerun the test you gave me, to see what impact using gRPC has, but the code doesn't seem to run anymore:

```php
$projectId = 'my-project-xxxxx';

$ds = \Google\Cloud\Core\Testing\TestHelpers::stub(DatastoreClient::class);
$conn = $ds->___getProperty('connection');

$cursor = null;
$done = false;
$now = microtime(true);

do {
    $res = $conn->runQuery([
        'projectId' => $projectId,
        'partitionId' => [
            'projectId' => $projectId
        ],
        'query' => [
            'kind' => [
                [
                    'name' => 'TemperatureEvent'
                ]
            ],
            'limit' => 1200,
            'startCursor' => $cursor
        ]
    ]);

    if ($res['batch']['moreResults'] === 'NOT_FINISHED') {
        $cursor = $res['batch']['endCursor'];
    } else {
        $done = true;
    }

    var_dump('iteration time: ' . (microtime(true) - $now));
    $now = microtime(true);
} while (!$done);
```

It gives me an error:

[error output not preserved]
I'd love to rerun the tests and see what difference gRPC is making.
Hi @jryd, running your script with dummy data I created, I do not receive an error. Can you share an example entity, represented as a PHP array, which I can insert and then read? Please also make sure you're using the most up-to-date versions of the Google Cloud PHP clients.
@jryd have you been able to figure this out?
@jdpedrie I didn't spend any more time trying to recreate the dummy script, but I am happy to report that the performance is much better! I'm using gRPC, and I am seeing ~1500 to ~2000 rows loading within 4 or 5 seconds now.
Excellent! I'm really glad the situation has improved. Please let us know if you run into any more problems. :)
"~1500 to ~2000 rows loading within 4 or 5 seconds now" It's unfortunate that Datastore is so slow. |
*Original issue description:*

I have a PHP project currently running Laravel 5.7.
I have a bunch of IoT sensors that are sending data into Google Datastore.
I want to query this data so that I can show a graph of it on the frontend of the project.
The frontend makes a request to my project, my project makes the request to Google Datastore, and then builds up the result to return it as JSON to the frontend.
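(To make that flow concrete: a minimal sketch of what such an endpoint might look like in Laravel. The route, kind, and property names are assumptions, not taken from the issue.)

```php
use Google\Cloud\Datastore\DatastoreClient;
use Illuminate\Support\Facades\Route;

Route::get('/api/temperatures', function () {
    $datastore = new DatastoreClient();
    $query = $datastore->query()->kind('TemperatureEvent');

    $points = [];
    foreach ($datastore->runQuery($query) as $entity) {
        $points[] = [
            'time'  => $entity['recorded_at'],  // assumed property, e.g. an ISO-8601 string
            'value' => $entity['temperature'],  // assumed numeric property
        ];
    }

    return response()->json($points);
});
```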
There's a fair bit of data (roughly 1200 rows), and it takes about 13 seconds to fetch and render on the screen, which is longer than I would like.
Here is my current code to fetch this IoT data:

[code not preserved]
Below is the output of my `dump()` statements:

[output not preserved]

From this, I would understand that the query is fast but the processing of the data is slow. That doesn't make sense, though: I am running a `foreach` loop and would therefore expect it to be pretty fast over already-fetched data.

When does the library actually fetch the rows? Judging by the timings, I would almost assume it happens during the process of getting the data for each row?
In an attempt to optimise this code and make it faster, I thought I might just take every 50th row:
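(The original snippet isn't preserved above; a hypothetical reconstruction of the sampling idea is sketched below, where `$results` is the iterator returned by `runQuery()`. Because the iterator only fetches rows as you advance it, discarding 49 of every 50 rows in PHP still pulls every row over the wire, which would explain why the timing doesn't improve.)

```php
$i = 0;
$points = [];

foreach ($results as $entity) {
    // Keep only every 50th row; the other 49 are still fetched
    // from the service before being discarded here.
    if ($i++ % 50 === 0) {
        $points[] = $entity['temperature']; // assumed property name
    }
}
```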
However, the processing time is just as long.
Given what I am trying to achieve, how can I speed this up?