Slow query (and it gets slower with readMode: "outdated") #5195
Running it successive times doesn't change the speed. Removing the eqJoin().zip() part reduces the time to 1.8 seconds; with readMode: "outdated" it rises to 2.1 seconds.
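For context, eqJoin("session_id", sessions).zip() looks up, for each pageview, the session whose primary key equals its session_id, then merges the two documents into one. A rough plain-JavaScript sketch of that semantics (the sample data is made up, and this is an illustration of the join's behavior, not RethinkDB's implementation):

```javascript
// Sketch of what eqJoin("session_id", sessions).zip() computes:
// for each left row, look up the right row by primary key, then merge.
function eqJoinZip(left, rightById, key) {
  const out = [];
  for (const row of left) {
    const match = rightById.get(row[key]); // one index lookup per left row
    if (match) out.push({ ...row, ...match }); // zip(): right fields win on conflict
  }
  return out;
}

// Hypothetical pageview and session documents:
const pageviews = [
  { id: "pv1", session_id: "s1", actions: 0 },
  { id: "pv2", session_id: "s2", actions: 3 },
];
const sessions = new Map([
  ["s1", { id: "s1", visitor_id: "v1" }],
  ["s2", { id: "s2", visitor_id: "v2" }],
]);

console.log(eqJoinZip(pageviews, sessions, "session_id"));
```

Note that even with an index lookup per row, the cost scales with the number of left-hand rows, which is why dropping the join step changes the timing so much.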
An average document from the pageviews table:
{
"actions": 0 ,
"campaign_params": { } ,
"category": null ,
"client_id": 7 ,
"created_at": Thu Dec 03 2015 16:15:18 GMT-02:00 ,
"deep_link": false ,
"domain": "delivery.veran.com.br" ,
"end_ts": 1449170115516 ,
"ended_at": Thu Dec 03 2015 16:15:36 GMT-02:00 ,
"href": http://delivery.veran.com.br/produto, »
"id": "000147a8-5b2b-4c16-9e4c-4eac40664562-7-browser031220151600-1449170096530" ,
"nav_type": "normal" ,
"referer": {
"domain": "delivery.veran.com.br" ,
"path": "/" ,
"query": null ,
"type": "internal"
} ,
"scroll": [
{
"percentage": {
"pixel_depth": 763 ,
"timing": 7.32 ,
"value": 25
}
} ,
{
"percentage": {
"pixel_depth": 0 ,
"timing": 7.32 ,
"value": 50
}
} ,
{
"percentage": {
"pixel_depth": 0 ,
"timing": 7.32 ,
"value": 75
}
} ,
{
"percentage": {
"pixel_depth": 1063 ,
"timing": 7.82 ,
"value": 100
}
}
] ,
"session_id": "000147a8-5b2b-4c16-9e4c-4eac40664562-7-browser031220151600" ,
"ssl": false ,
"timing": {
"ttdr": 2.71 ,
"ttfb": 0.28 ,
"tti": 1.98
} ,
"title": "supermercado delivery veran - agora também somos delivery!" ,
"ts": 1449170096530 ,
"visitor_id": "000147a8-5b2b-4c16-9e4c-4eac40664562"
}
Session documents should be approximately half that size. Also, when running this query, all four cores go to ~90% but memory stays flat (8 GB consumed); this is under a write load of ~50 writes/sec. And it is impossible to run this query in the Data Explorer; it always hangs.
I moved the secondary index into the primary id as an array (on a temporary table, just for testing), but the client_id is incremental, so all the data went into only one shard (could this be an issue for secondary indexes too?).
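RethinkDB splits a table into shards by primary-key ranges, so monotonically increasing keys all fall past the last split point until the table is rebalanced. A toy sketch of that effect (the split points and key values are made up):

```javascript
// Toy range partitioner: shard i owns keys below splitPoints[i];
// keys at or above the last split point go to the final shard.
function shardFor(key, splitPoints) {
  let shard = 0;
  for (const p of splitPoints) {
    if (key >= p) shard++;
  }
  return shard;
}

// Split points chosen before the incremental keys existed:
const splitPoints = [100, 200]; // 3 shards
const counts = [0, 0, 0];
for (let clientId = 500; clientId < 600; clientId++) {
  counts[shardFor(clientId, splitPoints)]++;
}
console.log(counts); // [0, 0, 100] — every new incremental key lands in the last shard
```

This is why rebalancing helped: it recomputes the split points against the actual key distribution.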
Rebalanced the test table and ran the query again: 4.2 seconds, and 4.6 seconds with readMode: "outdated". It seems outdated reads will always be slower when the data is evenly distributed among the shards.
To benefit from outdated reads, you could try something like this:

r.table("pageviews").between([7, r.time(2015, 12, 1, "-02:00")], [7, r.now()], {index: "client_id_created_at"})
  .eqJoin("session_id", r.table("sessions", {readMode: "outdated"})).zip().merge(........

This will allow the
I suspect that the
Could you post the full query you tested when you removed the
All tables are already replicated evenly across all shards.
With readMode: "outdated" on the sessions table, and removing that optarg from pageviews, I get 3.6 seconds.
The query that returned in 1.8 seconds:
After talking with @danielmewes, it seems I should increase the CPU cores of my cluster and rerun the queries. I will do that and report back soon, but this is really sad; I didn't know RethinkDB wasn't suitable for big datasets (although today a million records is nothing).
I believe this is just joins being slow, i.e. #5115. RethinkDB can generally handle billions of rows, though it's probably not the right system for running analytics at such scales; it's more suited to OLTP-style queries where you only touch a small selection of rows at a time.
I don't intend to process billions of rows in a single query, but my point stands: in 2015, a million rows is nothing.
@danielmewes, I've replaced the three servers of my cluster with 16-core instances each (they were 4-core); running the first query of this issue gives me roughly the same time :(
So I have this query which returns 13k items before the .group() and 569 after:
This takes approximately 5 seconds, running it with a Node.js script on the same machine as the database. I have a cluster of 3 n1-standard-4 Google Cloud instances, all with 100 GB SSD disks (the database is using 4 GB; the cache is using 10 GB of the 15 GB of available RAM).
This is supposed to run on hundreds of millions of records in the worst case, so it really alarms me that it takes 5 seconds on a small dataset.
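The .group() step collapses the 13k intermediate rows into 569 buckets. A rough plain-JavaScript equivalent of what that stage computes (the grouping key is hypothetical, since the full query isn't shown):

```javascript
// Rough equivalent of ReQL's .group(keyFn): bucket rows by key.
function groupBy(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const k = keyFn(row);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(row);
  }
  return groups;
}

// ~13k rows collapsing into 569 groups, matching the counts described above.
const rows = [];
for (let i = 0; i < 13000; i++) {
  rows.push({ session_id: "s" + (i % 569), ts: i });
}
const grouped = groupBy(rows, (r) => r.session_id);
console.log(grouped.size); // 569
```

Grouping itself is cheap in memory; the cost in the original query is dominated by fetching and joining the 13k rows before this stage.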
Also, I tried setting readMode to "outdated" and to my surprise it slowed the query to 7 seconds.
I'm using the latest RethinkDB version and rethinkdbdash.