Why 130MB Initial Bucket on Windows? #39
In the default configuration, the data file (documents) grows in 128 MB increments, and the hash table has an initial capacity of:
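A minimal sketch of what such fixed-increment growth can look like - a pre-allocated file that is extended by a constant step whenever an append would overrun it. Helper names here are hypothetical, not tiedot's actual internals:

```go
package storage

import "os"

// growthIncrement is the fixed pre-allocation step (128 MB in the
// default configuration discussed here).
const growthIncrement = 128 * 1024 * 1024

// ensureCapacity extends the data file by one fixed increment whenever
// an append would run past the currently pre-allocated capacity.
// Hypothetical helper for illustration only.
func ensureCapacity(f *os.File, used, appendSize int64) error {
	info, err := f.Stat()
	if err != nil {
		return err
	}
	if used+appendSize <= info.Size() {
		return nil // still room in the pre-allocated region
	}
	// Truncate upward: extends the file, padding with zero bytes.
	return f.Truncate(info.Size() + growthIncrement)
}
```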
The original reason for such a large initial size was so that benchmarks could get accurate measurements (> 1 second for each feature) without being interrupted by file capacity growth. But you are absolutely correct - in real usage scenarios, a high initial capacity is not desirable. What do you think about 32 MB data + 32 MB index?
It's a good idea... 👍
There actually is a "small_disk" branch that cuts the initial size down to only 4 MB per collection =D but as a result, performance is about 100x worse. That sounds like a plan - I will fix the benchmarks and reduce the initial file size as well.
I see 3 other possibilities:
Going from 130MB to 4MB, I can imagine it has an impact on performance. You spoke about 32MB; that sounds good to me 👍
@alexandrestein I agree
How about offering two options:
And by default, the HTTP API creates a small collection; a request parameter will be set for creating the large collection. Benchmarks will continue to use the large collection.
Does the collection growth really need to be fixed? How about an increment that scales with the current size - a small collection grows in small steps, a larger one in bigger steps. Example:

```go
// getSize is assumed to return the current collection file size in bytes
// (helper not shown in the original comment).
func getIncrement() int {
	size := getSize()
	increment := 0
	switch {
	case size > 536870912: // above 512MB
		increment = 134217728 // increase by 128MB
	case size > 134217728: // above 128MB
		increment = 67108864 // increase by 64MB
	default:
		increment = 33554432 // increase by 32MB
	}
	return increment
}
```

We can still look for better thresholds after proper testing, but this is just an example.
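For concreteness, a self-contained variant of the proposal above (a hypothetical refactor where the size is passed in as a parameter instead of read via getSize) shows how the tiers play out:

```go
package main

import "fmt"

// incrementFor mirrors getIncrement above, but takes the size as a
// parameter so the tiers can be exercised directly.
func incrementFor(size int) int {
	switch {
	case size > 512<<20: // above 512MB
		return 128 << 20
	case size > 128<<20: // above 128MB
		return 64 << 20
	default:
		return 32 << 20
	}
}

func main() {
	for _, mb := range []int{10, 200, 600} {
		fmt.Printf("%4d MB file -> grow by %3d MB\n", mb, incrementFor(mb<<20)>>20)
	}
}
```

Running it prints increments of 32 MB, 64 MB, and 128 MB for 10 MB, 200 MB, and 600 MB files respectively.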
That sounds like a rather nice idea.
The next question may be more interesting: what shall we do with the hash table? There are some difficulties with downsizing the hash table:
There seem to be two easy solutions:
What do you think, any better idea?
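One likely difficulty, sketched here under the assumption that the index is a bucket-addressed hash table: each key's bucket is derived from the bucket count, so shrinking (or growing) the table invalidates every existing placement and forces a full rehash:

```go
package hashtable

// rehash rebuilds a bucket-addressed table with a new bucket count.
// Every key must be revisited, because its bucket assignment depends
// on the bucket count itself. Names here are illustrative.
func rehash(old [][]uint64, newBuckets int) [][]uint64 {
	table := make([][]uint64, newBuckets)
	for _, bucket := range old {
		for _, key := range bucket {
			b := key % uint64(newBuckets) // assignment changes with the count
			table[b] = append(table[b], key)
		}
	}
	return table
}
```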
I don't think this is a big deal. In most real cases, data won't grow by more than a few MB per minute, and 1MB of "data" per minute is already a lot, even if you store logs or things like that (I exclude the cases where you store images or binary files). And those whose data does grow like this probably take care of setting up the database correctly :-) I think the best thing to do is to set a small growth size by default and let the user configure the database properly if they have special needs (a server app that appends a lot, or millions of users adding content). I'm maybe wrong...
I think dynamically determining collection growth is a very good idea. How about:
See my comment in #23. What do you think?
Fixed in nextgen - the number of collection partitions is now configurable. A collection with one partition will only use 32MB of disk storage in the beginning.
I have 11 collections, 256MB each, with no data at all! Some collections will only hold a few entries. This is a major blow to my project.
@agozie sorry - the size of collection files depends on the number of CPUs on the system. I intended for the initial size of a collection to depend on GOMAXPROCS, so that line must have been a mistake. The collection file size can be reduced to 64MB by replacing runtime.NumCPU() with 1.
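A minimal sketch of that relationship, assuming roughly 64MB is pre-allocated per partition (consistent with the 64MB single-partition figure above, and with the 256MB per collection reported on what would be a 4-CPU machine); the sizing formula is an illustration, not tiedot's exact code:

```go
package main

import (
	"fmt"
	"runtime"
)

// perPartition is an assumed per-partition pre-allocation (~64MB,
// matching the single-partition figure mentioned above).
const perPartition = 64 << 20

func main() {
	// One partition per CPU by default; replacing this with 1
	// shrinks the initial footprint of every collection.
	numParts := runtime.NumCPU()
	fmt.Printf("%d partitions -> ~%d MB initial size per collection\n",
		numParts, numParts*perPartition>>20)
}
```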
How will reducing runtime.NumCPU() to one affect performance? Thanks a lot.
Reducing it to one should retain approx. 30% of the performance in your scenario. Run the benchmark with ./tiedot -mode=bench to be sure.
Just curious why tiedot needs to create an initial 130MB file for _uid & data on Windows. That is 260MB wasted without inserting any documents.