
Why 130MB Initial Bucket on windows ? #39

Closed
olekukonko opened this issue Dec 8, 2013 · 18 comments

Comments

@olekukonko
Collaborator

Just curious why tiedot needs to create an initial 130 MB file for _uid & data on Windows. That is 260 MB wasted without inserting any documents.

@HouzuoGuo
Owner

In the default configuration, the data file (documents) grows in 128 MB steps, and the hash table has an initial capacity of:

  • 2^14 keys (16384 distinct values)
  • 100 entries per key

The original reason for such a large initial size was to let benchmarks take accurate measurements (> 1 second for each feature) without being interrupted by file capacity growth.

But you are absolutely correct - in real usage scenarios, high initial capacity is not desired. What do you think about 32 MB data + 32 MB index?
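
For a rough sense of scale, here is a back-of-the-envelope sketch of the upfront allocation those defaults imply. Only the 2^14 keys, 100 entries per key, and 128 MB figures come from the comment above; the program itself is illustrative:

package main

import "fmt"

func main() {
    const (
        hashKeys      = 1 << 14 // 2^14 = 16384 distinct hash keys
        entriesPerKey = 100     // initial bucket capacity per key
        dataStepMB    = 128     // data file grows in 128 MB steps
    )
    // The index pre-allocates room for every possible entry upfront.
    fmt.Println("pre-allocated index entries:", hashKeys*entriesPerKey) // 1638400
    fmt.Println("initial data file size (MB):", dataStepMB)
}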

@alexandrestein

It's a good idea... 👍
I think we already discussed that. :-)

@HouzuoGuo
Owner

There is actually a "small_disk" branch that cuts the initial size down to only 4 MB per collection =D but as a result, performance is about 100x worse.

That sounds like a plan - I will fix the benchmarks and reduce the initial file size as well.

@olekukonko
Collaborator Author

I see 3 other possibilities:

  • Make the size optional (4, 8, 16, 32, 64, 128 MB) and recommend 128 MB for best performance (see the sketch below)
  • Introduce preemptive growth based on percentage usage and the volume of data inserted per second
  • Store and read data from memory, with the file system serving only as a backup
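
A minimal sketch of what the first option could look like. The CollectionConfig type and its field are hypothetical, written here for illustration rather than taken from tiedot's actual API:

package main

import "fmt"

// CollectionConfig is a hypothetical configuration struct exposing a
// user-selectable initial/growth file size.
type CollectionConfig struct {
    SizeMB int // one of 4, 8, 16, 32, 64, 128
}

// Validate rejects sizes outside the proposed options.
func (c CollectionConfig) Validate() error {
    switch c.SizeMB {
    case 4, 8, 16, 32, 64, 128:
        return nil
    }
    return fmt.Errorf("size must be one of 4/8/16/32/64/128 MB, got %d", c.SizeMB)
}

func main() {
    cfg := CollectionConfig{SizeMB: 128} // 128 MB recommended for best performance
    fmt.Println(cfg.Validate())          // <nil>
}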

@alexandrestein

Going from 130 MB down to 4 MB, I can imagine it has an impact on performance.
But maybe somewhere between those values you can find a balance between performance and DB file size.

You spoke about 32 MB, it sounds good to me 👍

@olekukonko
Collaborator Author

@alexandrestein I agree, 32 MB sounds good, but the size could be flexible.

@HouzuoGuo
Owner

How about offering two options:

  • Small collection - grows every 32 MB
  • Large collection - grows every 128 MB (current config)

And by default, the HTTP API creates a small collection; a request parameter can be set for creating the large collection (see the sketch below).

Benchmarks will continue to use the large collection.
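
A sketch of how that might look against the HTTP API. The endpoint and parameter names here are illustrative; in particular, the large parameter is the proposal under discussion, not an existing flag:

package main

import "net/http"

func main() {
    // Default: create a small collection (grows every 32 MB).
    http.Get("http://localhost:8080/create?col=events")

    // Hypothetical "large" parameter opting into the large collection
    // (grows every 128 MB, matching the current default).
    http.Get("http://localhost:8080/create?col=events&large=true")
}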

@olekukonko
Collaborator Author

Does the collection growth really need to be fixed? How about "small collection - grows every 32 MB when data <= X", because growing a large collection by 32 MB at a time might be too much overhead.

Example

// getIncrement returns the growth increment based on the current file size.
func getIncrement() int {
    size := getSize() // current collection file size in bytes
    switch {
    case size > 536870912: // > 512 MB
        return 134217728 // grow by 128 MB
    case size > 134217728: // > 128 MB
        return 67108864 // grow by 64 MB
    default:
        return 33554432 // grow by 32 MB
    }
}

We can still look for better thresholds after proper testing, but this is just an example.
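
A nice property of this scheme is that growth overhead stays roughly proportional to collection size: small collections never over-allocate by more than 32 MB, while large collections extend in fewer, bigger steps.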

@HouzuoGuo
Owner

That sounds like a rather nice idea.

@HouzuoGuo
Owner

The next question may be more interesting: what shall we do with the hash table?

There are some difficulties with downsizing the hash table:

  • The algorithm is a classic static hash table (unfortunately, dynamic resizing is close to impossible)
  • The initial "head" buckets must be allocated upfront.
  • Performance worsens a lot if the initial hash table size is brought down.
  • Downsizing the hash table configuration is not feasible right now - it would break everyone's existing hash tables.

There seem to be two easy solutions:

  • Rewrite the hash table to use a better algorithm
  • Make the hash table parameters configurable (see the sketch below)

What do you think - any better ideas?
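
A sketch of what configurable hash table parameters might look like. The defaults mirror the figures mentioned earlier (2^14 keys, 100 entries per key), but the HashTableConfig type itself is hypothetical:

package main

import "fmt"

// HashTableConfig is a hypothetical set of tunables for the static hash table.
// Shrinking either field reduces the upfront allocation, at the cost of more
// bucket chaining (and therefore worse performance) as the collection grows.
type HashTableConfig struct {
    HashBits      uint // key bits; 14 gives 2^14 = 16384 head buckets
    EntriesPerKey int  // entries pre-allocated per key; currently 100
}

// InitialEntries returns the number of entries allocated upfront.
func (c HashTableConfig) InitialEntries() int {
    return (1 << c.HashBits) * c.EntriesPerKey
}

func main() {
    large := HashTableConfig{HashBits: 14, EntriesPerKey: 100}
    small := HashTableConfig{HashBits: 12, EntriesPerKey: 25}
    fmt.Println(large.InitialEntries(), small.InitialEntries()) // 1638400 102400
}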

@alexandrestein

I don't think this is a big deal.

In most real cases, data won't grow by more than a few MB per minute. And 1 MB of data per minute is a lot, even if you store logs or things like that (I exclude the cases where you store images or binary files).

And those whose data grows like that probably take care of setting up the database correctly. :-)

I think the best thing to do is to set a small growth size by default and let users configure the database properly if they have special needs (a server app that appends a lot, or millions of users adding content).

I may be wrong...

@HouzuoGuo
Owner

I think dynamically determining collection growth is a very good idea.

How about:

  • Dynamically determine collection growth
  • Make hash table size configurable (user has a choice of small/large)

@HouzuoGuo
Owner

See my comment in #23

What do you think?

@HouzuoGuo
Owner

Fixed in nextgen - the number of collection partitions is now configurable. A collection with one partition will use only 32 MB of disk storage in the beginning.
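
In other words, the initial footprint scales with the partition count. A one-line sketch using the 32 MB per-partition figure from the comment above:

// initialSizeMB estimates a nextgen collection's starting disk footprint,
// assuming 32 MB per partition as stated above.
func initialSizeMB(partitions int) int {
    return partitions * 32 // e.g. 1 partition -> 32 MB, 8 partitions -> 256 MB
}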

@agozie

agozie commented Apr 26, 2015

I have 11 collections, 256 MB each, with no data at all! Some collections will only hold a few entries. This is a major blow to my project.
Any thoughts?

@HouzuoGuo
Owner

@agozie sorry - the size of collection files depends on the number of CPUs on the system:
https://github.com/HouzuoGuo/tiedot/blob/master/db/db.go#L50

I intended for the initial size of a collection to depend on GOMAXPROCS, so the line must have been a mistake.

The collection file size could be reduced to 64 MB by replacing runtime.NumCPU() with 1.
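
A sketch of that change at the linked line (paraphrased for illustration, not a verbatim quote of db.go):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Before: one partition per CPU core, so the initial collection size
    // scales with the machine (4 cores x 64 MB would explain 256 MB per collection).
    numParts := runtime.NumCPU()
    fmt.Println("partitions (before):", numParts)

    // After: force a single partition, reducing each collection to 64 MB upfront.
    numParts = 1
    fmt.Println("partitions (after):", numParts)
}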

@agozie

agozie commented Apr 27, 2015

How will reducing runtime.NumCPU() to one affect performance? Thanks a lot.

@HouzuoGuo
Owner

Reducing it to one should retain approximately 30% of the performance in your scenario. Run the benchmark (./tiedot -mode=bench) to be sure.
