Using overlapped IO for batch insert and query #16

NicolasDorier · 2016-11-07T17:27:14Z

At least for queries, there is currently no efficient way to fetch data in batches using overlapped IO.
There are several reasons to want this:

When the DB is on an external hard drive, there is some latency. Querying in parallel would let those waits overlap instead of adding up.
Some drives support parallel reads (e.g. multi-head hard drives and SSDs).
A partition in RAID 0 supports parallel reads (striping).

hhblaze · 2016-11-07T20:55:13Z

Open a separate transaction in each thread and query the same table. It works out of the box.

NicolasDorier · 2016-11-08T13:21:10Z

This is not exactly the same as overlapped IO. You can issue several overlapped IO requests without having to spawn more threads. Also, I noticed a big perf hit when I used one transaction per query.

hhblaze · 2016-11-08T13:41:47Z

Of course, you create several transactions that each fetch a batch in parallel, not one transaction per select.
Try to write an example showing me your fetching scenario.
If you have logical batches to query, you can already fetch them in parallel.
If you don't have logical batches to split between fetching threads, you will always have to wait for one result before starting the next fetch.
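
The splitting described above can be sketched roughly as follows. This is a minimal Python illustration, not DBreeze code: `TABLE`, `fetch_sub_batch`, and `fetch_in_parallel` are hypothetical names, and a plain dict stands in for a table that each worker would normally read through its own transaction.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table; in a real store each worker would
# open its own transaction and read from disk instead.
TABLE = {key: key * key for key in range(1000)}


def fetch_sub_batch(keys):
    """Fetch one logical sub-batch (conceptually: one transaction per call)."""
    return {key: TABLE[key] for key in keys if key in TABLE}


def fetch_in_parallel(keys, workers=4):
    """Split one big batch into sub-batches and fetch them concurrently."""
    keys = list(keys)
    if not keys:
        return {}
    size = -(-len(keys) // workers)  # ceiling division
    chunks = [keys[i:i + size] for i in range(0, len(keys), size)]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One sub-batch per worker call, not one call per key.
        for partial in pool.map(fetch_sub_batch, chunks):
            results.update(partial)
    return results
```

The point of the sketch is the granularity: each worker handles a whole sub-batch, so the per-transaction cost is paid once per chunk rather than once per query.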

NicolasDorier · 2016-11-08T14:29:08Z

Thanks, I'll try that. I receive one big batch to fetch, so I can indeed split it into smaller batches for several worker threads.

That might be good enough for my case. I just think it would be even more efficient to use BeginRead/EndRead on the file system, as that would not require several threads.

I'll let you know how it goes, thanks.

NicolasDorier · 2016-11-08T14:41:24Z

An example:

Here https://github.com/hhblaze/DBreeze/blob/master/DBreeze/Storage/FSR.cs#L833 you are looping and doing many writes sequentially.
Instead, you could call BeginWrite for all of them and wait for them all afterward.

I will try that and benchmark it a bit. I have a crappy hard drive that makes this easy to measure.
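
The "start them all, then wait for them all" shape could look roughly like this. It is only a Python sketch under stated assumptions: it uses a thread pool with one private file handle per write rather than true overlapped IO, `write_at` and `write_all` are hypothetical names, and it has nothing to do with the actual FSR.cs code. The target file is assumed to exist already and the blocks are assumed not to overlap.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def write_at(path, offset, data):
    """Write data at a fixed offset through a private file handle."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(data)
    return len(data)


def write_all(path, blocks, workers=4):
    """Start every (offset, data) write, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [pool.submit(write_at, path, off, data) for off, data in blocks]
        wait(pending)
    # result() re-raises any exception from a failed write.
    return sum(future.result() for future in pending)
```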

NicolasDorier · 2016-11-08T14:48:49Z

Forget what I said: you can't issue several BeginRead calls without reading the results sequentially. https://msdn.microsoft.com/en-us/library/zxt5ahzw(v=vs.110).aspx

NicolasDorier closed this as completed Nov 8, 2016
