Using overlapped IO for batch insert and query #16

Closed
NicolasDorier opened this issue Nov 7, 2016 · 6 comments

Comments

NicolasDorier commented Nov 7, 2016

At least for queries, there is currently no way to fetch data in batches efficiently using overlapped IO.
There are several reasons to want this:

  1. When the DB is on an external hard drive, every read has some latency. Querying in parallel would let those latencies overlap instead of adding up.
  2. Some drives support parallel reads (e.g. multi-head hard drives and SSDs).
  3. A partition in RAID 0 supports parallel reads (striping).

hhblaze (Owner) commented Nov 7, 2016

Open different transactions in different threads and query the same table. It works out of the box.

NicolasDorier commented Nov 8, 2016

This is not exactly the same as overlapped IO. You can issue several overlapped IO operations without having to spawn more threads. Also, I noticed a big performance hit when I use one transaction per query.

hhblaze commented Nov 8, 2016

Of course, you create several transactions that each fetch a batch in parallel, not one transaction per select.
Try to write an example that shows me your fetching scenario.
If you have logical batches to query, you can already fetch them in parallel today.
If you don't have logical batches to split between fetching threads, you will always have to wait for a result before starting the next fetch.
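
A minimal sketch of the batch-splitting approach described above, written in Python only to illustrate the pattern. It does not use the DBreeze API: `split_into_batches`, `fetch_batch`, and `fetch_parallel` are hypothetical names, and the plain dictionary `store` is an assumed stand-in for a table. The idea is that each worker thread handles one whole batch (one transaction per batch, not one per select).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(keys, batch_count):
    """Split keys into at most batch_count contiguous, roughly equal slices."""
    size = max(1, -(-len(keys) // batch_count))  # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def fetch_batch(store, batch):
    # Hypothetical stand-in for "open one transaction, run every select of
    # this batch inside it, then close the transaction".
    return {key: store.get(key) for key in batch}


def fetch_parallel(store, keys, workers=4):
    """Fetch all keys by giving each worker thread one batch."""
    batches = split_into_batches(keys, workers)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per batch; partial results are merged as they come back.
        for partial in pool.map(fetch_batch, [store] * len(batches), batches):
            results.update(partial)
    return results


# Assumed in-memory "table" used purely for the demonstration.
store = {i: i * i for i in range(100)}
found = fetch_parallel(store, list(range(0, 100, 3)))
```

Whether this helps in practice depends on the storage device and on how the store serializes readers, so it is worth benchmarking with different `workers` values.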

NicolasDorier commented

Thanks, I'll try that. I receive one big batch to fetch, so I can indeed split it into smaller batches across several worker threads.

That might be good enough for my case, though I think it would be even more efficient to use BeginRead/EndRead on the file system, since that would not require several threads.

I'll let you know how it goes, thanks.

NicolasDorier commented

An example:

Here https://github.com/hhblaze/DBreeze/blob/master/DBreeze/Storage/FSR.cs#L833 you are looping and doing many writes sequentially.
Instead, you could call BeginWrite for all of them and wait for them all afterwards.

I will try that and benchmark it a bit. I have a crappy hard drive, which makes measuring that easy.
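
The "issue everything, then wait for everything" shape could look roughly like the sketch below. It is Python, not the .NET BeginWrite API, and it uses a thread pool plus positional writes (`os.pwrite`, available on Unix) as a stand-in for real overlapped IO, so it only illustrates the control flow. `write_all` and the `(offset, data)` chunk layout are assumptions made for this example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait


def write_all(path, chunks, workers=4):
    """Issue every (offset, data) write first, then wait for all of them."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Issue phase: os.pwrite takes an explicit offset, so the writes
            # do not share a file position and can be in flight together.
            pending = [pool.submit(os.pwrite, fd, data, offset)
                       for offset, data in chunks]
            # Wait phase: block until every submitted write has finished.
            wait(pending)
            # result() re-raises any error from a failed write.
            return sum(f.result() for f in pending)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    written = write_all(path, [(0, b"aaaa"), (4, b"bbbb"), (8, b"cccc")])
    with open(path, "rb") as f:
        content = f.read()
```

Timing this against a plain sequential loop on the target drive would show whether the overlap is worth the extra complexity.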

NicolasDorier commented

Forget what I said: you can't issue several BeginRead calls without reading the results sequentially. See https://msdn.microsoft.com/en-us/library/zxt5ahzw(v=vs.110).aspx
