Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streaming rows #32

Closed
hwchen opened this issue Apr 2, 2019 · 16 comments
Closed

streaming rows #32

hwchen opened this issue Apr 2, 2019 · 16 comments

Comments

@hwchen
Copy link
Contributor

hwchen commented Apr 2, 2019

Hello again!

I was wondering if you plan to expose an api for streaming rows. I see that you have Fold currently, I'm not sure that will work for my current purposes (streaming output rows through to the user). If it could work, or I missed something in the api, let me know!

@hwchen
Copy link
Contributor Author

hwchen commented Apr 2, 2019

Just want to note that I'm going to try implementing this myself in my own fork.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 2, 2019

ah, it's a little difficult for me... so I guess I'll wait and see what your plans are first

@hwchen
Copy link
Contributor Author

hwchen commented Apr 2, 2019

ah, I got streaming blocks to compile, and will test tomorrow.

@suharev7
Copy link
Owner

suharev7 commented Apr 3, 2019

Hello

Right now I don't have any plans to expose an API for streaming rows. Current API allows streaming output but it can be inconvenient and little inefficient, so it will be great if you will be able to suggest a better solution.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 3, 2019

What’s the recommended way to stream with the current api? I think I missed it, it wasn’t clear to me that fold would work.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 3, 2019

Below are links to the api I hacked in (I hope I didn't duplicate work...), as well as a poc to show how it's used.

https://github.com/hwchen/clickhouse-stream
https://github.com/hwchen/clickhouse-rs/tree/feature/stream-blocks

Let me know what you think about this.

@suharev7
Copy link
Owner

suharev7 commented Apr 4, 2019

About streaming with the current API:
The function which you pass as the second argument of fold can return Future that allows, in theory, sent temporary result through IO, but in practice, some web frameworks may require Stream to streaming output. To solve it you can spawn task which should fetch database results and pass rows through a channel to Stream, but I do not recommend this way, it's just way which allows current API.

About your API:
It's a great job, but I'm wondering why stream_blocks method return BoxStream<Result<Block, Error>> not just BoxStream<Block>?

And about your comment at query_result/mod.rs:157:
According to the signature of map_err method of Stream, this code can be executed multiple times, and that's why you had to clone release_pool again.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 4, 2019

Thanks for looking over my code! Let me know if you'd be ok having an api like this in the driver, I'm happy to work on it more.

For BoxStream<Result<Block, Error>>, I'm not sure how to handle errors like "table does not exist" without the Result.

One downside of my current stream api is that it appears to not end the stream if there's an error like "table does not exist". But I'm not sure how to properly propagate this in the stream before it reaches the user. I've updated my proof of concept clickhouse-stream to properly map the Result in the for_each so that it will exit on a DB exception.

@suharev7
Copy link
Owner

suharev7 commented Apr 4, 2019

Can you send a pull request with your changes as is? I'm going to make a few small modifications to your code. It will save me a lot of time.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 4, 2019

pull request made, thanks!

@hwchen
Copy link
Contributor Author

hwchen commented Apr 5, 2019

One additional thought on api: This clickhouse js driver pushes responsibility for dealing with each type of stream even (Block, Error, End, Metadata) onto the user.

Don't know if this makes sense or is consistent with how you want this driver to work, but wanted to put it out there.

https://github.com/apla/node-clickhouse

suharev7 added a commit that referenced this issue Apr 6, 2019
suharev7 added a commit that referenced this issue Apr 6, 2019
@suharev7
Copy link
Owner

suharev7 commented Apr 7, 2019

I've completed and published new stream API
and about pushing responsibility to users I suppose it will make them write a lot of tedious and error-prone code (code which solve this issue can illustrate it), and also it complicates the next developing.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 8, 2019

Thanks! It was very informative to see how you handled the API, I learned something.

@hwchen hwchen closed this as completed Apr 8, 2019
@hwchen
Copy link
Contributor Author

hwchen commented Apr 9, 2019

One quick question: I don't know if you know this, but sometimes clickhouse will stream multiple blocks, but sometimes it gathers the entire output into one block, even if it's very large.

It doesn't seem entirely consistent to me, so I was wondering if you knew when clickhouse would stream and when it would gather.

@hwchen hwchen reopened this Apr 9, 2019
@suharev7
Copy link
Owner

suharev7 commented Apr 9, 2019

I think it's better to ask it here or here.

@hwchen
Copy link
Contributor Author

hwchen commented Apr 9, 2019

makes sense, thanks!

@hwchen hwchen closed this as completed Apr 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants