Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to parse delimited messages #61

Closed
LeanderK opened this issue Dec 15, 2016 · 9 comments
Closed

How to parse delimited messages #61

LeanderK opened this issue Dec 15, 2016 · 9 comments

Comments

@LeanderK
Copy link

LeanderK commented Dec 15, 2016

I got a socket connection that is used to stream data with delimited protobuf messages (see this). In the java api there is an convenient method called parseDelimitedFrom. How can this be achieved using this library?
I am new to haskell, so i might have overlooked something. I am sorry if this is very obvious.

@LeanderK LeanderK changed the title Any method to parse delimited messages? How to parse delimited messages Dec 15, 2016
@judah
Copy link
Collaborator

judah commented Dec 15, 2016

Currently there's no way to do this automatically. I'm assuming your stream is exposed as a Haskell System.IO.Handle? You could implement it manually with the following steps:

  • Read a varint from the steam
  • Read a ByteString of that length from the stream (e.g., with Data.ByteString.hGet)
  • Call Data.ProtoLens.decodeMessage on the returned ByteString.

I don't know if there's any standalone code for reading a varint (there probably ought to be). You could try copying proto-lens's internal routine, for example adapting its inner loop to read bytes from a Handle instead of calling attoparsec's anyWord parser:
https://github.com/google/proto-lens/blob/master/proto-lens/src/Data/ProtoLens/Encoding/Bytes.hs#L38

@LeanderK
Copy link
Author

LeanderK commented Dec 15, 2016

ah ok, so there is not some obvious function i somehow missed. I hoped this would be an easy, straightforward project to do some practical haskell stuff ;) It still looks doable, i just might try solving it. If i get it to work and am confident in the code, would an PR be welcome, or is this too much of an edge case? The PR would probably need some serious reviewing, since i am not yet well versed in best practices/code style etc.

@judah
Copy link
Collaborator

judah commented Dec 16, 2016

Hm...too be honest, such a function might be better as a separate package, at least until proto-lens has a better story around streaming in general. For example, we could provide

parseMessage :: Message a => Parser a
parseDelimitedMessage :: Message a => Parser a

and you could use parseDelimitedMessage along with a library like attoparsec-conduit to process a stream of inputs. But it's not clear that we'll stick with attoparsec in the long term (#62), so I'm hesitant to expose such an API from the core of proto-lens at this time.

@LeanderK
Copy link
Author

ok, that makes sense! thank you.

@judah
Copy link
Collaborator

judah commented Dec 18, 2018

This was done by #268.

@judah judah closed this as completed Dec 18, 2018
@LeanderK
Copy link
Author

well, this was a long time ago! Thanks!

@ulysses4ever
Copy link
Contributor

ulysses4ever commented Jun 8, 2019

@judah I think, this is not quite done yet. The new parseMessageDelimited can help to get a message (via runParser) only if you already have a ByteString. In contrast, #61 concerns a Handle (or a socket, I guess): how do I know exactly how many bytes to take from my Handle to build the bytestring? I still seem to have to parse a varint from my Handle manually, right? Something like a loop reading one byte at a time and checking the MSB. Only after turning this varint into an Int can I read the exact amount of bytes needed to get the correct ByteString, which, in turn, I can feed to runParser.

I'm facing the problem right now and curious if you agree that's an issue. And if so, what would be the plan to tackle it.

@judah
Copy link
Collaborator

judah commented Jun 8, 2019

Hm, I think you're right @ulysses4ever. Previously we were using a streaming parser based on attoparsec, which would have made this possible. But in proto-lens-0.5, we switched to a strict parser that needs a strict ByteString. Unfortunately, I don't think it's feasible at this time to make Parser non-strict without significantly affecting its performance.

I think it's reasonable to add more functionality to proto-lens to handle this use case. One idea I can think of is an API directly using Handles, which would copy the implementation of Data.ProtoLens.Encoding.Bytes.getVarInt:

decodeDelimitedMessageH :: Message m => Handle -> IO (Either String m)

(I'm not wild about duplicating the code, but don't see another obvious way forward; maybe having some tests to verify the new code would be sufficient.)

Would you be interested in sending a PR?

@judah judah reopened this Jun 8, 2019
@ulysses4ever
Copy link
Contributor

@judah thanks for your reply! Yes, I'm interested in making a PR, given that I've already mostly implemented it in my project. I am gonna test it first, and then make a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants