Alternative formulation of Stream #72

safareli · 2017-07-12T09:12:30Z

Here is what kind of Stream I came to independently, when I was playing with purescript-parsing:

class Stream f c | f -> c where
  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: Position -> Position }
  stripPrefix :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: Position -> Position }

class HasUpdatePosition a where
  updatePos :: Position -> a -> Position

newtype Prefix a = Prefix a


-- example implementation for list
instance (Eq a, HasUpdatePosition a) => Stream (List.List a) a where
  uncons f = L.uncons f <#> \({ head, tail}) ->
    { head, tail, updatePos: (_ `updatePos` head)}
  stripPrefix (Prefix p) s = List.stripPrefix (List.Pattern p) s <#> \rest ->
    { rest, updatePos: unwrap (fold (p <#> (flip updatePos >>> Endo)))}

purescript-contrib/purescript-parsing#62

I think this formulation is nicer as you don't need to carry updatePosition function around, and I don't see why you need the m in uncons too.

Would like your thoughts on this formulation.

The text was updated successfully, but these errors were encountered:

ekmett · 2017-07-18T14:24:43Z

Folks use the m in uncons to allow "streams" that aren't fully in memory. E.g. Streaming from disk or pipes.

safareli · 2017-07-30T18:37:31Z

Thanks @ekmett I have added the m to PureScript PR .

What you think on moving Position -> Position into Stream class?

safareli · 2017-07-30T18:58:57Z

Also in mtl we have class Monad m => MonadState s m | m -> s where should the Stream also have m -> s dep?

ekmett · 2017-08-01T03:45:51Z

Lots of people use different states that happen to run in the IO monad.

ekmett · 2017-08-01T03:51:33Z

I'm not sure about the Position -> Position thing. To me it seems to entangle a couple of concerns, and complicates uncons. It seems you'd really want

Position -> f -> Maybe { head :: c, tail :: f, newPos :: Position }

which would allow for different parsing based on position, and avoids capturing a very expensive closure. An example where that might matter is if your lexer prepass did something like c preprocessing where it might need to know if it is at the start of file to know how to properly handle finding # directives. Regardless, building a function there means that this will never fully unbox into something that just runs on the stack no matter what instance you work with unlike the existing solution.

You're always paying full price.

safareli · 2017-12-03T01:22:04Z

This is what i came to:

class Stream s m t | s -> t where
  uncons :: ParserState s -> m (Maybe (Tuple t (ParserState s)))
  stripPrefix :: Prefix s -> ParserState s -> m (Maybe (ParserState s))

type ParserState s = { input :: s, pos :: Position }

ekmett · 2017-12-03T20:23:15Z

Poking at this from the other side:

First uncons:

You've now converged almost back to the existing design. The major difference is that uncons doesn't get fed the parsec SourcePos, but rather just whatever s you ask for. Why? When it is a string, then the 's' can just be a tail of the string. When it is a file, you need something like a number of bytes consumed so far for seeking or a file handle and a pushback buffer. In neither case do you use the source position and we don't even look at the Char returned at present.

Here you're bolting in machinery to update position, but 'uncons' doesn't need to inspect the char it gives back, so there is no "double dipping" on work, there. Only the "update position" combinator passed to tokenPrim actually looks at the Char or whatever is given back by your stream.

So the common use-cases don't use the extra information you are supplying here to access the stream.

This consciously leaves open the notion of what is a 'column' to tokenPrim to fill in, enabling you to work with a more traditional notion of tabstops out of the box with the char combinators, etc. or plug in your own, so you can support, say, modelines, or choose to have every character (even tabs) be one 'column' wide, or using the number of utf-16 code units seen so far in the line so you can write a language server protocol implementation, etc.

At present Stream factors out the concern of updating position from getting data. In the scenario you propose wed couple them more tightly forcing users to use newtypes to split these cases back apart.

It feels to me that this part of your design couples the stream to the notion of columns in a really tight way that then require more code duplication (copying the update position function into stream types that previously lived agnostic of parsec's rather awful SourcePos type) and newtypes to tease apart, but doesn't actually pay out in a win in terms of avoiding duplicating any work at runtime.

Now stripPrefix:

Adding something like stripPrefix could be quite useful for performance. The major objections to overcome there I'd see is that currently Parsec doesn't make any use of type families and the aforementioned issues with entangling the update of position with the internals of the stream types in the API you provide.

Here things become more nebulous as at this point you are looking at the characters given back and so my argument about how uncons need not concern itself with what is fetched falls apart a bit. Interestingly there is a second option when evaluating stripPrefix-like combinator, which is to have whatever calls it take an argument like Prefix s -> SourcePos -> SourcePos so that some calculations can be shared across calls rather than having it come back as part of the state update. This way it can be lifted out by the compiler and computed once for a given prefix you might be trying to strip, e.g. a given keyword "foo" will always be 3 columns wide. For sharing, many practical implementations could represent the SourcePos -> SourcePos using something like a fairly simple

data Delta = Delta !Int !Int
instance Monoid Delta where
  mempty = Delta 0 0
  mappend (Delta a b) (Delta 0 d) = Delta a (b + d)
  mappend (Delta a _) (Delta c d) = Delta (a + c) d

Then convert Prefix s -> Delta to get a cacheable result, then apply the delta with Delta -> SourcePos -> SourcePos to get the Prefix s -> SourcePos -> SourcePos while carefully caching that intermediate delta in the environment.

With that in mind, it seems even stripPrefix benefits from going to something like the current approach and splitting position updates from reading from the stream.

safareli · 2017-12-04T12:28:28Z

Thanks for detailed response! I would need to think on this, meanwhile, where can I find definition of Delta?

ekmett · 2017-12-07T20:56:38Z

I wrote it inline above as an example. It isn't used by parsec.

You can read Delta x y as 'move down x lines, and if x is > 1 the column is absolute, otherwise its relative.

haskell#107) Fixes haskell#72.

safareli mentioned this issue Jul 12, 2017

Generalize StringLike to StreamLike fix #58 purescript-contrib/purescript-parsing#62

Closed

int-index pushed a commit to int-index/parsec that referenced this issue Sep 18, 2020

Fix STX appearing in any lines of the prompt, not just the last one. (

788a4f4

haskell#107) Fixes haskell#72.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative formulation of Stream #72

Alternative formulation of Stream #72

safareli commented Jul 12, 2017 •

edited

Loading

ekmett commented Jul 18, 2017

safareli commented Jul 30, 2017

safareli commented Jul 30, 2017

ekmett commented Aug 1, 2017

ekmett commented Aug 1, 2017

safareli commented Dec 3, 2017

ekmett commented Dec 3, 2017 •

edited

Loading

safareli commented Dec 4, 2017

ekmett commented Dec 7, 2017

Alternative formulation of Stream #72

Alternative formulation of Stream #72

Comments

safareli commented Jul 12, 2017 • edited Loading

ekmett commented Jul 18, 2017

safareli commented Jul 30, 2017

safareli commented Jul 30, 2017

ekmett commented Aug 1, 2017

ekmett commented Aug 1, 2017

safareli commented Dec 3, 2017

ekmett commented Dec 3, 2017 • edited Loading

safareli commented Dec 4, 2017

ekmett commented Dec 7, 2017

safareli commented Jul 12, 2017 •

edited

Loading

ekmett commented Dec 3, 2017 •

edited

Loading