Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data.Conduit.Resumable: resumable sources, sinks, and conduits #92

Closed
wants to merge 3 commits into from

Conversation

joeyadams
Copy link
Contributor

This pull request adds a new module, Data.Conduit.Resumable. The conduit package already has ResumableSource; this adds ResumableSink and ResumableConduit, with similar interfaces. ResumableSink is heavily inspired by the [conduit-resumablesink package] by Andrew Miller (@A1kmm).

ResumableSink and ResumableConduit both have tests, though they might not be 100% comprehensive. I marked the Data.Conduit.Resumable module "experimental" in case we find a better interface or discover semantic issues with the existing interface.

I did a couple "clumsy" things in the test code:

  • Import Data.Conduit, Data.IORef, etc. unqualified.
  • Use partial do patterns instead of shouldBe.

I know this is inconsistent with the existing code, but I left it this way for a philosophical reason: writing tests should first and foremost be easy. It's better to have some tests that produce ugly failure output than it is to not have those tests at all. If a test fails later and we want to know what the failure output was, we can replace partial patterns with shouldBe as part of the troubleshooting process.

Move ResumableSource implementation to Data.Conduit.Resumable,
but re-export it from Data.Conduit.

ResumableSink is inspired by the conduit-resumablesink package by
Andrew Miller.  Changes from the original:

 * Return completion as Right instead of Left.  Right is "after" left,
   and completion is "after" resumption.

   On the other hand, a resumable sink usually doesn't complete,
   and Left is normally used for error values.

 * Rename -++$$ to -+$$, for consistency with $$+-

 * Remove connectResumeSink and closeResumableSink, favoring the
   infix alternatives.

   However, keep newResumableSink, and add newResumableSource and
   newResumableConduit.  I was originally reluctant to add these, as
   they might lead to less consistency in application code (you can
   also say `src $$+ return ()`, which seems just as easy).  However,
   creating an initial ResumableSink this way is particularly
   inconvenient because of the 'Either' return value in '+$$'.

 * Use type parameter order "i m r", for consistency with Sink.
   ConduitM puts the @m@ parameter on the right (to support
   MonadTrans), while the original ResumableSink had @m@ on the left.

 * Don't export the data constructor of ResumableSink.  If conduit
   changes finalization semantics in the future, the definition
   may change.
Test resumable source finalization, too.

The ResumableSink tests come from the conduit-resumablesink package by
Andrew Miller, with minor modifications.

I did a couple "clumsy" things in my tests:

 * Import Data.Conduit, Data.IORef, etc. unqualified.

 * Use partial do patterns instead of shouldBe.

I know this is inconsistent with the existing code, but I left it this way
for a philosophical reason: writing tests should first and foremost
be easy.  It's better to have some tests that produce ugly failure output
than it is to not have those tests at all.  If a test fails later and we
want to know what the failure output was, we can replace partial patterns
with `shouldBe` as part of the troubleshooting process.
@snoyberg
Copy link
Owner

I'm still not fully comfortable about adding resumable sinks and conduits to conduit itself. I feel confident that the use case addressed by resumable sources cannot be addressed by another means, but I don't have a deep enough grasp of resumable conduit/sink to say the same thing yet. For now, I think it would make the most sense to let this module live in a separate package, and in the future we can consider folding it back into conduit itself.

@joeyadams
Copy link
Contributor Author

Fair enough. I'll put together a conduit-resumable package.

However, I still want to add a feature to ResumableSource:

-- | Finalize a 'ResumableSource' by turning it into a 'Source'.
-- The resulting 'Source' may only be used once.
--
-- This may be used instead of '$$+-' to finalize a 'ResumableSource'.
finishResumableSource :: Monad m => ResumableSource m o -> Source m o

This can be useful when you need to supply ResumableSource data to a function requiring a plain Source. I don't think you can do this with the external API as-is, short of an awkward await/yield loop. The finalizer returned by unwrapResumable is invalid once the source produces another output.

finishResumableSource works by making the Source call the finalizer when it finishes, unless it overrides that finalizer with HaveOutput.

@snoyberg
Copy link
Owner

But finishResumableSource has a problem. Consider:

(rsrc, _) <- sourceFile "foo" $$+ await
src <- finishResumableSource rsrc
src $$ return ()

In this case, the finalizer will not be called. Another example would be:

(rsrc, _) <- sourceFile "foo" $$+ await
src <- finishResumableSource rsrc
(yield "foo" >> src) $$ await

@joeyadams
Copy link
Contributor Author

Darn, you're right. The finalizer would only run if the source is prompted for output. Thanks for pointing this out.

It would be nice if the finalizer were built into Pipe or ConduitM, so we could just have:

($$+) :: Monad m => Source m a -> Sink a m b -> m (Source m a, b)

This would reduce API complexity. For example, the http function in http-conduit could return a plain Source instead of a ResumableSource (meaning no need to explain ResumableSource vs Source). It would also enable the intuition that any pipe (Source, Sink, or Conduit) is a state machine that can be stepped or closed. On the other hand, it might constrain future optimizations.

This problem shouldn't hurt my conduit-resumable package too badly, but it will make it harder to glue resumables together.

@snoyberg
Copy link
Owner

If ConduitM had that functionality, there'd be no point for resumables. I've actually considered adding something like that, but (IIRC) it breaks the monad instance and kills performance.

@joeyadams
Copy link
Contributor Author

Thanks for all the insight. I pushed the conduit-resumable package to GitHub, but not to Hackage yet.

@snoyberg
Copy link
Owner

snoyberg commented Sep 8, 2013

Another possible location for this is the new conduit-extra package. @ppetr already created a Resumable module over there: https://github.com/snoyberg/conduit/blob/master/conduit-extra/Data/Conduit/Extra/Resumable.hs

@snoyberg snoyberg closed this Sep 8, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants