Architectural change: Make all writers useable without IO #2930

Closed
jgm opened this issue May 19, 2016 · 79 comments

@jgm
Owner

jgm commented May 19, 2016

Currently a number of writers are impure and do IO: docx, odt, epub, epub3, fb2, icml, rtf.

It would be nice to use a free monad instead, so that these writers could be used both inside and outside of IO contexts. The first step would be to catalog the places where IO is really used in these writers.

@mb21
Collaborator

mb21 commented May 29, 2016

I've only quickly skimmed through some posts about free monads. But if I understand correctly, you're proposing to abstract the writers: instead of doing the document conversion, they would only generate a plan for a document conversion. This plan (of type Free Pandoc r) could then be executed by either of two functions:

runIO   :: Free Pandoc r -> IO r
runPure :: Free Pandoc r -> r

where r is a tuple of warnings/errors and the output (either String, ByteString, or hopefully in the future Text).

The ICML writer for example, only needs IO when processing images. So if you know that your document doesn't contain images (or wish to ignore their dimensions), you could run runPure to actually do the conversion.
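To make the "plan plus interpreters" idea concrete, here is a minimal Haskell sketch. It assumes a tiny, hypothetical instruction set (`PandocF` with `ReadImage` and `Warn`) and a toy writer fragment `writeImageRef`; none of these names are pandoc's API. `Free` and `liftF` are hand-rolled so the example needs only `base` (the `free` package provides equivalents). As described above, both interpreters return a pair of warnings and output; `runPure` simply treats every image as unavailable.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Exception (IOException, try)

-- A minimal free monad (the `free` package offers the same Free and liftF).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

-- Hypothetical instruction set: the only effects a writer may request.
data PandocF next
  = ReadImage FilePath (Maybe String -> next)  -- contents, if available
  | Warn String next
  deriving Functor

readImage :: FilePath -> Free PandocF (Maybe String)
readImage fp = liftF (ReadImage fp id)

warn :: String -> Free PandocF ()
warn msg = liftF (Warn msg ())

-- Toy writer fragment: it only builds a plan of requested effects.
writeImageRef :: FilePath -> Free PandocF String
writeImageRef fp = do
  mb <- readImage fp
  case mb of
    Just c  -> return ("<image src=" ++ fp ++ " bytes=" ++ show (length c) ++ ">")
    Nothing -> do
      warn ("could not read " ++ fp)
      return ("<image src=" ++ fp ++ ">")

-- Read a file, forcing it inside `try` so any IOException is caught.
readMaybeFile :: FilePath -> IO (Maybe String)
readMaybeFile fp = do
  r <- try (readFile fp >>= \s -> length s `seq` return s)
  return (either (const Nothing) Just (r :: Either IOException String))

-- Interpreter 1: performs the requested IO.
runIO :: Free PandocF a -> IO ([String], a)
runIO (Pure a)                = return ([], a)
runIO (Free (ReadImage fp k)) = readMaybeFile fp >>= runIO . k
runIO (Free (Warn msg next))  = do
  (ws, a) <- runIO next
  return (msg : ws, a)

-- Interpreter 2: no IO at all; every image is treated as unavailable.
runPure :: Free PandocF a -> ([String], a)
runPure (Pure a)               = ([], a)
runPure (Free (ReadImage _ k)) = runPure (k Nothing)
runPure (Free (Warn msg next)) = let (ws, a) = runPure next in (msg : ws, a)
```

Loaded in GHCi, `runPure (writeImageRef "fig.png")` yields `(["could not read fig.png"], "<image src=fig.png>")` without touching the file system.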

@jgm
Owner Author

jgm commented May 31, 2016

Something like that, yes. (An alternative would be to use a
typeclass that can be instantiated by various monads.)

I would separate out the two issues:

  1. allowing different underlying textual types (Text,
    String, ByteString)
  2. allowing all writers to be used outside of IO

The main point here is (2). There might be a reason to do (1) also,
but it's a separate issue.

In addition to runIO and runPure, there could be a way of
running a writer where you specify the contents of images
as arguments, so they don't need to be read in IO. This
could be handy in some cases.
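The typeclass alternative, combined with the idea of supplying image contents as arguments, might look like the following sketch. `PandocMonad`, `readImage`, `Supplied`, `runWithImages`, and `imageRef` are all hypothetical names. The writer fragment is written once against the class: the `IO` instance reads real files, while `Supplied` answers from an association list passed in by the caller, and an empty list gives a fully pure run.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Exception (IOException, try)

-- Hypothetical class with the single effect this fragment needs.
class Monad m => PandocMonad m where
  readImage :: FilePath -> m (Maybe String)

instance PandocMonad IO where
  readImage fp = do
    -- force the contents inside `try` so read/decoding errors are caught
    r <- try (readFile fp >>= \s -> length s `seq` return s)
    return (either (const Nothing) Just (r :: Either IOException String))

-- Pure instance: image contents are handed in by the caller.
newtype Supplied a = Supplied ([(FilePath, String)] -> a)
  deriving (Functor, Applicative, Monad)

instance PandocMonad Supplied where
  readImage fp = Supplied (lookup fp)

runWithImages :: [(FilePath, String)] -> Supplied a -> a
runWithImages imgs (Supplied f) = f imgs

-- No images supplied at all.
runPure :: Supplied a -> a
runPure = runWithImages []

-- Toy writer fragment, polymorphic in the monad.
imageRef :: PandocMonad m => FilePath -> m String
imageRef fp = do
  mb <- readImage fp
  return $ case mb of
    Just c  -> "<image src=" ++ fp ++ " bytes=" ++ show (length c) ++ ">"
    Nothing -> "<image src=" ++ fp ++ ">"
```

For example, `runWithImages [("fig.png", "abcd")] (imageRef "fig.png")` evaluates to `"<image src=fig.png bytes=4>"`, while `imageRef "fig.png"` used directly in `IO` reads the real file.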

@jkr
Collaborator

jkr commented Jun 16, 2016

The docx writer really only seems to need it for reading the default reference-docx, and for producing some random numbers for nsids. Presumably the latter could be done in some deterministic form.
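One way to make the nsids deterministic is to derive them from a seed with a small pure generator instead of drawing them from IO. This is a hypothetical sketch, not what the docx writer does: it assumes an nsid only needs to be an 8-digit hexadecimal value that is unique within the document, and it uses a 32-bit linear congruential generator (`lcgStep` and `nsids` are illustrative names).

```haskell
import Data.Word (Word32)
import Text.Printf (printf)

-- One step of a 32-bit linear congruential generator (the classic
-- Numerical Recipes constants). Word32 arithmetic wraps modulo 2^32,
-- so no explicit `mod` is needed.
lcgStep :: Word32 -> Word32
lcgStep x = 1664525 * x + 1013904223

-- n deterministic ids, each rendered as 8 uppercase hex digits.
nsids :: Word32 -> Int -> [String]
nsids seed n = map (printf "%08X") (take n (tail (iterate lcgStep seed)))
```

`nsids 0 3` always returns the same three ids, the first being `"3C6EF35F"`, so output is reproducible across runs.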

Any good references on free monads that you'd recommend? Or instructive real-world usages? I tried to wrap my head around them a while back, and felt like I wasn't asking the proper questions to be able to understand the solutions.

@jkr
Collaborator

jkr commented Sep 21, 2016

@jgm: as a way of working through some Free monad discussions online, I reimplemented the Docx writer using free. It was fairly painless. I've yet to see if I can gain any speed using improve here, but there doesn't seem to be much performance penalty (looks like about a 0.1s increase on a 120,000-word manuscript, md->docx). Note that I have liftF sprinkled around instead of defining new functions, but that's easy enough to address later.

You can find it here: https://github.com/jkr/pandoc/tree/free

If this is still something you're interested in pursuing, I could try doing this for the other IO writers. We would probably also want a T.P.Free module, and to turn DocxActions and runDocxIO into a more general set of actions and a general runIO interpreter, respectively.

@jgm
Owner Author

jgm commented Sep 23, 2016

I like the idea! I assume DocxAction would be replaced by
something like PandocAction that could be used in all the
relevant writers? That would allow us to use uniform test
harnesses, etc.

@jkr
Collaborator

jkr commented Sep 24, 2016

Yep -- I moved it into a PandocAction Monad in Text.Pandoc.Free. Right now, I have pure versions of the EPUB, Docx, ODT, and ICML writers. There's still FB2 and RTF, but I think it's in fairly usable shape right now. I currently export a pure and an IO version, but the only difference is that the runIO function from Text.Pandoc.Free is run to get the IO one (we could eventually just run this in the binary, if we wanted, and only export the pure version).

The repo is here: https://github.com/jkr/pandoc/tree/free
and the comparison view is here: master...jkr:free

One oddness to take note of: in order to have generic IORef functions, I have to add a type parameter to the functor (PandocActionF) which produces the free monad, so the monad has a parameter (PandocAction a). To make this a bit less ugly in practice, I add a type to each of the writers (type EPUBAction = PandocAction [(FilePath, (FilePath, Maybe Entry))] or type DocxAction = PandocAction ()). This all produces a limitation, though -- if we want to use IORefs, they all have to have the same type, at least in a function. This isn't a problem now, but it's worth being aware of.

I'll look around to see if anyone deals with it in any useful way.

@jgm
Owner Author

jgm commented Nov 16, 2016

@jkr what's the status of these free monad experiments?

@jkr
Collaborator

jkr commented Nov 16, 2016

I got through making pure versions of the Docx, EPUB, ICML, and ODT writers, along with a runIO function to run them. Where I hit a snag was in trying to write a runTest function that would work with the IORefs in the ODT writer. There's no doubt a way to do it with unsafePerformIO or STRefs, but I haven't been able to follow it up.

The branch is here: https://github.com/jkr/pandoc/commits/free

If you remove the HEAD of that branch, you'll have working writers with a runIO. I have to rebase them given changes to those writers in the interim, of course, but that shouldn't be more than an hour's work or so.

@jkr
Collaborator

jkr commented Nov 16, 2016

Okay, I rebased the free branch on master. I also moved the test runner that I couldn't quite figure out to another branch to fiddle with. So if you pull from there you should have functioning pure writers.

@jkr
Collaborator

jkr commented Nov 17, 2016

I've dived back into this. In both cases (odt and epub) it looks like the only modification of IORefs is straightforward list cons-ing. So if our goal is to make pure writers, it seems like the best thing to do would just be to replace this with a plain State monad. We could also do ST, since we don't seem to use anything necessarily IO-ish, but I'll try State first and check performance.
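Replacing IORef list cons-ing with `State` is mostly mechanical. A minimal sketch, assuming media entries are simple `(path, contents)` pairs; `Entry`, `addMedia`, and `collectMedia` are hypothetical names, and `State` comes from the `transformers` package that ships with GHC.

```haskell
import Control.Monad.Trans.State.Strict (State, get, put, runState)

-- Hypothetical archive entry: (path inside the package, contents).
type Entry = (FilePath, String)

-- Register a media file and return the name it gets inside the package.
-- Where the IO version did `modifyIORef ref (entry :)`, this conses onto
-- the state instead.
addMedia :: String -> State [Entry] FilePath
addMedia contents = do
  entries <- get
  let dest = "media/file" ++ show (length entries)
  put ((dest, contents) : entries)
  return dest

-- Run a batch of registrations purely; entries come back in insertion order.
collectMedia :: [String] -> ([FilePath], [Entry])
collectMedia files =
  let (paths, entries) = runState (mapM addMedia files) []
  in (paths, reverse entries)
```

`collectMedia ["A", "B"]` gives `(["media/file0","media/file1"], [("media/file0","A"),("media/file1","B")])`. Consing and reversing once at the end mirrors the IORef version; `ST` with `STRef` would be the closer drop-in if real mutation were ever needed.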

@jgm
Owner Author

jgm commented Nov 17, 2016

Sounds good.

@jkr
Collaborator

jkr commented Nov 17, 2016

Any objection to me trying out Glob instead of filemanip for font globbing for epubs? filemanip only has an IO version.

@jgm
Owner Author

jgm commented Nov 18, 2016

Fine with me.

@jkr
Collaborator

jkr commented Nov 18, 2016

@jgm: okay -- we now have a functioning set of pure writers, along with a runIO function and a fairly functional pure runTest function (using State and Reader monads). The pure writers are a teeny bit slower, maybe 3%-5% based on my unscientific observations. I haven't looked at memory usage.

Right now, all the writers export a write{Format} doing IO and write{Format}Pure outputting PandocAction. The IO one just runs runIO on the pure one. We could also just output the pure one and run runIO in Text.Pandoc.

There are still some improvements to be made. I'd like to make PandocAction an instance of MonadError so we can throw and catch in the writers -- but I haven't quite figured out a good way to do it. I'd also like to streamline the functions we output from Text.Pandoc.Free -- for example, we currently have a PandocAction version of three different file reading functions (Strict BS, Lazy, and UTF8). But this might be best, because the IO interpreter can just use the different versions and not have to convert. In any case, it might be nice to prune it down a bit.

Anyway, it's in a workable form now. You can take a look here:

https://github.com/jkr/pandoc/tree/free-with-tests

@jgm
Owner Author

jgm commented Nov 19, 2016

+++ Jesse Rosenthal [Nov 18 16 14:16 ]:

@jgm: okay -- we now have a functioning set of pure writers, along
with a runIO function and a fairly functional pure runTest function
(using State and Reader monads). The pure writers are a teeny bit
slower, maybe 3%-5% based on my unscientific observations. I haven't
looked at memory usage.

Great. That's not a big deal. If you want to look at
memory, I've included a 'weigh-pandoc' executable which
can be built by setting the 'weigh' flag. This would allow
you to make comparisons (though I suppose the weigh-pandoc
program itself would need to be revised for the API change).

Right now, all the readers export a write{Format} doing IO and
write{Format}Pure outputting PandocAction. The IO one just runs runIO
on the pure one. We could also just output the pure one and run runIO
in Text.Pandoc.

A bit confused here...are we talking about readers or
writers here, or both?

An application of this in the readers would be include files
in LaTeX (already implemented with a complete hack) and RST.

There are still some improvements to be made. I'd like to make
PandocAction an instance of MonadError so we can throw and catch in the
writers -- but I haven't quite figured out a good way to do it. I'd
also like to streamline the functions we output from Text.Pandoc.Free
-- for example we currently have a PandocAction version of three
different file reading functions (Strict BS, Lazy, and UTF8). But this
might be best because the IO interpreter can just use the different
versions and not have to convert. In any case, it might be nice to
prune it down a bit.

Agreed.

@jgm
Owner Author

jgm commented Nov 19, 2016

Maybe a good thing to do would be to get a list together of which IO operations occur in which writers, and why. Looking at the list in Text.Pandoc.Free, I can't even remember why some of those things are there.

@jgm
Owner Author

jgm commented Nov 19, 2016

Tempting to change the API across the board, so that ALL readers and writers uniformly go in the PandocActionF monad. So e.g.

readMarkdown :: ReaderOptions -> String -> PandocF Pandoc

writeHtmlString :: WriterOptions -> Pandoc -> PandocF String

main = runIO $ readMarkdown def "hello world" >>= writeHtmlString def

That would simplify the types a lot; we'd no longer need a distinction between IO writers and pure writers, for example. (If we really wanted to simplify things, I suppose we could have them all produce ByteString output; I don't know if that's a good idea, though.)

We could then implement things like \today or \include in LaTeX, and their equivalents in other formats, in a clean way, and get rid of the ugly handleIncludes hack.

@jgm
Owner Author

jgm commented Nov 19, 2016

Rethinking the API a bit more radically:

main = runIO $ do
   setReaderOption readerSmart True
   setWriterOption writerColumns 50
   readFileUTF8 "myfile.txt" >>= (readMarkdown >=> allCapsFilter >=> writeHtmlString
                                             >=> writeFileUTF8 "myfile.html")

@jkr
Collaborator

jkr commented Nov 19, 2016

A bit confused here...are we talking about readers or writers here, or both?

Sorry -- I meant to say "writers" there. But as you've pointed out, it would be interesting to extend it to writers.

One problem that occurs to me -- we already have a number of readers and writers that are pure (so long as you're not doing a standalone). Say, docx -> markdown. So would necessitating runIO on everything remove that ability? Granted, mkStringReader and mkBSReader already make everything IO, but they don't need to, do they?

@jkr
Collaborator

jkr commented Nov 19, 2016

Sorry -- I meant to say "writers" there. But as you've pointed out, it would be interesting to extend it to writers.

And, yes, I mean to say "extend it to readers." This might be a hardwired expressive difficulty for me.

@jgm
Owner Author

jgm commented Nov 19, 2016

And with a runPure function, we could provide functions to set some of these things that would normally be gotten via IO (current time, date, contents of files).

main = runPure $ do
   setCurrentTime UTCTime{...}
   writeFileUTF8 "myinclude.tex" -- this just puts a "file" in state
   readLaTeX "\\today\n\\include{myinclude.tex}" >>= writeHtmlString
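A sketch of how a pure run could answer `\today` and `\include` from a "fake real world" held in state. It assumes a trimmed-down, hypothetical `PandocMonad` class with just `getCurrentTime` and `readFileMaybe`; `putFakeFile` plays the role of `writeFileUTF8` in the snippet above, and `expandLaTeX` is a toy line-based expander, not a LaTeX reader. It uses the `time` and `transformers` packages bundled with GHC.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Exception (IOException, try)
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)
import Data.Time.Calendar (fromGregorian, showGregorian)
import Data.Time.Clock (UTCTime (..))
import qualified Data.Time.Clock as Clock

-- Hypothetical class: only the "real world" operations needed here.
class Monad m => PandocMonad m where
  getCurrentTime :: m UTCTime
  readFileMaybe  :: FilePath -> m (Maybe String)

instance PandocMonad IO where
  getCurrentTime = Clock.getCurrentTime
  readFileMaybe fp = do
    r <- try (readFile fp >>= \s -> length s `seq` return s)
    return (either (const Nothing) Just (r :: Either IOException String))

-- The fake real world consulted by a pure run.
data FakeWorld = FakeWorld { fwTime :: UTCTime, fwFiles :: [(FilePath, String)] }

newtype PandocPure a = PandocPure (State FakeWorld a)
  deriving (Functor, Applicative, Monad)

instance PandocMonad PandocPure where
  getCurrentTime   = PandocPure (gets fwTime)
  readFileMaybe fp = PandocPure (gets (lookup fp . fwFiles))

setCurrentTime :: UTCTime -> PandocPure ()
setCurrentTime t = PandocPure (modify (\w -> w { fwTime = t }))

-- Just puts a "file" in state.
putFakeFile :: FilePath -> String -> PandocPure ()
putFakeFile fp s = PandocPure (modify (\w -> w { fwFiles = (fp, s) : fwFiles w }))

runPure :: PandocPure a -> a
runPure (PandocPure m) = evalState m (FakeWorld epoch [])
  where epoch = UTCTime (fromGregorian 1970 1 1) 0

-- Toy expander: a line "\today" becomes the date, "\include{f}" becomes f.
expandLaTeX :: PandocMonad m => String -> m String
expandLaTeX = fmap unlines . mapM expandLine . lines
  where
    expandLine "\\today" = showGregorian . utctDay <$> getCurrentTime
    expandLine l = case stripPrefix "\\include{" l of
      Just rest | not (null rest), last rest == '}' ->
        fromMaybe ("% missing " ++ init rest) <$> readFileMaybe (init rest)
      _ -> return l
```

Running `setCurrentTime` for 2016-11-19, then `putFakeFile "myinclude.tex" "Hello"`, then `expandLaTeX "\\today\n\\include{myinclude.tex}"` under `runPure` gives `"2016-11-19\nHello\n"`. The same `expandLaTeX` runs in `IO` against the real clock and file system.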

@jkr
Collaborator

jkr commented Nov 19, 2016

For pruning, I got rid of newUUID by introducing a pure RandomGen g => g -> UUID function in Text.Pandoc.UUID (since we already have a newStdGen function in Free).

Next step: getPOSIXTime and getCurrentTime are obviously redundant, since there are utcTimeToPOSIXSeconds and the inverse. Any preference on what should be our primitive? (In Data.Time.Clock, getCurrentTime is defined internally as posixSecondsToUTCTime <$> getPOSIXTime).

@jkr
Collaborator

jkr commented Nov 19, 2016

I really like the idea of the separate runPure and runIO handlers.

@jkr
Collaborator

jkr commented Nov 19, 2016

I'm going to see if I can move getDefaultReference{Docx,Odt} into the relevant writers as well, so they can use the T.P.Free functions instead of IO.

@jgm
Owner Author

jgm commented Nov 21, 2016

+++ Jesse Rosenthal [Nov 21 16 08:30 ]:

For the reference.docx, did we think it should go into WriterOptions
anyway?
(Both the user's reference.docx and the "default" reference.docx,
which we need too.)

I think so -- only question is whether it should go in as a strict
ByteString or an Archive. My preference would be the BS, since it would
require the user to have one less library loaded.

Agreed.

@jkr
Collaborator

jkr commented Nov 21, 2016

That's one nice thing about the Free Monad
approach -- we can use state behind the scenes in the
interpreter without it ever even being exposed to the
user. Here, we need to expose it.

I think we can do exactly the same thing with typeclasses, perhaps a bit more simply. First, just to make sure we're on the same page: in the Free version I have up right now, we have this:

runIO :: PandocAction nxt -> IO nxt

runTest :: PandocAction nxt -> Testing nxt

where Testing is a ReaderT State monad. Are you describing the following:

runIO :: PandocAction nxt -> IO nxt

runTest' :: PandocAction nxt -> Testing nxt

-- free monad version

runPure :: PandocAction nxt -> nxt
runPure x = evalState (runReaderT (runTest' x) def) def

and then the user could do:

runPure $ do
  setTime midnight
  setReferenceDocx someByteString
  writeDocx opts docx

This, I think, is the best we could do as far as hiding state and still letting the user set state.

But, like I said, I think the typeclass approach lets us do basically the same thing. We don't have runIO anymore, because that's implied by setting PandocMonad to IO. But for the pure version, we can just have

-- typeclass version

runPure :: Testing ByteString -> ByteString
runPure tbs = evalState (runReaderT tbs def) def

foo = runPure $ do
  setTime "midnight"
  setReferenceDocx someBlob
  writeDocx opts docx

Note that we don't need to have the subordinate runTest' (or whatever it would be called). And because of how runPure is defined, the compiler would infer that it should output the pure state version from writeDocx.

I'm not sure this is an argument for the typeclass version, but I don't see how the free monad version allows us to hide the details any more. In fact, it seems slightly clumsier, because of the different explicit interpreter we'd need to run.

@jkr
Collaborator

jkr commented Nov 21, 2016

Sorry, for more generality, runPure in the second example should be Testing a -> a

@jgm
Owner Author

jgm commented Nov 21, 2016

That's basically what I had in mind, but on reflection I think we might want to consider something a bit fancier:

newtype PandocPure a = PandocPure (State PandocPureState a)
newtype PandocIO a = PandocIO (StateT PandocIOState IO a)

runPure :: PandocPure a -> a
runIO :: PandocIO a -> IO a

Here PandocPureState would include all the fake real-world stuff we need, but also reader and writer options (which could therefore be set within the monad by the user) and accumulated warning messages. PandocIOState would be similar but without fake real-world stuff.

So a user could do something like:

convert :: String -> ([String], String)
convert inp = runPure $ do
  setReaderOption readerSmart True
  setWriterOption writerColumns 50
  doc <- readMarkdown inp
  let doc' = walk emphToCaps doc
  out <- writeDocBook doc'
  warnings <- getWarnings
  return (warnings, out)

@jgm
Owner Author

jgm commented Nov 21, 2016

On this approach:

  • the instances of PandocMonad (MonadPandoc?) we expose would be opaque newtypes
  • they would all include some kind of internal state to facilitate things like setReaderOption
  • they would all include some internal state (or writer monad) to collect warning messages
  • in the pure case they would include state for "fake real world"
  • but none of these state gizmos would be exposed directly to the user; the users would interact with them only indirectly via functions

Note that in the above example (convert) you could replace runPure with runIO and leave everything else the same; this would make the type String -> IO ([String], String).

As for warnings, maybe we'd need some kind of switch to allow warnings to be emitted in a streaming fashion for the IO case, instead of collecting them all. But giving the option to collect them (even in IO) seems desirable; consider a web application, for example, that would want to display the warnings on a web page rather than having them go to stderr.
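The opaque-newtype design can be sketched with one shared state record holding options and accumulated warnings. Everything here (`CommonState`, a `WriterOpts` with only a column width, `setColumns`, `report`, `getWarnings`, and the toy `writePlain`) is a hypothetical stand-in, not pandoc's API. The point is that `convert` is written once and runs unchanged under `runPure` or `runIO`, and that users touch the state only through helper functions. `State`/`StateT` come from `transformers`.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.State.Strict (State, StateT, evalState, evalStateT, get, put)

-- Hypothetical, heavily trimmed stand-in for WriterOptions.
newtype WriterOpts = WriterOpts { optColumns :: Int }

-- State shared by every instance; never exposed to users directly.
data CommonState = CommonState
  { stOpts     :: WriterOpts
  , stWarnings :: [String]   -- newest first
  }

defaultState :: CommonState
defaultState = CommonState (WriterOpts 72) []

class Monad m => PandocMonad m where
  getCommonState :: m CommonState
  putCommonState :: CommonState -> m ()

-- Opaque newtypes: only run functions and helpers would be exported.
newtype PandocPure a = PandocPure (State CommonState a)
  deriving (Functor, Applicative, Monad)

newtype PandocIO a = PandocIO (StateT CommonState IO a)
  deriving (Functor, Applicative, Monad)

instance PandocMonad PandocPure where
  getCommonState = PandocPure get
  putCommonState = PandocPure . put

instance PandocMonad PandocIO where
  getCommonState = PandocIO get
  putCommonState = PandocIO . put

runPure :: PandocPure a -> a
runPure (PandocPure m) = evalState m defaultState

runIO :: PandocIO a -> IO a
runIO (PandocIO m) = evalStateT m defaultState

setColumns :: PandocMonad m => Int -> m ()
setColumns n = do
  st <- getCommonState
  putCommonState st { stOpts = (stOpts st) { optColumns = n } }

report :: PandocMonad m => String -> m ()
report w = do
  st <- getCommonState
  putCommonState st { stWarnings = w : stWarnings st }

getWarnings :: PandocMonad m => m [String]
getWarnings = reverse . stWarnings <$> getCommonState

-- Toy writer: cut each line to the configured width, warning when it does.
writePlain :: PandocMonad m => String -> m String
writePlain txt = do
  cols <- optColumns . stOpts <$> getCommonState
  let fit l
        | length l > cols = do
            report ("line truncated: " ++ l)
            return (take cols l)
        | otherwise = return l
  unlines <$> mapM fit (lines txt)

-- Written once; runnable with either runPure or runIO.
convert :: PandocMonad m => String -> m ([String], String)
convert inp = do
  setColumns 10
  out <- writePlain inp
  ws <- getWarnings
  return (ws, out)
```

`runPure (convert "short\nthis line is too long")` returns `(["line truncated: this line is too long"], "short\nthis line \n")`, and `runIO` on the same input produces the same pair in `IO`. Streaming warnings to stderr instead of collecting them would mean making `report` a class method with a different `PandocIO` implementation.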

@jgm
Owner Author

jgm commented Nov 21, 2016

@jkr I put up a typeclass branch at jgm/pandoc -- rebased onto master from your typeclass branch.

@jkr
Collaborator

jkr commented Nov 21, 2016

So PandocPure and PandocIO would be the instances? And given the convert function above, I take it that we could imagine all readers and writers having the types

readFoo :: PandocMonad m => ByteString -> m Pandoc

writeBar :: PandocMonad m => Pandoc -> m ByteString

(not getting into the technicalities of whether/how we want to unify our string types, though that might be a conversation we want to have). Just wanted to make sure I've got this.

Note that in the above example (convert) you could replace runPure with runIO and leave everything else the same; this would make the type String -> IO ([String], String).

I take it we would do runIO in the binary for all cases? But how would users (of the library) know if they have to, for their particular reader and writer? Right now, for example, we'd get the same thing with runIO . readMarkdown and return . runPure . readMarkdown. But not with, say, readT2T, which has macro expansion. I guess that's what documentation is for.

@jgm
Copy link
Owner Author

jgm commented Nov 22, 2016

+++ Jesse Rosenthal [Nov 21 16 15:57 ]:

So PandocPure and PandocIO would be the instances? And given the
convert function above, I take it that we could imagine all readers and
writers having the types

readFoo :: PandocMonad m => ByteString -> m Pandoc

writeBar :: PandocMonad m => Pandoc -> m ByteString

(not getting into the technicalities of whether/how we want to unify
our string types, though that might be a conversation we want to have).

That was the idea -- both readers and writers would be
parameterized in PandocMonad. The string type is another
issue. We could unify everything to ByteString, I suppose,
but that doesn't feel like the right thing to do with the
textual ones...

Note that in the above example (convert) you could replace runPure
with runIO and leave everything else the same; this would make the
type String -> IO ([String], String).

I take it we would do runIO in the binary for all cases?

Yes.

But how would
users (of the library) know if they have to, for their particular
reader and writer? Right now, for example, we'd get the same thing with
runIO . readMarkdown and return . runPure . readMarkdown. But not with,
say, readT2T, which has macro expansion. I guess that's what
documentation is for.

Yes, that's what I had in mind.

Also, we need to think about error handling. We don't want
any 'error' bombs in this code, especially if we want
everything to be useable "pure." You said you were
investigating making these instances of MonadError? It
would be nice to have some uniform method of throwing errors
that behaved sensibly in both monads.

This interacts with things like include files. Say we run
a reader in PandocPure and it sees "\include{myfile}".
And let's assume that there's nothing in the "fake real
world" representing the contents of myfile. I guess we
throw an error, right? So the user of the library would
soon find out that the reader only works in "pure" when
a fake include file is provided?

@jkr
Collaborator

jkr commented Nov 22, 2016

Also, we need to think about error handling. We don't want any 'error' bombs in this code, especially if we want everything to be useable "pure." You said you were investigating making these instances of MonadError? It would be nice to have some uniform method of throwing errors that behaved sensibly in both monads.

Yes, the difficulty before had to do with the specifics of the Free monad. It should be pretty easy in the typeclass version. For the IO version, we get MonadError for free from the IO. And for the pure version we just have to add an ExceptT to the stack (I have to refresh my memory on the pros and cons of doing it on the inside vs. the outside). Or we could add an EitherT to both stacks.

Then, so long as we mandate that all of the readers/writers only use the functions in the P. namespace, we can handle errors in a sensible way. If we wanted to, we could even rewrite the IO functions so they would be safe:

instance PandocMonad PandocIO where
   readFileLazy fp = liftIO (BL.readFile fp `E.catch` doSomethingSafe)
   ...

@jkr
Collaborator

jkr commented Nov 22, 2016

This interacts with things like include files. Say we run a reader in PandocPure and it sees "\include{myfile}". And let's assume that there's nothing in the "fake real world" representing the contents of myfile. I guess we throw an error, right?

Well, I guess it should mirror what we do in IO. But if we have safe versions we could catch errors in IO as well. At the command line, the caught errors would exit the program and give an error message. But in the programmatic version, even in IO, it would give the user a Left PandocError (or whatever) that the user could decide what to do with.

In other words, from an executable standpoint, it would look the same. But in the library, both IO and pure would catch errors in the same way. So to go back to our previous signatures, something like:

runPure :: PandocMonad a -> Either PandocError a
runIO :: PandocMonad a -> IO (Either PandocError a)

[EDITED JGM: s/PandocPure/PandocMonad/; s/PandocIO/PandocMonad/]

@jgm
Owner Author

jgm commented Nov 23, 2016

If we wanted to, we could even rewrite the IO functions so they would be safe:

Yes, good thought. That's probably worth doing.
We'd need to give some thought to the PandocError type,
e.g. giving it a constructor that can store IO errors
and the like, so no information gets lost.
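Error handling along these lines could be sketched with `MonadError` as a superclass of a (hypothetical) `PandocMonad` class, `ExceptT` inside both newtypes, and an error type with a constructor that keeps the original `IOException`. `PandocError`'s constructors, `readFileStrict`, and `expandIncludes` are illustrative names; `MonadError` comes from `mtl`, and `ExceptT`/`State` from `transformers`. A missing include file becomes a `Left` in the pure run, and a failed read becomes a `Left` in `IO` rather than an uncaught exception.

```haskell
{-# LANGUAGE FlexibleContexts, GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}
import Control.Exception (IOException, try)
import Control.Monad.Error.Class (MonadError (..))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT)
import Control.Monad.Trans.State.Strict (State, evalState, gets)
import Data.List (stripPrefix)

-- Hypothetical error type; one constructor keeps the original IOException.
data PandocError
  = PandocFileReadError FilePath          -- not present in the fake world
  | PandocIOError FilePath IOException    -- real IO failure, nothing lost
  deriving Show

class MonadError PandocError m => PandocMonad m where
  readFileStrict :: FilePath -> m String

newtype PandocPure a =
  PandocPure (ExceptT PandocError (State [(FilePath, String)]) a)
  deriving (Functor, Applicative, Monad, MonadError PandocError)

newtype PandocIO a = PandocIO (ExceptT PandocError IO a)
  deriving (Functor, Applicative, Monad, MonadError PandocError)

instance PandocMonad PandocPure where
  readFileStrict fp = do
    mb <- PandocPure (lift (gets (lookup fp)))
    maybe (throwError (PandocFileReadError fp)) return mb

instance PandocMonad PandocIO where
  readFileStrict fp = do
    -- force the contents inside `try` so decoding errors are caught too
    r <- PandocIO (lift (try (readFile fp >>= \s -> length s `seq` return s)))
    either (throwError . PandocIOError fp) return r

runPure :: [(FilePath, String)] -> PandocPure a -> Either PandocError a
runPure files (PandocPure m) = evalState (runExceptT m) files

runIO :: PandocIO a -> IO (Either PandocError a)
runIO (PandocIO m) = runExceptT m

-- Toy reader step: a line "\include{f}" is replaced by f's contents.
expandIncludes :: PandocMonad m => String -> m String
expandIncludes = fmap unlines . mapM expand . lines
  where
    expand l = case stripPrefix "\\include{" l of
      Just rest | not (null rest), last rest == '}' -> readFileStrict (init rest)
      _ -> return l
```

Because both monads are `MonadError PandocError` instances, a reader could also recover locally with `catchError`, for example by substituting an empty string for a missing include, with identical code for the pure and IO runs.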

@jkr
Collaborator

jkr commented Nov 24, 2016

@jgm: I just pushed an update to Text.Pandoc.Class that introduces the opaque newtypes PandocIO and PandocPure. PandocIO is an instance of MonadIO. They're both instances of MonadError, so I've started implementing some of the safe functions above, which should work with throwError. See for example the implementation of readFileLazy in PandocIO. To avoid compatibility problems, fetchItem and fetchItem' still have an Either output, but that's redundant now with the MonadError instance.

Which is all to say that runIO and runPure have signatures m a -> Either PandocExecutionError a. Right now, PandocExecutionError only has one form (PandocFileReadError String), but, of course, sky's the limit there.

For the sake of compatibility, I'm also exporting runIOorExplode, which is just runIO, but it errors out if it hits a Left. So that's what's in Text.Pandoc.Pandoc now.

@jkr
Collaborator

jkr commented Nov 24, 2016

Which is all to say that runIO and runPure have signatures m a -> Either PandocExecutionError a

Sorry, misspoke. runIO has type PandocIO a -> IO (Either PandocExecutionError a)

And, of course, happy Thanksgiving.

@jgm
Copy link
Owner Author

jgm commented Nov 25, 2016 via email

@jkr
Collaborator

jkr commented Nov 26, 2016

You might have to take a look into Custom to see what would be required. All the Lua functions seem to be IO, but I'm not sure I understand the interop well enough to see the best way to wrap that (and to purify it).

@jkr
Collaborator

jkr commented Nov 26, 2016

By the way (and this might have been clear to you from the beginning), one nice side effect of having PandocMonad be an instance of MonadError is that all readers and writers will be able to run throwError in a unified and pure way, without needing to build it into each reader/writer's monad stack. I'm currently going through and changing all writers to PandocMonad m => WriterOptions -> Pandoc -> m String, which is currently just tacking a return $ on the toplevel function. But it does mean that in the future we can throw errors cleanly. When we do runIO we can have them self-destruct (or not), but when we do runPure, they'll just go into the Left.

@jkr
Collaborator

jkr commented Nov 26, 2016

The more I look at it, the more I think the Custom writer doesn't really fit in with the proposed new architecture. Since it's necessarily IO, we can't pretend, as we currently do, that it's a normal writer (currently when it's called, at the toplevel, we pretend it's an IOStringWriter, but that sort of always-IO type won't exist anymore). Suggestion: treat it at the toplevel as if it were JSON output + IO post-processing. In other words, internally, we'd treat

pandoc foo.md -t custom/script.lua

as if it were

pandoc foo.md -t json | custom/script.lua

where we understand the internal lua runner to convert json first.

This might not be a big difference (other than a minor slowdown) but it does keep us from trying to force the Custom writer into a parameterized box, which might make the code a bit less hacky.

@jgm
Owner Author

jgm commented Nov 26, 2016 via email

@jgm
Owner Author

jgm commented Nov 26, 2016 via email

@ickc
Contributor

ickc commented Dec 6, 2016

Is the free monad approach abandoned in favor of the typeclass approach? Sorry to ask this, I saw a lot of commits over there but no discussion here.

@jgm
Owner Author

jgm commented Dec 6, 2016 via email

@mpickering
Collaborator

I skimmed this discussion and it was interesting. I think you're right that ultimately using a type class is easier and in fact equivalently powerful. After all, you can write an instance for the type class which interprets each operation into the free monad, and likewise an interpreter for the free monad which interprets each constructor as the type class method.
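The equivalence can be seen in a few lines: a free monad over the operations' functor is itself an instance of the class, and a free-monad program can be interpreted into any instance by mapping each constructor to the matching method. A minimal sketch with a single hypothetical operation (`MonadClock`/`getTime`, functor `ClockF`, program `stamp`), hand-rolling `Free` so only `base` is needed:

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad (ap, liftM)
import Data.Functor.Identity (Identity (..))

-- A minimal free monad (the `free` package offers the same type).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap = liftM
instance Functor f => Applicative (Free f) where
  pure  = Pure
  (<*>) = ap
instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free m >>= k = Free (fmap (>>= k) m)

-- The same single effect, expressed as a class and as a functor.
class Monad m => MonadClock m where
  getTime :: m Integer

newtype ClockF next = GetTime (Integer -> next)

instance Functor ClockF where
  fmap g (GetTime k) = GetTime (g . k)

-- Class -> free: the free monad is just another instance.
instance MonadClock (Free ClockF) where
  getTime = Free (GetTime Pure)

-- Free -> class: interpret each constructor as the class method.
interpret :: MonadClock m => Free ClockF a -> m a
interpret (Pure a)           = return a
interpret (Free (GetTime k)) = getTime >>= interpret . k

-- A pure instance with a frozen clock.
instance MonadClock Identity where
  getTime = return 1000

-- Written once against the class.
stamp :: MonadClock m => m String
stamp = do
  t <- getTime
  return ("t=" ++ show t)
```

`runIdentity stamp` and `runIdentity (interpret stamp)` both give `"t=1000"`: the first runs the class-polymorphic program directly, the second first builds it as a `Free ClockF` value and then interprets it.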

@jgm
Owner Author

jgm commented Dec 11, 2016 via email

@jgm jgm modified the milestone: pandoc 2.0 Jan 25, 2017
@jgm
Owner Author

jgm commented Feb 1, 2017

Since the typeclass branch has been merged, we can close this.
There are still going to be changes to the details.
