Architectural change: Make all writers usable without IO #2930
Comments
I've only quickly skimmed through some posts about free monads. But if I understand correctly, you're proposing to abstract the writers: instead of doing the document conversion, they would only generate a plan for a document conversion. This plan (of type
where The ICML writer for example, only needs IO when processing images. So if you know that your document doesn't contain images (or wish to ignore their dimensions), you could run |
Something like that, yes. (An alternative would be to use a I would separate out the two issues:
The main point here is (2). There might be a reason to do (1) also, In addition to runIO and runPure, there could be a way of +++ Mauro Bieg [May 29 16 05:19 ]:
|
The docx writer really only seems to need it for reading the default reference-docx, and for producing some random numbers for nsids. Presumably the latter could be done in some deterministic form. Any good references on free monads that you'd recommend? Or instructive real-world usages? I tried to wrap my head around them a while back, and felt like I wasn't asking the proper questions to be able to understand the solutions. |
@jgm: as a way of working through some Free monad discussions online, I reimplemented the Docx writer using free. It was fairly painless. I've yet to see if I can gain any speed using You can find it here: https://github.com/jkr/pandoc/tree/free If this is still something you're interested in pursuing, I could try doing this for the other IO writers. We would probably also want a T.P.Free module, and move |
I like the idea! I assume DocxAction would be replaced by +++ Jesse Rosenthal [Sep 21 16 10:28 ]:
|
Yep -- I moved it into a PandocAction Monad in Text.Pandoc.Free. Right now, I have pure versions of the EPUB, Docx, ODT, and ICML readers. There's still FB2 and RTF, but I think it's in fairly usable shape right now. I currently export a pure and IO version, but the only difference is that the The repo is here: https://github.com/jkr/pandoc/tree/free One oddness to take note of: in order to have generic IORef functions, I have to add a type parameter to the functor (PandocActionF) which produces the free monad, so the monad has a parameter ( I'll look around to see if anyone deals with it in any useful way. |
@jkr what's the status of these free monad experiments? |
I got through making pure versions of the Docx, EPUB, ICML, and ODT writers, along with a The branch is here: https://github.com/jkr/pandoc/commits/free If you remove the HEAD of that branch, you'll have working writers with a |
Okay, I rebased the free branch on master. I also moved the test runner that I couldn't quite figure out to another branch to fiddle with. So if you pull from there you should have functioning pure writers. |
I've dived back into this. In both cases (odt and epub) it looks like the only modification of IORefs is straightforward list cons-ing. So if our goal is to make pure writers, it seems like the best thing to do would just be to replace this with a plain State monad. We could also do ST, since we don't seem to use anything necessarily IO-ish, but I'll try State first and check performance. |
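As a rough illustration of the change described above, here is a minimal sketch of replacing an IORef that is only ever cons-ed onto with a plain State monad. The `Entry` type and the function names are hypothetical, not taken from the odt or epub writers, and the sketch uses `transformers` rather than whatever the branch actually imports.

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify')

-- Hypothetical stand-in for whatever a writer collects while rendering.
type Entry = (FilePath, String)

-- In IO this would be: modifyIORef ref (entry :)
-- Purely, we cons onto the list held in State instead.
addEntry :: Entry -> State [Entry] ()
addEntry e = modify' (e :)

-- Run a writer-like computation and return the entries in insertion order.
collectEntries :: State [Entry] () -> [Entry]
collectEntries act = reverse (execState act [])

example :: [Entry]
example = collectEntries $ do
  addEntry ("a.png", "first")
  addEntry ("b.png", "second")
```

Because nothing here is IO-specific, the same computation could also be run in ST if State turned out to be too slow.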
Sounds good. +++ Jesse Rosenthal [Nov 17 16 04:12 ]:
|
Fine with me. +++ Jesse Rosenthal [Nov 17 16 06:32 ]:
|
@jgm: okay -- we now have a functioning set of pure writers, along with a Right now, all the writers export a There are still some improvements to be made. I'd like to make PandocAction an instance of MonadError so we can throw and catch in the writers -- but I haven't quite figured out a good way to do it. I'd also like to streamline the functions we output from Text.Pandoc.Free -- for example we currently have a PandocAction version of three different file reading functions (Strict BS, Lazy, and UTF8). But this might be best because the IO interpreter can just use the different versions and not have to convert. In any case, it might be nice to prune it down a bit. Anyway, it's in a workable form now. You can take a look here: |
+++ Jesse Rosenthal [Nov 18 16 14:16 ]:
Great. That's not a big deal. If you want to look at
A bit confused here...are we talking about readers or An application of this in the readers would be include files
Agreed. |
Maybe a good thing to do would be to get a list together of which IO operations occur in which writers, and why. Looking at the list in Text.Pandoc.Free, I can't even remember why some of those things are there. |
Tempting to change the API across the board, so that ALL readers and writers uniformly go in the PandocActionF monad. So e.g.
readMarkdown :: ReaderOptions -> String -> PandocF Pandoc
writeHtmlString :: WriterOptions -> Pandoc -> PandocF String
main = runIO $ readMarkdown def "hello world" >>= writeHtmlString def
That would simplify the types a lot; we'd no longer need a distinction between IO writers and pure writers, for example. (If we really wanted to simplify things, I suppose we could have them all produce ByteString output; I don't know if that's a good idea, though.) We could then implement things like |
Rethinking the API a bit more radically:
main = runIO $ do
setReaderOption readerSmart True
setWriterOption writerColumns 50
readFileUTF8 "myfile.txt" >=> readMarkdown >=> allCapsFilter >=> writeHtmlString
>=> writeFileUTF8 "myfile.html" |
Sorry -- I meant to say "writers" there. But as you've pointed out, it would be interesting to extend it to writers. One problem that occurs to me -- we already have a number of readers and writers that are pure (so long as you're not doing a standalone). Say, docx -> markdown. So would necessitating |
And, yes, I mean to say "extend it to readers." This might be a hardwired expressive difficulty for me. |
And with a main = runPure $ do
setCurrentTime = UTCTime{...}
writeFileUTF8 "myinclude.tex" -- this just puts a "file" in state
readLaTeX "\\today\n\\include{myinclude.tex}" >=> writeHtmlString |
For pruning, I got rid of newUUID by introducing a pure next step: |
I really like the idea of the separate |
I'm going to see if I can move |
+++ Jesse Rosenthal [Nov 21 16 08:30 ]:
Agreed. |
I think we can do exactly the same thing with typeclasses, perhaps a bit more simply. First, just to make sure we're on the same page: in the Free version I have up right now, we have this:
where
and then the user could do:
This, I think, is the best we could do as far as hiding state and still letting the user set state. But, like I said, I think the typeclass approach lets us do basically the same thing. We don't have
Note that we don't need to have the subordinate I'm not sure this is an argument for the typeclass version, but I don't see how the free monad version allows us to hide the details any more. In fact, it seems slightly clumsier, because of the different explicit interpreter we'd need to run. |
Sorry, for more generality, |
That's basically what I had in mind, but on reflection I think we might want to consider something a bit fancier:
newtype PandocPure a = PandocPure (State PandocPureState a)
newtype PandocIO a = PandocIO (StateT PandocIOState IO a)
runPure :: PandocPure a -> a
runIO :: PandocIO a -> IO a
Here So a user could do something like:
convert :: String -> ([String], String)
convert inp = runPure $ do
  setReaderOption readerSmart True
  setWriterOption writerColumns 50
  doc <- readMarkdown inp
  let doc' = walk emphToCaps doc
  out <- writeDocBook doc'
  warnings <- getWarnings
  return (warnings, out) |
On this approach:
Note that in the above example ( As for warnings, maybe we'd need some kind of switch to allow warnings to be emitted in a streaming fashion for the IO case, instead of collecting them all. But giving the option to collect them (even in IO) seems desirable; consider a web application, for example, that would want to display the warnings on a web page rather than having them go to stderr. |
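To make the typeclass approach concrete, here is a small self-contained sketch, not the code in the typeclass branch. The class `MonadDoc`, its two methods, the `Pure` newtype, and `expandInclude` are all hypothetical stand-ins for `PandocMonad` and friends. As a simplification, the IO instance is given for plain `IO` and streams warnings to stderr, while the pure interpreter collects them in state and reads "files" from a map.

```haskell
import Control.Exception (IOException, try)
import Control.Monad.Trans.State.Strict (State, gets, modify', runState)
import qualified Data.Map as M
import System.IO (hPutStrLn, stderr)

-- Hypothetical, cut-down stand-in for PandocMonad: only the effects this sketch needs.
class Monad m => MonadDoc m where
  lookupFile :: FilePath -> m (Maybe String)  -- Nothing if the file is unavailable
  warn       :: String -> m ()

-- IO interpreter: uses the real file system and streams warnings to stderr.
-- (readFile is lazy, so only errors raised when opening the file are caught.)
instance MonadDoc IO where
  lookupFile fp =
    either (const Nothing) Just
      <$> (try (readFile fp) :: IO (Either IOException String))
  warn = hPutStrLn stderr . ("[warning] " ++)

-- Pure interpreter: "files" live in a Map, warnings are collected in state.
data PureState = PureState
  { files    :: M.Map FilePath String
  , warnings :: [String]
  }

newtype Pure a = Pure { unPure :: State PureState a }

instance Functor Pure where
  fmap f (Pure s) = Pure (fmap f s)
instance Applicative Pure where
  pure = Pure . pure
  Pure f <*> Pure x = Pure (f <*> x)
instance Monad Pure where
  Pure m >>= k = Pure (m >>= unPure . k)

instance MonadDoc Pure where
  lookupFile fp = Pure (gets (M.lookup fp . files))
  warn w = Pure (modify' (\s -> s { warnings = w : warnings s }))

-- Run purely against a fake file system; warnings come back in emission order.
runPure :: M.Map FilePath String -> Pure a -> (a, [String])
runPure fs (Pure m) =
  let (a, s) = runState m (PureState fs [])
  in (a, reverse (warnings s))

-- A toy "reader" step that expands an include, written once against the class.
expandInclude :: MonadDoc m => FilePath -> m String
expandInclude fp = do
  contents <- lookupFile fp
  case contents of
    Just txt -> return txt
    Nothing  -> warn ("could not include " ++ fp) >> return ""
```

The same `expandInclude` runs in `IO` or via `runPure (M.fromList [("a.tex", "hello")]) (expandInclude "a.tex")`. A web application wanting collected warnings in IO would need a state-carrying IO newtype along the lines of the `PandocIO` proposed above.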
@jkr I put up a typeclass branch at jgm/pandoc -- rebased onto master from your typeclass branch. |
So
readFoo :: PandocMonad m => ByteString -> m Pandoc
writeBar :: PandocMonad m => Pandoc -> m ByteString
(not getting into the technicalities of whether/how we want to unify our string types, though that might be a conversation we want to have). Just wanted to make sure I've got this.
I take it we would do |
+++ Jesse Rosenthal [Nov 21 16 15:57 ]:
That was the idea -- both readers and writers would be
Yes.
Yes, that's what I had in mind. Also, we need to think about error handling. We don't want This interacts with things like include files. Say we run |
Yes, the difficulty before had to do with the specifics of the Free monad. It should be pretty easy in the typeclass version. For the IO version, we get MonadError for free from the IO. And for the pure version we just have to add an ExceptT to the stack (I have to refresh my memory on the pros and cons of doing it on the inside vs. the outside). Or we could add an EitherT to both stacks. Then so long as we mandate that all of the readers/writers only use the P.namespace functions, we can handle errors in a sensible way. If we wanted to we could even rewrite the IO functions so they would be safe:
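On the question of where the error layer goes in the pure stack, the following sketch shows the practical difference between the two orderings. The `DocError` type and all function names are hypothetical, and it uses `ExceptT` from `transformers`. It is only meant to show which ordering keeps state (for example, collected warnings) when an error is thrown.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State.Strict (State, StateT, modify', runState, runStateT)

-- Hypothetical error type for illustration only.
newtype DocError = DocError String deriving (Eq, Show)

-- ExceptT wrapped around State: the state survives an error.
type Outer = ExceptT DocError (State [String])

-- StateT wrapped around Either: an error discards the state.
type Inner = StateT [String] (Either DocError)

failOuter :: Outer ()
failOuter = do
  lift (modify' ("warned before failing" :))
  throwE (DocError "boom")

failInner :: Inner ()
failInner = do
  modify' ("warned before failing" :)
  lift (Left (DocError "boom"))

-- Result is a pair: the warnings are still there next to the Left.
runOuter :: Outer a -> (Either DocError a, [String])
runOuter m = runState (runExceptT m) []

-- Result is an Either: on Left there is no state to return.
runInner :: Inner a -> Either DocError (a, [String])
runInner m = runStateT m []
```

If warnings issued before a failure should still be reportable, the first ordering is the one that allows it.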
|
Well, I guess it should mirror what we do in IO. But if we have safe versions we could catch errors in IO as well. At the command line, the caught errors would exit the program and give an error message. But in the programmatic version, even in IO, it would give the user a In other words, from an executable standpoint, it would look the same. But in the library, both IO and pure would catch errors in the same way. So to go back to our previous signatures, something like:
[EDITED JGM: s/PandocPure/PandocMonad/; s/PandocIO/PandocMonad/] |
Yes, good thought. That's probably worth doing. |
@jgm: I just pushed an update to Text.Pandoc.Class that introduces the opaque newtypes PandocIO and PandocPure. Which is all to say that For the sake of compatibility, I'm also exporting |
Sorry, misspoke. And, of course, happy Thanksgiving. |
@jkr - sounds great.
|
You might have to take a look into Custom to see what would be required All the Lua functions seem to be IO, but I'm not sure I understand the interop enough to see the best way to wrap that (and to purify it). |
By the way (and this might have been clear to you from the beginning) one nice side effect of having PandocMonad be an instance of MonadError is that all readers and writers will be able to run |
The more I look at it, the more I think the Custom writer doesn't really fit in with the proposed new architecture. Since it's necessarily IO, we can't pretend, as we currently do, that it's a normal writer (currently when it's called, at the toplevel, we pretend it's an IOStringWriter, but that sort of always-IO type won't exist anymore). Suggestion: treat it at the toplevel as if it were JSON output + IO post-processing. In other words, internally, we'd treat
pandoc foo.md -t custom/script.lua
as if it were
pandoc foo.md -t json | custom/script.lua
where we understand the internal lua runner to convert json first. This might not be a big difference (other than a minor slowdown) but it does keep us from trying to force the Custom writer into a parameterized box, which might make the code a bit less hacky. |
+++ Jesse Rosenthal [Nov 25 16 19:36 ]:
By the way (and this might have been clear to you from the beginning)
one nice side effect of having PandocMonad be an instance of MonadError
is that all readers and writers will be able to run throwError in
a unified and pure way, without needing to build it into each
reader/writer's monad stack.
Yes, that's going to be super-helpful.
In addition, all the readers/writers will be able to issue
warnings. I just added warnings for unconvertible math to
the docx writer, for example -- and we could add them to
every writer with this change.
|
It seems rather silly to convert to JSON just to convert
back again to Pandoc. Note that custom writers aren't
in the list of writers exported by Text.Pandoc.
Custom writers are treated specially in pandoc.hs.
So I don't think they pose any special problem,
we can just leave them in IO, can't we? They won't
have the same type as other writers, but I think that's
okay because they're already treated specially.
+++ Jesse Rosenthal [Nov 26 16 07:31 ]:
… The more I look at it, the more I think the Custom writer doesn't
really fit in with the proposed new architecture. Since it's
necessarily IO, we can't pretend, as we currently do, that it's a
normal writer (currently when it's called, at the toplevel, we pretend
it's an IOStringWriter, but that sort of always-IO type won't exist
anymore). Suggestion: treat it at the toplevel as if it were JSON
output + IO post-processing. In other words, internally, we'd treat
pandoc foo.md -t custom/script.lua
as if it were
pandoc foo.md -t json | custom/script.lua
where we understand the internal lua runner to convert json first.
This might not be a big difference (other than a minor slowdown) but it
does keep us from trying to force the Custom writer into a
parameterized box, which might make the code a bit less hacky.
|
Is the free monad approach abandoned in favor of the typeclass approach? Sorry to ask this, I saw a lot of commits over there but no discussion here. |
+++ ickc [Dec 06 16 06:44 ]:
Is the free monad approach abandoned in favor of the typeclass
approach? Sorry to ask this, I saw a lot of commits over there but no
discussion here.
Yes. We decided that the typeclass approach would be much
easier to implement, and that the free monad approach didn't
seem to have any real advantages over it (for our purposes).
|
I skimmed this discussion and it was interesting. I think you're right that ultimately using a type class is easier and in fact equivalently powerful. After all you can write an instance for the type class which interprets each operation into the free monad and likewise an interpreter for the free monad which interprets each constructor as the type class method. |
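The equivalence claimed above can be sketched in a few lines. Everything here is hypothetical and deliberately tiny: a locally defined `Free` type (so the sketch does not depend on the `free` package), a one-method class `MonadClock`, and a `Fixed` instance used only to run things purely.

```haskell
{-# LANGUAGE FlexibleInstances #-}  -- needed for the instance on Free ClockF

-- A minimal free monad, defined locally so the sketch needs only base.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Free x) = Free (fmap (fmap g) x)
instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g <*> x = fmap g x
  Free g <*> x = Free (fmap (<*> x) g)
instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free x >>= k = Free (fmap (>>= k) x)

-- Hypothetical one-operation effect: ask for the current "time" as an Int.
class Monad m => MonadClock m where
  getTime :: m Int

-- The same operation as a functor, one constructor per class method.
data ClockF next = GetTime (Int -> next)
instance Functor ClockF where
  fmap g (GetTime k) = GetTime (g . k)

-- Direction 1: the free monad is itself an instance of the class.
instance MonadClock (Free ClockF) where
  getTime = Free (GetTime Pure)

-- Direction 2: interpret each constructor as the corresponding class method.
interpret :: MonadClock m => Free ClockF a -> m a
interpret (Pure a)           = return a
interpret (Free (GetTime k)) = getTime >>= interpret . k

-- A fixed-clock instance, so programs can be run purely.
newtype Fixed a = Fixed { runFixed :: Int -> a }
instance Functor Fixed where
  fmap g (Fixed h) = Fixed (g . h)
instance Applicative Fixed where
  pure = Fixed . const
  Fixed g <*> Fixed h = Fixed (\t -> g t (h t))
instance Monad Fixed where
  Fixed h >>= k = Fixed (\t -> runFixed (k (h t)) t)
instance MonadClock Fixed where
  getTime = Fixed id

-- Written once against the class; usable directly or via the free monad.
twice :: MonadClock m => m Int
twice = (+) <$> getTime <*> getTime
```

Both `runFixed twice t` and `runFixed (interpret twice) t` evaluate the same program, the second by first building the free-monad value and then interpreting it back through the class.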
+++ Matthew Pickering [Dec 11 16 13:29 ]:
. After all you can write an instance for the type class
which interprets each operation into the free monad and likewise an
interpreter for the free monad which interprets each constructor as the
type class method.
Good point!
|
Since the typeclass branch has been merged, we can close this. |
Currently a number of writers are impure and do IO: docx, odt, epub, epub3, fb2, icml, rtf.
It would be nice to use a free monad instead, so that these writers could also be used outside of IO contexts. First step would be to catalog the places where IO is really used in these writers.