PDF export #88

jgm · 2011-07-09T01:59:40Z

PDF is one of the most common formats around, and it is the one most likely
to preserve all the formatting a gitit page can have. For both of these
reasons, it would be great if a gitit page could be exported as PDF.
Currently all the export options are very 'raw' formats, which aren't much
use to someone who wants to read a page offline or send an article to
someone else (perhaps the wiki in question is a personal one). As it
stands, the user must be fairly technically competent and versed in
document formats and Pandoc to produce a PDF.

(It's also worth noting that 'export as PDF' is not unprecedented:
MediaWiki supports this (and it has proven exceptionally useful at
Wikibooks), and I'm sure many other wikis have this feature as well.)

Google Code Info:
Issue #: 64
Author: gwe...@gmail.com
Created On: 2009-09-18T15:03:35.000Z
Closed On: 2010-02-20T20:17:47.000Z

jgm · 2011-07-09T01:59:42Z

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2009-10-06T06:07:09.000Z

jgm · 2011-07-09T01:59:43Z

Here's a draft patch (I was stymied by the types involved in Handler):

diff --git a/Network/Gitit/Export.hs b/Network/Gitit/Export.hs
index f86119a..0ddeb49 100644
--- a/Network/Gitit/Export.hs
+++ b/Network/Gitit/Export.hs
@@ -21,16 +21,19 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

module Network.Gitit.Export ( exportFormats )
where
-import Text.Pandoc
-import Text.Pandoc.ODT (saveOpenDocumentAsODT)
+
+import Control.Monad.Trans (liftIO)
import Network.Gitit.Server
-import Network.Gitit.Util (withTempDir)
import Network.Gitit.State
import Network.Gitit.Types
-import Control.Monad.Trans (liftIO)
-import Text.XHtml (noHtml)
-import qualified Data.ByteString.Lazy as B
+import Network.Gitit.Util (withTempDir)
+import System.Directory (removeFile)
import System.FilePath ((<.>), (</>))
+import System.Process (runProcess)
+import Text.Pandoc
+import Text.Pandoc.ODT (saveOpenDocumentAsODT)
+import Text.XHtml (noHtml)
+import qualified Data.ByteString.Lazy as B (readFile)

defaultRespOptions :: WriterOptions
defaultRespOptions = defaultWriterOptions { writerStandalone = True
@@ -83,6 +86,18 @@ respondMediaWiki :: String -> Pandoc -> Handler
respondMediaWiki = respond "text/plain; charset=utf-8" "" $
writeMediaWiki (defaultRespOptions {writerHeader = ""})

+respondPDF :: String -> Pandoc -> Handler
+respondPDF = respond "application/pdf" "pdf" undefined
+
+writePDF :: Pandoc -> IO String
+writePDF y = let cxt = writeConTeXt (defaultRespOptions {writerHeader = defaultConTeXtHeader}) y in
+                 do writeFile ("foo.tex") cxt -- texec barfs on filenames like "Space Ships.tex"!
+                    runProcess "texexec" ["--purge-all", "--quiet", "foo.tex"] Nothing Nothing Nothing Nothing Nothing
+                    let pdf = readFile "foo.pdf"
+                    removeFile "foo.pdf"
+                    removeFile "foo.tex"
+                    pdf
+
respondODT :: String -> Pandoc -> Handler
respondODT page doc = do
let openDoc = writeOpenDocument
@@ -106,4 +121,5 @@ exportFormats = [ ("LaTeX", respondLaTeX) -- (description, writer)
, ("DocBook", respondDocbook)
, ("S5", respondS5)
, ("ODT", respondODT)
-                , ("RTF",       respondRTF) ]
+                , ("RTF",       respondRTF)
+                , ("PDF",       respondPDF)]
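
One detail in the draft above is independent of the Handler types: `let pdf = readFile "foo.pdf"` only names the read action without running it, so both `removeFile` calls execute before any read is attempted. A minimal sketch of the usual ordering follows, using `<-` to run the read first and a strict ByteString so the bytes are in memory before the file is deleted. The helper name `readAndRemove` and the scratch file name are illustrative assumptions, not part of gitit.

```haskell
import qualified Data.ByteString as BS
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.FilePath ((</>))

-- | Read a file strictly, then delete it. Because the read is bound with
-- '<-', it runs before 'removeFile', and the strict ByteString holds all
-- of the bytes in memory by the time the file disappears.
readAndRemove :: FilePath -> IO BS.ByteString
readAndRemove path = do
  bytes <- BS.readFile path
  removeFile path
  return bytes

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let path = tmp </> "gitit-let-vs-bind.bin" -- hypothetical scratch file
  BS.writeFile path (BS.pack [1, 2, 3])
  bytes <- readAndRemove path
  stillThere <- doesFileExist path
  print (BS.length bytes, stillThere) -- prints (3,False)
```

Note also that `runProcess` returns a process handle immediately; the draft would additionally need to wait for texexec to finish before looking for the PDF.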

Google Code Info:
Author: gwe...@gmail.com
Created On: 2009-12-23T20:10:46.000Z

jgm · 2011-07-09T01:59:43Z

I got a little farther this time - to actual PDF downloads. But they're somehow
corrupted every time.

diff --git a/Network/Gitit/Export.hs b/Network/Gitit/Export.hs
index e0f516f..53d946f 100644
--- a/Network/Gitit/Export.hs
+++ b/Network/Gitit/Export.hs
@@ -33,23 +33,29 @@ import Text.XHtml (noHtml)
import qualified Data.ByteString.Lazy as B
import System.FilePath ((<.>), (</>))
import Control.Exception (throwIO)
+import System.Cmd (rawSystem)
+import System.Exit (ExitCode(..))
+import System.IO.Unsafe (unsafePerformIO)
+import System.Directory (removeFile)
+-- import System.IO.UTF8 (readFile, writeFile)
+import qualified Data.ByteString as BS -- (ByteString, readFile, writeFile)

defaultRespOptions :: WriterOptions
defaultRespOptions = defaultWriterOptions { writerStandalone = True
, writerWrapText = True }

-respond :: String
-        -> String
-        -> (Pandoc -> String)
-        -> String
-        -> Pandoc
-        -> Handler
+-- respond :: String
+-- -> String
+-- -> (Pandoc -> String)
+-- -> String
+-- -> Pandoc
+-- -> Handler
respond mimetype ext fn page = ok . setContentType mimetype .
(if null ext then id else setFilename (page ++ "." ++ ext)) .
toResponse . fn

-respondX :: String -> String -> String -> (WriterOptions -> Pandoc -> String)
-         -> WriterOptions -> String -> Pandoc -> Handler
+-- respondX :: String -> String -> String -> (WriterOptions -> Pandoc -> String)
+-- -> WriterOptions -> String -> Pandoc -> Handler
respondX templ mimetype ext fn opts page doc = do
template' <- liftIO $ getDefaultTemplate templ
template <- case template' of
@@ -65,6 +71,19 @@ respondConTeXt :: String -> Pandoc -> Handler
respondConTeXt = respondX "context" "application/x-context" "tex"
writeConTeXt defaultRespOptions

+respondPDF :: String -> Pandoc -> Handler
+respondPDF = respondX "context" "application/pdf" "pdf" createPDF defaultRespOptions
+
+-- createPDF :: WriterOptions -> Pandoc -> String
+createPDF opts pndc = unsafePerformIO $ do writeFile "foo.tex" (writeConTeXt opts pndc)
+                                           canary <- rawSystem "texexec" ["--purgeall", "foo.tex"]
+                                           removeFile "foo.tex"
+                                           case canary of
+                                               ExitSuccess ->do x <-BS.readFile "foo.pdf"
+                                                                removeFile "foo.pdf"
+                                                                return x
+                                               ExitFailure n -> error $ "PDF creation failed with code: " ++ show n
+
respondRTF :: String -> Pandoc -> Handler
respondRTF = respondX "rtf" "application/rtf" "rtf"
writeRTF defaultRespOptions
@@ -125,4 +144,5 @@ exportFormats = [ ("LaTeX", respondLaTeX) -- (description, writer)
, ("DocBook", respondDocbook)
, ("S5", respondS5)
, ("ODT", respondODT)
-                , ("RTF",       respondRTF) ]
+                , ("RTF",       respondRTF)
+                , ("PDF",       respondPDF)]

Note that I removed the specializing type signatures. I originally tried going
through the Prelude readFile/writeFile, but the downloaded PDF was completely
corrupt; even the Unicode stuff was messed up. I switched to System.IO.UTF8 (still
in String), and then the downloaded PDF had all the little incomprehensible glyphs
it should, but diff/hexdump still reported tons of differences and evince still
claimed the PDF was corrupt. So I removed the type sigs and switched to pure
ByteString (relying on the ByteString instance for ToMessage), but when I ask for
a PDF, I get this error:

127.0.0.1 - [19/Jan/2010:01:49:34 +0000] "POST /Front+Page 1.1" 200 HTTP request
failed with: src/Happstack/Data/Xml/HaXml.hs:22:19-42: Irrefutable pattern failed for
pattern Text.XML.HaXml.Types.CElem el'

I looked in happstack's simplehttp stuff, and that function's only caller is itself
only called in the ToMessage instance for [Element], not the one for ByteString. So
I have no idea why that breaks completely, nor exactly how the IO.UTF8 PDFs are broken.
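
A PDF is binary data, so any step that passes it through String and a text encoding can alter its bytes. That is one plausible cause of the differences reported above, though the thread does not confirm it. The sketch below shows the byte-for-byte alternative with strict ByteString I/O; `copyBinary`, the file names, and the sample payload are illustrative assumptions rather than gitit code.

```haskell
import qualified Data.ByteString as BS
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))

-- | Copy a file byte-for-byte. No text decoding or encoding happens, so
-- binary content such as a PDF stream survives unchanged.
copyBinary :: FilePath -> FilePath -> IO ()
copyBinary src dst = BS.readFile src >>= BS.writeFile dst

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let src = tmp </> "gitit-sample.bin"      -- hypothetical scratch files
      dst = tmp </> "gitit-sample-copy.bin"
      -- "%PDF" followed by bytes that are not valid UTF-8; a stand-in
      -- for real PDF content.
      payload = BS.pack [0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE, 0x00, 0x80]
  BS.writeFile src payload
  copyBinary src dst
  copied <- BS.readFile dst
  putStrLn (if copied == payload then "identical" else "differs")
  mapM_ removeFile [src, dst]
```

Comparing the copy against the original this way is the same check that diff/hexdump performs on the downloaded file.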

Google Code Info:
Author: gwe...@gmail.com
Created On: 2010-01-19T02:00:09.000Z

jgm · 2011-07-09T01:59:44Z

This is the wrong approach. You shouldn't be returning the PDF as a string.
Instead, use respondODT as your model.

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2010-01-19T02:46:01.000Z

jgm · 2011-07-09T01:59:45Z

Ah, I see. respondODT does work, although it took me quite a while to figure out the
template thing and then deal with texexec stupidities:

+respondPDF :: String -> Pandoc -> Handler
+respondPDF pg doc = ok $ setContentType "application/pdf" $ setFilename (pg++".pdf") $ (toResponse noHtml) {rsBody = createPDF pg doc}
+
+-- TODO: remove unsafePerformIO
+createPDF :: String -> Pandoc -> B.ByteString
+createPDF pg pndc = unsafePerformIO $ do template' <- liftIO $ getDefaultTemplate "context"
+                                         template <- case template' of
+                                             Right t  -> return t
+                                             Left e   -> liftIO $ throwIO e
+                                         let tex = writeConTeXt defaultRespOptions{writerTemplate = template} pndc
+                                         let page = filter isAlphaNum pg -- texexec barfs on spaces and odd characters
+                                         withTempDir "gitit-temp-pdf" $ \tempdir -> do
+                                              home <- getCurrentDirectory
+                                              setCurrentDirectory tempdir -- texexec is so stupid it will always put the PDF in ./
+                                              let tempfile = page <.> "tex"
+                                              writeFile tempfile tex
+                                              canary <- rawSystem "texexec" ["--purgeall", "--silent", tempfile]
+                                              pdf <- case canary of
+                                                  ExitSuccess ->   B.readFile (page <.> "pdf")
+                                                  ExitFailure n -> error ("PDF creation failed with code: " ++ show n)
+                                              setCurrentDirectory home -- restore original location
+                                              return pdf
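
The TODO above could plausibly be addressed by keeping everything in IO and giving the child process its own working directory, which avoids both unsafePerformIO and changing the server's current directory. The following is only a sketch under stated assumptions: `safeBaseName`, `runIn`, and `texToPDF` are hypothetical names, the fixed scratch directory is not safe for concurrent requests, and the Happstack response plumbing is left out. It reads the PDF strictly so the bytes are in memory before the scratch directory is removed.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import Data.Char (isAlphaNum)
import System.Directory
  (createDirectoryIfMissing, getTemporaryDirectory, removeDirectoryRecursive)
import System.Exit (ExitCode (..))
import System.FilePath ((<.>), (</>))
import System.Process (CreateProcess (cwd), createProcess, proc, waitForProcess)

-- | Keep only alphanumeric characters, as the patch above does, falling
-- back to a fixed name when nothing is left.
safeBaseName :: String -> String
safeBaseName pg = case filter isAlphaNum pg of
  "" -> "page"
  s  -> s

-- | Run a command with its working directory set to 'dir', without
-- changing the working directory of the calling process.
runIn :: FilePath -> FilePath -> [String] -> IO ExitCode
runIn dir cmd args = do
  (_, _, _, ph) <- createProcess (proc cmd args) { cwd = Just dir }
  waitForProcess ph

-- | Write ConTeXt source to a scratch directory, run texexec there, and
-- return either the exit code or the PDF bytes. Requires texexec on PATH.
texToPDF :: String -> String -> IO (Either Int BS.ByteString)
texToPDF pg tex = do
  tmp <- getTemporaryDirectory
  let dir  = tmp </> "gitit-temp-pdf"
      base = safeBaseName pg
  bracket (createDirectoryIfMissing True dir >> return dir)
          removeDirectoryRecursive $ \d -> do
    writeFile (d </> base <.> "tex") tex
    code <- runIn d "texexec" ["--purgeall", "--silent", base <.> "tex"]
    case code of
      ExitSuccess   -> Right <$> BS.readFile (d </> base <.> "pdf")
      ExitFailure n -> return (Left n)

main :: IO ()
main = putStrLn (safeBaseName "Space Ships") -- prints SpaceShips
```

Returning `Either` instead of calling `error` lets the caller decide how to report a failed texexec run to the user.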

Google Code Info:
Author: gwe...@gmail.com
Created On: 2010-01-19T17:00:49.000Z

jgm · 2011-07-09T01:59:46Z

PDF export (including caching!) is now in HEAD.

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2010-02-20T20:17:47.000Z

Resolves Issue jgm#88.

ghost assigned jgm Jul 9, 2011

jgm closed this as completed Jul 9, 2011

segasai pushed a commit to segasai/gitit that referenced this issue Oct 4, 2014

Fixed problem with doubled // in updir links.

1ffefa2

Resolves Issue jgm#88.
