PDF export #88

Closed
jgm opened this issue Jul 9, 2011 · 6 comments

jgm commented Jul 9, 2011

PDF is one of the most common formats around, and in particular, it is the
most likely to preserve all the wonderful formatting a gitit page can have.
For both of these reasons, it would be great if a Gitit page could be
exported as PDF - currently all the export options are very 'raw' formats
which just aren't very useful to someone who wants to read a page offline,
or send an article to another (perhaps the wiki in question is a personal
one). As it stands, the user must be fairly technically competent and
versed in document formats & Pandoc.

(It's also worth noting that 'export as PDF' is not unprecedented:
MediaWiki supports this (and it has proven exceptionally useful at
Wikibooks), and I'm sure many other wikis have this feature as well.)

Google Code Info:
Issue #: 64
Author: gwe...@gmail.com
Created On: 2009-09-18T15:03:35.000Z
Closed On: 2010-02-20T20:17:47.000Z

@ghost ghost assigned jgm Jul 9, 2011
@jgm jgm closed this as completed Jul 9, 2011

jgm commented Jul 9, 2011

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2009-10-06T06:07:09.000Z


jgm commented Jul 9, 2011

Here's a draft patch (I was stymied by the types involved in Handler):

diff --git a/Network/Gitit/Export.hs b/Network/Gitit/Export.hs
index f86119a..0ddeb49 100644
--- a/Network/Gitit/Export.hs
+++ b/Network/Gitit/Export.hs
@@ -21,16 +21,19 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 
 module Network.Gitit.Export ( exportFormats )
 where
-import Text.Pandoc
-import Text.Pandoc.ODT (saveOpenDocumentAsODT)
+
+import Control.Monad.Trans (liftIO)
 import Network.Gitit.Server
-import Network.Gitit.Util (withTempDir)
 import Network.Gitit.State
 import Network.Gitit.Types
-import Control.Monad.Trans (liftIO)
-import Text.XHtml (noHtml)
-import qualified Data.ByteString.Lazy as B
+import Network.Gitit.Util (withTempDir)
+import System.Directory (removeFile)
 import System.FilePath ((<.>), (</>))
+import System.Process (runProcess)
+import Text.Pandoc
+import Text.Pandoc.ODT (saveOpenDocumentAsODT)
+import Text.XHtml (noHtml)
+import qualified Data.ByteString.Lazy as B (readFile)
 
 defaultRespOptions :: WriterOptions
 defaultRespOptions = defaultWriterOptions { writerStandalone = True
@@ -83,6 +86,18 @@ respondMediaWiki :: String -> Pandoc -> Handler
 respondMediaWiki = respond "text/plain; charset=utf-8" "" $
   writeMediaWiki (defaultRespOptions {writerHeader = ""})
 
+respondPDF :: String -> Pandoc -> Handler
+respondPDF = respond "application/pdf" "pdf" undefined
+
+writePDF :: Pandoc -> IO String
+writePDF y = let cxt = writeConTeXt (defaultRespOptions {writerHeader = defaultConTeXtHeader}) y in
+                 do writeFile ("foo.tex") cxt -- texec barfs on filenames like "Space Ships.tex"!
+                    runProcess "texexec" ["--purge-all", "--quiet", "foo.tex"] Nothing Nothing Nothing Nothing Nothing
+                    let pdf = readFile "foo.pdf"
+                    removeFile "foo.pdf"
+                    removeFile "foo.tex"
+                    pdf
+
 respondODT :: String -> Pandoc -> Handler
 respondODT page doc = do
   let openDoc = writeOpenDocument
@@ -106,4 +121,5 @@ exportFormats = [ ("LaTeX",     respondLaTeX)    -- (description, writer)
                 , ("DocBook",   respondDocbook)
                 , ("S5",        respondS5)
                 , ("ODT",       respondODT)
-                , ("RTF",       respondRTF) ]
+                , ("RTF",       respondRTF)
+                , ("PDF",       respondPDF)]

Google Code Info:
Author: gwe...@gmail.com
Created On: 2009-12-23T20:10:46.000Z
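
Apart from the Handler types, the draft's writePDF has an ordering problem worth spelling out: `let pdf = readFile "foo.pdf"` only names the read action, so the read itself runs at the final `pdf` line, after both removeFile calls. (Likewise, runProcess returns a ProcessHandle right away rather than waiting for texexec to finish.) A minimal sketch of the read-then-delete step, assuming a strict ByteString is acceptable; the helper name readThenRemove is hypothetical and not part of gitit:

```haskell
import qualified Data.ByteString as BS
import System.Directory (removeFile)

-- | Hypothetical helper: read a file's bytes, then delete the file.
-- Binding with '<-' runs the read immediately, and the strict
-- ByteString read pulls the whole file into memory, so removing
-- the file afterwards cannot affect the returned bytes.
readThenRemove :: FilePath -> IO BS.ByteString
readThenRemove path = do
  bytes <- BS.readFile path
  removeFile path
  return bytes
```

With `let bytes = BS.readFile path` instead, nothing would be read until `bytes` was later executed as an action, by which point the file would already be gone.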


jgm commented Jul 9, 2011

I got a little farther this time - to actual PDF downloads. But they're somehow
corrupted every time.

diff --git a/Network/Gitit/Export.hs b/Network/Gitit/Export.hs
index e0f516f..53d946f 100644
--- a/Network/Gitit/Export.hs
+++ b/Network/Gitit/Export.hs
@@ -33,23 +33,29 @@ import Text.XHtml (noHtml)
 import qualified Data.ByteString.Lazy as B
 import System.FilePath ((<.>), (</>))
 import Control.Exception (throwIO)
+import System.Cmd (rawSystem)
+import System.Exit (ExitCode(..))
+import System.IO.Unsafe (unsafePerformIO)
+import System.Directory (removeFile)
+-- import System.IO.UTF8 (readFile, writeFile)
+import qualified Data.ByteString as BS -- (ByteString, readFile, writeFile)
 
 defaultRespOptions :: WriterOptions
 defaultRespOptions = defaultWriterOptions { writerStandalone = True
                                           , writerWrapText = True }
 
-respond :: String
-        -> String
-        -> (Pandoc -> String)
-        -> String
-        -> Pandoc
-        -> Handler
+-- respond :: String
+--         -> String
+--         -> (Pandoc -> String)
+--         -> String
+--         -> Pandoc
+--         -> Handler
 respond mimetype ext fn page = ok . setContentType mimetype .
   (if null ext then id else setFilename (page ++ "." ++ ext)) .
   toResponse . fn
 
-respondX :: String -> String -> String -> (WriterOptions -> Pandoc -> String)
-         -> WriterOptions -> String -> Pandoc -> Handler
+-- respondX :: String -> String -> String -> (WriterOptions -> Pandoc -> String)
+--          -> WriterOptions -> String -> Pandoc -> Handler
 respondX templ mimetype ext fn opts page doc = do
   template' <- liftIO $ getDefaultTemplate templ
   template <- case template' of
@@ -65,6 +71,19 @@ respondConTeXt :: String -> Pandoc -> Handler
 respondConTeXt = respondX "context" "application/x-context" "tex"
   writeConTeXt defaultRespOptions
 
+respondPDF :: String -> Pandoc -> Handler
+respondPDF = respondX "context" "application/pdf" "pdf" createPDF defaultRespOptions
+
+-- createPDF :: WriterOptions -> Pandoc -> String
+createPDF opts pndc = unsafePerformIO $ do writeFile "foo.tex" (writeConTeXt opts pndc)
+                                           canary <- rawSystem "texexec" ["--purgeall", "foo.tex"]
+                                           removeFile "foo.tex"
+                                           case canary of
+                                               ExitSuccess ->do x <-BS.readFile "foo.pdf"
+                                                                removeFile "foo.pdf"
+                                                                return x
+                                               ExitFailure n -> error $ "PDF creation failed with code: " ++ show n
+
 respondRTF :: String -> Pandoc -> Handler
 respondRTF = respondX "rtf" "application/rtf" "rtf"
   writeRTF defaultRespOptions
@@ -125,4 +144,5 @@ exportFormats = [ ("LaTeX",     respondLaTeX)    -- (description, writer)
                 , ("DocBook",   respondDocbook)
                 , ("S5",        respondS5)
                 , ("ODT",       respondODT)
-                , ("RTF",       respondRTF) ]
+                , ("RTF",       respondRTF)
+                , ("PDF",       respondPDF)]

Note that I removed the specializing type signatures. I originally tried going
through Prelude readFile/writeFile, but the downloaded PDF was completely corrupt -
even the Unicode stuff was messed up. I switched to System.IO.UTF8 (still in String),
and now the downloaded PDF had all the little incomprehensible glyphs it should, but
diff/hexdump still reported tons of differences & evince still claimed the PDF to be
corrupt. So I removed the type sigs, and switched to pure ByteString (relying on the
ByteString instance for ToMessage), but when I asked for a PDF, I get the error:

127.0.0.1 - [19/Jan/2010:01:49:34 +0000] "POST /Front+Page 1.1" 200 HTTP request
failed with: src/Happstack/Data/Xml/HaXml.hs:22:19-42: Irrefutable pattern failed for
pattern Text.XML.HaXml.Types.CElem el'

! I looked in happstack's simplehttp stuff, and that function's only caller is only
called in a ToMessage instance for [Element] - not for ByteString. So in general I
have no idea why that breaks completely, nor exactly how the IO.UTF8 pdfs are broken.

Google Code Info:
Author: gwe...@gmail.com
Created On: 2010-01-19T02:00:09.000Z
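
One way this kind of corruption can arise (not necessarily what happened here) is that a String holds decoded characters rather than raw bytes, so a binary file that goes through a text decode and a later re-encode can come out with different bytes. The sketch below, using only base and bytestring, contrasts a text round trip with a byte-for-byte copy; the function names are made up for illustration:

```haskell
import qualified Data.ByteString as BS
import System.IO

-- | Read a file as text under the given encoding, then write the
-- resulting String back out encoded as UTF-8.
roundTripAsText :: TextEncoding -> FilePath -> FilePath -> IO ()
roundTripAsText enc src dst = do
  s <- withFile src ReadMode $ \h -> do
    hSetEncoding h enc
    str <- hGetContents h
    -- Force the lazy read to finish before withFile closes the handle.
    length str `seq` return str
  withFile dst WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h s

-- | Copy a file byte for byte, with no decoding step at all.
roundTripAsBytes :: FilePath -> FilePath -> IO ()
roundTripAsBytes src dst = BS.readFile src >>= BS.writeFile dst
```

For example, a byte such as 0xE2 read under latin1 becomes the character U+00E2, which is two bytes once written as UTF-8, so the text round trip changes the file while the ByteString copy leaves it identical.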


jgm commented Jul 9, 2011

This is the wrong approach. You shouldn't be returning the PDF as a string.
Instead, use respondODT as your model.

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2010-01-19T02:46:01.000Z


jgm commented Jul 9, 2011

Ah, I see. respondODT does work, although it took me quite a while to figure out the
template thing and then deal with texexec stupidities:

+respondPDF :: String -> Pandoc -> Handler
+respondPDF pg doc = ok $ setContentType "application/pdf" $ setFilename (pg++".pdf") $ (toResponse noHtml) {rsBody = createPDF pg doc}
+
+-- TODO: remove unsafePerformIO
+createPDF :: String -> Pandoc -> B.ByteString
+createPDF pg pndc = unsafePerformIO $ do template' <- liftIO $ getDefaultTemplate "context"
+                                         template <- case template' of
+                                             Right t  -> return t
+                                             Left e   -> liftIO $ throwIO e
+                                         let tex = writeConTeXt defaultRespOptions{writerTemplate = template} pndc
+                                         let page = filter isAlphaNum pg -- texexec barfs on spaces and odd characters
+                                         withTempDir "gitit-temp-pdf" $ \tempdir -> do
+                                              home <- getCurrentDirectory
+                                              setCurrentDirectory tempdir -- texexec is so stupid it will always put the PDF in ./
+                                              let tempfile = page <.> "tex"
+                                              writeFile tempfile tex
+                                              canary <- rawSystem "texexec" ["--purgeall", "--silent", tempfile]
+                                              pdf <- case canary of
+                                                  ExitSuccess ->   B.readFile (page <.> "pdf")
+                                                  ExitFailure n -> error ("PDF creation failed with code: " ++ show n)
+                                              setCurrentDirectory home -- restore original location
+                                              return pdf
    

Google Code Info:
Author: gwe...@gmail.com
Created On: 2010-01-19T17:00:49.000Z
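
The TODO about unsafePerformIO and the getCurrentDirectory/setCurrentDirectory dance could be approached along the following lines. This is only a sketch under stated assumptions, not what ended up in HEAD: it stays in IO, passes the working directory to the child process through the `cwd` field of System.Process instead of changing the server's own directory, and reads the result strictly. The names safeBaseName and convertIn are hypothetical, and the caller is assumed to supply an existing directory (for instance one created by withTempDir).

```haskell
import qualified Data.ByteString as BS
import Data.Char (isAlphaNum)
import System.Exit (ExitCode (..))
import System.FilePath ((<.>), (</>))
import System.Process (CreateProcess (cwd), proc, readCreateProcessWithExitCode)

-- | Reduce a page name to characters that are safe in a file name.
-- Falls back to "page" (an arbitrary choice) if nothing survives.
safeBaseName :: String -> String
safeBaseName pg = case filter isAlphaNum pg of
  ""   -> "page"
  base -> base

-- | Hypothetical helper: write the source as <base>.tex inside 'dir',
-- run the converter with 'dir' as its working directory, and return
-- the bytes of <base>.pdf, or an error message on a non-zero exit.
-- A missing executable still surfaces as an IOException.
convertIn
  :: FilePath                -- ^ existing working directory
  -> FilePath                -- ^ converter executable
  -> (FilePath -> [String])  -- ^ argument list, given the input file name
  -> String                  -- ^ page name
  -> String                  -- ^ source text to convert
  -> IO (Either String BS.ByteString)
convertIn dir cmd mkArgs pg src = do
  let base   = safeBaseName pg
      input  = base <.> "tex"
      output = base <.> "pdf"
  writeFile (dir </> input) src
  (code, _out, err) <-
    readCreateProcessWithExitCode ((proc cmd (mkArgs input)) { cwd = Just dir }) ""
  case code of
    ExitSuccess   -> Right <$> BS.readFile (dir </> output)
    ExitFailure n -> return (Left ("converter failed with code " ++ show n ++ ": " ++ err))
```

With the flags from the comment above, a call might look like `convertIn tempdir "texexec" (\f -> ["--purgeall", "--silent", f]) pg tex`. Because only the child's working directory changes, concurrent requests in the same server process would not see each other's directory switches.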


jgm commented Jul 9, 2011

PDF export (including caching!) is now in HEAD.

Google Code Info:
Author: fiddloso...@gmail.com
Created On: 2010-02-20T20:17:47.000Z

segasai pushed a commit to segasai/gitit that referenced this issue Oct 4, 2014