This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Looks like XML parse error, but based on location of CSL file #81

Closed
magthe opened this issue Sep 10, 2014 · 12 comments

@magthe

magthe commented Sep 10, 2014

After upgrading to 0.5 I've observed a very strange issue. Processing of several of my files resulted in an error like this:

pandoc -t latex --filter pandoc-citeproc --template template.latex --csl=style.csl -o lfs_system_utp.pdf lfs_system_utp_t.mkd
pandoc-citeproc: error while parsing the XML string
pandoc: Error running filter pandoc-citeproc

I simply stopped using the (slightly custom) CSL file I want to use and instead fell back on the default one that comes with pandoc-citeproc. That worked, and was all right for the moment.

After a few days I saw a message on a Haskell-related mailing list for the Arch Linux distro regarding this. That mail described a work-around: just replace the default CSL file with the one you want to use. Indeed, that works:

cp style.csl /usr/share/x86_64-linux-ghc-7.8.3/pandoc-citeproc-0.5/chicago-author-date.csl
pandoc -t latex --filter pandoc-citeproc --template template.latex -o lfs_system_utp.pdf lfs_system_utp_t.mkd

Clearly there is something going on here that is really surprising to a mere user.
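
The work-around above overwrites a file installed by the package manager. A more cautious variant, sketched here, keeps a backup of the original before replacing it. The helper name `replace_default_csl` and the `.orig` suffix are assumptions made for illustration and are not part of pandoc or pandoc-citeproc.

```python
import shutil
from pathlib import Path


def replace_default_csl(custom: Path, default: Path) -> Path:
    """Overwrite `default` with `custom`, keeping a one-time backup.

    Hypothetical helper for illustration only. Returns the backup path.
    """
    backup = default.with_name(default.name + ".orig")
    # Back up only once, so a second run does not clobber the pristine copy.
    if not backup.exists():
        shutil.copy2(default, backup)
    shutil.copy2(custom, default)
    return backup


# Example with the paths from the commands above (needs write access to /usr/share):
# replace_default_csl(
#     Path("style.csl"),
#     Path("/usr/share/x86_64-linux-ghc-7.8.3/pandoc-citeproc-0.5/chicago-author-date.csl"),
# )
```

Copying the `.orig` file back over the default restores the packaged style.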

@jgm
Owner

jgm commented Sep 10, 2014

Can you share your custom CSL file, so I can try to reproduce the problem?

@magthe
Author

magthe commented Sep 10, 2014

It's here: https://gist.github.com/magthe/4c45ed79f245f6712755

Just to be clear though, copying the default CSL to the local directory, and then using the --csl argument to pandoc also results in the error message from above. So I'd be very surprised if it really is an XML parsing problem.
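
The reasoning above can be checked independently of pandoc-citeproc. The sketch below tests whether a CSL file is well-formed XML and whether a local copy is byte-identical to the installed default. The function names are assumptions for illustration. Python's standard XML parser is not the parser pandoc-citeproc uses, so a pass here only shows that the file itself is well-formed.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def is_well_formed(path: Path) -> bool:
    """Return True if the file parses as XML with Python's standard parser."""
    try:
        ET.parse(str(path))
        return True
    except ET.ParseError:
        return False


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_bytes(a: Path, b: Path) -> bool:
    """Return True if both files have identical contents."""
    return sha256_of(a) == sha256_of(b)
```

If a local copy of the default style is well-formed and identical to the installed file, yet only the `--csl` invocation fails, the file contents are unlikely to be the cause.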

@jgm
Owner

jgm commented Sep 10, 2014

Oh, thanks. That's a good clue.

@jgm
Owner

jgm commented Sep 10, 2014

I can't reproduce this. Did you install using cabal, or in some other way? If via cabal, can you send the output of ghc-pkg list?

@magthe
Author

magthe commented Sep 11, 2014

I install using the distro package manager. Since I also maintain the packages involved I know that the output of ghc-pkg list reflects the build environment used.

Pandoc and pandoc-citeproc are built with the following flags:

pandoc  1.13.0.1-3 (-make-pandoc-man-pages https -trypandoc -embed_data_files)
pandoc-citeproc  0.4.0.1-3 (-test_citeproc -unicode_collation -embed_data_files -hexpat bibutils small_base)

This is the output of ghc-pkg list after installing pandoc-citeproc on a clean system:

/usr/lib/ghc-7.8.3/package.conf.d:
    Cabal-1.18.1.3
    HTTP-4000.2.18
    JuicyPixels-3.1.7.1
    SHA-1.6.4.1
    aeson-0.7.0.6
    aeson-pretty-0.7.1
    array-0.5.0.0
    asn1-encoding-0.8.1.3
    asn1-parse-0.8.1
    asn1-types-0.2.3
    attoparsec-0.11.3.4
    base-4.7.0.1
    base64-bytestring-1.0.0.1
    bin-package-db-0.0.0.0
    binary-0.7.1.0
    blaze-builder-0.3.3.2
    blaze-html-0.7.0.2
    blaze-markup-0.6.1.0
    rts-1.0
    byteable-0.1.1
    bytestring-0.10.4.0
    case-insensitive-1.2.0.0
    cereal-0.4.0.1
    cipher-aes-0.2.8
    cipher-des-0.0.6
    cipher-rc4-0.1.4
    cmdargs-0.10.9
    conduit-1.2.0.2
    connection-0.2.3
    containers-0.5.5.1
    cookie-0.4.1.3
    cprng-aes-0.5.2
    crypto-cipher-types-0.0.9
    crypto-numbers-0.2.3
    crypto-pubkey-0.2.4
    crypto-pubkey-types-0.4.2.2
    crypto-random-0.0.8
    cryptohash-0.11.6
    data-default-0.5.3
    data-default-class-0.0.1
    data-default-instances-base-0.0.1
    data-default-instances-containers-0.0.1
    data-default-instances-dlist-0.0.1
    data-default-instances-old-locale-0.0.1
    deepseq-1.3.0.2
    deepseq-generics-0.1.1.1
    digest-0.0.1.2
    directory-1.2.1.0
    dlist-0.7.1
    exceptions-0.6.1
    extensible-exceptions-0.1.1.4
    filepath-1.3.0.2
    (ghc-7.8.3)
    ghc-prim-0.3.1.0
    haddock-library-1.1.1
    hashable-1.2.2.0
    haskeline-0.7.1.2
    (haskell2010-1.1.2.0)
    (haskell98-2.0.0.3)
    highlighting-kate-0.5.9
    hoopl-3.10.0.1
    hpc-0.6.0.1
    hs-bibutils-5.0
    hslua-0.3.13
    http-client-0.3.8.2
    http-client-tls-0.2.2
    http-types-0.8.5
    integer-gmp-0.5.1.0
    lifted-base-0.2.3.0
    mime-types-0.1.0.4
    mmap-0.5.9
    mmorph-1.0.4
    monad-control-0.3.3.0
    mtl-2.1.3.1
    nats-0.2
    network-2.5.0.0
    old-locale-1.0.0.6
    old-time-1.1.0.2
    pandoc-1.13.1
    pandoc-citeproc-0.5
    pandoc-types-1.12.4.1
    parsec-3.1.5
    pem-0.2.2
    pretty-1.1.1.1
    primitive-0.5.3.0
    process-1.2.0.0
    publicsuffixlist-0.1
    random-1.0.1.3
    regex-base-0.93.2
    regex-pcre-builtin-0.94.4.8.8.35
    resourcet-1.1.2.3
    rfc5051-0.1.0.3
    scientific-0.3.3.0
    securemem-0.1.3
    semigroups-0.15.2
    socks-0.5.4
    split-0.2.2
    stm-2.4.3
    streaming-commons-0.1.4.2
    syb-0.4.2
    tagsoup-0.13.2
    template-haskell-2.9.0.0
    temporary-1.2.0.3
    terminfo-0.4.0.0
    texmath-0.8
    text-1.1.1.3
    time-1.4.2
    tls-1.2.9
    transformers-0.3.0.0
    transformers-base-0.4.3
    unix-2.7.0.1
    unordered-containers-0.2.5.0
    utf8-string-0.3.8
    vector-0.10.11.0
    void-0.6.1
    x509-1.4.12
    x509-store-1.4.4
    x509-system-1.4.5
    x509-validation-1.5.0
    xhtml-3000.2.1
    xml-1.3.13
    yaml-0.8.9.1
    zip-archive-0.2.3.4
    zlib-0.5.4.1
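
For reading the flag lists above: the sketch below splits such a string into enabled and disabled flags. It assumes that a leading minus marks a disabled flag and that a bare name (or one with a leading plus) marks an enabled one, as in Cabal's flag syntax. The function name is hypothetical.

```python
from typing import Set, Tuple


def split_flags(flag_string: str) -> Tuple[Set[str], Set[str]]:
    """Split a space-separated flag list into (enabled, disabled) sets.

    Assumes "-name" means disabled and "name" or "+name" means enabled.
    """
    enabled: Set[str] = set()
    disabled: Set[str] = set()
    for token in flag_string.split():
        if token.startswith("-"):
            disabled.add(token[1:])
        else:
            enabled.add(token.lstrip("+"))
    return enabled, disabled


# Flag string copied from the pandoc-citeproc line above.
enabled, disabled = split_flags(
    "-test_citeproc -unicode_collation -embed_data_files -hexpat bibutils small_base"
)
print("enabled: ", sorted(enabled))
print("disabled:", sorted(disabled))
```

Under that assumption, `hexpat` falls into the disabled set for this build.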

@jgm
Owner

jgm commented Sep 11, 2014

The -hexpat flag stands out: it is a non-default setting and differs from my setup. Is there a reason you don't use hexpat? It is much faster. It may be that the non-hexpat configuration is now broken.

@magthe
Author

magthe commented Sep 11, 2014

Well, hexpat isn't in our repo and since the dependencies can be satisfied without it that's what happens. Anyway, I modified the flag and pulled in hexpat and now it works fine. So indeed, it seems the non-hexpat XML parsing is broken.

@jgm
Owner

jgm commented Sep 21, 2014

I've just replaced the old xml-light- and hexpat-based CSL parsers with a new, xml-conduit-based one (pure Haskell). It is about twice as fast as the old hexpat-based parser in my tests, and will be much easier to maintain and extend. This should solve this issue once it is released.

jgm closed this as completed on Sep 21, 2014

@nylki

nylki commented Jun 11, 2015

I still have this issue with pandoc-citeproc 0.5 on Fedora 22. Is there a fix for this situation? I suppose I'd have to build pandoc-citeproc myself to get the most recent version, or wait until Fedora puts it into their repository?

The workaround to replace the default .csl works, but it's obviously not a very practical solution.

@ousia

ousia commented Jun 11, 2015

@nylki, there is a copr repository with pandoc statically linked from Jens Petersen (https://copr.fedoraproject.org/coprs/petersen/pandoc/).

I have just asked him whether he could add the latest version of pandoc-citeproc.

@nylki

nylki commented Jun 26, 2015

@ousia thanks! Have you got a response from Jens Petersen?

@ousia

ousia commented Jun 26, 2015

@nylki, you have a subpackage at https://copr.fedoraproject.org/coprs/petersen/pandoc/ (only for Fedora 22 or newer).
