-f docx+styles -t dokuwiki results in raw html formatted lists #8920

Closed
barredespace opened this issue Jun 27, 2023 · 3 comments
@barredespace

Pandoc 3.1.3 (Homebrew version) on macOS 12.6.5

This is the command line that triggers the bug:
pandoc -f docx+styles -t dokuwiki juste_listes.docx -o juste_liste.txt

juste_liste.docx is a document with just one unordered list in it.

If I run the command without +styles, pandoc -f docx -t dokuwiki juste_listes.docx -o juste_liste.txt, I get this result:

  * Liste 1
  * liste 2
  * liste 3
    * liste 3a
    * liste 3b
    * liste 3c
  * liste 4

If I add +styles, here is what I get:

<HTML><ul></HTML>
<HTML><li></HTML><HTML><p></HTML>Liste 1<HTML></p></HTML>
<HTML></li></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 2<HTML></p></HTML>
<HTML></li></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 3<HTML></p></HTML>

<HTML><ul></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 3a<HTML></p></HTML>
<HTML></li></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 3b<HTML></p></HTML>
<HTML></li></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 3c<HTML></p></HTML>
<HTML></li></HTML><HTML></ul></HTML>
<HTML></li></HTML>
<HTML><li></HTML><HTML><p></HTML>liste 4<HTML></p></HTML>
<HTML></li></HTML><HTML></ul></HTML>

The first command, without +styles, gives me syntactically correct DokuWiki markup.

I need the +styles extension to retain custom styles, convert them with a Lua filter, and later parse them with a DokuWiki plugin.

@jgm
Owner

jgm commented Jun 27, 2023

Can you show the output of

pandoc -f docx+styles juste_liste.docx -t native

and

pandoc -f docx juste_liste.docx -t native

respectively? There must be a change in the AST that will explain why this is happening.
Generally pandoc will fall back to raw HTML when a list contains a feature that is too complex to represent using regular dokuwiki syntax.
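
To make the idea of a list being "too complex" concrete, here is a minimal Python sketch of such a check over pandoc's JSON AST (the output of `-t json`). It is an illustration only and not pandoc's actual implementation; the names `SIMPLE_LEAVES`, `list_items`, and `is_simple_list`, and the choice of which block types count as simple, are assumptions.

```python
# Block types this sketch assumes can be written with plain DokuWiki
# list syntax. This set is an assumption, not pandoc's real rule set.
SIMPLE_LEAVES = {"Plain", "Para"}
LISTS = {"BulletList", "OrderedList"}


def list_items(block):
    """Return the items of a list block from pandoc's JSON AST."""
    if block["t"] == "BulletList":
        return block["c"]
    # OrderedList content is [list_attributes, items].
    return block["c"][1]


def is_simple_list(block):
    """True if every item holds only paragraphs and simple nested lists."""
    for item in list_items(block):
        for child in item:
            if child["t"] in SIMPLE_LEAVES:
                continue
            if child["t"] in LISTS and is_simple_list(child):
                continue
            # Anything else (for example a Div) makes the list non-simple.
            return False
    return True
```

Under these assumptions, a list whose items are bare `Para` blocks passes the check, while a list whose items are wrapped in a `Div` does not.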

@barredespace
Author

Here they are:

pandoc -f docx+styles juste_liste.docx -t native

[ BulletList
    [ [ Div
          ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
          [ Para [ Str "Liste" , Space , Str "1" ] ]
      ]
    , [ Div
          ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
          [ Para [ Str "Liste" , Space , Str "2" ] ]
      ]
    , [ Div
          ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
          [ Para [ Str "Liste" , Space , Str "3" ] ]
      , BulletList
          [ [ Div
                ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
                [ Para [ Str "Liste" , Space , Str "3a" ] ]
            ]
          , [ Div
                ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
                [ Para [ Str "Liste" , Space , Str "3b" ] ]
            ]
          , [ Div
                ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
                [ Para [ Str "Liste" , Space , Str "3c" ] ]
            ]
          ]
      ]
    , [ Div
          ( "" , [] , [ ( "custom-style" , "List Paragraph" ) ] )
          [ Para [ Str "Liste" , Space , Str "4" ] ]
      ]
    ]
]

and

pandoc -f docx juste_liste.docx -t native

[ BulletList
    [ [ Para [ Str "Liste" , Space , Str "1" ] ]
    , [ Para [ Str "Liste" , Space , Str "2" ] ]
    , [ Para [ Str "Liste" , Space , Str "3" ]
      , BulletList
          [ [ Para [ Str "Liste" , Space , Str "3a" ] ]
          , [ Para [ Str "Liste" , Space , Str "3b" ] ]
          , [ Para [ Str "Liste" , Space , Str "3c" ] ]
          ]
      ]
    , [ Para [ Str "Liste" , Space , Str "4" ] ]
    ]
]

@jgm
Owner

jgm commented Jun 28, 2023

OK, it's the Div that is inserted for custom-styles that is blocking the regular list.
However, the div isn't being represented in HTML anyway, so we can probably improve this.
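
For readers on a pandoc version without the fix, one possible workaround is to strip the `custom-style` Div wrappers from list items in the JSON AST before writing DokuWiki. The Python sketch below is an assumption-laden illustration, not the change made in the fixing commit; `is_styled_div`, `fix_block`, and `fix_blocks` are hypothetical names. Note the trade-off: the style name is dropped for list items, while styled Divs outside lists are left untouched.

```python
def is_styled_div(block):
    """True for a Div whose attributes include a custom-style key."""
    if block["t"] != "Div":
        return False
    (_ident, _classes, keyvals), _content = block["c"]
    return any(key == "custom-style" for key, _value in keyvals)


def fix_block(block, in_list=False):
    """Return the list of blocks that should replace `block`."""
    kind = block["t"]
    if kind == "BulletList":
        items = [fix_blocks(item, True) for item in block["c"]]
        return [{"t": kind, "c": items}]
    if kind == "OrderedList":
        attrs, old_items = block["c"]
        items = [fix_blocks(item, True) for item in old_items]
        return [{"t": kind, "c": [attrs, items]}]
    if in_list and is_styled_div(block):
        # Replace the wrapper with its contents (which may nest further).
        return fix_blocks(block["c"][1], True)
    return [block]


def fix_blocks(blocks, in_list=False):
    """Apply fix_block to each block and flatten the results."""
    return [new for block in blocks for new in fix_block(block, in_list)]
```

Wrapped in a small script that reads the document with `json.load`, applies `fix_blocks` to its `"blocks"` field, and writes it back with `json.dump`, this could sit between `pandoc -f docx+styles -t json` and `pandoc -f json -t dokuwiki`.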

jgm closed this as completed in c908867 on Jun 28, 2023