Cannot convert a Word document to DITA topic #5

Closed
raducoravu opened this issue Oct 8, 2019 · 2 comments

Comments

@raducoravu

I had success in converting Markdown to DITA using the plugin.
I'm attaching a Word Document (DOCX).

Word File with various structures.docx

If I refer to it from a DITA Map:

<topicref format="pandoc" href="Word%20File%20with%20various%20structures.docx" type="topic"/>

the publishing is not able to convert it to DITA:

pandoc.process:
   [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/markdown.md
[file-rename] Moving 1 file to D:\projects\eXml\frameworks\dita\DITA-OT3.x\plugins\fox.jason.passthrough.pandoc-master\test\input-markdown\temp\html5\oxygen_dita_temp
    [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx
   [pandoc] Result: 1
   [pandoc] pandoc: File: openBinaryFile: does not exist (No such file or directory)

Running pandoc from the command line on the same Word document seems to work for me:

pandoc "D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx"
@jason-fox
Owner

@raducoravu - does commit 934d6f2 help you?

Specifically, the extra &quot; in lines 41 and 46 of process_pandoc.xml.
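
For readers wondering why the extra quotes matter, here is a minimal sketch (in Python, not the plugin's Ant code) of how a path containing spaces is tokenized with and without surrounding quotes. It uses POSIX-style splitting via `shlex` as a stand-in; real Windows command-line parsing differs in its details, and the shortened relative path is only an illustration. The "File:" in the pandoc error above is consistent with the path having been split at spaces, although that is an inference from the log rather than something confirmed in this thread.

```python
import shlex

# Illustrative path only: a shortened form of the file name from the report.
path = "input-markdown/Word File with various structures.docx"

# Without quotes, each space starts a new argument.
unquoted = shlex.split(f"pandoc {path}")

# With quotes, the whole path stays a single argument.
quoted = shlex.split(f'pandoc "{path}"')

print(unquoted)
# ['pandoc', 'input-markdown/Word', 'File', 'with', 'various', 'structures.docx']
print(quoted)
# ['pandoc', 'input-markdown/Word File with various structures.docx']
```

In the unquoted case pandoc would receive five separate input file names, none of which exists.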

@raducoravu
Author

raducoravu commented Oct 10, 2019

@jason-fox I confirm it works for me 👍 One small thing: in the generated TOC, the title shown for the Word document contains %20 instead of spaces.
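
As a rough illustration of the %20 issue (a sketch only, not how the plugin derives titles), the href from the map is percent-encoded and would need decoding before being used as a display title. The helper name `title_from_href` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote

# The href exactly as it appears in the topicref above.
href = "Word%20File%20with%20various%20structures.docx"


def title_from_href(href: str) -> str:
    """Decode percent-escapes, then drop the directory and file extension."""
    return PurePosixPath(unquote(href)).stem


print(title_from_href(href))
# Word File with various structures
```
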
Btw, as pandoc does not support converting from AsciiDoc, I recently worked on a plugin for converting AsciiDoc to DITA:

https://github.com/oxygenxml/dita-asciidoc

I liked your idea of using ANT build files to do the actual conversion. In my "dita-asciidoc" plugin, the XMLReader implementation class runs an ANT build file, passing it parameters for the input and output files:

https://github.com/oxygenxml/dita-asciidoc/blob/master/com.oxygenxml.ant.parser.dita/src/com/oxygenxml/ant/dita/AntProcessReader.java

So instead of running the custom build.xml as part of the preprocessing stage, the custom build.xml is called once per conversion and is given one parameter for the input file and one for the output file.
Your way of doing things is probably faster, though, because all resources are processed from a single build file.
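
To make the per-conversion approach concrete, here is a small sketch of the equivalent Ant command lines, one per input file. This is an assumption-laden illustration: the actual plugin starts Ant from Java, and the property names `input.file` and `output.file`, the `.dita` output naming, and the sample file names are all hypothetical. Only the `-f` and `-D` options are standard Ant command-line options.

```python
def ant_command(build_file: str, input_path: str, output_path: str) -> list[str]:
    """Build the argument list for one Ant run that converts a single file."""
    return [
        "ant",
        "-f", build_file,                     # build file to run
        f"-Dinput.file={input_path}",         # hypothetical property name
        f"-Doutput.file={output_path}",       # hypothetical property name
    ]


def per_file_commands(build_file: str, inputs: list[str]) -> list[list[str]]:
    """One Ant invocation per input, as opposed to one build for all inputs."""
    return [ant_command(build_file, src, src + ".dita") for src in inputs]


commands = per_file_commands("build.xml", ["intro.adoc", "my notes.adoc"])
for cmd in commands:
    print(cmd)
```

Passing the arguments as a list, rather than as one joined string, also avoids the quoting problem with spaces in file names discussed earlier in this issue.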
