Docx to PDF Converter Problem #332

Open
ksh901016 opened this issue Oct 4, 2018 · 23 comments

@ksh901016

ksh901016 commented Oct 4, 2018

Hi,
First of all, thank you for providing this excellent API.

I am using the PDF converter, but I can't get the PDF that I want.

I use the test code below.
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(doc, out, options);

test.docx
This is the docx file:
(image: docx)

After converting to PDF:
(image: pdf)

I don't know why the PDF file is broken.
Docx files that contain no tables, or only simple tables, are converted well.
Are docx files with complex tables difficult to convert to PDF?

@angelozerr
Member

Are docx files with complex tables difficult to convert to PDF?

Indeed, it's hard, but I have no time to study your problem. Any contribution is welcome!

@ksh901016
Author

ksh901016 commented Oct 4, 2018

@angelozerr Thank you for your answer. I'll try.

@kan2008tnptc

@ksh901016 Hi, were you able to solve your issue? I have the same problem when converting. Also, the tables run across pages. Do you have any solution for this?

@ksh901016
Author

@kan2008tnptc Sorry, I couldn't solve it. T.T

@fochoac

fochoac commented May 10, 2019

@kan2008tnptc Hi,
In which cases do you have that problem? I tried it and did not get your error.
I used the following code:

@Override
    public byte[] exportPdf() throws ReportException {
        ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
        ByteArrayInputStream is = new ByteArrayInputStream(exportBytes());
        Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.PDF);
        IConverter conversor = ConverterRegistry.getRegistry().getConverter(options);
        try {
            conversor.convert(is, pdfStream, options);
        } catch (XDocConverterException e) {
            throw new ReportException("Error", e);
        }

        return pdfStream.toByteArray();
    }

@kan2008tnptc

Hi @fochoac,

I was using this before your comment:

FileInputStream in = new FileInputStream(docxFile);
XWPFDocument document = new XWPFDocument(in);
OutputStream out = null;
if (!pdfFilePath.isEmpty()) {
    out = new FileOutputStream(new File(pdfFilePath));
    PdfOptions options = PdfOptions.create();
    //PdfConverter.getInstance().convert(document, out, options);
}
document.close();

The table was breaking with the code above.

So I used your code, but the same issue still appears: the table splits across pages.

Please see the image below.

(image)

@fochoac

fochoac commented May 13, 2019

Hello @kan2008tnptc,
I tried your case and attached the results. I don't have any problem, except that the XML in your document is poorly structured (this may be caused by a Word theme when saving). I made a new document with the same information and formatting, and the PDF is good.

I tried your document "test.docx" and found many errors. Please check the attached images.

Regards.

Attachments:

@kan2008tnptc

Hi @fochoac,

Can you please explain how the XML is poorly structured?

Could you also share the exact dependency names and versions (xdocreport jars) that your project uses for this PDF conversion?

@fochoac

fochoac commented May 14, 2019

Hi,

I have the following dependencies in my pom.xml:

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.document</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.core</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.2</version>
    <exclusions>
        <exclusion>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf.itext5</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.template</artifactId>
    <version>2.0.2</version>
</dependency>

The unit test is at the following URL.

The problem with the document is shown in the attachments:
Document.xml from test.docx

Result after conversion

I am not sure I am right, but in my opinion the problem is the grid span, with the following elements:

  • <w:gridAfter w:val="1"/>
  • <w:wAfter w:w="12" w:type="dxa"/>
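
To check whether a given document carries the trailing-grid markup mentioned above, a minimal sketch like the following could scan `word/document.xml` for table rows that contain `w:gridAfter` or `w:wAfter`. It uses only the JDK. The class name `GridAfterScan` and its method names are hypothetical and not part of XDocReport, and finding such rows does not by itself prove they cause the broken PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class GridAfterScan {
    // Namespace bound to the "w:" prefix in WordprocessingML.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts <w:tr> rows that contain <w:gridAfter> or <w:wAfter>.
    // A match inside a nested table also counts for the enclosing row.
    static int countRows(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(documentXml);
        NodeList rows = dom.getElementsByTagNameNS(W_NS, "tr");
        int count = 0;
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            if (row.getElementsByTagNameNS(W_NS, "gridAfter").getLength() > 0
                    || row.getElementsByTagNameNS(W_NS, "wAfter").getLength() > 0) {
                count++;
            }
        }
        return count;
    }

    // Convenience wrapper for XML held in a string; wraps checked exceptions.
    static int countRowsInXml(String xml) {
        try {
            return countRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java GridAfterScan.java <file.docx>");
            return;
        }
        // A .docx file is a ZIP archive; the body lives in word/document.xml.
        try (ZipFile docx = new ZipFile(args[0])) {
            ZipEntry entry = docx.getEntry("word/document.xml");
            if (entry == null) {
                System.err.println("No word/document.xml in " + args[0]);
                return;
            }
            try (InputStream in = docx.getInputStream(entry)) {
                System.out.println("Rows with gridAfter/wAfter: " + countRows(in));
            }
        }
    }
}
```

Running it against both the original test.docx and the re-created document would show whether the re-created one really lacks these elements.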

Regards.

@kan2008tnptc

Hi, can you also please share your Apache POI dependencies, if you have any?

@fochoac

fochoac commented May 15, 2019

Please check the following URL.

Regards.

@jagriti22

Hi @fochoac, I am also facing some formatting issues while converting docx to PDF. Could you please help me with the updated full source code of this XWPF converter?
I would be grateful.

@kan2008tnptc

kan2008tnptc commented May 28, 2020 via email

@jagriti22

Thanks for your response, @kan2008tnptc. I also started using JODConverter with OpenOffice, but I am facing this error:

Exception in thread "main" java.lang.IllegalArgumentException: officeHome must exist and be a directory
at bboss.org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration.checkArgument(DefaultOfficeManagerConfiguration.java:221)
at bboss.org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration.setOfficeHome(DefaultOfficeManagerConfiguration.java:54)
at bboss.org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration.setOfficeHome(DefaultOfficeManagerConfiguration.java:49)
at ModifiedXWPF.JODConv.main(JODConv.java:29)
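
The trace shows `setOfficeHome` rejecting its argument because the path is not an existing directory. A small pre-check along these lines could confirm which install location is actually present before handing it to JODConverter. This is only a sketch: the class name `OfficeHomeCheck` is hypothetical, and the candidate paths are assumed typical install locations that should be adjusted for the machine in question.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class OfficeHomeCheck {
    // Returns the first candidate path that exists and is a directory.
    static Optional<File> firstExistingDirectory(List<String> candidates) {
        for (String candidate : candidates) {
            File dir = new File(candidate);
            if (dir.isDirectory()) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed install locations; none of them is guaranteed to exist.
        List<String> candidates = Arrays.asList(
                "C:\\Program Files\\LibreOffice",
                "C:\\Program Files (x86)\\OpenOffice 4",
                "/usr/lib/libreoffice",
                "/opt/libreoffice");
        Optional<File> home = firstExistingDirectory(candidates);
        if (home.isPresent()) {
            System.out.println("Office home candidate: " + home.get().getAbsolutePath());
        } else {
            System.out.println("None of the candidate directories exist.");
        }
    }
}
```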

@kan2008tnptc

kan2008tnptc commented May 28, 2020 via email

@kan2008tnptc

kan2008tnptc commented May 28, 2020 via email

@jagriti22

Thanks @kan2008tnptc

This is my source code now, using LibreOffice. Now I am getting a new error.

import java.io.File;
import bboss.org.artofsolving.jodconverter.OfficeDocumentConverter;
import bboss.org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import bboss.org.artofsolving.jodconverter.office.OfficeManager;

public class JODConv {

    public static void main(String[] args) {
        String libreOfficePath = "C:\\Program Files\\LibreOffice";

        OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
                .setOfficeHome(new File(libreOfficePath))
                .buildOfficeManager();
        OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);

        officeManager.start();
        try {
            createPDF(converter);
        } finally {
            // Stop the office process only after the conversion has finished.
            officeManager.stop();
        }
    }

    private static void createPDF(OfficeDocumentConverter converter) {
        long start = System.currentTimeMillis();
        converter.convert(new File("C:/Users/hp/Downloads/Resume.docx"),
                new File("C:/Users/hp/Downloads/JODC1.pdf"));
        System.err.println("Generated JODC1.pdf in " + (System.currentTimeMillis() - start) + "ms");
    }
}

log4j:WARN No appenders could be found for logger (bboss.org.artofsolving.jodconverter.office.ProcessPoolOfficeManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2257 [OfficeProcessThread-0] DEBUG Sigar - no sigar-amd64-winnt.dll in java.library.path: [C:\Program Files\Java\jdk-11.0.7\bin, C:\WINDOWS\Sun\Java\bin, C:\WINDOWS\system32, C:\WINDOWS, C:/Program Files/Java/jdk-11.0.7/bin/server, C:/Program Files/Java/jdk-11.0.7/bin, C:\Windows\System32, C:\Windows, C:\Program Files\Java\jdk-11.0.7\bin, ., C:\Program Files\Docker\Docker\resources\bin, C:\ProgramData\DockerDesktop\version-bin, C:\Users\hp\AppData\Local\Microsoft\WindowsApps, C:\Users\hp\AppData\Roaming\npm, C:\Users\hp\AppData\Local\Programs\Microsoft VS Code\bin, C:\Program Files\nodejs, C:\Users\hp\AppData\Roaming\npm, C:\Program Files\nodejs, ., C:\Liferay\workspace\liferay-developer-studio, ., .]
org.hyperic.sigar.SigarException: no sigar-amd64-winnt.dll in java.library.path: [C:\Program Files\Java\jdk-11.0.7\bin, C:\WINDOWS\Sun\Java\bin, C:\WINDOWS\system32, C:\WINDOWS, C:/Program Files/Java/jdk-11.0.7/bin/server, C:/Program Files/Java/jdk-11.0.7/bin, C:\Windows\System32, C:\Windows, C:\Program Files\Java\jdk-11.0.7\bin, ., C:\Program Files\Docker\Docker\resources\bin, C:\ProgramData\DockerDesktop\version-bin, C:\Users\hp\AppData\Local\Microsoft\WindowsApps, C:\Users\hp\AppData\Roaming\npm, C:\Users\hp\AppData\Local\Programs\Microsoft VS Code\bin, C:\Program Files\nodejs, C:\Users\hp\AppData\Roaming\npm, C:\Program Files\nodejs, ., C:\Liferay\workspace\liferay-developer-studio, ., .]
at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172)
at org.hyperic.sigar.Sigar.(Sigar.java:100)
at bboss.org.artofsolving.jodconverter.process.SigarProcessManager.findPid(SigarProcessManager.java:40)
at bboss.org.artofsolving.jodconverter.office.OfficeProcess.start(OfficeProcess.java:67)
at bboss.org.artofsolving.jodconverter.office.OfficeProcess.start(OfficeProcess.java:62)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess$7.attempt(ManagedOfficeProcess.java:167)
at bboss.org.artofsolving.jodconverter.office.Retryable.execute(Retryable.java:40)
at bboss.org.artofsolving.jodconverter.office.Retryable.execute(Retryable.java:30)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.doStartProcessAndConnect(ManagedOfficeProcess.java:183)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.access$000(ManagedOfficeProcess.java:34)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess$1.run(ManagedOfficeProcess.java:61)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "main" bboss.org.artofsolving.jodconverter.office.OfficeException: failed to start and connect
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.startAndWait(ManagedOfficeProcess.java:67)
at bboss.org.artofsolving.jodconverter.office.PooledOfficeManager.start(PooledOfficeManager.java:101)
at bboss.org.artofsolving.jodconverter.office.ProcessPoolOfficeManager.start(ProcessPoolOfficeManager.java:64)
at ModifiedXWPF.JODConv.main(JODConv.java:26)
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsatisfiedLinkError: org.hyperic.sigar.ptql.SigarProcessQuery.create(Ljava/lang/String;)V
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.startAndWait(ManagedOfficeProcess.java:65)
... 3 more
Caused by: java.lang.UnsatisfiedLinkError: org.hyperic.sigar.ptql.SigarProcessQuery.create(Ljava/lang/String;)V
at org.hyperic.sigar.ptql.SigarProcessQuery.create(Native Method)
at org.hyperic.sigar.ptql.ProcessQueryFactory.getQuery(ProcessQueryFactory.java:66)
at org.hyperic.sigar.ptql.ProcessFinder.find(ProcessFinder.java:68)
at org.hyperic.sigar.ptql.ProcessFinder.find(ProcessFinder.java:56)
at bboss.org.artofsolving.jodconverter.process.SigarProcessManager.findPid(SigarProcessManager.java:42)
at bboss.org.artofsolving.jodconverter.office.OfficeProcess.start(OfficeProcess.java:67)
at bboss.org.artofsolving.jodconverter.office.OfficeProcess.start(OfficeProcess.java:62)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess$7.attempt(ManagedOfficeProcess.java:167)
at bboss.org.artofsolving.jodconverter.office.Retryable.execute(Retryable.java:40)
at bboss.org.artofsolving.jodconverter.office.Retryable.execute(Retryable.java:30)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.doStartProcessAndConnect(ManagedOfficeProcess.java:183)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess.access$000(ManagedOfficeProcess.java:34)
at bboss.org.artofsolving.jodconverter.office.ManagedOfficeProcess$1.run(ManagedOfficeProcess.java:61)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
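
The first exception reports that `sigar-amd64-winnt.dll` was not found in `java.library.path`, which suggests the native Sigar library is not visible to the JVM. As a rough diagnostic, a sketch like the one below lists which directories on that path contain a given file. The class name `NativeLibLocator` is hypothetical, and it only locates the file; it does not verify that the DLL matches the JVM architecture.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class NativeLibLocator {
    // Returns every file named fileName found directly inside a directory of searchPath.
    static List<File> findInPath(String fileName, String searchPath) {
        List<File> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // File name taken from the error message above.
        String fileName = args.length > 0 ? args[0] : "sigar-amd64-winnt.dll";
        List<File> hits = findInPath(fileName, System.getProperty("java.library.path"));
        if (hits.isEmpty()) {
            System.out.println(fileName + " is not on java.library.path");
        } else {
            for (File hit : hits) {
                System.out.println("Found: " + hit.getAbsolutePath());
            }
        }
    }
}
```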

@jagriti22

Could you please help me with the full source code, including dependencies?

@kan2008tnptc

kan2008tnptc commented May 28, 2020 via email

@ShamithaSIlva

Hi, I know it's a bit late, but when I added your dependencies I got "Caused by: java.lang.NoClassDefFoundError: com/itextpdf/text/Document". Any idea what's going on?
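
A `NoClassDefFoundError` for `com/itextpdf/text/Document` means that class, which belongs to iText 5, was not on the runtime classpath when the converter ran. Why it is missing in this particular build is not clear from the thread. A quick probe like the sketch below, run with the same classpath as the application, could at least confirm which PDF library classes are visible. The class name `ClasspathProbe` is hypothetical.

```java
// Hypothetical helper, for illustration only.
class ClasspathProbe {
    // Returns true if the named class can be located by this class's class loader.
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.itextpdf.text.Document", // iText 5, the class named in the error
            "com.lowagie.text.Document"   // package used by iText 2.x and OpenPDF
        };
        for (String name : names) {
            System.out.println(name + " present: " + isClassPresent(name));
        }
    }
}
```

If the first line prints `false`, comparing the output of `mvn dependency:tree` against the dependency list would be a reasonable next step.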

@gladshop

I have the same problem: the exported PDF looks different from the Word file. Is there any solution?

@kulwantsharma

Is there any solution using open source libraries to convert docx to PDF properly? My docx contains tables and text, and I have not been able to find a solution.
I tried libraries like docx4j and others, which generate the PDF, but the tables are not rendered properly.
Please help, guys.
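
Since LibreOffice already came up earlier in this thread, one option that avoids JODConverter altogether is to call LibreOffice's own command-line converter (`soffice --headless --convert-to pdf`). The sketch below only builds and launches that command. The class name `SofficeConvert` is hypothetical, it assumes LibreOffice is installed and that the caller passes the path to the `soffice` executable, and nothing in this thread confirms that it renders the tables in question correctly.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class SofficeConvert {
    // Builds the LibreOffice headless command that converts input to PDF inside outputDir.
    static List<String> buildCommand(String sofficeExecutable, File input, File outputDir) {
        return Arrays.asList(
                sofficeExecutable,
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.getPath(),
                input.getPath());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 3) {
            System.err.println("usage: java SofficeConvert.java <soffice> <input.docx> <outputDir>");
            return;
        }
        List<String> command = buildCommand(args[0], new File(args[1]), new File(args[2]));
        // inheritIO lets LibreOffice's own messages appear on the console.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("soffice exited with code " + exitCode);
    }
}
```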

@kan2008tnptc

kan2008tnptc commented Aug 14, 2024 via email
