Error: [PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed on pdf #185
Hello, which version of GROBID are you using? Our fork of pdf2xml normally fixes most PDF parsing failures. To check, you could send me this particular PDF by email, for instance. The header-only processing works fine because, for headers, only the first two pages of the PDF are parsed.
I just checked the full document with the web application, and it works fine. I am using 0.4.2-SNAPSHOT here. Also, I'm on Windows with Eclipse, not Linux, if that helps.
Switched to 0.4.1 and this is the error I just got. Any ideas as to what could be causing it? Thanks in advance.
The PDF parsing library has been updated in version 0.4.2-SNAPSHOT; the one in 0.4.1 is less robust.
So I'm back on 0.4.2-SNAPSHOT and this is the error: what could be the issue?
I can't really say without testing the PDF myself.
I have tried different PDFs and get the same error. In case it matters, I added some of the libraries individually as external JARs instead of using Maven dependencies. Thanks in advance for your reply.
By the way, the error points to this statement:
This PDF works fine on Linux with the new pdf2xml fork, so you will need to wait for a version of this new pdf2xml recompiled for Windows. I cannot recompile it myself because I have no Windows machine, but hopefully another contributor will do it when GROBID 0.4.2 is released. Here is the resulting TEI: it actually looks very nice ;)
It does indeed. I will give it a try on Linux. Thanks.
Not to sound silly, but in #166 it is stated that the PDF works well with 0.4.1, which gives me an error as described above. Is this due to the pdf2xml fork for Win 64?
The fact that #166 doesn't work with 0.4.2-SNAPSHOT is because pdf2xml has not been recompiled for Windows: a different symptom, but the same cause ;)
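Because pdf2xml is a native binary, each operating system needs its own compiled build, which is why the same GROBID version can succeed on Linux and fail on Windows. The minimal Java sketch below shows one way a tool can select a per-platform native binary from the JVM's `os.name` and `os.arch` properties. The helper and the directory names (`lin-64`, `win-64`, `mac-64`, ...) are assumptions for illustration, not GROBID's confirmed code or directory layout.

```java
import java.util.Locale;

/**
 * Hypothetical helper: picks a per-platform directory name for a native
 * converter binary. The names ("lin-64", "win-64", ...) are illustrative
 * assumptions, not a confirmed GROBID layout.
 */
public class NativePlatform {

    static String platformDir(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // 64-bit JVMs typically report "amd64", "x86_64" or "aarch64"
        boolean is64 = arch.contains("64");
        String prefix;
        if (os.startsWith("windows")) {
            prefix = "win";
        } else if (os.startsWith("mac")) {
            prefix = "mac";
        } else if (os.startsWith("linux")) {
            prefix = "lin";
        } else {
            throw new IllegalStateException("Unsupported OS: " + osName);
        }
        return prefix + (is64 ? "-64" : "-32");
    }

    public static void main(String[] args) {
        // Print the directory this JVM would look in
        System.out.println(platformDir(System.getProperty("os.name"),
                                       System.getProperty("os.arch")));
    }
}
```

If only the Linux directory contains an up-to-date binary, a Windows user silently gets the older build, which matches the "different symptom, same cause" explanation above.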
Thanks!
@rathancage, you're welcome!
Dear @rathancage, @UntoterOstgote, you should be able to run GROBID on Windows without problems. Please bear in mind that the reference architecture for GROBID is Linux (moreover, compiling things on Windows is a real pain). Feel free to reopen the ticket if you run into further problems.
Hi, this is very urgent. Any help would be highly appreciated.
Hi, we talked today and saw that some files work and some don't on Windows (both work on Unix).
Cheers,
>>>>>>>> GROBID_HOME=C:\grobid-master\grobid-home
[main] INFO org.grobid.core.main.LibraryLoader - Loading external native CRF library
[main] INFO org.grobid.core.main.LibraryLoader - Loading Wapiti native library...
[main] INFO org.grobid.core.main.LibraryLoader - Library crfpp loaded
[main] INFO org.grobid.core.jni.WapitiModel - Loading model: C:\grobid-master\grobid-home\models\header\model.wapiti (size: 36094028)
org.grobid.core.exceptions.GrobidException: [PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed on pdf file 1.pdf
	at org.grobid.core.document.DocumentSource.processPdf2XmlThreadMode(DocumentSource.java:184)
	at org.grobid.core.document.DocumentSource.pdf2xml(DocumentSource.java:133)
	at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:62)
	at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:49)
	at org.grobid.core.engines.HeaderParser.processing2(HeaderParser.java:84)
	at org.grobid.core.engines.Engine.processHeader(Engine.java:434)
	at org.grobid.core.engines.Engine.processHeader(Engine.java:410)
	at WeRe.Grobid.performFun(Grobid.java:25)
	at WeRe.MainClass.main(MainClass.java:12)
[Wapiti] Loading model: "C:\grobid-master\grobid-home\models\header\model.wapiti"
Model path: C:\grobid-master\grobid-home\models\header\model.wapiti
The above error shows up when I select a particular PDF. The same PDF is processed correctly for header extraction in the web application. Can you tell me what could be causing the error?
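The trace shows the failure surfacing in `DocumentSource.processPdf2XmlThreadMode`, i.e. while the PDF is being converted by pdf2xml. As a rough illustration of how such a failure can be detected when a native converter runs as a child process, here is a minimal Java sketch (Java 9+). It is not GROBID's implementation; the class name, the `ConversionFailure` type, the command, and the timeout are all hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: run an external converter as a child process with a
 * timeout and report every failure mode as one exception type. Not GROBID code.
 */
public class ConverterRunner {

    static class ConversionFailure extends Exception {
        ConversionFailure(String msg) { super(msg); }
        ConversionFailure(String msg, Throwable cause) { super(msg, cause); }
    }

    static void runConverter(List<String> command, long timeoutSeconds) throws ConversionFailure {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)                         // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)   // avoid blocking on a full pipe (Java 9+)
                    .start();
        } catch (IOException e) {
            // e.g. the binary is missing or was not built for this OS
            throw new ConversionFailure("could not start converter: " + command.get(0), e);
        }
        try {
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new ConversionFailure("converter timed out after " + timeoutSeconds + " s");
            }
        } catch (InterruptedException e) {
            process.destroyForcibly();
            Thread.currentThread().interrupt();
            throw new ConversionFailure("interrupted while waiting for converter", e);
        }
        int exit = process.exitValue();
        if (exit != 0) {
            // the converter ran but rejected or crashed on this input
            throw new ConversionFailure("converter exited with code " + exit);
        }
    }
}
```

Reporting "could not start", "timed out", and "non-zero exit" as separate messages makes it easier to tell a missing or incompatible native binary apart from a PDF that the converter genuinely cannot handle.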