
[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99 #166

Closed
yusuf-nathani opened this issue Mar 31, 2017 · 7 comments
Labels: need help

@yusuf-nathani

Hi, when using GROBID on Windows 8.1 to produce TEI, I get the following error for any PDF file:

Error encountered while requesting the server.
[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99

The following is from the log:

Model path: D:\...\grobid-master\grobid-home\models\header\model.wapiti
[DEBUG] org.grobid.core.document.DocumentSource: start pdf2xml
[DEBUG] org.grobid.core.document.DocumentSource: Executing: [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml process finished with error code: 99. [D:\...\grobid-master\grobid-home\pdf2xml\win-64\pdftoxml_server, -blocks, -noImageInline, -fullFontName, -noImage, -annotation, -l, 2, D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf, D:\...\grobid-master\grobid-home\tmp\CSccvkYTdu.lxml]
[ERROR] org.grobid.core.process.ProcessPdf2Xml: pdftoxml return message:pdftoxml version 1.0
(Based on Xpdf version 3.01, Copyright 1996-2005 Glyph & Cog, LLC)
Copyright 2004-2006 XEROX XRCE
Usage: pdftoxml [options] <PDF-file> [<xml-file>]
  -f <int>               : first page to convert
  -l <int>               : last page to convert
  -verbose               : display pdf attributes
  -noText                : do not extract textual objects
  -noImage               : do not extract Images (Bitmap and Vectorial)
  -noImageInline         : do not include images inline in the stream
  -outline               : create an outline file xml
  -annots                : create an annotations file xml
  -cutPages              : cut all pages in separately files
  -blocks                : add blocks informations whithin the structure
  -fullFontName          : fonts names are not normalized
  -nsURI <string>        : add the specified namespace URI
  -opw <string>          : owner password (for encrypted files)
  -upw <string>          : user password (for encrypted files)
  -q                     : don't print any messages or errors
  -v                     : print copyright and version info
  -h                     : print usage information
  -help                  : print usage information
  --help                 : print usage information
  -?                     : print usage information

[ERROR] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occured: org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
[DEBUG] org.grobid.core.utilities.IOUtilities: Removing D:\...\grobid-master\grobid-home\tmp\origin1660742701096340959.pdf
[DEBUG] org.grobid.service.process.GrobidRestProcessFiles: << org.grobid.service.process.GrobidRestProcessFiles.methodLogOut

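Here the command is constructed with -annotation as an argument, but according to the usage message in the log trace the option is named -annots. Since pdftoxml printed its usage instead of converting, it apparently rejected the unrecognized -annotation option. If I manually change the command to use -annots and run it from the Windows command prompt, the PDF is converted to XML successfully.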
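Since the usage text in the log lists the options the binary accepts, one way to avoid hard-coding the flag is to ask the binary itself. The following is only a minimal sketch under that assumption: the class and method names (AnnotationFlagProbe, chooseAnnotationFlag, readUsage) are hypothetical and not part of GROBID; it relies only on the -h option shown in the usage output above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper (not GROBID code): determines which annotation flag a
// pdftoxml binary accepts by parsing its usage text, instead of hard-coding it.
final class AnnotationFlagProbe {

    // Match an option name at the start of a usage line, e.g. "  -annots    : ...".
    private static final Pattern ANNOTS = Pattern.compile("(?m)^\\s*-annots\\b");
    private static final Pattern ANNOTATION = Pattern.compile("(?m)^\\s*-annotation\\b");

    private AnnotationFlagProbe() {}

    /** Returns the annotation flag listed in the usage text, or empty if neither is listed. */
    static Optional<String> chooseAnnotationFlag(String usageText) {
        if (ANNOTATION.matcher(usageText).find()) {
            return Optional.of("-annotation");
        }
        if (ANNOTS.matcher(usageText).find()) {
            return Optional.of("-annots");
        }
        return Optional.empty(); // neither flag supported: caller should omit it
    }

    /** Runs "<binary> -h" and returns everything it printed on stdout and stderr. */
    static String readUsage(String binaryPath) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(binaryPath, "-h")
                .redirectErrorStream(true) // usage may be printed on stderr
                .start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        p.waitFor(); // exit code ignored: printing usage may itself return non-zero
        return out;
    }
}
```

Usage would look like chooseAnnotationFlag(readUsage(pathToPdftoxml)); for the Windows binary in this report, the usage text above would yield -annots.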

I have checked the code here, and -annotation is hard-coded as an argument, so it cannot be configured to -annots: https://github.com/kermitt2/grobid/blob/master/grobid-core/src/main/java/org/grobid/core/document/DocumentSource.java#L81
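A minimal sketch of how the flag could be made configurable, assuming a JVM system property is an acceptable switch. This is not GROBID's DocumentSource code: the class Pdf2XmlCommand and the property name grobid.pdf2xml.annotationFlag are hypothetical. It rebuilds the same argument list that appears in the "Executing:" log line above, with only the annotation flag made overridable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not GROBID code): builds the pdftoxml command shown in
// the log, reading the annotation flag from a system property (hypothetical
// name) instead of hard-coding "-annotation".
final class Pdf2XmlCommand {

    static final String ANNOTATION_FLAG_PROPERTY = "grobid.pdf2xml.annotationFlag"; // hypothetical
    static final String DEFAULT_ANNOTATION_FLAG = "-annotation";

    private Pdf2XmlCommand() {}

    static List<String> build(String binary, String pdfPath, String xmlPath, int lastPage) {
        // Fall back to the current hard-coded value when the property is unset.
        String annotationFlag = System.getProperty(ANNOTATION_FLAG_PROPERTY, DEFAULT_ANNOTATION_FLAG);
        List<String> cmd = new ArrayList<>();
        cmd.add(binary);
        cmd.add("-blocks");
        cmd.add("-noImageInline");
        cmd.add("-fullFontName");
        cmd.add("-noImage");
        cmd.add(annotationFlag); // e.g. "-annots" for the Windows 64-bit binary in this report
        cmd.add("-l");
        cmd.add(Integer.toString(lastPage)); // last page to convert ("-l 2" in the log)
        cmd.add(pdfPath);
        cmd.add(xmlPath);
        return cmd;
    }
}
```

With this sketch, starting the JVM with -Dgrobid.pdf2xml.annotationFlag=-annots would select the Windows spelling without editing the source.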

Can you please suggest a workaround or a possible solution?

@lfoppiano
Collaborator

Dear @yusuf-nathani,
thanks for your report. Indeed, there seems to be a difference between the pdftoxml version we use and your version. We need to look into it to understand whether the problem is specific to your version or affects the Windows version in general.

For the moment, if you need to process PDFs on your local Windows machine, you can keep using the workaround you've successfully tested, changing the parameter directly in the code. I will investigate this.

Cheers
Luca

@lfoppiano lfoppiano self-assigned this Mar 31, 2017
@lfoppiano lfoppiano added the bug label Mar 31, 2017
@kermitt2
Owner

Thanks! Yes, pdf2xml needs to be updated for win64.

@yusuf-nathani if I remember correctly, you should be able to use the latest stable release (grobid-0.4.1) without this problem.

@yusuf-nathani
Author

@kermitt2 Yes, the stable grobid-0.4.1 works without any problem. Thanks.

@kermitt2
Owner

kermitt2 commented May 11, 2017

See #185 and #161: the pdf2xml fork needs to be recompiled for Windows 64-bit.

@kermitt2 kermitt2 added the need help label and removed the bug label May 11, 2017
@lfoppiano
Collaborator

This should be fixed by commit 75536cd; see the comment on issue #185.

Should you still have problems, feel free to reopen it.

@vishnudas-raveendran

Hi @lfoppiano ,
I cloned the latest GROBID (v0.6.1) today and got a PDF to XML conversion error. I'm attaching grobid-service.log and console.log. When running through the cloud-miner service, it works fine.

I need to run it locally.
Grobid-service.log file: grobid-service.log

Console log from running a batch via GrobidMain: console_Err

Screenshot of running as a service: [image: 99]

The result is the same with PDF files from any source.

Let me know if you need any more info.

@lfoppiano
Collaborator

@vishnudas-raveendran GROBID is not supported on Windows. Unfortunately, three platforms are too many for us to maintain, so I recommend running it with Docker.

See more information here: https://grobid.readthedocs.io/en/latest/Troubleshooting/#windows-related-issues


4 participants