ingest-attachment plugin Font not found: TimesNewRomanPS-BoldMT #27198
Comments
Thank you for creating the upstream issue against PDFBox!
This is not the first issue we've seen dealing with parsing specific fonts. I think we can do better with the latest version of PDFBox that, if I am not mistaken, logs (instead of throws) these exceptions. That way we can still extract what we can from the PDF.
I looked at this and it seems that Apache Tika 1.17 depends on PDFBox 2.0.8:
I can see that Tika will be updated to a new PDFBox version with https://issues.apache.org/jira/browse/TIKA-2178 (for other reasons). I'm unsure whether that will really fix the problem, though. As the PDFBox team asked, @TomonoriSoejima, could you share the failing PDF document so they can reproduce the problem, and so we can also add it to our tests to make sure that the next Tika version fixes it? Thanks!
Ping @TomonoriSoejima. Could you please share a document?
Unfortunately, the user I was working with on the support case declined to share the reproducing file with us for privacy reasons, and I don't have the file.
https://issues.apache.org/jira/browse/TIKA-2579 has been fixed. \o/
No further feedback, so closing. If this can be reproduced, we can reopen the issue.
Describe the feature:
Elasticsearch version (bin/elasticsearch --version): ES 5.x
Plugins installed: [ingest-attachment]
JVM version (java -version): 1.8.x
OS version (uname -a if on a Unix-like system):
Description of the problem including expected versus actual behavior:
Steps to reproduce:
Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.
I have created an issue here.
https://issues.apache.org/jira/browse/PDFBOX-3985
Provide logs (if relevant):