Updating Tika (and associated PDF parser code) to version 2.4.1. #298
Reviewed 22 of 22 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)
a discussion (no related file):
We should also update the version of Tika used by the WFM. I created a separate task. I don't plan on landing that in 7.0.
java/TikaImageDetection/src/main/java/org/apache/tika/parser/pdf/image/ImageGraphicsEngine.java
line 1 at r1 (raw file):
/******************************************************************************
Please add a comment in the code near your modifications that explains them. For example:
// OpenMPF modification: blah blah
java/TikaTextDetection/pom.xml
line 74 at r1 (raw file):
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
Add a comment here explaining why this is still 1.28.1.
…imaize Language Detection library.
Reviewable status: 20 of 23 files reviewed, 3 unresolved discussions (waiting on @brosenberg42, @hhuangMITRE, and @jrobble)
java/TikaImageDetection/src/main/java/org/apache/tika/parser/pdf/image/ImageGraphicsEngine.java
line 1 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please add a comment in the code near your modifications that explains them. For example:
// OpenMPF modification: blah blah
Done! I've added more details to the OpenMPF edit lines below. Let me know if any other edits are needed, thanks!
java/TikaTextDetection/pom.xml
line 74 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Add a comment here explaining why this is still 1.28.1.
Done. I did another sweep through the documentation and then dug into the source code for the Optimaize package, which sits alongside four other language detection modules. I updated the POM file and the library import for the Optimaize library, then confirmed the changes work by running mvn test. I'm creating a new task to investigate the other modules next.
Reviewed 2 of 3 files at r2, all commit messages.
Reviewable status: 22 of 23 files reviewed, 1 unresolved discussion (waiting on @brosenberg42, @hhuangMITRE, and @jrobble)
java/TikaImageDetection/src/main/java/org/apache/tika/parser/pdf/image/ImageGraphicsEngine.java
line 1 at r1 (raw file):
Previously, hhuangMITRE (Howard W Huang) wrote…
Done! I've added more details to the OpenMPF edit lines below. Let me know if any other edits are needed, thanks!
Thanks. I'm going to update the lines to say "OpenMPF" just because that's the search term I use and why I didn't see the lines the first time. Not that I would expect you to know that :)
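Keeping one fixed marker word matters because reviewers locate local patches to upstream Tika files by searching for it. The following is a minimal, hypothetical sketch of such a search (the class name OpenMpfMarkerScanner is invented for illustration and is not part of OpenMPF or Tika). It assumes modifications are flagged with "//" line comments containing the exact, case-sensitive string "OpenMPF", and reports the 1-based line numbers of matching comments.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of OpenMPF or Tika): reports the 1-based line
 * numbers of "//" comments that carry the "OpenMPF" modification marker.
 */
public final class OpenMpfMarkerScanner {

    // Assumed marker text; matching is case-sensitive, so "openmpf" is missed.
    static final String MARKER = "OpenMPF";

    private OpenMpfMarkerScanner() {
    }

    /** Returns the 1-based numbers of lines whose "//" comment mentions the marker. */
    public static List<Integer> findMarkers(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int comment = line.indexOf("//");
            // Only count the marker when it appears inside a line comment.
            if (comment >= 0 && line.indexOf(MARKER, comment) >= 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "int x = 0;",
                "// OpenMPF modification: handle inline images",
                "x++; // openmpf edit (not matched: search is case-sensitive)",
                "x--; // OpenMPF modification: clamp page count");
        System.out.println(findMarkers(source)); // prints [2, 4]
    }
}
```

Block comments such as /* OpenMPF ... */ are deliberately ignored in this sketch; a real audit might simply grep the source tree for the marker instead.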
Reviewed 1 of 3 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42 and @hhuangMITRE)
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)
* Support new document formats.
* Use PAGE_NUM = -1 where appropriate.
* Remove leading 0's from PAGE_NUM and SECTION_NUM.
* Updating Tika (and associated PDF parser code) to version 2.4.1. (#298)

Co-authored-by: Jeff Robble <jrobble@mitre.org>
Co-authored-by: Brian Rosenberg <brosenberg@mitre.org>