tabulizer: Bindings for Tabula PDF Table Extractor Library #42
Thanks for your submission, @leeper. Seeking reviewers now. |
Reviewers: @lmullen, @davidgohel |
This is a very useful new package. It is also well written; the code is simple and clear, and standards are respected. It was easy and fun to dive into the code. (Also I realized I could simplify some of my old This work is a very good reference for package development with I have tested it on:
Everything worked as expected. Installation is simple and the vignette is clear. I enjoy the There is nothing to complain about in the code. However, I made an effort to find things to say ;)
Headless Mode
I found one issue that can be easily solved. Headless mode should be set to true by default (leaving it off can be an issue with RStudio Server, for example). Otherwise, pdfbox might not work on some servers. See this article. In
Encoding issue with Windows
I ran into one annoying issue. I made the French test, trying to use Note: I tried to set encoding parameters but it did not solve the issue.
It seems the problem comes from pdfbox and the way encodings are managed (not sure about that). To demo the issue:
Below is a partial copy of the result:
Below is a partial copy of the result on my Mac:
Minor issues / suggestions
DESCRIPTION file
I think the R Core team could ask you to add the authors of the Java packages as contributors and copyright holders (that was my case for package
file path with
|
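The headless-mode suggestion in the review above can be sketched from the R side. This is only a minimal sketch, assuming the JVM is started through rJava and has not yet been initialised in the session: `java.parameters` is the option rJava reads when it starts the JVM, and `java.awt.headless` is the standard JVM system property. The helper name `enable_headless` is hypothetical and is not part of tabulizer.

```r
# Hypothetical helper (not part of tabulizer): request a headless JVM.
# Assumption: this runs before rJava starts the JVM; flags added after
# JVM start-up do not affect the already running JVM.
enable_headless <- function() {
  flag <- "-Djava.awt.headless=true"
  current <- getOption("java.parameters")
  # Keep any JVM flags the user already set and append ours only once.
  if (!flag %in% current) {
    options(java.parameters = c(current, flag))
  }
  invisible(getOption("java.parameters"))
}

enable_headless()
# library(rJava)  # attach rJava (or a package that depends on it) afterwards
```

Appending to the existing option rather than overwriting it keeps other user-supplied JVM flags (for example a heap size setting) intact.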
Thank you, @davidgohel! I have now implemented most of this, but am still working on the non-latin character issue and finalizing the logging functionality. |
@lmullen @davidgohel |
Following up on the first review, I believe I have now addressed all issues:
Thank you again for the feedback! |
This is a very well done package. The code is a pleasure to read, and the interface is well thought out. The core functionality works as expected. This package is so useful that I would really love it if it worked on PDFs of 19th-century documents: alas, it's not to be. (Hardly the fault of the package, of course.) @davidgohel has already offered a number of suggestions which have been dealt with, so I'll deal mostly with unexpected errors when using the interface.
Tests
I've run R CMD check on this package with R 3.3.0 on Mac OS X 10.11.5 and R 3.3.0 on Ubuntu 16.04. (The rest of my review was on the Mac. I don't have a Windows machine to test that installation, but the instructions in the README seemed clear.)
I gather that this is part of the issue which @davidgohel mentioned which is being worked out.
Interface
For testing I used a PDF from the U.S. Census's Statistical Abstract. Some minor questions:
This comes from
If this isn't a problem with the underlying Java code, then this error message should let the user know why the spreadsheet option shouldn't be used.
After playing with it for a while, I was able to figure out the interface to
Miscellaneous
Nice work, and I look forward to using the package myself. |
Thank you, @lmullen! I will work to address these points and post an update once I have finished. |
I have attempted to address all of your comments and have made the following revisions:
On smaller points:
Thank you, again, @davidgohel and @lmullen, for your feedback! It is very much appreciated. |
thanks so so much for your reviews @davidgohel and @lmullen! having a quick look myself ... |
a few things:
#> Error in (function (title, width, height, pointsize, family, antialias, :
#> unable to create quartz() device target, given type may not be supported
#> In addition: Warning message:
#> In (function (title, width, height, pointsize, family, antialias, :
#> No displays are available
I guess lincoln and david didn't get this error though, so maybe i'm just an edge case |
@sckott Are you using the version of R downloadable from CRAN? I'm using the version from Homebrew. I wonder if that makes the difference about the Quartz error. I know at one point or another I've installed XQuartz because of the Homebrew version. |
yeah, from CRAN - I imagine we'd want this to work regardless of where R was installed from? |
Thanks, @sckott. I'm working on these issues now. |
I've just pushed an update to the |
will try soon |
Hi I've just tested on mac. There is a first issue in But then I got a
Here are my dev.cap:
and my session info:
|
And all is ok when changing line 98 to
|
Same problem, getting an error with |
Thank you everyone for the help so far. I've made some pretty radical changes to the |
I still get the same quartz error as above - haven't been able to google any solutions for this, weird |
Ah, that's frustrating. You're on OS X, right? What does |
"Darwin" |
👍 👍 👍 for the new Shiny interface. Any particular reason why you can't use the Shiny interface outside of RStudio and just open the gadget in a browser instead of the Viewer? |
yeah, should be possible |
Yes, I can do that. I'd like to get the current version working, too, though because I think the graphics device interface is actually even better because it is responsive to keystrokes, so you can navigate through the pages of the file and make changes to area selections. As far as I can tell, the Shiny version can't be configured in that way. |
I've now made the shiny interface the default, and added an option |
weird, now I'm getting the java error |
@leeper any progress on this? |
Well, I think it's basically ready, with the exception of the installation issue on OS X. (I don't have a mac, so I'm not sure what I can really do about it, unfortunately.) I'm experimenting with travis OS X builds https://travis-ci.org/leeper/tabulizer/builds/152634437, which seem mostly to be working? |
approved! ran
|
just a note about installation on my OSX 10.11.3 - I had to install a legacy version of Java from https://support.apple.com/kb/DL1572?locale=en_US instead of the version from Oracle that I had installed - after that installation works great, anyway ... now google knows about this |
If you can transfer to |
Great! I've transferred the repositories, along with making all of the changes suggested in your goodpractice note (except line length, because that's a bit of a pain to fix at the moment but I'll get to it). I also added a note about Java on Mac OS. |
Nice, thanks, will tweet about it today |
Thanks everyone from the @tabulapdf team! |
Sounds great, but I can't get it to install. In response to
ghit::install_github(c("ropensci/tabulizerjars", "ropensci/tabulizer"))
I get this:
Warning in as.POSIXlt.POSIXct(x, tz) :
Warning in as.POSIXlt.POSIXct(x, tz) :
ropensci/tabulizerjars ropensci/tabulizer
Why? |
@ajduncanson This is not the right place for support. Try the tabulizer repo itself. That said, the problem is probably not with tabulizer. The time zone warning sounds like the one seen with a recent version of R on macOS, and you may have warnings set to errors. I suggest that you search for a solution to the time zone problem, check whether you have warnings set to errors, and if neither of those works, try the tabulizer repo. |
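The "warnings set to errors" check mentioned above can be sketched in base R. This is a minimal sketch, not a diagnosis of the reported installation failure: in base R, `options(warn = 2)` (or larger) converts warnings into errors, and the helper name `warnings_are_errors` is hypothetical.

```r
# Hypothetical helper: report whether this session turns warnings into errors.
# In base R, a `warn` option of 2 or larger converts every warning to an error.
warnings_are_errors <- function() {
  isTRUE(getOption("warn") >= 2)
}

warnings_are_errors()

# Inspect the time zone the session was started with (empty string if unset).
Sys.getenv("TZ")

# To restore the default behaviour, where warnings are only reported:
# options(warn = 0)
```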