Support for PDF format #82
Think you need to add pdftables to the test requirements file, assuming it's on PyPI.
My opinion is that you should never, ever change the history of something in the main repo (not even on a branch). Better to create a new PR. However, I'm for rebasing on external or private branches because it keeps the history cleaner.
I meant opinion on the feature, not on rebasing on their private branch ;)
Ahh. IMHO, parsing tables in PDFs is super difficult but would be really awesome. As long as someone who just wants simple CSV parsing does not have to install pdfminer and everything, I am for this feature. @rossjones We talked about this before: I think we should move the requirements that are only important for certain features to a …
@domoritz Agreed on it being super difficult. We'll stick to this approach of PDF support being optional.
I agree, as long as it only affects the optional requirements rather than the core ones, I am all for it. Also @paulfurley don't forget the changelog ;)
I'll get pdftables working on Python 2.6 now and I'll give you a shout once I've rebased and modded the changelog :)
… the underlying library ideally.
OK, tests passing and rebased, think we're good to go :) @rossjones
We've been exploring different options for parsing PDFs. Currently we're using an (alpha) in-house library called pdftables (we blogged about it here).
This pull request integrates pdftables into messytables. pdftables is an optional requirement: if it is not installed, messytables will work as usual and the PDF tests will be skipped.
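For anyone wondering what "optional" means in practice, here is a minimal sketch of the general pattern, not the actual code in this PR. It only checks whether `pdftables` can be imported and never calls into its API; `optional_import`, `supported_formats` and `PDFTests` are hypothetical names made up for illustration.

```python
import importlib
import unittest


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# 'pdftables' is the optional dependency discussed here. We only test
# whether it is importable; no pdftables functions are assumed or called.
pdftables = optional_import("pdftables")


def supported_formats():
    """Hypothetical helper: formats usable with the installed packages."""
    formats = ["csv"]
    if pdftables is not None:
        formats.append("pdf")
    return formats


class PDFTests(unittest.TestCase):
    # Skipped (not failed) when the optional dependency is missing.
    @unittest.skipUnless(pdftables is not None, "pdftables is not installed")
    def test_pdf_is_listed(self):
        self.assertIn("pdf", supported_formats())
```

With this shape, a user who only needs CSV parsing never hits an `ImportError`, and a test run without pdftables reports the PDF tests as skipped. Note that `unittest.skipUnless` needs Python 2.7 or later, so a 2.6 build would need a different skip mechanism.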
We're looking into other ways of extracting tables from PDFs, but either way we'll need the messytables integration.