Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for PDF format #82

Merged
merged 12 commits into from Jul 31, 2013
Merged

Conversation

fawkesley
Copy link
Contributor

We've been exploring different options for parsing PDFs. Currently we're using an (alpha) in-house library called pdftables (we blogged about it here)

This pull request integrates pdftables into messytables. It is an optional requirement - if pdftables is not installed, messytables will work as usual and the PDF tests will be skipped.

We're looking into other ways of extracting tables from PDFs, but either way we'll need the messytables integration.

@rossjones
Copy link
Contributor

Think you need to add pdf tables to the test requirements file, assuming it's on pypi.

@rossjones
Copy link
Contributor

Sorry you might need to rebase since I merged #81. I'm interested in @domoritz's opinion on this one :)

@domoritz
Copy link
Contributor

My opinion is that you should never, ever change the history of something in the main repo (not even on a branch). Better create a new pr. However, I'm for rebasing on external branches or private branches because this keeps the history cleaner.

@rossjones
Copy link
Contributor

I meant opinion on the feature, not on rebasing on their private branch ;)

@domoritz
Copy link
Contributor

Ahh. IMHO, parsing tables in PDFs is super difficult but would be really awesome. As long as someone who just wants simple csv parsing does not have to install pdfminer and everything, I am for this feature.

@rossjones We talked about this before: I think we should move the requirements, that are only important for certain features, to a requirements.text file.

@fawkesley
Copy link
Contributor Author

@domoritz Agreed on it being super difficult. We'll stick to this approach of PDF support being optional.

@rossjones
Copy link
Contributor

I agree, as long as it is only the optional requirements rather than the core ones I am all for it.

Also @paulfurley don't forget the changelog ;)

@fawkesley
Copy link
Contributor Author

I'll get pdftables working on python 2.6 now and I'll give you a shout once I've rebased and modded the changelog :)

@fawkesley
Copy link
Contributor Author

OK, tests passing and rebased, think we're good to go :) @rossjones

rossjones added a commit that referenced this pull request Jul 31, 2013
@rossjones rossjones merged commit 64a72a0 into okfn:master Jul 31, 2013
@fawkesley fawkesley deleted the pdftables-support-okfn branch July 31, 2013 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants