Commit

version 20140915 pushed to PyPi as pdfminer_six
Goulu committed Sep 15, 2014
1 parent 4f8aa9f commit 8861d7e
Showing 3 changed files with 17 additions and 94 deletions.
88 changes: 7 additions & 81 deletions docs/index.html
@@ -82,14 +82,14 @@ <h3>Features</h3>
 <h3><a name="download">Download</a></h3>
 <p>
 <strong>Source distribution:</strong><br>
-<a href="http://pypi.python.org/pypi/pdfminer/">
-http://pypi.python.org/pypi/pdfminer/
+<a href="http://pypi.python.org/pypi/pdfminer_six/">
+http://pypi.python.org/pypi/pdfminer_six/
 </a>
 
 <P>
 <strong>github:</strong><br>
-<a href="https://github.com/euske/pdfminer/">
-https://github.com/euske/pdfminer/
+<a href="https://github.com/goulu/pdfminer/">
+https://github.com/goulu/pdfminer/
 </a>
 
 <h3><a name="wheretoask">Where to Ask</a></h3>
@@ -100,11 +100,9 @@ <h3><a name="wheretoask">Where to Ask</a></h3>
 http://groups.google.com/group/pdfminer-users/
 </a>
 
-
 <h2><a name="install">How to Install</a></h2>
 <ol>
 <li> Install <a href="http://www.python.org/download/">Python</a> 2.6 or newer.
-(<font color=red><strong>Python 3 is not supported.</strong></font>)
 <li> Download the <a href="#source">PDFMiner source</a>.
 <li> Unpack it.
 <li> Run <code>setup.py</code> to install:<br>
@@ -372,82 +370,10 @@ <h4>Options</h4>
 <dd> Increases the debug level.
 </dl>
 
-<h2><a name="changes">Changes</a></h2>
+<h2><a name="changes">Changes:</a></h2>
 <ul>
-<li> 2014/03/28: Further bugfixes.
-<li> 2014/03/24: Bugfixes and improvements for fauly PDFs.<br>
-API changes:
-<ul>
-<li> <code>PDFDocument.initialize()</code> method is removed and no longer needed.
-A password is given as an argument of a PDFDocument constructor.
-</ul>
-<li> 2013/11/13: Bugfixes and minor improvements.<br>
-As of November 2013, there were a few changes made to the PDFMiner API
-prior to October 2013. This is the result of code restructuring. Here
-is a list of the changes:
-<ul>
-<li> <code>PDFDocument</code> class is moved to <code>pdfdocument.py</code>.
-<li> <code>PDFDocument</code> class now takes a <code>PDFParser</code> object as an argument.
-<li> <code>PDFDocument.set_parser()</code> and <code>PDFParser.set_document()</code> is removed.
-<li> <code>PDFPage</code> class is moved to <code>pdfpage.py</code>.
-<li> <code>process_pdf</code> function is implemented as <code>PDFPage.get_pages</code>.
-</ul>
-<li> 2013/10/22: Sudden resurge of interests. API changes.
-Incorporated a lot of patches and robust handling of broken PDFs.
-<li> 2011/05/15: Speed improvements for layout analysis.
-<li> 2011/05/15: API changes. <code>LTText.get_text()</code> is added.
-<li> 2011/04/20: API changes. LTPolygon class was renamed as LTCurve.
-<li> 2011/04/20: LTLine now represents horizontal/vertical lines only. Thanks to Koji Nakagawa.
-<li> 2011/03/07: Documentation improvements by Jakub Wilk. Memory usage patch by Jonathan Hunt.
-<li> 2011/02/27: Bugfixes and layout analysis improvements. Thanks to fujimoto.report.
-<li> 2010/12/26: A couple of bugfixes and minor improvements. Thanks to Kevin Brubeck Unhammer and Daniel Gerber.
-<li> 2010/10/17: A couple of bugfixes and minor improvements. Thanks to standardabweichung and Alastair Irving.
-<li> 2010/09/07: A minor bugfix. Thanks to Alexander Garden.
-<li> 2010/08/29: A couple of bugfixes. Thanks to Sahan Malagi, pk, and Humberto Pereira.
-<li> 2010/07/06: Minor bugfixes. Thanks to Federico Brega.
-<li> 2010/06/13: Bugfixes and improvements on CMap data compression. Thanks to Jakub Wilk.
-<li> 2010/04/24: Bugfixes and improvements on TOC extraction. Thanks to Jose Maria.
-<li> 2010/03/26: Bugfixes. Thanks to Brian Berry and Lubos Pintes.
-<li> 2010/03/22: Improved layout analysis. Added regression tests.
-<li> 2010/03/12: A couple of bugfixes. Thanks to Sean Manefield.
-<li> 2010/02/27: Changed the way of internal layout handling. (LTTextItem -&gt; LTChar)
-<li> 2010/02/15: Several bugfixes. Thanks to Sean.
-<li> 2010/02/13: Bugfix and enhancement. Thanks to Andr&eacute; Auzi.
-<li> 2010/02/07: Several bugfixes. Thanks to Hiroshi Manabe.
-<li> 2010/01/31: JPEG image extraction supported. Page rotation bug fixed.
-<li> 2010/01/04: Python 2.6 warning removal. More doctest conversion.
-<li> 2010/01/01: CMap bug fix. Thanks to Winfried Plappert.
-<li> 2009/12/24: RunLengthDecode filter added. Thanks to Troy Bollinger.
-<li> 2009/12/20: Experimental polygon shape extraction added. Thanks to Yusuf Dewaswala for reporting.
-<li> 2009/12/19: CMap resources are now the part of the package. Thanks to Adobe for open-sourcing them.
-<li> 2009/11/29: Password encryption bug fixed. Thanks to Yannick Gingras.
-<li> 2009/10/31: SGML output format is changed and renamed as XML.
-<li> 2009/10/24: Charspace bug fixed. Adjusted for 4-space indentation.
-<li> 2009/10/04: Another matrix operation bug fixed. Thanks to Vitaly Sedelnik.
-<li> 2009/09/12: Fixed rectangle handling. Able to extract image boundaries.
-<li> 2009/08/30: Fixed page rotation handling.
-<li> 2009/08/26: Fixed zlib decoding bug. Thanks to Shon Urbas.
-<li> 2009/08/24: Fixed a bug in character placing. Thanks to Pawan Jain.
-<li> 2009/07/21: Improvement in layout analysis.
-<li> 2009/07/11: Improvement in layout analysis. Thanks to Lubos Pintes.
-<li> 2009/05/17: Bugfixes, massive code restructuring, and simple graphic element support added. setup.py is supported.
-<li> 2009/03/30: Text output mode added.
-<li> 2009/03/25: Encoding problems fixed. Word splitting option added.
-<li> 2009/02/28: Robust handling of corrupted PDFs. Thanks to Troy Bollinger.
-<li> 2009/02/01: Various bugfixes. Thanks to Hiroshi Manabe.
-<li> 2009/01/17: Handling a trailer correctly that contains both /XrefStm and /Prev entries.
-<li> 2009/01/10: Handling Type3 font metrics correctly.
-<li> 2008/12/28: Better handling of word spacing. Thanks to Christian Nentwich.
-<li> 2008/09/06: A sample pdf2html webapp added.
-<li> 2008/08/30: ASCII85 encoding filter support.
-<li> 2008/07/27: Tagged contents extraction support.
-<li> 2008/07/10: Outline (TOC) extraction support.
-<li> 2008/06/29: HTML output added. Reorganized the directory structure.
-<li> 2008/04/29: Bugfix for Win32. Thanks to Chris Clark.
-<li> 2008/04/27: Basic encryption and LZW decoding support added.
-<li> 2008/01/07: Several bugfixes. Thanks to Nick Fabry for his vast contribution.
-<li> 2007/12/31: Initial release.
-<li> 2004/12/24: Start writing the code out of boredom...
+<li> 2014/09/15: pushed on PyPi</li>
+<li> 2014/09/10: pdfminer_six forked from pdfminer since Yusuke didn't want to merge and pdfminer3k is outdated</li>
 </ul>
 
 <h2><a name="todo">TODO</a></h2>
2 changes: 1 addition & 1 deletion pdfminer/__init__.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python
-__version__ = '20140829'
+__version__ = '20140915'
 
 if __name__ == '__main__':
     print (__version__)
21 changes: 9 additions & 12 deletions setup.py
@@ -3,10 +3,13 @@
 from pdfminer import __version__
 
 setup(
-    name='pdfminer',
+    name='pdfminer_six',
     version=__version__,
+    packages=['pdfminer',],
+    package_data={'pdfminer': ['cmap/*.pickle.gz']},
     description='PDF parser and analyzer',
-    long_description='''PDFMiner is a tool for extracting information from PDF documents.
+    long_description='''fork of PDFMiner using six for Python 2+3 compatibility
+PDFMiner is a tool for extracting information from PDF documents.
 Unlike other PDF-related tools, it focuses entirely on getting
 and analyzing text data. PDFMiner allows to obtain
 the exact location of texts in a page, as well as
@@ -15,15 +18,9 @@
 into other text formats (such as HTML). It has an extensible
 PDF parser that can be used for other purposes instead of text analysis.''',
     license='MIT/X',
-    author='Yusuke Shinyama',
-    author_email='yusuke at cs dot nyu dot edu',
-    url='http://euske.github.io/pdfminer/index.html',
-    packages=[
-        'pdfminer',
-    ],
-    package_data={
-        'pdfminer': ['cmap/*.pickle.gz']
-    },
+    author='Yusuke Shinyama + Philippe Guglielmetti',
+    author_email='pdfminer@goulu.net',
+    url='http://github.com/goulu/pdfminer',
     scripts=[
         'tools/pdf2txt.py',
         'tools/dumppdf.py',
@@ -34,7 +31,7 @@
         'Programming Language :: Python',
         'Programming Language :: Python :: 2.7',
         'Programming Language :: Python :: 3.4',
-        'Development Status :: 4 - Beta',
+        'Development Status :: 5 - Production/Stable',
         'Environment :: Console',
         'Intended Audience :: Developers',
         'Intended Audience :: Science/Research',
