Commits on Sep 26, 2016
Commits on Sep 14, 2016
  1. Merge pull request #149 from jwilk/spelling

    Fix typos
    committed on GitHub Sep 14, 2016
Commits on Sep 13, 2016
  1. Fix typos

    jwilk committed Sep 13, 2016
Commits on Sep 11, 2016
  1. Merge pull request #145 from vinayak-mehta/glyphlist_link

    Replace old Adobe glyphlist link
    committed on GitHub Sep 11, 2016
  2. MANIFEST: include LICENSE

    committed Sep 11, 2016
  3. Merge pull request #95 from ashleyblackmore/patch-1

    Update setup.py
    committed on GitHub Sep 11, 2016
  4. Merge pull request #111 from PabloCastellano/master

    README: Changed line about Python 3 support
    committed on GitHub Sep 11, 2016
  5. Merge pull request #117 from native-api/png_pred_errors

    make ValueError's descriptive
    committed on GitHub Sep 11, 2016
  6. Merge pull request #129 from lucanaso/lucanaso-patch-1

    Fixed for rendering non breaking spaces (cid:160)
    committed on GitHub Sep 11, 2016
  7. Merge pull request #134 from speedplane/feature/Fix-Get-Filters

    Fix Bug with PDF Stream Decoder
    committed on GitHub Sep 11, 2016
  8. Merge pull request #139 from Daniel-KM/fix-tests

    Fixed tests.
    committed on GitHub Sep 11, 2016
  9. Added: LICENSE

    committed Sep 11, 2016
Commits on Sep 8, 2016
Commits on Jun 26, 2016
  1. Fixed tests.

    Daniel-KM committed Jun 26, 2016
Commits on Jun 14, 2016
  1. Revert changes unrelated to this feature.

    speedplane committed Jun 14, 2016
  2. Revert changes unrelated to this feature.

    speedplane committed Jun 14, 2016
  3. Revert changes unrelated to this branch.

    speedplane committed Jun 14, 2016
  4. Fix a bug with pdfminer which occurs when two or more filters are app…

    …lied to a stream and no parameters are specified. Previously, the code dropped every filter after the first because the zip function was misapplied.
    speedplane committed Jun 14, 2016
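
The truncation described in this commit message can be reproduced in isolation. The following is only a minimal sketch, not pdfminer's actual code: the function names `pair_buggy` and `pair_fixed` and the filter list are hypothetical, and it assumes the bug came from zipping the filter list against a shorter parameter list.

```python
def pair_buggy(filters, params=None):
    """Hypothetical buggy pairing: a single params entry is zipped
    against the whole filter list, so zip stops after the first filter."""
    params = [params or {}]
    return list(zip(filters, params))


def pair_fixed(filters, params=None):
    """Hypothetical fix: give every filter its own parameter dict
    before zipping, and refuse mismatched lengths."""
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        # One copy of the (possibly empty) parameters per filter.
        params = [dict(params or {}) for _ in filters]
    if len(params) != len(filters):
        raise ValueError("number of parameter sets does not match number of filters")
    return list(zip(filters, params))


filters = ["ASCII85Decode", "FlateDecode"]
print(pair_buggy(filters))
print(pair_fixed(filters))
```

Because `zip` stops at the shortest input, the buggy variant silently returns one pair, while the fixed variant returns one pair per filter.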
Commits on Dec 9, 2015
  1. Fixed for rendering non breaking spaces (cid:160)

    As stated in the PDF specification ISO 32000-1, in the table in Annex D.2, "Latin Character Set and Encodings", pages 653 to 656 (available here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf):
    "The SPACE character shall also be encoded as 312 in MacRomanEncoding and as 240 in WinAnsiEncoding. This duplicate code shall signify a nonbreaking space; it shall be typographically the same as (U+003A) SPACE."
    The duplicate key was missing, so PDFMiner returned the string "(cid:160)".
    
    This fix adds the duplicate key in latin_enc.py.
    glyphlist.py does not need to be modified, as it already contains a key for the nonbreaking space: https://github.com/lucanaso/pdfminer/blob/master/pdfminer/glyphlist.py#L2755.
    lucanaso committed Dec 9, 2015
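
The effect of the missing duplicate entry can be illustrated with a toy encoding table. This is a sketch under stated assumptions: the row layout `(glyph name, MacRoman code, WinAnsi code)`, the names `ROWS_BEFORE`, `ROWS_AFTER`, `build_decoder` and `decode` are hypothetical and do not reflect the real contents of latin_enc.py. Only the octal codes 312 and 240 come from the quoted specification text.

```python
# Hypothetical rows: (glyph name, MacRomanEncoding code, WinAnsiEncoding code).
ROWS_BEFORE = [
    ("A", 0o101, 0o101),
    ("space", 0o040, 0o040),
]
# The duplicate "space" row carries the nonbreaking-space codes
# (octal 312 = decimal 202, octal 240 = decimal 160).
ROWS_AFTER = ROWS_BEFORE + [("space", 0o312, 0o240)]


def build_decoder(rows, column):
    """Map character codes from the chosen column to glyph names."""
    return {row[column]: row[0] for row in rows if row[column] is not None}


def decode(codes, table):
    """Return glyph names, falling back to a "(cid:N)" placeholder."""
    return [table.get(code, "(cid:%d)" % code) for code in codes]


WIN_ANSI = 2  # column index of the WinAnsi codes in this toy table
print(decode([65, 160], build_decoder(ROWS_BEFORE, WIN_ANSI)))
print(decode([65, 160], build_decoder(ROWS_AFTER, WIN_ANSI)))
```

Without the duplicate row, code 160 has no glyph name and falls through to the placeholder; with it, the code resolves to "space".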
Commits on Aug 10, 2015
Commits on Jun 14, 2015
Commits on Apr 5, 2015
Commits on Feb 18, 2015
  1. Update setup.py

    Install missing pycrypto lib
    ashleyblackmore committed Feb 18, 2015
Commits on Dec 14, 2014
  1. Add gz to gitignore.

    speedplane committed Dec 14, 2014
  2. Use a .gitignore file.

    speedplane committed Dec 14, 2014
Commits on Dec 12, 2014
  1. Add my own .gitignore.

    speedplane committed Dec 12, 2014
  2. More fixes to layout. The compute neighbors function for horizontal l…

    …ines is only intended to find neighbors on differing lines. However, neighbors can also appear within the same horizontal line.
    
    This commit finds horizontal neighbors within a horizontal line and merges them into a single horizontal line if necessary. This leads to much better text extraction when the PDF was created in an unusual way.
    
    For example (test case coming), I have seen PDFs whose text is written almost like vertical columns, even though the text itself is entirely horizontal.
    speedplane committed Dec 12, 2014
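
The idea of merging same-line neighbors can be sketched independently of pdfminer's layout classes. The following is an illustrative assumption, not the commit's implementation: fragments are plain `(x0, y, text)` tuples, `merge_same_line` and `y_tolerance` are hypothetical names, and y grows upward as in PDF coordinates.

```python
def merge_same_line(fragments, y_tolerance=0.5):
    """Group text fragments that share a baseline and join each group
    left to right, regardless of the order they were written in."""
    lines = []  # each entry: [baseline_y, [(x0, text), ...]]
    # Visit fragments from the top of the page downward.
    for x0, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x0, text))
        else:
            lines.append([y, [(x0, text)]])
    # Within each line, order the pieces by their left edge.
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


# Fragments emitted column by column, although the text reads horizontally.
fragments = [
    (0, 700, "Name"), (0, 680, "Alice"),
    (100, 700, "Age"), (100, 680, "30"),
]
print(merge_same_line(fragments))
```

Grouping by baseline first and sorting by x position second keeps the result independent of the order in which the PDF producer wrote the fragments.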
  3. There are a number of relatively complex changes here. Comments are i…

    …n order of where the change appears.
    
    1.
    When detecting text in a horizontal line, we already add a space between words that are separated by more than word_margin. Now we only do so if there is not already an existing space. This prevents multiple spaces from being placed between words.
    
    2.
    Detect a horizontal line if the line is zero width. This improves our detection of horizontal lines when looking for both horizontal and vertical lines.
    
    3.
    Don't detect a vertical line if the previous letter is whitespace. This prevents double spaces from being caught as vertical lines.
    
    4.
    Improve upon an unfortunate O(N^2) algorithm which I have seen taking many minutes to execute.  Unfortunately, while the "fix" reduces algorithmic complexity, it isn't technically correct, so we only do it when we know things will take a long time.
    speedplane committed Nov 12, 2014
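
Point 1 above, inserting a word-separating space only when none exists yet, can be sketched as follows. This is a simplified assumption rather than the commit's code: characters are `(x0, x1, text)` tuples, `join_chars` is a hypothetical helper, and the gap threshold is taken to be `word_margin` times the current character's width.

```python
def join_chars(chars, word_margin=0.1):
    """Join characters left to right, inserting a space across a wide
    gap unless a space is already present on either side of the gap."""
    out = []
    prev_x1 = None
    for x0, x1, text in chars:
        width = x1 - x0
        gap_is_wide = prev_x1 is not None and x0 - prev_x1 > word_margin * width
        if gap_is_wide and out and not out[-1].isspace() and not text.isspace():
            out.append(" ")
        out.append(text)
        prev_x1 = x1
    return "".join(out)


# A wide gap with no explicit space gets exactly one inserted space.
print(repr(join_chars([(0, 10, "a"), (15, 25, "b")])))
# A wide gap that follows an explicit space does not get a second one.
print(repr(join_chars([(0, 10, "a"), (10, 20, " "), (25, 35, "b")])))
```

Checking the previously emitted character before inserting is what keeps an explicit space plus a wide gap from producing two consecutive spaces.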
Commits on Dec 11, 2014
  1. Merge branch 'fix-makefile'

    speedplane committed Dec 11, 2014
  2. Remove the dependency on python2. Also, allow tests to be run on cygw…

    …in by checking for it, and converting unix2dos line endings.
    speedplane committed Dec 11, 2014
  3. Merge branch 'euske-main/master'

    speedplane committed Dec 11, 2014
Commits on Dec 9, 2014