
Documentation updates.

Commit e39e39fa12f7c4ef35029a0469d9dc98875b4a8f (1 parent: cf1e3c9), committed by @euske on Nov 17, 2013
Showing with 34 additions and 2 deletions.
  1. +21 −0 README.md
  2. +13 −2 docs/index.html
README.md
@@ -10,6 +10,7 @@ It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for other purposes than text analysis.
+
Features
--------
@@ -23,6 +24,7 @@ Features
* Tagged contents extraction.
* Automatic layout analysis.
+
How to Install
--------------
@@ -37,6 +39,7 @@ How to Install
$ pdf2txt.py samples/simple1.pdf
+
For CJK Languages
-----------------
@@ -60,6 +63,7 @@ paste the following commands on a command line prompt:
python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
python setup.py install
+
Command Line Tools
------------------
@@ -87,6 +91,21 @@ but it's also possible to extract some meaningful contents (e.g. images).
(For details, refer to the html document.)
+
+API Changes
+-----------
+
+As of November 2013, the PDFMiner API differs in a few ways from the API
+prior to October 2013. This is the result of code restructuring. Here
+is a list of the changes:
+
+ * PDFDocument class is moved to pdfdocument.py.
+ * PDFDocument class now takes a PDFParser object as an argument.
+ PDFDocument.set_parser() and PDFParser.set_document() are removed.
+ * PDFPage class is moved to pdfpage.py.
+ * process_pdf function is implemented as a class method PDFPage.get_pages.
+
+
TODO
----
@@ -97,6 +116,7 @@ TODO
* Better documentation.
* Crypt stream filter support.
+
Related Projects
----------------
@@ -105,6 +125,7 @@ Related Projects
* <a href="http://www.pdfbox.org/">pdfbox</a>
* <a href="http://mupdf.com/">mupdf</a>
+
Terms and Conditions
--------------------
docs/index.html
@@ -9,7 +9,7 @@
<div align=right class=lastmod>
<!-- hhmts start -->
-Last Modified: Sat Oct 26 15:03:35 UTC 2013
+Last Modified: Sun Nov 17 06:32:44 UTC 2013
<!-- hhmts end -->
</div>
@@ -368,7 +368,18 @@ <h4>Options</h4>
<h2><a name="changes">Changes</a></h2>
<ul>
-<li> 2013/10/22: Sudden resurge of interests.
+<li> 2013/11/13: Bugfixes and minor improvements.<br>
+As of November 2013, the PDFMiner API differs in a few ways from the API
+prior to October 2013. This is the result of code restructuring. Here
+is a list of the changes:
+ <ul>
+ <li> <code>PDFDocument</code> class is moved to <code>pdfdocument.py</code>.
+ <li> <code>PDFDocument</code> class now takes a <code>PDFParser</code> object as an argument.
+ <li> <code>PDFDocument.set_parser()</code> and <code>PDFParser.set_document()</code> are removed.
+ <li> <code>PDFPage</code> class is moved to <code>pdfpage.py</code>.
+ <li> <code>process_pdf</code> function is implemented as <code>PDFPage.get_pages</code>.
+</ul>
+<li> 2013/10/22: Sudden resurge of interests. API changes.
Incorporated a lot of patches and robust handling of broken PDFs.
<li> 2011/05/15: Speed improvements for layout analysis.
<li> 2011/05/15: API changes. <code>LTText.get_text()</code> is added.
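
Code written against the pre-October 2013 API can be checked for the names this commit lists as removed or replaced. The following is a minimal sketch of such a check, not part of PDFMiner: `REMOVED_NAMES` and `find_outdated_calls` are hypothetical helper names, the replacement hints only restate the change list in this commit, and the sample snippet of old-style code is illustrative.

```python
import re

# Names the change list reports as removed or replaced, mapped to the
# replacement it describes. Illustrative table, not an official mapping.
REMOVED_NAMES = {
    "set_parser": "pass the PDFParser object to PDFDocument(parser)",
    "set_document": "pass the PDFParser object to PDFDocument(parser)",
    "process_pdf": "use the class method PDFPage.get_pages",
}

# Match any of the names as a whole identifier.
_PATTERN = re.compile(r"\b(%s)\b" % "|".join(sorted(REMOVED_NAMES)))


def find_outdated_calls(source):
    """Return a (line_number, name, hint) tuple for each removed name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REMOVED_NAMES[name]))
    return hits


if __name__ == "__main__":
    # Illustrative old-style code, held as a string so nothing is imported.
    old_code = (
        "parser = PDFParser(fp)\n"
        "doc = PDFDocument()\n"
        "parser.set_document(doc)\n"
        "doc.set_parser(parser)\n"
        "process_pdf(rsrcmgr, device, fp)\n"
    )
    for lineno, name, hint in find_outdated_calls(old_code):
        print("line %d: %s -> %s" % (lineno, name, hint))
```

The scan is purely textual, so it also flags the names inside comments and strings; it is meant as a first pass before editing call sites by hand.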
