Documentation updated.

@euske committed Oct 23, 2013 (commit 86348eba2f32c7e41fd1bafb4782193a3682f7fc, 1 parent 8784223)
Showing 4 changed files with 65 additions and 4 deletions.
  1. +1 −1 Makefile
  2. +62 −0 README.md
  3. +0 −1 README.txt
  4. +2 −2 docs/index.html
2 Makefile
@@ -27,7 +27,7 @@ sdist: distclean MANIFEST.in
register: distclean MANIFEST.in
$(PYTHON) setup.py sdist upload register
-WEBDIR=$$HOME/Site/unixuser.org/python/$(PACKAGE)
+WEBDIR=$$HOME/work/Site/unixuser.org/python/$(PACKAGE)
publish:
$(CP) docs/*.html docs/*.png docs/*.css $(WEBDIR)
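The `publish` target copies the rendered documentation into `WEBDIR`. In a Makefile, `$$HOME` is Make's escape for a literal `$HOME`, which the shell expands when the recipe runs. As a rough illustration only, a Python equivalent might look like the sketch below. The `publish` function and its parameters are hypothetical, `$(PACKAGE)` is assumed to be `pdfminer`, and `$(CP)` is assumed to behave like plain `cp`.

```python
import glob
import os
import shutil

# Assumption: $(PACKAGE) expands to "pdfminer" in the real Makefile.
PACKAGE = "pdfminer"

# Mirrors WEBDIR=$$HOME/work/Site/unixuser.org/python/$(PACKAGE);
# "~" stands in for $HOME and is expanded inside publish().
DEFAULT_WEBDIR = os.path.join("~", "work", "Site", "unixuser.org",
                              "python", PACKAGE)


def publish(docs_dir="docs", webdir=DEFAULT_WEBDIR):
    """Hypothetical stand-in for `make publish`.

    Copies docs/*.html, docs/*.png and docs/*.css into webdir and
    returns the sorted names of the copied files.
    """
    webdir = os.path.expanduser(webdir)
    # With several sources, `cp` refuses a missing target directory;
    # fail the same way instead of silently creating a file.
    if not os.path.isdir(webdir):
        raise OSError("WEBDIR does not exist: %s" % webdir)
    copied = []
    for pattern in ("*.html", "*.png", "*.css"):
        for path in glob.glob(os.path.join(docs_dir, pattern)):
            shutil.copy(path, webdir)  # overwrites existing files, like cp
            copied.append(os.path.basename(path))
    return sorted(copied)
```

Like the Makefile rule, the sketch expects the target directory to exist already; only the three listed file types are copied.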
62 README.md
@@ -0,0 +1,62 @@
+## PDFMiner
+
+PDFMiner is a tool for extracting information from PDF documents.
+Unlike other PDF-related tools, it focuses entirely on getting
+and analyzing text data. PDFMiner allows one to obtain
+the exact location of text on a page, as well as
+other information such as fonts or lines.
+It includes a PDF converter that can transform PDF files
+into other text formats (such as HTML). It has an extensible
+PDF parser that can be used for purposes other than text analysis.
+
+
+**Features**
+
+ * Written entirely in Python.
+ * Parse, analyze, and convert PDF documents.
+ * PDF-1.7 specification support (well, almost).
+ * CJK language and vertical writing support.
+ * Support for various font types (Type1, TrueType, Type3, and CID).
+ * Basic encryption (RC4) support.
+ * Outline (TOC) extraction.
+ * Tagged contents extraction.
+ * Automatic layout analysis.
+
+
+**How to Install**
+
+ * Install Python 2.4 or newer. (**Python 3 is not supported.**)
+ * Download the source code.
+ * Unpack it.
+ * Run `setup.py`:
+
+       $ python setup.py install
+
+ * Run the following test:
+
+       $ pdf2txt.py samples/simple1.pdf
+
+
+**For CJK Languages**
+
+To process CJK languages, generate the CMap files with `make cmap`
+before running `setup.py install`:
+
+    $ make cmap
+    python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
+    reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
+    writing 'CNS1_H.py'...
+    ...
+    $ python setup.py install
+
+On Windows machines that don't have the `make` command,
+paste the following commands into a command prompt:
+
+    mkdir pdfminer\cmap
+    python tools\conv_cmap.py -c B5=cp950 -c UniCNS-UTF8=utf-8 pdfminer\cmap Adobe-CNS1 cmaprsrc\cid2code_Adobe_CNS1.txt
+    python tools\conv_cmap.py -c GBK-EUC=cp936 -c UniGB-UTF8=utf-8 pdfminer\cmap Adobe-GB1 cmaprsrc\cid2code_Adobe_GB1.txt
+    python tools\conv_cmap.py -c RKSJ=cp932 -c EUC=euc-jp -c UniJIS-UTF8=utf-8 pdfminer\cmap Adobe-Japan1 cmaprsrc\cid2code_Adobe_Japan1.txt
+    python tools\conv_cmap.py -c KSC-EUC=euc-kr -c KSC-Johab=johab -c KSCms-UHC=cp949 -c UniKS-UTF8=utf-8 pdfminer\cmap Adobe-Korea1 cmaprsrc\cid2code_Adobe_Korea1.txt
+    python setup.py install
+
+
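The Windows commands in the new README follow one pattern per Adobe character collection: a set of `-c NAME=ENCODING` options, the output directory, the collection name, and its `cid2code` file. The hypothetical helper below (not part of PDFMiner) regenerates those lines from a table copied out of the commands. Assuming the right-hand side of each `-c` option is a Python codec name, it also checks that the running interpreter provides every codec before anything is pasted. The names `CMAP_ENCODINGS`, `check_codecs`, and `windows_commands` are illustrative only, and the code avoids Python 3-only syntax.

```python
import codecs

# Collection -> (CMap name, codec) pairs, copied from the README commands.
CMAP_ENCODINGS = [
    ("Adobe-CNS1", [("B5", "cp950"), ("UniCNS-UTF8", "utf-8")]),
    ("Adobe-GB1", [("GBK-EUC", "cp936"), ("UniGB-UTF8", "utf-8")]),
    ("Adobe-Japan1", [("RKSJ", "cp932"), ("EUC", "euc-jp"),
                      ("UniJIS-UTF8", "utf-8")]),
    ("Adobe-Korea1", [("KSC-EUC", "euc-kr"), ("KSC-Johab", "johab"),
                      ("KSCms-UHC", "cp949"), ("UniKS-UTF8", "utf-8")]),
]


def check_codecs():
    """Raise LookupError if this Python lacks a codec the commands name."""
    for _, pairs in CMAP_ENCODINGS:
        for _, encoding in pairs:
            codecs.lookup(encoding)


def windows_commands(outdir="pdfminer\\cmap"):
    """Build the command lines to paste into a Windows command prompt."""
    lines = ["mkdir " + outdir]
    for collection, pairs in CMAP_ENCODINGS:
        options = " ".join("-c %s=%s" % pair for pair in pairs)
        # e.g. Adobe-CNS1 -> cmaprsrc\cid2code_Adobe_CNS1.txt
        source = "cmaprsrc\\cid2code_%s.txt" % collection.replace("-", "_")
        lines.append("python tools\\conv_cmap.py %s %s %s %s"
                     % (options, outdir, collection, source))
    lines.append("python setup.py install")
    return lines
```

For example, calling `check_codecs()` and then `print("\n".join(windows_commands()))` reproduces the six Windows command lines from the README.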
1 README.txt
@@ -1 +0,0 @@
-See docs/index.html
4 docs/index.html
@@ -9,7 +9,7 @@
<div align=right class=lastmod>
<!-- hhmts start -->
-Last Modified: Tue Oct 22 13:19:10 UTC 2013
+Last Modified: Tue Oct 22 15:16:49 UTC 2013
<!-- hhmts end -->
</div>
@@ -139,7 +139,7 @@
during installation:
<blockquote><pre>
# <strong>make cmap</strong>
-python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt cp950 big5
+python tools/conv_cmap.py pdfminer/cmap Adobe-CNS1 cmaprsrc/cid2code_Adobe_CNS1.txt
reading 'cmaprsrc/cid2code_Adobe_CNS1.txt'...
writing 'CNS1_H.py'...
...
