ocrodjvu (0.10.3) UNRELEASED; urgency=low
*
-- Jakub Wilk <jwilk@jwilk.net> Mon, 20 Feb 2017 20:41:14 +0100
ocrodjvu (0.10.2) unstable; urgency=low
* Make --version also print the versions of Python and the libraries.
* Make --version print to stdout, not stderr.
* Make bad usage exit status 1.
* Drop support for PyICU < 1.0.
* Update DocBook XSL homepage URL.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 07 Feb 2017 23:58:08 +0100
ocrodjvu (0.10.1) unstable; urgency=low
* Don't hardcode the Python interpreter path in script shebangs; use
“#!/usr/bin/env python” instead.
* Include a missing test image in the tarball.
* Update Tesseract homepage URL.
* Update bug tracker URLs.
The project repo has moved to GitHub.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 22 Nov 2016 16:45:13 +0100
ocrodjvu (0.10) unstable; urgency=low
* Add support for cuneiform-multilang as OCR engine.
Thanks to Alexey Shipunov for the bug report.
* Improve error handling.
-- Jakub Wilk <jwilk@jwilk.net> Fri, 17 Jun 2016 11:04:16 +0200
ocrodjvu (0.9.2) unstable; urgency=low
* Fix crashes on empty pages.
https://github.com/jwilk/ocrodjvu/issues/18
https://github.com/jwilk/ocrodjvu/issues/7
Thanks to Janusz S. Bień for the bug report.
* Fix typos.
* Ignore boring diagnostic messages from Tesseract.
* Update the HTML5 specification URLs.
* Update the ICU website URL.
* Update the PyICU website URL.
* Rename the test modules, so that passing --all to nosetests is no longer
necessary.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 31 May 2016 22:10:30 +0200
ocrodjvu (0.9.1) unstable; urgency=low
* Use the subprocess32 module (a thread-safe replacement for the subprocess
module) when it's available.
* Issue a warning when the -j/--jobs option is enabled, but the subprocess
module is not thread-safe.
* Include an example script for converting scans to DjVu + hOCR.
* Improve error handling.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 25 Aug 2015 23:36:12 +0200
ocrodjvu (0.9) unstable; urgency=low
* If python-djvulibre >= 0.4 is installed, don't escape non-ASCII characters
in djvused scripts.
https://github.com/jwilk/ocrodjvu/issues/13
Thanks to Janusz S. Bień for the bug report.
* Improve error handling.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 27 Jul 2015 21:34:39 +0200
ocrodjvu (0.8) unstable; urgency=low
* Change the default OCR engine to Tesseract.
* Add the “tesseract: ” prefix to messages Tesseract prints on stderr.
https://github.com/jwilk/ocrodjvu/issues/10
Thanks to Janusz S. Bień for the bug report.
* Ensure that the exit code is non-zero if the program recovered from an error.
https://github.com/jwilk/ocrodjvu/issues/6
* Improve error handling.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 10 Jun 2015 21:17:32 +0200
ocrodjvu (0.7.19) unstable; urgency=low
* Make sure that text zones are at least 1 pixel wide and 1 pixel high.
* Tesseract: fix splitting bounding boxes for character clusters.
https://github.com/jwilk/ocrodjvu/issues/12
Thanks to Janusz S. Bień for the bug report.
* Fix typos in the documentation.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 11 Nov 2014 18:11:36 +0100
ocrodjvu (0.7.18) unstable; urgency=low
[ Filip Graliński ]
* Fix counting pages when a file identifier cannot be converted to the locale
encoding.
[ Jakub Wilk ]
* Use HTTPS URLs when they are available, in documentation and code.
* Update some stale URLs in documentation and code.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 22 Apr 2014 11:22:01 +0200
ocrodjvu (0.7.17) unstable; urgency=low
* Fix compatibility with Tesseract > 3.02.
https://github.com/jwilk/ocrodjvu/issues/9
Thanks to Heinrich Schwietering for the bug report.
* ocrodjvu:
+ Ensure that the exit code is non-zero if the program was interrupted by
the user.
+ Fix typos in the documentation.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 04 Feb 2014 11:28:46 +0100
ocrodjvu (0.7.16) unstable; urgency=low
* Use “en-US-POSIX” as the default locale for ICU.
* ocrodjvu:
+ Fix option names in documentation of the --ocr-only option.
+ Don't crash if a file identifier is not in UTF-8 or if it cannot be
converted to the locale encoding; use the page number instead.
https://github.com/jwilk/ocrodjvu/issues/4
+ Don't hang if a page cannot be decoded.
https://github.com/jwilk/ocrodjvu/issues/5
-- Jakub Wilk <jwilk@jwilk.net> Sun, 28 Apr 2013 15:08:19 +0200
ocrodjvu (0.7.15) unstable; urgency=low
* Strip trailing whitespace from text zones bigger than words (lines,
paragraphs, …).
* Fix compatibility with Tesseract 3.02.
Thanks to Janusz S. Bień for the bug report.
* ocrodjvu:
+ Make it possible to pass multiple languages to Tesseract ≥ 3.02.
https://github.com/jwilk/ocrodjvu/issues/3
Thanks to Janusz S. Bień for the bug report.
+ Cuneiform: rename mixed Russian-English language code:
“rus-eng” → “rus+eng”. This is for consistency with Tesseract.
+ Tesseract: fix support for Chinese language pack.
+ Tesseract: make it possible to pass the -psm option in order to
customize layout analysis. For example, to enable OSD, use:
“-X extra_args='-psm 1'”.
+ Make --list-languages output sorted.
+ Tesseract: remove “osd” from language list.
+ Accept both ISO 639-2/T and ISO 639-2/B language codes.
+ Add the --save-raw-ocr option.
+ Add the --raw-ocr-filename-template option.
+ Improve documentation of the --ocr-only option.
* Require Python ≥ 2.6.
* Fix compatibility with nose 1.2.
Thanks to Kyrill Detinov for the bug report.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 17 Apr 2013 00:59:23 +0200
ocrodjvu (0.7.14) unstable; urgency=low
* Document which versions of OCRopus are supported.
* Document that PyICU and html5lib are only required for some optional
features.
* Document what software is needed to rebuild the manual pages from source.
* djvu2hocr:
+ Add the --title option.
+ Add the --css option.
+ Document the -p/--pages option.
-- Jakub Wilk <jwilk@jwilk.net> Fri, 15 Mar 2013 13:54:05 +0100
ocrodjvu (0.7.13) unstable; urgency=low
* Abort early if one tries to use an incompatible Python version.
* Improve the manual pages, as per man-pages(7) recommendations:
+ Remove the “AUTHOR” sections.
+ Rename the “REPORTING BUGS” sections as “BUGS”.
* Improve the test suite.
* Make “setup.py clean -a” remove compiled manual pages (unless they were
built by “setup.py sdist”).
-- Jakub Wilk <jwilk@jwilk.net> Thu, 14 Feb 2013 23:28:56 +0100
ocrodjvu (0.7.12) unstable; urgency=low
* Don't let “-X fix-html=1” break HTML snippets ocrodjvu generates itself
for the “-t chars” Tesseract support.
Thanks to Janusz S. Bień for the test case.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 15 Aug 2012 19:32:57 +0200
ocrodjvu (0.7.11) unstable; urgency=low
* hocr2djvused:
+ Allow processing multiple hOCR documents at once.
https://github.com/jwilk/ocrodjvu/issues/1
Thanks to Thomas Koch for the bug report and the initial patch.
* Fix merging results of two Tesseract runs.
Thanks to Janusz S. Bień for the bug report.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 28 May 2012 19:43:22 +0200
ocrodjvu (0.7.10) unstable; urgency=low
* Improve error handling.
* ocrodjvu:
+ Attempt to fix encoding issues and eliminate unwanted control characters
in files produced by Tesseract and Cuneiform.
https://bugs.debian.org/671764
Thanks to Thomas Koch for the bug report.
* hocr2djvused:
+ Add the --fix-utf8 option.
* djvu2hocr:
+ Translate DjVu “region” to <div class="ocrx_block"> (instead of <span…>,
which was causing XHTML validity errors).
* Tests: fix compatibility with PIL ≥ 1.2.
* Include example scans2djvu+hocr script.
* Fix merging results of two Tesseract runs.
Thanks to Janusz S. Bień for the bug report.
* Use RFC 3339 date format in the manual page. Don't call external programs
to build it.
-- Jakub Wilk <jwilk@jwilk.net> Sat, 12 May 2012 00:37:50 +0200
ocrodjvu (0.7.9) unstable; urgency=low
* Improve error handling.
* Fix compatibility with Tesseract > 3.01.
-- Jakub Wilk <jwilk@jwilk.net> Sat, 10 Mar 2012 23:36:03 +0100
ocrodjvu (0.7.8) unstable; urgency=low
* Improve test suite.
-- Jakub Wilk <jwilk@jwilk.net> Sun, 22 Jan 2012 00:04:16 +0100
ocrodjvu (0.7.7) unstable; urgency=low
* Raise proper import error if html5lib is not installed.
Thanks to Kyrill Detinov for the bug report.
-- Jakub Wilk <jwilk@jwilk.net> Sun, 11 Dec 2011 23:08:05 +0100
ocrodjvu (0.7.6) unstable; urgency=low
* Improve error handling.
* ocrodjvu:
+ Fix a regression in gocr, ocrad and tesseract engines, which made them
unusable.
-- Jakub Wilk <jwilk@jwilk.net> Thu, 27 Oct 2011 18:06:38 +0200
ocrodjvu (0.7.5) unstable; urgency=low
* Check Python version in setup.py.
* Accept slightly malformed hOCR documents (with a text zone not completely
within the page area).
https://bugs.debian.org/575484#35
* Fix compatibility with Tesseract > 3.00.
Thanks to Janusz S. Bień for the bug report.
* ocrodjvu, hocr2djvused:
+ Add the --html5 option.
-- Jakub Wilk <jwilk@jwilk.net> Sat, 27 Aug 2011 01:25:33 +0200
ocrodjvu (0.7.4) unstable; urgency=low
* Use a better method to detect Debian-based systems.
* hocr2djvused:
+ Ignore comments and <script> elements in hOCR.
* ocrodjvu:
+ For Tesseract ≥ 3.00, extract bounding boxes of particular characters
with higher accuracy.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 27 Jul 2011 17:34:38 +0200
ocrodjvu (0.7.2) unstable; urgency=low
* Don't hang if one of the threads raises an exception.
* Use the logging module for printing progress messages, errors etc.
* Produce more useful import error messages on Debian-based systems.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 04 Apr 2011 01:14:22 +0200
ocrodjvu (0.7.1) unstable; urgency=low
* Windows: guess location of the DjVuLibre DLL (requires python-djvulibre
≥ 0.3.3).
* ocrodjvu:
+ Work around a bug in Cuneiform, which mistakenly uses “slo” (rather than
“slv”) as the language code for Slovenian.
https://bugs.launchpad.net/cuneiform-linux/+bug/707951
+ Accept “ces”, “nld”, “slv”, “ron” as language codes for Czech, Dutch,
Slovenian and Romanian languages, even when Cuneiform internally uses
different ones.
* djvu2hocr:
+ Don't flip hOCR upside-down.
https://bugs.debian.org/611460
-- Jakub Wilk <jwilk@jwilk.net> Sat, 29 Jan 2011 18:14:40 +0100
ocrodjvu (0.7.0) unstable; urgency=low
* Correctly handle empty pages recognized by Cuneiform and Ocrad.
Thanks to Alexey Shipunov for the bug report.
* Fix crash on Cuneiform-generated hOCR with bounding boxes for whitespace
characters.
Thanks to Alexey Shipunov for the bug report.
* Fix compatibility with Tesseract 3.00.
* Fix colors in 24-bit BMP images.
* ocrodjvu:
+ Make “-e” an alias for “--engine”.
+ Make “-l” an alias for “--language”.
+ Add the -X option (for advanced users).
+ The workaround for Cuneiform returning files with control characters is
now disabled by default. Use “-X fix-html=1” to re-enable it.
+ Add the --on-error option (for advanced users).
* djvu2hocr:
+ Fix a typo, which prevented hocr2djvused from correctly parsing files
produced by it.
https://bugs.debian.org/600539
* Extend the test suite.
-- Jakub Wilk <jwilk@jwilk.net> Sun, 07 Nov 2010 21:37:00 +0100
ocrodjvu (0.6.1) unstable; urgency=high
* Improve detection of Tesseract.
* Correctly handle unrecognized and non-ASCII characters in Ocrad ORF output.
Thanks to Heinrich Schwietering for the bug report.
* Correctly handle text that is closer than 100 pixels to the left edge in
Ocrad ORF output.
Thanks to Heinrich Schwietering for the test case.
* Fix crash on hOCR with image elements.
https://bugs.debian.org/598139
Thanks to Alexey Shipunov for the bug report.
* Fix insecure use of temporary files when using Cuneiform.
https://bugs.debian.org/598134
CVE-2010-4338
-- Jakub Wilk <jwilk@jwilk.net> Sun, 26 Sep 2010 15:01:51 +0200
ocrodjvu (0.6.0) unstable; urgency=low
* Add support for the Tesseract OCR engine.
* Fix Cuneiform support (a regression introduced in 0.5).
Thanks to Kyrill Detinov for the bug report.
-- Jakub Wilk <jwilk@jwilk.net> Thu, 16 Sep 2010 19:24:20 +0200
ocrodjvu (0.5.1) unstable; urgency=low
* Fix crash when listing engines/languages if OCRopus is not found.
Thanks to Kyrill Detinov for the bug report.
* lxml is no longer required for OCR engines that are not using hOCR as
output format.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 15 Sep 2010 18:38:00 +0200
ocrodjvu (0.5.0) unstable; urgency=low
* Add support for the Ocrad OCR engine.
* Add support for the GOCR engine.
* Cuneiform is no longer required to be linked with ImageMagick.
* Prevent Cuneiform from asking interactive questions.
Thanks to Heinrich Schwietering for the bug report.
* Make sure that signals are handled in a sane way.
Thanks to Heinrich Schwietering for the bug report.
* Drop support for guessing page size from image (scan) contents.
* Let the setup.py script install manual pages.
Thanks to Kyrill Detinov and Heinrich Schwietering for the bug reports.
-- Jakub Wilk <jwilk@jwilk.net> Tue, 14 Sep 2010 23:00:35 +0200
ocrodjvu (0.4.7) unstable; urgency=low
* Preserve as much environment as possible when calling external programs.
https://bugs.debian.org/594385
Thanks to Heinrich Schwietering for the bug report.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 25 Aug 2010 20:27:17 +0200
ocrodjvu (0.4.6) unstable; urgency=low
* Implement a workaround for Cuneiform returning files with control
characters.
Thanks to Kyrill Detinov for the bug report.
* Avoid deprecation warnings with PyICU ≥ 1.0.
https://bugs.debian.org/589027
* djvu2hocr:
+ Don't crash on very long documents.
https://bugs.debian.org/591389
-- Jakub Wilk <jwilk@jwilk.net> Tue, 03 Aug 2010 20:33:49 +0200
ocrodjvu (0.4.5) unstable; urgency=low
* Fix handling of “deu” and “rus-eng” languages.
Thanks to Kyrill Detinov for the bug report.
* Properly handle hOCR with inline formatting.
Thanks to Kyrill Detinov for the bug report.
* djvu2hocr:
+ Add ocr-system and ocr-capabilities meta information.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 24 May 2010 21:22:39 +0200
ocrodjvu (0.4.4) unstable; urgency=low
* Document that ocrodjvu honours the TMPDIR environment variable.
https://bugs.debian.org/575488
* Don't remove temporary directory if ocrodjvu crashed.
https://bugs.debian.org/575487
-- Jakub Wilk <jwilk@jwilk.net> Fri, 02 Apr 2010 12:00:11 +0200
ocrodjvu (0.4.3) unstable; urgency=low
* Don't crash on --version.
https://bugs.debian.org/573496
* Give more meaningful error messages on malformed hOCR produced by
Cuneiform.
https://bugs.debian.org/572522
* Document how djvu2hocr deals with non-XML characters.
-- Jakub Wilk <jwilk@jwilk.net> Fri, 19 Mar 2010 01:22:54 +0100
ocrodjvu (0.4.2) unstable; urgency=low
* New options for ocrodjvu:
+ --render=mask,
+ --render=foreground,
+ --render=all.
https://bugs.debian.org/572081
* Fix off-by-one error in text area coordinates.
* Add support for Cuneiform 0.9.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 03 Mar 2010 21:27:15 +0100
ocrodjvu (0.4.1) unstable; urgency=low
* Be stricter when reading hOCR produced by OCRopus 0.3.1.
-- Jakub Wilk <jwilk@jwilk.net> Fri, 22 Jan 2010 20:25:54 +0100
ocrodjvu (0.4.0) unstable; urgency=low
* Add support for the Cuneiform OCR engine.
New options for ocrodjvu:
+ --engine,
+ --list-engines.
* Don't crash on non-ASCII file names.
Thanks to Jean-Christophe Heger for the bug report.
* hocr2djvused:
+ Add the --page-size option.
* ocrodjvu:
+ Add the -j/--jobs option.
-- Jakub Wilk <jwilk@jwilk.net> Thu, 21 Jan 2010 23:41:37 +0100
ocrodjvu (0.3.2) unstable; urgency=low
* Accept negative numbers in hOCR bounding boxes.
* djvu2hocr:
+ Fix broken UAX #29 segmentation.
+ Provide correct page bounding boxes.
-- Jakub Wilk <jwilk@jwilk.net> Fri, 08 Jan 2010 17:46:51 +0100
ocrodjvu (0.3.1) unstable; urgency=low
* djvu2hocr:
+ Fix broken UAX #29 segmentation.
-- Jakub Wilk <jwilk@jwilk.net> Sun, 03 Jan 2010 12:56:08 +0100
ocrodjvu (0.3.0) unstable; urgency=low
* Python ≥ 2.5 is now required.
* The argparse module is now required.
* Add support for OCRopus 0.3.1.
* Give better error messages when a Tesseract language pack cannot be found.
* New options for ocrodjvu:
+ -t/--details,
+ --word-segmentation.
* New options for hocr2djvused:
+ --rotation,
+ -t/--details,
+ --word-segmentation.
* New tool: djvu2hocr.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 16 Dec 2009 18:42:21 +0100
ocrodjvu (0.2.1) unstable; urgency=low
* Give a clearer error message if OCRopus was interrupted by a signal.
* Add the --language option.
* Add the --list-languages option.
-- Jakub Wilk <jwilk@jwilk.net> Sat, 17 Oct 2009 17:34:43 +0200
ocrodjvu (0.2.0) unstable; urgency=low
* Provide a manual page.
* Add the -D/--debug option.
* Add options to specify how results are stored:
+ -o/--save-bundled,
+ -i/--save-indirect,
+ --save-script,
+ --in-place,
+ --dry-run.
* Add the --clear-text option.
* Add the --ocr-only option.
* Please use the --in-place and --clear-text options to retain compatibility
with ocrodjvu < 0.2.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 14 Oct 2009 20:53:48 +0200
ocrodjvu (0.1.3) unstable; urgency=low
* Use ocroscript, rather than ocrocmd.
-- Jakub Wilk <jwilk@jwilk.net> Sun, 15 Mar 2009 19:01:11 +0100
ocrodjvu (0.1.2) unstable; urgency=low
* Make hocr2djvused work with hOCR for multiple pages.
* Handle rotated pages correctly.
* Ignore IW44-only pages.
-- Jakub Wilk <jwilk@jwilk.net> Mon, 23 Jun 2008 20:14:42 +0200
ocrodjvu (0.1.1) unstable; urgency=low
* Depend on python-lxml.
* Better compatibility with Python 2.4.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 14 May 2008 11:23:13 +0200
ocrodjvu (0.1) unstable; urgency=low
* Initial release.
-- Jakub Wilk <jwilk@jwilk.net> Wed, 07 May 2008 18:29:40 +0200