Skip to content

smoothscan is a tool to convert scanned text into a vectorized output form.


Notifications You must be signed in to change notification settings


Folders and files

Last commit message
Last commit date

Latest commit



4 Commits

Repository files navigation

smoothscan 0.1.0


smoothscan is a tool to convert scanned text into a vectorized output
form. Because printed text is assembled from fonts, each particular
letter (like 'o') will have the same shape as every other 'o' in the
document. We can take advantage of this, by building a table of such
symbols, and represent each occurrence of a symbol with a reference to
that symbol's table entry. This will save a lot of space, and a
similar idea is used in djvu's jb2 mode and JBIG2 for PDF.

smoothscan builds up this table, but instead of filling the table with
the original raster images, it vectorizes each symbol. Vector images
will look smoother than their raster equivalents, and can be scaled
without introducing pixelation. These properties result in a smaller
output file size, as well as making the scanned text images more

smoothscan saves the vectorized images into a custom TrueType font and
embeds the font into the output pdf file. Currently each symbol is
mapped to an arbitrary letter in the font, but in future versions you
could run OCR on each symbol, and ensure that the 'o' image is
associate with the 'o' character encoding in the generated font.

To get good results, you must have good input. Higher resolution scans
capture more detail about the shape of each symbol, so a higher
quality vectorized version can be created. It's a good idea to process
your scanned images using a tool like ScanTailor before running

Current smoothscan can only process pure black and white 1bpp images,
but in the future support will be added for other formats, especially
ScanTailor's Mixed output mode.

smoothscan is currently targeted at GNU/Linux based systems, but
Windows and OS X will be supported in future versions.


You can always find the latest release on the project home page at
<>. Developers can check out
the latest bleeding edge source code on the project's Github page:


Please see the INSTALL file for installation instructions.


Please see the NEWS file for a high level overview of changes since
the last release.


smoothscan is licensed under the GPLv3+ Please see the COPYING file
for the full text of this license.


Please see the AUTHORS file for the list of contributors.


smoothscan is a tool to convert scanned text into a vectorized output form.