DenoisingAutoencoder

Removing noise from noisy scanned office documents using a Convolutional Denoising Autoencoder

The Denoising Autoencoder is an extension of the classical Autoencoder, a network that learns a transformation of its input that reproduces a given target output. An Autoencoder is generally trained to copy its input to its output through a lower-dimensional latent space, so that it learns compact latent features. Autoencoders are commonly applied to dimensionality reduction.

A Denoising Autoencoder follows a similar principle, but it is trained to remove noise from its input images. Such models find applications in computer vision, for example removing noise from a stream of noisy images.
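One common way to write the training objective of a denoising autoencoder is sketched below. The notation is an assumption made for illustration and does not come from the project: x_i is a clean image, x̃_i its noisy version, f_θ the encoder, g_φ the decoder, and N the number of training pairs. The squared-error loss is one typical choice, not necessarily the one used in this project.

```latex
\min_{\theta,\phi}\; \frac{1}{N}\sum_{i=1}^{N}
  \left\lVert g_\phi\!\left(f_\theta(\tilde{x}_i)\right) - x_i \right\rVert_2^2
```

The only difference from a classical autoencoder in this formulation is that the network receives the noisy image x̃_i while the reconstruction is compared against the clean image x_i.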

1. Problem Statement

Can a denoising autoencoder be used to remove stains, footprints, and marks resulting from folds or wrinkles from scanned documents containing text?

2. Data Description

The data used in this project is the NoisyOffice dataset, obtained from the UCI Machine Learning Repository.

The dataset consists of 18 ground truth images and 72 noisy images, i.e. each clean image is simulated with 4 kinds of noise (4*18 = 72). The 72 noisy images are divided into training, validation and testing sets. The 4 kinds of simulated noise are folded sheets, wrinkled sheets, coffee stains, and footprints. Three different fonts are used in the scanned documents, which also differ in footnote size and emphasis.
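The pairing described above (each of the 18 clean documents matched with 4 noisy versions) can be sketched as follows. This is a minimal illustration only: the file-name pattern and the noise labels are hypothetical and do not reflect the actual NoisyOffice file layout.

```python
from itertools import product

# Noise labels are illustrative stand-ins for the four simulated noise kinds.
NOISE_TYPES = ["folded", "wrinkled", "coffee", "footprint"]
NUM_CLEAN = 18  # number of ground truth images, as stated in the dataset description


def build_pairs(num_clean=NUM_CLEAN, noise_types=NOISE_TYPES):
    """Return (noisy_name, clean_name) training pairs.

    The file names are hypothetical placeholders, not the real dataset names.
    """
    pairs = []
    for doc_id, noise in product(range(1, num_clean + 1), noise_types):
        clean_name = f"clean_{doc_id:02d}.png"
        noisy_name = f"noisy_{doc_id:02d}_{noise}.png"
        pairs.append((noisy_name, clean_name))
    return pairs


pairs = build_pairs()
print(len(pairs))  # 72
```

Each clean image appears as the target of four pairs, which is why the model sees 72 inputs but only 18 distinct targets.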

Clean Document

Document with noise

3. Approach

A Convolutional Denoising Autoencoder was trained to remove noise from the noisy scanned documents. The evaluation metric is the Mean Squared Error (MSE), which measures how far the denoised image is from the ground truth.
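The MSE metric can be computed per image as in the sketch below. It assumes both images are arrays of the same shape with pixel values on the same scale (for example, grayscale in [0, 1]); the function name `mean_squared_error` and the toy arrays are illustrative and are not taken from the project notebook.

```python
import numpy as np


def mean_squared_error(denoised, ground_truth):
    """Average squared pixel difference between two same-shaped images."""
    denoised = np.asarray(denoised, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    if denoised.shape != ground_truth.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((denoised - ground_truth) ** 2))


# Toy example: a 4x4 all-white "clean" image and a copy with one black pixel.
clean = np.ones((4, 4))
noisy = clean.copy()
noisy[0, 0] = 0.0  # a single corrupted pixel

print(mean_squared_error(noisy, clean))  # 0.0625
```

A lower value means the denoised output is closer to the ground truth, with 0 indicating a pixel-perfect match.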
