Document Denoising Convolutional Autoencoder using TensorFlow

This repository contains the implementation of a Denoising Convolutional Autoencoder (CAE) built with TensorFlow, Keras, OpenCV, scikit-learn, and Python. The goal of the project is to reduce noise in documents such as scanned pages or photographs of documents.

The autoencoder architecture used in this project is a Convolutional Neural Network (CNN). It consists of two components:

An encoder that takes a noisy document as input and compresses it into a low-dimensional representation, and
A decoder that takes this low-dimensional representation and reconstructs a clean version of the document, discarding the noise.
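As a rough illustration of the encoder/decoder split described above, here is a minimal Keras sketch. The layer counts, filter sizes, input size, loss (`mse`) and optimizer are assumptions chosen for illustration, not the project's actual configuration, and the function name `build_denoising_cae` is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_denoising_cae(height=256, width=256):
    """Hypothetical CAE: height and width must be divisible by 4."""
    # Grayscale page scaled to [0, 1], one channel (assumption).
    inputs = tf.keras.Input(shape=(height, width, 1))

    # Encoder: two conv + pooling stages shrink the page to 1/4 size.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2, padding="same")(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2, padding="same")(x)

    # Decoder: strided transposed convolutions restore the original size.
    x = layers.Conv2DTranspose(64, 3, strides=2, activation="relu", padding="same")(encoded)
    x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)

    # Sigmoid keeps reconstructed pixel intensities in [0, 1].
    outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

    model = tf.keras.Model(inputs, outputs, name="denoising_cae")
    # Loss choice is an assumption; MAE is tracked as a metric.
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

The noisy image is the input and the clean image is the target, so the network learns to map one to the other rather than simply copying its input.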

Dataset

The denoising-dirty-documents dataset is used in this project for training and testing the model. It contains images of documents with various styles of text, split into three sets:

train data: noisy document images used for training the model; synthetic noise has been added to them to simulate real-world, messy artifacts,
train_cleaned data: the clean (denoised) versions of the train images, used as ground truth during training and validation, and
test data: noisy document images used for testing the model.
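Because each image in train has a clean counterpart in train_cleaned, training pairs can be formed by matching file names. A minimal sketch, assuming the two directories use identical file names; the helper `pair_noisy_and_clean` is hypothetical.

```python
from pathlib import Path


def pair_noisy_and_clean(noisy_paths, clean_paths):
    """Match noisy and clean images by file name.

    Returns (pairs, unmatched): pairs is a list of (noisy, clean) Paths,
    unmatched lists noisy file names with no clean counterpart.
    """
    clean_by_name = {Path(p).name: Path(p) for p in clean_paths}
    pairs, unmatched = [], []
    for noisy in sorted(map(Path, noisy_paths)):
        clean = clean_by_name.get(noisy.name)
        if clean is None:
            unmatched.append(noisy.name)
        else:
            pairs.append((noisy, clean))
    return pairs, unmatched
```

For example, `pair_noisy_and_clean(Path("data/raw/train").glob("*.png"), Path("data/raw/train_cleaned").glob("*.png"))`, assuming PNG files stored in the data/raw/ layout described under Usage.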

Usage

You can run this project either 1) in Google Colab, or 2) on your own machine after installing TensorFlow, OpenCV (cv2), and scikit-learn.

To run the project in Google Colab, open the denoising_convolutional_autoencoder.ipynb notebook from the notebooks directory. The notebook contains all the required code along with explanatory comments. Since the dataset is hosted on Kaggle, the notebook also includes detailed instructions for downloading and preprocessing it correctly.
To run the project on your own machine, install the necessary tools/libraries with the following commands:

python -m pip install -U pip  # upgrade pip
pip install tensorflow
pip install opencv-python
pip install -U scikit-learn

This project also uses two widely used Python libraries, numpy and matplotlib. If they are not already installed in your Python environment, install them with:

pip install numpy
python -m pip install -U matplotlib
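Two of these packages are imported under different names than they are installed with (opencv-python as cv2, scikit-learn as sklearn). The following sketch uses only the standard library to report which of the dependencies listed above cannot be found; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Map import name -> pip package name for the dependencies listed above.
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "cv2": "opencv-python",
    "sklearn": "scikit-learn",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be located."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

`find_spec` only locates the module without importing it, so the check stays fast even for large packages such as TensorFlow.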

Once all the necessary dependencies are installed, simply run the convolutional_autoencoder.py file from the convolutional_autoencoder directory.

Note: The dataset files are not included in the repository due to storage limits. To ensure that the project runs without errors, create a data/raw/ directory in the root folder and place the unzipped train, test, and train_cleaned directories there. See config.py in the convolutional_autoencoder directory for more details about the configuration.
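A quick way to confirm the data/raw/ layout before training is sketched below. The constants `DATA_ROOT` and `EXPECTED_SUBDIRS` and the function `check_data_layout` are hypothetical stand-ins; the actual names and paths are defined in the project's config.py and may differ.

```python
from pathlib import Path

# Hypothetical values mirroring the layout described above; check config.py.
DATA_ROOT = Path("data/raw")
EXPECTED_SUBDIRS = ("train", "test", "train_cleaned")


def check_data_layout(root=DATA_ROOT, subdirs=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are missing or empty."""
    problems = []
    for name in subdirs:
        directory = Path(root) / name
        # A directory that exists but holds no files is also a problem.
        if not directory.is_dir() or not any(directory.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    problems = check_data_layout()
    print("Data layout OK" if not problems else f"Missing or empty: {problems}")
```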

Evaluation

The reports directory contains plots showing how the loss and errors evolve during training, as well as the outputs of the denoising operation on the test dataset. Among them is a plot of how the loss and mean absolute error (MAE) change over the training epochs.
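MAE is the average absolute per-pixel difference between the denoised output and the clean target, MAE = (1/N) Σ |clean_i − denoised_i| over all N pixels. A NumPy sketch for two images scaled to [0, 1] (the function name is hypothetical):

```python
import numpy as np


def mean_absolute_error(clean, denoised):
    """Mean of |clean - denoised| over all pixels of two same-shaped images."""
    clean = np.asarray(clean, dtype=np.float32)
    denoised = np.asarray(denoised, dtype=np.float32)
    if clean.shape != denoised.shape:
        raise ValueError(f"shape mismatch: {clean.shape} vs {denoised.shape}")
    return float(np.mean(np.abs(clean - denoised)))
```

If the model is compiled with `metrics=["mae"]`, Keras records the per-epoch values in `history.history["mae"]` (and `"val_mae"` when validation data is supplied), which is the kind of series such a plot displays.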

Conclusion

This Denoising Convolutional Autoencoder (CAE) can be used to denoise documents such as scanned pages, photographs of documents, or low-quality PDFs. During training it drives the loss and errors close to zero, i.e., it reconstructs the clean training documents with very high accuracy.
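To apply a trained model to a new document, the page has to be scaled the same way as during training and the output converted back to an image. The sketch below assumes grayscale input scaled to [0, 1] and a model that takes (batch, height, width, 1) arrays; both are assumptions about the project's preprocessing. `denoise_image` and `predict_fn` are hypothetical names, and any callable with that shape contract works.

```python
import numpy as np


def denoise_image(gray_uint8, predict_fn):
    """Denoise one grayscale page (H x W, uint8) with a model-like callable."""
    # Scale to [0, 1] and add batch and channel axes: (1, H, W, 1).
    x = np.asarray(gray_uint8, dtype=np.float32) / 255.0
    x = x[np.newaxis, ..., np.newaxis]
    # Drop the batch and channel axes from the prediction.
    y = np.asarray(predict_fn(x))[0, ..., 0]
    # Clip to the valid range and convert back to 8-bit pixels.
    return (np.clip(y, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

With TensorFlow 2.x and Keras 2, a trained model could be loaded with `tf.keras.models.load_model("saved_model/denoised_autoencoder_trained")` and passed as `denoise_image(page, model.predict)`, assuming that directory holds a Keras-loadable SavedModel. If the model expects a fixed input size, resize the page first (for example with `cv2.resize`).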

Repository: mmalam3/Document-Denoising-Convolutional-Autoencoder-using-TensorFlow