<a href="https://colab.research.google.com/github/ziatdinovmax/atomai/blob/master/examples/notebooks/ImageDenoising.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autoencoders for Image Denoising

*  *Notebook prepared by Maxim Ziatdinov  (email: maxim.ziatdinov@gmail.com)*

*  *The simulated data (atomic coordinates) comes from MD calculations by Bobby Sumpter and Ayana Ghosh at Oak Ridge National Lab*

*  *Experimental data by Ondrej Dyck at Oak Ridge National Lab*



---


This notebook provides a simple example of training a denoising autoencoder for image cleaning using [AtomAI](https://github.com/pycroscopy/atomai). Autoencoders are a class of neural networks that compress the data to a small number of bottleneck features and then expand it back to the original size. Training minimizes the reconstruction error between the original and reconstructed images via standard backpropagation. Because the bottleneck cannot store every detail, the network tends to retain the relevant features of the data and reject the noise, which makes autoencoders useful for denoising.
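
As a rough illustration of the compress-then-expand idea (this is not the network that AtomAI trains), the sketch below builds a linear "autoencoder" from principal components in NumPy. A linear autoencoder trained with a mean-squared-error loss learns the same subspace as PCA, so the SVD stands in for training here. The toy signals, the bottleneck size, and all variable names are assumptions made for this example; unlike the model trained later in the notebook, the sketch is fitted on noisy data only and never sees clean targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 signals of length 64 built from 3 smooth patterns
x = np.linspace(0, 2 * np.pi, 64)
basis = np.stack([np.sin(x), np.cos(2 * x), np.sin(3 * x)])
clean = rng.normal(size=(200, 3)) @ basis
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

# "Training": find the top principal directions of the noisy data
n_bottleneck = 3
mean = noisy.mean(axis=0)
_, _, vt = np.linalg.svd(noisy - mean, full_matrices=False)
components = vt[:n_bottleneck]

# Encoder: compress each signal to n_bottleneck features
codes = (noisy - mean) @ components.T
# Decoder: expand the bottleneck features back to the original size
reconstructed = codes @ components + mean

mse_noisy = np.mean((noisy - clean) ** 2)
mse_recon = np.mean((reconstructed - clean) ** 2)
print(f"MSE of noisy input vs clean:    {mse_noisy:.4f}")
print(f"MSE of reconstruction vs clean: {mse_recon:.4f}")
```

Most of the noise lies outside the three retained directions, so it is discarded when the data passes through the bottleneck. A nonlinear convolutional autoencoder applies the same principle with a far more expressive encoder and decoder.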


---







Install AtomAI:

In [None]:
!pip install atomai gdown

Imports:

In [None]:
import atomai as aoi

import numpy as np
import matplotlib.pyplot as plt

Define a helper function for preparing data:

In [None]:
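def split_denoising_data(imgdata_noisy, imgdata, test_size=0.2, holdout_size=0.1, random_state=0):
    """Split paired noisy/clean images into train, test, and holdout sets.

    test_size and holdout_size are fractions of the full dataset.
    """
    from sklearn.model_selection import train_test_split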

    # First split: separate holdout set
    X_noisy_temp, X_noisy_holdout, X_clean_temp, X_clean_holdout = train_test_split(
        imgdata_noisy, imgdata, test_size=holdout_size, random_state=random_state
    )

    # Second split: train/test from remaining data
    relative_test_size = test_size / (1 - holdout_size)
    X_noisy_train, X_noisy_test, X_clean_train, X_clean_test = train_test_split(
        X_noisy_temp, X_clean_temp, test_size=relative_test_size,
        random_state=random_state + 1
    )

    return (X_noisy_train, X_noisy_test, X_noisy_holdout,
            X_clean_train, X_clean_test, X_clean_holdout)


def scale_to_training_range(expdata, training_data):
    """Rescale expdata so that its maximum matches the maximum of training_data."""
    scale_factor = training_data.max() / expdata.max()
    return expdata * scale_factor

Download simulated data of graphene:

In [None]:
# Download data
!gdown -O "graphene_MD_imgs.npy" "https://drive.google.com/uc?id=1iFZvHKkOLWxPVe6dlm5GTJOSimAZJCMf"

Load data into the notebook:


In [None]:
imgdata = np.load("graphene_MD_imgs.npy")[::2] # take every 2nd sample
print(imgdata.shape)

Now let's corrupt our data with noise and then use a denoising autoencoder to reconstruct the original images.

In [None]:
np.random.seed(0) # for reproducibility
# Add noise to data
imgdata_noisy = imgdata + np.random.normal(scale=8, size=imgdata.shape)

View a selected pair of images (the noisy image on the left is the input to the neural network and the original image on the right is the target):

In [None]:
k = 15

_, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.imshow(imgdata_noisy[k])
ax2.imshow(imgdata[k])
ax1.set_title("Corrupted image")
ax2.set_title("Original image")

Split data into train, test, and holdout sets:

In [None]:
(X_noisy_train, X_noisy_test, X_noisy_holdout,
 X_clean_train, X_clean_test, X_clean_holdout) = split_denoising_data(imgdata_noisy, imgdata)
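
The proportions that the helper aims for can be illustrated with index arithmetic alone. The sketch below is a NumPy-only approximation, not the scikit-learn implementation (which applies its own rounding when converting fractions to counts); the array sizes and variable names are assumptions for this example. It shows that with the default `test_size=0.2` and `holdout_size=0.1`, both fractions refer to the full dataset, and that indexing the noisy and clean arrays with the same indices keeps every pair aligned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumption) for imgdata and imgdata_noisy
n_samples = 1000
clean = rng.normal(size=(n_samples, 8, 8))
noisy = clean + rng.normal(scale=0.1, size=clean.shape)

test_size, holdout_size = 0.2, 0.1
n_holdout = int(round(holdout_size * n_samples))
n_test = int(round(test_size * n_samples))

# One shared permutation keeps noisy/clean pairs aligned in every subset
idx = rng.permutation(n_samples)
holdout_idx = idx[:n_holdout]
test_idx = idx[n_holdout:n_holdout + n_test]
train_idx = idx[n_holdout + n_test:]

noisy_train, clean_train = noisy[train_idx], clean[train_idx]
noisy_test, clean_test = noisy[test_idx], clean[test_idx]
noisy_holdout, clean_holdout = noisy[holdout_idx], clean[holdout_idx]

print(len(train_idx), len(test_idx), len(holdout_idx))  # 700 200 100
```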

Initialize and train AtomAI's denoiser model:

In [None]:
denoiser = aoi.models.DenoisingAutoencoder()
denoiser.fit(X_noisy_train, X_clean_train, X_noisy_test, X_clean_test, training_cycles=500)

Make a prediction on the holdout dataset:

In [None]:
predictions = denoiser.predict(X_noisy_holdout)

Plot results:

In [None]:
k = 5 # select a prediction to plot
_, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 5))
ax1.imshow(X_noisy_holdout[k])
ax2.imshow(predictions[k])
ax3.imshow(X_clean_holdout[k] - predictions[k])
ax1.set_title("Input (holdout) noisy data")
ax2.set_title("Cleaned data")
ax3.set_title("Difference")
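
A visual comparison can be complemented by a number. The sketch below defines two hypothetical helpers, `mse` and `psnr` (they are not part of AtomAI), and applies them to synthetic placeholder arrays. In the notebook one could presumably call them on `X_clean_holdout` and `predictions`, assuming both arrays have the same shape (possibly after `predictions.squeeze()`).

```python
import numpy as np

def mse(clean, estimate):
    """Mean squared error between two arrays of the same shape."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((clean - estimate) ** 2))

def psnr(clean, estimate):
    """Peak signal-to-noise ratio in dB, using the range of the clean data as the peak."""
    data_range = float(np.max(clean) - np.min(clean))
    return float(10 * np.log10(data_range ** 2 / mse(clean, estimate)))

rng = np.random.default_rng(0)
# Placeholders (assumption): stand-ins for clean targets, noisy inputs, and model output
clean = rng.uniform(0, 100, size=(4, 32, 32))
noisy = clean + rng.normal(scale=8, size=clean.shape)
denoised = clean + rng.normal(scale=2, size=clean.shape)

print(f"PSNR of noisy input:     {psnr(clean, noisy):.1f} dB")
print(f"PSNR of denoised output: {psnr(clean, denoised):.1f} dB")
```

A higher PSNR for the model output than for the noisy input indicates that the model removed more noise than it introduced in reconstruction error.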

Now we are going to gradually increase the noise level and see how well our model can generalize:

In [None]:
clean_img = X_clean_holdout[k]
for s in range(0, 100, 5):
    # Add fresh noise with standard deviation 8 + s to the clean image at each step
    img = clean_img + np.random.normal(scale=8+s, size=clean_img.shape)
    prediction = denoiser.predict(img)
    _, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    ax1.imshow(img)
    ax2.imshow(prediction)
    ax1.set_title("Input noisy data")
    ax2.set_title("Cleaned data")
    plt.show()
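
Instead of inspecting twenty figures, the sweep can be summarized as error versus noise level. The sketch below is self-contained and therefore uses a 3x3 mean filter, `box_blur`, as a placeholder for `denoiser.predict`, and a synthetic smooth image in place of a holdout image; both are assumptions for illustration only.

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter; a stand-in for denoiser.predict in this sketch only."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
clean_img = 50 * (1 + np.sin(xx / 4.0) * np.cos(yy / 4.0))  # toy smooth image

noise_scales = [8 + s for s in range(0, 100, 5)]
input_mse, output_mse = [], []
for scale in noise_scales:
    # Fresh noise is added to the clean image at every noise level
    noisy_img = clean_img + rng.normal(scale=scale, size=clean_img.shape)
    cleaned = box_blur(noisy_img)
    input_mse.append(float(np.mean((noisy_img - clean_img) ** 2)))
    output_mse.append(float(np.mean((cleaned - clean_img) ** 2)))

for scale, mse_in, mse_out in zip(noise_scales, input_mse, output_mse):
    print(f"noise scale {scale:3d}: input MSE {mse_in:9.1f}, output MSE {mse_out:9.1f}")
```

Plotting `output_mse` against `noise_scales` would show at which noise level the denoiser starts to break down.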

Finally, let's apply the model to experimental data. Note that the current model is by no means optimized for experimental data: the only adjustment we make is rescaling the pixel intensities to the range of the training data. Still, it is interesting to see how it performs on real-world data.

In [None]:
# download data
!gdown -O "graphene_exp.npy" "https://drive.google.com/uc?id=18U8YHZUbSZj0Q1__zup5-ABrjaEZmiPc"

In [None]:
# Load experimental image
expdata = np.load("graphene_exp.npy")

# Scale it to the range of pixel values used in training data
expdata_scaled = scale_to_training_range(expdata, X_noisy_train)
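
Note that `scale_to_training_range` matches only the maxima of the two arrays. The sketch below repeats the helper so that it runs on its own and compares it with a hypothetical min-max alternative, `minmax_to_training_range`, which is not part of the notebook or of AtomAI. The small arrays are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def scale_to_training_range(expdata, training_data):
    # Same logic as the helper defined earlier: match the maxima only
    scale_factor = training_data.max() / expdata.max()
    return expdata * scale_factor

def minmax_to_training_range(expdata, training_data):
    # Hypothetical alternative: match both the minimum and the maximum
    unit = (expdata - expdata.min()) / (expdata.max() - expdata.min())
    return unit * (training_data.max() - training_data.min()) + training_data.min()

# Made-up values: noisy training data can be negative, detector counts usually are not
training = np.array([[-20.0, 0.0], [50.0, 120.0]])
exp = np.array([[1000.0, 1500.0], [2000.0, 3000.0]])

scaled = scale_to_training_range(exp, training)
matched = minmax_to_training_range(exp, training)

print(scaled.min(), scaled.max())    # 40.0 120.0
print(matched.min(), matched.max())  # -20.0 120.0
```

Which variant works better for a given experimental image is an empirical question; the max-only version used in the notebook preserves the ratio between pixel values, while the min-max version also aligns the background level.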

Visualize predictions:

In [None]:
prediction = denoiser.predict(expdata_scaled)

_, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 9))
ax1.imshow(expdata_scaled)
ax2.imshow(prediction.squeeze())

Looks like a decent prediction!