# Image Preprocessing Experiments
Optimizing mammograms for deep learning involves a series of image preprocessing steps. Denoising, artifact removal, pectoral muscle removal, and image enhancement are among the most essential steps in medical image preprocessing. Determining the optimal method for each is, to a degree, an exercise in experimentation. In this section, we conduct experiments to determine the preprocessing methods that will ultimately be applied to each image before model training.

This section is organized as follows:

| # | Phase            | Step                    | Description                                                                                  |
|---|------------------|-------------------------|----------------------------------------------------------------------------------------------|
| 0 | Setup            | Initialize Repositories | Reset the repositories and extract test bed images.                                          |
| 1 | Setup            | Load Experiment Data    | Extract a stratified sample of images for experimentation.                                   |
| 2 | Artifact Removal | Denoise                 | Explore basic denoising techniques such as the MeanFilter, MedianFilter, and GaussianFilter. |
| 3 | Artifact Removal | Thresholding            | Select a thresholding technique for artifact removal.                                        |



Import modules

```python
from bcd.config import Config
from bcd.container import BCDContainer
from bcd.etl.load import Loader
from bcd.preprocess.image.experiment.denoise import DenoiseExperiment
from bcd.preprocess.image.method.denoise import (
    BilateralFilter,
    GaussianFilter,
    MeanFilter,
    MedianFilter,
)
```

```python
Config.set_log_level('INFO')
Config.set_mode('exp')
```

Wire our dependencies.

```python
container = BCDContainer()
container.init_resources()
container.wire(
    packages=[
        "bcd.dal.repo", "bcd.preprocess.image.experiment", "bcd.dal.io", "bcd.etl"
    ]
)
```

```python
# Section parameters
SETUP_COMPLETE = False
DENOISE_COMPLETE = False
BATCHSIZE = 16
```

## Setup
### Initialize Repositories
Experiment repositories are reset.

```python
if not SETUP_COMPLETE:
    uow = container.dal.uow()
    uow.reset()
```



### Load Data
We will load 5% of the data, stratified by abnormality type, image view, BI-RADS assessment and cancer diagnosis.

```python
if not SETUP_COMPLETE:
    loader = Loader(frac=0.05, groupby=['abnormality_type', 'image_view', 'assessment', 'cancer'])
    loader.run()
```

100%|██████████| 354/354 [05:08<00:00,  1.15it/s]
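The `Loader` call above draws a sample stratified by four metadata columns. The sketch below illustrates the general idea of stratified sampling in plain Python: records are grouped by the combined values of the stratification columns, and the same fraction is drawn from every group. `stratified_sample` is a hypothetical helper, not the `bcd.etl.load.Loader` implementation, and the rule of keeping at least one record per stratum is an assumption.

```python
import random
from collections import defaultdict


def stratified_sample(records, frac, groupby, seed=42):
    """Draw roughly `frac` of the records from every stratum (hypothetical helper).

    `records` is a list of dicts; the values of the `groupby` keys together
    define the stratum a record belongs to.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        key = tuple(rec[col] for col in groupby)
        strata[key].append(rec)

    sample = []
    for members in strata.values():
        # Assumption: keep at least one record so that rare strata are not dropped.
        n = max(1, round(frac * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


# Toy metadata: 200 images split into four strata of sizes 80, 20, 80 and 20.
records = [
    {"id": i, "abnormality_type": "calcification" if i % 2 else "mass", "cancer": i % 5 == 0}
    for i in range(200)
]
sample = stratified_sample(records, frac=0.05, groupby=["abnormality_type", "cancer"])
print(len(sample))  # 10 (4 from each 80-record stratum, 1 from each 20-record stratum)
```

Sampling within each stratum keeps the mix of abnormality types and diagnoses in the experiment set close to that of the full dataset, which a simple random 5% draw would not guarantee.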


## Denoise
Noise in mammography consists of random variations in image brightness or contrast that may be introduced during the image capture process. These fluctuations are commonly categorized as salt-and-pepper noise, speckle noise, Gaussian noise, and Poisson noise. Salt-and-pepper noise, also known as spike noise, impulsive noise, or flat-tail distributed noise, appears as scattered black and white dots on the image. Speckle noise is found mainly in radar images, where the return signal from an object causes random fluctuations within the image. Gaussian noise is additive and follows a Gaussian distribution. Finally, Poisson noise, or shot noise, arises from statistical variation in image brightness caused primarily by characteristics of the capture device, such as the limited number of photons used in low-dose X-ray mammography.
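The four noise models can be simulated on a synthetic image to see how they differ. The sketch below uses common textbook formulations with NumPy; the ramp image, the parameter values, and the multiplicative model chosen for speckle are illustrative assumptions and are not part of the `bcd` package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical "clean" image: a horizontal intensity ramp with values in [0.2, 0.8].
clean = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))


def add_salt_and_pepper(img, amount=0.05):
    """Force a random fraction `amount` of pixels to pure black (0) or pure white (1)."""
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0.0                    # pepper
    noisy[(u >= amount / 2) & (u < amount)] = 1.0  # salt
    return noisy


def add_gaussian(img, sigma=0.05):
    """Additive noise: img + N(0, sigma^2), independent of the pixel value."""
    return img + rng.normal(0.0, sigma, img.shape)


def add_speckle(img, sigma=0.1):
    """Speckle modeled (by assumption) as multiplicative: img * (1 + N(0, sigma^2))."""
    return img * (1.0 + rng.normal(0.0, sigma, img.shape))


def add_poisson(img, photons=100):
    """Shot noise: treat each pixel as a photon count drawn from Poisson(img * photons)."""
    return rng.poisson(img * photons) / photons


noisy = {
    "salt_and_pepper": add_salt_and_pepper(clean),
    "gaussian": add_gaussian(clean),
    "speckle": add_speckle(clean),
    "poisson": add_poisson(clean),
}
for name, img in noisy.items():
    rms = np.sqrt(np.mean((img - clean) ** 2))
    print(f"{name:>15}: RMS error = {rms:.3f}")
```

Lowering `photons` in `add_poisson` increases the relative noise, because the standard deviation of a Poisson count grows only with the square root of its mean. This mirrors the low-dose case described above.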

### Denoising Methods
These experiments focus on linear (MeanFilter, GaussianFilter) and non-linear (MedianFilter, BilateralFilter) spatial-domain filters for noise reduction.
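The linear/non-linear distinction can be checked directly: a linear filter $L$ satisfies $L(a + b) = L(a) + L(b)$ for any two images. The minimal sketch below, assuming NumPy 1.20+ for `sliding_window_view` and edge-replicated borders, implements simple mean and median filters (hypothetical helpers, not the `bcd` `MeanFilter` and `MedianFilter` classes) and tests that property on random images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windows(img, k):
    """Return every k x k neighborhood; borders are edge-replicated (an assumption)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (k, k))


def mean_filter(img, k=3):
    return windows(img, k).mean(axis=(-2, -1))


def median_filter(img, k=3):
    return np.median(windows(img, k), axis=(-2, -1))


rng = np.random.default_rng(seed=1)
a = rng.random((32, 32))
b = rng.random((32, 32))

# A linear filter L satisfies L(a + b) == L(a) + L(b).
print(np.allclose(mean_filter(a + b), mean_filter(a) + mean_filter(b)))        # True
print(np.allclose(median_filter(a + b), median_filter(a) + median_filter(b)))  # False
```

Linearity is what allows the mean and Gaussian filters to be expressed as a single convolution with a fixed kernel; the median of a sum is not the sum of medians, so the median filter has no such kernel.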

#### MeanFilter
The MeanFilter replaces each pixel value in an image with the mean value of its neighborhood, including the pixel itself. A kernel specifies the shape and size of the neighborhood sampled when computing the mean; its size must be a positive, odd integer so that the kernel has a well-defined center pixel. Typical kernel sizes are 3, 5, or 7, representing square neighborhoods of 3×3, 5×5, or 7×7 pixels. The larger the kernel, the greater the blurring or smoothing effect on the image.
The MeanFilter is simple, intuitive, and easy to implement; however, it has two drawbacks:
- A single outlier pixel value can significantly affect the mean value of all the pixels in its neighborhood, and
- Edges are blurred, which can be problematic if sharp edges are required in the output.
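Both drawbacks can be reproduced on tiny arrays. The sketch below is a minimal NumPy mean filter that enforces the positive, odd kernel rule; the edge-replicated border handling is an assumption, and `mean_filter` is a hypothetical helper rather than the `bcd` `MeanFilter` class.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_filter(img, kernel=3):
    """Replace each pixel with the mean of its kernel x kernel neighborhood (itself included)."""
    if kernel <= 0 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive, odd integer")
    pad = kernel // 2
    # Assumption: borders are handled by replicating the edge pixels.
    padded = np.pad(img.astype(float), pad, mode="edge")
    return sliding_window_view(padded, (kernel, kernel)).mean(axis=(-2, -1))


# Drawback 1: a single outlier leaks into every neighbor covered by the kernel.
flat = np.zeros((7, 7))
flat[3, 3] = 90.0
print(mean_filter(flat, 3)[2:5, 2:5])  # every entry is 10.0

# Drawback 2: a sharp step edge becomes a ramp.
step = np.zeros((5, 8))
step[:, 4:] = 1.0
print(mean_filter(step, 3)[2])  # approx. [0, 0, 0, 0.33, 0.67, 1, 1, 1]
```

Increasing `kernel` to 5 spreads the outlier over a 5×5 block at 90 / 25 = 3.6 and widens the ramp, which is the stronger smoothing effect described above.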

#### GaussianFilter
Like the MeanFilter, the GaussianFilter is a 2-D convolution operator used to remove noise. Unlike the MeanFilter, which weights every pixel in the neighborhood equally, the GaussianFilter uses a kernel shaped like an isotropic (i.e., circularly symmetric) Gaussian distribution with standard deviation $\sigma$, where $(x, y)$ is a pixel's offset from the kernel center:
$$
G(x,y) = \frac{1}{2\pi\sigma^2}e^{-{\frac{x^2+y^2}{2\sigma^2}}}
$$
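A discrete kernel can be built by sampling $G(x, y)$ at integer offsets from the center. The sketch below does this with NumPy and then renormalizes the weights, since a sampled, truncated Gaussian does not sum exactly to 1. `gaussian_kernel` and its `size`/`sigma` parameters are hypothetical and do not describe how the `bcd` `GaussianFilter` chooses $\sigma$.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Sample G(x, y) on a size x size grid centered at the origin, then normalize."""
    if size <= 0 or size % 2 == 0:
        raise ValueError("size must be a positive, odd integer")
    half = size // 2
    offsets = np.arange(-half, half + 1)
    x, y = np.meshgrid(offsets, offsets)  # x varies along columns, y along rows
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # Renormalize so that filtering does not change the overall image brightness.
    return g / g.sum()


k = gaussian_kernel(5, sigma=1.0)
print(k.round(3))
```

The center weight is the largest and the weights fall off with distance, so nearby pixels dominate the average and the kernel smooths more gently than a uniform kernel of the same size. A larger $\sigma$ flattens the kernel toward the uniform MeanFilter case.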



```{figure} /home/john/projects/bcd/jbook/figures/gaussian.png
---
name: gaussian
---
2D Gaussian Distribution
```

The distribution is shown in {numref}`gaussian`.

### Mean Filter

```python
params = {"kernel": [3, 5, 7]}
task = DenoiseExperiment(method=MeanFilter, params=params, batchsize=BATCHSIZE)
task.run()
```

### Median Filter

```python
params = {"kernel": [3, 5, 7]}
task = DenoiseExperiment(method=MedianFilter, params=params, batchsize=BATCHSIZE)
task.run()
```

### Gaussian Filter

```python
params = {"kernel": [3, 5, 7]}
task = DenoiseExperiment(method=GaussianFilter, params=params, batchsize=BATCHSIZE)
task.run()
```

### Bilateral Filter