# Testing of Img Utils (Data Engineering & Exploration)

Michael Janus, May/June 2018

The goal of this notebook is to test and validate the functions in **imgutils**, which serves as the infrastructure for the data engineering and exploration. Most functions have a matching test function in **imgutils_test**; these test functions also show how to use the functions together.

## 1. Import the used modules, including the one with test functions:

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import matplotlib

import imgutils
import imgutils_test as tst

In [None]:
# Re-run this cell if you altered imgutils or imgutils_test
import importlib
importlib.reload(imgutils)
importlib.reload(tst)

## 2. Test the basic image IO and display

In [None]:
tst.test_scanimgdir()

In [None]:
tst.test_loadandshowimg()

In [None]:
tst.test_loadandshowimgs()   # shows array of images

## 3. Test image slicing
The image slice functions cut an image up into sub-images. 
The test function loads an image, slices it up and shows the resulting array of images.

In [None]:
tst.test_sliceimage(6,5)
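
As a rough illustration of what slicing into sub-images means, here is a minimal NumPy sketch. It is not the imgutils implementation: `slice_image` is a hypothetical helper, and both the argument order (n_y before n_x) and the choice to drop remainder pixels are assumptions.

```python
import numpy as np

def slice_image(img, n_y, n_x):
    """Cut a 2-D image array into n_y * n_x equally sized sub-images (hypothetical helper)."""
    h, w = img.shape[:2]
    sh, sw = h // n_y, w // n_x   # slice height and width; remainder pixels are dropped
    slices = []
    for s_y in range(n_y):
        for s_x in range(n_x):
            slices.append(img[s_y * sh:(s_y + 1) * sh, s_x * sw:(s_x + 1) * sw])
    return slices

img = np.arange(60 * 50).reshape(60, 50)   # stand-in for a loaded image
slices = slice_image(img, 6, 5)
print(len(slices), slices[0].shape)        # 30 (10, 10)
```

The slices are views on the original array, so no pixel data is copied until a slice is modified or saved.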

## 4. Test the heatmap display 
The heatmap slices up an image and overlays a heat color on each image slice. The test function uses fake heat values.

In [None]:
tst.test_heatmap()
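
The overlay idea can be sketched with plain NumPy as follows. This is only an assumption about how such a heatmap could be built, not the imgutils code: `overlay_heat`, the red tint and the `max_alpha` parameter are all hypothetical.

```python
import numpy as np

def overlay_heat(img, heats, max_alpha=0.5):
    """Blend a red tint over each slice of a grayscale image (hypothetical helper).

    img   : 2-D float array with values in [0, 1]
    heats : 2-D array of shape (n_y, n_x) with values in [0, 1]
    """
    n_y, n_x = heats.shape
    h, w = img.shape
    sh, sw = h // n_y, w // n_x
    rgb = np.stack([img, img, img], axis=-1).astype(float)   # grayscale -> RGB
    red = np.array([1.0, 0.0, 0.0])
    for s_y in range(n_y):
        for s_x in range(n_x):
            alpha = max_alpha * heats[s_y, s_x]               # hotter slice -> stronger tint
            block = rgb[s_y * sh:(s_y + 1) * sh, s_x * sw:(s_x + 1) * sw]
            block[...] = (1 - alpha) * block + alpha * red    # in-place blend on the view
    return rgb

rng = np.random.default_rng(0)
img = rng.random((40, 40))
heats = rng.random((4, 4))          # fake heats, in the spirit of the test function
heatmap = overlay_heat(img, heats)
print(heatmap.shape)
```

The resulting RGB array can be passed to any image display routine that accepts float values in [0, 1].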

## 5. Test the slice statistics functions
There are individual functions that return the statistics of an image. 
The **slicestats()** function combines image slicing and statistics.

In [None]:
# first test dataframe stuff without statistics:
df1 = tst.test_slicestats_df()
df1.head()

n_y and n_x are the number of slices in the image in the y and x direction; (s_x, s_y) is the slice index.

In [None]:
# get a single slice from this dataframe:
sliceimg = imgutils.getimgslice(df1, 4)
imgutils.showimg(sliceimg)

In [None]:
# test the image statistics functions:
tst.test_statfuncs(sliceimg)

#### Now test the function that combines slicing and statistics:

In [None]:
df2 = tst.test_slicestats()
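
To make the combination of slicing and statistics concrete, here is a minimal sketch that produces one dataframe row per slice. It works on an in-memory array rather than on image files, and `slice_stats` and the local `img_mean` / `img_std` functions are hypothetical stand-ins; naming the columns after the functions is also an assumption.

```python
import numpy as np
import pandas as pd

def slice_stats(img, n_y, n_x, statfuncs):
    """Slice an image and compute one row of statistics per slice (hypothetical helper)."""
    h, w = img.shape
    sh, sw = h // n_y, w // n_x
    rows = []
    for s_y in range(n_y):
        for s_x in range(n_x):
            sub = img[s_y * sh:(s_y + 1) * sh, s_x * sw:(s_x + 1) * sw]
            row = {'n_y': n_y, 'n_x': n_x, 's_y': s_y, 's_x': s_x}
            for func in statfuncs:
                row[func.__name__] = func(sub)   # column named after the function
            rows.append(row)
    return pd.DataFrame(rows)

def img_mean(a):
    return float(np.mean(a))

def img_std(a):
    return float(np.std(a))

img = np.random.default_rng(1).random((40, 40))   # stand-in for a loaded image
df = slice_stats(img, 4, 4, [img_mean, img_std])
print(df.head())
```

Passing the statistics as a list of functions keeps the slicing loop unchanged when a new statistic is added.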


## 6. Test the visualization of stats (interactive graph with image display)

This is based on a matplotlib graph with events hooked up to show the image that corresponds to the datapoint when clicked.

Notes:
- This function is not without issues: it requires a switch (`%matplotlib notebook`) to turn on interactivity.
- Getting it to work sometimes requires restarting the kernel.
- Behavior in e.g. PyCharm is slightly different: the graph only updates when the graph window is rescaled.
- Click the 'standby button' (top-right) to fix the graph into the notebook (if you don't click it, the next graph replaces the one still open).

In [None]:
# need to tell matplotlib it's in a notebook, otherwise interactivity does not work
%matplotlib notebook   
imgutils.plotwithimg(df2, 'img_mean', 'img_std', imgutils.getimgslice)
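
The event hook-up described above can be sketched with matplotlib's pick events. This is a simplified assumption about the mechanism, not the plotwithimg code: `plot_with_callback` is a hypothetical helper, the Agg backend is chosen only so the sketch runs without a display, and the click is simulated with a stand-in event object.

```python
import types

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; a notebook would use an interactive one
import matplotlib.pyplot as plt

def plot_with_callback(xs, ys, on_select):
    """Scatter plot that calls on_select(index) for each picked point (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, picker=True)   # picker=True makes the points clickable

    def on_pick(event):
        # event.ind holds the indices of the points under the mouse click
        for i in event.ind:
            on_select(int(i))

    fig.canvas.mpl_connect('pick_event', on_pick)
    return fig

selected = []
fig = plot_with_callback([0.1, 0.5, 0.9], [0.2, 0.4, 0.8], selected.append)

# Without a GUI there is no real click, so feed a stand-in event through the callback registry.
fake_event = types.SimpleNamespace(ind=[2])
fig.canvas.callbacks.process('pick_event', fake_event)
print(selected)
plt.close(fig)
```

In an interactive session, `on_select` would look up the dataframe row for the picked index and display the matching image.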

#### Click on the point in the lower-right; data point 4 ... is that noise or on a crystal?


## Without context (i.e. the surrounding image), it is still hard to judge the slice!
### So I created an alternative image display, which shows the slice in context

(I modified the interactive graph function plotwithimg so that you can inject a different image display function.)


In [None]:
imgutils.plotwithimg(df2, 'img_mean', 'img_std', imgutils.highlightimgslice, True)

Now it's much clearer what the slice of the data point really is.
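
One simple way to show a slice in context is to draw a border around it on a copy of the full image. The sketch below assumes that approach; `highlight_slice` and its `value` and `thickness` parameters are hypothetical and not taken from imgutils.

```python
import numpy as np

def highlight_slice(img, n_y, n_x, s_y, s_x, value=1.0, thickness=2):
    """Return a copy of img with a border drawn around slice (s_y, s_x) (hypothetical helper)."""
    h, w = img.shape
    sh, sw = h // n_y, w // n_x
    y0, y1 = s_y * sh, (s_y + 1) * sh
    x0, x1 = s_x * sw, (s_x + 1) * sw
    out = img.copy()                 # leave the original image untouched
    t = thickness
    out[y0:y0 + t, x0:x1] = value    # top edge
    out[y1 - t:y1, x0:x1] = value    # bottom edge
    out[y0:y1, x0:x0 + t] = value    # left edge
    out[y0:y1, x1 - t:x1] = value    # right edge
    return out

img = np.zeros((40, 40))             # stand-in for a loaded image
marked = highlight_slice(img, 4, 4, s_y=1, s_x=2)
print(int(marked.sum()))             # number of border pixels
```

Because the display function takes the same slice index as the plain slice display, either one can be passed to the interactive graph as a callback.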

## 7. Normalization
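The data should actually be normalized to reasonable values.
A common way is 'standardization': subtract the column mean and divide by the column standard deviation, so that each column ends up with mean 0 and standard deviation 1 (see https://en.wikipedia.org/wiki/Normalization_(statistics) ).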

In [None]:
imgutils.normalize(df2,['img_min'])
df2.head(3)

In [None]:
# check that the standardized column indeed has a mean of 0 and a std_dev of 1:
print(df2['|img_min|'].mean())
print(df2['|img_min|'].std())

Ok (apart from some rounding)
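
For reference, standardization with pandas can be sketched as below. This is a hypothetical re-implementation, not the imgutils code: the `|col|` naming follows the column names seen above, but the use of the sample standard deviation (pandas default, ddof=1) is an assumption.

```python
import pandas as pd

def normalize(df, columns):
    """Add a standardized '|col|' column for each name in columns (hypothetical re-implementation)."""
    for col in columns:
        mean = df[col].mean()
        std = df[col].std()          # pandas default: sample standard deviation (ddof=1)
        df['|' + col + '|'] = (df[col] - mean) / std
    return df

df = pd.DataFrame({'img_min': [1.0, 2.0, 3.0, 4.0, 5.0]})
normalize(df, ['img_min'])
print(df['|img_min|'].mean(), df['|img_min|'].std())
```

If a normalization function divides by the population standard deviation (ddof=0) instead, checking the result with the pandas `std()` default gives a value slightly different from 1.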

Now apply it to the other columns

In [None]:
imgutils.normalize(df2, ['img_max', 'img_mean', 'img_std'])
df2.head(3)


Plot the normalized version

In [None]:
imgutils.plotwithimg(df2, '|img_mean|', '|img_std|', imgutils.highlightimgslice, True)

## 8. Play more with this test dataset...
Instead of the test functions, let's use the imgutils functions directly.

In [None]:
statfuncs = [imgutils.img_min, imgutils.img_max, imgutils.img_range, imgutils.img_mean, imgutils.img_std]
df_imgfiles = imgutils.scanimgdir('', '.tif')
imgfiles = list(df_imgfiles['filename'])
df3 = imgutils.slicestats(imgfiles, 4, 4, statfuncs)
df3.head()

In [None]:
imgutils.normalize(df3, ['img_min', 'img_max', 'img_range', 'img_mean', 'img_std'])
df3.head(2)

Let's do a 'pair-plot' to see if there is something obvious.

In [None]:
import seaborn as sb

In [None]:
sb.pairplot(df3, vars=['|img_min|','|img_max|', '|img_range|','|img_mean|', '|img_std|'])

### Let's inspect some combinations that have 'signs of clustering' in the interactive graph

In [None]:
imgutils.plotwithimg(df3, '|img_mean|', '|img_range|', imgutils.highlightimgslice, True)

In [None]:
imgutils.plotwithimg(df3, '|img_mean|', '|img_std|', imgutils.highlightimgslice, True)

In [None]:
imgutils.plotwithimg(df3, '|img_range|', '|img_std|', imgutils.highlightimgslice, True)

## 9. Conclusions
- Built a number of infrastructural functions for the data engineering and exploration.
- This notebook demonstrates how to use these functions.
- It also shows, using the test images, that the concept of using simple statistics on sub-images to reveal particles looks promising!


## 10. Next steps: Try this out on a larger set!



Michael Janus, 14 June 2018