# Dataset Simple Filter Creation
This section establishes a simple filter, based on the mean and standard deviation of the pixel values of the tiled slides, that sorts the generated tiles into GOOD, PARTIAL, or BLANK according to how much tissue they contain. The filter computes the mean and standard deviation of the pixel values of each tile, then averages those values over all tiles in a category. Note: The initial data used was manually separated into GOOD, PARTIAL, and BLANK folders.
#### Category
* Good - Tiles that have greater than 80% tissue
* Partial - Tiles that have some tissue, but less than 80%
* Blank - Tiles that have no tissue at all

#### Importing Required Libraries

In [21]:
import os
openslide_path = r"C:\Users\aaron\openslide-win64-20171122\openslide-win64-20171122\bin"
os.environ['PATH'] = openslide_path + ";" + os.environ['PATH']
from openslide import open_slide
import openslide
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt
import tifffile as tiff
import glob
%matplotlib inline

#### Loading in the dataset
The dataset was loaded from a tiled whole slide image whose tiles were manually separated into Good, Partial, and Blank folders. The glob library was used to list all the .tiff files inside each of the three category folders. Note: This could also be done with `os.walk`, similar to what was done in Section 1.

In [29]:
good_dir = (glob.glob("C:/Users/aaron/Image_3/Good/*.tiff")) # File paths of tiles manually separated as Good (>80% tissue in the image)
blank_dir = (glob.glob("C:/Users/aaron/Image_3/Blank/*.tiff")) # File paths of tiles manually separated as Blank (little to no tissue in the image)
partial_dir = (glob.glob("C:/Users/aaron/Image_3/Partial/*.tiff")) # File paths of tiles manually separated as Partial (<80% tissue in the image)
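
As a rough sketch of the `os.walk` alternative mentioned in the note above, the helper below collects TIFF paths recursively. The function name `list_tiff_files` and its `extensions` parameter are hypothetical and not part of the original notebook; only `os.walk` and `os.path.join` from the standard library are used.

```python
import os

def list_tiff_files(root_dir, extensions=(".tiff", ".tif")):
    """Recursively collect TIFF file paths under root_dir (hypothetical helper)."""
    matches = []
    # os.walk yields (directory path, subdirectory names, file names) for every folder
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Case-insensitive extension check so ".TIFF" files are also found
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Unlike the `glob` pattern used above, this also descends into nested subfolders, so pointing it at the top-level image folder would return Good, Partial, and Blank tiles mixed together; it should be called once per category folder to keep the lists separate.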

#### Getting the Pixel Values of the Image
Since we want to estimate the amount of tissue in each tiled image, one approach is to use the pixel values. Blank tiles should have a higher mean and a lower standard deviation: white pixels have high values, and the values vary little because nearly everything is white. Good tiles, on the other hand, should have a lower mean, because tissue pixels are darker, and a higher standard deviation, because the tile spans a much wider range of pixel values.
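
The expected contrast can be illustrated with synthetic data. The sketch below builds two made-up 8-bit arrays (they are not real tiles, and the value ranges are assumptions chosen only for illustration): one near-white, and one spread over a wider, darker range.

```python
import numpy as np

rng = np.random.default_rng(0)  # Fixed seed so the sketch is repeatable

# Synthetic "blank" tile: near-white values only (240 to 255)
blank_tile = rng.integers(240, 256, size=(256, 256, 3), dtype=np.uint8)

# Synthetic "tissue" tile: values spread over a wider, darker range (60 to 255)
tissue_tile = rng.integers(60, 256, size=(256, 256, 3), dtype=np.uint8)

for name, tile in [("blank", blank_tile), ("tissue", tissue_tile)]:
    # mean() and std() are taken over every pixel and channel of the array
    print(f"{name}: mean={tile.mean():.1f}, std={tile.std():.1f}")
```

The near-white array has the higher mean and the lower standard deviation, which is the pattern the filter relies on.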

In [24]:
def mean_std_pixel(img_list): # Returns the per-image mean and standard deviation of pixel values
    avg_px_value = [] # Blank list that will hold the mean of each image
    std_px_value = [] # Blank list that will hold the standard deviation of each image
    for file in img_list: # For loop that goes over the list of tiff files from the step above
        image = tiff.imread(file) # Reading the tiff file into a NumPy array
        avg = image.mean() # Getting the mean of pixel values of the image (all pixels and channels)
        std = image.std() # Getting the standard deviation of pixel values of the image
        avg_px_value.append(avg) # Putting the gathered mean of the image into the list
        std_px_value.append(std) # Putting the gathered standard deviation of the image into the list
        # The lists above are filled with one value per image
    avg_px_value = np.array(avg_px_value) # Converting the list into an array
    std_px_value = np.array(std_px_value) # Converting the list into an array
    print("Average pixel value for all images is: ",avg_px_value.mean()) # Mean of the per-image means
    print("Average standard deviation of pixel values for all images is: ",std_px_value.mean()) # Mean of the per-image standard deviations
    return(avg_px_value,std_px_value)

#### Good Images
The mean and standard deviation of the good images are computed with the function defined above.

In [26]:
goodmean,goodstd = mean_std_pixel(good_dir) # Getting the mean and std from good images

Average pixel value for all images is:  169.85473363017033
Average standard deviation of pixel values for all images is:  72.60558665914053


In [None]:
gmean = np.mean(goodmean) # Getting the average value for mean
gstd = np.mean(goodstd) # Getting the average value for std
print("Average pixel value",gmean) # Results
print("Average Standard Deviation",gstd) # Results

#### Blank Images

In [30]:
meanblank,stdblank = mean_std_pixel(blank_dir) # Getting the mean and std from blank images

Average pixel value for all images is:  244.32868270496706
Average standard deviation of pixel values for all images is:  6.770464078397775


In [None]:
bmean = np.mean(meanblank) # Getting the average value for mean
bstd = np.mean(stdblank) # Getting the average value for std
print("Average pixel value",bmean) # Results
print("Average Standard Deviation",bstd) # Results

#### Partial Images

In [None]:
meanpartial,stdpartial = mean_std_pixel(partial_dir) # Getting the mean and std from partial images

In [None]:
ptmean = np.mean(meanpartial) # Getting the average value for mean
ptstd = np.mean(stdpartial) # Getting the average value for std
print("Mean of partial images =",ptmean) # Results
print("Standard Deviation of partial images =",ptstd) # Results
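
One way the per-category statistics could be turned into a filter is a pair of threshold checks on each tile's mean and standard deviation. The sketch below is an assumption about how such a filter might look, not the notebook's own implementation: the function `classify_tile`, its parameter names, and the values in `PLACEHOLDER_THRESHOLDS` are all hypothetical, and the cut-offs would need to be chosen from the Good, Blank, and Partial statistics gathered above.

```python
import numpy as np

# Placeholder cut-offs for illustration only; they are NOT derived from the dataset
PLACEHOLDER_THRESHOLDS = {
    "blank_mean_min": 235.0,  # Tiles at least this bright may be blank
    "blank_std_max": 15.0,    # ...if their pixel values also vary this little
    "good_mean_max": 190.0,   # Tiles at most this bright may be good
    "good_std_min": 50.0,     # ...if their pixel values also vary this much
}

def classify_tile(image, blank_mean_min, blank_std_max, good_mean_max, good_std_min):
    """Label a tile array as BLANK, GOOD, or PARTIAL from its mean and std (hypothetical)."""
    mean = float(np.mean(image))
    std = float(np.std(image))
    if mean >= blank_mean_min and std <= blank_std_max:
        return "BLANK"    # Bright and uniform: mostly background
    if mean <= good_mean_max and std >= good_std_min:
        return "GOOD"     # Darker and varied: mostly tissue
    return "PARTIAL"      # Everything in between

# Usage with a synthetic all-white array rather than a real tile
label = classify_tile(np.full((8, 8), 250, dtype=np.uint8), **PLACEHOLDER_THRESHOLDS)
```

Treating PARTIAL as the fallback keeps the rule simple, but it also means any tile that fails both checks for other reasons (for example, a dark but uniform artifact) would land in that category.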