## 3. Read images, predict and save to specific folders
Author: Gayathri Nadar, SCF MPI-CBG <br>
Date: 2021-10-04 <br>
For: Rudrarup <br>
Lab: Tang <br>

### About 
- This notebook is a part of 3 notebooks in sequence:
    * 1_read_images
    * 2_train_save_classifier 
    * 3_do_prediction 
- This notebook **reads images from a path specified by the user and applies a trained classifier model to each image to make a prediction.**
- A new folder called **predictions** is created next to (not inside) the folder specified by the user.
- A table is created to save the output. The table contains three columns: `Path`, `Image name`, `Predicted class`

### Preparations
Data:

- **Add all images to be classified into ONE folder**. It could be on the project space!
- **If folder with images is on the project space, mount the server space `tanglab-tds` on your laptop** before running this notebook. 

Python:

- **Keep this jupyter notebook and `functions.py` together!**
- Set up Python, Jupyter, and the conda environment. See the document 'Readme_Python_installation'.
- **When you open this notebook, click on Kernel > Change kernel and change it to 'imageclassification'.**

### General Jupyter Notebook Usage: The basics
- The notebook is executed cell-by-cell.
- The active/current cell is visible by being surrounded by a blue or green box.
- Green box: edit mode. Blue box: view mode. To exit the edit mode: press Esc
- To run a cell: Press Shift+Enter (runs the cell and automatically advances to the next cell)
- Auto-complete is supported (convenient when you e.g. type a folder location): start typing, then press Tab key.
- When you are completely finished: Click on Kernel -> Shutdown and close all notebook-related tabs.
- **If you want to do a fresh start of the analysis: click on Kernel -> Restart and Clear Output**. Do this if your notebook seems to have hung. 

### Usage of this notebook:
- Start at the top.
- Run cells step-by-step.
- For cells titled **"User Input"**, adjust the content (data folder etc.) before running them.
- Note: if you accidentally ran such a cell already, simply click on it, adjust the content, and run it again.

### Current workflow 
User Input:

- Folder containing all images to be classified. 
- Folder containing the classifier file. 
- Name of the classifier file with extension (in case several are present in the folder).

Steps:

- Images and image names are read and added to a list.
- Classifier is loaded.
- For every image in the list, features are computed and the classifier is applied to them to make a prediction.
- A table is created to save the output. The table contains three columns: `Path`, `Image name`, `Predicted class`
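How the computed features reach the classifier can be sketched as follows. This is a minimal illustration, assuming the classifier follows the scikit-learn convention of taking a 2D array with one row per image. The three arrays are placeholders with made-up lengths, not the real output of the feature functions in `functions.py`.

```python
import numpy as np

# Placeholder feature vectors (hypothetical lengths, for illustration only).
fv_histogram = np.zeros(8)
fv_haralick = np.ones(4)
fv_hu_moments = np.full(3, 2.0)

# Concatenate the per-image features into one flat vector ...
feature_vector = np.hstack([fv_histogram, fv_haralick, fv_hu_moments])

# ... and reshape it to a single row: (1, n_features).
single_sample = feature_vector.reshape(1, -1)

print(feature_vector.shape)
print(single_sample.shape)
```

The single-row array is what gets passed to `predict`, which returns one label per row.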

### Output 
Found in the folder `predictions`
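The location of `predictions` is derived from the image folder entered by the user. Below is a sketch of one way to compute such a sibling folder so that the result does not depend on a trailing slash in the path. `sibling_predictions_folder` is a hypothetical helper written for this illustration and is not part of `functions.py`.

```python
import os


def sibling_predictions_folder(path_images):
    """Return the path of a 'predictions' folder next to the image folder."""
    # normpath removes a trailing separator, so "dir/" and "dir" behave alike
    image_folder = os.path.normpath(path_images)
    parent = os.path.dirname(image_folder)
    return os.path.join(parent, "predictions")


print(sibling_predictions_folder("../data/testing1/"))
print(sibling_predictions_folder("../data/testing1"))
```

Both calls give the same folder, located in the parent directory of `testing1`.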


### Prep: Always run this cell

In [1]:
import numpy as np 
import os
import matplotlib.pyplot as plt
import pickle
import skimage 
from skimage import data, segmentation, feature, future
from functools import partial
from skimage.transform import resize
from pathlib import Path
from functions import * 
import cv2  # cv2.resize is called in the prediction loop; import it explicitly
import tifffile
import shutil
import csv, datetime

### User Input 

1. Enter the path to the folder which contains the images you want to classify. ***Note: ALL `.tif` FILES IN THIS FOLDER AND ITS SUBFOLDERS WILL BE READ AND CLASSIFIED!*** Make sure to arrange your data properly!
2. Enter the path to the folder which contains the classifier file (`.pkl` file).
3. If `display_predictions` is set to `True`, the image and its prediction will be displayed. Caution: this might be slow!
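Because every `.tif` file below the chosen folder is classified, it can help to preview which files would be picked up before starting a long run. The following is a minimal sketch using the same case-sensitive `.tif` check as the prediction loop of this notebook. `list_tif_files` is a hypothetical helper, and the demo runs on a temporary folder with invented file names.

```python
import os
import tempfile


def list_tif_files(folder):
    """Collect paths of all files ending in '.tif' below `folder`, recursively."""
    found = []
    for root, dirs, files in os.walk(folder):
        for name in files:
            if name.endswith(".tif"):  # case-sensitive: 'x.TIF' or 'x.tiff' is skipped
                found.append(os.path.join(root, name))
    return sorted(found)


# Demo on a throwaway folder tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["a.tif", os.path.join("sub", "b.tif"), "notes.txt"]:
        open(os.path.join(tmp, rel), "w").close()
    demo_files = [os.path.relpath(p, tmp) for p in list_tif_files(tmp)]

print(demo_files)
```

To check real data, call `list_tif_files(path_images)` after setting `path_images` in the cell below.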

In [6]:
# enter values here 
path_images = "../data/testing1/"
path_classifier = "../data/classifierfile/"
classifier_name_with_extension = "classifier_final.pkl"
display_predictions = False 

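# nothing to change below this line
# target size every image is resized to before feature extraction
shape = (512, 512)

# folder to save prediction output: a sibling of the image folder
# (normpath strips a trailing slash, so both "dir/" and "dir" work)
predictions_output = os.path.join(os.path.dirname(os.path.normpath(path_images)), "predictions")

# create the folder if it does not exist yet
os.makedirs(predictions_output, exist_ok=True)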

### Prep: Always run this cell

In [7]:
# load classifier
with open(os.path.join(path_classifier, classifier_name_with_extension), 'rb') as fid:
    randomforest_classifier = pickle.load(fid)

### Read images, apply loaded model, predict

In [8]:
imagenames = []
imagepaths = []
predictions = []

print("Starting predictions on ALL images in folder: ", path_images)
print("This might take time...\n")

for root, dirs, files in os.walk(path_images):
    for f in files:
        if f.endswith(".tif"):
            filepath = os.path.join(root, f)
            img = tifffile.imread(filepath)
            
            # resize image to the common shape, then compute features
            image_reshaped = cv2.resize(img, dsize=shape, interpolation=cv2.INTER_CUBIC)
            # alternative (disabled): feature_img = getMultiscaleFeature(image_reshaped)
            
            fv_hu_moments = getMomentsFeature(image_reshaped)
            fv_haralick   = getHaralickFeature(image_reshaped)
            fv_histogram  = getHistogramFeature(image_reshaped)
            feature_img = np.hstack([fv_histogram, fv_haralick, fv_hu_moments])
            
            # apply model and predict
            # predict() gets one row per image: array of shape (1, n_features)
            print("Predicting image: ", filepath)
            prediction = randomforest_classifier.predict(feature_img.reshape(1, -1))[0]
            print("Prediction:" , prediction, "\n")
            
            if display_predictions:
                plt.figure(figsize=(2, 2))
                plt.imshow(img, interpolation='nearest', cmap='gray')
                plt.title(f'Prediction: {prediction}')
                plt.show()
                print("\n")
    
            imagenames.append(os.path.basename(filepath))
            imagepaths.append(root)
            predictions.append(prediction)
            del img 
            del feature_img
            
print("Done")

Starting predictions on ALL images in folder:  ../data/testing1/
This might take time...

Predicting image:  ../data/testing1/001RB201016A-pR-ATP-00h_G06.d2_T0001F001L01A02Z04C01.tif
Prediction: aggregates 

Predicting image:  ../data/testing1/005RB210727A-PolyAr-NADHbcst_E05.c3_T0001F001L01A01Z01C01.tif
Prediction: droplet 

Predicting image:  ../data/testing1/001RB201016A-pR-ATP-00h_G06.d2_T0001F001L01A02Z05C01.tif
Prediction: aggregates 

Predicting image:  ../data/testing1/001RB201016A-pR-ATP-00h_G06.d2_T0001F001L01A02Z03C01.tif
Prediction: aggregates 

Done


### Save CSV file with predictions

- Columns: `Path`, `Image name`, `Predicted class` (tab-separated)
- Output is found in the folder `predictions`, located next to the folder containing the images for prediction (set by user).
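Once the table is written, it can be read back to get a quick overview of the results. The following is a sketch under the assumption that the file is tab-delimited with the three column names listed above. `count_predicted_classes` is a hypothetical helper, and the demo table contains invented rows, not real predictions.

```python
import csv
import os
import tempfile
from collections import Counter


def count_predicted_classes(table_path):
    """Count how many images were assigned to each predicted class."""
    with open(table_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return Counter(row["Predicted class"] for row in reader)


# Demo with a small made-up table in the same layout (file names are invented).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_prediction.csv")
    with open(demo_path, "w", newline="") as f:
        w = csv.writer(f, delimiter="\t")
        w.writerow(["Path", "Image name", "Predicted class"])
        w.writerow(["demo_folder", "img_1.tif", "aggregates"])
        w.writerow(["demo_folder", "img_2.tif", "droplet"])
        w.writerow(["demo_folder", "img_3.tif", "aggregates"])
    demo_counts = count_predicted_classes(demo_path)

print(demo_counts)
```

When opening the file in a spreadsheet program, choose tab as the separator, since the `.csv` extension alone may lead the program to expect commas.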


In [9]:
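# timestamp in the file name, so runs in different minutes do not overwrite each other
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M")

# newline='' is recommended by the csv module to avoid blank lines on Windows
with open(os.path.join(predictions_output, ts + "_prediction.csv"), 'w', newline='') as out_f:
    # set tab as delimiter and add header
    w = csv.writer(out_f, delimiter='\t')
    w.writerow(["Path", "Image name", "Predicted class"])

    # one row per classified image
    for p, n, out in zip(imagepaths, imagenames, predictions):
        w.writerow([p, n, out])

print("DONE")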

DONE
