## Score a new selfie image and evaluate the acne severity of the face
Given a set of selfie images, this notebook scores the acne severity of each face. Every skin patch extracted from a selfie is scored, and the average patch score is used as the image label.

The scoring pipeline has the following four steps:
1. Extract skin patches
2. Predict the label for each skin patch
 *  Extract features from the CNTK pretrained model
 *  Score the features using the trained fully connected neural network model
3. Infer the whole-face label from the predicted labels of the skin patches of that selfie. The inferred labels of all selfie images in the test directory are written to a csv file.
4. (Optional) If you also have a csv file with the ground truth labels of the test images, compare the predicted labels with the ground truth labels and calculate the RMSE on the golden set.

***Note***: With the model trained in Step 3, using random seed = 5 both for splitting the training images into training and validation sets and for initializing the weights of the neural network model, you should get ***RMSE = 0.4819*** on the golden set images.

## Prerequisites

### Test images
- Put the test images in a directory on the machine and provide the path to this directory in the parameter img\_path.
- (***Optional***) Put the ground truth labels of the test images in a csv file with a header line and two columns: Image_Name, Ground_Truth. Provide the path to this file in the parameter test\_ground\_truth. If that parameter is None, the RMSE calculation on the test images is skipped.
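
As an illustration of the ground truth file layout described above, here is a minimal sketch that parses such a csv file into a dictionary. The helper name `read_ground_truth` and the image names in the example are hypothetical; only the header line and the two columns come from this notebook.

```python
import csv
import io

def read_ground_truth(csv_file):
    """Return {image name: ground truth label} from a csv file object with a header line."""
    # skipinitialspace tolerates a header written as "Image_Name, Ground_Truth"
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    return {row["Image_Name"].strip(): float(row["Ground_Truth"]) for row in reader}

# Hypothetical file contents, for illustration only.
example = io.StringIO("Image_Name, Ground_Truth\nimg_a.jpg, 3\nimg_b.jpg, 2.5\n")
labels = read_ground_truth(example)
print(labels)
```

In practice the file object would come from `open(test_ground_truth, newline='')` rather than an in-memory string.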

### Python, Python libraries and self-defined Python Script
- Python 3.5 or later
- CNTK, PIL, NumPy, pandas, OpenCV (cv2)
- getPatches.py to extract patches from a selfie (modified from ***[Step 1. Extract Forehead, cheeks, and chin skin patches from raw images using facial landmark model and One Eye model](../01_DataPrep/Step 1. Extract Forehead, cheeks, and chin skin patches from raw images using facial landmark model and One Eye model.ipynb)***)
***Note***: You need to modify the paths to the pretrained landmark model and the Cascade Eye model in the getPatches.py file.

### Pretrained models
- [Frontal face landmark model](https://github.com/AKSHAYUBHAT/TensorFace/blob/master/openface/models/dlib/shape_predictor_68_face_landmarks.dat) in the same directory as this jupyter notebook.
- [One Eye model](https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_eye.xml) in the same directory as this jupyter notebook.
- [ResNet152\_ImageNet\_Caffe.model](https://www.cntk.ai/Models/Caffe_Converted/ResNet152_ImageNet_Caffe.model)
- Trained fully connected neural network regression model from ***[Step 3. Training_Pipeline](Step3_Training_Pipeline.ipynb)***


## Parameters

In [1]:
pretrained_model_name = 'ResNet152_ImageNet_Caffe.model'
pretrained_model_path = '../models'
pretrained_node_name = 'pool5' 

label_mapping = {1: '1-Clear', 2: '2-Almost Clear', 3: '3-Mild', 4: '4-Moderate', 5: '5-Severe'}

img_path = '<path to test images>'
test_ground_truth = '<csv file for ground truth of test images>' #If None, calculating RMSE on test set will be skipped
result_file = '<csv file to store predicted labels of test images>'
patch_path = '<path to the directory where extracted skin patches are saved>'
regression_model_path = '../models/cntk_regression.dat'
eye_cascade_model = '../models/haarcascade_eye.xml'

image_height = 224 # Here are the image height and width that the skin patches of the testing selfie are going to be resized to.
image_width  = 224 # They have to be the same as the ResNet-152 model requirement.
num_channels = 3

In [2]:
from __future__ import print_function
import os
from os import listdir
from os.path import join, isfile, splitext
import numpy as np
import pandas as pd
import cntk as C
from PIL import Image
import pickle
import time
import json
from cntk import load_model, combine
import cntk.io.transforms as xforms
from cntk.logging import graph
from cntk.logging.graph import get_node_outputs
import getPatches
import cv2

## Extract Skin Patches

In the scoring pipeline, the first step is to extract forehead, left cheek, right cheek, and chin skin patches from the original selfie images. The skin patch images are saved in the directory specified by the variable patch\_path.

During skin patch extraction, a facial landmark model is applied first. If this model detects a face, the skin patches are extracted from the landmarks. If it does not, the One Eye model is applied to locate a single eye, and the forehead and cheek are then extracted based on the eye location.

If neither the landmark model nor the One Eye model works, the entire selfie is used to make predictions.

The landmark model needs to see both eyes open and facing the camera. The One Eye model works when only one open eye is identified. Skin patches can be extracted more accurately with the landmark model, so, if possible, we encourage users to face the camera directly with both eyes open.

Keep in mind that, depending on the angle of the face to the camera, this step might yield 1, 2, 3, or 4 skin patch images per selfie.
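
The fallback order described above can be sketched as follows. This is not the code in getPatches.py: the function `choose_patches` and its two detector arguments are hypothetical stand-ins, assumed to return a dictionary of patches or an empty result when detection fails.

```python
def choose_patches(image, detect_with_landmarks, detect_with_one_eye):
    """Return (method, patches) using the landmark -> one eye -> whole image fallback."""
    # 1. Prefer the landmark model: it can yield up to four patches.
    patches = detect_with_landmarks(image)
    if patches:
        return "landmark", patches
    # 2. Otherwise fall back to the single-eye detector.
    patches = detect_with_one_eye(image)
    if patches:
        return "one_eye", patches
    # 3. If both fail, score the entire selfie as one patch.
    return "whole_image", {"whole": image}

# Stub detectors for illustration: the landmark model fails, the eye model succeeds.
method, patches = choose_patches(
    "selfie",
    detect_with_landmarks=lambda img: {},
    detect_with_one_eye=lambda img: {"forehead": "fh", "cheek": "ck"},
)
print(method, sorted(patches))
```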


In [4]:
# get the dimension of each patch of images in the testing image directory
dimension_dict = dict()
imageFiles = [f for f in listdir(img_path) if isfile(join(img_path, f))]
eye_cascade =  cv2.CascadeClassifier(eye_cascade_model)
for imagefile in imageFiles:
    dim = getPatches.extract_patches(join(img_path, imagefile), {}, patch_path, eye_cascade) #extract_patches function is defined in getPatches.py
    dimension_dict[imagefile] = dim

forehead dim (x_min, x_max, y_min, y_max): 709,1304, 637, 882
chin dim (x_min, x_max, y_min, y_max): 911,1101, 1541, 1700
right cheek dim (x_min, x_max, y_min, y_max): 682,805, 1118, 1640
left cheek dim (x_min, x_max, y_min, y_max): 1101,1343, 1111, 1640
forehead dim (x_min, x_max, y_min, y_max): 1866,3253, 403, 860
chin dim (x_min, x_max, y_min, y_max): 2349,2790, 2090, 2385
right cheek dim (x_min, x_max, y_min, y_max): 1790,2127, 1258, 2270
left cheek dim (x_min, x_max, y_min, y_max): 2790,3377, 1248, 2269
forehead dim (x_min, x_max, y_min, y_max): 862,2062, 1277, 1690
chin dim (x_min, x_max, y_min, y_max): 1266,1656, 2755, 3069
right cheek dim (x_min, x_max, y_min, y_max): 827,1074, 1928, 2953
left cheek dim (x_min, x_max, y_min, y_max): 1656,2129, 1931, 2953
forehead dim (x_min, x_max, y_min, y_max): 1055,1678, 614, 862
chin dim (x_min, x_max, y_min, y_max): 1313,1504, 1500, 1690
right cheek dim (x_min, x_max, y_min, y_max): 849,1212, 1043, 1624
...

forehead dim (x_min, x_max, y_min, y_max): 1379,2650, 605, 1095
chin dim (x_min, x_max, y_min, y_max): 1821,2246, 2403, 2729
right cheek dim (x_min, x_max, y_min, y_max): 1261,1605, 1488, 2603
left cheek dim (x_min, x_max, y_min, y_max): 2246,2664, 1544, 2608
forehead dim (x_min, x_max, y_min, y_max): 427,1214, 513, 777
chin dim (x_min, x_max, y_min, y_max): 711,938, 1466, 1659
right cheek dim (x_min, x_max, y_min, y_max): 383,582, 980, 1589
left cheek dim (x_min, x_max, y_min, y_max): 938,1275, 964, 1588
forehead dim (x_min, x_max, y_min, y_max): 351,1068, 541, 790
chin dim (x_min, x_max, y_min, y_max): 601,820, 1432, 1622
right cheek dim (x_min, x_max, y_min, y_max): 301,488, 973, 1556
left cheek dim (x_min, x_max, y_min, y_max): 820,1124, 963, 1555
forehead dim (x_min, x_max, y_min, y_max): 2151,3436, 896, 1340
chin dim (x_min, x_max, y_min, y_max): 2588,2996, 2446, 2821
right cheek dim (x_min, x_max, y_min, y_max): 2044,2393, 1677, 2704
...

forehead dim (x_min, x_max, y_min, y_max): 2248,3454, 878, 1262
chin dim (x_min, x_max, y_min, y_max): 2685,3037, 2209, 2544
right cheek dim (x_min, x_max, y_min, y_max): 2143,2483, 1545, 2442
left cheek dim (x_min, x_max, y_min, y_max): 3037,3531, 1557, 2443
forehead dim (x_min, x_max, y_min, y_max): 531,1664, 439, 879
chin dim (x_min, x_max, y_min, y_max): 956,1326, 2019, 2348
right cheek dim (x_min, x_max, y_min, y_max): 436,765, 1258, 2237
forehead dim (x_min, x_max, y_min, y_max): 264,698, 263, 427
chin dim (x_min, x_max, y_min, y_max): 399,549, 852, 975
right cheek dim (x_min, x_max, y_min, y_max): 249,334, 574, 934
left cheek dim (x_min, x_max, y_min, y_max): 549,722, 572, 934
forehead dim (x_min, x_max, y_min, y_max): 499,1371, 651, 964
chin dim (x_min, x_max, y_min, y_max): 807,1075, 1740, 2008
right cheek dim (x_min, x_max, y_min, y_max): 476,667, 1184, 1924
left cheek dim (x_min, x_max, y_min, y_max): 1075,1395, 1197, 1925
...

forehead dim (x_min, x_max, y_min, y_max): 255,700, 251, 414
chin dim (x_min, x_max, y_min, y_max): 400,550, 819, 958
right cheek dim (x_min, x_max, y_min, y_max): 221,335, 556, 917
left cheek dim (x_min, x_max, y_min, y_max): 550,727, 559, 917
forehead dim (x_min, x_max, y_min, y_max): 477,1406, 511, 858
chin dim (x_min, x_max, y_min, y_max): 776,1088, 1803, 2015
right cheek dim (x_min, x_max, y_min, y_max): 480,626, 1156, 1928
left cheek dim (x_min, x_max, y_min, y_max): 1088,1449, 1148, 1927
forehead dim (x_min, x_max, y_min, y_max): 520,1702, 661, 1086
chin dim (x_min, x_max, y_min, y_max): 916,1302, 2247, 2504
right cheek dim (x_min, x_max, y_min, y_max): 502,716, 1463, 2398
left cheek dim (x_min, x_max, y_min, y_max): 1302,1796, 1446, 2396
forehead dim (x_min, x_max, y_min, y_max): 556,1750, 614, 1049
chin dim (x_min, x_max, y_min, y_max): 946,1315, 2188, 2502
right cheek dim (x_min, x_max, y_min, y_max): 516,732, 1401, 2390
...

forehead dim (x_min, x_max, y_min, y_max): 626,1329, 652, 895
chin dim (x_min, x_max, y_min, y_max): 873,1074, 1546, 1707
right cheek dim (x_min, x_max, y_min, y_max): 586,757, 1108, 1646
left cheek dim (x_min, x_max, y_min, y_max): 1074,1372, 1108, 1646
forehead dim (x_min, x_max, y_min, y_max): 307,681, 267, 401
chin dim (x_min, x_max, y_min, y_max): 427,550, 745, 848
right cheek dim (x_min, x_max, y_min, y_max): 292,373, 506, 813
left cheek dim (x_min, x_max, y_min, y_max): 550,703, 502, 813
forehead dim (x_min, x_max, y_min, y_max): 296,651, 280, 406
chin dim (x_min, x_max, y_min, y_max): 414,529, 728, 826
right cheek dim (x_min, x_max, y_min, y_max): 284,362, 515, 794
left cheek dim (x_min, x_max, y_min, y_max): 529,680, 503, 793
forehead dim (x_min, x_max, y_min, y_max): 300,680, 239, 374
chin dim (x_min, x_max, y_min, y_max): 424,550, 728, 825
right cheek dim (x_min, x_max, y_min, y_max): 279,367, 480, 790
left cheek dim (x_min, x_max, y_min, y_max): 550,702, 478, 789
...

forehead dim (x_min, x_max, y_min, y_max): 449,1436, 679, 979
chin dim (x_min, x_max, y_min, y_max): 809,1098, 1784, 1982
right cheek dim (x_min, x_max, y_min, y_max): 403,657, 1222, 1904
left cheek dim (x_min, x_max, y_min, y_max): 1098,1479, 1199, 1902
forehead dim (x_min, x_max, y_min, y_max): 457,1437, 654, 963
chin dim (x_min, x_max, y_min, y_max): 815,1105, 1778, 1994
right cheek dim (x_min, x_max, y_min, y_max): 406,662, 1210, 1914
left cheek dim (x_min, x_max, y_min, y_max): 1105,1481, 1185, 1911
forehead dim (x_min, x_max, y_min, y_max): 489,1479, 654, 960
chin dim (x_min, x_max, y_min, y_max): 848,1137, 1767, 1981
right cheek dim (x_min, x_max, y_min, y_max): 445,697, 1192, 1900
left cheek dim (x_min, x_max, y_min, y_max): 1137,1508, 1192, 1900
forehead dim (x_min, x_max, y_min, y_max): 519,1512, 724, 1022
chin dim (x_min, x_max, y_min, y_max): 869,1166, 1831, 2017
right cheek dim (x_min, x_max, y_min, y_max): 461,716, 1269, 1940
...

forehead dim (x_min, x_max, y_min, y_max): 501,1429, 646, 966
chin dim (x_min, x_max, y_min, y_max): 815,1098, 1821, 2033
right cheek dim (x_min, x_max, y_min, y_max): 479,667, 1208, 1949
left cheek dim (x_min, x_max, y_min, y_max): 1098,1456, 1204, 1948
forehead dim (x_min, x_max, y_min, y_max): 491,1443, 671, 981
chin dim (x_min, x_max, y_min, y_max): 823,1113, 1826, 2015
right cheek dim (x_min, x_max, y_min, y_max): 481,677, 1201, 1932
left cheek dim (x_min, x_max, y_min, y_max): 1113,1455, 1203, 1932
forehead dim (x_min, x_max, y_min, y_max): 519,1420, 677, 990
chin dim (x_min, x_max, y_min, y_max): 832,1110, 1833, 2034
right cheek dim (x_min, x_max, y_min, y_max): 515,692, 1245, 1953
left cheek dim (x_min, x_max, y_min, y_max): 1110,1469, 1221, 1951
forehead dim (x_min, x_max, y_min, y_max): 440,1474, 584, 941
chin dim (x_min, x_max, y_min, y_max): 799,1112, 1893, 2132
right cheek dim (x_min, x_max, y_min, y_max): 426,632, 1211, 2038
...

## Predict Image Patch Label

### Load CNTK Pretrained Model

In [3]:
# define pretrained model location, node name
model_file  = os.path.join(pretrained_model_path, pretrained_model_name)
loaded_model  = load_model(model_file)
node_in_graph = loaded_model.find_by_name(pretrained_node_name)
output_nodes  = combine([node_in_graph.owner])

node_outputs = C.logging.get_node_outputs(loaded_model)
for l in node_outputs: 
    if l.name == pretrained_node_name:
        num_nodes = np.prod(np.array(l.shape))
        
print ('the pretrained model is %s' % pretrained_model_name)
print ('the selected layer name is %s and the number of flatten nodes is %d' % (pretrained_node_name, num_nodes))

the pretrained model is ResNet152_ImageNet_Caffe.model
the selected layer name is pool5 and the number of flatten nodes is 2048


In [4]:
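def extract_features(image_path):
    # load the patch as 3-channel RGB and resize it to the ResNet-152 input size
    img = Image.open(image_path).convert('RGB')
    # LANCZOS is the filter formerly exposed as ANTIALIAS, which newer Pillow versions removed
    resized = img.resize((image_width, image_height), Image.LANCZOS)

    # reorder channels from RGB to BGR, then move the channel axis first (HWC -> CHW)
    bgr_image = np.asarray(resized, dtype=np.float32)[..., [2, 1, 0]]
    chw_format = np.ascontiguousarray(np.rollaxis(bgr_image, 2))

    # evaluate the pretrained network up to the selected node
    arguments = {loaded_model.arguments[0]: [chw_format]}
    output = output_nodes.eval(arguments)
    return output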
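
To make the array manipulation in `extract_features` easier to follow, the sketch below applies the same two NumPy steps to a tiny synthetic image. The helper name `to_bgr_chw` is hypothetical. It shows that the channel order is reversed from RGB to BGR and that the channel axis is moved to the front, so the array passed to the model has shape (channels, height, width).

```python
import numpy as np

def to_bgr_chw(rgb_hwc):
    """Convert an RGB array of shape (H, W, 3) to a BGR float32 array of shape (3, H, W)."""
    bgr = np.asarray(rgb_hwc, dtype=np.float32)[..., [2, 1, 0]]  # swap the R and B channels
    return np.ascontiguousarray(np.rollaxis(bgr, 2))             # move channels to axis 0

# A synthetic 2x2 image in which every pixel is pure red in RGB.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255
chw = to_bgr_chw(rgb)
print(chw.shape)      # (3, 2, 2)
print(chw[2, 0, 0])   # 255.0, because red is now the last channel
```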

### Load NN model

In [41]:
#load the stored regression model
read_model = pd.read_pickle(regression_model_path)
regression_model = read_model['model'][0]
train_regression = pickle.loads(regression_model)

### Score Image

In [42]:
# get the score value for each patch
patch_score = dict()
for file in next(os.walk(patch_path))[2]:
    file_path = os.path.join(patch_path, file)
    # extract features from CNTK pretrained model
    score_features = extract_features (file_path)[0].flatten()
    # score the extracted features using trained regression model
    pred_score_label = train_regression.predict(score_features.reshape(1,-1))
    patch_score[file] = float("{0:.2f}".format(pred_score_label[0]))

### Check the predicted labels of skin patches

In [43]:
patch_score

{'0006_chin.jpg': 2.91,
 '0006_fh.jpg': 3.49,
 '0006_lc.jpg': 4.14,
 '0006_rc.jpg': 2.0,
 '0007_chin.jpg': 2.5,
 '0007_fh.jpg': 3.41,
 '0007_lc.jpg': 2.77,
 '0007_rc.jpg': 2.96,
 '0010_chin.jpg': 2.24,
 '0010_fh.jpg': 2.43,
 '0010_lc.jpg': 2.68,
 '0010_rc.jpg': 2.74,
 '0025_chin.jpg': 3.24,
 '0025_fh.jpg': 2.45,
 '0025_rc.jpg': 3.22,
 '0030_chin.jpg': 2.52,
 '0030_fh.jpg': 2.73,
 '0030_lc.jpg': 2.33,
 '0030_rc.jpg': 3.13,
 '0050_chin.jpg': 1.79,
 '0050_fh.jpg': 3.95,
 '0050_rc.jpg': 3.83,
 '0078_chin.jpg': 2.18,
 '0078_fh.jpg': 1.78,
 '0078_lc.jpg': 3.61,
 '0078_rc.jpg': 2.58,
 '0079_chin.jpg': 2.74,
 '0079_fh.jpg': 2.01,
 '0079_lc.jpg': 2.85,
 '0079_rc.jpg': 2.44,
 '0090_chin.jpg': 2.88,
 '0090_fh.jpg': 2.67,
 '0090_lc.jpg': 3.34,
 '0090_rc.jpg': 2.86,
 '0091_chin.jpg': 2.32,
 '0091_fh.jpg': 3.34,
 '0091_lc.jpg': 3.48,
 '0091_rc.jpg': 2.99,
 '0097_chin.jpg': 3.16,
 '0097_fh.jpg': 3.69,
 '0097_lc.jpg': 2.67,
 '0097_rc.jpg': 3.29,
 '0108_chin.jpg': 2.23,
 '0108_fh.jpg': 4.56,
 ...

## Infer Image Label

After the label of each skin patch is predicted, we infer the label of the entire selfie from the predicted skin patch labels. In this application, the average of the predicted patch scores is used as the predicted label of the entire selfie, and the patch with the highest predicted score is recorded as the most severe patch. This logic can be modified further to balance the performance of the model across all 5 levels of acne severity.

In [44]:
# group the patch scores by image; the first four characters of a patch file name are the image id
image_patch_scores = {}
for key in patch_score:
    image_id = key[0:4]
    image_patch_scores_i = image_patch_scores.get(image_id, {"patch_name":[], "patch_score":[]})
    image_patch_scores_i["patch_name"].append(key)
    image_patch_scores_i["patch_score"].append(patch_score[key])
    image_patch_scores[image_id] = image_patch_scores_i

# write the average patch score and the most severe patch of each image to the result file
with open(result_file, 'w') as fp:
    fp.write("Image_Name, Predicted_Label_Avg, Most_Severe_Patch\n")
    for key in image_patch_scores:
        image_name = key + ".jpg"
        max_index = np.argmax(image_patch_scores[key]['patch_score'])
        Predicted_Label_Avg = np.mean(image_patch_scores[key]['patch_score'])
        Most_Severe_Patch = image_patch_scores[key]['patch_name'][max_index]
        fp.write('%s, %.4f, %s\n'%(image_name, Predicted_Label_Avg, Most_Severe_Patch))
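
As a worked example of this aggregation, the sketch below summarizes the four patch scores printed above for image 0006. The helper `summarize_image` is hypothetical, and converting the average score to a named severity level by rounding half up and clipping to the range 1 to 5 is an assumption for illustration; the notebook itself only writes the numeric average.

```python
import math

label_mapping = {1: '1-Clear', 2: '2-Almost Clear', 3: '3-Mild', 4: '4-Moderate', 5: '5-Severe'}

def summarize_image(patch_scores):
    """Return (average score, most severe patch name, named level) for one image."""
    average = sum(patch_scores.values()) / len(patch_scores)
    most_severe = max(patch_scores, key=patch_scores.get)
    # Assumption: round half up, then clip to the valid label range 1..5.
    level = min(5, max(1, int(math.floor(average + 0.5))))
    return average, most_severe, label_mapping[level]

# Patch scores of image 0006 as printed in the output above.
scores_0006 = {'0006_chin.jpg': 2.91, '0006_fh.jpg': 3.49, '0006_lc.jpg': 4.14, '0006_rc.jpg': 2.0}
average, most_severe, level_name = summarize_image(scores_0006)
print('%.4f, %s, %s' % (average, most_severe, level_name))
```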

## (Optional) Calculate RMSE on Test Images

In [45]:
if test_ground_truth is not None:
    # read the ground truth labels, skipping the header line
    ground_truth = {}
    with open(test_ground_truth, 'r') as fp:
        fp.readline()
        for row in fp:
            row = row.strip().split(',')
            ground_truth[row[0].strip()] = float(row[1])

    # accumulate the squared error between the ground truth and the predicted average label
    num_images = 0.0
    RMSE = 0.0
    with open(result_file, 'r') as fp:
        fp.readline()
        for row in fp:
            row = row.strip().split(',')
            RMSE += (ground_truth[row[0].strip()] - float(row[1]))**2
            num_images += 1.0
    RMSE = np.sqrt(RMSE/num_images)
    # print only when the RMSE was actually computed
    print("RMSE=%.4f"%RMSE)

RMSE=0.4819
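
For reference, the same metric can be written as a small vectorized function. This is a minimal sketch with a hypothetical helper name `rmse` and made-up input values; it computes the square root of the mean squared difference between predicted and ground truth labels.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences of labels."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape or predicted.size == 0:
        raise ValueError("inputs must be non-empty and have the same shape")
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Illustrative values only: errors of -1 and +1 give an RMSE of 1.0.
print(rmse([2.0, 4.0], [3.0, 3.0]))  # 1.0
```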
