# Train custom landmark localiser

https://www.pyimagesearch.com/2019/12/16/training-a-custom-dlib-shape-predictor/

http://dlib.net/train_shape_predictor.py.html

https://docs.opencv.org/master/d6/d49/md__build_master-contrib_docs-lin64_opencv_contrib_modules_face_tutorials_face_landmark_face_landmark_trainer.html -- only in c++


https://stackoverflow.com/questions/36908402/dlib-training-shape-predictor-for-194-landmarks-helen-dataset

-------

First of all, the idea was to create the xml file based on the coordinates from the txt files associated with each of the images. That is what I eventually ended up doing, with some improvements. For that, I first had to rename the txt files so that they match the image names. Each txt file has the following structure:
```
100032540_1
565.86 , 758.98
564.27 , 781.14
563.87 , 805.77
566.21 , 829.85
569.75 , 856.19
...
```
The first line is the name of the image, followed by the (x , y) coordinates of the landmarks, one pair per line. The renaming is done in:
1. [Rename and organise txt files with landmarks](#Rename-and-organise-txt-files-with-landmarks)

There was a problem, however: how to read those txt files. Two ways, `np.genfromtxt` and `np.loadtxt`, were proposed, but since `np` returns the name as a 0-D array (`array('100032540_1', dtype='<U11')`), which I did not know how to work with (calling `.item()` on it would return the plain string), I used `pandas` to get the name out and then loaded the coordinates using `np`. The logic is then this: open a txt file, read the image name from its first line, then save the coordinates under that image name. See [this section](#Working-piece-for-pairing-landmark-files-with-image-names). The landmark positions on the face can be checked in a [plot](#Plot-the-coordinates-so-that-I-see-the-landmarks).

2. Then I used the csv to [split the dataset](#Split-the-dataset) into a testing and a training part. The list of training names (`training_set.csv`) was taken from the [HELEN](http://www.ifp.illinois.edu/~vuongle2/helen/) dataset webpage (see [training names](http://www.ifp.illinois.edu/~vuongle2/helen/data/trainnames.txt)).


3. What followed was the idea of using the ibug dataset and modifying its existing xml file. [Create the xml based on an existing xml](#Create-the-xml-based-on-an-existing-xml) has a few parts.

  a. [Get only HELEN pictures](#Extract-Helen-images-from-ibug-full-dataset) is a snippet which extracts just the entries related to the HELEN dataset.
  
  b. [Split them on the fly to test and train, remove mirror images](#Split-them-on-the-fly-to-test-and-train,-remove-mirror-images) is a somewhat heavier part that does a few things. *First*, it implements *a)*. *Second*, the xml lists mirrored copies of the images in the dataset; it removes them from the list. A **TODO** is to include them. For that we would need the coordinates of the mirrored landmarks, or we would have to [mirror them](https://github.com/davisking/dlib/issues/1003) ourselves. Anyway, for now, they're removed. *Third*, it rewrites the file paths to fit our needs. *Lastly*, it [splits the images](#Split-based-on-the-csv-data-file) into test and train.
  
  **That gives ->** *helen_no_mirror_test.xml* and *helen_no_mirror_train.xml*.
  
4. [Concat/create xml based on the existing coordinates and the box from HELEN](#Concat/create-xml-based-on-the-existing-coordinates-and-the-box-from-HELEN) is the last part, which builds on all the previous ones. It is written badly and runs slowly, but I did not find a better way. Basically, it does the same job as *3.* but it adds the box tag. It iterates again over all of the txt files, finds the corresponding entry in the *helen_no_mirror...* files, adds its box, and then adds the coordinates which I need. There must be a better way of doing this (e.g. replacing the 68 `<part ..>` tags with the 154 `<part ..>` tags which we want), but I didn't figure it out and it wasn't worth my time. Eventually, this yields two files: *training_set_linesep_box.xml* and *testing_set_linesep_box.xml*. They are line separated (with "\n") to be readable and they have the box tag.
    
5. [Train the shape predictor](#Train-the-shape-predictor) using the code from [dlib](http://dlib.net/train_shape_predictor.py.html), which follows the [Kazemi](https://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf) paper.


## Rename and organise txt files with landmarks

In [1]:
import os
import dlib,cv2
import numpy as np
import pandas as pd

# Define path
path = os.getcwd()
coordinates_path = os.path.join(path,'annotation')
image_path = os.path.join(path,'images')

In [2]:
# Get list of images and coordinate file names in the folders
image_list = os.listdir(image_path)
annotation_list = os.listdir(coordinates_path)

In [29]:
np.genfromtxt(os.path.join(coordinates_path,os.listdir(coordinates_path)[0]), max_rows=1, dtype=str)

array('100032540_1', dtype='<U11')

In [18]:
np.loadtxt(os.path.join(coordinates_path,os.listdir(coordinates_path)[0]), max_rows=1, dtype=str)

array('100032540_1', dtype='<U11')

### Working piece for pairing landmark files with image names
Open up a text file and find the relevant image

In [3]:
# create dir for the renamed label files (no error if it already exists)
os.makedirs(os.path.join(coordinates_path, "label"), exist_ok=True)

# Save the coordinates based on the file name and without the header
for i,name in enumerate(annotation_list):
    tmp_name = pd.read_csv(os.path.join(coordinates_path, name), nrows=0).keys()[0]
    tmp_coordinates = np.genfromtxt(os.path.join(coordinates_path, name), skip_header=1, delimiter=",", dtype=int)
    
    # rename it
    np.savetxt(os.path.join(coordinates_path, "label",f"{tmp_name}.txt"), tmp_coordinates, fmt="%d")
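
The 0-D array problem described above can also be avoided by reading the file with plain Python. A minimal sketch, assuming the layout shown earlier (image name on the first line, then one `x , y` pair per line); `parse_annotation` is a hypothetical helper and not part of the original pipeline:

```python
def parse_annotation(text):
    """Split one HELEN annotation file into (image_name, [(x, y), ...])."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]
    # every remaining line is "x , y"; float() tolerates the spaces around the comma
    coordinates = [tuple(float(v) for v in line.split(",")) for line in lines[1:]]
    return name, coordinates


# two coordinate lines taken from the example file shown at the top
sample = "100032540_1\n565.86 , 758.98\n564.27 , 781.14\n"
name, coordinates = parse_annotation(sample)
print(name, coordinates[0])
```

In the notebook, the text would come from reading one file in `annotation`, and the coordinates could still be rounded and saved with `np.savetxt` as above.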

#### Plot the coordinates so that I see the landmarks

In [4]:
color = (100,100,255)
for i,name in enumerate(annotation_list[:1]):
    tmp_name = pd.read_csv(os.path.join(coordinates_path, name), nrows=0).keys()[0]
    tmp_coordinates = np.genfromtxt(os.path.join(coordinates_path, name), skip_header=1, delimiter=",", dtype=int)
    
    image= cv2.imread(os.path.join(image_path, tmp_name+'.jpg'))
    
    # show the points
    for p in tmp_coordinates:
        cv2.circle(image, (p[0], p[1]), 1, color, thickness=-1)
    
    cv2.imshow("Face landmark result", image)

    # Pause screen to wait key from user to see result
    cv2.waitKey(0)
    cv2.destroyAllWindows()    

In [5]:
image= cv2.imread(os.path.join(image_path, tmp_name+'.jpg'))

# show the points
for p in tmp_coordinates:
    cv2.circle(image, (p[0], p[1]), 1, color, thickness=-1)

# cv2.imshow("Face landmark result", image)

# show one specific point
idx=41
cv2.circle(image, tuple(tmp_coordinates[idx]), 1, (250,250,0), thickness=10) 

# save
cv2.imwrite("test.png", image)

# # Pause screen to wait key from user to see result
# cv2.waitKey(0)
# cv2.destroyAllWindows()   

True

## Split the dataset

In [6]:
# split data based on the HELEN dataset
training_set = np.genfromtxt(os.path.join(path,'training_set.csv'), dtype=str).tolist()
testing_set = list(set([x.split(".")[0] for x in image_list])-set(training_set))
print("Length of testing set:", len(testing_set))
print("Length of training set:", len(training_set))

Length of testing set: 330
Length of training set: 2000
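
The same split can be written as a small function. This is a sketch under the assumption that the training list holds image names without extension; `split_names` is a hypothetical helper. It uses `os.path.splitext`, which strips only the last extension, and it sorts the testing names so that the order is the same on every run (the order of `list(set(...))` is not guaranteed).

```python
import os


def split_names(image_files, training_names):
    """Return (training, testing) name lists; testing = all images not listed for training."""
    all_names = {os.path.splitext(f)[0] for f in image_files}
    training = [n for n in training_names if n in all_names]
    testing = sorted(all_names - set(training_names))
    return training, testing


# made-up file names for illustration
train, test = split_names(["a_1.jpg", "b_1.jpg", "c_1.jpg"], ["a_1", "c_1"])
print(len(train), len(test))
```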


## Create the xml based on an existing xml

The data is extracted from *ibug_300W_large_face_landmark_dataset* (can be downloaded [here](http://dlib.net/files/data/)) and then adapted for our use, using logic similar to what is described at https://www.pyimagesearch.com/2019/12/16/training-a-custom-dlib-shape-predictor/. The xml file itself is, however, also part of this folder and can be used from there.

#### Extract Helen images from ibug full dataset

In [7]:
import os, re
import dlib, cv2
import time
import pandas as pd

path2xml = 'C:\\Users\\janka.WISMAIN\\Downloads\\ibug_300W_large_face_landmark_dataset\\ibug_300W_large_face_landmark_dataset\\ibug_300W_large_face_landmark_dataset\\'
xml_filename = 'labels_ibug_300W.xml'

# time it
start = time.time()

# split into a list of "images" based on the tag
sep="</image>\n"
with open(os.path.join(path2xml, xml_filename)) as f:
    split_img_tag = [x + sep for x in f.read().strip().split(sep)]
# select only the "image tags" which have HELEN dataset in them
helen_xml = [x for x in split_img_tag if "<image file='helen" in x]

# save helen part
save_helen_xml = ''.join(helen_xml)
with open("helen.xml", "w") as f:
    f.write(save_helen_xml)

end = time.time()
print(f"Process took {end-start} seconds")

Process took 0.5227799415588379 seconds


In [8]:
helen_xml

["  <image file='helen/testset/296814969_3.jpg'>\n    <box top='262' left='297' width='310' height='311'>\n      <part name='00' x='265' y='352'/>\n      <part name='01' x='271' y='392'/>\n      <part name='02' x='282' y='428'/>\n      <part name='03' x='294' y='464'/>\n      <part name='04' x='315' y='498'/>\n      <part name='05' x='345' y='526'/>\n      <part name='06' x='382' y='551'/>\n      <part name='07' x='421' y='562'/>\n      <part name='08' x='461' y='560'/>\n      <part name='09' x='489' y='549'/>\n      <part name='10' x='512' y='523'/>\n      <part name='11' x='532' y='495'/>\n      <part name='12' x='553' y='462'/>\n      <part name='13' x='565' y='427'/>\n      <part name='14' x='565' y='393'/>\n      <part name='15' x='562' y='360'/>\n      <part name='16' x='562' y='326'/>\n      <part name='17' x='331' y='327'/>\n      <part name='18' x='351' y='311'/>\n      <part name='19' x='375' y='304'/>\n      <part name='20' x='400' y='305'/>\n      <part name='21' x='423' y=

In [9]:
len(helen_xml)

4660

### Split them on the fly to test and train, remove mirror images

iBUG contains mirrored images to make the training dataset larger, but there are no mirrored points for the rest of the landmarks (the ones we take from the HELEN txt files), so we need to remove these extra entries from the xml file. Later, hopefully, I will also get to creating a mirrored set for the HELEN set. At the moment, there is none.
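
If the mirrored images were to be kept, their landmarks would have to be mirrored as well. A minimal sketch of that idea, which this notebook does not do: flip each x coordinate around the image width and, optionally, reorder the points so that left and right landmarks swap places. `mirror_landmarks` is a hypothetical helper, and the `swap_order` permutation is an assumption; it depends on the landmark scheme and is not provided here.

```python
def mirror_landmarks(coordinates, image_width, swap_order=None):
    """Mirror (x, y) points horizontally for an image of the given pixel width."""
    flipped = [(image_width - 1 - x, y) for x, y in coordinates]
    if swap_order is not None:
        # swap_order[i] is the index of the flipped point that should end up at position i
        flipped = [flipped[j] for j in swap_order]
    return flipped


# made-up example: one landmark on the left and one on the right of a 101 px wide image
points = [(10, 5), (90, 5)]
mirrored = mirror_landmarks(points, image_width=101, swap_order=[1, 0])
print(mirrored)
```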

In [10]:
import os, re
import dlib, cv2
import time

path2xml = 'C:\\Users\\janka.WISMAIN\\Downloads\\ibug_300W_large_face_landmark_dataset\\ibug_300W_large_face_landmark_dataset\\ibug_300W_large_face_landmark_dataset\\'
xml_filename = 'labels_ibug_300W.xml'

# time
start = time.time()

# split into a list of "images" based on the tag
sep="</image>\n"
with open(os.path.join(path2xml, xml_filename)) as f:
    split_img_tag = [x + sep for x in f.read().strip().split(sep)]
# select only the "image tags" which have HELEN dataset in them
helen_xml = [x for x in split_img_tag if "<image file='helen" in x]

helen_xml_no_mirror = [x for x in helen_xml if "_mirror" not in x]
# helen_xml_no_mirror_train = [x for x in helen_xml_no_mirror if "trainset" in x]
# helen_xml_no_mirror_test = [x for x in helen_xml_no_mirror if "testset" in x]

# change the path -- since we have all images in one folder
helen_xml_no_mirror = [x.replace('helen/testset', 'images') for x in helen_xml_no_mirror]
helen_xml_no_mirror = [x.replace('helen/trainset', 'images') for x in helen_xml_no_mirror]

# save helen part
save_helen_xml_no_mirror = ''.join(helen_xml_no_mirror)
with open("helen_no_mirror_path.xml", "w") as f:
    f.write(save_helen_xml_no_mirror)

end = time.time()
print("Execution took ", end-start)

took  0.2521810531616211


In [11]:
len(helen_xml_no_mirror)

2330
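
The string splitting above works because every image entry ends with `</image>` followed by a newline. An alternative that does not depend on the exact line layout is to parse the file with the standard `xml.etree.ElementTree` module. The sketch below assumes the structure `<dataset><images><image file='...'>`; `filter_helen` is a hypothetical helper, and the tiny inline document stands in for `labels_ibug_300W.xml`.

```python
import xml.etree.ElementTree as ET


def filter_helen(root):
    """Keep only non-mirrored HELEN images and point their paths at 'images/'."""
    images = root.find("images")
    for image in list(images):  # iterate over a copy, since we remove entries
        file_name = image.get("file")
        if not file_name.startswith("helen/") or "_mirror" in file_name:
            images.remove(image)
        else:
            image.set("file", "images/" + file_name.rsplit("/", 1)[1])
    return root


# the second and third entries are made up for illustration
doc = """<dataset><images>
  <image file='helen/testset/296814969_3.jpg'><box top='262' left='297' width='310' height='311'/></image>
  <image file='helen/testset/296814969_3_mirror.jpg'><box top='262' left='1' width='310' height='311'/></image>
  <image file='other/trainset/example.png'><box top='1' left='1' width='10' height='10'/></image>
</images></dataset>"""
root = filter_helen(ET.fromstring(doc))
kept = [image.get("file") for image in root.iter("image")]
print(kept)
```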

#### Split based on the csv data file

In [12]:
# split based on the csv data file

# load the training set data
training_names = pd.read_csv('./training_set.csv', header=None).iloc[:,0].tolist()

test_helen_xml = []
train_helen_xml = []

# the image name is the part between "images/" and ".jpg" in the <image> tag
for x in helen_xml_no_mirror:
    if x.split(".jpg")[0].split("/")[1] in training_names:
        train_helen_xml.append(x)
    else:
        test_helen_xml.append(x)

print("Length of testing set:", len(test_helen_xml))
print("Length of training set:", len(train_helen_xml))
print("Length of training names as appear in the csv from HELEN database:", len(training_names))

Length of testing set: 330
Length of training set: 2000
Length of training names as appear in the csv from HELEN database: 2000


In [13]:
# save helen part
save_test_helen_xml = ''.join(test_helen_xml)
open("helen_no_mirror_test.xml", "w").write(save_test_helen_xml)

save_train_helen_xml = ''.join(train_helen_xml)
open("helen_no_mirror_train.xml", "w").write(save_train_helen_xml)

save_full_helen_xml = ''.join(helen_xml_no_mirror)
open("helen_no_mirror_full.xml", "w").write(save_full_helen_xml)


6687908

## Concat/create xml based on the existing coordinates and the box from HELEN

At the moment, we are able to extract the (x,y) coordinates from the txt files for each image and create an xml from them. However, that xml does not work for training yet, since it needs the box tag. The box is in the 68-point Helen xml, which is extracted in the code above. So one would just need to replace the `<part ...>` tags in the Helen xml with the points from the txt files. That I do not know how to do...
    
Therefore, I am creating a new xml and searching for the correct `<box..>` tag in the 68-point Helen dataset (I know, it's a mess; if someone has a better solution for this, I would really appreciate it). Then it is simple -- I'm creating the xml on the fly, going over all the images and their corresponding coordinates and merging them with the box information.
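
One way to avoid scanning the whole file for every image is to index the boxes once. This is a sketch, not the code used in the cells that follow: it assumes, as those cells do, that the `<box ...>` line directly follows its `<image ...>` line, and `index_boxes` is a hypothetical helper. Matching the full file name with a regular expression also avoids the case where one image name is a substring of another.

```python
import re

# captures the file name without directory and without the .jpg extension
IMAGE_TAG = re.compile(r"<image file='(?:[^']*/)?([^'/]+)\.jpg'>")


def index_boxes(xml_lines):
    """Map image name -> the <box ...> line that follows its <image ...> line."""
    boxes = {}
    for line, next_line in zip(xml_lines, xml_lines[1:]):
        match = IMAGE_TAG.search(line)
        if match and next_line.lstrip().startswith("<box"):
            boxes[match.group(1)] = next_line.strip()
    return boxes


lines = [
    "  <image file='images/296814969_3.jpg'>",
    "    <box top='262' left='297' width='310' height='311'>",
    "      <part name='00' x='265' y='352'/>",
    "    </box>",
    "  </image>",
]
boxes = index_boxes(lines)
print(boxes["296814969_3"])
```

With such a dictionary, the inner loop over `load_data` becomes a single lookup, `boxes[tmp_name]`.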

In [14]:
# test if all is working and we get box label as we want

train_helen_68="helen_no_mirror_train.xml"
test_helen_68="helen_no_mirror_test.xml"

# load the 68 helen dataset which we created above, split it based on lines (create a list)
load_data = open(os.path.join(path, test_helen_68)).read().strip().split('\n')


# go over all the testing images
for tmp_name in testing_set:
    
    # go over the lines
    for i, line in enumerate(load_data):
        # and if you find the image we are now making labels for
        if tmp_name in line:
            # take the next element, that will be the box
            print(load_data[i+1])
            # and leave
            break

    <box top='738' left='738' width='772' height='772'>
    <box top='315' left='315' width='372' height='372'>
    <box top='228' left='21' width='311' height='310'>
    <box top='315' left='356' width='373' height='372'>
    <box top='281' left='278' width='281' height='278'>
    <box top='536' left='832' width='1332' height='1332'>
    <box top='328' left='427' width='447' height='447'>
    <box top='850' left='770' width='827' height='814'>
    <box top='232' left='108' width='373' height='373'>
    <box top='453' left='453' width='536' height='536'>
    <box top='578' left='989' width='925' height='925'>
    <box top='772' left='346' width='1918' height='1918'>
    <box top='193' left='90' width='311' height='311'>
    <box top='315' left='315' width='372' height='372'>
    <box top='315' left='356' width='373' height='372'>
    <box top='570' left='570' width='1110' height='1110'>
    <box top='477' left='526' width='447' height='447'>
    <box top='783' left='886' width='925' he

    <box top='116' left='116' width='643' height='643'>
    <box top='156' left='334' width='536' height='536'>
    <box top='262' left='297' width='310' height='311'>
    <box top='680' left='783' width='926' height='926'>
    <box top='297' left='297' width='310' height='310'>
    <box top='693' left='1063' width='1110' height='1111'>
    <box top='315' left='356' width='373' height='372'>
    <box top='80' left='279' width='446' height='447'>
    <box top='940' left='940' width='1110' height='1110'>
    <box top='684' left='388' width='1332' height='1332'>
    <box top='187' left='401' width='643' height='643'>
    <box top='193' left='228' width='310' height='311'>
    <box top='77' left='86' width='1232' height='1229'>
    <box top='56' left='90' width='311' height='310'>
    <box top='681' left='651' width='782' height='781'>
    <box top='315' left='273' width='373' height='372'>
    <box top='315' left='356' width='373' height='372'>
    <box top='378' left='378' width='447' he

#### Training set

In [17]:
start = time.time()
train_helen_68="helen_no_mirror_train.xml"

# load the 68 helen dataset which we created above, split it based on lines (create a list)
load_data = open(os.path.join(path, train_helen_68)).read().strip().split('\n')

x = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<dataset>\n<name>Helen face point dataset - training set images for 194 landmarks with box</name>\n <images>\n"  

# go over all the training images
for tmp_name in training_set:
    # write image name
#     x += f"<image file={os.path.join(image_path, tmp_name+'.jpg')}> \n"
    x += f"  <image file='{'images/'+tmp_name+'.jpg'}'> \n"
    
    # go over the lines
    for i, line in enumerate(load_data):
        # and if you find the image we are now making labels for
        if tmp_name in line:
            # take the next element, that will be the box
#             print(load_data[i+1])
            x += load_data[i+1]+" \n"
            # and leave
            break
    
    # write coordinates
    tmp_coordinates = np.genfromtxt(os.path.join(coordinates_path, "label",f"{tmp_name}.txt"), dtype=int)

    # append the points; names are 1-based and zero-padded to three digits
    # ('001', ..., '010', ..., '100', ...), as before
    for i, tag in enumerate(tmp_coordinates, start=1):
        x += f"      <part name='{i:03d}' x='{tag[0]}' y='{tag[1]}'/> \n"

    x += "    </box> \n  </image>\n"

x += " </images>\n</dataset>\n"

open("training_set_linesep_box.xml", "w").write(x)

end = time.time()
print(f"Process took {end-start} seconds")

Process took 58.68884563446045 seconds


#### Testing set

In [16]:
test_helen_68="helen_no_mirror_test.xml"

# load the 68 helen dataset which we created above, split it based on lines (create a list)
load_data = open(os.path.join(path, test_helen_68)).read().strip().split('\n')

x = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<dataset>\n<name>Helen face point dataset - testing set images for 194 landmarks with box</name>\n <images>\n"  

# go over all the testing images
for tmp_name in testing_set:
    # write image name
#     x += f"<image file={os.path.join(image_path, tmp_name+'.jpg')}> \n"
    x += f"  <image file='{'images/'+tmp_name+'.jpg'}'> \n"
    
    # go over the lines
    for i, line in enumerate(load_data):
        # and if you find the image we are now making labels for
        if tmp_name in line:
            # take the next element, that will be the box
            x += load_data[i+1]+" \n"
            # and leave
            break
    
    # write coordinates
    tmp_coordinates = np.genfromtxt(os.path.join(coordinates_path, "label",f"{tmp_name}.txt"), dtype=int)

    # append the points; names are 1-based and zero-padded to three digits
    # ('001', ..., '010', ..., '100', ...), as before
    for i, tag in enumerate(tmp_coordinates, start=1):
        x += f"      <part name='{i:03d}' x='{tag[0]}' y='{tag[1]}'/> \n"

    x += "    </box> \n  </image>\n"

x += " </images>\n</dataset>\n"

open("testing_set_linesep_box.xml", "w").write(x)

2763078

#### Full

In [18]:
# check that we have all
set([x.split(".")[0] for x in image_list]) - (set(training_set)|set(testing_set))

set()

In [19]:
start = time.time()

full_helen_68="helen_no_mirror_full.xml"

# load the 68 helen dataset which we created above, split it based on lines (create a list)
load_data = open(os.path.join(path, full_helen_68)).read().strip().split('\n')

x = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<dataset>\n<name>Helen face point dataset - full set images for 194-40 landmarks with box</name>\n <images>\n"  

# go over all the images (training and testing)
for tmp_name in list((set(training_set)|set(testing_set))):
    # write image name
#     x += f"<image file={os.path.join(image_path, tmp_name+'.jpg')}> \n"
    x += f"  <image file='{'images/'+tmp_name+'.jpg'}'> \n"
    
    # go over the lines
    for i, line in enumerate(load_data):
        # and if you find the image we are now making labels for
        if tmp_name in line:
            # take the next element, that will be the box
            x += load_data[i+1]+" \n"
            # and leave
            break
    
    # write coordinates
    tmp_coordinates = np.genfromtxt(os.path.join(coordinates_path, "label",f"{tmp_name}.txt"), dtype=int)

    # append the points; names are 1-based and zero-padded to three digits
    # ('001', ..., '010', ..., '100', ...), as before
    for i, tag in enumerate(tmp_coordinates, start=1):
        x += f"      <part name='{i:03d}' x='{tag[0]}' y='{tag[1]}'/> \n"

    x += "    </box> \n  </image>\n"

x += " </images>\n</dataset>\n"

open("full_set_linesep_box.xml", "w").write(x)

end = time.time()
print(f"Process took {end-start} seconds")

Process took 84.09478378295898 seconds


## NOTE
If we want to use just a part of the landmarks (e.g. we are only interested in the eyes), we can simply find the corresponding landmark numbers and let *i* iterate only over those. This allows for much simpler and easier manipulation.
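
A sketch of how such a selection could look. `format_parts` and the index list are hypothetical; which indices belong to the eyes has to be read off the plot of the landmarks. The part names keep the original 1-based numbering, zero-padded to three digits as in the cells above.

```python
def format_parts(coordinates, keep=None):
    """Build <part .../> lines for all points, or only for the 0-based indices in keep."""
    indices = range(len(coordinates)) if keep is None else keep
    lines = []
    for idx in indices:
        x, y = coordinates[idx]
        lines.append(f"      <part name='{idx + 1:03d}' x='{x}' y='{y}'/>")
    return "\n".join(lines)


# made-up integer coordinates for illustration
coordinates = [(565, 758), (564, 781), (563, 805)]
print(format_parts(coordinates, keep=[0, 2]))
```

The training and the testing xml would have to be built with the same selection, otherwise the predictor and the test data would disagree on the number of parts.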


## Train the shape predictor

Based on the code from [dlib](http://dlib.net/train_shape_predictor.py.html), which follows the [Kazemi](https://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf) paper.

> Parameters:  Unless specified,  all the  experiments areperformed with the following fixed parameter settings. The number of strong regressors, rt, in the cascade is T=10 and each rt comprises of K= 500 weak regressors gk. The depth of the trees (or ferns) used to represent gk is set to F= 5.  At each level of the cascade P= 400 pixel locations are sampled from the image. To train the weak regressors, we randomly sample a pair of these P pixel locations according to our prior and choose a random threshold to create a potential split as described in equation (9).  The best split is then found by repeating this process S= 20 times, and choosing the one that optimizes our objective. To create the training data to learn our model we use R= 20 different initializations for each training example.


Default?
```
Training with cascade depth: 10
Training with tree depth: 4
Training with 500 trees per cascade level.
Training with nu: 0.1
Training with random seed:
Training with oversampling amount: 20
Training with oversampling translation jitter: 0
Training with landmark_relative_padding_mode: 1
Training with feature pool size: 400
Training with feature pool region padding: 0
Training with 0 threads.
Training with lambda_param: 0.1
Training with 20 split tests.
Fitting trees...
```
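
The parameters quoted from the paper appear to correspond to fields of `dlib.shape_predictor_training_options()` as listed below. The mapping is my reading of the two descriptions, not something stated by dlib; `PAPER_SETTINGS` and `apply_settings` are hypothetical names. Note that the log above reports a tree depth of 4, whereas the paper uses F = 5.

```python
from types import SimpleNamespace

# paper symbol -> (dlib option field, value used in the paper)
PAPER_SETTINGS = {
    "T": ("cascade_depth", 10),
    "K": ("num_trees_per_cascade_level", 500),
    "F": ("tree_depth", 5),
    "P": ("feature_pool_size", 400),
    "S": ("num_test_splits", 20),
    "R": ("oversampling_amount", 20),
}


def apply_settings(options, settings=PAPER_SETTINGS):
    """Copy the values onto an options object and return it."""
    for field, value in settings.values():
        setattr(options, field, value)
    return options


# stand-in object so the sketch runs without dlib installed;
# in the notebook one would pass dlib.shape_predictor_training_options() instead
options = apply_settings(SimpleNamespace())
print(options.tree_depth, options.cascade_depth)
```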

```python
options.be_verbose = True

# dlib.train_shape_predictor() does the actual training.  It will save the
# final predictor to predictor.dat.  The input is an XML file that lists the
# images in the training dataset and also contains the positions of the face
# parts.
training_xml_path = os.path.join(coordinates_path, "training_set.xml")
dlib.train_shape_predictor(training_xml_path, "predictor.dat", options)

# Now that we have a model we can test it.  dlib.test_shape_predictor()
# measures the average distance between a face landmark output by the
# shape_predictor and where it should be according to the truth data.
print("\nTraining accuracy: {}".format(
    dlib.test_shape_predictor(training_xml_path, "predictor.dat")))
# The real test is to see how well it does on data it wasn't trained on.  We
# trained it on a very small dataset so the accuracy is not extremely high, but
# it's still doing quite good.  Moreover, if you train it on one of the large
# face landmarking datasets you will obtain state-of-the-art results, as shown
# in the Kazemi paper.
testing_xml_path = os.path.join(faces_folder, "testing_with_face_landmarks.xml")
print("Testing accuracy: {}".format(
    dlib.test_shape_predictor(testing_xml_path, "predictor.dat")))

# Now let's use it as you would in a normal application.  First we will load it
# from disk. We also need to load a face detector to provide the initial
# estimate of the facial location.
predictor = dlib.shape_predictor("predictor.dat")
detector = dlib.get_frontal_face_detector()

# Now let's run the detector and shape_predictor over the images in the faces
# folder and display the results.
print("Showing detections and predictions on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.jpg")):
    print("Processing file: {}".format(f))
    img = dlib.load_rgb_image(f)

    win.clear_overlay()
    win.set_image(img)

    # Ask the detector to find the bounding boxes of each face. The 1 in the
    # second argument indicates that we should upsample the image 1 time. This
    # will make everything bigger and allow us to detect more faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Get the landmarks/parts for the face in box d.
        shape = predictor(img, d)
        print("Part 0: {}, Part 1: {} ...".format(shape.part(0),
                                                  shape.part(1)))
        # Draw the face landmarks on the screen.
        win.add_overlay(shape)

    win.add_overlay(dets)
    dlib.hit_enter_to_continue()
```    

In [141]:
options = dlib.shape_predictor_training_options()

In [None]:
# Now make the object responsible for training the model.
# This algorithm has a bunch of parameters you can mess with.  The
# documentation for the shape_predictor_trainer explains all of them.
# You should also read Kazemi's paper which explains all the parameters
# in great detail.  However, here I'm just setting three of them
# differently than their default values.  I'm doing this because we
# have a very small dataset.  In particular, setting the oversampling
# to a high amount (300) effectively boosts the training set size, so
# that helps this example.
options.oversampling_amount = 300
# I'm also reducing the capacity of the model by explicitly increasing
# the regularization (making nu smaller) and by using trees with
# smaller depths.
options.nu = 0.05
options.tree_depth = 2

In [256]:
start = time.time()
options.be_verbose = True

# dlib.train_shape_predictor() does the actual training.  It will save the
# final predictor to predictor.dat.  The input is an XML file that lists the
# images in the training dataset and also contains the positions of the face
# parts.
training_xml_path = os.path.join('./', "training_set_linesep_box.xml")
dlib.train_shape_predictor(training_xml_path, "predictor_helen_194lm.dat", options)

end = time.time()
print("Process took ", end-start)
print("in mins: ", (end-start)/60)

took  2973.901990890503


#### Train on full dataset

**NOT RECOMMENDED** -- does not give any estimate of accuracy, since the testing images are then part of the training data

In [253]:
options = dlib.shape_predictor_training_options()

In [254]:
# options.oversampling_amount = 300
# options.nu = 0.05
# options.tree_depth = 2

start = time.time()
options.be_verbose = True

training_xml_path = os.path.join('./', "full_set_linesep_box.xml")
dlib.train_shape_predictor(training_xml_path, "predictor_helen_194lm_full.dat", options)

end = time.time()
print("Process took ", end-start)
print("in mins: ", (end-start)/60)

took  3546.8384199142456
in mins:  59.113973665237424


### Check accuracy

In [258]:
# Now that we have a model we can test it.  dlib.test_shape_predictor()
# measures the average distance between a face landmark output by the
# shape_predictor and where it should be according to the truth data.
print("Test-train split")
print("\nTraining accuracy: {}".format(
    dlib.test_shape_predictor(training_xml_path, "predictor_helen_194lm.dat")))
# The real test is to see how well it does on data it wasn't trained on.  We
# trained it on a very small dataset so the accuracy is not extremely high, but
# it's still doing quite good.  Moreover, if you train it on one of the large
# face landmarking datasets you will obtain state-of-the-art results, as shown
# in the Kazemi paper.
testing_xml_path = os.path.join('./', "testing_set_linesep_box.xml")
print("Testing accuracy: {}".format(
    dlib.test_shape_predictor(testing_xml_path, "predictor_helen_194lm.dat")))

Test-train split

Training accuracy: 5.7563198754141585
Testing accuracy: 13.456272056904027


In [255]:
print("Full dataset")
print("\nTraining accuracy: {}".format(
    dlib.test_shape_predictor(training_xml_path, "predictor_helen_194lm_full.dat")))
testing_xml_path = os.path.join('./', "testing_set_linesep_box.xml")
print("Testing accuracy: {}".format(
    dlib.test_shape_predictor(testing_xml_path, "predictor_helen_194lm_full.dat")))


Training accuracy: 6.333507979325041
Testing accuracy: 6.731640684105537
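
As the code comments say, `dlib.test_shape_predictor()` measures the average distance between predicted and true landmarks, so a lower number is better even though the printout calls it accuracy. The sketch below only illustrates that kind of measure on plain coordinate lists; it is not dlib's implementation, and `mean_landmark_error` is a hypothetical helper.

```python
import math


def mean_landmark_error(predicted, truth):
    """Average Euclidean distance (in pixels) between matching landmarks."""
    distances = [math.hypot(p[0] - t[0], p[1] - t[1]) for p, t in zip(predicted, truth)]
    return sum(distances) / len(distances)


# made-up points: the first landmark is exact, the second is 5 px off
predicted = [(0, 0), (3, 4)]
truth = [(0, 0), (0, 0)]
print(mean_landmark_error(predicted, truth))
```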


### Test on examples

In [None]:
import glob

# not in the GitHub folder; generally, one can create their own testing set and try it
faces_folder = './test_faces/'

# Now let's use it as you would in a normal application.  First we will load it
# from disk. We also need to load a face detector to provide the initial
# estimate of the facial location.
predictor = dlib.shape_predictor("predictor_helen_194lm.dat")
detector = dlib.get_frontal_face_detector()

# Now let's run the detector and shape_predictor over the images in the faces
# folder and display the results.
print("Showing detections and predictions on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.png")):
    print("Processing file: {}".format(f))
    img = dlib.load_rgb_image(f)

    win.clear_overlay()
    win.set_image(img)

    # Ask the detector to find the bounding boxes of each face. The 1 in the
    # second argument indicates that we should upsample the image 1 time. This
    # will make everything bigger and allow us to detect more faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Get the landmarks/parts for the face in box d.
        shape = predictor(img, d)
        print("Part 0: {}, Part 1: {} ...".format(shape.part(0),
                                                  shape.part(1)))
        # Draw the face landmarks on the screen.
        win.add_overlay(shape)

    win.add_overlay(dets)
    dlib.hit_enter_to_continue()
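
To work with the predicted landmarks as numbers rather than as an overlay, the `shape` object can be converted to a list of points. A small sketch: `shape.num_parts` and `shape.part(i)` are the accessors of the object returned by the predictor, while `shape_to_points` and the stand-in classes (used so that the example runs without a trained model) are hypothetical.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")


class FakeShape:
    """Stand-in mimicking the two accessors of a dlib shape used below."""

    def __init__(self, points):
        self._points = points
        self.num_parts = len(points)

    def part(self, i):
        return self._points[i]


def shape_to_points(shape):
    """Return [(x, y), ...] for every landmark in the shape."""
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]


print(shape_to_points(FakeShape([Point(265, 352), Point(271, 392)])))
```

In the loop above, `shape_to_points(shape)` could be called right after `shape = predictor(img, d)`.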