<div class="alert alert-block alert-warning">
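<b>Important note: this notebook creates and moves folders; do not re-run this notebook as-is!</b>
    
To re-run it:
- move the 3 CK+ folders (source_images, source_emotion, sorted_set) back into "codes" and empty the sorted emotion folders
- delete the fer2013 "combined" folder (shutil.copytree fails if the destination already exists)
- empty the KDEF sorted set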
</div>

## 1.0 The Data

For this task, I have gathered four datasets:

- <b>CK+</b>: The Extended Cohn-Kanade (CK+) database, 593 sequences across 123 subjects, labelled with 8 emotions (7 labelled expressions plus neutral).
- <b>FER-2013</b>: The Facial Expression Recognition 2013 (FER-2013) dataset, 32,298 images, labelled with 7 emotions.
- <b>KDEF</b>: The Karolinska Directed Emotional Faces (KDEF), 4900 images across 70 subjects, labelled with 7 emotions.
- <b>GENKI-4K</b>: The MPLab GENKI Database, 4000 images, labelled with 2 classes (smile / non-smile).

This notebook will extract the images from the different datasets (which are formatted differently) and sort them into emotion folders, for further processing.
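Once the sorting cells have run, counting the files in each emotion folder is a quick sanity check on the result. Below is a minimal sketch, assuming each sorted set is a directory with one sub-folder per emotion (as in `../datasets/KDEF/sorted_set`). `count_per_emotion` is a hypothetical helper name.

```python
from pathlib import Path


def count_per_emotion(sorted_dir):
    """Return {emotion_folder_name: number_of_files} for a sorted dataset folder.

    Hypothetical helper: assumes sorted_dir contains one sub-folder per emotion.
    """
    counts = {}
    for emotion_dir in sorted(Path(sorted_dir).iterdir()):
        if emotion_dir.is_dir():
            # count only regular files (the images), not nested folders
            counts[emotion_dir.name] = sum(1 for f in emotion_dir.iterdir() if f.is_file())
    return counts


# usage, e.g. after the KDEF cell has run:
# print(count_per_emotion("../datasets/KDEF/sorted_set"))
```

Stray files sitting directly in `sorted_dir` are ignored, so only the emotion folders are reported.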

### Imports:

In [1]:
import re
import glob
import shutil
from shutil import copyfile

### Defined Functions:

In [2]:
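def all_to_train(loop_list, end_directory):
    """copies images from a list of path lists into another folder, keeping their file names"""
    for paths in loop_list:
        for image in paths:
            # shutil.copy accepts a directory as destination; copyfile needs a full file path
            shutil.copy(image, end_directory)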

In [3]:
def fer_to_combined(loop_list, end_directory):
    """copies images from fer2013 folders into the combined folder.

    end_directory is a path template with one placeholder for the running image number.
    """
    img_num = 0
    for paths in loop_list:
        for image in paths:
            copyfile(image, end_directory % (img_num,))
            img_num += 1

***

## 1.1 Extraction and sorting

#### KDEF

KDEF data is sorted into individual participant folders, with the emotion and angle information stored in the file name.

KDEF --> participant_no --> images

> e.g. AM30ANHL.JPG<br>
> Participant: AM30<br>
> Emotion: AN - anger<br>
> Photo angle: HL - half-left profile facing<br>
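The naming scheme above can be unpacked with a single regular expression. Below is a minimal sketch, assuming every file name follows the `AM30ANHL.JPG` pattern shown: a 4-character participant code, a 2-letter emotion code, then a 1 or 2 letter angle code (e.g. `S` or `HL`). The function name `parse_kdef_name` is hypothetical.

```python
import re

# participant (2 letters + 2 digits), emotion (2 letters), angle (1-2 letters), extension
KDEF_NAME = re.compile(r"^([A-Z]{2}\d{2})([A-Z]{2})([A-Z]{1,2})\.JPG$", re.IGNORECASE)


def parse_kdef_name(filename):
    """Split a KDEF file name into (participant, emotion, angle), or return None."""
    match = KDEF_NAME.match(filename)
    if match is None:
        return None
    participant, emotion, angle = match.groups()
    return participant.upper(), emotion.upper(), angle.upper()


print(parse_kdef_name("AM30ANHL.JPG"))  # ('AM30', 'AN', 'HL')
```

The pattern expects a bare file name, so a full path from `glob` would need `os.path.basename` applied first.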

Goal: Keep only the front-facing and 45-degree (half-profile) images, then sort them into 4 folders: angry, happy, neutral, sad.

Process:
1. Extract images with HL, HR or S in the file name (half-left profile, half-right profile, straight)
2. Copy each file into a sorted directory of 4 emotion folders, renaming it with the emotion code and a running number (e.g. AN0.jpg)
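The two steps above come down to a per-file decision: keep the file only if its angle code is HL, HR or S and its emotion code is one of the four targets, and if so pick the destination folder. Below is a minimal sketch using plain string slicing, assuming the fixed-width `AM30ANHL.JPG` layout. `kdef_target_folder` is a hypothetical name; the folder names mirror the ones used in the sorting cell.

```python
import os

# KDEF emotion codes kept here -> destination folder name
TARGET_EMOTIONS = {"AN": "angry", "HA": "happy", "NE": "neutral", "SA": "sad"}
# half-left, half-right and straight views; full profiles (FL, FR) are dropped
TARGET_ANGLES = {"HL", "HR", "S"}


def kdef_target_folder(path):
    """Return the emotion folder a KDEF image should be copied to, or None to skip it."""
    stem = os.path.splitext(os.path.basename(path))[0].upper()  # e.g. "AM30ANHL"
    emotion, angle = stem[4:6], stem[6:]
    if emotion in TARGET_EMOTIONS and angle in TARGET_ANGLES:
        return TARGET_EMOTIONS[emotion]
    return None


print(kdef_target_folder("../datasets/KDEF/KDEF/AM30/AM30ANHL.JPG"))  # angry
print(kdef_target_folder("../datasets/KDEF/KDEF/AM30/AM30ANFL.JPG"))  # None
```

Checking the angle with an explicit set is stricter than a regex prefix test, since it rejects any code that is not exactly HL, HR or S.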

In [4]:
# extracting and sorting KDEF files into 4 emotions:

#KDEF emotion code -> destination folder
kdef_emotions = {"AN": "angry", "HA": "happy", "NE": "neutral", "SA": "sad"}

#one counter per emotion, used to give each copied file a unique name (AN0.jpg, AN1.jpg, ...)
counts = {code: 0 for code in kdef_emotions}


participants = glob.glob("../datasets/KDEF/KDEF/*")

for x in participants:
    
    #get folder name (participant code, e.g. AM30)
    part = x[-4:]
    
    #for each image in the participant folder
    for file in glob.glob("../datasets/KDEF/KDEF/%s/*" % part):
        
        for code, folder in kdef_emotions.items():
            #regex: emotion code followed by H (HL/HR half profile) or S (straight view);
            #raw string so that \S and \d reach the regex engine unchanged
            if re.search(r'^.*\S.\d.%s[HS].*' % code, file) is not None:
                
                #copy file into new directory with new file name
                copyfile(file, "../datasets/KDEF/sorted_set/%s/%s.jpg" % (folder, code + str(counts[code])))
                counts[code] += 1
                
                #each file has only one emotion code, so stop searching
                break


***

#### CK+

CK+ data is split into 2 folders: source_emotion and source_images. The emotion labels are stored separately from the images, and each session's images form a sequence running from a neutral face to the peak emotion.

source_emotion --> participant_no --> session_no (if any, e.g. 001) --> text file with the emotion number (e.g. 3)<br>
source_images --> participant_no --> session_no (if any, e.g. 001) --> several images in sequence (the number varies)

e.g. participant S005 has only one session folder (001), whose text file contains "3.0" = disgust, with 11 images<br>
e.g. participant S010 has six session folders; folder 002 contains "7.0" in its text file = surprise, with 14 images
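Each emotion text file holds a single float label that indexes into the emotion list defined in the CK+ cell (0 = neutral ... 7 = surprise). The matching images live under the same participant and session folders in `source_images`. Below is a minimal sketch of both lookups, assuming the folder layout above. The helper names `ck_label_to_emotion` and `ck_image_dir` are hypothetical.

```python
from pathlib import Path

# same order as the label numbers in the CK+ emotion text files
EMOTIONS = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]


def ck_label_to_emotion(text):
    """Convert the contents of a CK+ emotion file (a float, e.g. '3.0') into an emotion name."""
    # float() tolerates surrounding whitespace and scientific notation
    return EMOTIONS[int(float(text))]


def ck_image_dir(emotion_file, images_root="source_images"):
    """Map source_emotion/<participant>/<session>/<file>.txt to source_images/<participant>/<session>."""
    path = Path(emotion_file)
    session = path.parent.name             # e.g. "001"
    participant = path.parent.parent.name  # e.g. "S005"
    return Path(images_root) / participant / session


print(ck_label_to_emotion("3.0"))  # disgust
print(ck_image_dir("source_emotion/S005/001/S005_001_00000011_emotion.txt"))
```

Deriving the participant and session from the parent folders avoids depending on fixed character offsets in the path string.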

Goal: Extract only the first 2 images of each sequence as neutral and the last 6 images as the peak emotion, and sort them into emotion folders.

Process:
1. Define the emotion names in the order of the label numbers used in the text files
2. Read the emotion number from each text file and map it to an emotion name
3. Find the images with the same participant and session number (first 2, last 6)
4. Copy the images into the matching emotion folders
5. Move the folders into the datasets directory
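Steps 3 and 4 amount to slicing the sorted frame list of one session: the first 2 frames go to neutral, the last 6 to the labelled emotion. Below is a minimal sketch as a pure function, assuming frame file names sort chronologically (the frame number is zero-padded, e.g. `S005_001_00000001.png`). `split_ck_sequence` is a hypothetical name, and the counts default to the 2 and 6 chosen above.

```python
def split_ck_sequence(frames, emotion, n_neutral=2, n_peak=6):
    """Return (destination_folder, frame) pairs for one CK+ session.

    frames: image paths of one session; emotion: folder name for the labelled emotion.
    """
    ordered = sorted(frames)  # zero-padded frame numbers sort chronologically
    pairs = [("neutral", f) for f in ordered[:n_neutral]]
    pairs += [(emotion, f) for f in ordered[-n_peak:]]
    return pairs


frames = ["S005_001_%08d.png" % i for i in range(1, 12)]  # 11 frames, as in the S005 example
for folder, frame in split_ck_sequence(frames, "disgust"):
    print(folder, frame)
```

If a session had fewer than 8 frames, the two slices would overlap and some frames would land in both folders; the sequences described above (11 and 14 frames) are long enough to avoid this.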

In [5]:
# extracting and sorting CK+ files into 8 emotions:
# pick out last 6 emotion frames and first 2 neutral frames from CK+ 

import os

#Define emotion order (list index = emotion number in the CK+ text files)
emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] 

#Returns a list of all folders with participant numbers
participants = glob.glob("source_emotion/*") 


for x in participants:
    #store current participant number, e.g. S005
    part = os.path.basename(x)
    
    #Store list of sessions for current participant
    for sessions in glob.glob(os.path.join(x, "*")): 
        #session number, e.g. 001
        current_session = os.path.basename(sessions)
        
        for files in glob.glob(os.path.join(sessions, "*")):
            #emotions are encoded as a float: read the line as float, then convert to integer
            with open(files, 'r') as file:
                emotion = int(float(file.readline()))
            
            #all images of this session, sorted from neutral onset to peak emotion
            images = sorted(glob.glob(os.path.join("source_images", part, current_session, "*")))
            
            #create the destination folders if needed (copyfile does not create them)
            os.makedirs(os.path.join("sorted_set", emotions[emotion]), exist_ok=True)
            os.makedirs(os.path.join("sorted_set", "neutral"), exist_ok=True)
            
            #copy the last 6 images (peak emotion) into the emotion folder, keeping file names
            for src in images[-6:]:
                copyfile(src, os.path.join("sorted_set", emotions[emotion], os.path.basename(src)))
            
            #copy the first 2 images (neutral face) into the neutral folder
            for src in images[:2]:
                copyfile(src, os.path.join("sorted_set", "neutral", os.path.basename(src)))
            
            
#move folders into datasets
shutil.move("source_images","../datasets/CK/")
shutil.move("source_emotion","../datasets/CK/")
shutil.move("sorted_set","../datasets/CK/")
print("(:")

#Code credit: van Gent, P. (2016). Emotion Recognition With Python, OpenCV and a Face Dataset. 
#A tech blog about fun things with Python and embedded electronics. Retrieved from:
#http://www.paulvangent.com/2016/04/01/emotion-recognition-with-python-opencv-and-a-face-dataset/

(:


***

#### FER-2013

The FER-2013 dataset is split into train and validation folders, each containing one folder per emotion.


Goal: Combine the validation set with the training set, since the combined dataset will be re-split later.
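Merging comes down to copying every validation image into the matching `combined/<emotion>` folder under a new, prefixed name that marks it as a validation image. Below is a minimal sketch that only plans the copies, returning `(source, destination)` pairs instead of touching the disk. It assumes forward-slash paths shaped like `validation/<emotion>/<image>`. The function name `plan_val_copies` is hypothetical, while the `V<n>.jpg` naming with one running counter follows the merging cell.

```python
from pathlib import PurePosixPath


def plan_val_copies(val_files, combined_root):
    """Plan copies of validation images into the combined folder.

    val_files: forward-slash paths shaped like .../validation/<emotion>/<image>
    Returns (source, destination) pairs; destinations are renamed V0.jpg, V1.jpg, ...
    """
    pairs = []
    for img_no, src in enumerate(val_files):
        emotion = PurePosixPath(src).parent.name  # folder name = emotion label
        dst = PurePosixPath(combined_root) / emotion / ("V%d.jpg" % img_no)
        pairs.append((src, str(dst)))
    return pairs


plan = plan_val_copies(
    ["fer2013/validation/angry/10.jpg", "fer2013/validation/sad/11.jpg"],
    "fer2013/combined",
)
print(plan)
```

Each pair can then be passed to `shutil.copy(src, dst)`. Keeping the planning step separate makes it easy to inspect the new names before anything is written.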

In [6]:
#fer2013: combining training and validation set

#create the new "combined" folder by copying the whole training folder over
shutil.copytree("../datasets/fer2013/train","../datasets/fer2013/combined")

#copy validation files for the 4 target emotions into the combined folder,
#renamed V0.jpg, V1.jpg, ... with one running counter across all emotions
img_no = 0
for emotion in ["angry", "happy", "neutral", "sad"]:
    for files in glob.glob("../datasets/fer2013/validation/%s/*" % emotion):
        shutil.copy(files, "../datasets/fer2013/combined/%s/%s.jpg" % (emotion, "V" + str(img_no)))
        img_no += 1