# Image Selection

This notebook selects the images to display in the game. The selection differs considerably between the two task formulations: **form1** represents the task "find the difference", whereas **form2** represents the task question "is there a difference?"



## If you want to run this code...
Make sure you have downloaded the original dataset from https://he-dhamo.github.io/SIMSG/ . This notebook works with the data behind the link in the third line under "Downloads". The notebook should be located in the unzipped folder **CLEVR_SIMSG**. You also need to create a folder **image_selection** next to **CLEVR_SIMSG** (i.e. in the same parent directory), with the subfolders **form1** and **form2**, each of which contains the subfolders **source** and **target**. Alternatively, change the folder paths in the code accordingly.
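If you prefer not to create the folders by hand, a sketch like the following creates them. It assumes that **image_selection** sits next to **CLEVR_SIMSG** (the notebook's code refers to it as `./../image_selection`); the helper name `create_selection_folders` and its `base_dir` parameter are hypothetical and not part of the original notebook.

```python
import os

def create_selection_folders(base_dir):
    """Create image_selection/{form1,form2}/{source,target} below base_dir.

    Hypothetical helper: base_dir is assumed to be the directory that
    contains the unzipped CLEVR_SIMSG folder (e.g. 'D:/').
    """
    created = []
    for form in ("form1", "form2"):
        for side in ("source", "target"):
            folder = os.path.join(base_dir, "image_selection", form, side)
            # exist_ok=True makes the call safe to repeat
            os.makedirs(folder, exist_ok=True)
            created.append(folder)
    return created
```

For example, `create_selection_folders('D:/')` would create the four folders below `D:/image_selection`.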

In [56]:
#imports
from random import randint
import shutil
import os
#only needed for optionally viewing images; the selection itself does not use them
from PIL import Image
from IPython.display import display
import matplotlib.pyplot as plt
import os

Image ID intervals of the change categories identified in the original dataset: <br>
 **Object Addition**     : 0 - 5124 <br>
 **Relationship Change** : 5125 - 10127 <br>
 **Object Removal**      : 10128 - 16339 <br>
 **Attribute Change**    : 16340 - 21300<br>
 <br>
That is, the image pairs with IDs 0 - 5124 have an additional object in the target image, the pairs with IDs 5125 - 10127 show a relationship change of some object, and so on. Please note that this information is supplied without guarantee.
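Because the intervals are contiguous, the category of any pair ID can be looked up directly. The sketch below uses hypothetical names (`CATEGORY_RANGES`, `category_of`) that are not part of the dataset; it can be handy for checking that a sampled ID really belongs to the intended category, for example that 16339 is still an Object Removal pair.

```python
# Hypothetical lookup table built from the ID intervals listed above
# (inclusive bounds; the intervals themselves come without guarantee).
CATEGORY_RANGES = [
    ("Object Addition", 0, 5124),
    ("Relationship Change", 5125, 10127),
    ("Object Removal", 10128, 16339),
    ("Attribute Change", 16340, 21300),
]

def category_of(image_id):
    """Return the change category of an image pair ID, or None if it is out of range."""
    for name, low, high in CATEGORY_RANGES:
        if low <= image_id <= high:
            return name
    return None

print(category_of(16339))  # Object Removal
print(category_of(16340))  # Attribute Change
```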

The function *get_change_categories_list* samples random, distinct image IDs from the category ranges above; the parameters control how many IDs are drawn per category. For the task formulation "Is there a difference?" (form2) it is reasonable to include image pairs that are actually identical. These are drawn from the Object Removal range, and only the source image of such a pair is used later, on both sides of the pair.
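The loops in *get_change_categories_list* keep drawing IDs until enough distinct ones are found. An equivalent, shorter way to sample distinct IDs uniformly without replacement is `random.sample` over a `range`; note that `randint` bounds are inclusive while `range` excludes its end, hence the `+ 1`. This is only a sketch with a hypothetical helper name (`sample_ids`) and an optional seed for reproducibility, not the notebook's own code.

```python
import random

def sample_ids(low, high, k, seed=None):
    """Draw k distinct image IDs uniformly from the inclusive interval [low, high].

    Hypothetical helper; seed is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    # range() excludes its end point, so add 1 to keep `high` selectable
    return rng.sample(range(low, high + 1), k)

# e.g. 150 IDs for identical pairs, drawn from the Object Removal range
example_ids = sample_ids(10128, 16339, 150, seed=0)
```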

In [57]:
def get_change_categories_list(num_same_images, num_object_addition, num_relationship_change, num_attribute_change):
    #choose the number of images for each category:
    #for the task formulation "Is there a difference?" (identical pairs)
    same_images = num_same_images

    #for the task formulation "find the difference"
    object_addition = num_object_addition

    #the following two change categories are used in both task formulations; the two resulting lists are split in half later
    relationship_c_ = num_relationship_change
    attribute_c_ = num_attribute_change
    
    #create lists to store the indices in
    addition_index, relationship_c_index, same_i, attribute_c_index = [], [], [], []

    #keep drawing until each list holds the number of indices set via the parameter
    #identical pairs: IDs from the Object Removal range (only the source image is used later)
    while len(same_i) < same_images:
        rem = randint(10128, 16339)
        #make sure that images are selected only once
        if rem not in same_i:
            same_i.append(rem)

    #Object Addition range
    while len(addition_index) < object_addition:
        add_ = randint(0, 5124)
        if add_ not in addition_index:
            addition_index.append(add_)

    #Relationship Change range
    while len(relationship_c_index) < relationship_c_:
        rel = randint(5125, 10127)
        if rel not in relationship_c_index:
            relationship_c_index.append(rel)

    #Attribute Change range (starts at 16340; 16339 still belongs to Object Removal)
    while len(attribute_c_index) < attribute_c_:
        attri = randint(16340, 21300)
        if attri not in attribute_c_index:
            attribute_c_index.append(attri)
            
    #return the lists of indices
    return addition_index, relationship_c_index, same_i, attribute_c_index




*split_attri_and_rel_change* splits the index lists for the categories "attribute change" and "relationship change" in half, because these categories are used for both task formulations: the first halves go to form1, the second halves to form2.
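The split keeps the first `len // 2` IDs for form1 and the rest for form2, so with an odd number of IDs the form2 half receives the extra one. A minimal illustration, using a hypothetical stand-alone helper `split_in_half` that mirrors the slicing in *split_attri_and_rel_change*:

```python
def split_in_half(ids):
    """Split a list at len(ids) // 2; the second half gets the extra element if the length is odd."""
    middle = len(ids) // 2
    return ids[:middle], ids[middle:]

first, second = split_in_half([16400, 16512, 17001, 18230, 20999])
print(first)   # [16400, 16512]
print(second)  # [17001, 18230, 20999]
```

With the 150 IDs per category used below, both halves contain 75 IDs.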

In [58]:
def split_attri_and_rel_change(attribute_c_index,relationship_c_index):
    #split up the list with attribute changes in half
    middle_index = len(attribute_c_index)//2
    first_half_attri = attribute_c_index[:middle_index]
    second_half_attri = attribute_c_index[middle_index:]
    
    #split up the list with relationship changes in half
    middle_index = len(relationship_c_index)//2
    first_half_rel = relationship_c_index[:middle_index]
    second_half_rel = relationship_c_index[middle_index:]
    
    #return the split halves
    return first_half_attri,second_half_attri,first_half_rel,second_half_rel


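The function *create_file_paths* takes the index lists and builds the paths to the source and target image of each pair, relative to the **CLEVR_SIMSG** folder. For the identical pairs of form2, both paths point to the source image.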
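For each ID the source and target paths differ only in the folder name, except for the identical pairs of form2, where both paths point to the source image. A compact sketch of that rule (the helper `pair_paths` is hypothetical):

```python
def pair_paths(image_id, identical=False):
    """Return (source_path, target_path) relative to the CLEVR_SIMSG folder."""
    source = "./source/images/" + str(image_id) + ".png"
    # identical pairs reuse the source image on both sides
    target = source if identical else "./target/images/" + str(image_id) + ".png"
    return source, target

print(pair_paths(42))                     # ('./source/images/42.png', './target/images/42.png')
print(pair_paths(12000, identical=True))  # ('./source/images/12000.png', './source/images/12000.png')
```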


In [59]:
def create_file_paths(addition_index, first_half_attri,first_half_rel, same_i, second_half_attri,second_half_rel):
    
    #there are four paths needed: for both task formulations the images that are in the source and the target folder
    path_form1_source,path_form1_target,path_form2_source, path_form2_target= [],[],[],[]
    
    #fill the lists of images for the task formulation "find the difference"
    for a in addition_index: 
        image_path_source = ('./source/images/'+str(a)+'.png')
        image_path_target = ('./target/images/'+str(a)+'.png')

        path_form1_source.append(image_path_source)
        path_form1_target.append(image_path_target)

    for e in first_half_attri+first_half_rel:
        image_path_source = ('./source/images/'+str(e)+'.png')
        image_path_target = ('./target/images/'+str(e)+'.png')

        path_form1_source.append(image_path_source)
        path_form1_target.append(image_path_target)

    #fill the lists of images for the task formulation "Is there a difference?"
    #identical pairs: both the source and the target path point to the source image
    for a in same_i: 
        image_path_source = ('./source/images/'+str(a)+'.png')
        image_path_target = ('./source/images/'+str(a)+'.png')

        path_form2_source.append(image_path_source)
        path_form2_target.append(image_path_target)

    for e in second_half_attri+second_half_rel:
        image_path_source = ('./source/images/'+str(e)+'.png')
        image_path_target = ('./target/images/'+str(e)+'.png')

        path_form2_source.append(image_path_source)
        path_form2_target.append(image_path_target)
    
    #return the image pair file paths
    return path_form1_source,path_form1_target,path_form2_source,path_form2_target

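*copy_image_files* copies the selected images into the folder **image_selection**. Note that it first deletes all files currently in the four destination folders, so any previous selection is lost.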
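Because the function empties the destination folders first, it can be worth inspecting where each file would go before running it. The sketch below only computes the mapping and touches nothing on disk. It mirrors the path arithmetic of *copy_image_files* (the `images` level is dropped, and form2 target files go to the `target` folder even when they come from the dataset's `source` folder); the names `planned_copies`, `dataset_dir` and `selection_dir` are hypothetical.

```python
import os

def planned_copies(paths, form, side, dataset_dir, selection_dir):
    """Return (src, dst) tuples for a list of paths like './source/images/123.png'.

    form is 'form1' or 'form2'; side is the destination folder ('source' or 'target').
    Nothing is copied; this only shows what a copy would do.
    """
    plan = []
    for p in paths:
        # strip the leading './' and resolve the path inside the dataset folder
        src = os.path.join(dataset_dir, p[2:])
        # the destination keeps only the file name, e.g. image_selection/form2/target/123.png
        dst = os.path.join(selection_dir, form, side, os.path.basename(p))
        plan.append((src, dst))
    return plan

for src, dst in planned_copies(["./source/images/12000.png"], "form2", "target",
                               "D:/CLEVR_SIMSG", "D:/image_selection"):
    print(src, "->", dst)
```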

In [70]:
def copy_image_files(path_form1_source,path_form1_target,path_form2_source,path_form2_target, basic_p):
    #basic_p is the directory that contains both CLEVR_SIMSG and image_selection (e.g. 'D:/');
    #all paths are built from it so that deleting and copying use the same folders
    dataset_dir = os.path.join(basic_p, 'CLEVR_SIMSG')
    selection_dir = os.path.join(basic_p, 'image_selection')

    #(list of dataset paths, form folder, destination side)
    jobs = [(path_form1_source, 'form1', 'source'),
            (path_form1_target, 'form1', 'target'),
            (path_form2_source, 'form2', 'source'),
            #for the identical pairs of form2 the target paths point into the dataset's source folder,
            #so the destination side is taken from this tuple rather than from the path
            (path_form2_target, 'form2', 'target')]

    #delete all image files that are currently in the destination folders
    for _, form, side in jobs:
        folder = os.path.join(selection_dir, form, side)
        for f in os.listdir(folder):
            os.remove(os.path.join(folder, f))

    for paths, form, side in jobs:
        for e in paths:
            #e looks like './source/images/123.png'; drop the leading './' to locate it in the dataset
            source = os.path.join(dataset_dir, e[2:])
            #the destination drops the 'images' level: image_selection/<form>/<side>/123.png
            target = os.path.join(selection_dir, form, side, os.path.basename(e))
            shutil.copyfile(source, target)

    
    

The idea behind the numbers of images per category is as follows: <br>
For form2, pairs both with and without differences are required, and a 50% ratio is the simplest choice. The index lists of the attribute and relationship change pairs are split in half at a later point. 
Thus, the image pairs for form2 are composed as follows: <br>
150 pairs of identical images +<br>
75 pairs with relationship changes +<br>
75 pairs with attribute changes <br>
<br>
A change in the number of objects would presumably be spotted too easily, so this change type is only used for form1, whose 225 pairs consist of 75 object additions, 75 relationship changes and 75 attribute changes.
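With the call `get_change_categories_list(150, 75, 150, 150)` used below, the halves work out as follows. The small calculation only makes the resulting set sizes explicit; the variable names are illustrative.

```python
num_same, num_addition, num_relationship, num_attribute = 150, 75, 150, 150

# form1 ("find the difference"): all object additions plus the first halves
form1_pairs = num_addition + num_relationship // 2 + num_attribute // 2
# form2 ("is there a difference?"): identical pairs plus the second halves
form2_pairs = num_same + (num_relationship - num_relationship // 2) + (num_attribute - num_attribute // 2)

print(form1_pairs, form2_pairs)  # 225 300
# half of the form2 pairs are identical
print(num_same / form2_pairs)    # 0.5
```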



In [79]:
#the order of the parameters is: number of same images, number of images with an object addition,
# number of images with a relationship change, number of images with an attribute change
addition_index,relationship_index,same_i,attri_index=get_change_categories_list(150,75,150,150)
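The selection is random, so every run produces a different image set. If the same selection should be recreated later, one option (an assumption, not part of the original notebook) is to seed Python's global random number generator before calling *get_change_categories_list*: the function uses `randint` from the `random` module, so the same seed yields the same IDs. A self-contained illustration, where `draw` is a hypothetical stand-in for the sampling loops:

```python
import random
from random import randint

def draw(n):
    # stands in for the randint loops in get_change_categories_list
    return [randint(0, 5124) for _ in range(n)]

random.seed(2021)   # the seed value is arbitrary
first_run = draw(5)
random.seed(2021)
second_run = draw(5)
print(first_run == second_run)  # True
```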



In [80]:
#create halves of the index lists of attribute and relationship changes
first_half_attri,second_half_attri,first_half_rel,second_half_rel= split_attri_and_rel_change(attri_index,relationship_index)


In [81]:
#create the file paths
path_form1_source,path_form1_target,path_form2_source,path_form2_target=create_file_paths(addition_index, first_half_attri,first_half_rel, same_i, second_half_attri,second_half_rel)


In [82]:
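#copy the images
#depending on the number of images this step can take a while
#the last argument is the directory that contains image_selection (next to CLEVR_SIMSG); adjust it to your setup
copy_image_files(path_form1_source,path_form1_target,path_form2_source,path_form2_target, 'D:/')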
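After the copy has finished, counting the files in each destination folder is a quick sanity check: with the parameters above, each form1 folder should hold 225 images and each form2 folder 300, provided all sampled IDs are distinct. A sketch, where the helper name `count_images` is hypothetical and the base directory is assumed to be the same one passed to *copy_image_files*:

```python
import os

def count_images(base_dir):
    """Return {'form1/source': n, ...} with the number of .png files per destination folder."""
    counts = {}
    for form in ("form1", "form2"):
        for side in ("source", "target"):
            folder = os.path.join(base_dir, "image_selection", form, side)
            counts[form + "/" + side] = sum(
                1 for f in os.listdir(folder) if f.lower().endswith(".png")
            )
    return counts

# e.g. count_images('D:/') should report 225 for the form1 folders and 300 for the form2 folders
```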