# EMIP Toolkit Examples:

This file shows examples of the main functionality included in the EMIP Toolkit.

This includes:  
•	Reading Raw Data Files from EMIP Dataset into Toolkit Containers.  
•	Applying a Fixation Filter to Raw Data.  
•	Raw Data and Filtered Fixation Visualization.  
•	Apply Fixation Correction Through Offset.  
•	Undo Applied Offset.  
•	Generate AOIs for any EMIP Trial.  
•	Draw AOIs over Trial Image.  
•	Add Text Tokens to Generated AOIs.  
•	Add srcML Tags to AOIs and Tokens.  
•	Hit Test Between Fixations and AOIs.
 

In [4]:
import emip_toolkit as tk
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Reading Raw Data Files from EMIP Dataset:  

1. Download the EMIP dataset with the toolkit's `tk.download('EMIP')` function.  
2. The dataset should be in a folder called EMIPData in the parent directory of the directory that contains this tutorial.  
3. The folder structure should look like the following:  
  
-parent_dir
    + EMIPData
        + EMIP-Toolkit- replication package
            + emip_dataset  
                    + rawdata
                    + EMIP_DataCollection_Materials
            + current_directory  
                    + EMIP_Toolkit_Examples
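
Before parsing anything, it can help to confirm that the folders read by the cells in this tutorial actually exist. The following is a minimal sketch using only the standard library; `missing_folders` is a hypothetical helper (not part of the EMIP Toolkit), and the dataset root is an assumption you should adjust to your own layout.

```python
from pathlib import Path

# Folder name used by the paths in this tutorial.
PACKAGE = "EMIP-Toolkit- replication package"

# Sub-folders that the tutorial cells read from, relative to the dataset root.
EXPECTED = [
    Path(PACKAGE) / "emip_dataset" / "rawdata",
    Path(PACKAGE) / "emip_dataset" / "stimuli",
    Path(PACKAGE) / "emip_dataset" / "EMIP_DataCollection_Materials",
]


def missing_folders(root):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [str(rel) for rel in EXPECTED if not (root / rel).is_dir()]


# Assumed dataset root; change it to wherever the dataset was downloaded.
for folder in missing_folders("./datasets/EMIP"):
    print("missing:", folder)
```

Building paths with `pathlib` also avoids doubled separators when a root that ends in `/` is joined to a sub-path.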


In [5]:
data_path = tk.download('EMIP')

# gets the structured data of 10 subjects
EMIP = tk.EMIP_dataset(data_path + '/EMIP-Toolkit- replication package/emip_dataset/rawdata/', 10)   

print('number of subjects: ', len(EMIP))
print('subject ID: ', EMIP['100'].trial[0].get_subject_id())
print('number of trials: ', EMIP['100'].get_number_of_trials())
print('number of samples in trial: ',EMIP['100'].trial[0].get_sample_number())

Please cite this paper:  https://dl.acm.org/doi/abs/10.1145/3448018.3457425
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/100_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/101_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/102_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/103_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/104_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/105_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/106_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/107_rawdata.tsv
parsing file: ./datasets/EMIP/EMIP-Toolkit- replication package/emip_dataset/rawdata/108_rawdata.tsv
parsing file: .

In [6]:
# If already downloaded
data_path = './datasets/EMIP/'
EMIP = tk.EMIP_dataset(data_path + '/EMIP-Toolkit- replication package/emip_dataset/rawdata/', 10)  

parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/100_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/101_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/102_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/103_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/104_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/105_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/106_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/107_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawdata/108_rawdata.tsv
parsing file: ./datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/rawd

# Applying a Fixation Filter to Raw Data:

EMIP Toolkit implements a dispersion-based fixation detection algorithm (I-DT). The algorithm starts with a window over the gaze data whose length equals the minimum fixation duration threshold. It then adds samples to the window until the spatial dispersion of the samples exceeds the maximum dispersion threshold.

The fixation filter parameters have default values of:  
* minimum_duration 50 milliseconds  
* sample_duration 4 milliseconds  
* maxmimum_dispersion 25 pixels  
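
To make the windowing idea concrete, here is a minimal, self-contained sketch of a dispersion-threshold filter. It is not the toolkit's implementation: the function names `dispersion` and `idt`, the `(x, y)` sample format, and the dispersion measure `(max x - min x) + (max y - min y)` are assumptions chosen for illustration, and the toolkit may compute dispersion differently.

```python
import math


def dispersion(window):
    """(max x - min x) + (max y - min y) over a list of (x, y) samples."""
    xs = [x for x, _ in window]
    ys = [y for _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt(samples, minimum_duration=50, sample_duration=4, maximum_dispersion=25):
    """Group (x, y) gaze samples into fixations.

    Returns a list of (centroid_x, centroid_y, duration_ms) tuples.
    """
    # Number of samples needed to cover the minimum fixation duration.
    window_size = max(1, math.ceil(minimum_duration / sample_duration))
    fixations = []
    start = 0
    while start + window_size <= len(samples):
        end = start + window_size
        if dispersion(samples[start:end]) <= maximum_dispersion:
            # Grow the window while the next sample keeps dispersion in bounds.
            while (end < len(samples)
                   and dispersion(samples[start:end + 1]) <= maximum_dispersion):
                end += 1
            window = samples[start:end]
            cx = sum(x for x, _ in window) / len(window)
            cy = sum(y for _, y in window) / len(window)
            fixations.append((cx, cy, len(window) * sample_duration))
            start = end
        else:
            # No fixation starts here; slide the window forward by one sample.
            start += 1
    return fixations


# Two tight clusters of 20 synthetic samples each.
samples = [(100 + i % 3, 100) for i in range(20)]
samples += [(400, 300 + i % 2) for i in range(20)]
print(len(idt(samples)))  # 2
```

With the default values, the initial window holds 13 samples (52 ms at 4 ms per sample), so shorter clusters are never reported as fixations.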

In [7]:
# select any subject and trial number.
subject_ID = '106'
trial_num = 2      # valid source code trials are 2 and 5

# apply fixation filter to specific trial
# EMIP[subject_ID].trial[trial_num].filter_fixations(minimum_duration=50, sample_duration=4, maxmimum_dispersion=25)

# you can use the method get_fixation_number() to count the fixations after filtering in a trial
print("number of fixations: ", EMIP[subject_ID].trial[trial_num].get_fixation_number())

number of fixations:  357


In [8]:
# accessor for samples count
print("raw sample count:", EMIP[subject_ID].trial[trial_num].get_sample_number())

# accessor for trial image
print("trial image:", EMIP[subject_ID].trial[trial_num].get_trial_image())

raw sample count: 18964
trial image: vehicle_java2.jpg


# Raw Data and Filtered Fixation visualization:  

You can visualize a trial's raw data, its filtered fixations, or both. Filtered fixations are drawn in green and raw samples in red.

Try changing the code to: 

```
draw_trial(image_path, False, True)
```
or
```
draw_trial(image_path, True, False)
```

In [9]:
image_path = data_path + '/EMIP-Toolkit- replication package/emip_dataset/stimuli/'

EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=True, draw_fixation=True)

FileNotFoundError: [Errno 2] No such file or directory: './datasets/EMIP//EMIP-Toolkit- replication package/emip_dataset/stimuli/vehicle_java2.jpg'

# Apply fixation correction through offset:

You can apply fixation correction by calling sample_offset(x_offset, y_offset) on the Trial object.

Try running this with:

```
sample_offset(-200, 100)
```
You can then visualize the trial to see the effect:
```
draw_trial(image_path, True, True)
```

In [None]:
# apply offset
EMIP[subject_ID].trial[trial_num].sample_offset(-200, 100) # x: -200 and y: 100, just as an example

# draw trial again
EMIP[subject_ID].trial[trial_num].draw_trial(image_path, True, True)

You can retrieve the total applied offset, which is useful after several calls to the sample_offset method.

Try running this with:
```
get_offset()
```

In [None]:
print("Current offset:", EMIP[subject_ID].trial[trial_num].get_offset())

After applying an offset to the samples, you can run the fixation filter again to generate fixations from the samples at their new positions.
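
The bookkeeping behind repeated offsets can be sketched with a toy class. `OffsetTrial` is a hypothetical stand-in, not the toolkit's Trial object; its method names mirror `sample_offset`, `get_offset`, and the `reset_offset` call used in the next section, but the bodies are assumptions about one reasonable way to track a running total.

```python
class OffsetTrial:
    """Toy stand-in for a trial: (x, y) samples plus the accumulated offset."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.offset = (0, 0)

    def sample_offset(self, x_offset, y_offset):
        # Shift every sample and add the shift to the running total.
        self.samples = [(x + x_offset, y + y_offset) for x, y in self.samples]
        self.offset = (self.offset[0] + x_offset, self.offset[1] + y_offset)

    def get_offset(self):
        return self.offset

    def reset_offset(self):
        # Undo everything by applying the inverse of the accumulated offset.
        x_total, y_total = self.offset
        self.sample_offset(-x_total, -y_total)


trial = OffsetTrial([(10, 20), (30, 40)])
trial.sample_offset(-200, 100)
trial.sample_offset(50, -25)
print(trial.get_offset())  # (-150, 75)
trial.reset_offset()
print(trial.samples)       # [(10, 20), (30, 40)]
```

Because only the total is stored, a single reset restores the original positions no matter how many offsets were applied.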

# Undo applied offset:

In [None]:
# undo all previous offset
EMIP[subject_ID].trial[trial_num].reset_offset()

# draw trial again
EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=True, draw_fixation=True)

# Generate AOIs for any EMIP Trial:  

You can generate token-level or line-level AOIs for any trial code file in the EMIP dataset. The examples below use the "sub-line" level.

Try running this with:
```
image_path = "emip_dataset/stimuli/"
image = "rectangle_java2.jpg"

aoi = tk.find_aoi(image, image_path, "sub-line")
```

In [None]:
image_path = data_path + '/EMIP-Toolkit- replication package/emip_dataset/stimuli/'
image = "rectangle_java2.jpg"
aoi = tk.find_aoi(image, image_path, level="sub-line")
aoi.head()
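
For intuition about how line and sub-line regions can be derived from a stimulus image, here is a sketch of one common approach: projecting "ink" pixels onto rows to find text lines, then onto columns to split each line at wide gaps. This is an illustrative technique, not necessarily what `tk.find_aoi` does; `runs`, `find_boxes`, the 0/1 pixel grid, and the `max_gap` parameter are all assumptions.

```python
def runs(flags, max_gap=0):
    """Return (start, end) pairs (end exclusive) for runs of True values,
    merging runs separated by at most max_gap False values."""
    spans = []
    start = None
    last = None
    gap = 0
    for i, flag in enumerate(flags):
        if flag:
            if start is None:
                start = i
            last = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                spans.append((start, last + 1))
                start = None
    if start is not None:
        spans.append((start, last + 1))
    return spans


def find_boxes(ink, level="sub-line", max_gap=1):
    """ink: 2D list of 0/1 pixels. Returns (x, y, width, height) boxes."""
    boxes = []
    width = len(ink[0])
    row_has_ink = [any(row) for row in ink]
    for top, bottom in runs(row_has_ink):
        band = ink[top:bottom]
        col_has_ink = [any(row[c] for row in band) for c in range(width)]
        # For whole lines, bridge every gap; otherwise split at wide gaps.
        gap = width if level == "line" else max_gap
        for left, right in runs(col_has_ink, max_gap=gap):
            boxes.append((left, top, right - left, bottom - top))
    return boxes


ink = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
]
print(find_boxes(ink, level="sub-line"))
print(find_boxes(ink, level="line"))
```

In the toy grid, the one-pixel gap inside the first line is bridged, while the three-pixel gap splits that line into two sub-line boxes.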

# Draw AOIs over trial image:


You can draw the obtained AOIs for the trial code file on the trial image in the EMIP dataset.

Try running this with:
```
image_path = "emip_dataset/stimuli/"
image = "rectangle_java2.jpg"

tk.draw_aoi(aoi, image, image_path)
```

In [None]:
image = "rectangle_java2.jpg"

tk.draw_aoi(aoi, image, image_path)

# Add text tokens to generated AOIs:

In [None]:
aoi.head()

In [None]:
file_path = data_path + '/EMIP-Toolkit- replication package/emip_dataset/EMIP_DataCollection_Materials/emip_stimulus_programs/'

aois_with_tokens = tk.add_tokens_to_AOIs(file_path, aoi)

aois_with_tokens

# Add srcML tags to AOIs and tokens:  

The srcML format is an XML representation for source code, where the markup tags identify elements of the abstract syntax for the language. Read more about it: https://www.srcml.org/about.html
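
As a rough illustration of what such tags look like, the sketch below parses a small hand-written fragment in the style of srcML for the Java statement `int width = 5;` and lists each token with the chain of tags that encloses it. The fragment is not actual srcML tool output, and `tagged_tokens` is a hypothetical helper that only uses the standard library.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree prepends to every srcML tag name.
SRCML_NS = "{http://www.srcML.org/srcML/src}"

# Hand-written fragment in the style of srcML (an assumption, not tool output).
fragment = (
    '<unit xmlns="http://www.srcML.org/srcML/src" language="Java">'
    '<decl_stmt><decl><type><name>int</name></type> '
    '<name>width</name> '
    '<init>= <expr><literal type="number">5</literal></expr></init>'
    '</decl>;</decl_stmt>'
    '</unit>'
)


def tagged_tokens(element, path=()):
    """Yield (token_text, tag_path) pairs in document order."""
    path = path + (element.tag.replace(SRCML_NS, ""),)
    if element.text and element.text.strip():
        yield element.text.strip(), "->".join(path)
    for child in element:
        yield from tagged_tokens(child, path)
        # Text after a child's closing tag belongs to the current element.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), "->".join(path)


for token, tags in tagged_tokens(ET.fromstring(fragment)):
    print(token, tags)
```

A tag path such as `unit->decl_stmt->decl->name` is the kind of syntactic context that can be attached to a token-level AOI.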

In [None]:
srcML_path = "./datasets/EMIP2021/"

aois_tokens_srcml = tk.add_srcml_to_AOIs(aois_with_tokens, srcML_path)

aois_tokens_srcml

# Hit Test between Fixations and AOIs:  

Match fixations to AOIs to calculate the fixation duration over each AOI (this can be customized for lines or code tokens).  
The radius is 25 pixels by default; it is the margin around each AOI that still counts as part of the AOI region.
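
The matching rule can be sketched as a padded point-in-rectangle test. The function below is a simplified illustration, not the toolkit's `hit_test`: the name `padded_hit_test`, the tuple formats for fixations and AOIs, and the dictionary result are assumptions made for this example.

```python
def padded_hit_test(fixations, aois, radius=25):
    """Sum fixation durations per AOI.

    fixations: list of (x, y, duration_ms)
    aois:      list of (name, x, y, width, height)
    Returns {name: total_duration_ms} for AOIs that received fixations.
    """
    totals = {}
    for fx, fy, duration in fixations:
        for name, x, y, width, height in aois:
            # Expand the AOI rectangle by radius pixels on every side.
            inside_x = x - radius <= fx <= x + width + radius
            inside_y = y - radius <= fy <= y + height + radius
            if inside_x and inside_y:
                totals[name] = totals.get(name, 0) + duration
    return totals


aois = [
    ("line 1 part 1", 100, 100, 80, 20),
    ("line 1 part 2", 300, 100, 60, 20),
]
fixations = [
    (110, 105, 200),  # inside the first AOI
    (90, 130, 120),   # outside the first AOI, but within its 25-pixel margin
    (250, 110, 80),   # between the two AOIs, hits neither
    (370, 95, 150),   # within the margin of the second AOI
]
print(padded_hit_test(fixations, aois))
```

In this sketch a fixation counts toward every AOI whose padded region contains it, so closely spaced AOIs can share a fixation; a smaller radius reduces that overlap.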

In [None]:
aoi_fixes = tk.hit_test(EMIP[subject_ID].trial[trial_num], aois_tokens_srcml, radius=25)

aoi_fixes.head()