<center><h1> Using Automated Movement Detection to Assist with Gesture Annotation </h1>


<h4> James Trujillo ( j.p.trujillo@uva.nl )<br>
    Updated: 19-06-2025 </h4>
    
<img src="./images/envisionbox_logo.png" alt="isolated" width="300"/> 
<br>
<h3> Info </h3>
This module is meant to be a brief tutorial on how you can use the automatic movement detection tool SPUDNIG (Ripperda, Drijvers & Holler, 2020) as part of your gesture coding pipeline. This method can be handy because SPUDNIG can capture all of the hand movements performed by a person, allowing the coder to simply go through these annotations, determine which are gestures and which are not, and fix the timing boundaries of the annotations, rather than going through every second of the video. The authors of the original SPUDNIG paper estimate that this can cut the time you spend gesture coding nearly in half. <br>
So, in this tutorial, we'll see how you can easily set everything up, run the program, and then use the output to code some gestures.
<br><br>

<b>Packages:</b>
    
* opencv-python

* pympi-ling

* SciPy

* numpy

* pandas

* mediapipe
<br><br>
 
    
<b>Module citation: </b>
Trujillo, J.P. (2025). <i> Using Automated Movement Detection to Assist with Gesture Annotation </i> \[day you visited the site]. Retrieved from: https://envisionbox.org/Assisted_Gesture_Annotation.html 
<br>
<b>SPUDNIG reference</b>

<br>Ripperda, J., Drijvers, L. & Holler, J. (2020). Speeding up the detection of non-iconic and iconic gestures (SPUDNIG): A toolkit for the automatic detection of hand movements and gestures in video data. <i>Behavior Research Methods</i>: 52, 1783–1794. https://doi.org/10.3758/s13428-020-01350-2 <br>
<b>Repository location:</b>

https://github.com/jptrujillo/SPUDNIG_assisted_gesture_annotation_module
<br><br>

 



In [10]:
import cv2
import utils
import movements2
import run_MP_module
import os
import sort_output
import argparse
import pympi

In [2]:
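def init(filename, keypoints_left, keypoints_right, keypoints_body):
    # the keypoint selections are passed in as arguments; as plain local
    # names they would be read before assignment and raise an error
    keypoints_left = utils.keypoint_check(keypoints_left)
    keypoints_right = utils.keypoint_check(keypoints_right)
    keypoints_body = utils.keypoint_check(keypoints_body)

    # frame rate of the video, needed later for the movement detection
    fps = utils.get_fps(filename)
    return fps, keypoints_left, keypoints_right, keypoints_body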




## Running SPUDNIG
In the next step, we will actually run SPUDNIG on our videos. Most of the functions have been tucked away into imported modules to make this tutorial more concise, so there's not much to see here. However, one thing to note is the two-step process. First, we check if there is already some MediaPipe data in the <i>motion_data</i> folder. If not, we go ahead and do the motion tracking. After that, we run the SPUDNIG algorithm which does the actual movement estimation. What is important to note here is that if you already have motion tracking data, you won't have to run it again. This code block will just use what's already there (assuming it's in the motion_data folder). This also makes it a lot easier to go back and adjust parameters for movement detection, without re-running any motion tracking.

### Set some parameters
Here we set some parameters that control how the automated detection works, such as movement thresholds. For the first run, we'll just leave them as is. You can always adjust them and re-run the notebook if you want to fine-tune your movement detection. The default values (used here) aim to keep both false negatives and false positives low, but they skew somewhat towards minimizing false negatives, at the cost of additional false positives. In other words, they should capture every gesture in the video, although you will need to clean out the 'noise'.<br>
- <b>Threshold:</b> If the reliability of a key point in a frame is below the reliability threshold, the script stops checking for potential movement and indicates that no movement is detected in the respective frame, before continuing to the next frame. If the reliability is above the threshold, the script continues and determines whether the key point in question is part of a rest state or part of a movement. <i>range: 0 - 1</i>
- <b>minimum cutoff:</b> minimum number of frames to be considered a movement. lower value = more precise tracking, i.e. even small movements are detected, higher value = more lenient tracking, i.e. only large movements that extend over several frames are detected. <i>range: 0 - 10</i>
- <b>gap cutoff:</b> minimum number of frames between two "movements" before they are merged together (i.e., what constitutes a gap). lower value = more individual submovements, higher value = fewer movements, more merging. <i>range: 0 - 10</i>
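
To make the interplay of the three parameters more concrete, here is a minimal sketch of how they could act on a sequence of frames. This is a toy illustration and not the SPUDNIG implementation: the function name <i>detect_movements</i>, the per-frame inputs, the order of the steps (merge first, then filter by length), and the exact comparison rules are all assumptions made for this example.

```python
def detect_movements(moving, reliability, threshold=0.3, min_cutoff=4, gap_cutoff=4):
    """Toy illustration only; not the SPUDNIG implementation.

    moving:      per-frame booleans (True = the keypoint moved in this frame)
    reliability: per-frame tracking reliability, between 0 and 1
    Returns a list of (first_frame, last_frame) tuples, both inclusive.
    """
    # 1. frames below the reliability threshold are treated as "no movement"
    flags = [m and r >= threshold for m, r in zip(moving, reliability)]

    # 2. collect runs of consecutive movement frames
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))

    # 3. merge runs separated by fewer than gap_cutoff still frames (assumed rule)
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 < gap_cutoff:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # 4. drop runs that last fewer than min_cutoff frames (assumed rule)
    return [(s, e) for s, e in merged if e - s + 1 >= min_cutoff]


# 22 frames: movement in frames 2-6, 9-11 and 18-19
moving = [False] * 2 + [True] * 5 + [False] * 2 + [True] * 3 + [False] * 6 + [True] * 2 + [False] * 2
reliability = [0.9] * len(moving)

print(detect_movements(moving, reliability))  # → [(2, 11)]
```

With the default values, the two-frame pause between the first two runs is bridged and the final two-frame run is discarded. Lowering both cutoffs (for example <i>min_cutoff=2, gap_cutoff=1</i>) keeps all three runs as separate movements, which is the "more individual submovements" behaviour described above.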

In [3]:
# minimum number of frames to be considered a "movement" (values: 0-10)
min_cutoff = 4
# minimum number of frames between two "movements" before they are merged together (values: 0-10)
gap_cutoff = 4
# threshold for reliability: lower values correspond to lower reliability/precision of tracking. 
# Frames with lower reliability than this are discarded (values: 0-1)
threshold = 0.3
# these numbers just correspond to keypoints in MediaPipe
keypoints_left = range(4,83)
keypoints_right = range(4,83)
keypoints_body = range(9,23)

In [4]:
video_dir = "./videos_to_process/"
motion_output_folder = "./motion_data/"


This next code block will:
- loop through each video in your <i>videos_to_process</i> folder
- check whether tracking data already exists and, if not, run the motion tracking script
- get the fps of the video, which is used when determining what is considered a "movement"
- run SPUDNIG, resulting in a dataframe containing the detected movements
- save an Elan file with these annotations, into the <i>annotations</i> folder

In [5]:
# make sure the folder for the Elan files exists before writing to it
os.makedirs("./annotations/", exist_ok=True)

for video_name in os.listdir(video_dir):

    # drop the file extension to get a short name for folders and files
    video_name_short = video_name.split(".")[0]
    data_output_folder = os.path.join(motion_output_folder, video_name_short) + "/"

    # only run the motion tracking if there is no tracked video for this file yet
    if not os.path.isfile(data_output_folder + video_name):
        run_MP_module.process_video(video_name, video_dir, data_output_folder)

    #### Now run SPUDNIG ####
    # first, restructure the data for SPUDNIG
    keypoints_left, keypoints_right, keypoints_body = sort_output.sort_MP(
        data_output_folder, keypoints_left, keypoints_right, keypoints_body, video_name_short)
    # then, get the video's fps
    fps = utils.get_fps(video_dir + video_name)

    # now process
    data = movements2.main(data_output_folder, threshold, keypoints_left, keypoints_right,
                           keypoints_body, fps, min_cutoff, gap_cutoff)

    print(data)

    # convert each detected movement into a (begin_ms, end_ms, label) tuple
    # note: timestamp_to_ms is not defined in the cells shown here and must be in scope
    annotations = [
        (timestamp_to_ms(row["Begin"]), timestamp_to_ms(row["End"]), row["Annotation"])
        for _, row in data.iterrows()
    ]
    tiers = {'Movements': annotations}
    new_eaf = utils.to_eaf(tiers)

    new_eaf.to_file("./annotations/" + video_name_short + ".eaf")


        Tier      Begin        End Annotation
0  Movements  0:0:0.033  0:0:1.401   movement
1  Movements  0:0:1.634  0:0:3.136   movement
2  Movements  0:0:3.603  0:0:5.872   movement
         Tier       Begin         End Annotation
0   Movements   0:0:0.200   0:0:5.100   movement
1   Movements   0:0:5.466   0:0:8.533   movement
2   Movements   0:0:8.833   0:0:9.100   movement
3   Movements   0:0:9.466  0:0:10.800   movement
4   Movements  0:0:11.200  0:0:17.100   movement
5   Movements  0:0:17.433  0:0:21.300   movement
6   Movements  0:0:22.066  0:0:22.833   movement
7   Movements  0:0:23.100  0:0:28.033   movement
8   Movements  0:0:28.266  0:0:28.833   movement
9   Movements  0:0:29.066  0:0:29.633   movement
10  Movements  0:0:30.766  0:0:32.100   movement
11  Movements  0:0:32.300  0:0:37.466   movement
12  Movements  0:0:39.266  0:0:39.800   movement
13  Movements  0:0:40.166  0:0:50.300   movement
14  Movements  0:0:51.000  0:0:51.466   movement
15  Movements  0:0:52.766  0:0:5
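
The <i>Begin</i> and <i>End</i> columns printed above are strings of the form <i>hours:minutes:seconds.milliseconds</i>, while the annotation tuples written to the Elan file use milliseconds. The loop relies on a helper called <i>timestamp_to_ms</i> for this conversion, which is not shown in this notebook. The following is a minimal sketch of what such a helper could look like, assuming the timestamp format seen in the output; it is a stand-in written for this tutorial, not necessarily the version used in the module.

```python
def timestamp_to_ms(timestamp):
    """Convert an 'h:m:s.ms' string such as '0:0:1.401' to integer milliseconds.

    Assumed stand-in for the helper used in the loop; the format is
    inferred from the printed Begin/End columns.
    """
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # round to avoid floating-point artefacts such as 1400.9999999999998
    return int(round(total_seconds * 1000))


print(timestamp_to_ms("0:0:1.401"))   # → 1401
print(timestamp_to_ms("0:0:10.800"))  # → 10800
```

If you define a helper like this in a cell before the processing loop, the list of annotation tuples can be built directly from the dataframe.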

## Checking the motion tracking
This first step is basically just a sanity check to make sure that the tracking data makes sense. All this entails is opening the video file in the motion_data folder and checking whether the keypoints make sense. Is it (relatively consistently) putting keypoints on the shoulders, hands, etc.? It won't be perfect, but you want to make sure there's nothing strange going on in the video. <br>
Ideally, you have something like this:<br>
<img src="./images/good_tracking.png" width=500 />

Sometimes there will be tracking errors. Take a look at the example below: <br>
<img src="./images/poor_tracking.png" width=500 />
<br>
Here we see some points where tracking did not work well. However, most of the video looks okay. This is really a judgment call on whether you trust the quality of the tracking, and there's not really a standard on this. 
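
If you would like a rough number to go with the visual check, one option is to look at how many frames fall below the reliability threshold. The sketch below assumes you have already extracted a list of per-frame confidence scores for a keypoint (for example, MediaPipe's per-landmark visibility values for body keypoints, if your tracking output stores them). The function name and the example values are hypothetical, and the result is only an aid to your own judgment, not an established quality standard.

```python
def fraction_unreliable(confidences, threshold=0.3):
    """Return the share of frames whose confidence is below the threshold.

    confidences: per-frame confidence scores between 0 and 1
                 (assumed input; how you extract them depends on your data)
    """
    if not confidences:
        raise ValueError("no confidence values given")
    low = sum(1 for c in confidences if c < threshold)
    return low / len(confidences)


# hypothetical confidence scores for ten frames
example = [0.9, 0.8, 0.2, 0.95, 0.1, 0.85, 0.9, 0.7, 0.9, 0.9]
print(fraction_unreliable(example))  # → 0.2
```

Here 2 out of 10 frames would be treated as "no movement" with a threshold of 0.3. Whether that share is acceptable still depends on where in the video the poorly tracked frames occur.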

## Cleaning the annotations
Finally, in this step we're going to get to some usable annotations. Go ahead and open the Elan file, which will be in the <i>annotations</i> folder. You can then go to Edit >> Linked Files to add the video back in: <br>
<img src="./images/linked_files1.png" width=500 /> <br>
<br>

Now, as noted before, we can do our "assisted annotation". We'll create a new tier called "gesture". <br>
<img src="./images/add_tier.png" width=500 /> <br>
<br>

Now, we can move through each of the annotations in the "movements" tier and see if it corresponds to a true gesture or not. If so, we can select the annotation and use this to create a new annotation in the gesture tier. When the "movement" annotation isn't a gesture (or perhaps just not the kind that we are interested in), we just ignore it and move on. <br>
<img src="./images/add_gesture_annot.png" width=500 /> <br>
<br>

