# Introduction

This notebook provides an overview of the data collection process and the approach to preprocessing the kinematic data for the downloaded dance video dataset.

## Data Collection using YouTube Data API and Pytube

The video data used for exploratory data analysis was downloaded using the [YouTube Data API](https://developers.google.com/youtube/v3/docs/search/list) and [Pytube](https://pytube.io/en/latest/), so that only authorized videos were collected for analysis. To increase the likelihood of finding relevant, clean videos that focus on individual dancers rather than groups, the collection function runs a keyword search that combines the genre name with terms such as "solo choreography", "solo practice", or "dance cover". Downloaded videos are expected to meet the following criteria: format `mp4`, `width:360`, `height:640`, `max_length:120`, `min_views:100`.

For more details about the data collection process, please refer to the code in [/src/data/collection.py](https://github.com/kayesokua/gestures/blob/main/src/data/collection.py).
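
As a rough illustration of the keyword search and the filtering criteria described above, the sketch below builds the search strings and checks a candidate against the length and view limits. The names `build_search_queries`, `VideoCandidate`, and `meets_criteria` are hypothetical and are not taken from `collection.py`; treating `max_length` as a duration in seconds is also an assumption.

```python
from dataclasses import dataclass

# Search terms quoted in the text above.
SOLO_TERMS = ("solo choreography", "solo practice", "dance cover")


def build_search_queries(genre, terms=SOLO_TERMS):
    """Combine a genre name with each solo-oriented search term (hypothetical helper)."""
    return [f"{genre} {term}" for term in terms]


@dataclass
class VideoCandidate:
    """Minimal stand-in for the metadata of one search result (hypothetical)."""
    title: str
    length_seconds: int  # assumption: max_length is measured in seconds
    views: int


def meets_criteria(video, max_length=120, min_views=100):
    """Return True if the candidate is short enough and has enough views."""
    return video.length_seconds <= max_length and video.views >= min_views


queries = build_search_queries("ballet")
candidate = VideoCandidate(title="example clip", length_seconds=90, views=500)
print(queries, meets_criteria(candidate))
```

Keeping the filter separate from the download call makes it possible to check the selection rules without any network access.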

In [None]:
from src.data.collection import extract_video_from_youtube

extract_video_from_youtube(query='contemporary', max_count=5)
extract_video_from_youtube(query='ballet', max_count=5)
extract_video_from_youtube(query='folk', max_count=5)

## Pose Estimation using MediaPipe

After the video data is downloaded, the kinematic data is extracted using [MediaPipe Pose Solution](https://github.com/google/mediapipe/blob/master/docs/solutions/pose.md). The output format is `csv`, with relative values by default. Each file contains the `x`, `y`, `z` coordinates of the detected landmarks, and the `fps` is read using [OpenCV](https://docs.opencv.org/4.x/). Frames where no pose can be detected are filled with `NaN`.

The code snippet below gathers all videos in `mp4` format and extracts landmarks and screenshots. For more details about the data annotation process, please refer to the code in [/src/data/annotation.py](https://github.com/kayesokua/gestures/blob/main/src/data/annotation.py).
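
To make the `NaN` convention concrete, here is a minimal sketch of how one frame might be turned into a CSV row. The column layout (frame index, timestamp, then flattened coordinates) and the helper name `landmarks_to_row` are assumptions for illustration and may differ from `annotation.py`. MediaPipe Pose reports 33 landmarks per detected pose.

```python
import math

NUM_LANDMARKS = 33  # MediaPipe Pose reports 33 body landmarks


def landmarks_to_row(frame_index, fps, landmarks):
    """Build one CSV row for a frame (hypothetical layout).

    `landmarks` is a list of (x, y, z) tuples, or None when no pose was detected.
    """
    timestamp = frame_index / fps  # seconds since the start of the video
    if landmarks is None:
        # No pose detected: keep the row, but fill every coordinate with NaN.
        coords = [math.nan] * (NUM_LANDMARKS * 3)
    else:
        coords = [value for point in landmarks for value in point]
    return [frame_index, timestamp, *coords]


missing_row = landmarks_to_row(frame_index=30, fps=30.0, landmarks=None)
print(len(missing_row), missing_row[:3])
```

Writing a `NaN` row instead of skipping the frame keeps the row index aligned with the frame index, so timestamps remain recoverable from the `fps`.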

In [1]:
import os
from src.data.annotation import extract_landmarks_from_videos

video_path = 'data/external/test'

if os.path.exists(video_path):
    extract_landmarks_from_videos(video_path)
else:
    print("Path does not exist.")

Extracted 3636 frames from classical_ballet_001.mp4 with 492 missing poses
Extracted 3636 frames in total
Elapsed time: 95.257 seconds
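
The log above reports 492 missing poses out of 3636 frames, which is roughly 13.5% of the video. A small sketch for computing the same fraction from extracted rows follows; the helper name `missing_pose_fraction` is hypothetical, and it assumes a frame counts as missing when any of its coordinates is `NaN`.

```python
import math


def missing_pose_fraction(rows):
    """Fraction of frames whose coordinates contain at least one NaN value."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if any(math.isnan(v) for v in row))
    return missing / len(rows)


# Three detected frames and one missing frame.
example_rows = [[0.1, 0.2, 0.3]] * 3 + [[math.nan, math.nan, math.nan]]
print(missing_pose_fraction(example_rows))

# The same ratio for the numbers reported in the log above.
print(round(492 / 3636, 3))
```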


## Extracting Audio Features using Librosa

Although the focus of the research is not on the musicality of the performance, extracting audio information using [Librosa](https://librosa.org/doc/latest/index.html) can add meaningful context to the gestures.


* **Tempo** and **Beat Times** are important audio features that can be used to identify the rhythmic structure and timing of the music in a dance video.
* [**Root Mean Square**](https://librosa.org/doc/main/generated/librosa.feature.rms.html) energy is computed per audio frame and serves as a proxy for loudness. It can indicate the overall level of activity in the music, which may in turn relate to the intensity and expressive qualities of the dance movements.
* [**Zero Crossing Rate**](https://librosa.org/doc/main/generated/librosa.feature.zero_crossing_rate.html) is the rate at which the signal changes sign within a frame. It can provide a rough indication of how noisy or percussive the music is, and of how that relates to the dance movements.

For more details about the audio extraction process, please refer to the code in [/src/features/audio.py](https://github.com/kayesokua/gestures/blob/main/src/features/audio.py).
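
To show what the last two features measure, the sketch below computes a simplified frame-wise RMS and zero crossing rate in plain Python. It illustrates the definitions only and is not the code in `audio.py`; Librosa's own `librosa.feature.rms` and `librosa.feature.zero_crossing_rate` handle framing and padding differently, so exact values may not match.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a list of samples into overlapping frames (no padding)."""
    last_start = len(samples) - frame_length
    return [samples[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]


def rms(frame):
    """Root mean square of one frame: sqrt of the mean squared amplitude."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Number of sign changes between neighbouring samples, divided by frame length."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / len(frame)


# A toy signal that flips sign on every sample.
signal = [1.0, -1.0] * 4
frames = frame_signal(signal, frame_length=4, hop_length=2)
print([rms(f) for f in frames])
print([zero_crossing_rate(f) for f in frames])
```

A loud, rapidly alternating signal like the toy example scores high on both features, while a constant signal has a zero crossing rate of zero.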

In [None]:
from src.features.audio import extract_tempo_and_beats, extract_rms_energy, extract_zero_crossing_rate

video_url_path = 'data/external/contemporary_005.mp4'
tempo, beat_times = extract_tempo_and_beats(video_url_path)
rms = extract_rms_energy(video_url_path)
zcr = extract_zero_crossing_rate(video_url_path)
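
One way the beat times might be related to the kinematic data is by converting them to video frame indices using the `fps`. The helper below is a hypothetical sketch of that idea and is not part of the repository; it assumes `beat_times` are given in seconds.

```python
def beats_to_frame_indices(beat_times, fps, n_frames):
    """Map beat times in seconds to the nearest video frame index.

    Beats that fall outside the video (index >= n_frames) are dropped.
    """
    indices = []
    for t in beat_times:
        idx = int(round(t * fps))
        if 0 <= idx < n_frames:
            indices.append(idx)
    return indices


# Example with made-up beat times; the last beat lies beyond the video length.
print(beats_to_frame_indices([0.5, 1.0, 1.5, 200.0], fps=30.0, n_frames=3636))
```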

## Handling Missing Poses and Outlier Detection

Since the dance videos differ in cinematography style, filling in missing kinematic data with linear interpolation or the median might not be appropriate. The proposed solution is therefore to detect outliers instead, generating a binary label with the [Isolation Forest algorithm](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html): `1` indicates normal data and `-1` indicates an outlier.

For more details about the data processing, please refer to the code in [/src/data/processing.py](https://github.com/kayesokua/gestures/blob/main/src/data/processing.py).
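
The labeling step can be sketched with scikit-learn's `IsolationForest`, whose `fit_predict` returns `1` for inliers and `-1` for outliers. The helper `label_outliers` is hypothetical and is not the code in `processing.py`. In particular, labeling frames with missing poses as `-1` without passing them to the model is an assumption made here for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def label_outliers(landmarks, random_state=0):
    """Return 1 (normal) or -1 (outlier) for each frame.

    `landmarks` is a 2D array with one row of coordinates per frame.
    Assumption: rows containing NaN (no pose detected) are labeled -1 directly.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    labels = np.full(len(landmarks), -1, dtype=int)
    detected = ~np.isnan(landmarks).any(axis=1)
    if detected.any():
        model = IsolationForest(random_state=random_state)
        labels[detected] = model.fit_predict(landmarks[detected])
    return labels


# Synthetic data: a tight cluster, one far-away frame, and one missing frame.
rng = np.random.default_rng(0)
frames = rng.normal(0.5, 0.01, size=(200, 3))
frames = np.vstack([frames, [[10.0, 10.0, 10.0]], [[np.nan, np.nan, np.nan]]])
labels = label_outliers(frames)
print(labels[-2:], int((labels == -1).sum()))
```

With the default `contamination='auto'`, some frames inside the cluster may also be flagged, so the labels are best treated as a starting point for inspection rather than a final verdict.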

In [None]:
from src.data.processing import process_landmarks_using_isolation_forest
process_landmarks_using_isolation_forest("data/interim")

## Data Cleaning

Currently, only minimal data cleaning is required, which involves verifying whether all videos are of the same size.

In [None]:
from src.data.processing import check_video_size

if check_video_size('data/external'):
    print("All videos have the same size.")
else:
    print("Videos have different sizes.")
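
A size check along these lines could be sketched as follows, assuming that "size" refers to frame width and height rather than file size. The helper names are hypothetical and are not taken from `processing.py`. The OpenCV import is placed inside `read_video_size` so that the grouping logic can be used without it.

```python
from collections import defaultdict


def read_video_size(path):
    """Read (width, height) of a video file using OpenCV."""
    import cv2  # imported lazily; only needed when reading real files

    capture = cv2.VideoCapture(path)
    try:
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    finally:
        capture.release()
    return width, height


def group_by_size(sizes):
    """Group file names by their (width, height) tuple."""
    groups = defaultdict(list)
    for name, size in sizes.items():
        groups[size].append(name)
    return dict(groups)


def all_same_size(sizes):
    """Return True if every video shares one (width, height)."""
    return len(group_by_size(sizes)) <= 1


# Example with made-up sizes; real values would come from read_video_size.
sizes = {"a.mp4": (360, 640), "b.mp4": (360, 640), "c.mp4": (640, 360)}
print(all_same_size(sizes), group_by_size(sizes))
```

Grouping by size, rather than returning only a boolean, also shows which files deviate from the expected `360` by `640` format.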

## Summary

This notebook provided an overview of the data available for exploration:

1. Videos (MP4) with category as filename: `data/external/{category_i}.mp4`
2. Kinematic data with outlier information (CSV): `data/processed/{category_i}.csv`
3. Frame screenshots (PNG) saved in chronological order: `data/interim/{video_filename}/*.png`
4. Functions for extracting audio features: `tempo`, `beat_times`, `rms`, `zcr`


## Resources

For a comprehensive list of resources, [please see here](https://github.com/kayesokua/gestures/blob/main/references/README.md).