# **Clustering | Human Activity**

## **Load packages**

```python
import pandas as pd
import os
```

## **Load datasets**

```python
train_path = "./data/train_subset_10.csv"
val_path = "./data/val_subset_10.csv"
test_path = "./data/test_subset_10.csv"

train_df = pd.read_csv(train_path)
val_df = pd.read_csv(val_path)
test_df = pd.read_csv(test_path)
```

```python
train_df.head()
```

| Unnamed: 0 | youtube_id | label |
| --- | --- | --- |
| 0 | zeIkGEHK46I | riding camel |
| 1 | -Fwy8NwefTk | shot put |
| 2 | sDD0p8h88rI | dying hair |
| 3 | 09AinCnKAE8 | riding camel |
| 4 | qyTDZajMSqQ | baking cookies |


```python
val_df.head()
```

| Unnamed: 0 | youtube_id | label |
| --- | --- | --- |
| 0 | 9QMlwdR8Olg | shot put |
| 1 | fkSOwCyCmOo | shot put |
| 2 | IoDjDQTv-q0 | balloon blowing |
| 3 | eI69j2uheYo | balloon blowing |
| 4 | Xchd-YBUVY4 | spraying |


```python
test_df.head()
```

| Unnamed: 0 | youtube_id |
| --- | --- |
| 0 | oaVWnxlQOeo |
| 1 | 7zMBk9Zu9fY |
| 2 | DwPEZeX5WkA |
| 3 | bO1MW9Lq9Sg |
| 4 | TdBpD9Ccg9w |
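All three CSV files carry an unnamed first column (shown as `Unnamed: 0`), which looks like a row index saved together with the data. A minimal sketch, assuming that column is only such a saved index: passing `index_col=0` to `pd.read_csv` turns it into the DataFrame index instead of a data column, and `value_counts` gives a quick view of how many clips each activity has. The inline CSV is a stand-in built from the `train_df.head()` rows shown above, not the real file.

```python
import io

import pandas as pd

# Stand-in for "./data/train_subset_10.csv" (assumption: the first header cell is
# empty, which is what makes pandas name that column "Unnamed: 0" by default).
csv_text = """,youtube_id,label
0,zeIkGEHK46I,riding camel
1,-Fwy8NwefTk,shot put
2,sDD0p8h88rI,dying hair
3,09AinCnKAE8,riding camel
4,qyTDZajMSqQ,baking cookies
"""

# index_col=0 uses the unnamed first column as the index instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.columns.tolist())         # ['youtube_id', 'label']
print(df["label"].value_counts())  # number of clips per activity
```

With the real files, the same call would be `pd.read_csv(train_path, index_col=0)`.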


## **Useful functions**

```python
def rename_files(directories: list[str]) -> None:

    """

    Description:
        Renames the .mp4 videos so that only the video id is kept
        (the part of the file name before the first '_').

    Args:
        directories (list[str]): local paths to the directories that contain the .mp4 videos

    """

    print("Starting renaming process...\n")

    for directory in directories:

        print(f"Renaming process for {directory} started.")

        for filename in os.listdir(directory):  # files inside the current directory
            if filename.endswith(".mp4") and '_' in filename:  # only videos whose name contains '_'

                video_id = filename.split('_')[0]  # keep just the id (text before the first '_')
                new_filename = f"{video_id}.mp4"  # id + '.mp4'

                old_file = os.path.join(directory, filename)  # current path of the video
                new_file = os.path.join(directory, new_filename)  # new path, named only by its id

                os.rename(old_file, new_file)  # rename the video to just its id

        print(f"Renaming process for {directory} completed.")
```
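`rename_files` keeps everything before the first `_`. YouTube ids can themselves contain `_` (they use letters, digits, `-` and `_`), so a clip such as `ab_cdEFGhij_000010_000020.mp4` would be renamed to `ab.mp4`. The sketch below is a hypothetical alternative, assuming the downloaded clips follow the Kinetics-style pattern `<youtube_id>_<start>_<end>.mp4` and that ids are 11 characters long; `CLIP_NAME`, `extract_video_id` and `rename_clips` are illustrative names, not part of the original notebook.

```python
import os
import re
from typing import Optional

# Assumption: clip names look like "<youtube_id>_<start>_<end>.mp4", where the
# YouTube id is 11 characters from [A-Za-z0-9_-] and start/end are integers.
CLIP_NAME = re.compile(r"^(?P<youtube_id>[A-Za-z0-9_-]{11})_\d+_\d+\.mp4$")


def extract_video_id(filename: str) -> Optional[str]:
    """Return the YouTube id of a clip file, or None if the name does not match."""
    match = CLIP_NAME.match(filename)
    return match.group("youtube_id") if match else None


def rename_clips(directory: str) -> int:
    """Rename matching clips in `directory` to "<youtube_id>.mp4"; return how many were renamed."""
    renamed = 0
    for filename in os.listdir(directory):
        video_id = extract_video_id(filename)
        if video_id is None:  # not a clip, or already renamed
            continue
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, f"{video_id}.mp4"))
        renamed += 1
    return renamed


print(extract_video_id("ab_cdEFGhij_000010_000020.mp4"))  # ab_cdEFGhij
print(extract_video_id("zeIkGEHK46I.mp4"))                # None (already renamed)
```

Because already-renamed files no longer match the pattern, calling `rename_clips` a second time on the same directory leaves them untouched.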

```python
def write_video_paths_txt(df: pd.DataFrame, directory_path: str, name: str) -> None:

    """
    Description:
        Writes the path of every video in `df` to './txt/path_{name}.txt', one path per line.
        This file is later passed to video_features as 'file_with_video_paths'.

    Args:
        df (pd.DataFrame): a pandas DataFrame with a 'youtube_id' column (train_df, val_df or test_df)
        directory_path (str): path to the directory where the videos are located
        name (str): split name used in the .txt file name ('train', 'val' or 'test')

    """

    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    paths = [os.path.join(directory_path, f"{video_id}.mp4") for video_id in df['youtube_id']]

    os.makedirs('./txt', exist_ok=True)  # make sure the output directory exists
    output_file = f'./txt/path_{name}.txt'
    with open(output_file, 'w') as f:
        for path in paths:
            f.write(f"{path}\n")

    print(f"Finished writing video paths of {name} dataset")
```
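`write_video_paths_txt` writes a path for every `youtube_id`, whether or not the corresponding file exists, and some YouTube clips may have failed to download. A small hypothetical helper, `find_missing_videos`, can list those ids before the `.txt` files are handed to video_features. It is a sketch that assumes the videos were already renamed to `<youtube_id>.mp4`.

```python
import os

import pandas as pd


def find_missing_videos(df: pd.DataFrame, directory_path: str) -> list[str]:
    """Return the youtube_ids in `df` that have no "<id>.mp4" file in `directory_path`.

    Hypothetical helper; assumes the videos were already renamed to "<id>.mp4".
    """
    return [
        video_id
        for video_id in df["youtube_id"]
        if not os.path.isfile(os.path.join(directory_path, f"{video_id}.mp4"))
    ]


# Example with a directory that does not exist: every id is reported as missing.
demo_df = pd.DataFrame({"youtube_id": ["zeIkGEHK46I", "-Fwy8NwefTk"]})
print(find_missing_videos(demo_df, "./no_such_directory"))
```

Calling it as `find_missing_videos(val_df, val_directory)` returns an empty list when every validation video is in place.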

Adjust the following paths to match where the videos are stored on your machine.

```python
directories = ['./videos/train_subset',
               './videos/test_subset',
               './videos/val_subset']  # download the .zip with the videos and place them in the 'videos' directory

# Full paths to the same directories. video_features is run from inside its own
# directory, so relative paths such as './videos/...' would not resolve there.
val_directory = 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-2/videos/val_subset/'
train_directory = 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-2/videos/train_subset/'
test_directory = 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-2/videos/test_subset/'
```

```python
rename_files(directories)
```

```text
Starting renaming process...

Renaming process for ./videos/train_subset started.
Renaming process for ./videos/train_subset completed.
Renaming process for ./videos/test_subset started.
Renaming process for ./videos/test_subset completed.
Renaming process for ./videos/val_subset started.
Renaming process for ./videos/val_subset completed.
```


```python
write_video_paths_txt(val_df, val_directory, 'val')
write_video_paths_txt(train_df, train_directory, 'train')
write_video_paths_txt(test_df, test_directory, 'test')
```

```text
Finished writing video paths of val dataset
Finished writing video paths of val dataset
```


Now that all the video paths are stored in `.txt` files (one each for train, val and test), we can use video_features to extract the corresponding features.

To do so, first clone the video_features repository and move into it; the required dependencies are installed from inside this directory/repository:

```bash
git clone https://github.com/v-iashin/video_features.git
cd video_features
```
cd video_features

Installing the dependencies requires Anaconda or Miniconda:

```bash
conda create -n video_features
conda activate video_features  # the packages below must be installed into this environment
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge omegaconf scipy tqdm pytest opencv
conda install -c conda-forge av
```

With this done, feature extraction can be run from the terminal (inside `video_features`, with the `video_features` environment activated) using the following command:

```bash
# 'name' can be one of: train, val, test
name=train

python main.py \
    feature_type=r21d \
    device="cuda:0" \
    file_with_video_paths="../txt/path_${name}.txt" \
    on_extraction=save_numpy \
    output_path="../extraction/${name}"
```
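The command has to be run once per split. Instead of editing `name` by hand, the three runs can be driven from Python. This is a convenience sketch, not part of video_features, and it rests on assumptions: the `video_features` conda environment created above exists, the notebook runs from the project root (so the repository is at `./video_features`), and `conda run -n <env>` is available to execute a command inside that environment. `build_extraction_command` and `run_all_extractions` are hypothetical names.

```python
import subprocess

SPLITS = ["train", "val", "test"]
CONDA_ENV = "video_features"  # environment created during installation (assumption)


def build_extraction_command(name: str) -> list[str]:
    """Argument list equivalent to the bash command above, for one split.

    No shell is involved, so the quotes used in the bash version are not needed.
    """
    return [
        "conda", "run", "-n", CONDA_ENV,
        "python", "main.py",
        "feature_type=r21d",
        "device=cuda:0",
        f"file_with_video_paths=../txt/path_{name}.txt",
        "on_extraction=save_numpy",
        f"output_path=../extraction/{name}",
    ]


def run_all_extractions(repo_dir: str = "./video_features") -> None:
    """Run the extraction for every split from inside the video_features repository."""
    for name in SPLITS:
        # check=True raises CalledProcessError if a run fails, stopping the loop
        subprocess.run(build_extraction_command(name), cwd=repo_dir, check=True)


print(" ".join(build_extraction_command("val")))
```

Running `cwd=repo_dir` keeps the `../txt/...` and `../extraction/...` paths valid, exactly as when the bash command is typed inside `video_features`.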

Running the extraction command writes `.npy` files to the directory given by `output_path` (here `../extraction/<name>`, relative to `video_features`). Each file holds the features extracted from one video.
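For clustering, each video usually needs a single fixed-length vector, while R(2+1)D features are typically produced per stack of frames. A minimal sketch, assuming each video's features are saved as `<youtube_id>_r21d.npy` with shape `(num_segments, feature_dim)`: check the actual file names and folder layout in the output directory and adjust `suffix` and `feature_dir` accordingly. It mean-pools over the time axis (one common choice, not necessarily the one used later in this project) and skips videos without a feature file. `load_video_features` is a hypothetical helper.

```python
import os

import numpy as np
import pandas as pd


def load_video_features(df: pd.DataFrame, feature_dir: str, suffix: str = "_r21d.npy"):
    """Return (X, ids): one mean-pooled feature vector per video that has a feature file.

    Assumed file layout: <feature_dir>/<youtube_id><suffix>, holding an array of
    shape (num_segments, feature_dim).
    """
    vectors, kept_ids = [], []
    for video_id in df["youtube_id"]:
        path = os.path.join(feature_dir, f"{video_id}{suffix}")
        if not os.path.isfile(path):  # e.g. video missing or extraction failed
            continue
        feats = np.load(path)
        feats = feats.reshape(-1, feats.shape[-1])  # a 1-D array becomes a single segment
        vectors.append(feats.mean(axis=0))          # average over the time axis
        kept_ids.append(video_id)
    if not vectors:
        return np.empty((0, 0)), kept_ids
    return np.stack(vectors), kept_ids
```

For example, `X_train, train_ids = load_video_features(train_df, "./extraction/train")` would give a `(num_videos, feature_dim)` matrix for clustering, with `train_ids` recording which rows of `train_df` it covers.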