# **Clustering | Human Activity**

## **Load packages**

In [4]:
import pandas as pd
import os

## **Load datasets**

## **Useful functions**

In [19]:

def rename_files(directories: list[str], path: str):
    """
    Description:
        Despite its name, this function does not rename anything: it collects the
        .wav files found in each directory and writes their full paths to
        './txt/path_actor_1.txt'

    Args:
        directories (list[str]): local paths to the directories that contain the .wav files
        path (str): base path that is prepended to each filename in the output

    """

    print("Starting renaming process...\n")

    path_actor_1 = []

    for directory in directories:

        print(f"Renaming process for {directory} started.")

        for filename in os.listdir(directory):  # entries inside the current directory
            if filename.endswith(".wav"):  # keep only the .wav files
                path_actor_1.append(os.path.join(path, filename))  # store the full path

    print(path_actor_1)

    output_file = './txt/path_actor_1.txt'
    with open(output_file, 'w') as f:
        for file_path in path_actor_1:  # separate name, so the 'path' argument is not shadowed
            f.write(f"{file_path}\n")

In [24]:
def write_video_paths_txt(df:pd.DataFrame, directory_path:str, name:str):

    """
    Description:
        This function writes the paths of the videos to a .txt file, which is later passed to the feature extractor as 'file_with_video_paths'

    Args:
        df (pd.DataFrame): a pandas DataFrame with a 'youtube_id' column, e.g. train_df, val_df or test_df
        directory_path (str): the path to the directory where the videos are located
        name (str): suffix of the output file, which is written to './txt/path_{name}.txt'

    """

    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    paths = []

    for video_id in df['youtube_id']:  # 'video_id' avoids shadowing the built-in id()
        paths.append(os.path.join(directory_path, f"{video_id}.mp4"))

    print(f"Finished writing video paths of {name} dataset" )

    output_file = f'./txt/path_{name}.txt'
    with open(output_file, 'w') as f:
        for path in paths:
            f.write(f"{path}\n")

    print(f"Finished writing video paths of {name} dataset")

Adjust the locations below to match your own paths.

In [26]:
directories = ['./archive/Actor_01'] # download the .zip with the files and extract it so that this directory exists

actor_01_path = 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/'

In [27]:
os.makedirs("txt", exist_ok=True)

rename_files(directories, actor_01_path)

Starting renaming process...

Renaming process for ./archive/Actor_01 started.
['C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-01-01-01-01-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-01-01-01-02-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-01-01-02-01-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-01-01-02-02-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-02-01-01-01-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-02-01-01-02-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-02-01-02-01-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/archive/Actor_01/03-01-02-01-02-02-01.wav', 'C:/Users/nayel/Desktop/utec/2025-0/machine-learning/project-3/a

In [19]:
write_video_paths_txt(val_df, val_directory, 'val')
write_video_paths_txt(train_df, train_directory, 'train')
write_video_paths_txt(test_df, test_directory, 'test')

Finished writing video paths of val dataset
Finished writing video paths of val dataset
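
Before passing these `.txt` files to a feature extractor, it can help to confirm that every listed path actually exists on disk. The following is a minimal sketch: `find_missing_paths` is a hypothetical helper that is not part of the original notebook, and it assumes one path per line, as written by `write_video_paths_txt`.

```python
import os


def find_missing_paths(txt_file: str) -> list[str]:
    """Return the paths listed in txt_file that do not exist on disk.

    Hypothetical helper (not part of the original notebook); it assumes
    the file contains one path per line.
    """
    missing = []
    with open(txt_file) as f:
        for line in f:
            path = line.strip()
            if path and not os.path.isfile(path):  # blank lines are ignored
                missing.append(path)
    return missing
```

For example, `find_missing_paths('./txt/path_val.txt')` returns an empty list when every listed file is in place.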


Now that all the video paths are stored in `.txt` files (one each for train, val and test), we can use video_features to extract the corresponding features.

To do so, first clone the video_features repository; the required dependencies are then installed from inside that directory:

```bash
git clone https://github.com/v-iashin/video_features.git
cd video_features
```

Installing the dependencies requires Anaconda or Miniconda:

```bash
conda create -n video_features
# activate the new environment so the packages are installed into it
conda activate video_features
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge omegaconf scipy tqdm pytest opencv
conda install -c conda-forge av
```

With this in place, the features can be extracted from the terminal with the following command:

```bash
# 'name' can be one of: train, test, val
name=train

python main.py \
    feature_type=r21d \
    device="cuda:0" \
    file_with_video_paths="../txt/path_${name}.txt" \
    on_extraction=save_numpy \
    output_path="../extraction/${name}"
```
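
Since the same command is run once per split, the three invocations can also be generated from Python. This is a minimal sketch: `build_extraction_command` and `run_extraction` are hypothetical helpers, the arguments simply mirror the command above, and the code assumes it is launched from inside the `video_features` directory.

```python
import subprocess


def build_extraction_command(name: str) -> list[str]:
    """Build the argument list for one split (hypothetical helper).

    The arguments mirror the shell command above; nothing else is assumed
    about the video_features command-line interface.
    """
    return [
        "python", "main.py",
        "feature_type=r21d",
        "device=cuda:0",
        f"file_with_video_paths=../txt/path_{name}.txt",
        "on_extraction=save_numpy",
        f"output_path=../extraction/{name}",
    ]


def run_extraction(splits=("train", "val", "test")) -> None:
    """Run the extraction once per split, stopping at the first failure."""
    for split in splits:
        # check=True raises CalledProcessError if the command exits with an error
        subprocess.run(build_extraction_command(split), check=True)
```

Calling `run_extraction()` would launch the three extractions one after another; passing a list instead of a single string to `subprocess.run` avoids shell quoting issues.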

Running the command above creates `.npy` files under the `output_path` directory. Each file holds the features extracted from one video.
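
As a follow-up, the saved `.npy` files can be loaded back into a single matrix for clustering. The sketch below is illustrative only: `load_features` is a hypothetical helper, and it assumes that each `.npy` file holds a 2-D array of shape `(num_segments, feature_dim)`, which is averaged over the first axis to obtain one vector per video. Check the actual shapes of your files before relying on it.

```python
import glob
import os

import numpy as np


def load_features(directory: str) -> tuple[list[str], np.ndarray]:
    """Load every .npy file under `directory` into one feature matrix.

    Hypothetical helper. Assumes each file stores a 2-D array of shape
    (num_segments, feature_dim); segments are mean-pooled so that every
    video is represented by a single vector.
    """
    pattern = os.path.join(directory, "**", "*.npy")
    files = sorted(glob.glob(pattern, recursive=True))

    names, vectors = [], []
    for file in files:
        features = np.load(file)
        if features.ndim != 2 or features.shape[0] == 0:
            continue  # skip files that do not match the assumed layout
        names.append(os.path.basename(file))
        vectors.append(features.mean(axis=0))

    matrix = np.stack(vectors) if vectors else np.empty((0, 0))
    return names, matrix
```

The returned matrix has one row per video, in the same order as `names`, which is the usual input layout for clustering algorithms.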