<div style="text-align: center;">
    <h1>Video Embeddings with FiftyOne</h1>
    <h2>Elderly Action Recognition Challenge</h2>
    <h2>Video Data Analysis with FiftyOne</h2>
</div>

This notebook walks you through the process of generating and visualizing video embeddings for the [Elderly Action Recognition Challenge](https://voxel51.com/computer-vision-events/elderly-action-recognition-challenge-wacv-2025) using the [FiftyOne](https://docs.voxel51.com/) platform. It covers key steps such as downloading the dataset, applying the [Hiera Video Embeddings plugin](https://github.com/harpreetsahota204/hiera-video-embeddings-plugin), generating embeddings, and visualizing the results. By the end, you'll be equipped to analyze video data and extract insights using embeddings.

---

**Useful Links:**
- [Challenge Overview](https://voxel51.com/computer-vision-events/elderly-action-recognition-challenge-wacv-2025/)
- [FiftyOne Documentation](https://docs.voxel51.com/)
- [Hiera Video Embeddings Plugin](https://github.com/harpreetsahota204/hiera-video-embeddings-plugin)

---

<div style="text-align: center;">
    <img src="https://github.com/user-attachments/assets/a97ed6ff-8aa8-4911-98b0-6f8dce36ab83" alt="challenge-logo" width="200" style="margin-right: 20px;">
    <img src="https://github.com/user-attachments/assets/6b1d05e4-3da3-4591-b70f-764e5ad0e5da" alt="fiftyone-logo" width="200">
</div>

---

**Goal**: Equip participants with the tools to generate meaningful video embeddings, visualize them, and submit their results to advance the field of action recognition for the elderly.

## Requirements and FiftyOne Installation

First, create a Python environment on your system. If you are not familiar with that, take a look at this [README](https://github.com/voxel51/fiftyone-examples?tab=readme-ov-file#-prerequisites-for-beginners-), which explains how to create one. Then activate the environment and install FiftyOne along with the other required packages, such as ```umap-learn``` for dimensionality reduction (UMAP visualization). For more details on installing FiftyOne, check out the FiftyOne Installation Guide.

Restart the kernel after running the next cell. You only need to do this once.

In [None]:
!pip install fiftyone umap-learn huggingface-hub ipywidgets
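
After restarting the kernel, it can help to confirm that the packages are visible to the notebook's interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `find_missing_modules` and the `REQUIRED_MODULES` list are illustrative, not part of FiftyOne.

```python
import importlib.util

# Import names for the packages installed above. The pip name and the import
# name can differ (umap-learn -> umap, huggingface-hub -> huggingface_hub).
REQUIRED_MODULES = ["fiftyone", "umap", "huggingface_hub", "ipywidgets"]


def find_missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules (install them, then restart the kernel):", missing)
else:
    print("All required modules are importable.")
```

If a module is reported as missing even though `pip install` succeeded, the notebook kernel is probably running in a different environment than the one where the packages were installed.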

## Imports
In this section, we import the libraries and modules needed to work with the dataset. Later cells repeat some of these imports, sometimes commented out, for educational purposes.

In [4]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
import fiftyone.operators as foo
import fiftyone.brain as fob

import os

Here, we load the dataset from the Hugging Face Hub using ```load_from_hub()```. The dataset [```"Voxel51/GMNCSA24-FO"```](https://huggingface.co/datasets/Voxel51/GMNCSA24-FO) is specified, and you can limit the number of samples with arguments like ```max_samples```. Learn more about loading datasets in FiftyOne from the [docs](https://docs.voxel51.com/user_guide/dataset_creation/index.html).

In [7]:
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/GMNCSA24-FO", overwrite=True)

Downloading config file fiftyone.yml from Voxel51/GMNCSA24-FO
Loading dataset
Importing samples...
 100% |█████████████████| 335/335 [7.2ms elapsed, 0s remaining, 46.6K samples/s]       
Importing frames...
 100% |█████████████████████| 0/0 [742.0us elapsed, ? remaining, ? samples/s] 
Downloading 335 media files...


100%|██████████| 4/4 [00:16<00:00,  4.23s/it]


## Install the plugin
In this cell, we download and install the [Hiera Video Embeddings Plugin](https://github.com/harpreetsahota204/hiera-video-embeddings-plugin) and its dependencies using the FiftyOne plugin manager. The plugin enables the computation of video embeddings. More about plugins in FiftyOne can be found in the [plugin documentation](https://docs.voxel51.com/plugins/index.html).

> **Note:** Remember that a plugin in FiftyOne is an extension that adds new functionality or features to the platform, such as custom operators, embeddings, or integrations with other tools, allowing users to enhance their dataset management and analysis workflows.

In [17]:
!fiftyone plugins download https://github.com/harpreetsahota204/hiera-video-embeddings-plugin
!fiftyone plugins requirements @harpreetsahota/hiera_video_embeddings --install

Downloading harpreetsahota204/hiera-video-embeddings-plugin...
  103.9Kb [6.5ms elapsed, ? remaining, 15.7Mb/s]   
Copying plugin '@harpreetsahota/hiera_video_embeddings' to '/Users/paularamos/fiftyone/__plugins__/@harpreetsahota/hiera_video_embeddings'
Collecting hiera-transformer
  Downloading hiera_transformer-0.1.4-py3-none-any.whl.metadata (16 kB)
Collecting torch>=1.8.1 (from hiera-transformer)
  Downloading torch-2.6.0-cp313-none-macosx_11_0_arm64.whl.metadata (28 kB)
Collecting timm>=0.4.12 (from hiera-transformer)
  Using cached timm-1.0.14-py3-none-any.whl.metadata (50 kB)
Collecting torchvision (from timm>=0.4.12->hiera-transformer)
  Downloading torchvision-0.21.0-cp313-cp313-macosx_11_0_arm64.whl.metadata (6.1 kB)
Collecting safetensors (from timm>=0.4.12->hiera-transformer)
  Using cached safetensors-0.5.2-cp38-abi3-macosx_11_0_arm64.whl.metadata (3.8 kB)
Collecting sympy==1.13.1 (from torch>=1.8.1->hiera-transformer)
  Using cached sympy-1.13.1-py3-none-any.whl.metadata 

## Requirements for running a plugin
We configure the environment to allow legacy orchestrators, which might be needed for compatibility with specific tasks or plugins in FiftyOne. For more on orchestrators, refer to [FiftyOne Orchestrator](https://docs.voxel51.com/plugins/using_plugins.html#setting-up-an-orchestrator).

In [18]:
# Alternatively, run this in the terminal: export FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS=true
# import os  (already imported above)

os.environ['FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS'] = 'true'
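
To confirm the variable is set in the current process, a small check like the following can be used. This is a sketch: the helper `is_flag_enabled` is hypothetical, and it only tests for the literal value ```true``` used in this notebook, so it does not reproduce how FiftyOne itself parses the setting.

```python
import os

ENV_VAR = "FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS"


def is_flag_enabled(name, environ=None):
    """Return True if `name` is set to "true" (case-insensitive) in `environ`.

    `environ` defaults to the environment of the current process.
    """
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() == "true"


print(f"{ENV_VAR} enabled in this process:", is_flag_enabled(ENV_VAR))
```

Environment variables are per process, so a value set inside the notebook is not visible to a terminal that was opened separately. If that terminal also needs the setting, use the `export` form shown in the comment above.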

Before running the plugin, you must launch the delegated service in the terminal. This service handles distributed or offloaded tasks, like embedding computation. For more on the delegated service, check out the [FiftyOne Delegated Service documentation](https://docs.voxel51.com/plugins/using_plugins.html#managing-delegated-operations).

- Open a new terminal, activate the same Python environment, and run:


```fiftyone delegated launch```


You should see something like this in the terminal:


![Image](https://github.com/user-attachments/assets/61058374-1521-4e10-8781-0a72daa76538)

Disable and delete previous plugins if needed:

```
fiftyone plugins disable --all
fiftyone plugins delete --all
```

## Initialize the Embedding Operator
 We initialize the embedding operator from the Hiera plugin to compute video embeddings. The operator provides the functionality needed to generate embeddings for the dataset. Learn more about operators in FiftyOne [here](https://docs.voxel51.com/api/fiftyone.operators.html).

In [19]:
import fiftyone.operators as foo

embedding_operator = foo.get_operator("@harpreetsahota/hiera_video_embeddings/compute_hiera_video_embeddings")

## Compute First Set of Video Embeddings
In this cell, the ```embedding_operator``` is applied to compute video embeddings using the Hiera model (```hiera_base_16x224```). The embeddings are saved to the ```emb_test_1``` field, and they are normalized (```normalize=True```). This generates terminal embeddings from the video frames and stores one vector per sample in ```emb_test_1```.


> **Note:** With plugins, you can add new functionality to the FiftyOne App, create integrations with other tools and APIs, render custom panels, and add custom actions to menus. With FiftyOne, you can even write plugins that allow users to execute long-running tasks from within the App that run on a connected compute cluster.
> Get started with plugins by installing some popular plugins, then try your hand at writing your own!
> For more on plugin types, refer to the [plugin documentation](https://docs.voxel51.com/plugins/index.html).

In [20]:
await embedding_operator(
    dataset,
    model_name="hiera_base_16x224",
    checkpoint="mae_k400_ft_k400",
    embedding_types="terminal",
    emb_field="emb_test_1",
    normalize=True,
    delegate=True
    )

<fiftyone.operators.executor.ExecutionResult at 0x1426fb0e0>

Wait for the operation to complete. 
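
Because the operation is delegated, the call returns immediately while the work continues in the delegated service. One way to wait from the notebook is to poll until the new field shows up in the dataset schema. The helper below is a sketch under assumptions: `wait_for_field` is a hypothetical name, and it relies only on the dataset's `reload()` and `has_sample_field()` methods. A field appearing in the schema does not prove that every sample has been populated, so the terminal running the delegated service remains the authoritative status.

```python
import time


def wait_for_field(dataset, field_name, timeout=600.0, poll_interval=5.0):
    """Poll until `field_name` appears on the dataset or `timeout` seconds pass.

    `dataset` is expected to provide `reload()` and `has_sample_field(name)`,
    as FiftyOne datasets do. Returns True if the field appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        dataset.reload()  # pick up schema changes written by another process
        if dataset.has_sample_field(field_name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

For this notebook, a call such as `wait_for_field(dataset, "emb_test_1")` would block until the field exists or ten minutes have passed. The timeout and polling interval are arbitrary defaults and should be adjusted to the size of your dataset.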

## Alternative: Use the FiftyOne App

You can also perform this step in the App: click the operator icon, search for the Hiera operator, and select your preferences in the form.

```session = fo.launch_app(dataset)```

Operator Icon:
![Image](https://github.com/user-attachments/assets/1de84a27-526c-4f13-b50d-e542502b1bfa)


Select the Hiera Video Embeddings Operator:
![Image](https://github.com/user-attachments/assets/fdddc48a-9000-4da3-96f3-92b2646cd33e)

Fill in the form:
![Image](https://github.com/user-attachments/assets/149f2ec0-3c7e-4ee0-8e2f-c18846a105a4)

After computing the embeddings, you need to reload the dataset to access the new embedding fields (```emb_test_1```).

In [23]:
dataset.reload()
print(dataset)

Name:        Voxel51/GMNCSA24-FO
Media type:  video
Num samples: 335
Persistent:  False
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.VideoMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    sample_id:        fiftyone.core.fields.ObjectIdField
    support:          fiftyone.core.fields.FrameSupportField
    events:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    emb_test_1:       fiftyone.core.fields.VectorField
Frame fields:
    id:               fiftyone.core.fields.ObjectIdField
    frame_number:     fiftyone.core.fields.FrameNumberField
    created_at:       fiftyone.core.fields.DateTimeField
  

Check your dataset for the embeddings field called ```emb_test_1```:

![Image](https://github.com/user-attachments/assets/f3e020a4-a927-465f-8f90-c76b3626039e)

## Visualize the Embeddings Using UMAP
Here, we use UMAP (Uniform Manifold Approximation and Projection) to reduce the dimensionality of the embeddings to 2D for visualization. This is useful for understanding the structure and relationships between the embeddings. For more on UMAP in FiftyOne, check the [Brain documentation](https://docs.voxel51.com/brain.html#visualizing-embeddings).

In [24]:
#import fiftyone.brain as fob

results = fob.compute_visualization(
    dataset,
    embeddings="emb_test_1", # or whichever embedding field
    method="umap",
    brain_key="emb_viz_1",
    num_dims=2,
    verbose=True,
)

Generating visualization...




UMAP( verbose=True)
Sat Feb  1 08:50:59 2025 Construct fuzzy simplicial set
Sat Feb  1 08:50:59 2025 Finding Nearest Neighbors
Sat Feb  1 08:50:59 2025 Finished Nearest Neighbor Search
Sat Feb  1 08:50:59 2025 Construct embedding


Epochs completed: 100%| ██████████ 500/500 [00:00]

	completed  0  /  500 epochs
	completed  50  /  500 epochs
	completed  100  /  500 epochs
	completed  150  /  500 epochs
	completed  200  /  500 epochs
	completed  250  /  500 epochs
	completed  300  /  500 epochs
	completed  350  /  500 epochs
	completed  400  /  500 epochs
	completed  450  /  500 epochs
Sat Feb  1 08:50:59 2025 Finished embedding
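
The object returned by ```compute_visualization``` exposes the low-dimensional coordinates through its `points` attribute. As a quick sanity check, the sketch below summarizes any sequence of coordinate rows. The helper `describe_points` is illustrative and works on plain Python lists as well as arrays.

```python
def describe_points(points):
    """Summarize a sequence of coordinate rows.

    Returns a dict with the number of points, the number of dimensions,
    and the per-dimension minimum and maximum values.
    """
    rows = [[float(value) for value in row] for row in points]
    if not rows:
        return {"num_points": 0, "num_dims": 0, "mins": [], "maxs": []}
    num_dims = len(rows[0])
    mins = [min(row[d] for row in rows) for d in range(num_dims)]
    maxs = [max(row[d] for row in rows) for d in range(num_dims)]
    return {"num_points": len(rows), "num_dims": num_dims, "mins": mins, "maxs": maxs}


# Toy example with three 2D points
print(describe_points([[0.0, 1.0], [2.0, -1.0], [1.0, 0.5]]))
```

Calling `describe_points(results.points)` in this notebook should report two dimensions, matching ```num_dims=2```, and one point per sample that has an embedding.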





Wait until the operation finishes. You can follow its progress in the terminal where you ran ```fiftyone delegated launch```. The processing time depends on the number of samples in your dataset or view.

![Image](https://github.com/user-attachments/assets/e46323a7-7de0-482c-93d8-169869780237)



![Image](https://github.com/user-attachments/assets/9148c0f7-4b84-4146-9541-9da328fec727)

In [None]:
# Launch the FiftyOne App
session = fo.launch_app(dataset)
session.freeze()

## Verify the Brain Run Was Successful
We check whether the dataset has brain runs (```dataset.has_brain_runs```) and list them with ```dataset.list_brain_runs()```. You can load the results of the embedding visualization by calling ```dataset.load_brain_results()```.

In [25]:
dataset.has_brain_runs
dataset.list_brain_runs()
dataset.load_brain_results("emb_viz_1")
dataset.first()['emb_test_1']

array([-1.62456706e-01,  1.42509893e-01,  2.82073580e-02, -3.32412511e-01,
       -3.05628106e-02,  4.05462921e-01,  2.42151812e-01,  8.07870738e-03,
        6.63925827e-01,  6.74422607e-02,  2.61222124e-01, -6.64945841e-01,
        5.38978398e-01,  6.64280057e-01,  2.17230186e-01, -7.81381577e-02,
        1.09162733e-01,  1.40530944e-01,  2.84134865e-01,  3.69161189e-01,
       -6.60031676e-01, -1.73182964e-01, -9.98321295e-01, -4.56319489e-02,
        9.21102166e-02,  4.77855355e-01,  9.49432194e-01,  1.41382849e+00,
       -1.15292102e-01,  1.49507588e-02,  2.68596321e-01, -5.70652373e-02,
       -1.85752302e-01, -3.02214831e-01,  4.66969609e-01, -2.34535076e-02,
        4.67020273e-01,  3.40447687e-02, -3.02124619e-01,  1.01528168e-01,
       -1.86585352e-01,  8.77562314e-02, -1.87962905e-01, -8.47835988e-02,
        2.68085629e-01, -3.18618506e-01, -2.06036083e-02,  8.17718059e-02,
        5.98581955e-02, -4.06609833e-01, -1.02172479e-01, -1.44649327e-01,
       -2.16971375e-02, -
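
A notebook cell displays only the value of its last expression, so the first three calls in the cell above produce no visible output. The sketch below wraps the same checks in a helper that returns an explicit message. `summarize_brain_run` is a hypothetical name; it relies on the `has_brain_runs` property and `list_brain_runs()` method used above.

```python
def summarize_brain_run(dataset, brain_key):
    """Return a short status message for `brain_key` on a FiftyOne-style dataset."""
    if not dataset.has_brain_runs:
        return "No brain runs found on this dataset."
    keys = dataset.list_brain_runs()
    if brain_key not in keys:
        return f"Brain key '{brain_key}' not found. Available keys: {keys}"
    return f"Brain key '{brain_key}' is available."
```

In this notebook, `print(summarize_brain_run(dataset, "emb_viz_1"))` would state whether the visualization computed earlier is registered on the dataset.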

## Compute Second Set of Video Embeddings

This section shows that you can compute different video embeddings in the same project. Here we use a different model from the same plugin.

This step computes a second set of embeddings using a slightly different model (```hiera_base_plus_16x224```). The embeddings are saved to the ```emb_test_2``` field with normalization. This demonstrates how different model architectures can generate distinct embeddings for the same dataset.

In [28]:
await embedding_operator(
    dataset,
    model_name="hiera_base_plus_16x224",
    checkpoint="mae_k400_ft_k400",
    embedding_types="terminal",
    emb_field="emb_test_2",
    normalize=True,
    delegate=True
    )

<fiftyone.operators.executor.ExecutionResult at 0x347e2a0d0>

Check if ```emb_test_2``` was created.

In [30]:
dataset.reload()  # reload first so the new field is visible
print(dataset)

Name:        Voxel51/GMNCSA24-FO
Media type:  video
Num samples: 335
Persistent:  False
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.VideoMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    sample_id:        fiftyone.core.fields.ObjectIdField
    support:          fiftyone.core.fields.FrameSupportField
    events:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    emb_test_1:       fiftyone.core.fields.VectorField
    emb_test_2:       fiftyone.core.fields.VectorField
Frame fields:
    id:               fiftyone.core.fields.ObjectIdField
    frame_number:     fiftyone.core.fields.FrameNumberField
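
With both fields present, it can be useful to check the length of each embedding, since different models may produce vectors of different sizes. The helper below is a sketch: `embedding_dimensions` is a hypothetical name, and it inspects only the first sample, so it assumes all samples in a field share the same length.

```python
def embedding_dimensions(dataset, fields):
    """Map each field name to the length of its embedding on the first sample.

    Fields that are missing from the dataset, or empty on the first sample,
    map to None.
    """
    sample = dataset.first()
    dims = {}
    for field in fields:
        if not dataset.has_sample_field(field):
            dims[field] = None
            continue
        value = sample[field]
        dims[field] = None if value is None else len(value)
    return dims
```

For example, `embedding_dimensions(dataset, ["emb_test_1", "emb_test_2"])` returns a dict with one entry per field. If the two lengths differ, the vectors cannot be compared element by element, but each field can still be visualized on its own.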
    

Repeat the ```compute_visualization``` step for the new embeddings field.

In [31]:
results = fob.compute_visualization(
    dataset,
    embeddings="emb_test_2", # or whichever embedding field
    method="umap",
    brain_key="emb_viz_2",
    num_dims=2,
    verbose=True,
)

Generating visualization...




UMAP( verbose=True)
Sat Feb  1 09:07:38 2025 Construct fuzzy simplicial set
Sat Feb  1 09:07:38 2025 Finding Nearest Neighbors
Sat Feb  1 09:07:38 2025 Finished Nearest Neighbor Search
Sat Feb  1 09:07:38 2025 Construct embedding


Epochs completed: 100%| ██████████ 500/500 [00:00]

	completed  0  /  500 epochs
	completed  50  /  500 epochs
	completed  100  /  500 epochs
	completed  150  /  500 epochs
	completed  200  /  500 epochs
	completed  250  /  500 epochs
	completed  300  /  500 epochs
	completed  350  /  500 epochs
	completed  400  /  500 epochs
	completed  450  /  500 epochs
Sat Feb  1 09:07:38 2025 Finished embedding





Let's take a look at the results in the App:

Check that both brain keys appear in the list in the Embeddings panel:

![Image](https://github.com/user-attachments/assets/d74d68f8-1e14-462b-956c-db60c278d004)

Exploring embeddings ```emb_test_1```:

![Image](https://github.com/user-attachments/assets/37de8ac9-2db4-4eb1-b46a-081eaa8c64b4)

Exploring embeddings ```emb_test_2```:

![Image](https://github.com/user-attachments/assets/b2e6a849-25fa-4264-875d-dc15185c9c06)
