# 1. Introduction
To get a first feel for the Mcity Data Engine, we provide an online demo in a Google Colab environment. We will load the Fisheye8K dataset and demonstrate the Mcity Data Engine workflow Embedding Selection. This workflow leverages a set of models to compute image embeddings, which are then used to select a subset of the dataset. This subset is finally visualized in the Voxel51 UI, highlighting how often a sample was picked by a model.

## In the menu bar, click Runtime and select Run all (as shown in the image below) to see the outcome, or follow the steps below:


<b>Note: the screenshot was taken with Chrome's appearance mode set to Dark.</b>

![image](https://github.com/user-attachments/assets/68a5fecd-41ac-42b9-83f9-8c52b8cd96e9)



# 2. Step-by-Step Instructions

### <b>Step 1</b>: Clone the mcity_data_engine GitHub repository using the command below. It removes the existing folder (if any) and clones a fresh copy of the code from GitHub into the current Google Colab session.

In [None]:
!rm -rf mcity_data_engine && git clone https://github.com/mcity/mcity_data_engine.git

### <b>Step 2</b>: Copy the cloned code [from Step 1] into the current workspace.

In [None]:
!cp -R mcity_data_engine/* .

### <b>Step 3</b>: To run the embedding selection workflow, some configuration changes are required. The script below applies them for the Colab session:

In [None]:
!pip install typed-ast

In [None]:
import ast
import _ast


config_file_path = './config/config.py'

# Workflow configuration used for the Colab demo
UPDATED_WORKFLOWS = {
    "embedding_selection": {
        "mode": "compute",
        "parameters": {
            "compute_representativeness": 0.99,
            "compute_unique_images_greedy": 0.01,
            "compute_unique_images_deterministic": 0.99,
            "compute_similar_images": 0.03,
            "neighbour_count": 3,
        },
        "embedding_models": [
            "detection-transformer-torch",
            "zero-shot-detection-transformer-torch",
            "clip-vit-base32-torch",
        ],
    },
}

class ConfigVisitor(ast.NodeTransformer):
    """Overwrite selected variable assignments in config.py for the Colab demo."""

    def visit_Assign(self, node):
        # Look for the assignment of the variables we want to modify
        if isinstance(node.targets[0], ast.Name):
            if node.targets[0].id == "SELECTED_WORKFLOW":
                node.value = ast.Constant(value=["embedding_selection"])
            elif node.targets[0].id == "WORKFLOWS":
                node.value = ast.Constant(value=UPDATED_WORKFLOWS)
            elif node.targets[0].id == "WANDB_ACTIVE":
                node.value = ast.Constant(value=False)
            elif node.targets[0].id == "V51_REMOTE":
                node.value = ast.Constant(value=False)
        return node

# Transform the AST
transformer = ConfigVisitor()

with open(config_file_path, "r") as file:
    content = file.read()

parsed_ast = ast.parse(content)
updated_ast = transformer.visit(parsed_ast)

# Convert the AST back to source code (ast.unparse needs Python 3.9+ and drops comments)
updated_content = ast.unparse(updated_ast)

with open(config_file_path, "w") as file:
    file.write(updated_content)

print("Config file updated successfully.")
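
To see what this kind of rewrite does without touching the real config, the same idea can be tried on a small in-memory string. The sketch below is only an illustration: the sample config text, the `OVERRIDES` mapping, and the class name `OverrideAssignments` are made up for this example. It builds the replacement nodes by parsing `repr(value)`, which is one possible way to obtain regular list and dict nodes, and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical stand-in for config/config.py, used only for this illustration.
SAMPLE_CONFIG = '''
SELECTED_WORKFLOW = ["some_other_workflow"]  # this comment is lost after unparse
WANDB_ACTIVE = True
UNRELATED = 42
'''

# Hypothetical overrides: variable name -> new Python value.
OVERRIDES = {
    "SELECTED_WORKFLOW": ["embedding_selection"],
    "WANDB_ACTIVE": False,
}


class OverrideAssignments(ast.NodeTransformer):
    """Replace the right-hand side of selected assignments."""

    def visit_Assign(self, node):
        target = node.targets[0]
        if isinstance(target, ast.Name) and target.id in OVERRIDES:
            # Parse repr(value) to get a regular expression node (List, Dict, Constant).
            node.value = ast.parse(repr(OVERRIDES[target.id]), mode="eval").body
        return node


tree = OverrideAssignments().visit(ast.parse(SAMPLE_CONFIG))
ast.fix_missing_locations(tree)
new_source = ast.unparse(tree)
print(new_source)

# Round-trip check: execute the rewritten source and inspect the values.
namespace = {}
exec(new_source, namespace)
```

Assignments that are not listed in `OVERRIDES` pass through unchanged, but comments and the original formatting of the file do not survive the round trip.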



### <b>Step 4</b>: Install the Colab-specific Data Engine requirements for this exercise.

In [None]:
%%capture
!pip install -r requirements_colab.txt

### <b>Step 5</b>: Configure the Hugging Face timeout variables, since the default timeout values can cause downloads to time out.

In [None]:
# %env persists for the whole session; `!export` only affects a throwaway subshell
%env HF_HUB_ETAG_TIMEOUT=5000
%env HF_HUB_DOWNLOAD_TIMEOUT=1000
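
Because each `!` command runs in its own subshell, a variable exported there does not carry over into later cells. One way to make the settings persist is to set them from Python via `os.environ`, as in the minimal sketch below. The values are copied from this step. The dictionary name `HF_TIMEOUTS` is just an illustrative choice. To our understanding, `huggingface_hub` reads these variables when it is first imported, so the cell should run before any import of `fiftyone` or `huggingface_hub`.

```python
import os

# Timeout values (in seconds) taken from the step above.
HF_TIMEOUTS = {
    "HF_HUB_ETAG_TIMEOUT": "5000",
    "HF_HUB_DOWNLOAD_TIMEOUT": "1000",
}

# Changes to os.environ are visible to this kernel and to child processes
# started afterwards (for example via `!python main.py`).
os.environ.update(HF_TIMEOUTS)

for name in HF_TIMEOUTS:
    print(f"{name}={os.environ[name]}")
```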

### <b>Step 6</b>: This exercise uses the open-source dataset Voxel51/fisheye8k, available on Hugging Face, which is downloaded in this step. The `max_samples` parameter is set to 100 for fast processing, but it can be increased to 8000 if needed.

In [None]:
from fiftyone.utils.huggingface import load_from_hub
import fiftyone as fo

# Remove any previous copy so the notebook can be re-run cleanly
if fo.dataset_exists("fisheye8k"):
    fo.delete_dataset("fisheye8k")

dataset = load_from_hub(
    "Voxel51/fisheye8k",
    name="fisheye8k",
    max_samples=100,
)

### <b>Step 7</b>: Start the embedding selection workflow.

In [None]:
!python main.py &> /dev/null
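
The command above discards all output, which keeps the notebook tidy but hides warnings and errors. If something goes wrong, one option is to write the output to a log file instead. The sketch below is an assumption-laden illustration rather than part of the Data Engine: the helpers `run_and_log` and `tail` and the file name `demo_run.log` are hypothetical, and the demonstration uses a trivial command in place of `main.py`.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command, log_path):
    """Run `command`, write stdout and stderr to `log_path`, return the exit code."""
    with Path(log_path).open("w") as log_file:
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    return completed.returncode


def tail(log_path, n=20):
    """Return the last `n` lines of the log file."""
    return Path(log_path).read_text().splitlines()[-n:]


# Demonstration with a trivial command. For the workflow itself, the command
# would be [sys.executable, "main.py"]; the log file name is an arbitrary choice.
exit_code = run_and_log(
    [sys.executable, "-c", "print('workflow finished')"], "demo_run.log"
)
print(exit_code, tail("demo_run.log"))
```

A non-zero exit code together with the last few log lines is usually enough to tell whether the workflow failed and where.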

### <b>Step 8</b>: Set up the Voxel51 app layout for the dataset for which the embeddings were computed.

### Below are the key embedding-related fields, which you will find in the filter sidebar on the left once the Voxel51 app is open:

- <b>embedding_selection</b>: Type of selection (stages of the internal selection process)

- <b>embedding_selection_model</b>: Which embedding model was responsible for the first selection

- <b>embedding_selection_count</b>: How often the sample was selected

  ![image](https://github.com/user-attachments/assets/3c8c46d8-0545-4dc5-aa7c-09f319aca450)
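
To make the meaning of the count and model fields more concrete, the toy sketch below tallies how often each sample is picked across several models and remembers which model picked it first. It is not the Data Engine's implementation: the model names, sample IDs, and variable names are all hypothetical, and the real workflow stores its results as fields on the dataset samples.

```python
from collections import Counter

# Hypothetical per-model selections: model name -> IDs of the samples it picked.
selections_by_model = {
    "model_a": ["img_001", "img_002", "img_005"],
    "model_b": ["img_002", "img_003"],
    "model_c": ["img_002", "img_005"],
}

selection_count = Counter()  # how many times each sample was picked
first_model = {}             # which model picked the sample first

for model, sample_ids in selections_by_model.items():
    for sample_id in sample_ids:
        selection_count[sample_id] += 1
        # setdefault keeps the first model and ignores later ones.
        first_model.setdefault(sample_id, model)

for sample_id, count in selection_count.most_common():
    print(sample_id, count, first_model[sample_id])
```

In this toy data, a sample picked by every model ends up with the highest count, which is the kind of signal the color coding in the Embeddings panel is meant to surface.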

In [None]:
import fiftyone as fo
samples_panel = fo.Panel(type="Samples", pinned=True)

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(
        brainResult="clip_vit_base32_torch_umap",
        colorByField="embedding_selection_count",
    ),
)

# Show the sample grid and the embeddings plot side by side
spaces = fo.Space(
    children=[
        fo.Space(children=[samples_panel]),
        fo.Space(children=[embeddings_panel]),
    ],
    orientation="horizontal",
)
fo.launch_app(dataset=fo.load_dataset("fisheye8k"), spaces=spaces)

# 3. Handling Errors and Troubleshooting
* The Voxel51 app may take 1-2 minutes to launch, depending on your internet speed.
* If no images appear in the Samples panel or Embeddings panel, manually click on the Samples or Embeddings tab to load them (refer to the Voxel51 [Tutorial](https://docs.voxel51.com/tutorials/image_embeddings.html)).

# 4. Additional Notes and Best Practices
* Always ensure your runtime is set to GPU for optimal performance.
* Keep an eye on execution logs for warnings or errors.
* Save your work frequently to avoid losing progress.
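
As a quick sanity check for the GPU recommendation above, the following sketch looks for the `nvidia-smi` tool and asks it to list the available GPUs. The helper name `gpu_visible` is hypothetical, and the check assumes an NVIDIA GPU runtime. It only reports what the driver tool sees, not whether a particular framework can use the device.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected.")
else:
    print("No GPU detected; consider switching the runtime type to GPU.")
```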


By following this manual, you will be able to execute the Fisheye8K notebook in Google Colab efficiently. Happy coding!