[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mcity/mcity_data_engine/blob/main/fish_eye_8k_colab.ipynb)


# 1. Introduction
This manual provides detailed steps for running the Mcity Data Engine embedding workflow on the open-source Fish Eye 8K camera data in a Google Colab notebook.

# 2. Prerequisites
Before proceeding, ensure you have:
* A Google account
* Access to Google Colab (https://colab.research.google.com/)
* A stable internet connection


# 3. Step-by-Step Instructions

## Step 1: Creating a Google Colab Workspace
1. Open your web browser and navigate to [Google Colab](https://colab.research.google.com/).
2. Sign in with your Google account if prompted.

## Step 2: Change Runtime type to T4 GPU on Colab
1. Click on Runtime in the menu bar.
2. Select Change runtime type.
3. Under Hardware accelerator, choose T4 GPU.
4. Click Save to apply changes.
 
![image](https://github.com/user-attachments/assets/a6dd81f0-6066-4f82-a9d1-f4aea7201b89)


## Step 3: Connecting to Colab
1. Click the Connect button in the top-right corner.
2. Wait until you see a green checkmark indicating that the runtime is connected.

![image](https://github.com/user-attachments/assets/f2034bb4-7898-41c9-85b8-6d9e24fc09a2)
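
Once the runtime is connected, it can be worth confirming that a GPU is actually attached before running the heavier steps. The following is a minimal sketch, assuming the `nvidia-smi` tool is on the PATH of GPU runtimes (as it typically is on Colab). The helper name `gpu_summary` is hypothetical and not part of the data engine.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU names reported by nvidia-smi, or None if no GPU is visible."""
    # nvidia-smi is normally only present when an NVIDIA driver is installed
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


print(gpu_summary() or "No GPU detected; check Runtime > Change runtime type")
```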


## Step 4: Clone Git repo

In [None]:
!git clone https://github.com/mcity/mcity_data_engine.git
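
Re-running the clone cell in the same session makes `git` fail because the target directory already exists. A possible guard is sketched below. The helper `clone_if_missing` is a hypothetical name introduced for illustration, and the `run` parameter exists only so the function can be exercised without network access.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/mcity/mcity_data_engine.git"


def clone_if_missing(url=REPO_URL, dest="mcity_data_engine", run=subprocess.run):
    """Clone url into dest unless dest already exists. Return True if a clone ran."""
    if Path(dest).exists():
        # An existing checkout is left untouched
        return False
    run(["git", "clone", url, dest], check=True)
    return True
```

Calling `clone_if_missing()` in a cell would then behave like the `!git clone` command on the first run and do nothing on later runs.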

## Step 5: Copy code to execution path

In [None]:
!cp -R mcity_data_engine/* .
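
Because the shell pattern `*` does not match hidden files by default, it may be useful to confirm that the files used in later steps ended up in the working directory. A small sketch follows. The list contains only the paths this manual refers to, and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Paths that later steps in this manual rely on
EXPECTED_PATHS = ["main.py", "config/config.py", "requirements_colab.txt"]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the entries of expected that do not exist under root."""
    return [p for p in expected if not (Path(root) / p).exists()]


missing = missing_paths()
print("All expected files found" if not missing else f"Missing: {missing}")
```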

## Step 6: Modify config for Colab workflow

In [None]:
!pip install typed-ast

In [None]:
import ast

config_file_path = "./config/config.py"

# Workflow settings written into the config for the Colab run
UPDATED_WORKFLOWS = {
    "embedding_selection": {
        "mode": "compute",
        "parameters": {
            "compute_representativeness": 0.99,
            "compute_unique_images_greedy": 0.01,
            "compute_unique_images_deterministic": 0.99,
            "compute_similar_images": 0.03,
            "neighbour_count": 3,
        },
        "embedding_models": [
            "detection-transformer-torch",
            "zero-shot-detection-transformer-torch",
            "clip-vit-base32-torch",
        ],
    },
}

# New values for the config variables, keyed by variable name
UPDATES = {
    "SELECTED_WORKFLOW": ["embedding_selection"],
    "WORKFLOWS": UPDATED_WORKFLOWS,
    "WANDB_ACTIVE": False,
    "V51_REMOTE": False,
}


def literal_node(value):
    """Build an AST expression node (List, Dict, Constant) from a Python literal."""
    return ast.parse(repr(value), mode="eval").body


class ConfigVisitor(ast.NodeTransformer):
    """Replace the values of the config assignments listed in UPDATES."""

    def visit_Assign(self, node):
        # Only simple `NAME = value` assignments are rewritten
        target = node.targets[0]
        if isinstance(target, ast.Name) and target.id in UPDATES:
            node.value = literal_node(UPDATES[target.id])
        return node


with open(config_file_path, "r") as file:
    content = file.read()

# Transform the AST
updated_ast = ConfigVisitor().visit(ast.parse(content))

# Convert the AST back to source code (ast.unparse drops comments and formatting)
updated_content = ast.unparse(updated_ast)

with open(config_file_path, "w") as file:
    file.write(updated_content)

print("Config file updated successfully.")
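
To confirm that the rewrite took effect, the modified variables can be read back from the file without importing it. The sketch below assumes the values are plain literals. `read_assignments` is a hypothetical helper, and the `sample` string is a stand-in, not the real contents of `config.py`.

```python
import ast


def read_assignments(source, names):
    """Return {name: value} for simple literal assignments found in source."""
    found = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            name = node.targets[0].id
            if name in names:
                # literal_eval only accepts plain literals (lists, dicts, bools, ...)
                found[name] = ast.literal_eval(node.value)
    return found


# Stand-in for the rewritten config file, not its real contents
sample = (
    "SELECTED_WORKFLOW = ['embedding_selection']\n"
    "WANDB_ACTIVE = False\n"
    "V51_REMOTE = False\n"
)
print(read_assignments(sample, {"SELECTED_WORKFLOW", "WANDB_ACTIVE", "V51_REMOTE"}))
```

For the real file, the same function could be called with `open("./config/config.py").read()` as the source.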



## Step 7: Install Colab specific data engine requirements

In [None]:
!pip install -r requirements_colab.txt
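
After the installation finishes, the installed versions can be checked from Python, which also helps when deciding whether a suggested session restart matters. This is a minimal sketch using the standard library. `installed_versions` is a hypothetical helper, and `fiftyone` is listed only because this manual imports it later.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# fiftyone is imported later in this manual; adjust the list as needed
print(installed_versions(["fiftyone"]))
```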

## Step 8: Configure Hugging Face timeout variables

In [None]:
# %env sets each variable for the kernel and for commands started from it
%env HF_HUB_ETAG_TIMEOUT=5000
%env HF_HUB_DOWNLOAD_TIMEOUT=1000
!python main.py
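
Each `!` line in a notebook runs in its own shell, so a variable exported on one line is not carried over to a command on another line. One way to check what a launched process actually sees is sketched below. It sets the same two values through `os.environ`, which would need to run before any `!python main.py` call. The helper `child_sees` is hypothetical.

```python
import os
import subprocess
import sys

# Same values as in this step; os.environ changes are inherited by child processes
os.environ["HF_HUB_ETAG_TIMEOUT"] = "5000"
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "1000"


def child_sees(var):
    """Return the value of var as seen by a freshly started Python process."""
    code = f"import os; print(os.environ.get({var!r}, ''))"
    out = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    return out.stdout.strip()


print(child_sees("HF_HUB_ETAG_TIMEOUT"), child_sees("HF_HUB_DOWNLOAD_TIMEOUT"))
```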

## Step 9: Download sample dataset

In [None]:
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub(
    "Voxel51/fisheye8k",
    name="fisheye8k",
    max_samples=1000,
)
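
A quick look at the loaded dataset can catch a failed or partial download early. The following sketch assumes FiftyOne is installed and returns `None` otherwise. `dataset_summary` is a hypothetical helper. With `max_samples=1000`, the reported count should be no more than 1000.

```python
def dataset_summary(name="fisheye8k"):
    """Return (sample_count, field_names) for a FiftyOne dataset, or None if unavailable."""
    try:
        import fiftyone as fo
    except ImportError:
        return None  # requirements not installed in this environment
    if not fo.dataset_exists(name):
        return None
    ds = fo.load_dataset(name)
    return ds.count(), sorted(ds.get_field_schema())


print(dataset_summary())
```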

## Step 10: Start embeddings workflow

In [None]:
!python main.py

## Step 11: Set up the Voxel51 app layout for the dataset with computed embeddings

In [None]:
import fiftyone as fo
samples_panel = fo.Panel(type="Samples", pinned=True)

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(brainResult="clip_vit_base32_torch_umap", colorByField="embedding_selection_count"),
)

spaces = fo.Space(
    children=[
        fo.Space(
            children=[
                fo.Space(children=[samples_panel]),
            ],
            orientation="horizontal",
        ),
        fo.Space(children=[embeddings_panel]),
    ],
    orientation="vertical",
)
fo.launch_app(dataset=fo.load_dataset('fisheye8k'), spaces=spaces)

# 4. Handling Errors and Troubleshooting
* If Colab requests a session restart after installing the requirements, ignore the request and proceed with the next step.
* The Voxel51 app may take 1-2 minutes to launch, depending on your internet speed.
* If no images appear in the Samples panel or the Embeddings panel, manually click the Samples or Embeddings tab to load them (see the Voxel51 [tutorial](https://docs.voxel51.com/tutorials/image_embeddings.html)).
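
If the Embeddings panel stays empty, one possible cause is that the brain run or the color field named in the panel state is missing from the dataset. The check below is a sketch of how that could be tested. `panel_problems` is a hypothetical helper, and its defaults are the two names used in the panel setup of this manual.

```python
def panel_problems(
    brain_keys,
    field_names,
    brain_key="clip_vit_base32_torch_umap",
    color_field="embedding_selection_count",
):
    """List reasons the Embeddings panel might come up empty."""
    problems = []
    if brain_key not in brain_keys:
        problems.append(f"brain run '{brain_key}' not found")
    if color_field not in field_names:
        problems.append(f"field '{color_field}' not found")
    return problems


# In the notebook, the inputs could come from the loaded dataset, for example:
#   ds = fo.load_dataset("fisheye8k")
#   panel_problems(ds.list_brain_runs(), ds.get_field_schema())
print(panel_problems(["clip_vit_base32_torch_umap"], ["embedding_selection_count"]))
```

An empty list means both names were found. Any entries point at a step of the workflow that may not have completed.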

# 5. Additional Notes and Best Practices
* Always ensure your runtime is set to GPU for optimal performance.
* Keep an eye on execution logs for warnings or errors.
* Save your work frequently to avoid losing progress.


By following this manual, you will be able to execute the Fish Eye 8K notebook in Google Colab efficiently. Happy coding!