# Getting Started with Depth Estimation: Using Depth Estimation Models

### Loading dataset

Note: we've created an in-depth tutorial that covers the methods for loading depth data into FiftyOne. As discussed in that tutorial, FiftyOne's `Heatmap` class is ideal for representing depth data:

```python
fo.Heatmap(
    map=None,           # 2D numpy array containing the data
    map_path=None,      # OR path to the heatmap image on disk
    range=None          # Optional [min, max] range for proper visualization
)
```
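
As a quick illustration of these arguments, the sketch below builds a small synthetic depth map and wraps it in a `Heatmap`. The helper names `synthetic_depth` and `as_heatmap` are hypothetical and exist only for this example. Taking the range from the array itself is an assumption that suits a single standalone map; for a whole dataset a shared range is usually preferable.

```python
import numpy as np


def synthetic_depth(height=4, width=6, max_depth=10.0):
    """Return a vertical gradient: 0 in the top row, max_depth in the bottom row."""
    column = np.linspace(0.0, max_depth, height, dtype=np.float32)
    return np.tile(column[:, None], (1, width))


def as_heatmap(depth):
    """Wrap a 2D array in a fo.Heatmap, using the array's own min/max as the range."""
    import fiftyone as fo  # imported here so synthetic_depth() works without FiftyOne

    return fo.Heatmap(map=depth, range=[float(depth.min()), float(depth.max())])


depth = synthetic_depth()
print(depth.shape, depth.min(), depth.max())

# With FiftyOne installed:
# heatmap = as_heatmap(depth)
```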

Let's start by loading a dataset from the Hugging Face Hub. 

In [1]:
from datasets import load_dataset

clver_depth = load_dataset(
    "erkam/clevr-with-depth",
    split="train",
    cache_dir="clevr_with_depth",
)

Note how this dataset is saved:

In [None]:
clver_depth[0]

The `convert_dataset_to_fiftyone` function defined below takes a Hugging Face dataset containing image-depth pairs and converts it into a FiftyOne dataset for visualization and analysis. 

For each sample, it saves the RGB image to disk (since FiftyOne requires file paths) and extracts the depth information from the first channel of the RGBA depth map. Each sample in the resulting FiftyOne dataset contains the path to the RGB image, the original prompt, and the depth map stored as a heatmap visualization. 
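
Before relying on the first channel alone, it may be worth confirming that the color channels of a depth image really are identical. The following is a minimal sketch of such a check; the helper name `channels_identical` is hypothetical, and comparing only the first three channels is an assumption (the alpha channel of an RGBA image would normally differ).

```python
import numpy as np


def channels_identical(depth_image, num_channels=3):
    """Return True if the first `num_channels` channels of an H x W x C image match."""
    arr = np.asarray(depth_image)
    if arr.ndim != 3:
        raise ValueError(f"expected an H x W x C image, got shape {arr.shape}")
    first = arr[:, :, 0]
    last = min(num_channels, arr.shape[2])
    return all(np.array_equal(first, arr[:, :, c]) for c in range(1, last))


# Small synthetic RGBA example: identical color channels, opaque alpha
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgba = np.dstack([gray, gray, gray, np.full_like(gray, 255)])
print(channels_identical(rgba))

# With the dataset loaded above:
# channels_identical(clver_depth[0]["depth"])
```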

The depth values are scaled between 0 and 198, which represents the range of depth values in this dataset.
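
If the range of a dataset is not known in advance, it can be measured rather than hard-coded. The sketch below scans an iterable of depth arrays and returns the overall `[min, max]`; the helper name `depth_value_range` is hypothetical, and the toy arrays stand in for real depth maps.

```python
import numpy as np


def depth_value_range(depth_maps):
    """Return [min, max] over an iterable of depth arrays."""
    lo, hi = None, None
    for depth in depth_maps:
        arr = np.asarray(depth)
        cur_lo, cur_hi = float(arr.min()), float(arr.max())
        lo = cur_lo if lo is None else min(lo, cur_lo)
        hi = cur_hi if hi is None else max(hi, cur_hi)
    if lo is None:
        raise ValueError("depth_maps is empty")
    return [lo, hi]


# Toy arrays standing in for real depth maps
maps = [np.array([[0, 5], [7, 9]]), np.array([[3, 198], [40, 2]])]
print(depth_value_range(maps))

# With the dataset loaded above (first channel of each depth image):
# depth_value_range(np.array(item["depth"])[:, :, 0] for item in clver_depth)
```

The result can then be passed as the `range` argument of `fo.Heatmap`.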

In [23]:
import fiftyone as fo
import numpy as np
from PIL import Image
import os

def convert_dataset_to_fiftyone(hf_dataset, save_dir="./clver_depth_data"):
    """
    Converts a Hugging Face dataset containing image-depth pairs into a FiftyOne dataset.

    This function takes a dataset from Hugging Face that contains RGB images and their corresponding
    depth maps, saves the images to disk, and creates a FiftyOne dataset with the images and depth
    information stored as heatmaps.

    Args:
        hf_dataset: A Hugging Face dataset containing 'image', 'depth', and 'prompt' fields
        save_dir (str): Directory path where images and depth maps will be saved.
                       Defaults to "./clver_depth_data"

    Returns:
        fo.Dataset: A FiftyOne dataset containing:
            - RGB images stored on disk
            - Depth maps as FiftyOne Heatmap objects (scaled 0-198)
            - Original prompts from the dataset

    Note:
        The depth maps are extracted from the first channel of the RGBA depth images
        since all channels are identical in this dataset.
    """
    # Create directories if they don't exist
    os.makedirs(os.path.join(save_dir, "images"), exist_ok=True)
    os.makedirs(os.path.join(save_dir, "depth"), exist_ok=True)
    
    samples = []
    # Create a FiftyOne dataset
    dataset = fo.Dataset("clver_depth", overwrite=True, persistent=True)
    
    for idx, item in enumerate(hf_dataset):
        # Generate filenames
        image_filename = f"image_{idx:06d}.png"
        depth_filename = f"depth_{idx:06d}.png"
        
        image_path = os.path.join(save_dir, "images", image_filename)
        depth_path = os.path.join(save_dir, "depth", depth_filename)
        
        # Save images to disk
        item['image'].save(image_path)
        
        # Extract depth map from first channel (since all channels are identical in this dataset)
        depth_np = np.array(item['depth'])[:, :, 0]  # Taking channel 0

        # Create a FiftyOne sample
        sample = fo.Sample(
            filepath=image_path,
            prompt=item['prompt']
        )
        
        # Add depth as Heatmap with proper range
        sample["depth"] = fo.Heatmap(
            map=depth_np,
            range=[0, 198] # if you know the range of your dataset, use those values
        )
        # Add the sample to the dataset
        samples.append(sample)
        
    dataset.add_samples(samples)
    return dataset


# Usage:
fo_dataset = convert_dataset_to_fiftyone(clver_depth)

 100% |███████████████| 1400/1400 [2.2s elapsed, 0s remaining, 626.8 samples/s]      


In [24]:
fo.launch_app(fo_dataset)

Dataset:          clver_depth
Media type:       image
Num samples:      1400
Selected samples: 0
Selected labels:  0
Session URL:      http://localhost:5151/

You can verify that the depth maps were parsed by printing the dataset summary and checking for the `depth` field:


In [26]:
fo_dataset

Name:        clver_depth
Media type:  image
Num samples: 1400
Persistent:  True
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    prompt:           fiftyone.core.fields.StringField
    depth:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Heatmap)

And inspect the values of the first map like so:

In [25]:
fo_dataset.first()['depth']

<Heatmap: {
    'id': '67cb64d2f7d612329d38ecd6',
    'tags': [],
    'map': array([[  0,   0,   0, ...,   0,   0,   0],
           [  0,   0,   0, ...,   0,   0,   0],
           [  0,   0,   0, ...,   0,   0,   0],
           ...,
           [191, 191, 191, ..., 191, 191, 191],
           [191, 191, 192, ..., 192, 191, 191],
           [192, 192, 192, ..., 192, 192, 192]], dtype=uint8),
    'map_path': None,
    'range': [0.0, 198.0],
}>

Refer to our guide for loading depth data for other examples and more detail. Once the dataset has been parsed into FiftyOne format, you can launch the App and inspect its contents:

```python
fo.launch_app(fo_dataset)
```

<img src="/home/harpreet/workspace/getting-started-fo-experiences/depth-estimation/assets/clevr-dataset.gif" width= "70%">


## Using depth estimation models in FiftyOne

### As a Zoo model

You can load `transformers` depth estimation models directly from the FiftyOne Model Zoo! 

To load a transformers depth estimation model from the zoo, specify `depth-estimation-transformer-torch` as the first argument, and pass in the model’s name or path as a keyword argument:

```python
import fiftyone.zoo as foz

model = foz.load_zoo_model(
    "depth-estimation-transformer-torch",
    name_or_path="path/to-model",
)
```

Any model that can be run in a Hugging Face pipeline for the `depth-estimation` task can be loaded as a Zoo model. A non-exhaustive list of such models includes:

* [`Intel/dpt-large`](https://huggingface.co/Intel/dpt-large)

* [`Intel/dpt-hybrid-midas`](https://huggingface.co/Intel/dpt-hybrid-midas)

* [`vinvino02/glpn-kitti`](https://huggingface.co/vinvino02/glpn-kitti)

* [`LiheYoung/depth-anything-small-hf`](https://huggingface.co/LiheYoung/depth-anything-small-hf)

* [`depth-anything/Depth-Anything-V2-Small-hf`](https://huggingface.co/depth-anything/Depth-Anything-V2-Small-hf)

* [`Intel/zoedepth-nyu-kitti`](https://huggingface.co/Intel/zoedepth-nyu-kitti)

Refer to the Hugging Face documentation on [*Monocular depth estimation*](https://huggingface.co/docs/transformers/tasks/monocular_depth_estimation) to stay up to date on which models can be run in a pipeline.  
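
One way to check whether a given checkpoint qualifies is to run it through a `transformers` pipeline directly before loading it as a Zoo model. The sketch below assumes `transformers` is installed; the function name `run_depth_pipeline` and its `pipeline_factory` parameter are hypothetical, with the latter added only so the function can be exercised without downloading a model.

```python
def run_depth_pipeline(model_id, image, pipeline_factory=None):
    """Run a Hugging Face depth-estimation pipeline on a single image.

    `image` can be anything the pipeline accepts (for example a PIL image
    or a file path). `pipeline_factory` defaults to `transformers.pipeline`.
    """
    if pipeline_factory is None:
        # Imported lazily so this module loads even without transformers
        from transformers import pipeline as pipeline_factory

    pipe = pipeline_factory(task="depth-estimation", model=model_id)
    # For depth estimation, the pipeline returns a dict containing
    # "predicted_depth" (a tensor) and "depth" (a PIL image)
    return pipe(image)


# Example (downloads the model on first use):
# result = run_depth_pipeline("Intel/dpt-hybrid-midas", "path/to/image.png")
# result["depth"].save("depth_preview.png")
```

If the call fails for a model, that model is unlikely to work with `depth-estimation-transformer-torch` either.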

**Note:** When selecting a model, refer to its model card to determine whether it's suitable for your dataset and use case.

Below is an example of using `depth-anything/Depth-Anything-V2-Small-hf` on the dataset we parsed earlier:

In [None]:
import torch

import fiftyone as fo
import fiftyone.zoo as foz

dav2_model = foz.load_zoo_model(
    "depth-estimation-transformer-torch",
    name_or_path="depth-anything/Depth-Anything-V2-Small-hf",
    device="cuda" if torch.cuda.is_available() else "cpu"
    )

fo_dataset.apply_model(
    dav2_model, 
    label_field="dav2_small",
    )

  62% |█████████------|  868/1400 [22.6s elapsed, 13.8s remaining, 38.8 samples/s]   

To verify:

In [None]:
fo_dataset.first()["dav2_small"]

The same pattern works for any other compatible model, for example `Intel/dpt-hybrid-midas`:

In [None]:
import fiftyone.zoo as foz

model = foz.load_zoo_model(
    "depth-estimation-transformer-torch",
    name_or_path="Intel/dpt-hybrid-midas",
)

fo_dataset.apply_model(model, label_field="dpt_hybrid_midas")

session = fo.launch_app(fo_dataset)

### Hugging Face model that's not compatible with integration

### Plugin

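One option is a FiftyOne plugin, such as the Depth Pro plugin: https://github.com/harpreetsahota204/depthpro-plugin

Download the plugin and install its requirements:

In [None]:
!fiftyone plugins download https://github.com/harpreetsahota204/depthpro-plugin

In [None]:
!fiftyone plugins requirements @harpreetsahota/depth_pro_plugin --install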

In [None]:
import fiftyone.operators as foo

depthpro = foo.get_operator("@harpreetsahota/depth_pro_plugin/depth_pro_estimator")

You can compute the depth map directly through the FiftyOne App:

1. Launch the FiftyOne App with your dataset
2. Open the "Operators Browser" by clicking on the Operator Browser icon above the sample grid or by typing backtick (`` ` ``)
3. Type "depth_pro_estimator"
4. Configure the following parameters:
   * Depth Type: choose between:
     * `inverse` - Reciprocal of depth (1/distance)
     * `regular` - Direct physical distance measurement in meters
   * Field Name: enter the name for the heatmap field (e.g., "depth_map")
5. Click "Execute" to compute depth estimation for your dataset

You can also run the operator from Python with the `depthpro` handle loaded above:


In [None]:
await depthpro(
    fo_dataset,
    depth_field="depth_map",
    depth_type="inverse",
    delegate=True,
)

### Diffusers Depth Estimation 

In [None]:
!pip install diffusers

In [None]:
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
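
The cell above stops at saving PNGs. One way to bring the prediction into FiftyOne is sketched below, under the assumption that `depth.prediction` is a NumPy array of shape `(N, H, W, 1)` with values in `[0, 1]` (the pipeline's NumPy output format). The helper names `split_prediction` and `to_heatmap` and the field name `marigold_depth` are hypothetical, and a random array stands in for a real prediction.

```python
import numpy as np


def split_prediction(prediction):
    """Split an (N, H, W, 1) prediction array into a list of N 2D float32 maps."""
    arr = np.asarray(prediction, dtype=np.float32)
    if arr.ndim != 4 or arr.shape[-1] != 1:
        raise ValueError(f"expected shape (N, H, W, 1), got {arr.shape}")
    return [arr[i, :, :, 0] for i in range(arr.shape[0])]


def to_heatmap(depth_map):
    """Wrap a 2D map with values in [0, 1] in a fo.Heatmap."""
    import fiftyone as fo  # imported lazily; only needed when building labels

    return fo.Heatmap(map=depth_map, range=[0.0, 1.0])


# Random data standing in for `depth.prediction`
fake_prediction = np.random.default_rng(0).random((2, 3, 4, 1))
maps = split_prediction(fake_prediction)
print(len(maps), maps[0].shape)

# With a real prediction and a FiftyOne sample:
# maps = split_prediction(depth.prediction)
# sample["marigold_depth"] = to_heatmap(maps[0])
# sample.save()
```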

### Arbitrary Depth Estimation Model

https://github.com/DepthAnything/Depth-Anything-V2