# TripoSR Reimplementation — 3D Mesh from a Single Image

Welcome to this notebook where I will walk you through a **reimplementation of TripoSR**, a fast and powerful 3D object reconstruction model that takes a **single 2D image** and generates a **textured 3D mesh**.

This implementation is based on the original work by [Tripo AI](https://github.com/VAST-AI-Research/TripoSR) and adapted from the excellent tutorial by [PyImageSearch](https://pyimagesearch.com/2024/11/25/create-a-3d-object-from-your-images-with-triposr-in-python/). The goal here is to understand how the model works and to recreate the workflow in a clean, reproducible, and Colab-friendly format 🚀

<br/>

### What I'll Do Here

- Clone the TripoSR repo and set it up in Colab
- Install all required dependencies (with 💀 Mac fixes if needed)
- Upload your own image or use a sample one
- Generate a 3D mesh using the pre-trained TripoSR model
- Visualize the result interactively

> **Note**: This is a *reimplementation and walkthrough*, not an official version — credits go to the original authors and sources linked below.

<br/>

### References

- [TripoSR Paper (arXiv)](https://arxiv.org/pdf/2403.02151)
- [Official GitHub Repo](https://github.com/VAST-AI-Research/TripoSR)
- [PyImageSearch Blog Walkthrough](https://pyimagesearch.com/2024/11/25/create-a-3d-object-from-your-images-with-triposr-in-python/)

<br/>

Let’s dive in and build some 3D magic from 2D pixels ✨


### Clone the GitHub Repository

I'll use [PyImageSearch's repo](https://github.com/pyimagesearch/TripoSR) to clone the source code for TripoSR. Since we're working inside Google Colab, we use `%cd` to change the working directory and `sys.path.append` so that local modules like `tsr.system` can be imported.

In [1]:
!git clone https://github.com/pyimagesearch/TripoSR.git
import sys
sys.path.append('/content/TripoSR')  # repo root, so `tsr` is importable as a package
%cd TripoSR

Cloning into 'TripoSR'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (97/97), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 164 (delta 65), reused 42 (delta 42), pack-reused 67 (from 1)
Receiving objects: 100% (164/164), 36.71 MiB | 19.53 MiB/s, done.
Resolving deltas: 100% (67/67), done.
/content/TripoSR
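Because `from tsr.system import TSR` looks for a `tsr` package inside an entry of `sys.path`, it is the repository root (rather than the `tsr/` folder itself) that needs to be on the path. A minimal sketch of an idempotent helper follows; `ensure_on_path` is a hypothetical name, and `/content/TripoSR` is assumed to be where the clone above landed.

```python
import sys
from pathlib import Path


def ensure_on_path(directory: str) -> bool:
    """Prepend `directory` to sys.path unless it is already listed.

    Returns True if the path was added, False if it was already present.
    """
    resolved = str(Path(directory).resolve())
    if resolved in sys.path:
        return False
    sys.path.insert(0, resolved)
    return True


# Assumption: the repository was cloned into /content/TripoSR (Colab default).
ensure_on_path("/content/TripoSR")
```

Calling the helper a second time is a no-op, so re-running the cell does not pile up duplicate entries.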


### Install Required Dependencies

To run TripoSR, we'll need to install a few Python packages including PyTorch, ONNX Runtime, and some image processing tools.

In [2]:
# Install all dependencies listed in the repo's requirements.txt
!pip install -r requirements.txt -q

# Install ONNX Runtime, which rembg uses to run its background-removal model
!pip install onnxruntime

# Pillow upgrade is needed for proper image handling (especially for mesh
# textures)
!pip install --upgrade Pillow -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

### Import Libraries & Utilities

We’ll now import all the core libraries needed for this reimplementation. This includes:
- `torch` for GPU support and tensor ops
- `Pillow` & `rembg` for image processing and background removal
- `TSR` for the main model
- `pymeshlab` for mesh manipulation
- `IPython.display` to show the output as a video right in Colab

In [5]:
import torch
import os
import time
from PIL import Image
import numpy as np
from IPython.display import Video

# TripoSR system and utility functions
from tsr.system import TSR
from tsr.utils import remove_background, resize_foreground, save_video

# Mesh handling and background removal
import pymeshlab as pymesh
import rembg
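
If one of the imports above fails, it can help to find out which package is missing before re-running the install cell. The following is a small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list of names simply mirrors the imports in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules imported by this notebook.
required = ["torch", "PIL", "numpy", "rembg", "pymeshlab", "tsr"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules were found.")
```

`find_spec` only locates a module without importing it, so the check is quick and has no side effects.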

### Select Device (GPU or CPU)

We'll check if a CUDA-enabled GPU is available and set our device accordingly. If you're using Google Colab with GPU runtime enabled, it should default to `"cuda"`.


In [6]:
device = "cuda" if torch.cuda.is_available() else "cpu"
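
To see which device was actually picked, the same check can be wrapped in a function that also reports the GPU name. This is only a sketch: `pick_device` is a hypothetical helper, and the guarded import is there so the snippet still runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to use.
        return "cpu"
    if torch.cuda.is_available():
        print("Using GPU:", torch.cuda.get_device_name(0))
        return "cuda:0"
    print("No CUDA GPU found; falling back to CPU (expect slower runs).")
    return "cpu"


device = pick_device()
```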

### Create a Timer Utility Class

We'll define a simple `Timer` class to measure how long different parts of the pipeline take — especially useful when comparing performance across devices (CPU vs GPU).

This class:
- Automatically syncs with CUDA (if available) for accurate timing
- Stores timing data in a dictionary
- Prints the duration of each operation in **milliseconds (ms)**

In [7]:
from typing import Optional


class Timer:
    def __init__(self):
        self.items = {}           # start times keyed by operation name
        self.time_scale = 1000.0  # seconds -> milliseconds
        self.time_unit = "ms"

    def start(self, name: str) -> None:
        # Wait for queued GPU work so it is not counted against this operation
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        self.items[name] = time.time()

    def end(self, name: str) -> Optional[float]:
        if name not in self.items:
            return None
        # Wait for the GPU to finish before reading the clock
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start_time = self.items.pop(name)
        delta = time.time() - start_time
        t = delta * self.time_scale
        print(f"{name} finished in {t:.2f}{self.time_unit}.")
        return t

timer = Timer()
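
The same start/stop pattern can also be written as a context manager, which guarantees the timer is stopped even if the timed block raises. This is a standalone sketch using only the standard library; `timed` is a hypothetical name, and unlike the `Timer` class it does not synchronize CUDA, so GPU work would still need a `torch.cuda.synchronize()` call before the clock is read.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name: str, results: dict):
    """Time the enclosed block and store its duration in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = (time.perf_counter() - start) * 1000.0
        print(f"{name} finished in {results[name]:.2f}ms.")


timings = {}
with timed("Sleeping", timings):
    time.sleep(0.05)
```

Collecting the durations in a dictionary makes it easy to compare a CPU run with a GPU run afterwards.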

### Upload Your 2D Image

Now it's your turn! Upload any 2D image (preferably of a single object) that you'd like to convert into a 3D mesh.

This code:
- Opens a file upload dialog in Colab
- Loads the first uploaded image using `Pillow`
- Resizes a copy to **512×512** (TripoSR expects square inputs); `original_image` itself keeps its uploaded size
- Saves it to the `examples/` folder as `product.png`

In [8]:
from google.colab import files
uploaded = files.upload()

# Load the uploaded image
original_image = Image.open(list(uploaded.keys())[0])

# Resize and save in the format expected by TripoSR
original_image.resize((512, 512)).save("examples/product.png")

Saving product.png to product.png
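
Note that a plain `resize((512, 512))` stretches non-square images. One alternative is to scale the image so that it fits inside the square and then centre it on a padded canvas. The sketch below only computes the geometry; `fit_within_square` is a hypothetical helper, not part of TripoSR or Pillow.

```python
def fit_within_square(width: int, height: int, side: int = 512):
    """Scale (width, height) to fit a side x side canvas, keeping aspect ratio.

    Returns (new_width, new_height, offset_x, offset_y), where the offsets
    centre the scaled image on the canvas.
    """
    scale = side / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    offset_x = (side - new_width) // 2
    offset_y = (side - new_height) // 2
    return new_width, new_height, offset_x, offset_y


print(fit_within_square(1024, 768))  # (512, 384, 0, 64)
```

With Pillow, the result would be used by resizing to `(new_width, new_height)` and pasting at `(offset_x, offset_y)` onto a square image created with `Image.new`.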


### Configure Inference Parameters

Here we define all the key settings for running the TripoSR model:

- `image_paths`: Path to the input image  
- `pretrained_model_name_or_path`: Hugging Face model name or path  
- `device`: CUDA GPU device for inference (or CPU fallback)  
- `chunk_size`: Controls how much data gets processed at once (default = 8192)  
- `no_remove_bg`: If `True`, skips background removal (set `False` to enable)  
- `foreground_ratio`: Resize factor for foreground cropping  
- `output_dir`: Folder to save the generated mesh + render  
- `model_save_format`: File format for the saved mesh (e.g., `"obj"`)  
- `render`: Whether to generate a render video or not

In [9]:
image_paths = "/content/TripoSR/examples/product.png"
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # keep the CPU fallback
pretrained_model_name_or_path = "stabilityai/TripoSR"
chunk_size = 8192
no_remove_bg = True
foreground_ratio = 0.85
output_dir = "output/"
model_save_format = "obj"
render = True

# Make sure the output directory exists
output_dir = output_dir.strip()
os.makedirs(output_dir, exist_ok=True)
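
As the number of settings grows, it can be convenient to bundle them in one object and validate them once. The following is a sketch only: `InferenceConfig` is a hypothetical container, its defaults copy the values used in this notebook, and the range check on `foreground_ratio` is an assumption of this sketch rather than a documented TripoSR constraint.

```python
from dataclasses import asdict, dataclass


@dataclass
class InferenceConfig:
    """Hypothetical container for the inference settings listed above."""
    device: str
    image_path: str = "/content/TripoSR/examples/product.png"
    pretrained_model_name_or_path: str = "stabilityai/TripoSR"
    chunk_size: int = 8192
    no_remove_bg: bool = True
    foreground_ratio: float = 0.85
    output_dir: str = "output/"
    model_save_format: str = "obj"
    render: bool = True

    def __post_init__(self):
        # Assumption: the foreground should occupy a fraction of the frame.
        if not 0.0 < self.foreground_ratio <= 1.0:
            raise ValueError("foreground_ratio must be in (0, 1]")


config = InferenceConfig(device="cpu")
print(asdict(config))
```

Keeping the settings in one place also makes it straightforward to log them next to the generated mesh.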

### Load the Pretrained TripoSR Model

We’ll now load the **pretrained TripoSR model** from Hugging Face using the built-in `TSR.from_pretrained()` method.

This step:
- Loads model weights + config from `"stabilityai/TripoSR"`
- Sets the rendering chunk size
- Moves the model to the specified device (`cuda` or `cpu`)
- Times the whole process using our custom `Timer` class

In [13]:
timer.start("Initializing model")
model = TSR.from_pretrained(
    pretrained_model_name_or_path, # HF model hub path
    config_name="config.yaml", # Model config
    weight_name="model.ckpt", # Pretrained weights
)

# Set chunk size for renderer
model.renderer.set_chunk_size(chunk_size)

# Move model to GPU or CPU
model.to(device)

timer.end("Initializing model")

config.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

Initializing model finished in 15433.58ms.
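
To make the role of `chunk_size` more concrete: chunking in general means evaluating a long list of queries in fixed-size batches so that peak memory stays bounded. The sketch below shows that generic pattern; it is not the TripoSR renderer's actual code, and `chunk_ranges` is a hypothetical helper.

```python
def chunk_ranges(total: int, chunk_size: int):
    """Yield (start, stop) index pairs covering range(total) in chunks."""
    if chunk_size <= 0:
        # Choice made for this sketch: a non-positive size means "no chunking".
        yield 0, total
        return
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


# Example: 20,000 queries with the chunk size used in this notebook.
ranges = list(chunk_ranges(20_000, 8192))
print(ranges)  # [(0, 8192), (8192, 16384), (16384, 20000)]
```

A smaller chunk size lowers peak memory at the cost of more iterations, which is the usual trade-off when a GPU runs out of memory.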


### Preprocess the Input Image

Now let’s get our uploaded image ready for TripoSR. This step involves:

- **Removing the background** (optional with `rembg`)
- **Cropping + resizing the foreground** to focus on the object
- **Handling transparency (RGBA)** by compositing it onto a neutral background
- Saving the final processed image to disk

We're also timing this step to see how long preprocessing takes.

In [14]:
timer.start("Processing images")

images = []
rembg_session = rembg.new_session()

# Remove the background with rembg (this cell always does so; it never reads `no_remove_bg`)
image_with_bg_removed = remove_background(original_image, rembg_session)

# Resize + crop based on foreground
image = resize_foreground(image_with_bg_removed, foreground_ratio)

# Handle RGBA transparency blending
if image.mode == "RGBA":
    image = np.array(image).astype(np.float32) / 255.0
    image = image[:, :, :3] * image[:, :, 3:4] + (1 - image[:, :, 3:4]) * 0.5
    image = Image.fromarray((image * 255.0).astype(np.uint8))

# Save processed image
image_dir = os.path.join(output_dir, str(0))
os.makedirs(image_dir, exist_ok=True)
image.save(os.path.join(image_dir, "input.png"))

# Append to list for inference
images.append(image)

timer.end("Processing images")

Processing images finished in 3666.01ms.
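
The RGBA step in the cell above blends each pixel onto a mid-gray background using `rgb * alpha + (1 - alpha) * 0.5`. Here is the same arithmetic for single pixels in plain Python, as a worked example; `composite_on_gray` is a hypothetical helper, and the notebook applies the formula to the whole array at once with NumPy.

```python
def composite_on_gray(rgba, background=0.5):
    """Blend one RGBA pixel (floats in [0, 1]) onto a uniform gray background."""
    r, g, b, a = rgba
    return tuple(channel * a + (1.0 - a) * background for channel in (r, g, b))


opaque_red = composite_on_gray((1.0, 0.0, 0.0, 1.0))   # alpha = 1 keeps the pixel
transparent = composite_on_gray((1.0, 0.0, 0.0, 0.0))  # alpha = 0 gives pure gray
half_red = composite_on_gray((1.0, 0.0, 0.0, 0.5))     # halfway blend
print(opaque_red, transparent, half_red)
# (1.0, 0.0, 0.0) (0.5, 0.5, 0.5) (0.75, 0.25, 0.25)
```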


### Run TripoSR on the Image (Inference, Render & Export)

Now comes the fun part! We’ll:
- Run the model on the processed image
- Generate a **360° render** of the object (30 views)
- Save the rendered frames and a `.mp4` turntable video
- Extract and export the **3D mesh** in your chosen format (default: `.obj`)

All steps are timed with our `Timer` class for performance tracking.

In [15]:
for i, image in enumerate(images):
    print(f"Running image {i + 1}/{len(images)} ...")

    # Inference
    timer.start("Running model")
    with torch.no_grad():
        scene_codes = model([image], device=device)
    timer.end("Running model")

    # Render turntable video
    if render:
        timer.start("Rendering")
        render_images = model.render(scene_codes, n_views=30, return_type="pil")
        for ri, render_image in enumerate(render_images[0]):
            render_image.save(os.path.join(output_dir, str(i), f"render_{ri:03d}.png"))
        save_video(
            render_images[0], os.path.join(output_dir, str(i), "render.mp4"), fps=30
        )
        timer.end("Rendering")

    # Export mesh
    timer.start("Exporting mesh")
    meshes = model.extract_mesh(scene_codes, has_vertex_color=False)
    mesh_file = os.path.join(output_dir, str(i), f"mesh.{model_save_format}")
    meshes[0].export(mesh_file)
    timer.end("Exporting mesh")

print("Processing complete.")

Running image 1/1 ...
Running model finished in 1862.29ms.


Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at /pytorch/aten/src/ATen/native/Cross.cpp:62.)
  right = F.normalize(torch.cross(lookat, up), dim=-1)


Rendering finished in 33034.50ms.
Exporting mesh finished in 2759.80ms.
Processing complete.
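
For intuition about the turntable settings, the sketch below computes the camera angles under the assumption that the views are spaced evenly in azimuth over one full turn. How the renderer actually places its cameras is not shown in this notebook, and `turntable_azimuths` is a hypothetical helper.

```python
def turntable_azimuths(n_views: int):
    """Return n_views azimuth angles in degrees, evenly spaced over 360."""
    step = 360.0 / n_views
    return [i * step for i in range(n_views)]


n_views, fps = 30, 30
angles = turntable_azimuths(n_views)
print(angles[:3], "...", angles[-1])  # [0.0, 12.0, 24.0] ... 348.0
print(f"Video length: {n_views / fps:.1f} s")  # Video length: 1.0 s
```

With 30 frames written at `fps=30`, the clip lasts one second; a lower `fps` gives a slower spin from the same frames.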


### Preview the Rendered 3D Model

Let’s check out what we just created! Here's a **360° turntable render** of your object, stitched from 30 different viewpoints. Super useful to visually validate the result before working with the mesh.

In [16]:
Video('output/0/render.mp4', embed=True)
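
Besides the video, a quick sanity check on the exported mesh is to count its vertices and faces. The sketch below reads a Wavefront OBJ file as text using only the standard library; `obj_stats` is a hypothetical helper, and the demo writes a one-triangle file so the snippet runs on its own. In this notebook the path would be `output/0/mesh.obj`.

```python
import os
import tempfile


def obj_stats(path: str):
    """Count vertex ("v") and face ("f") records in a Wavefront OBJ file."""
    vertices = faces = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("v "):
                vertices += 1
            elif line.startswith("f "):
                faces += 1
    return vertices, faces


# Self-contained demo on a one-triangle OBJ file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "triangle.obj")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("v 0 0 0\nv 1 0 0\nv 0 1 0\nvn 0 0 1\nf 1 2 3\n")
    stats = obj_stats(demo_path)
print(stats)  # (3, 1)
```

A mesh with zero faces, or with only a handful of vertices, usually means the extraction step did not produce a usable surface.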