In [None]:
!pip install -qU diffusers transformers accelerate trimesh

# Shap-E

**Shap-E** is a conditional model for generating 3D assets that could be used in video game development, interior design, and architecture. It is trained on a large dataset of 3D assets, which is post-processed to render more views of each object and to produce point clouds with 16K points instead of 4K.

The Shap-E model is trained in two steps:
1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of implicit functions that represent the asset.
2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications.

## Text-to-3D

In [None]:
from diffusers import ShapEPipeline
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

pipe = ShapEPipeline.from_pretrained(
    'openai/shap-e',
    torch_dtype=torch.float16,
    variant='fp16',
).to(device)

In [None]:
prompt = [
    'a firecracker',
    'a birthday cupcake'
]
guidance_scale = 15.0

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

In [None]:
from diffusers.utils import export_to_gif

export_to_gif(images[0], 'firecracker_3d.gif')
export_to_gif(images[1], 'cake_3d.gif')

## Image-to-3D

In [None]:
# We will use Kandinsky 2.1 to generate an image first

from diffusers import DiffusionPipeline
import torch

prior_pipeline = DiffusionPipeline.from_pretrained(
    'kandinsky-community/kandinsky-2-1-prior',
    torch_dtype=torch.float16,
    use_safetensors=True,
).to('cuda')

pipeline = DiffusionPipeline.from_pretrained(
    'kandinsky-community/kandinsky-2-1',
    torch_dtype=torch.float16,
    use_safetensors=True,
).to('cuda')

In [None]:
prompt = 'a cheeseburger, white background'
image_embeds, negative_image_embeds = prior_pipeline(
    prompt,
    guidance_scale=1.0
).to_tuple()

image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]

image.save('burger.png')

Now we can use Shap-E:

In [None]:
from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif

pipe = ShapEImg2ImgPipeline.from_pretrained(
    'openai/shap-e-img2img',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

In [None]:
image = Image.open('burger.png').resize((256,256))
guidance_scale = 3.0

images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

export_to_gif(images[0], 'burger_3d.gif')

## Generate mesh

Shap-E is a flexible model that can also generate textured mesh outputs to be rendered for downstream applications.

In [None]:
from diffusers import ShapEPipeline
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

pipe = ShapEPipeline.from_pretrained(
    'openai/shap-e',
    torch_dtype=torch.float16,
    variant='fp16',
).to(device)

In [None]:
prompt = 'a birthday cupcake'
guidance_scale = 15.0

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type='mesh',  # return mesh output instead of rendered frames
).images

Use the `export_to_ply()` function to save the mesh output as a `ply` file. (The mesh output can also be saved as an `obj` file with the `export_to_obj()` function.)

In [None]:
from diffusers.utils import export_to_ply

ply_path = export_to_ply(images[0], '3d_cake.ply')
print(f"Saved to file: {ply_path}")
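
As a quick sanity check on an exported file, the header of a PLY file can be inspected without any 3D library: the format begins with an ASCII header that ends at an `end_header` line, and its `element` lines give the vertex and face counts. The sketch below is a minimal reader using only built-in Python. The helper name `read_ply_element_counts` is hypothetical (it is not part of `diffusers` or `trimesh`), and the sample bytes are a hand-written stand-in, not real Shap-E output.

```python
def read_ply_element_counts(data: bytes) -> dict:
    """Return {element_name: count} parsed from the ASCII header of a PLY file."""
    if not data.startswith(b"ply"):
        raise ValueError("not a PLY file: missing 'ply' magic line")
    end = data.find(b"end_header")
    if end == -1:
        raise ValueError("PLY header is not terminated by 'end_header'")
    counts = {}
    # Only the header is decoded; any binary payload after it is ignored.
    for line in data[:end].decode("ascii").splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "element":
            counts[parts[1]] = int(parts[2])
    return counts


# Hand-written header for illustration only (no payload follows it).
sample_header = (
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"element vertex 8\n"
    b"property float x\n"
    b"property float y\n"
    b"property float z\n"
    b"element face 12\n"
    b"property list uchar int vertex_index\n"
    b"end_header\n"
)
print(read_ply_element_counts(sample_header))
```

For a file on disk, the same function could be called on the raw bytes, for example `read_ply_element_counts(open('3d_cake.ply', 'rb').read())`.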

Then we can convert the `ply` file to a `glb` file with the trimesh library:

In [None]:
import trimesh

mesh = trimesh.load('3d_cake.ply')
mesh_export = mesh.export('3d_cake.glb', file_type='glb')

A GLB file (`.glb`) is the binary form of glTF ("GL Transmission Format"), a standardized file format used to store and share 3D data, including 3D models, scenes, textures, materials, animations, and lighting information, all contained within a single, compact binary file.
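
To make the "single binary file" idea concrete, here is a minimal sketch that reads the 12-byte header found at the start of a GLB file: the ASCII magic `glTF`, a little-endian `uint32` version, and a little-endian `uint32` total length. The function name `parse_glb_header` is hypothetical, and the header bytes are synthetic, built only for illustration.

```python
import struct

GLB_MAGIC = b"glTF"  # first four bytes of a GLB file


def parse_glb_header(data: bytes) -> tuple:
    """Return (version, total_length) from the 12-byte GLB header."""
    if len(data) < 12:
        raise ValueError("GLB header needs at least 12 bytes")
    # "<4sII": little-endian, 4 raw bytes followed by two unsigned 32-bit ints.
    magic, version, length = struct.unpack("<4sII", data[:12])
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file: bad magic bytes")
    return version, length


# Synthetic header for illustration: version 2, total length 12 bytes.
fake_header = struct.pack("<4sII", GLB_MAGIC, 2, 12)
print(parse_glb_header(fake_header))
```

On a real export, passing the first bytes of `3d_cake.glb` to the same function should report the container version and the file size recorded in the header.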

By default, the mesh output is viewed from the bottom, but we can change the default viewpoint by applying a rotation transform:

In [None]:
import trimesh
import numpy as np

mesh = trimesh.load('3d_cake.ply')

rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)

mesh_export = mesh.export('3d_cake.glb', file_type='glb')
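
To see what this transform does to the axes, the sketch below rebuilds the same rotation by hand with NumPy and applies it to two unit vectors. It assumes the usual right-handed convention for a 4x4 homogeneous rotation about the x-axis, which is what `trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])` is expected to return. The helper `rotation_about_x` is hypothetical and exists only for this illustration.

```python
import numpy as np


def rotation_about_x(angle: float) -> np.ndarray:
    """4x4 homogeneous matrix for a right-handed rotation about the x-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0,   c,  -s, 0.0],
        [0.0,   s,   c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])


rot = rotation_about_x(-np.pi / 2)

# Points are written as homogeneous vectors (x, y, z, 1).
z_axis = np.array([0.0, 0.0, 1.0, 1.0])
y_axis = np.array([0.0, 1.0, 0.0, 1.0])

print(np.round(rot @ z_axis, 6))  # +z is carried onto +y
print(np.round(rot @ y_axis, 6))  # +y is carried onto -z
```

In other words, the rotation turns a z-up orientation into a y-up one, which matches the +Y-up convention of glTF.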

Once the mesh file is uploaded to a dataset repository, it can be visualized with the Dataset viewer.