<a href="https://colab.research.google.com/github/syntec-research/Cafca/blob/main/Cafca_Synthetic_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cafca Synthetic Dataset

## General Information


This notebook visualizes the main components of the Cafca synthetic dataset. The full dataset contains 1,500 subjects, each rendered with 13 expressions in 3 environments from 30 views, for a total of 1.755 million images.
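
As a quick sanity check, the image count follows from multiplying the four factors stated above. The variable names in this sketch are illustrative only and do not appear elsewhere in the notebook.

```python
# Factors as stated in the dataset description.
n_subjects = 1500
n_expressions = 13
n_environments = 3
n_views = 30

# One scene is one (subject, expression, environment) combination.
n_scenes = n_subjects * n_expressions * n_environments
# Every scene is rendered from all views.
n_images = n_scenes * n_views

print(f'{n_scenes:,} scenes, {n_images:,} images')
```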

### Folder Structure

The paths follow this format: `SUBJECT`/`EXP`_`ENV`/`{cameras_json, color_image, foreground_mask, segmentation}`

`SUBJECT` and `EXP` are 5-digit numbers. `ENV` is a 3-digit number. Camera names go from `C00` to `C29`.

Please see "Utilities" (`load_example(...)`) for loading a scene.
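
The layout above can be turned into concrete file paths with a small helper. This is a minimal sketch: `scene_files` and `MODALITIES` are hypothetical names, and the file extensions are taken from the paths used in `load_example(...)`.

```python
from pathlib import PurePosixPath

# Folder name -> file extension, as used by `load_example(...)` in "Utilities".
MODALITIES = {
    'cameras_json': 'json',
    'color_image': 'png',
    'foreground_mask': 'png',
    'segmentation': 'png',
}


def scene_files(base_dir, subject, expression, environment, camera):
  """Returns {modality: path} for one scene and camera (hypothetical helper)."""
  if not 0 <= camera <= 29:
    raise ValueError('Camera names go from C00 to C29.')
  # SUBJECT and EXP are 5-digit numbers, ENV is a 3-digit number.
  frame_dir = (PurePosixPath(base_dir) / f'{subject:05d}'
               / f'{expression:05d}_{environment:03d}')
  camera_name = f'C{camera:02d}'
  return {modality: str(frame_dir / modality / f'{camera_name}.{ext}')
          for modality, ext in MODALITIES.items()}


paths = scene_files('data', subject=0, expression=1, environment=2, camera=0)
for modality, path in paths.items():
  print(modality, path)
```

Using integers for the ids keeps the zero padding in one place; the utilities in this notebook pass the already padded strings instead.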


### Environments
The dataset contains each expression rendered in three different environments. The first environment (index `000`) is the same for all expressions (`Laval_Indoor_9C4A5690_8k.exr`). The other two environments are picked at random from the [Laval Indoor Dataset](http://indoor.hdrdb.com/). The environment name is saved in `environment.json` in the `color_image` folder.


## Coordinate System

The dataset uses a right-handed coordinate system that follows the OpenCV convention (X right, Y down, Z forward). For convenience, the camera JSON file contains redundant information:
the full projection matrices (`P`), the extrinsic and intrinsic matrices (`world2cam` / `cam2world` and `K`), and all of
these parameters individually.
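
To illustrate how these matrices relate, the sketch below projects 3D points into pixel coordinates. It assumes that `K` is a 3x3 intrinsic matrix, that `world2cam` is a 4x4 matrix mapping world to camera coordinates, and that the projection matrix is composed as `K @ world2cam[:3]`. These shapes and the composition are assumptions, so check them against the output of the "List Camera Fields" cell. The camera values are synthetic and `project` is a hypothetical helper.

```python
import numpy as np


def project(points_world, K, world2cam):
  """Projects Nx3 world points to pixels; returns (Nx2 pixels, N depths)."""
  points_world = np.asarray(points_world, dtype=float)
  ones = np.ones((len(points_world), 1))
  homogeneous = np.concatenate([points_world, ones], axis=1)  # N x 4
  P = K @ world2cam[:3, :]  # Assumed composition of the 3x4 projection matrix.
  uvw = homogeneous @ P.T   # N x 3
  depth = uvw[:, 2]         # Distance along the camera's Z (forward) axis.
  return uvw[:, :2] / uvw[:, 2:3], depth


# Synthetic camera for illustration only (not taken from the dataset).
K = np.array([[1000.0, 0.0, 256.0],
              [0.0, 1000.0, 256.0],
              [0.0, 0.0, 1.0]])
world2cam = np.eye(4)
world2cam[:3, 3] = [0.0, 0.0, 1.0]  # World origin lies 1 unit in front of the camera.

points = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
pixels, depth = project(points, K, world2cam)
print(pixels)
```

With X right and Y down, moving a point along +X increases the pixel column and moving it along +Y increases the pixel row.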

## Segmentation

The segmentations are stored as 16-bit grayscale PNGs. Region `0` is the background, and the nonzero values refer to regions like facial skin, throat, hair, upper body, eyes, etc.
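
A label map of this kind can be split into one boolean mask per region. The sketch below uses a synthetic array; the region ids `7` and `300` are arbitrary placeholders, not the dataset's actual label values, and `region_masks` is a hypothetical helper.

```python
import numpy as np


def region_masks(segmentation):
  """Returns {region_id: boolean mask} for every id present in the label map."""
  return {int(region): segmentation == region
          for region in np.unique(segmentation)}


# Synthetic 4x4 label map standing in for a decoded 16-bit PNG.
segmentation = np.zeros((4, 4), dtype=np.uint16)
segmentation[1:3, 1:3] = 7
segmentation[3, :] = 300  # Ids above 255 are why 16 bits can be needed.

masks = region_masks(segmentation)
foreground = segmentation != 0  # Everything except the background region 0.

for region, mask in masks.items():
  print(f'Region {region}: {int(mask.sum())} pixels')
```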

## FLAME 2023 Annotations and Keypoints

The dataset includes pseudo-GT annotations for keypoints and [FLAME](https://flame.is.tue.mpg.de/) 2023. These annotations were obtained by running [VHAP](https://github.com/ShenhanQian/VHAP), a photometric fitting pipeline.

The annotations are stored as `npz` files that combine all expressions for a particular subject and environment. For example, the FLAME parameters for subject `00000` and environment `000` are stored under
`00000-00099/00000/flame_2023_000/tracked_flame_params_100.npz`.

2D keypoints are estimated with [STAR](https://github.com/ShenhanQian/STAR/) and stored as `npz` files in the `landmark2d/STAR` subfolder for each frame and camera. For example, the following path stores the landmarks for subject `00000`, expression `00001`, environment `000`, and camera `C00`: `00000-00099/00000/00001_000/landmark2d/STAR/C00.npz`
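
The two example paths above can be reproduced with a short helper. This sketch assumes that subjects are grouped into folders of 100 (`00000-00099`, `00100-00199`, and so on), which is inferred from the single example prefix and may not hold for the whole dataset. All function names are hypothetical, and the `100` suffix in the file name is copied verbatim from the example.

```python
from pathlib import PurePosixPath


def subject_group(subject, group_size=100):
  """Returns the assumed group folder, e.g. '00000-00099' for subject 0."""
  start = (subject // group_size) * group_size
  return f'{start:05d}-{start + group_size - 1:05d}'


def flame_params_path(subject, environment):
  """Path of the FLAME parameters for one subject and environment."""
  return str(PurePosixPath(subject_group(subject)) / f'{subject:05d}'
             / f'flame_2023_{environment:03d}' / 'tracked_flame_params_100.npz')


def landmark_path(subject, expression, environment, camera):
  """Path of the STAR 2D landmarks for one frame and camera."""
  return str(PurePosixPath(subject_group(subject)) / f'{subject:05d}'
             / f'{expression:05d}_{environment:03d}'
             / 'landmark2d' / 'STAR' / f'C{camera:02d}.npz')


print(flame_params_path(subject=0, environment=0))
print(landmark_path(subject=0, expression=1, environment=0, camera=0))
```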


## Setup

In [None]:
#@title Download and Unzip Sample Dataset
print('Downloading sample dataset with 5 identities...')
!wget https://dataset.ait.ethz.ch/downloads/cafca_v2/mini_sample_dataset.zip
!unzip mini_sample_dataset.zip

In [None]:
!pip install mediapy trimesh --quiet

In [None]:
#@title Imports
import plotly.io as pio

import mediapy
import matplotlib.pyplot as plt
import numpy as np
import os
import json
import plotly.graph_objs as go
from typing import List, Any
import trimesh

In [None]:
#@title Utilities

def read_camera_json(path: str) -> dict[str, Any]:
  """Returns a dict with camera parameters."""
  with open(path, 'rb') as f:
    return json.load(f)


def read_foreground_mask(path: str) -> np.ndarray:
  """Returns a float foreground mask in [0, 1]."""
  foreground_mask = mediapy.read_image(path).astype(float)
  # Map from [0, 255] to [0, 1].
  return foreground_mask / 255.0


def scatter3d(arr: np.ndarray | List, name: str, mode: str='markers', **kwargs) -> go.Scatter3d:
  """Returns a plotly.graph_objs.Scatter3d object for the given array."""
  arr = np.array(arr)
  return go.Scatter3d(x=arr[:, 0], y=arr[:, 1], z=arr[:, 2], mode=mode, name=name, **kwargs)


def read_flame_mesh(path: str) -> trimesh.Trimesh:
  """Returns the FLAME mesh stored at `path`."""
  return trimesh.load(path)


def read_flame_params(path: str) -> dict[str, Any]:
  """Returns the arrays of an `npz` file with FLAME parameters as a dict."""
  params = np.load(path)
  return {k: params[k] for k in params.files}


def load_example(base_dir: str, subject: str, expression: str, environment: str, camera_name: str) -> dict[str, Any]:
  """Returns a dict with data for an individual scene."""
  frame_dir = os.path.join(base_dir, subject, f'{expression}_{environment}')

  rgb = mediapy.read_image(os.path.join(frame_dir, f'color_image/{camera_name}.png'))
  segmentation = mediapy.read_image(os.path.join(frame_dir, f'segmentation/{camera_name}.png'))
  foreground_mask = read_foreground_mask(os.path.join(frame_dir, f'foreground_mask/{camera_name}.png'))
  camera = read_camera_json(os.path.join(frame_dir, f'cameras_json/{camera_name}.json'))
  flame_mesh = read_flame_mesh(os.path.join(base_dir, subject, f'flame_2023_{environment}/eval_100/mesh/frame_{expression}.obj'))
  flame_params = read_flame_params(os.path.join(base_dir, subject, f'flame_2023_{environment}/tracked_flame_params_100.npz'))

  return {
      'rgb': rgb,
      'segmentation': segmentation,
      'foreground_mask': foreground_mask,
      'camera': camera,
      'flame_mesh': flame_mesh,
      'flame_params': flame_params,
      }

## Visualizations
We first visualize all modalities for a single example and then plot multiple cameras.

### Individual Scene

In [None]:
#@title Choose Example
base_dir = './'
# Load the following scene:
subject = '00000'  #@param{"type": "string"}
expression = '00001'  #@param{"type": "string"}
environment = '002'  #@param{"type": "string"}
camera_name = 'C00'  #@param{"type": "string"}

example = load_example(base_dir, subject, expression, environment, camera_name)

In [None]:
#@title Visualize Scene
visuals_keys = ['rgb', 'segmentation', 'foreground_mask']
visuals = {k: example[k] for k in visuals_keys}
visuals['segmentation'] = visuals['segmentation'] / np.max(visuals['segmentation'])  # Scale to [0, 1.0] for visualization

mediapy.show_images(visuals.values(), titles=visuals_keys, height=256)

In [None]:
#@title Visualize Semantic Regions
# Sort the region ids so that the panels appear in a stable order.
semantic_regions = np.unique(example['segmentation']).tolist()
semantic_vis = list()

# We only visualize the regions in this example. Other images have more semantic regions.
for semantic_region in semantic_regions:
  # Use a uint8 mask so that 255 is shown as white regardless of the segmentation dtype.
  vis = np.zeros(example['segmentation'].shape, dtype=np.uint8)
  vis[example['segmentation'] == semantic_region] = 255
  semantic_vis.append(vis)
mediapy.show_images(semantic_vis, titles=[f'Region {region}' for region in semantic_regions], height=256)

In [None]:
#@title List Camera Fields
for k in example['camera']:
  print(k, np.array(example['camera'][k]).shape)

In [None]:
#@title List FLAME Parameters

# The dict contains parameters for all 13 expressions.
for k in example['flame_params']:
  print(k, example['flame_params'][k].shape)

In [None]:
#@title Visualize Cameras for Multiple View Points
n_cameras = 30  #@param{"type": "number"}
near, far = 0, 0.3

camera_names = [f'C{i:02d}' for i in range(n_cameras)]
examples = list(map(lambda camera_name: load_example(base_dir, subject, expression, environment, camera_name), camera_names))

positions = np.array([example['camera']['position'] for example in examples])
orientations = np.array([example['camera']['orientation'] for example in examples])
orientations_x = orientations[:, 0]
orientations_y = orientations[:, 1]
orientations_z = orientations[:, 2]

cam_x = (np.linspace(near, far, 10)[:, None, None] * orientations_x + positions).reshape(-1, 3)
cam_y = (np.linspace(near, far, 10)[:, None, None] * orientations_y + positions).reshape(-1, 3)
cam_z = (np.linspace(near, far, 10)[:, None, None] * orientations_z + positions).reshape(-1, 3)

# The FLAME mesh is the same for all cameras of a scene, so take it from the first one.
mesh = examples[0]['flame_mesh']

origin = np.zeros((1, 3))
fig = go.Figure(data=[
    scatter3d(positions, name='Position'),
    scatter3d(origin, name='Origin'),
    scatter3d(cam_x, name='X', marker={'size': 3, 'opacity': 0.5}),
    scatter3d(cam_y, name='Y', marker={'size': 3, 'opacity': 0.5}),
    scatter3d(cam_z, name='Z', marker={'size': 3, 'opacity': 0.5}),
    scatter3d(mesh.vertices[::2], name='mesh', marker={'size': 1, 'opacity': 1}),
    ])

fig.update_layout(scene_camera=dict(
    eye=dict(x=0, y=0, z=2),
    center=dict(x=0, y=0, z=0),
    up=dict(x=0, y=1, z=0)
))

fig.show()

### Multiple Scenes

In [None]:
#@title Visuals for a Neutral Expression in Three Environments
subject = '00001'  #@param{"type": "string"}
n_cameras = 5  #@param{"type": "number"}
camera_names = [f'C{i:02d}' for i in range(n_cameras)]
subject_dir = os.path.join(base_dir, subject)
rgbs = list()

for expression, env in [('00000', '000'), ('00000', '001'), ('00000', '002')]:
  examples = list(map(lambda camera_name: load_example(base_dir, subject, expression, env, camera_name), camera_names[:n_cameras]))
  rgbs += [example['rgb'] for example in examples]

mediapy.show_images(rgbs, height=256, columns=n_cameras)

In [None]:
#@title Visuals for One Subject with an Expressive Face in the Same Environment
subject = '00001'  #@param{"type": "string"}
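n_cameras = 5  #@param{"type": "number"}
# Rebuild the camera list so that this cell respects its own `n_cameras`.
camera_names = [f'C{i:02d}' for i in range(n_cameras)]
rgbs = list()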

for expression, env in [('00001', '000'), ('00002', '000'), ('00003', '000')]:
  examples = list(map(lambda camera_name: load_example(base_dir, subject, expression, env, camera_name), camera_names[:n_cameras]))
  rgbs += [example['rgb'] for example in examples]

mediapy.show_images(rgbs, height=256, columns=n_cameras)

In [None]:
#@title Visuals for Multiple Subjects, Expressions, and Environments
n_subjects = 3  #@param{"type": "number"}
n_expressions = 3  #@param{"type": "number"}
n_environments = 3  #@param{"type": "number"}
n_cameras = 5  #@param{"type": "number"}
# Rebuild the camera list so that this cell respects its own `n_cameras`.
camera_names = [f'C{i:02d}' for i in range(n_cameras)]

for subject_i in range(n_subjects):
  rgbs = list()
  for expression_i in range(n_expressions):
    for env_i in range(n_environments):
      examples = list(map(lambda camera_name: load_example(base_dir, f'{subject_i:05d}', f'{expression_i:05d}', f'{env_i:03d}', camera_name), camera_names[:n_cameras]))
      rgbs += [example['rgb'] for example in examples]
  print(f'Subject {subject_i:05d}')
  mediapy.show_images(rgbs, height=256, columns=n_cameras)