# SEVA pipeline introduction

## For inference
Inference is mainly command-line based and calls functions from `preprocessing.py`.\
**Note:** the cropping dataloader is not used yet; all crops are still deterministic 1024x1024 squares.\
(TODO: Use the `RandomBBoxCrop` class from `dataloader.py`.)
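
As a rough illustration of what a deterministic 1024x1024 square crop involves, here is a minimal sketch. It assumes the crop is centered on a given point (for example a bounding-box center) and shifted to stay inside the image; `square_crop_box` and the image size used below are hypothetical and are not taken from `preprocessing.py`.

```python
def square_crop_box(center_x, center_y, img_w, img_h, size=1024):
    """Return (left, top, right, bottom) of a size x size crop kept inside the image."""
    if size > img_w or size > img_h:
        raise ValueError("crop size exceeds image dimensions")
    # Center the square on the requested point ...
    left = int(round(center_x - size / 2))
    top = int(round(center_y - size / 2))
    # ... then shift it back inside the image bounds if it sticks out.
    left = min(max(left, 0), img_w - size)
    top = min(max(top, 0), img_h - size)
    return left, top, left + size, top + size


# Hypothetical 2048x1500 image
print(square_crop_box(1024, 750, 2048, 1500))  # (512, 238, 1536, 1262)
print(square_crop_box(100, 100, 2048, 1500))   # (0, 0, 1024, 1024)
```

The same center always yields the same box, which is what makes the crop deterministic; a random crop would instead sample the offset.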

In [1]:
# first, ensure that preprocessing.py is in the current directory
# !python preprocessing.py -h

# example: prepare a multi-view static scene from the MVHumanNet dataset for SEVA inference on a SINGLE subject
# required: base_dir, timestep, output_dir
!python preprocessing.py \
    --base_dir "path_to_mvhumannet_dataset_subject" \
    --timestep 5 \
    --output_dir "output_path_to_store" \
    --seconds 3 \
    --fps 10
# optional flags (add a trailing "\" after --fps 10 and append them to the command above):
#   --num_train_frames 0                      # no test frames
#   --train_ids_path "path_to_train_ids_txt"  # txt file listing the exact training frames to use
# notes:
# - --subject_id is currently not used (easy to change); for now, point --base_dir at the subject directory as above
# - if you DON'T want to generate orbital path poses (only train/test poses), set --num_train_frames 0 (this overrides --fps and --seconds)
# - black frames are generated as "placeholders" for the missing views along our custom orbital path. This is normal.
# - to post-process all transform matrices with another transform, use --transform_coords (with a txt file written by np.savetxt)
# - --crop_only is deprecated (you probably don't want to use it)

# generation may also take some time
# to generate a different train_test_splits_{num_train_frames}.json, add the boolean --apply_split_only flag
# this skips the generation step and only writes the json with the order of train and test poses
!python preprocessing.py \
    --base_dir "path_to_mvhumannet_dataset_subject" \
    --timestep 5 \
    --output_dir "output_path_to_store" \
    --seconds 3 \
    --fps 10 \
    --apply_split_only
# this shouldn't redo any computation


IndentationError: unexpected indent (1803576428.py, line 20)
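
The notes above say that `--transform_coords` takes a txt file written by `np.savetxt`. A minimal sketch of producing such a file follows. It assumes the expected content is a single 4x4 homogeneous matrix, which is not confirmed here; the rotation and the file name are purely illustrative.

```python
import os
import tempfile

import numpy as np

# Assumption: --transform_coords expects one 4x4 homogeneous matrix.
# Illustrative choice: a 90-degree rotation about the x-axis.
transform = np.array([
    [1.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 1.0,  0.0, 0.0],
    [0.0, 0.0,  0.0, 1.0],
])

# Hypothetical file name; this is the path you would pass to --transform_coords.
transform_path = os.path.join(tempfile.mkdtemp(), "transform_coords.txt")
np.savetxt(transform_path, transform)

# Round-trip check: np.loadtxt recovers the same 4x4 matrix.
loaded = np.loadtxt(transform_path)
print(loaded.shape)
```

How the matrix is combined with each pose (for instance, left- or right-multiplication) depends on `preprocessing.py` and is worth checking before relying on the result.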

### Visualizations

In [33]:
# camera pose visualization

# run this in actual command line, otherwise it won't pop up
!python visuals.py --transforms_path "demo_inputs/assets_demo_cli/garden_flythrough/transforms.json"

Camera scale file not found at demo_inputs/assets_demo_cli/garden_flythrough/camera_scale.pkl. Using default scale of 1.0. This is fine if transforms have already been scaled.
Figure(1000x1000)


In [None]:
# view comparison (somewhat bad design: it uses the same file but --comparison flag for a different visualization)
!python visuals.py \
    --comparison \
    --gt_dir "to_processed_input_dir" \
    --comparison_dir "to_processed_output_dir" \
    --num_split 9 \
    --output_path "rendered_video_output_path" \
    --fps 30

# NOTE: resolutions are assumed to be square, otherwise this won't really work well.
# --gt_dir is our generated input directory (MVHumanNet subject directory with "transforms.json")
# --comparison_dir is the SEVA output directory, containing the "first-pass", "input", and "samples-rgb" directories and a "transforms.json" file.
# --num_split is important, and there must be a corresponding train_test_split_{num_split}.json file in the --gt_dir directory.

Remaining tests: 11
Ignoring input image: ./input/000.png
IMAGE COMPARISON STATS:

11 (360, 640, 3)
11 (576, 576, 3)
11 (3, 4)
Target dimensions: 576x576
Initial shapes: GT (360, 640, 3), Generated (576, 576, 3)
After scaling: GT (576, 576, 3), Generated (576, 576, 3)
Saving video...
Saving video...: 100%|██████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 11.08it/s]
Video saved as comparison_re10k.mp4
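
Since the comparison requires a matching `train_test_split_{num_split}.json` in `--gt_dir`, it can help to check for it before rendering. The helper below is a hypothetical sketch, not part of `visuals.py`; the file name pattern is taken from the note above.

```python
from pathlib import Path


def find_split_file(gt_dir, num_split):
    """Return the path of train_test_split_{num_split}.json in gt_dir, or raise if it is missing."""
    split_path = Path(gt_dir) / f"train_test_split_{num_split}.json"
    if not split_path.is_file():
        raise FileNotFoundError(f"{split_path} not found in the ground-truth directory")
    return split_path
```

For example, `find_split_file("to_processed_input_dir", 9)` either returns the path or fails early with a clear message, before any video is rendered.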


## For Training

### Data Preprocessing
The cropping class and the dataloader both live in `dataloader.py`.\
(Both depend on `preprocessing.py`.)

In [2]:
import os
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.v2 as T
from PIL import Image
from preprocessing import (
    get_bbox_center_and_size,
    get_mvhumannet_extrinsics,
    load_json,
    load_pickle,
    update_intrinsics_resize,
    generate_gaussian_mixture_samples,
    generate_gaussian_samples,
)
import json
from dataloader import MVHumanNetDataset
import matplotlib.pyplot as plt

transform = T.Compose([
    T.Resize(576), # an int resizes the shorter side to 576 px; set to whatever final resolution we want
    T.ToTensor(),
])

mvhumannet_dataset = 'mvset/' # change this to dataset path, with the structure as below:
# Expected dataset structure:
# ${mvhumannet_dataset}/
# ├── subject_id1/
# │   ├── annots/
# │   │   ├── camera1/
# │   │   │   └── frame_001.json
# │   │   └── camera2/
# │   │       └── frame_001.json
# │   ├── images_lr/
# │   │   ├── camera1/
# │   │   │   └── frame_001.jpg
# │   │   └── camera2/
# │   │       └── frame_001.jpg
# │   ├── fmask_lr/
# │   │   ├── camera1/
# │   │   │   └── frame_001_fmask.png
# │   │   └── camera2/
# │   │       └── frame_001_fmask.png
# │   ├── camera_extrinsics.json
# │   ├── camera_intrinsics.json
# │   └── camera_scale.pkl
# └── subject_id2/
#     └── ...

# sampling distributions

# 'pre_scale' rescales the camera intrinsics to match the images,
# since the MVHumanNet authors downsampled them by a factor of 2 beforehand.
mvds = MVHumanNetDataset(root_dir=mvhumannet_dataset, transforms=transform, pre_scale=0.5)
dataloader_train = DataLoader(mvds, batch_size=1, shuffle=True)
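
Before building the dataset, it can be useful to confirm that each subject directory matches the expected structure listed above. The following is a minimal sketch using only the standard library; `missing_entries` and `check_dataset` are hypothetical helpers, not functions from `dataloader.py`, and they check only the top-level entries of each subject.

```python
from pathlib import Path

# Entries every subject directory should contain, per the expected structure.
REQUIRED_DIRS = ("annots", "images_lr", "fmask_lr")
REQUIRED_FILES = ("camera_extrinsics.json", "camera_intrinsics.json", "camera_scale.pkl")


def missing_entries(subject_dir):
    """Return the required directories/files that are absent from one subject directory."""
    subject_dir = Path(subject_dir)
    missing = [name for name in REQUIRED_DIRS if not (subject_dir / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (subject_dir / name).is_file()]
    return missing


def check_dataset(root_dir):
    """Map each subject directory name to its missing entries (an empty list means complete)."""
    return {
        subject.name: missing_entries(subject)
        for subject in sorted(Path(root_dir).iterdir())
        if subject.is_dir()
    }
```

Calling `check_dataset(mvhumannet_dataset)` returns a dictionary in which any non-empty list points at a subject that needs fixing.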



In [3]:
# Example: fetch one batch of post-processed, cropped images
cropped_img, updated_K, transform_matrix = next(iter(dataloader_train))
print(f"Image batch shape: {cropped_img.size()}")
print(f"Intrinsics batch shape: {updated_K.size()}")
print(f"Transforms batch shape: {transform_matrix.size()}")

plt.figure(figsize=(10, 10))
# Convert tensor to numpy for plotting
img_to_plot = cropped_img.squeeze(0).permute(1, 2, 0).numpy()
# Ensure values are in valid range for imshow
if img_to_plot.max() <= 1.0:
    plt.imshow(img_to_plot)
else:
    plt.imshow(img_to_plot / 255.0)
plt.title('Cropped Image')
plt.axis('off')
plt.show()



NameError: name 'x1' is not defined
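
For reference, the `updated_K` returned with each crop should reflect how cropping and resizing change a pinhole intrinsics matrix. The sketch below shows the standard relations in plain Python. It is not the implementation of `update_intrinsics_resize`; the function names, the matrix values, and the crop offsets are hypothetical.

```python
def crop_intrinsics(K, left, top):
    """Shift the principal point after cropping at pixel offset (left, top)."""
    return [
        [K[0][0], K[0][1], K[0][2] - left],
        [K[1][0], K[1][1], K[1][2] - top],
        [K[2][0], K[2][1], K[2][2]],
    ]


def resize_intrinsics(K, scale_x, scale_y):
    """Scale focal lengths and principal point after resizing by (scale_x, scale_y)."""
    return [
        [K[0][0] * scale_x, K[0][1] * scale_x, K[0][2] * scale_x],
        [K[1][0] * scale_y, K[1][1] * scale_y, K[1][2] * scale_y],
        [K[2][0], K[2][1], K[2][2]],
    ]


# Hypothetical full-resolution intrinsics (not taken from the dataset).
K_full = [[2000.0, 0.0, 1024.0], [0.0, 2000.0, 750.0], [0.0, 0.0, 1.0]]

K_lr = resize_intrinsics(K_full, 0.5, 0.5)               # images downsampled by 2, cf. pre_scale=0.5
K_crop = crop_intrinsics(K_lr, left=100, top=50)         # crop with top-left corner at (100, 50)
K_out = resize_intrinsics(K_crop, 576 / 512, 576 / 512)  # resize a 512 px crop to 576 px
print(K_out)
```

Cropping changes only the principal point, while resizing scales both the focal lengths and the principal point. This simple form ignores half-pixel center conventions.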

TODO: pull this training branch and get it working: https://github.com/nviolante25/stable-virtual-camera/tree/training