## Using SOMA to label unlabeled mocaps

This tutorial is loosely based on the two SOMA tutorials: [Tutorial 1](https://github.com/nghorbani/soma/blob/main/src/tutorials/run_soma_on_soma_dataset.ipynb) and [Tutorial 2](https://github.com/nghorbani/soma/blob/main/src/tutorials/label_priming.ipynb).

Reading the [definitions](https://github.com/nghorbani/soma/tree/main/src/tutorials#definitions) of terminology is recommended.

We will be using a pre-trained SOMA model called the "SuperSet" to label our unlabeled motion capture data. The SuperSet model is trained on a body model with 89 markers placed around the body, so it labels our mocap data by matching our unlabeled markers to a subset of those 89 markers. If one of our markers does not correspond to any marker in the SuperSet, the behavior is undefined: either that marker is left unlabeled (treated as a ghost point and discarded), or it is given the label of some nearby marker. This imperfect labeling can lead to downstream errors when fitting the SMPL model to the labeled mocap data.

It is possible to train a new SOMA model for a different marker layout (specifically, a layout that is not a subset of the SuperSet). See this [README](https://github.com/nghorbani/soma/tree/main/src/tutorials/README.md) and [Tutorial 1](https://github.com/nghorbani/soma/blob/main/src/tutorials/run_soma_on_soma_dataset.ipynb) for details. Note, however, the following caveat from Tutorial 1:


>Due to licensing restrictions we cannot release the AMASS marker noise model and the CAESAR beta parameters.
>
>In an ablative study in the paper, we have shown that these parameters improve the performance SOMA. So without them it might be that the model you train would be underperforming
>

Thus it is more effective to use the pretrained models. As such, we will not be covering model training in this tutorial.

Conceptually, this is how the SOMA training works (see [this paper](https://arxiv.org/pdf/2110.04431.pdf) for more details):
- Goal: label unlabeled mocap data for a specific *markerlayout*.
- First, we take a single frame from a single trial of our unlabeled mocap data.
  - One frame is needed per subject: each subject has a unique body, and thus a unique markerlayout/point cloud and a unique body model.
- This frame is labeled.
- This labeled frame is MoSHed to get an SMPL-X body model for that specific subject.
- Virtual markers corresponding to our markerlayout are automatically placed on this SMPL-X body.
- This body model is animated by AMASS motions.
  - The AMASS dataset has many actions saved in SMPL format, which allows our subject-specific body model to 'perform' many actions.
  - Our markerlayout remains on the body model as these actions are performed.
- Thus a large repository of labeled mocap data is synthesized, specific to our subject and our markerlayout.
- Real noise data is added to this synthetic data.
- Occlusions and ghost points are also added.
- A SOMA model is trained on this synthetic data: it is given a synthetic unlabeled point cloud as input and learns to output the corresponding labels.
- Transformers with multiple self-attention units are used to learn the spatial structure of the 3D body, together with an optimal transport layer that encodes the natural constraints of the human body.
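
To make the noise, occlusion, and ghost-point steps above more concrete, here is a minimal sketch of how one labeled frame could be turned into an unlabeled training sample. This is an illustration only, not SOMA's implementation: the function `corrupt_frame`, its parameters, the Gaussian noise (standing in for the real noise model), and the bounding-box ghost sampling are all assumptions made for this example.

```python
import random


def corrupt_frame(points, max_occlusions=5, max_ghosts=3, noise_std=0.002, seed=None):
    """Corrupt one labeled frame (hypothetical helper, not part of SOMA).

    points: non-empty list of (x, y, z) tuples; the marker at index i has label i.
    Returns (cloud, labels), where labels[j] is the marker index of cloud[j],
    or -1 if cloud[j] is a ghost point.
    """
    rng = random.Random(seed)
    n = len(points)

    # Occlusions: drop a random subset of markers, always keeping at least one.
    n_occ = rng.randint(0, min(max_occlusions, n - 1))
    kept = sorted(rng.sample(range(n), n - n_occ))

    # Noise: jitter each surviving marker (Gaussian noise is an assumption here).
    cloud = [tuple(c + rng.gauss(0.0, noise_std) for c in points[i]) for i in kept]
    labels = list(kept)

    # Ghost points: spurious points sampled inside the frame's bounding box.
    lo = [min(p[axis] for p in points) for axis in range(3)]
    hi = [max(p[axis] for p in points) for axis in range(3)]
    for _ in range(rng.randint(0, max_ghosts)):
        cloud.append(tuple(rng.uniform(lo[axis], hi[axis]) for axis in range(3)))
        labels.append(-1)

    # Shuffle so that point order carries no information about the labels.
    order = list(range(len(cloud)))
    rng.shuffle(order)
    return [cloud[i] for i in order], [labels[i] for i in order]
```

A labeling model would then be trained to recover `labels` from `cloud` alone, with `-1` playing the role of the "ghost point" class.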

### Environment Check and Imports

In [3]:
# If this doesn't work, something is wrong with the conda environment. Check conda-env.md
import soma
import psbody
import moshpp

In [4]:
import os
import os.path as osp
import time

from loguru import logger

from soma.tools.run_soma_multiple import run_soma_on_multiple_settings


### Configuration Files

SOMA uses OmegaConf to configure each run. Every configuration value can be overridden from inside this notebook, so you only need to edit the YAML file if you want to change a default value for future runs.
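
As a rough mental model, the overrides passed later in this notebook use dotted keys such as `'mocap.unit'`, which address nested entries of the YAML configuration. The sketch below mimics that behavior with plain dictionaries. It is not OmegaConf's or SOMA's code; the helper `apply_dotted_overrides` and the default values shown are hypothetical and exist only for illustration.

```python
import copy


def apply_dotted_overrides(defaults, overrides):
    """Return a copy of `defaults` with dotted-key overrides applied (illustrative helper)."""
    merged = copy.deepcopy(defaults)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split('.')
        node = merged
        # Walk down the nested dictionaries, creating levels that do not exist yet.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# Hypothetical defaults standing in for the contents of a YAML file.
defaults = {'mocap': {'unit': 'm'}, 'soma': {'batch_size': 512}}
cfg = apply_dotted_overrides(defaults, {'mocap.unit': 'mm', 'soma.batch_size': 256})
```

Here `cfg['mocap']['unit']` becomes `'mm'` while `defaults` is left untouched, which is the same idea as overriding values in the notebook without editing the YAML file.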

In [11]:
def get_soma_conf_file():
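    '''
    Returns the path of soma_run_conf.yaml so the exact configuration settings can be inspected
    '''
    import soma
    # soma.__file__ is the package's __init__.py; in this source checkout,
    # three levels up is the directory that contains support_data
    init_path = soma.__file__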
    base_path = osp.dirname(osp.dirname(osp.dirname(init_path)))
    conf_path = osp.join(base_path, 'support_data', 'conf', 'soma_run_conf.yaml')

    return conf_path

In [12]:
get_soma_conf_file()

'/home/ritaank/Documents/dev/motion_synthesis/soma-root-2/soma/support_data/conf/soma_run_conf.yaml'

### Args for run_soma_on_multiple_settings:
    soma_expr_ids: list of soma experiment ids
    soma_mocap_target_ds_names: target dataset names, these should be available at mocap_base_dir
    soma_data_ids: data ids of some experiments
    tracklet_labeling_options: whether to use tracklet labeling
    ds_name_gt: gt mocap data for labeling evaluation
    soma_cfg: overloading soma_run_conf.yaml
    mosh_cfg: overloading moshpp_conf.yaml inside mosh code
    render_cfg: overload render_conf.yaml
    eval_label_cfg: eval_label.yaml
    parallel_cfg: relevant for use on IS cluster
    fast_dev_run: if True will run for a limited number of mocaps
    run_tasks: a selection of ['soma', 'mosh', 'render', 'eval_label']
    mocap_ext: file extension of the source mocap point clouds
    mosh_stagei_perseq: if True stage-i of mosh will run for every sequence instead of every subject
    fname_filter: List of strings to filter the source mocaps
    mocap_base_dir: base directory for source mocaps
    gt_mosh_base_dir: directory holding mosh results of the gt mocaps, used for v2v evaluation
    soma_work_base_dir: base directory for soma data. this directory holds: data, training_experiments, support_data
    **kwargs:

In [None]:
soma_work_base_dir = os.getcwd() # TODO: we will need to change working directory to subdir
support_base_dir = osp.join(soma_work_base_dir, 'support_files')

mocap_base_dir = osp.join(support_base_dir, 'evaluation_mocaps/original')
timestr = time.strftime("%Y%m%d-%H%M%S")

In [None]:
# The name of the folder of the unlabeled mocap data
# Change to the folder you want to label
data_source = 'immersion_test'

# The name of the subject, for folder naming purposes
# Not necessary to change; the outputs will be in a folder with this name + the time
subject_name = 'subjectA'
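
Before launching a run, it can help to confirm that the folder named by `data_source` actually contains mocap files. The sketch below uses a hypothetical helper, `find_mocaps`, which is not part of SOMA. It searches recursively, so it makes no assumption about how subject folders are nested inside the dataset folder.

```python
import os
import os.path as osp


def find_mocaps(mocap_base_dir, data_source, mocap_ext='.c3d'):
    """List mocap files under <mocap_base_dir>/<data_source> (hypothetical helper)."""
    ds_dir = osp.join(mocap_base_dir, data_source)
    if not osp.isdir(ds_dir):
        raise FileNotFoundError(f'Dataset folder not found: {ds_dir}')

    found = []
    for root, _dirs, files in os.walk(ds_dir):
        for fname in files:
            # Compare case-insensitively so that e.g. ".C3D" files are also found.
            if fname.lower().endswith(mocap_ext.lower()):
                found.append(osp.join(root, fname))
    return sorted(found)
```

For example, `len(find_mocaps(mocap_base_dir, data_source))` gives the number of `.c3d` files that are candidates for labeling; an empty list usually means the paths above need adjusting.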

In [None]:
run_soma_on_multiple_settings(
        soma_expr_ids=[
            'V48_02_SuperSet', # the model we are using
        ],
        soma_mocap_target_ds_names=[
            data_source,
        ],
        soma_data_ids=[
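            # Data id of the pre-trained model. Following SOMA's naming convention, this appears to mean:
            # up to 5 occlusions (OC_05), up to 3 ghost points (G_03), 0% real and 100% synthetic training data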
            'OC_05_G_03_real_000_synt_100',
        ],
        soma_cfg={
            # subject_name is defined in the cell above (there is no `args` object in this notebook)
            'mocap.subject_name': f'{subject_name}_{timestr}',
            'soma.batch_size': 256,
            'dirs.support_base_dir': support_base_dir,
            'mocap.unit': 'mm',
            'save_c3d': True,
            # 'keep_nan_points': True,
            'remove_zero_trajectories': True,
            # 'mosh_stagei_perseq': True,
        },
        parallel_cfg={
            # 'max_num_jobs': 5,
            'randomly_run_jobs': True,
        },
        run_tasks=[
            'soma',
        ],
        mocap_base_dir=mocap_base_dir,
        soma_work_base_dir=soma_work_base_dir,
        mocap_ext='.c3d',
    )
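
The data id passed in `soma_data_ids` packs several settings into one string. The sketch below splits such an id into its parts. It rests on an assumption about the naming convention (`OC` = maximum occlusions, `G` = maximum ghost points, `real`/`synt` = percentage of real and synthetic training data), which should be checked against the SOMA definitions linked at the top. The helper `parse_soma_data_id` and its output keys are hypothetical and not part of SOMA.

```python
import re

# Assumed pattern: OC_<occlusions>_G_<ghosts>_real_<percent>_synt_<percent>
_DATA_ID_RE = re.compile(r'^OC_(\d+)_G_(\d+)_real_(\d+)_synt_(\d+)$')


def parse_soma_data_id(data_id):
    """Split a SOMA data id into its numeric parts (hypothetical helper)."""
    match = _DATA_ID_RE.match(data_id)
    if match is None:
        raise ValueError(f'Unrecognized data id: {data_id!r}')
    occ, ghost, real, synt = (int(group) for group in match.groups())
    return {
        'max_occlusions': occ,
        'max_ghost_points': ghost,
        'real_data_percent': real,
        'synthetic_data_percent': synt,
    }
```

Under this reading, `parse_soma_data_id('OC_05_G_03_real_000_synt_100')` describes a model trained with up to 5 occlusions and 3 ghost points per frame, using only synthetic data.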