# Assignment 3: Motion AutoEncoder

In this assignment, you will implement and train an autoencoder for motion generation using the CMU-Mocap dataset. You will develop a simplified version of a motion manifold system capable of learning from motion capture data, synthesizing new movements, and enabling basic motion editing techniques.

Please read the following two papers, which serve as the reference for your implementation:
* Learning Motion Manifolds with Convolutional Autoencoders, https://www.ipab.inf.ed.ac.uk/cgvu/motioncnn.pdf
* A Deep Learning Framework for Character Motion Synthesis and Editing, https://www.ipab.inf.ed.ac.uk/cgvu/motionsynthesis.pdf

To help you understand motion data representation, here are some valuable resources:

* **CMU Mocap dataset**: https://mocap.cs.cmu.edu/
  * The raw data format may be challenging to interpret directly. For easier use, download the BVH format (which our dataloader utilizes): https://github.com/una-dinosauria/cmu-mocap
  * Blender (https://www.blender.org/) can be used to visualize the downloaded BVH files. This will provide a clearer understanding of the human skeleton structure used in the dataset and allow you to visualize the raw motion sequences.
* **Additional reference** (optional): For a deeper exploration of human models, you can examine more motion data in FBX format at: https://www.mixamo.com/#/?page=1&type=Motion%2CMotionPack
  * Note that Mixamo uses a different skeletal structure than our dataset, but it includes human mesh deformation with motion, which BVH data doesn't provide.


This assignment includes substantial starter code in separate Python files to help you get started. Your tasks are to:

* Complete all sections marked with `TODO` comments. Feel free to add code or restructure functions to improve your implementation's flexibility during training.
* Present your results by embedding videos and images in this notebook. You should also answer all questions and complete the write-up sections in this notebook.
* Include all necessary video files, images, and code files in your submission for proper evaluation. **Important**: Do NOT submit model checkpoints as these will unnecessarily increase your submission size.

**Please reserve enough time for this assignment given the potential amount of time for training.**

In [1]:
# For you to add video to your submission.
from IPython.display import Video

## Part 1: Familiarize Yourself with the Data (10 points)

Unlike previous assignments where standard dataloaders were readily available in existing libraries, this assignment requires implementing a custom dataloader for the CMU-Mocap dataset.

We've provided most of the dataloader implementation in `dataloader.py` and included a pre-processed version of the necessary data in `cmu-mocap/cache` (https://utexas.box.com/v/cmu-mocap-cache). Please download this data and place the `cmu-mocap` folder in your current working directory. Alternatively, you can download the BVH files from the links provided earlier and generate your own pre-processed data.

For this part, you need to implement data normalization in the dataloader. There are two sets of TODOs in the file:

1. First, compute appropriate statistics for normalizing your data based on the entire dataset
2. Second, implement the normalization procedure in the `__getitem__` function, where the dataset retrieves one sample of data

Upon successful completion of this section, you should be able to visualize four sample motion sequences by running `python dataloader.py`.

Important notes:
* There are multiple approaches to normalizing motion data. You are free to choose any method that produces reasonable training results in the subsequent sections.
* Include visualizations of your **normalized motion** sequences in your submission for this part, following the example format below.
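As one possible starting point, the sketch below standardizes each channel with a mean and standard deviation computed over the whole dataset. The array layout `(num_clips, num_frames, num_channels)` and the function names `compute_stats`, `normalize`, and `denormalize` are assumptions for illustration; they are not taken from `dataloader.py`, and other normalization schemes are equally valid.

```python
import numpy as np

def compute_stats(clips, eps=1e-8):
    """Per-channel mean and std over all clips and frames.

    clips: array of shape (num_clips, num_frames, num_channels), an assumed layout.
    """
    mean = clips.mean(axis=(0, 1))
    std = clips.std(axis=(0, 1))
    # Constant channels have std ~ 0; use 1.0 there to avoid dividing by zero.
    std = np.where(std < eps, 1.0, std)
    return mean, std

def normalize(clip, mean, std):
    """Standardize a clip (or a stack of clips) channel by channel."""
    return (clip - mean) / std

def denormalize(clip, mean, std):
    """Invert normalize(), e.g. before visualizing a reconstructed motion."""
    return clip * std + mean

# Synthetic stand-in data; real clips would come from the cached CMU-Mocap files.
rng = np.random.default_rng(0)
clips = rng.normal(loc=5.0, scale=3.0, size=(8, 60, 6))
clips[..., 0] = 2.0  # a constant channel, to exercise the eps guard

mean, std = compute_stats(clips)
normed = normalize(clips, mean, std)
restored = denormalize(normed, mean, std)
```

Keeping `mean` and `std` around matters because the network outputs must be denormalized before they can be rendered as a skeleton.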

In [2]:
# Replace with your own video file.
Video('./vid/sample_0.mp4')

## Part 2: Motion Manifold Learning (45 points)

In this section, you will implement the neural network architecture and training methodology described in the reference papers to learn a motion manifold.

### Convolutional Autoencoder Architecture (15 points)

1. Implement the convolutional autoencoder in the `MotionAutoencoder` class, following the architecture detailed in one of the reference papers.
2. Develop both the encoding and decoding operations that will allow the network to compress motion data into a lower-dimensional manifold representation and then reconstruct it.
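To make the encode/decode split concrete, here is a minimal sketch of a one-layer 1D convolutional autoencoder that convolves over time. The class name `TinyMotionAutoencoder`, the channel counts, the kernel width, and the clip length are all placeholder assumptions; they are not the starter code's `MotionAutoencoder` interface, and you should take the actual layer configuration from the paper you follow. Nearest-neighbor upsampling is used here as a simple stand-in for an unpooling step.

```python
import torch
import torch.nn as nn

class TinyMotionAutoencoder(nn.Module):
    """Hypothetical one-layer convolutional autoencoder over the time axis."""

    def __init__(self, num_channels=73, hidden_channels=256, kernel_size=25):
        super().__init__()
        padding = kernel_size // 2  # keeps the temporal length unchanged for odd kernels
        self.encoder = nn.Sequential(
            nn.Conv1d(num_channels, hidden_channels, kernel_size, padding=padding),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),  # halve the temporal resolution
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # stand-in for unpooling
            nn.Conv1d(hidden_channels, num_channels, kernel_size, padding=padding),
        )

    def encode(self, x):
        # x: (batch, num_channels, num_frames) -> (batch, hidden_channels, num_frames / 2)
        return self.encoder(x)

    def decode(self, z):
        # z: (batch, hidden_channels, num_frames / 2) -> (batch, num_channels, num_frames)
        return self.decoder(z)

    def forward(self, x):
        return self.decode(self.encode(x))

model = TinyMotionAutoencoder()
x = torch.randn(4, 73, 240)  # random stand-in for a batch of normalized clips
z = model.encode(x)
x_hat = model(x)
```

Exposing `encode` and `decode` as separate methods is convenient later, because interpolation and editing operate on the hidden representation rather than on the raw motion.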

### Training Procedure (30 points)

1. Implement the training procedure in `MotionManifoldTrainer` for your autoencoder, carefully considering the appropriate loss function(s) to use for motion data.
2. Generate and include training curves using the visualization template provided in the starter code.

Important considerations:
1. The reference papers were published several years ago, when neural network implementations were often hand-coded with custom operations. You'll need to adapt these concepts to modern PyTorch conventions and standard operations. Some training parameters may require adjustment to work effectively with contemporary deep learning frameworks.
2. Your submission for this part should include clear visualizations of your training curves, showing how the loss changes over time.
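A bare-bones reconstruction training loop might look like the sketch below. It assumes a plain mean-squared-error loss and the Adam optimizer, which are common defaults rather than choices prescribed by the starter code; the function `train_autoencoder`, the toy model, and the random data are hypothetical and only there to make the example runnable. The returned per-epoch losses are what you would plot as a training curve.

```python
import torch
import torch.nn as nn

def train_autoencoder(model, data, epochs=5, batch_size=16, lr=1e-3):
    """Minimize mean-squared reconstruction error; returns the average loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    history = []
    for _ in range(epochs):
        perm = torch.randperm(data.shape[0])  # reshuffle the clips every epoch
        running, batches = 0.0, 0
        for start in range(0, data.shape[0], batch_size):
            batch = data[perm[start:start + batch_size]]
            optimizer.zero_grad()
            loss = criterion(model(batch), batch)  # the target is the input itself
            loss.backward()
            optimizer.step()
            running += loss.item()
            batches += 1
        history.append(running / batches)
    return history

torch.manual_seed(0)
# Toy stand-ins: a tiny convolutional model and random "motion" clips.
toy_model = nn.Sequential(
    nn.Conv1d(6, 16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv1d(16, 6, kernel_size=5, padding=2),
)
toy_data = torch.randn(64, 6, 32)  # (num_clips, num_channels, num_frames)
history = train_autoencoder(toy_model, toy_data)
```

In a real run you would also track a held-out validation loss and may want extra terms (for example a sparsity penalty on the hidden units), depending on which paper you follow.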

## Part 3: Motion Synthesis (30 points)

### Motion Interpolation (15 points)

Implement the function `MotionManifoldSynthesizer.interpolate_motions` that creates transitions between different motions using the learned manifold. This function should accept two motion sequences sampled from the dataset and generate an interpolated motion that blends naturally between them. You can visualize your results using the provided `visualize_interpolation` function.

Your submission for this section should include at least two video examples demonstrating motion interpolation between different movement types.
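One simple way to interpolate on the manifold is to blend the two hidden representations linearly and decode each blend. The sketch below assumes this linear scheme; `interpolate_latent` and the linear stand-in `encode`/`decode` functions are hypothetical and are not the signature of `MotionManifoldSynthesizer.interpolate_motions`.

```python
import numpy as np

def interpolate_latent(encode, decode, motion_a, motion_b, num_steps=5):
    """Decode evenly spaced linear blends between the latent codes of two motions."""
    z_a, z_b = encode(motion_a), encode(motion_b)
    weights = np.linspace(0.0, 1.0, num_steps)
    return [decode((1.0 - w) * z_a + w * z_b) for w in weights]

# Stand-in linear "autoencoder" so the sketch runs without a trained model.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))

def encode(x):
    return x @ W.T  # (num_frames, 6) -> (num_frames, 4)

def decode(z):
    return z @ W  # (num_frames, 4) -> (num_frames, 6)

motion_a = rng.normal(size=(30, 6))
motion_b = rng.normal(size=(30, 6))
blends = interpolate_latent(encode, decode, motion_a, motion_b)
```

With a trained model, the first and last blends are the reconstructions of the two inputs. To produce a single clip that transitions from one motion to the other, the blend weight could instead vary along the time axis of the hidden representation.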

### Fixing Corrupt Motion Data (15 points)

Implement the function `MotionManifoldSynthesizer.fix_corrupted_motion` that projects corrupted motion data onto the learned manifold and reconstructs corrected, natural-looking movements. This function should demonstrate the manifold's ability to act as a prior distribution over valid human motion. Use the provided `visualize_motion_comparison` function to create side-by-side comparisons of the corrupted input and your reconstructed output.

Your submission for this section should include at least two video examples showing motion correction from different types of corruptions provided in the starter code.
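The idea of projecting onto a manifold can be illustrated without a neural network. The toy sketch below uses a PCA subspace as a stand-in for the learned manifold: encoding then decoding discards the part of the noise that lies off the subspace. Everything here (the synthetic data, the `project` helper, the noise level) is an assumption for illustration; with your trained model, the analogous operation is passing the corrupted motion through the encoder and decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" poses lying on a 3-dimensional linear subspace of a 20-dimensional space.
basis = rng.normal(size=(3, 20))
clean = rng.normal(size=(500, 3)) @ basis

# Fit a PCA subspace as a stand-in for the learned manifold.
mean = clean.mean(axis=0)
_, _, vt = np.linalg.svd(clean - mean, full_matrices=False)
components = vt[:3]  # top three principal directions, shape (3, 20)

def project(x):
    """Encode onto the subspace and decode back (the analogue of decode(encode(x)))."""
    return (x - mean) @ components.T @ components + mean

# Corrupt with Gaussian noise, then project back onto the subspace.
corrupted = clean + rng.normal(scale=0.5, size=clean.shape)
fixed = project(corrupted)

err_before = np.mean((corrupted - clean) ** 2)
err_after = np.mean((fixed - clean) ** 2)
print(f"MSE before projection: {err_before:.4f}, after: {err_after:.4f}")
```

A nonlinear autoencoder offers no such guarantee, so comparing the reconstruction error against the clean motion for each corruption type is a useful sanity check.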

In [None]:
# Your videos of fixing corrupt motion data.

## Part 4: Analysis Questions (15 points)

Answer each question with your own analysis. The questions are open-ended: we are looking for your own observations from the experiments you ran. An autoencoder is a relatively simple method, so many of your results won't be perfect.

1. Explain your chosen normalization approach for the motion data. Why did you select this method, and how does it specifically address the challenges of human motion data? What other normalization techniques did you consider, and why did you not choose them?

[Answer]:

2. After training your autoencoder, explore and describe the structure of your learned manifold. You can use t-SNE or PCA to visualize the hidden unit space (include one image in your answer).  Are different motion types clustered in particular regions? Can you identify meaningful directions in the latent space that correspond to specific motion attributes (speed, posture, etc.)?

[Answer]:

3. Critically analyze the quality of your interpolated motions. Where does the interpolation succeed or fail? What patterns do you notice about transitions between dissimilar motions versus similar ones?

[Answer]:

4. For the corrupted motion reconstruction task, analyze which types of corruption your system handles well versus poorly. What does this tell you about the properties of your learned manifold?

[Answer]:
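For the manifold visualization in question 2, a PCA projection can be computed directly with an SVD, as sketched below. The helper `pca_2d` and the synthetic two-cluster latents are assumptions for illustration; in practice, `latents` would hold the flattened hidden units of each clip and `labels` the motion category of each clip.

```python
import numpy as np

def pca_2d(latents):
    """Project latent codes of shape (num_clips, latent_dim) onto their top two principal axes."""
    centered = latents - latents.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    # Fraction of the total variance captured by each of the two axes.
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return coords, explained

rng = np.random.default_rng(0)
# Two synthetic "motion types" as offset clusters (placeholder for real encoder outputs).
latents = np.concatenate([
    rng.normal(0.0, 1.0, size=(50, 32)),
    rng.normal(4.0, 1.0, size=(50, 32)),
])
labels = np.array([0] * 50 + [1] * 50)

coords, explained = pca_2d(latents)
cluster_gap = abs(coords[labels == 0, 0].mean() - coords[labels == 1, 0].mean())
```

Plotting `coords[:, 0]` against `coords[:, 1]`, colored by `labels`, gives the scatter plot to include in your answer.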

## Extra Credit: Advanced Motion Synthesis and Editing (20 points)

In this optional extra credit section, you'll implement more sophisticated motion synthesis techniques and potentially extend your model architecture to achieve these advanced tasks.

### Motion Completion (10 points)

For this task, you'll develop a method to complete partially specified motion sequences. Given a motion with missing frames, your system should intelligently fill in the gaps while maintaining natural movement characteristics and continuity.

You need to:
- Develop a method to mask out and fill missing segments in motion sequences
- Ensure smooth transitions between existing and synthesized motion
- Leverage your trained motion manifold to generate plausible completions

Your submission should:
- Provide at least two examples of filling gaps in the middle of motion sequences
- Provide at least two examples of extending incomplete motions by synthesizing the ending frames
- Show the input and output side by side in each visualization
- Include a short analysis of your approach and the quality of your results
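One possible approach, sketched below, fills the gap with a linear-interpolation guess and then alternates between projecting the whole clip onto the manifold and restoring the known frames. This is an assumed technique, not the required one. The function `complete_motion` is hypothetical, and `lowpass_project` (which keeps only the lowest temporal frequencies) is a stand-in for passing the clip through your trained autoencoder.

```python
import numpy as np

def complete_motion(motion, known, project, num_iters=20):
    """Fill frames where `known` is False.

    motion: (num_frames, num_channels); values at unknown frames are ignored.
    known: boolean array of shape (num_frames,).
    project: callable mapping a clip to its manifold reconstruction.
    """
    frames = np.arange(motion.shape[0])
    filled = motion.copy()
    # Initial guess: per-channel linear interpolation across the missing frames.
    for c in range(motion.shape[1]):
        filled[~known, c] = np.interp(frames[~known], frames[known], motion[known, c])
    for _ in range(num_iters):
        filled = project(filled)       # pull the whole clip toward the manifold
        filled[known] = motion[known]  # keep the observed frames fixed
    return filled

def lowpass_project(x, keep=4):
    """Stand-in manifold: keep only the `keep` lowest temporal frequency bins."""
    spectrum = np.fft.rfft(x, axis=0)
    spectrum[keep:] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)

# Toy periodic "motion" with a 16-frame gap in the middle.
t = np.arange(60)
motion_true = np.stack([np.sin(2 * np.pi * t / 60), np.cos(2 * np.pi * t / 30)], axis=1)
known = np.ones(60, dtype=bool)
known[20:36] = False
observed = motion_true.copy()
observed[~known] = 0.0  # missing frames carry no information

completed = complete_motion(observed, known, lowpass_project)
```

Extending a motion past its last frame fits the same interface by marking the trailing frames as unknown; note that `np.interp` then holds the last known value as the initial guess.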

### Motion Editing (10 points)

For this task, you'll implement the style transfer technique described in "A Deep Learning Framework for Character Motion Synthesis and Editing." This will allow you to transfer the style characteristics of one motion to another while preserving the content of the target motion.

You need to develop a method for applying constraints to the trained autoencoder.

Your submission should:
- Provide at least three examples of motion editing results between different motion types. Each visualization should include the original content motion, the reference motion, and your result
- Write a short analysis of your results, discussing successes, limitations, and potential improvements
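Style transfer of this kind is commonly formulated as an optimization over hidden units: stay close to the content motion's hidden units while matching the Gram matrix of the style motion's hidden units. The sketch below only evaluates such an objective. The function names, the weights, and the exact normalization of the Gram matrix are assumptions, so check them against the paper before relying on them.

```python
import numpy as np

def gram_matrix(hidden):
    """Time-averaged Gram matrix of hidden units.

    hidden: (num_units, num_frames) -> (num_units, num_units)
    """
    return hidden @ hidden.T / hidden.shape[1]

def style_transfer_loss(hidden, content_hidden, style_hidden,
                        content_weight=1.0, style_weight=0.01):
    """Content term on the hidden units plus style term on their Gram matrices."""
    content_term = np.sum((hidden - content_hidden) ** 2)
    style_term = np.sum((gram_matrix(hidden) - gram_matrix(style_hidden)) ** 2)
    return content_weight * content_term + style_weight * style_term

rng = np.random.default_rng(0)
content_hidden = rng.normal(size=(8, 40))  # placeholder for encode(content motion)
style_hidden = rng.normal(size=(8, 40))    # placeholder for encode(style motion)

gram = gram_matrix(style_hidden)
loss_at_content = style_transfer_loss(content_hidden, content_hidden, style_hidden)
loss_same_style = style_transfer_loss(content_hidden, content_hidden, content_hidden)
```

In an actual implementation you would express the same objective with PyTorch tensors, minimize it with respect to `hidden` using autograd, and then decode the optimized hidden units back into a motion.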
