# Text-driven Motion with Diffusion Models


Diffusion models have recently attracted a surge of interest as a way to generate motion conditioned on latent representations of natural language. These approaches can produce dynamic, stable, and consistent motion, and striking animations have been generated from textual prompts alone.


The readings below are a good starting point for researchers and enthusiasts who want to explore this rapidly evolving area in more depth:

+ MotionCLIP: Exposing Human Motion Generation to CLIP Space [arXiv:2203.08063v1](https://arxiv.org/abs/2203.08063)
+ Human Motion Diffusion Model [arXiv:2209.14916v2](https://arxiv.org/abs/2209.14916)
+ Human Motion Diffusion as a Generative Prior [arXiv:2303.01418v1](https://arxiv.org/abs/2303.01418)

Here, our goal is to generate motion directly from textual prompts, using the MotionDiffuse model from the work of [Zhang et al.](https://mingyuan-zhang.github.io/projects/MotionDiffuse.html).

We acknowledge the [tutorial](https://huggingface.co/spaces/mingyuan/MotionDiffuse), which provided a valuable foundation for this notebook.
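
As a rough illustration of what sampling from a text-conditioned diffusion model involves, the sketch below runs a generic DDPM-style reverse loop in NumPy. It is not MotionDiffuse's implementation: `toy_denoiser`, the linear noise schedule, the 263-dimensional pose vector, and the 512-dimensional text embedding are all assumptions chosen for illustration. A trained model would replace `toy_denoiser` with a network that predicts the noise from the noisy motion, the timestep, and the text embedding.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, t, text_embedding):
    """Hypothetical stand-in for a trained noise-prediction network.

    It ignores its inputs and predicts zero noise, so the output is not
    meaningful motion; it only keeps the sampling loop runnable.
    """
    return np.zeros_like(x_t)


def sample_motion(text_embedding, motion_length=60, pose_dim=263,
                  num_steps=1000, seed=0, denoiser=toy_denoiser):
    """DDPM-style ancestral sampling: start from noise, denoise step by step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((motion_length, pose_dim))  # pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, text_embedding)  # predicted noise, conditioned on text
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0)
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


# A zero vector stands in for a real text embedding (for example, from CLIP)
motion = sample_motion(text_embedding=np.zeros(512), num_steps=50)
print(motion.shape)
```

The loop over `num_steps` is the part that corresponds to the iterative denoising a diffusion model performs at generation time, which is why sampling is slow compared with a single forward pass.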

### Install the dependencies


In [None]:
!apt install -y ffmpeg
!pip install git+https://github.com/openai/CLIP.git
!pip install mmcv

Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-9ryutem5
  Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /tmp/pip-req-build-9ryutem5
  Resolved https://github.com/openai/CLIP.git to commit a9b1bf5920416aaeaec965c25dd9e8f98c864f16
  Preparing metadata (setup.py) ... done
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
     53.1/53.1 KB 1.6 MB/s eta 0:00:00
Building wheels for collected packages: clip
  Building wheel for clip (setup.py) ... done

In [None]:
!gdown https://drive.google.com/uc?id=1vzBZ2rNCQWBQpYvC6hpyJfR3iK1O_FEG
!unzip MotionDiffuse.zip
!rm MotionDiffuse.zip
!pip install matplotlib==3.3.1
import os
os.chdir("MotionDiffuse")

Downloading...
From: https://drive.google.com/uc?id=1vzBZ2rNCQWBQpYvC6hpyJfR3iK1O_FEG
To: /content/MotionDiffuse.zip
100% 711M/711M [00:10<00:00, 69.8MB/s]
Archive:  MotionDiffuse.zip
   creating: MotionDiffuse/
   creating: MotionDiffuse/checkpoints/
   creating: MotionDiffuse/checkpoints/t2m/
   creating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/
   creating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/meta/
  inflating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/meta/mean.npy  
  inflating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/meta/std.npy  
   creating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/model/
  inflating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/model/latest.tar  
  inflating: MotionDiffuse/checkpoints/t2m/t2m_motiondiffuse/opt.txt  
   creating: MotionDiffuse/datasets/
  inflating: MotionDiffuse/datasets/dataloader.py  
  inflating: MotionDiffuse/datasets/dataset.py  
  inflating: MotionDiffuse/datasets/evaluator.py  
  inflating: M
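
Before running the model, it can help to confirm that the checkpoint files listed in the unzip log are actually present. The sketch below is one way to do that; `EXPECTED_FILES` and `missing_files` are hypothetical names introduced here, the paths are copied from the log above, and the check assumes the working directory is already `MotionDiffuse`.

```python
from pathlib import Path

# Paths copied from the unzip log, relative to the MotionDiffuse directory
EXPECTED_FILES = [
    "checkpoints/t2m/t2m_motiondiffuse/meta/mean.npy",
    "checkpoints/t2m/t2m_motiondiffuse/meta/std.npy",
    "checkpoints/t2m/t2m_motiondiffuse/model/latest.tar",
    "checkpoints/t2m/t2m_motiondiffuse/opt.txt",
]


def missing_files(root="."):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_files()
if missing:
    print("Missing files:")
    for rel in missing:
        print("  ", rel)
else:
    print("All expected checkpoint files are present.")
```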

### Visualize the generated motion

In [None]:
%env PYTHONPATH=.:$PYTHONPATH
!python -u tools/visualization.py \
    --opt_path checkpoints/t2m/t2m_motiondiffuse/opt.txt \
    --text "a person is jumping" \
    --motion_length 60 \
    --result_path "test_sample.mp4"

env: PYTHONPATH=.:$PYTHONPATH
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Reading checkpoints/t2m/t2m_motiondiffuse/opt.txt
100% 1000/1000 [02:07<00:00,  7.87it/s]
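
To render several prompts in one go, the same command can be assembled in Python. The sketch below reuses only the flags shown in the cell above; `build_visualization_cmd` and `generate` are hypothetical helpers, and the second prompt and the `sample_{i}.mp4` file names are illustrative assumptions. By default it only prints the commands (a dry run), since each real run takes a couple of minutes on CPU.

```python
import os
import shlex
import subprocess

OPT_PATH = "checkpoints/t2m/t2m_motiondiffuse/opt.txt"


def build_visualization_cmd(text, result_path, motion_length=60, opt_path=OPT_PATH):
    """Build the argument list for tools/visualization.py (flags as used above)."""
    return [
        "python", "-u", "tools/visualization.py",
        "--opt_path", opt_path,
        "--text", text,
        "--motion_length", str(motion_length),
        "--result_path", result_path,
    ]


def generate(prompts, dry_run=True):
    """Print one command per prompt, and run each one unless dry_run is set."""
    commands = []
    for i, prompt in enumerate(prompts):
        cmd = build_visualization_cmd(prompt, f"sample_{i}.mp4")
        print(shlex.join(cmd))
        if not dry_run:
            # PYTHONPATH=. mirrors the %env line in the cell above
            subprocess.run(cmd, check=True, env={**os.environ, "PYTHONPATH": "."})
        commands.append(cmd)
    return commands


commands = generate(["a person is jumping", "a person walks forward"])
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains spaces or quotes. Call `generate(prompts, dry_run=False)` from the `MotionDiffuse` directory to actually render the videos.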


In [None]:
from base64 import b64encode

from IPython.display import HTML

# Read the rendered video and embed it inline as a base64 data URL
with open('test_sample.mp4', 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [None]:
import pickle

# Load the skeleton's kinematic tree: a list of joint-index chains
with open("kinematic_tree.pickle", 'rb') as pfile:
    kinematic_tree = pickle.load(pfile)

# Colors for drawing the skeleton
colors = ['red', 'blue', 'black', 'red', 'blue',
          'darkblue', 'darkblue', 'darkblue', 'darkblue', 'darkblue',
          'darkred', 'darkred', 'darkred', 'darkred', 'darkred']

kinematic_tree

[[0, 2, 5, 8, 11],
 [0, 1, 4, 7, 10],
 [0, 3, 6, 9, 12, 15],
 [9, 14, 17, 19, 21],
 [9, 13, 16, 18, 20]]
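
Each inner list above can be read as a chain of joint indices, where consecutive joints are connected by a bone. Under that assumption, the sketch below flattens the chains into `(parent, child)` pairs and a per-joint parent array, which are convenient forms for drawing or traversing the skeleton. `chains_to_bones` and `parent_array` are hypothetical helpers, and the tree is copied from the output above so the example is self-contained.

```python
# Kinematic tree copied from the output above
kinematic_tree = [
    [0, 2, 5, 8, 11],
    [0, 1, 4, 7, 10],
    [0, 3, 6, 9, 12, 15],
    [9, 14, 17, 19, 21],
    [9, 13, 16, 18, 20],
]


def chains_to_bones(chains):
    """Return (parent, child) pairs for consecutive joints in each chain."""
    return [(a, b) for chain in chains for a, b in zip(chain, chain[1:])]


def parent_array(chains):
    """Return parents[j] for every joint j, with -1 marking the root joint."""
    num_joints = max(max(chain) for chain in chains) + 1
    parents = [-1] * num_joints
    for parent, child in chains_to_bones(chains):
        parents[child] = parent
    return parents


bones = chains_to_bones(kinematic_tree)
parents = parent_array(kinematic_tree)
print(len(bones), "bones over", len(parents), "joints")
print(parents)
```

To draw one frame, each bone `(a, b)` becomes a line segment between the 3D positions of joints `a` and `b`.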