---
## **<p style="text-align: center; text-decoration: underline;">DATA CHALLENGE</p>**
# **<p style="text-align: center;">HUMAN MOTION DESCRIPTION (HMD): Motion-To-Text</p>**
---

> IMT Nord Europe *2025*.

---

![examples](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fimg.clipart-library.com%2F2%2Fclip-motions%2Fclip-motions-6.png&f=1&nofb=1&ipt=0747ffa645bb5f7798e8a2d44499b28f1156ce0e83b1b300fabfed4c6ab1fdf2&ipo=images)

### ■ **Overview**
In this data challenge, you will explore the intersection of natural language processing (NLP) and human motion modeling using the HumanML3D dataset. The dataset pairs 3D human motion sequences with rich textual descriptions, so models can learn mappings in both directions (text-to-motion and motion-to-text). This challenge focuses on the motion-to-text direction.

#### **I. Main Task: Motion-To-Text Generation**
- **Motion-to-Text:** Develop a model to describe human motions in natural language given a sequence of 3D poses.

#### **II. Dataset Overview:**
- HumanML3D includes 14,616 motion samples across diverse actions (walking, dancing, sports) and 44,970 text annotations.
- Data includes skeletal joint positions, rotations, and fine-grained textual descriptions.

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fproduction-media.paperswithcode.com%2Fdatasets%2F446194c5-ce59-43eb-b4cb-570a7a4d0cd9.png&f=1&nofb=1&ipt=b2edbe3251cab88e26a7f9d4e765c811b2cc890dc2ace7f7456baeca076b115b&ipo=images" alt="description" style="width:800px; height:600px;" />

The provided dataset contains the following components:

- 1. `motions` Folder: Contains `.npy` files, each representing a sequence of body poses. Each file has a shape of `(T, N, d)`, where:
  - `T`: Number of frames in the sequence (varies across sequences).
  - `N`: Number of joints in the body (22 in this case).
  - `d`: Dimension of each joint (3D coordinates: `x`, `y`, `z`).

- 2. `texts` Folder: Contains `.txt` files, each providing **3 textual descriptions** (one per line) of the corresponding motion sequence. Each description is followed by part-of-speech (POS) tags for every word in the description, separated from the caption by `#`. Example: `a person jump hop to the right#a/DET person/NOUN jump/NOUN hop/NOUN to/ADP the/DET right/NOUN#`

- 3. File Lists
    - **`all.txt`**: List of all motion files in the dataset.
    - **`train.txt`**: List of motion files for training.
    - **`val.txt`**: List of motion files for validation.
    - **`test.txt`**: List of motion files for testing.
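
As an illustration of the annotation format described above, here is a minimal sketch that splits one line into its caption and its `(word, POS)` pairs. The names `Annotation` and `parse_annotation` are hypothetical helpers, not part of the provided material, and the sketch assumes that fields are separated by `#` and that each tagged token has the form `word/TAG`, as in the example line.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    caption: str
    tagged_tokens: List[Tuple[str, str]]  # (word, POS tag) pairs


def parse_annotation(line: str) -> Annotation:
    """Split one annotation line into its caption and (word, POS) pairs."""
    fields = line.strip().split("#")
    caption = fields[0].strip()
    tagged = []
    if len(fields) > 1:
        for token in fields[1].split():
            # Split on the last '/', so a '/' inside the word is preserved.
            word, sep, tag = token.rpartition("/")
            if not sep:  # token without a tag: keep the word, leave the tag empty
                word, tag = token, ""
            tagged.append((word, tag))
    return Annotation(caption, tagged)


# Example line taken from the dataset description above.
example = ("a person jump hop to the right#a/DET person/NOUN jump/NOUN "
           "hop/NOUN to/ADP the/DET right/NOUN#")
annotation = parse_annotation(example)
print(annotation.caption)
print(annotation.tagged_tokens[:2])
```

Only the caption (the part before the first `#`) is needed as a training target or as a BLEU reference; the POS tags are optional side information.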


#### **III. Evaluation Metrics**

BLEU (Bilingual Evaluation Understudy): The BLEU score evaluates the quality of generated text against one or more reference texts using n-gram precision.
> Note: Higher BLEU scores (closer to 1, or 100\%) indicate that the generated text overlaps more with the reference descriptions. BLEU measures lexical overlap, not semantic accuracy: for motion descriptions, it reflects how closely the generated text matches the wording of the ground-truth annotations.
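
To make "n-gram precision" concrete, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on, using only the standard library. It is an illustration, not the official scorer: the sentences are made up for the example, and `ngrams` and `modified_precision` are hypothetical helper names. The full BLEU score additionally combines the precisions for several values of n and applies a brevity penalty for candidates shorter than the references.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references: List[List[str]], candidate: List[str], n: int) -> float:
    """Fraction of candidate n-grams found in the references, with clipping."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Made-up sentences, for illustration only.
references = ["a person walks forward".split(), "a man walks ahead slowly".split()]
candidate = "a person walks slowly".split()

p1 = modified_precision(references, candidate, 1)  # all 4 unigrams appear in a reference
p2 = modified_precision(references, candidate, 2)  # 2 of the 3 bigrams appear in a reference
print(p1, p2)
```

Here every word of the candidate occurs in some reference, yet the bigram `walks slowly` does not, which is why higher-order precisions reward correct word order and not just correct vocabulary.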

Solutions should be submitted as a CSV file. For each ID in the test set, you must predict the corresponding description. The file should contain a header and have the following format:

| id      | text                                                                 |
|---------|---------------------------------------------------------------------|
| 004822  | A person walks slowly forward, swinging their arms naturally        |
| 014457  | Someone performs a golf swing with proper form                      |
| 009613  | An individual jogs backwards diagonally across the room             |
| 008463  | A man bends down to pick up an object while walking                 |
| 012365  | A dancer spins clockwise while raising both arms                    |
| 007933  | Two people engage in a slow-motion martial arts demonstration       |
| 003430  | A child skips happily across a playground                           |
| 014522  | An athlete performs a perfect cartwheel sequence                    |
| 005698  | A woman gracefully practices yoga sun salutations                   |
| 001664  | A parkour expert vaults over a low wall                             |

You can generate your submission files using pandas as follows:

    >>> import pandas as pd
    >>> submission = pd.DataFrame({
    ...     'id': ['004822', '014457', ...],
    ...     'text': [
    ...         "a person walking slowly",
    ...         "someone swinging a golf club",
    ...         ...
    ...     ]
    ... })
    >>> submission.to_csv('./submission.csv', index=False)
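
Before uploading, it can help to check the file against the expected test IDs. The following is a minimal sketch using only the standard library; `check_submission` is a hypothetical helper, and the rules it enforces (exact `id,text` header, one row per test ID, non-empty text) are assumptions based on the format described above, not an official validator.

```python
import csv
import os
import tempfile
from typing import Iterable, List


def check_submission(csv_path: str, expected_ids: Iterable[str]) -> List[str]:
    """Return a list of problems found in the submission file (empty if none)."""
    expected = set(expected_ids)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["id", "text"]:
            return [f"unexpected header: {reader.fieldnames}"]
        rows = list(reader)

    problems = []
    seen = [row["id"] for row in rows]  # IDs are kept as strings, zeros included
    missing = sorted(expected - set(seen))
    extra = sorted(set(seen) - expected)
    if missing:
        problems.append(f"missing ids: {missing}")
    if extra:
        problems.append(f"unknown ids: {extra}")
    if len(seen) != len(set(seen)):
        problems.append("duplicate ids")
    empty = [row["id"] for row in rows if not (row["text"] or "").strip()]
    if empty:
        problems.append(f"empty text for ids: {empty}")
    return problems


# Demo on a tiny file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "submission.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "text"])
        writer.writerow(["004822", "a person walking slowly"])
        writer.writerow(["014457", "someone swinging a golf club"])
    ok_report = check_submission(path, ["004822", "014457"])
    bad_report = check_submission(path, ["004822", "014457", "009613"])

print(ok_report)
print(bad_report)
```

Note that the IDs are zero-padded strings. If you read a submission back with pandas, pass `dtype=str` to `pd.read_csv`, otherwise `004822` is parsed as the integer `4822`.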

### **Animation Demo**

In [1]:
import os
from os.path import join as pjoin
from tqdm import tqdm
import numpy as np

import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.animation import FuncAnimation, PillowWriter
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import mpl_toolkits.mplot3d.axes3d as p3

# Define the kinematic tree for connecting joints
kinematic_tree = [
    [0, 2, 5, 8, 11],
    [0, 1, 4, 7, 10],
    [0, 3, 6, 9, 12, 15],
    [9, 14, 17, 19, 21],
    [9, 13, 16, 18, 20]
]

def plot_3d_motion(save_path, joints, title, figsize=(10, 10), fps=120, radius=4):
    # Split the title if it's too long
    title_sp = title.split(' ')
    if len(title_sp) > 10:
        title = '\n'.join([' '.join(title_sp[:10]), ' '.join(title_sp[10:])])

    def init():
        ax.set_xlim3d([-radius / 2, radius / 2])
        ax.set_ylim3d([0, radius])
        ax.set_zlim3d([0, radius])
        fig.suptitle(title, fontsize=20)
        ax.grid(False)  # positional form; the `b` keyword was removed in recent Matplotlib

    def plot_xzPlane(minx, maxx, miny, minz, maxz):
        # Plot a plane XZ
        verts = [
            [minx, miny, minz],
            [minx, miny, maxz],
            [maxx, miny, maxz],
            [maxx, miny, minz]
        ]
        xz_plane = Poly3DCollection([verts])
        xz_plane.set_facecolor((0.5, 0.5, 0.5, 0.5))
        ax.add_collection3d(xz_plane)

    # Reshape the joints data
    data = joints.copy().reshape(len(joints), -1, 3)
    # fig = plt.figure(figsize=figsize)
    # ax = p3.Axes3D(fig)
    fig = plt.figure(figsize=figsize)
    ax = fig.add_subplot(111, projection='3d')
    init()

    # Compute min and max values for the data
    MINS = data.min(axis=0).min(axis=0)
    MAXS = data.max(axis=0).max(axis=0)

    # Define colors for the kinematic tree
    colors = ['red', 'blue', 'black', 'red', 'blue',
              'darkblue', 'darkblue', 'darkblue', 'darkblue', 'darkblue',
              'darkred', 'darkred', 'darkred', 'darkred', 'darkred']

    frame_number = data.shape[0]

    # Adjust the height offset
    height_offset = MINS[1]
    data[:, :, 1] -= height_offset
    trajec = data[:, 0, [0, 2]]

    # Center the data
    data[..., 0] -= data[:, 0:1, 0]
    data[..., 2] -= data[:, 0:1, 2]

    def update(index):
        # Clear existing lines and collections (iterate over copies,
        # because remove() modifies the underlying artist lists)
        for line in list(ax.lines):
            line.remove()
        for collection in list(ax.collections):
            collection.remove()

        # Update the view
        ax.view_init(elev=120, azim=-90)
        ax.dist = 7.5

        # Plot the XZ plane
        plot_xzPlane(MINS[0] - trajec[index, 0], MAXS[0] - trajec[index, 0], 0, MINS[2] - trajec[index, 1], MAXS[2] - trajec[index, 1])

        # Plot the trajectory
        if index > 1:
            ax.plot3D(trajec[:index, 0] - trajec[index, 0], np.zeros_like(trajec[:index, 0]), trajec[:index, 1] - trajec[index, 1], linewidth=1.0, color='blue')

        # Plot the kinematic tree
        for i, (chain, color) in enumerate(zip(kinematic_tree, colors)):
            linewidth = 4.0 if i < 5 else 2.0
            ax.plot3D(data[index, chain, 0], data[index, chain, 1], data[index, chain, 2], linewidth=linewidth, color=color)
        # Hide axis labels
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_zticklabels([])

    # Create the animation
    ani = FuncAnimation(fig, update, frames=frame_number, interval=1000 / fps, repeat=False)

    # Save the animation
    ani.save(save_path, fps=fps)
    plt.close()

    print(f'Animation saved to {save_path}!')

In [3]:
## path to data /!\ replace this with your paths
motion_data_dir = './motions/'
text_data_dir = './texts/'

## list all files in the folder
npy_files = sorted(os.listdir(motion_data_dir))

## pick a random motion file
npy_file = np.random.choice(npy_files)

## read npy motion file
motion_data = np.load(os.path.join(motion_data_dir, npy_file))
print('shape', motion_data.shape)

## get the corresponding titles for the given motion
titles = []
with open('{}{}.txt'.format(text_data_dir, npy_file.split('.')[0])) as f:
    descriptions = f.readlines()
    for desc in descriptions:
        titles.append(desc.split('#')[0].capitalize())

print('Descriptions:')
print('- '+'\n- '.join(titles))

## pick a random title
title = np.random.choice(titles)

## create & save animation
save_path = './animation.gif'
plot_3d_motion(save_path, motion_data, title=title, figsize=(10, 6), fps=30, radius=4)

shape (199, 22, 3)
Descriptions:
- Someone is moving forward and looking at their watch for something, and looking confused
- Waiving and looking at back of hand to stop sun
- Person is stumping on something.


MovieWriter ffmpeg unavailable; using Pillow instead.


Animation saved to ./animation.gif!


### **Evaluation Metric**

In [None]:
import numpy as np
from nltk.translate.bleu_score import sentence_bleu

def score(gt_texts, generated_text) -> float:
    """Calculate BLEU score"""

    # Get 3 ground truth references
    refs = [d.split(' ') for d in gt_texts]
    # Get single submission candidate
    gen = generated_text.split(' ')
    # Calculate BLEU score
    bleu_score = sentence_bleu(refs, gen)

    return bleu_score

## Usage Example
gt_texts = ["a person walks forward to the left, picks something up",
            "a person walks up to something, picks it up, brings it back to where they were",
            "a man walks forward, picks up an object with his right hand"]
generated_text = "a person walks forward to the left"

score(gt_texts, generated_text)
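
One practical consequence of the `split(' ')` tokenization used in `score` is that punctuation stays attached to the preceding word. The short sketch below, which reuses the usage example above and only the standard library, lists the candidate tokens that match no token in any reference. It is meant as an illustration of the tokenization, not as a change to the metric.

```python
gt_texts = [
    "a person walks forward to the left, picks something up",
    "a person walks up to something, picks it up, brings it back to where they were",
    "a man walks forward, picks up an object with his right hand",
]
generated_text = "a person walks forward to the left"

# Same whitespace tokenization as in score(): punctuation stays attached.
reference_tokens = {tok for text in gt_texts for tok in text.split(' ')}
candidate_tokens = generated_text.split(' ')

# Candidate tokens that do not appear verbatim in any reference.
unmatched = [tok for tok in candidate_tokens if tok not in reference_tokens]
print(unmatched)
```

The token `left` is reported as unmatched because the first reference contains `left,` (with the comma), so generated punctuation directly affects the n-gram overlap.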