## 0. Data Preprocessing

In [1], the authors use the NTU RGB+D 60 dataset for their experiments. The dataset covers 60 action classes; each skeleton sequence is captured at 30 fps, and each skeleton consists of 25 joints.

The preprocessing the authors perform consists of several steps:

1. Denoise the raw skeleton data
2. Remove skeleton files that contain poor data
3. Remove files that contain two actors, which removes 11 action classes

The dataset is split into training and test sets by camera view: the first camera view is used for evaluation, and the other two are used for training.

Each sequence is cut or repeated until it has a length of T = 75 frames.
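The cut-or-repeat step could be sketched as follows. This is a minimal sketch under assumptions that [1] does not confirm: the helper name `fit_length` is hypothetical, long sequences are cut by keeping the first T frames, and short sequences are repeated by looping from their first frame. The input is assumed to be an array of shape (frames, joints, 3).

```python
import numpy as np

def fit_length(sequence: np.ndarray, target_len: int = 75) -> np.ndarray:
    """Cut or repeat a (frames, joints, 3) sequence along axis 0 to target_len frames.

    Assumption: cutting keeps the first target_len frames, and repeating loops
    the sequence from its first frame. The paper may do this differently.
    """
    num_frames = sequence.shape[0]
    if num_frames == 0:
        raise ValueError("sequence has no frames")
    # Repeat the whole sequence often enough to cover target_len, then truncate.
    repeats = -(-target_len // num_frames)  # ceiling division
    tiled = np.concatenate([sequence] * repeats, axis=0)
    return tiled[:target_len]
```

With this sketch, a 66-frame sequence is padded with copies of its own first 9 frames, and a 100-frame sequence loses its last 25 frames.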

### 0.1 Preprocessing

We start by listing the raw skeleton files and filtering out any samples with missing skeletons.

In [12]:
import os 
import re
import pickle
import numpy as np
from pathlib import Path
import open3d as o3d

ntu60_path = '/media/ubi-lab-desktop/Extreme Pro/data/nturgb+d_skeletons'
ntu60_files = os.listdir(ntu60_path)

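# Read the names of samples with missing skeletons, one per line.
# The first three lines of the list file are skipped (assumed to be header lines).
with open('sitc/data/NTU_RGBD_samples_with_missing_skeletons.txt', 'r') as f:
    missing_skeletons = [line.strip() for line in f.readlines()[3:]]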

ntu60_files = [file for file in ntu60_files if Path(file).stem not in missing_skeletons]

In [13]:
# Since the authors do not specify how the denoising is performed, we skip this step for now
# and remove the files that contain 2 actors.
with open('sitc/data/NTU_RGBD_actions_with_two_people.txt', 'r') as f:
    two_people = [line.split("\n")[0] for line in f.readlines()]

ntu60_files = [file for file in ntu60_files if re.findall('[A-Z][^A-Z]*', Path(file).stem)[-1] not in two_people]

In [14]:
print(f"Number of samples in filtered data: {len(ntu60_files)}")

Number of samples in filtered data: 46231


In [19]:
# To generate the train/test splits, we have to filter the samples of camera 1 for validation, and the samples of cameras 2 and 3 for training.
ntu60_train = [file for file in ntu60_files if re.findall('[A-Z][^A-Z]*', Path(file).stem)[1] in ['C002', 'C003']]

ntu60_val = [file for file in ntu60_files if re.findall('[A-Z][^A-Z]*', Path(file).stem)[1] in ['C001']]

print(f"Number of samples in training set: {len(ntu60_train)}")
print(f"Number of samples in validation set: {len(ntu60_val)}")

Number of samples in training set: 30757
Number of samples in validation set: 15474


In [20]:
# TODO: Cut or repeat the sequences until the frame length T=75.

file_nr = 0

with open(os.path.join(ntu60_path, ntu60_files[file_nr]), 'r') as f:
    read_lines = [line.split() for line in f.readlines()]
    num_of_frames = int(read_lines[0][0])

In [27]:
read_lines

[['66'],
 ['1'],
 ['72057594037938854',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0.2990857',
  '-0.2655556',
  '2'],
 ['25'],
 ['-0.3215134',
  '0.1103174',
  '3.10087',
  '224.6551',
  '196.7419',
  '862.4604',
  '501.9203',
  '0.3602485',
  '-0.03838925',
  '0.9263259',
  '-0.1032842',
  '2'],
 ['-0.320535',
  '0.4001262',
  '3.035819',
  '223.9105',
  '161.5224',
  '860.4355',
  '399.8131',
  '0.3754133',
  '-0.04008034',
  '0.9202845',
  '-0.1026398',
  '2'],
 ['-0.3154103',
  '0.6799014',
  '2.960193',
  '223.4654',
  '125.5171',
  '859.4493',
  '295.7386',
  '0.3899505',
  '-0.04568582',
  '0.9088764',
  '-0.1406954',
  '2'],
 ['-0.2319366',
  '0.7781583',
  '2.970166',
  '233.8856',
  '113.6137',
  '889.4865',
  '261.2465',
  '0',
  '0',
  '0',
  '0',
  '2'],
 ['-0.3987085',
  '0.5366822',
  '2.871958',
  '211.6594',
  '141.2563',
  '825.9193',
  '341.3803',
  '-0.2090429',
  '0.7506449',
  '-0.4660994',
  '0.4190283',
  '2'],
 ['-0.4658028',
  '0.2874838',
  '2.888669',
  '

In [21]:
num_of_frames

66
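The token list printed above can be turned into a joint array for further processing. The following is a hedged sketch, not the authors' method: the helper name `parse_skeleton` is hypothetical, and it assumes the layout visible in the output (a frame count, then per frame a body count, and per body one info line, a joint count, and one line per joint whose first three values are the x, y, z coordinates). It keeps only the first body in each frame and skips frames without any body.

```python
import numpy as np

def parse_skeleton(token_lines):
    """Return a (frames, joints, 3) array of x, y, z for the first body in each frame.

    token_lines is a list of token lists, e.g. the `read_lines` variable above.
    """
    cursor = 0
    num_frames = int(token_lines[cursor][0])
    cursor += 1
    frames = []
    for _ in range(num_frames):
        num_bodies = int(token_lines[cursor][0])
        cursor += 1
        first_body = None
        for body_idx in range(num_bodies):
            cursor += 1  # skip the body info line (tracking id, flags, ...)
            num_joints = int(token_lines[cursor][0])
            cursor += 1
            # Only the first three tokens of each joint line (x, y, z) are kept.
            joints = [
                [float(value) for value in token_lines[cursor + j][:3]]
                for j in range(num_joints)
            ]
            cursor += num_joints
            if body_idx == 0:
                first_body = joints
        # Assumption: frames that contain no body are dropped.
        if first_body is not None:
            frames.append(first_body)
    return np.asarray(frames, dtype=np.float32)
```

Calling `parse_skeleton(read_lines)` on the file loaded above should give an array whose first axis is at most `num_of_frames` long, which can then be cut or repeated to T = 75.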

- [1] Carr, T., Xu, D., & Lu, A. (2024). Adversary-guided motion retargeting for skeleton anonymization. arXiv. https://arxiv.org/abs/2405.05428