## Data Preprocessing Instructions (for Semantic Segmentation)

This file provides a simple implementation that converts MCD *.pcd files into binary point cloud inputs and labels for semantic segmentation.

### 1. Preliminary Installation

A few packages need to be installed in advance. Note that pypcd must be installed from our modified source, since the official package is no longer maintained for current Python versions. For details, see the pypcd issue https://github.com/dimatura/pypcd/issues/28.

In [None]:
pip install numpy
pip install --upgrade git+https://github.com/mcdviral/pypcd.git
pip install tqdm

### 2. Data Organization

Next, download the annotated lidar files from our official website and unzip them. We recommend renaming each sequence folder to its sequence name (e.g., ntu_day_01) so that your data organization is consistent with ours. You should end up with a data folder organized as follows.

📦MCD_root_folder
<br>
 ┣ 📂SEQUENCE_NAME_01
 <br>
 ┃ ┣ 📜cloud_0000.pcd
 <br>
 ┃ ┣ 📜cloud_0001.pcd
 <br>
 ┃ ┗ 📜...
 <br>
 ┣ 📂SEQUENCE_NAME_02
 <br>
 ┃ ┣ 📜cloud_0000.pcd
 <br>
 ┃ ┣ 📜cloud_0001.pcd
 <br>
 ┃ ┗ 📜...
 <br>
 ┣ ...
 <br>
 ┣ 📂SEQUENCE_NAME_N
 <br>
 ┃ ┣ 📜cloud_0000.pcd
 <br>
 ┗ ┗ 📜cloud_0001.pcd
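
Before converting, it can help to confirm that the folder really matches the layout above. The following is a minimal sketch, not part of the official tooling: the helper name `summarize_sequences` is hypothetical, and it only assumes that each sequence is a direct subfolder of the root with its *.pcd files stored directly inside.

```python
import glob
import os


def summarize_sequences(mcd_root_dir):
    """Return {sequence_name: number of *.pcd files directly inside it}."""
    summary = {}
    for name in sorted(os.listdir(mcd_root_dir)):
        seq_dir = os.path.join(mcd_root_dir, name)
        if not os.path.isdir(seq_dir):
            continue  # ignore stray files such as downloaded archives
        summary[name] = len(glob.glob(os.path.join(seq_dir, "*.pcd")))
    return summary
```

Calling `summarize_sequences(mcd_root_dir)` on your root folder and printing the result shows one entry per sequence. A sequence that reports zero files was probably unzipped into a nested folder or has already been converted.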

### 3. Data Preprocessing

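Finally, we provide the following code snippet to extract the segmentation inputs and labels from the *.pcd files and store them in easy-to-read binary files. Each .bin file stores one float32 row of (x, y, z, intensity) per point, and each .label file stores one int32 label per point. The original *.pcd files are moved into a pcd subfolder. Make sure you have write access to the data folder.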

In [None]:
import os
import glob
import numpy as np
from pypcd import pypcd
from tqdm import tqdm

#* Change the following folder path
mcd_root_dir = "/home/aaron/Downloads/mcd_viral"

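for seq_dir in sorted(os.listdir(mcd_root_dir)):
    curr_seq_dir = os.path.join(mcd_root_dir, seq_dir)
    # Skip stray files (e.g. downloaded archives) in the root folder
    if not os.path.isdir(curr_seq_dir):
        continue
    pcd_files = sorted(glob.glob(os.path.join(curr_seq_dir, '*.pcd')))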
    
    # Create a pcd, bin, label folder
    os.makedirs(os.path.join(curr_seq_dir, 'pcd'), exist_ok=True)
    os.makedirs(os.path.join(curr_seq_dir, 'bin'), exist_ok=True)
    os.makedirs(os.path.join(curr_seq_dir, 'labels'), exist_ok=True)
    
    # Log
    print("-"*60)
    print("Start converting sequence: {}".format(seq_dir))
    print("-"*60)
    
    for pcd in tqdm(pcd_files):
        # Read the pcd
        cloud_pcd = pypcd.PointCloud.from_path(pcd).pc_data
        # Stack the x, y, z fields of the structured array into an (N, 3) array
        xyz = np.stack(
            [cloud_pcd['x'], cloud_pcd['y'], cloud_pcd['z']], axis=1).astype(np.float32)
        
        # Binary data input/label
        cloud_in_bin = np.concatenate(
            [xyz, cloud_pcd['intensity'].reshape(-1,1)], axis=1).astype(np.float32)
        cloud_label_bin = cloud_pcd['label'].astype(np.int32)
        
        # Save binary input/label
        pcd_name = os.path.basename(pcd)  # portable, unlike splitting on '/'
        cloudidx_str = os.path.splitext(pcd_name)[0]
        cloud_in_bin.tofile(os.path.join(curr_seq_dir, 'bin', '{}.bin'.format(cloudidx_str)))
        cloud_label_bin.tofile(os.path.join(curr_seq_dir, 'labels', '{}.label'.format(cloudidx_str)))
        
        # Move pcd file into pcd folder
        os.replace(pcd, os.path.join(curr_seq_dir, 'pcd', pcd_name))
    
    print("\n")
    
print("All pcd to bin completed!")
    

After the conversion completes, the root folder should be organized as follows:

📦MCD_root_folder
<br>
 ┣ 📂SEQUENCE_NAME_01
 <br>
 ┃ ┣ 📂pcd
  <br>
 ┃ ┃ ┣ 📜cloud_0000.pcd
 <br>
 ┃ ┃ ┗ 📜...
  <br>
 ┃ ┣ 📂bin
  <br>
 ┃ ┃ ┣ 📜cloud_0000.bin
 <br>
 ┃ ┃ ┗ 📜...
 <br>
 ┃ ┣ 📂labels
 <br>
 ┃ ┃ ┣ 📜cloud_0000.label
 <br>
 ┃ ┃ ┗ 📜...
 <br>
 ┣ 📂SEQUENCE_NAME_02
 <br>
 ┃ ┣ 📂pcd
  <br>
 ┃ ┣ 📂bin
 <br>
 ┃ ┣ 📂labels
 <br>
 ┣ ...
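
To use the converted files, the binary layout written above has to be mirrored when reading: float32 rows of (x, y, z, intensity) in each .bin file and one int32 label per point in each .label file. The following is a minimal sketch of such a reader; the function name `load_scan` is hypothetical and not part of the MCD tooling.

```python
import numpy as np


def load_scan(bin_path, label_path):
    """Load one converted scan as (points, labels).

    points: (N, 4) float32 array of x, y, z, intensity
    labels: (N,) int32 array of per-point semantic labels
    """
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    labels = np.fromfile(label_path, dtype=np.int32)
    if points.shape[0] != labels.shape[0]:
        raise ValueError(
            "Point/label count mismatch: {} vs {}".format(
                points.shape[0], labels.shape[0]))
    return points, labels
```

For example, `points, labels = load_scan('bin/cloud_0000.bin', 'labels/cloud_0000.label')` inside a sequence folder returns arrays whose first dimensions match. The count check catches a .bin and .label file that come from different scans.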
