# NTU60 Dataset Setup for Google Drive

This notebook helps you download and process the NTU RGB+D 60 dataset directly in Google Drive, so the processed files are easy to access from Colab and other environments.

## Overview
- **Dataset Size**: ~5-10 GB (raw + processed)
- **Processing Time**: ~30-60 minutes depending on system
- **Final Output**: Processed `.npz` files ready for training

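## Steps:
1. Mount Google Drive
2. Download or upload the NTU60 raw data
3. Clone or set up the CTR-GCN repository
4. Install dependencies
5. Process the raw data into `.npz` format
6. Verify the processed files
7. Verify the data structure
8. Confirm the files are in Google Drive
9. Update config files (optional)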

## Step 1: Mount Google Drive

First, mount your Google Drive to access it from Colab.

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Set up paths
DRIVE_ROOT = '/content/drive/MyDrive'
PROJECT_DIR = os.path.join(DRIVE_ROOT, 'CTR-GCN')
DATA_DIR = os.path.join(PROJECT_DIR, 'data')
NTU_RAW_DIR = os.path.join(DATA_DIR, 'nturgbd_raw')
NTU_PROCESSED_DIR = os.path.join(DATA_DIR, 'ntu')

# Create directories if they don't exist
os.makedirs(NTU_RAW_DIR, exist_ok=True)
os.makedirs(NTU_PROCESSED_DIR, exist_ok=True)

print(f"Google Drive mounted at: {DRIVE_ROOT}")
print(f"Project directory: {PROJECT_DIR}")
print(f"Data directory: {DATA_DIR}")
print(f"NTU raw data directory: {NTU_RAW_DIR}")
print(f"NTU processed data directory: {NTU_PROCESSED_DIR}")
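
Since the overview estimates 5-10 GB for raw plus processed data, it may be worth checking free space before downloading anything. The sketch below uses `shutil.disk_usage`; the helper name `has_free_space` and the 10 GB threshold are illustrative assumptions, and the numbers a mounted Drive reports may not match your Drive quota exactly.

```python
import shutil

def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3

# 10 is the upper end of the size estimate in the overview (an assumption; adjust as needed).
# '.' keeps the sketch runnable anywhere; in this notebook you would pass DRIVE_ROOT.
if not has_free_space('.', 10):
    print("Warning: less than 10 GB free at the checked location")
```

In this notebook, `has_free_space(DRIVE_ROOT, 10)` would check the mounted Drive instead of the current directory.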

## Step 2: Download NTU60 Dataset

**Option A: If you already have the dataset downloaded locally**

You can upload it manually to Google Drive at the path shown above, or use the upload method below.

**Option B: Download from NTU website**

1. Request access: https://rose1.ntu.edu.sg/dataset/actionRecognition
2. Download: `nturgbd_skeletons_s001_to_s017.zip` (NTU RGB+D 60)
3. Upload to Google Drive at: `MyDrive/CTR-GCN/data/nturgbd_raw/` (the `NTU_RAW_DIR` path from Step 1)

**Option C: Use wget (if you have a direct download link)**

If you have a direct download link, you can use wget below. Otherwise, download manually and upload to Drive.

In [None]:
# Check if raw data already exists
import glob
import zipfile

raw_data_path = os.path.join(NTU_RAW_DIR, 'nturgb+d_skeletons')
zip_file = os.path.join(NTU_RAW_DIR, 'nturgbd_skeletons_s001_to_s017.zip')

if os.path.exists(raw_data_path):
    print(f"✅ Raw data already exists at: {raw_data_path}")
    skeleton_files = glob.glob(os.path.join(raw_data_path, '**/*.skeleton'), recursive=True)
    print(f"   Found {len(skeleton_files)} skeleton files")
elif os.path.exists(zip_file):
    print(f"✅ Zip file found: {zip_file}")
    print("   Extracting...")
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(NTU_RAW_DIR)
    # Confirm the archive produced the folder this notebook expects
    if os.path.exists(raw_data_path):
        print("   ✅ Extraction complete!")
    else:
        print(f"   ⚠️  Extraction finished, but {raw_data_path} was not created; check the archive layout")
else:
    print("❌ Raw data not found.")
    print(f"   Please upload 'nturgbd_skeletons_s001_to_s017.zip' to: {NTU_RAW_DIR}")
    print(f"   Or extract it to: {raw_data_path}")
    
    # Option: Use wget if you have a direct link
    # Uncomment and add your download link:
    # !wget "YOUR_DOWNLOAD_LINK_HERE" -O {zip_file}
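
The file count printed above is a quick sanity check; the filenames themselves can also be inspected. The sketch below assumes the usual NTU RGB+D naming scheme, `SsssCcccPpppRrrrAaaa.skeleton` (setup, camera, performer, replication, action, three digits each). The helper `parse_skeleton_name` is hypothetical and not part of CTR-GCN.

```python
import os
import re

# Assumed naming scheme: S=setup, C=camera, P=performer, R=replication, A=action.
_NAME_RE = re.compile(r'^S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})$')
_FIELDS = ('setup', 'camera', 'performer', 'replication', 'action')

def parse_skeleton_name(filename):
    """Split a skeleton filename into integer fields, or return None if it does not match."""
    stem = os.path.basename(filename)
    if stem.endswith('.skeleton'):
        stem = stem[:-len('.skeleton')]
    match = _NAME_RE.match(stem)
    if match is None:
        return None
    return dict(zip(_FIELDS, (int(g) for g in match.groups())))

print(parse_skeleton_name('S001C002P003R002A013.skeleton'))
```

Applied to `skeleton_files`, this makes it easy to spot stray files (the helper returns `None`) or action IDs above 60, which would not belong to the NTU60 subset.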

## Step 3: Clone/Setup CTR-GCN Repository

If you haven't already, clone the CTR-GCN repository or upload it to Google Drive.

In [None]:
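# Step 1 already created PROJECT_DIR, so test for a repository file rather than the directory
if not os.path.exists(os.path.join(PROJECT_DIR, 'main.py')):
    print(f"CTR-GCN code not found. Please clone or upload CTR-GCN to: {PROJECT_DIR}")
    print("\nTo clone (git will not clone into a non-empty directory, so clone elsewhere and copy):")
    print("  !git clone https://github.com/Uason-Chen/CTR-GCN.git /content/CTR-GCN")
    print(f"  !cp -r /content/CTR-GCN/. {PROJECT_DIR}/")
else:
    os.chdir(PROJECT_DIR)
    print(f"✅ Project directory found: {PROJECT_DIR}")
    print(f"   Current directory: {os.getcwd()}")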
    
    # Check if processing scripts exist
    processing_scripts = [
        'data/ntu/get_raw_skes_data.py',
        'data/ntu/get_raw_denoised_data.py',
        'data/ntu/seq_transformation.py'
    ]
    
    missing_scripts = [s for s in processing_scripts if not os.path.exists(s)]
    if missing_scripts:
        print(f"\n⚠️  Missing processing scripts:")
        for script in missing_scripts:
            print(f"   - {script}")
        print("\n   Please ensure all processing scripts are present.")
    else:
        print("\n✅ All processing scripts found!")

## Step 4: Install Dependencies

Install required packages for data processing.

In [None]:
# Install required packages
!pip install numpy scipy tqdm pyyaml

# Verify installation
import numpy as np
import scipy
print(f"✅ NumPy version: {np.__version__}")
print(f"✅ SciPy version: {scipy.__version__}")

## Step 5: Process Raw Data

Process the raw skeleton data into `.npz` format. This creates:
- `NTU60_CS.npz` (Cross-Subject split)
- `NTU60_CV.npz` (Cross-View split)

**Note**: This step may take 30-60 minutes depending on your system.

In [None]:
# Change to processing directory
processing_dir = os.path.join(PROJECT_DIR, 'data', 'ntu')
if not os.path.exists(processing_dir):
    # A freshly created directory would not contain the scripts, so stop here instead
    raise FileNotFoundError(f"Processing directory not found: {processing_dir}")

os.chdir(processing_dir)
print(f"Current directory: {os.getcwd()}")

# Check if processing scripts exist
scripts = [
    'get_raw_skes_data.py',
    'get_raw_denoised_data.py', 
    'seq_transformation.py'
]

for script in scripts:
    if not os.path.exists(script):
        print(f"⚠️  Script not found: {script}")
        print("   Please ensure all processing scripts are in data/ntu/ directory")
    else:
        print(f"✅ Found: {script}")

In [None]:
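# Step 5.1: Get raw skeleton data
print("=" * 60)
print("Step 1/3: Getting raw skeleton data...")
print("=" * 60)
!python get_raw_skes_data.py
# IPython stores the exit status of the last `!` command in `_exit_code`
if _exit_code == 0:
    print("\n✅ Step 1 complete!\n")
else:
    print(f"\n❌ Step 1 failed with exit code {_exit_code}\n")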

In [None]:
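# Step 5.2: Remove bad skeletons
print("=" * 60)
print("Step 2/3: Removing bad skeletons...")
print("=" * 60)
!python get_raw_denoised_data.py
# IPython stores the exit status of the last `!` command in `_exit_code`
if _exit_code == 0:
    print("\n✅ Step 2 complete!\n")
else:
    print(f"\n❌ Step 2 failed with exit code {_exit_code}\n")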

In [None]:
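# Step 5.3: Transform sequences
print("=" * 60)
print("Step 3/3: Transforming sequences...")
print("=" * 60)
!python seq_transformation.py
# IPython stores the exit status of the last `!` command in `_exit_code`
if _exit_code == 0:
    print("\n✅ Step 3 complete!\n")
else:
    print(f"\n❌ Step 3 failed with exit code {_exit_code}\n")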
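
The three scripts run in sequence, so a failure in an early one makes the later ones pointless. As an alternative to separate cells, the sketch below runs a list of commands and stops at the first non-zero exit code. `run_steps` is a hypothetical helper, and the commands shown are stand-ins so the example runs anywhere; in this notebook the list would hold the three script invocations.

```python
import subprocess
import sys

def run_steps(commands):
    """Run commands in order; return the index of the first failing command, or None."""
    for index, cmd in enumerate(commands):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout, end='')
        if result.returncode != 0:
            print(result.stderr, end='')
            print(f"Step {index + 1}/{len(commands)} failed (exit code {result.returncode})")
            return index
    return None

# Stand-in commands (assumptions for illustration): the second one fails on purpose.
demo_commands = [
    [sys.executable, '-c', 'print("step ok")'],
    [sys.executable, '-c', 'import sys; sys.exit(3)'],
    [sys.executable, '-c', 'print("never runs")'],
]
failed_index = run_steps(demo_commands)
```

Note that `capture_output=True` buffers each step's output until it finishes, so a long-running step prints nothing until it ends; the `!python` cells above stream output as it is produced.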

## Step 6: Verify Processed Files

Check that the processed `.npz` files were created successfully.

In [None]:
# Check for processed files
import glob

processed_files = glob.glob('*.npz')
print(f"Found {len(processed_files)} processed files:")
for f in processed_files:
    size_mb = os.path.getsize(f) / (1024 * 1024)
    print(f"  ✅ {f} ({size_mb:.2f} MB)")

# Expected files
expected_files = ['NTU60_CS.npz', 'NTU60_CV.npz']
missing_files = [f for f in expected_files if f not in processed_files]

if missing_files:
    print(f"\n⚠️  Missing expected files: {missing_files}")
else:
    print("\n✅ All expected files created successfully!")

## Step 7: Verify Data Structure

Load and inspect one of the processed files to ensure it's correct.

In [None]:
# Load and inspect processed data
if 'NTU60_CS.npz' in processed_files:
    # Plain numeric arrays do not need allow_pickle; the context manager closes the file
    with np.load('NTU60_CS.npz') as data:
        print("NTU60_CS.npz contents:")
        for key in data.files:
            arr = data[key]
            print(f"  {key}: shape={arr.shape}, dtype={arr.dtype}")
    print("\n✅ File loaded; check that the keys and shapes above match what you expect.")
else:
    print("⚠️  NTU60_CS.npz not found for inspection")
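
Printing shapes still leaves the actual check to the reader. The sketch below makes it explicit, assuming the file holds the keys `x_train`, `y_train`, `x_test`, and `y_test`. That key set is my understanding of what CTR-GCN's `seq_transformation.py` writes, so treat it as an assumption and adjust it if your output differs. `check_npz` is a hypothetical helper.

```python
import numpy as np

# Assumed key names; adjust if your processed files use different ones.
EXPECTED_KEYS = ('x_train', 'y_train', 'x_test', 'y_test')

def check_npz(path):
    """Return a list of problems found in the .npz file (an empty list means none found)."""
    problems = []
    with np.load(path) as data:
        missing = [key for key in EXPECTED_KEYS if key not in data.files]
        if missing:
            return [f"missing keys: {missing}"]
        for split in ('train', 'test'):
            n_samples = data[f'x_{split}'].shape[0]
            n_labels = data[f'y_{split}'].shape[0]
            if n_samples != n_labels:
                problems.append(
                    f"{split}: {n_samples} samples but {n_labels} labels"
                )
    return problems
```

Calling `check_npz('NTU60_CS.npz')` from the processing directory should return an empty list if the assumed keys are present and each split has as many labels as samples.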

## Step 8: Files are Already in Google Drive!

Since we're working directly in Google Drive, the processed files are already saved there.

**Location**: `/content/drive/MyDrive/CTR-GCN/data/ntu/` (the `NTU_PROCESSED_DIR` path from Step 1)

You can now use these files in Colab or any other environment by mounting Google Drive and pointing to this path.

## Step 9: Update Config Files (Optional)

If you want to use absolute paths in your config files, you can update them here.

In [None]:
# Example: Update config file paths
config_path = os.path.join(PROJECT_DIR, 'config', 'nturgbd-cross-subject', 'default.yaml')

if os.path.exists(config_path):
    print(f"Config file found: {config_path}")
    print("\nCurrent data_path in config:")
    import yaml
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f)
        print(f"  train_feeder_args.data_path: {config.get('train_feeder_args', {}).get('data_path', 'N/A')}")
        print(f"  test_feeder_args.data_path: {config.get('test_feeder_args', {}).get('data_path', 'N/A')}")
    
    # If you want to use absolute paths, uncomment below.
    # Note: rewriting the file with PyYAML drops any comments it contains.
    # config['train_feeder_args']['data_path'] = os.path.join(PROJECT_DIR, 'data', 'ntu', 'NTU60_CS.npz')
    # config['test_feeder_args']['data_path'] = os.path.join(PROJECT_DIR, 'data', 'ntu', 'NTU60_CS.npz')
    # with open(config_path, 'w') as f:
    #     yaml.safe_dump(config, f, sort_keys=False)  # keep the original key order
    # print("\n✅ Config updated with absolute paths!")
else:
    print(f"Config file not found: {config_path}")

## Step 10: Summary

### ✅ Setup Complete!

Your NTU60 dataset is now set up in Google Drive:

**Raw Data Location**: `/content/drive/MyDrive/CTR-GCN/data/nturgbd_raw/`  
**Processed Data Location**: `/content/drive/MyDrive/CTR-GCN/data/ntu/`

### Files Created:
- `NTU60_CS.npz` - Cross-Subject split (for training/testing)
- `NTU60_CV.npz` - Cross-View split (for training/testing)

### Next Steps:

1. **In Colab**: Mount Google Drive and use the data directly:
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   # Data is at: /content/drive/MyDrive/CTR-GCN/data/ntu/
   ```

2. **Training**: Use the config files with relative paths (they should work if you run from project root):
   ```bash
   python main.py --config config/nturgbd-cross-subject/default.yaml --device 0
   ```

3. **Storage**: The processed `.npz` files are much smaller than raw data and can be easily shared or backed up.

### File Sizes (Approximate):
- Raw data: ~5-7 GB
- Processed `.npz` files: ~500 MB - 2 GB each
- **Total in Google Drive**: ~2-4 GB (just processed files) or ~7-10 GB (with raw data)
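
The figures above are estimates. To measure what a folder actually occupies, a sketch along these lines can help; `dir_size_bytes` and `format_gb` are hypothetical helpers built on `os.walk`.

```python
import os

def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root`, including subdirectories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total

def format_gb(num_bytes):
    """Format a byte count in GB (using 1024**3 bytes per GB) with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GB"

# '.' keeps the sketch runnable anywhere; in this notebook you would pass DATA_DIR.
print(f"Current directory uses {format_gb(dir_size_bytes('.'))}")
```

For example, `format_gb(dir_size_bytes(NTU_RAW_DIR))` and `format_gb(dir_size_bytes(NTU_PROCESSED_DIR))` report the raw and processed totals separately.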