Download the GitHub repo:

Run instruction guide:

1) Install the requirements:
    Python >= 3.12
    pip install -r requirements.txt

2) Prepare the dataset:
    Go to https://www.nuscenes.org/nuscenes#download and register an account.
    Download the nuScenes v1.0-mini dataset from that page.
    Once the tar file is downloaded, extract the dataset into a data folder.
    Set the dataset folder path in the configs/base.yaml file.
    Run the following commands to convert the dataset info:
        python src/data_converter.py --config configs/base.yaml --split train
        python src/data_converter.py --config configs/base.yaml --split val
        python src/data_converter.py --config configs/base.yaml --split test
    
    Run the following commands to check that the dataset was converted:
        python src/data_converter.py --config configs/base.yaml --show-config
        python src/data_validate.py --config configs/base.yaml --show-config

    Run the commands below to diagnose the dataset with printed samples:
    # Basic usage (5 samples)
        python src/validate_data_with_samples.py --config configs/base.yaml

    # More samples (10)
        python src/validate_data_with_samples.py --samples 10

    # Specific split
        python src/validate_data_with_samples.py --split train --samples 5
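
Before running the converter, it can help to confirm that the extracted dataset sits where the config expects it. The sketch below assumes the data root is `data/nuscenes` (the path that appears in the training log) and that the extraction produced the usual nuScenes top-level folders; `missing_nuscenes_dirs` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Top-level folders a nuScenes v1.0-mini extraction normally contains.
EXPECTED_DIRS = ("maps", "samples", "sweeps", "v1.0-mini")


def missing_nuscenes_dirs(data_root):
    """Return the expected sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed data root; adjust it to the path set in configs/base.yaml.
    missing = missing_nuscenes_dirs("data/nuscenes")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty list means all four folders were found; anything else usually points to a wrong extraction target or a wrong path in the config.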

3) Training pipeline:
    Run the commands below to check the encoders and the fusion module:
        python src/encoders.py
        python src/fusion.py

    Run the following commands to train the model or run inference:
        python src/train_detect.py train configs/base.yaml 
        python src/train_detect.py infer [path]     # Not tested

4) Run the evaluation:
    Run the command below to evaluate the pretrained model on the validation dataset, if needed.
    It saves the output metrics to a txt file.
        python src/eval.py configs/base.yaml

5) Run the inference:
    python src/inference.py --model checkpoints/best_model.pth

6) Experiments:

    The code is designed so that the user can change modality_cfg in configs/base.yaml
    to experiment with different modality configurations.

    -------------------|------------|----------------
    modality           | fusion     | detection head
    -------------------|------------|----------------
    camera_only        | bevfusion  | Centerhead
    lidar_only         | bevfusion  | Centerhead
    camera+lidar       | bevfusion  | Centerhead
    camera+lidar+radar | bevfusion  | Centerhead
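
The training log prints three switches (`use_camera`, `use_lidar`, `use_radar`), so each modality name in the table presumably resolves to such flags. How the repo does this is not shown here; the sketch below is one possible mapping, and `modality_flags` is a hypothetical helper.

```python
VALID_SENSORS = ("camera", "lidar", "radar")


def modality_flags(modality):
    """Map a name such as 'camera+lidar' or 'lidar_only' to use_* flags."""
    name = modality.strip().lower()
    if name.endswith("_only"):
        parts = [name[: -len("_only")]]
    else:
        parts = name.split("+")
    unknown = [p for p in parts if p not in VALID_SENSORS]
    if unknown:
        raise ValueError(f"Unknown sensor(s) in {modality!r}: {unknown}")
    return {f"use_{sensor}": sensor in parts for sensor in VALID_SENSORS}


if __name__ == "__main__":
    for row in ("camera_only", "lidar_only", "camera+lidar", "camera+lidar+radar"):
        print(row, modality_flags(row))
```

Lower-casing the name makes spellings such as `camera+Lidar` resolve to the same flags as `camera+lidar`.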

This notebook shows the instructions run for the camera+lidar+radar combination.

In [None]:
! git clone https://github.com/meg89/bevfusion_multimodal_3d_object_detection.git
# "! cd" runs in a subshell and does not persist; %cd changes the notebook's working directory
%cd bevfusion_multimodal_3d_object_detection


Cloning into 'bevfusion_multimodal_3d_object_detection'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 58 (delta 12), reused 55 (delta 12), pack-reused 0 (from 0)
Receiving objects: 100% (58/58), 1.40 MiB | 6.79 MiB/s, done.
Resolving deltas: 100% (12/12), done.


In [None]:

! mkdir data

mkdir: data: File exists


Download the nuScenes v1.0-mini dataset from the site: https://www.nuscenes.org/nuscenes#download. Follow the instructions above.

In [None]:
# Convert the dataset. This creates the nuscenes_infos_train.pkl, nuscenes_infos_val.pkl,
# and nuscenes_infos_test.pkl files.
# Run it from the working folder.

! python src/data_converter.py --config configs/base.yaml --split train
! python src/data_converter.py --config configs/base.yaml --split val
! python src/data_converter.py --config configs/base.yaml --split test

#validate data
#! python src/data_validate.py --config configs/base.yaml

Loading NuScenes tables for version v1.0-mini...
23 category,
8 attribute,
4 visibility,
911 instance,
12 sensor,
120 calibrated_sensor,
31206 ego_pose,
8 log,
10 scene,
404 sample,
31206 sample_data,
18538 sample_annotation,
4 map,
Done loading in 0.126 seconds.
Reverse indexing ...
Done reverse indexing in 0.0 seconds.
Configuration loaded successfully!
Dataset: nuscenes v1.0-mini
Classes: 10 classes
Cameras: 6
Radars: 5
Converting train split only...

Processing train split...
Collected 283 samples for train split
Saved 283 samples to data/nuscenes/nuscenes_infos_train.pkl

TRAIN Split Statistics:
  Total samples: 283
  Total objects: 11313
  Objects per class:
    car: 4920
    truck: 528
    trailer: 52
    bus: 168
    bicycle: 207
    motorcycle: 320
    pedestrian: 3154
    barrier: 1964

✓ Conversion completed successfully!
Loading NuScenes tables for version v1.0-mini...
23 category,
8 attribute,
4 visibility,
911 instance,
12 sensor,
120 calibrated_sensor,
31206 ego_pose,
8 
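
To confirm what the converter wrote, one option is to load a generated `.pkl` file and summarize it. The sketch below makes no assumption about the internal structure of the file beyond it being a standard pickle; `summarize_infos` and `load_infos` are hypothetical helpers, and the path is the one reported in the conversion log.

```python
import pickle
from pathlib import Path


def summarize_infos(infos):
    """Describe a loaded object without assuming its layout."""
    summary = {"type": type(infos).__name__}
    if hasattr(infos, "__len__"):
        summary["length"] = len(infos)
    if isinstance(infos, dict):
        summary["keys"] = sorted(str(k) for k in infos)
    return summary


def load_infos(path):
    """Load a pickle file. Only use this on files you generated yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    info_path = Path("data/nuscenes/nuscenes_infos_train.pkl")
    if info_path.exists():
        print(summarize_infos(load_infos(info_path)))
    else:
        print(f"{info_path} not found; run the converter first.")
```

If the file holds a plain list of samples, the reported length should match the "Saved 283 samples" line for the train split.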

In [25]:
# Training pipeline
! python src/encoders.py


EXAMPLE MODEL INSTANTIATION

1. Creating encoders WITHOUT config (direct parameters):
--------------------------------------------------------------------------------

✓ Camera Encoder (ResNet18) created
  Output channels: 512

✓ LiDAR Encoder (PointNet) created
  Input channels: 4
  Feature dimension: 1024

✓ Multi-Radar Encoder created
  Num radars: 5
  Feature dimension: 256

TESTING WITH DUMMY DATA

Camera input shape: torch.Size([2, 3, 3, 448, 800])
Camera output shape: torch.Size([2, 3, 512, 28, 50])

LiDAR input shape: torch.Size([2, 34720, 4])
LiDAR output shape: torch.Size([2, 1024])

Radar input shapes: [torch.Size([2, 125, 7]), torch.Size([2, 125, 7]), torch.Size([2, 125, 7]), torch.Size([2, 125, 7]), torch.Size([2, 125, 7])]
Radar output shape: torch.Size([2, 256])

2. Creating encoders WITH config.yaml:
--------------------------------------------------------------------------------

Trying to load config.yaml...
[DEBUG]: radar_config:  {'type': 'MultiRadar', 'input_chann
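
Each radar input above has shape (batch, 125, 7), which matches `max_points_per_sensor: 125` and `input_channels: 7` in the radar config printed by the training log. Real radar sweeps return a varying number of points, so something has to bring them to a fixed size. The sketch below shows one common way (truncate, then zero-pad); whether the repo pads with zeros is an assumption, and `pad_or_truncate` is a hypothetical helper.

```python
MAX_POINTS = 125   # max_points_per_sensor from the radar config
NUM_FEATURES = 7   # input_channels from the radar config


def pad_or_truncate(points, max_points=MAX_POINTS, num_features=NUM_FEATURES):
    """Return exactly max_points rows, each with num_features values."""
    for point in points:
        if len(point) != num_features:
            raise ValueError(f"Expected {num_features} features, got {len(point)}")
    kept = [list(point) for point in points[:max_points]]
    # Zero rows fill the remainder when the sweep has fewer points.
    padding = [[0.0] * num_features for _ in range(max_points - len(kept))]
    return kept + padding


if __name__ == "__main__":
    sweep = [[1.0] * NUM_FEATURES for _ in range(3)]
    fixed = pad_or_truncate(sweep)
    print(len(fixed), len(fixed[0]))
```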

In [26]:
! python src/fusion.py


FLEXIBLE MULTI-MODAL 3D DETECTION WITH CONFIG SUPPORT

Supported Configurations:
  Modalities: camera_only, lidar_only, radar_only,
             camera+lidar, camera+radar, lidar+radar,
             camera+lidar+radar (all)
  Fusion: bev, attention, late
  Detection: centernet, mlp

EXAMPLE USAGE

1. Direct Parameters (Original Method):
  model = create_detector('camera_only', 'bev')

2. From Config File (NEW):
  model = create_detector(config_path='config.yaml')

3. Hybrid (Config + Override):
  model = create_detector(config_path='config.yaml', fusion_type='attention')

RUNNING TESTS
TESTING ALL MODALITY CONFIGURATIONS

✓ PASS camera+lidar         + bev       
   Config: camera+lidar_bev_centernet
   Parameters: 52,398,483
   heatmap: torch.Size([2, 10, 50, 50])
   offset: torch.Size([2, 2, 50, 50])
   size: torch.Size([2, 3, 50, 50])
   rot: torch.Size([2, 2, 50, 50])
   vel: torch.Size([2, 2, 50, 50])

✓ PASS camera+lidar         + attention 
   Config: camera+lidar_attention_cent
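
The head outputs listed above (heatmap, offset, size, rot, vel on a 50 x 50 grid) follow the CenterNet-style layout, where object centers are read off as peaks in the per-class heatmap. The sketch below shows that peak-picking step on a single class map in plain Python. It assumes scores are already in [0, 1] and uses the 0.3 threshold reported by the inference log; the repo's actual decoding may differ, and `heatmap_peaks` is a hypothetical helper.

```python
def heatmap_peaks(heatmap, threshold=0.3):
    """Return (score, row, col) for 3x3 local maxima above threshold.

    heatmap is a list of rows of floats for one class.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score <= threshold:
                continue
            # Collect the 3x3 neighbourhood, clipped at the grid border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(neighbours):
                peaks.append((score, r, c))
    return sorted(peaks, reverse=True)


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[1][1] = 0.9
    grid[3][3] = 0.4
    print(heatmap_peaks(grid))
```

A map with no cell above the threshold yields an empty list, which corresponds to a "Detected 0 objects" result.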

In [29]:
# For the demo, only 2 epochs are run

! python src/train_detect.py train configs/base.yaml 

Loading configuration from configs/base.yaml
Multi-Modal 3D Object Detection Training

Configuration:
  data_root: data/nuscenes
  batch_size: 4
  num_epochs: 2
  lr: 0.0001
  weight_decay: 0.01
  num_workers: 4
  device: cpu
  fusion_type: bev
  detection_head: centernet
  num_classes: 10
  checkpoint_dir: ./checkpoints
  log_interval: 1
  save_interval: 1
  use_camera: True
  use_lidar: True
  use_radar: True

Using device: cpu
Loaded 283 samples for train split
Loaded 81 samples for val split
[DEBUG]: radar_config:  {'type': 'MultiRadar', 'input_channels': 7, 'max_points_per_sensor': 125, 'num_radars': 5, 'feature_dim': 256, 'mlp_layers': [32, 64, 128, 256], 'fusion_method': 'concat', 'use_batch_norm': True, 'dropout': 0.1}
[DEBUG]: num_radars:  5
[DEBUG]: Initializing MultiRadarEncoder encoder

Model parameters:
  Total: 55,197,715
  Trainable: 55,197,715

Starting Training


Epoch 1/2
--------------------------------------------------------------------------------
Epoch 1: 100%|██
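
The length of each progress bar follows from the sample counts and the batch size in the log. A small sketch of that arithmetic, assuming the data loader keeps the final partial batch (`drop_last=False`, which is an assumption about the repo); `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of iterations needed to pass over num_samples once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    print("train:", batches_per_epoch(283, 4))  # 283 train samples, batch size 4
    print("val:  ", batches_per_epoch(81, 4))   # 81 val samples, batch size 4
```

With 81 validation samples and a batch size of 4 this gives 21 batches, which agrees with the `21/21` shown by the evaluation progress bar.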

In [2]:
# Evaluation pipeline
# It saves the evaluation results in the eval_result folder.
# The pretrained model is kept in the checkpoints directory, at checkpoints/best_model.pth.
# Change this path in eval.py to use a different checkpoint.

! python src/eval.py configs/base.yaml

Multi-Modal 3D Object Detection Evaluation

Configuration:
  data_root: ./data/nuscenes
  batch_size: 4
  num_epochs: 1
  lr: 0.0001
  weight_decay: 0.01
  num_workers: 4
  device: cpu
  fusion_type: attention
  detection_head: mlp
  num_classes: 10
  checkpoint_dir: ./checkpoints
  log_interval: 10
  save_interval: 5

Using device: cpu
Loaded 81 samples for val split
[DEBUG]: radar_config:  {'type': 'MultiRadar', 'input_channels': 7, 'max_points_per_sensor': 125, 'num_radars': 5, 'feature_dim': 256, 'mlp_layers': [32, 64, 128, 256], 'fusion_method': 'concat', 'use_batch_norm': True, 'dropout': 0.1}
[DEBUG]: num_radars:  5
[DEBUG]: Initializing MultiRadarEncoder encoder
Loaded model from ./checkpoints/best_model.pth

Model parameters:
  Total: 55,197,715
  Trainable: 55,197,715

Running validation...
Evaluating: 100%|███████████████████████████████| 21/21 [01:24<00:00,  4.02s/it]
predictions:  [{'boxes': tensor([[-4.8638e+01, -3.0718e+01, -1.0000e+00,  6.4131e-03,  5.4859e-03,
        

In [None]:
# Inference
# It saves the results in the inference_results folder.

! python src/inference.py --model checkpoints/best_model.pth

Loading dataset...
Loaded 40 samples for test split
INFERENCE ENGINE INITIALIZED
Model: checkpoints/best_model.pth
Config: configs/base.yaml
Device: cpu
Show visualizations: True
Save directory: ./inference_results
Classes: 10

RUNNING INFERENCE

Loading model from checkpoints/best_model.pth...
✓ Model loaded successfully (Epoch: 1)

Raw predictions:
  heatmap: torch.Size([1, 10, 50, 50])
  offset: torch.Size([1, 2, 50, 50])
  size: torch.Size([1, 3, 50, 50])
  rot: torch.Size([1, 2, 50, 50])
  vel: torch.Size([1, 2, 50, 50])

✓ Detected 0 objects (score > 0.3)

Top 0 Detections:
--------------------------------------------------------------------------------

METRICS

Overall:
  True Positives:  0
  False Positives: 0
  False Negatives: 22
  Precision: 0.000
  Recall:    0.000
  F1 Score:  0.000
  Mean IoU:  0.000
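
The overall numbers above can be reproduced from the three counts with the standard definitions of precision, recall, and F1. The sketch below returns 0.0 whenever a denominator is zero; that convention is an assumption, chosen because it matches the 0.000 values in the log, and `detection_metrics` is a hypothetical helper.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Counts from the log above: no detections, 22 missed ground-truth objects.
    print(detection_metrics(tp=0, fp=0, fn=22))
```

With no predictions at all, every ground-truth object counts as a false negative, so all three scores are zero regardless of how many objects are in the scene.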

Per-class counts:
  car                       GT:  10, Pred:   0
  bus                       GT:   1, Pred:   0
  motorcycle                GT:   2, Pred:   0
  pedestrian 