
VITA-Group/Simple3D-Former


Simple3D-Former

This is the official repository for the paper “Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?”

Prerequisites

Environment Setup

The code is tested with Python 3.7 and the following packages (minimum supported set):

  • einops==0.3.0
  • linformer==0.2.1
  • torch==1.7.1
  • torchvision==0.8.2
  • tqdm
  • hydra==2.5
  • hydra-core==1.1.1
  • omegaconf==2.1.1
  • h5py
  • plyfile

In addition, DeiT depends heavily on timm, so make sure to install the matching version:

pip install timm==0.3.2

We also provide a requirements.txt with the full package list, which can be installed with pip by executing:

pip install -r requirements.txt
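A small check like the following can confirm that an existing environment matches the pinned versions listed above. This is a sketch and is not part of the repository.

- The helper name check_pins is hypothetical.
- It uses importlib.metadata, which needs Python 3.8+. On Python 3.7, the importlib_metadata backport provides the same functions.
- It drops local version suffixes such as +cu110, which CUDA builds of torch often carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the package list above (timm is pinned separately for DeiT).
PINS = {
    "einops": "0.3.0",
    "linformer": "0.2.1",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "hydra": "2.5",
    "hydra-core": "1.1.1",
    "omegaconf": "2.1.1",
    "timm": "0.3.2",
}


def check_pins(pins, get_version=version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed. Local version labels
    (the part after "+", e.g. "1.7.1+cu110") are dropped before comparing.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name).split("+")[0]
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINS).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The get_version parameter exists only so the check can be exercised without a real environment. In normal use, call check_pins(PINS) with no second argument.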

Dataset Preparation

ShapeNetV2, ModelNet40, and ShapeNetPart are currently required. The teacher dataset is the ImageNet validation set (ImageNet-1K).

Extract all files into the ./data/ folder of the project. You can modify the config files under ./config/ to point to a different data location, for example if you download the full ImageNet instead of the validation subset. In addition, after downloading ModelNet40, you need to create the *.binvox files by running:

cd data/
python binvox_convert.py ModelNet40/ --remove-all-dupes
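After conversion, each *.binvox file stores a voxel occupancy grid. The sketch below decodes one into a NumPy boolean array.

It assumes the standard binvox layout: an ASCII header containing a dim line, then run-length-encoded (value, count) byte pairs after the data line. The function read_binvox is hypothetical, and the repository's own data loader may read these files differently.

```python
import io

import numpy as np


def read_binvox(fp):
    """Decode a binvox stream (opened in binary mode) into a boolean grid.

    The grid keeps the file's storage order, with its shape taken from the
    "dim" header line; "translate" and "scale" lines are ignored here.
    """
    if not fp.readline().startswith(b"#binvox"):
        raise ValueError("not a binvox file")
    dims = None
    while True:
        line = fp.readline()
        if not line:
            raise ValueError("header ended before the 'data' line")
        line = line.strip()
        if line.startswith(b"dim"):
            dims = tuple(int(v) for v in line.split()[1:])
        elif line == b"data":
            break
    if dims is None:
        raise ValueError("missing 'dim' header line")
    # Body: (value, count) byte pairs; each value is repeated `count` times.
    raw = np.frombuffer(fp.read(), dtype=np.uint8)
    values, counts = raw[0::2], raw[1::2]
    return np.repeat(values, counts).astype(bool).reshape(dims)


if __name__ == "__main__":
    # Tiny in-memory example: a 2x2x2 grid whose last 4 cells are occupied.
    demo = io.BytesIO(
        b"#binvox 1\ndim 2 2 2\ntranslate 0 0 0\nscale 1\ndata\n" + bytes([0, 4, 1, 4])
    )
    print(read_binvox(demo).sum())  # 4
```

For a converted file, open it in binary mode and pass the file object, e.g. with open(path, "rb") as f: grid = read_binvox(f). The commonly used binvox-rw-py reader additionally transposes axes (0, 2, 1) to turn the stored x, z, y order into x, y, z.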

How to run

Voxel Classification

Run the train_cls_voxel.py script. Its default usage is:

python train_cls_voxel.py 

To reproduce the results, enumerate the configurations of backbone, positional embedding, and dataset. We provide two examples, for ModelNet40 and ShapeNetV2 respectively:

python train_cls_voxel.py --data-root ./data/ModelNet40 --batchSize 64 --pretrained --lwf --epochs 100 --gpus 1 --dataset ModelNet40 --transformer-name deit_small_patch16_224 --outf ./cls/ --pos-embedding default --embed-layer VoxelEmbed --cell-size 6 --patch-size 5 --lr 1e-3
python train_cls_voxel.py --data-root ./data/ShapeNetCore_v2 --batchSize 64 --pretrained --lwf --epochs 100 --gpus 1 --dataset ShapeNetV2 --transformer-name deit_base_patch16_224 --outf ./cls/ --pos-embedding group_embed --embed-layer VoxelEmbed_no_average --cell-size 9 --patch-size 14 --lr 1e-3
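When launching many runs from a script, it can help to keep each configuration as a dictionary and turn it into an argument list. The helper build_voxel_cmd below is hypothetical and not part of the repository. The flag names and values are copied from the ModelNet40 example above.

```python
import shlex


def build_voxel_cmd(options, script="train_cls_voxel.py"):
    """Turn an {flag: value} dict into an argv list for the voxel script.

    True  -> bare flag (e.g. --pretrained)
    False or None -> flag omitted
    anything else -> "--flag value"
    """
    argv = ["python", script]
    for flag, value in options.items():
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


MODELNET40_EXAMPLE = {
    "--data-root": "./data/ModelNet40",
    "--batchSize": 64,
    "--pretrained": True,
    "--lwf": True,
    "--epochs": 100,
    "--gpus": 1,
    "--dataset": "ModelNet40",
    "--transformer-name": "deit_small_patch16_224",
    "--outf": "./cls/",
    "--pos-embedding": "default",
    "--embed-layer": "VoxelEmbed",
    "--cell-size": 6,
    "--patch-size": 5,
    "--lr": "1e-3",  # kept as a string: str(1e-3) would print "0.001"
}

if __name__ == "__main__":
    # shlex.quote keeps the command copy-pasteable (shlex.join needs Python 3.8+).
    print(" ".join(shlex.quote(arg) for arg in build_voxel_cmd(MODELNET40_EXAMPLE)))
```

The resulting list can be passed directly to subprocess.run(argv, check=True) instead of being printed.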

The --pos-embedding and --embed-layer options must be set consistently. The three tokenization schemes correspond to:

  • Naive Tokenize: --pos-embedding default --embed-layer VoxelEmbed;
  • 2D Projection: --pos-embedding default --embed-layer VoxelEmbed_no_average;
  • Group Embedding: --pos-embedding group_embed --embed-layer VoxelEmbed_no_average;
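The scheme list above can be encoded as a lookup table. This allows mismatched flag pairs to be rejected before a run starts, and allows the backbone × dataset × scheme grid to be enumerated.

This is a sketch only. The scheme keys and the helper names check_pair and sweep are hypothetical. Dataset-specific flags such as --data-root, --cell-size and --patch-size are not derived here and must still be set per dataset, as in the two example commands.

```python
import itertools

# (--pos-embedding, --embed-layer) pair for each tokenization scheme listed above.
SCHEMES = {
    "naive_tokenize": ("default", "VoxelEmbed"),
    "2d_projection": ("default", "VoxelEmbed_no_average"),
    "group_embedding": ("group_embed", "VoxelEmbed_no_average"),
}


def check_pair(pos_embedding, embed_layer):
    """Raise ValueError if the two flags do not form one of the documented schemes."""
    if (pos_embedding, embed_layer) not in SCHEMES.values():
        raise ValueError(
            f"--pos-embedding {pos_embedding} does not match --embed-layer {embed_layer}"
        )


def sweep(backbones, datasets):
    """Return one flag list per (backbone, dataset, scheme) combination."""
    runs = []
    for backbone, dataset, (pos, layer) in itertools.product(
        backbones, datasets, SCHEMES.values()
    ):
        runs.append([
            "--transformer-name", backbone,
            "--dataset", dataset,
            "--pos-embedding", pos,
            "--embed-layer", layer,
        ])
    return runs
```

For example, sweep(["deit_small_patch16_224", "deit_base_patch16_224"], ["ModelNet40", "ShapeNetV2"]) yields 2 × 2 × 3 = 12 flag lists.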

Point Cloud Tasks

These scripts are adapted from https://github.com/qq456cvb/Point-Transformers. Parameters can be adjusted by editing the files under ./config. To run a task, use:

  • Point Cloud Classification: python train_cls.py or python train_cls_scanobjectnn.py;
  • Point Cloud Part Segmentation: python train_partseg.py;
  • Point Cloud Part Segmentation with 2D knowledge: python train_partseg_lwf.py;
  • Point Cloud Object Detection (TBD)

About

[Preprint 2022] “Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?” by Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang
