Simple3D-Former

This is the official repository for the paper "Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?"

Prerequisites

Environment Setup

The code is tested with Python 3.7 and requires at least the following packages:

einops==0.3.0
linformer==0.2.1
torch==1.7.1
torchvision==0.8.2
tqdm
hydra==2.5
hydra-core==1.1.1
omegaconf==2.1.1
h5py
plyfile

In addition, since DeiT depends heavily on timm, make sure it is installed:

pip install timm==0.3.2

We also provide a requirements.txt containing the full package list, which can be installed with pip by executing:

pip install -r requirements.txt
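After installation, it can be useful to confirm that the pinned versions are the ones actually in the environment. The following is a minimal sketch, not part of this repository: the helper name `find_mismatches` is hypothetical, only the clearly pinned packages from the list above are checked, and on Python 3.7 the `importlib_metadata` backport is assumed to be installed.

```python
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import version, PackageNotFoundError

# Pins copied from the package list above (unpinned packages are skipped).
PINNED = {
    "einops": "0.3.0",
    "linformer": "0.2.1",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "hydra-core": "1.1.1",
    "omegaconf": "2.1.1",
    "timm": "0.3.2",
}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "1.7.1+cu110".
        base = found.split("+")[0] if found is not None else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed pin is satisfied; anything printed is worth fixing before training.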

Dataset Preparation

Currently, ShapeNetV2, ModelNet40, and ShapeNetPart are required. The teacher dataset is the ImageNet validation set (from ImageNet 1K).

ShapeNetV2: Download it from here: https://shapenet.cs.stanford.edu/shapenet/obj-zip/ShapeNetCore.v2.zip;
ModelNet40: Download it from here: https://modelnet.cs.princeton.edu/ModelNet40.zip;
ModelNet40 point cloud samples: Download it from here: https://shapenet.cs.stanford.edu/media/modelnet40_normal_resampled.zip;
ShapeNetPart: Download it from here: https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip;
ImageNet 1K: Download it from kaggle: https://www.kaggle.com/c/imagenet-object-localization-challenge/data;
ScanObjectNN: Download it from here: http://103.24.77.34/scanobjectnn/h5_files.zip

Then extract all files into the ./data/ folder of the current project. You can modify the config files under ./config/ to point to a different data location (especially if you download the full ImageNet instead of only the validation subset). In addition, after downloading ModelNet40, you need to create all *.binvox files by running:

cd data/
python binvox_convert.py ModelNet40/ --remove-all-dupes
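Before training, the data layout can be sanity-checked with a short script. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the folder names `ModelNet40` and `ShapeNetCore_v2` are taken from the commands in this README; the extracted names of the other archives are not stated here, so add them yourself if needed.

```python
from pathlib import Path

# Assumed folder names under ./data/, taken from the commands in this README.
# Extend this list to match how the other archives extract on your machine.
EXPECTED_DIRS = ["ModelNet40", "ShapeNetCore_v2"]


def missing_datasets(data_root, expected=EXPECTED_DIRS):
    """Return the expected dataset folders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


def count_binvox(modelnet_dir):
    """Count *.binvox files found recursively under the ModelNet40 folder."""
    return sum(1 for _ in Path(modelnet_dir).rglob("*.binvox"))


if __name__ == "__main__":
    print("missing:", missing_datasets("./data"))
    print("binvox files:", count_binvox("./data/ModelNet40"))
```

A binvox count of zero suggests the conversion step above has not been run yet.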

How to run

Voxel Classification

Run the train_cls_voxel.py script. Its default usage is:

python train_cls_voxel.py

To reproduce the results, enumerate the configurations of backbone, positional embedding, and dataset. We provide two examples, for ModelNet40 and ShapeNetV2 respectively:

python train_cls_voxel.py --data-root ./data/ModelNet40 --batchSize 64 --pretrained --lwf --epochs 100 --gpus 1 --dataset ModelNet40 --transformer-name deit_small_patch16_224 --outf ./cls/ --pos-embedding default --embed-layer VoxelEmbed --cell-size 6 --patch-size 5 --lr 1e-3

python train_cls_voxel.py --data-root ./data/ShapeNetCore_v2 --batchSize 64 --pretrained --lwf --epochs 100 --gpus 1 --dataset ShapeNetV2 --transformer-name deit_base_patch16_224 --outf ./cls/ --pos-embedding group_embed --embed-layer VoxelEmbed_no_average --cell-size 9 --patch-size 14 --lr 1e-3

The --pos-embedding and --embed-layer options must match. The three tokenization schemes correspond to:

Naive Tokenize: --pos-embedding default --embed-layer VoxelEmbed;
2D Projection: --pos-embedding default --embed-layer VoxelEmbed_no_average;
Group Embedding: --pos-embedding group_embed --embed-layer VoxelEmbed_no_average;
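To avoid mismatched flag pairs when enumerating runs, the three schemes above can be kept in one table and expanded into command lines. This is a minimal sketch: the `TOKENIZERS` keys and the `build_command` helper are hypothetical names, and every flag is copied from the example commands above. Only the cell and patch sizes shown in those two examples are documented here, so other values are yours to choose.

```python
import shlex

# scheme name -> (--pos-embedding, --embed-layer), as listed above.
TOKENIZERS = {
    "naive": ("default", "VoxelEmbed"),
    "2d_projection": ("default", "VoxelEmbed_no_average"),
    "group_embedding": ("group_embed", "VoxelEmbed_no_average"),
}


def build_command(scheme, dataset, data_root, transformer_name,
                  cell_size, patch_size):
    """Build a train_cls_voxel.py command line for one tokenization scheme."""
    pos_embedding, embed_layer = TOKENIZERS[scheme]
    args = [
        "python", "train_cls_voxel.py",
        "--data-root", data_root,
        "--batchSize", "64",
        "--pretrained", "--lwf",
        "--epochs", "100",
        "--gpus", "1",
        "--dataset", dataset,
        "--transformer-name", transformer_name,
        "--outf", "./cls/",
        "--pos-embedding", pos_embedding,
        "--embed-layer", embed_layer,
        "--cell-size", str(cell_size),
        "--patch-size", str(patch_size),
        "--lr", "1e-3",
    ]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Reproduces the ModelNet40 example command shown above.
    print(build_command("naive", "ModelNet40", "./data/ModelNet40",
                        "deit_small_patch16_224", cell_size=6, patch_size=5))
```

Looking up both flags from a single key makes it impossible to pair group_embed with VoxelEmbed by accident.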

Point Cloud Tasks

These scripts are adapted from https://github.com/qq456cvb/Point-Transformers. You can modify the files under ./config/ to adjust parameters. To run a task, use the corresponding script:

Point Cloud Classification: python train_cls.py or python train_cls_scanobjectnn.py;
Point Cloud Part Segmentation: python train_partseg.py;
Point Cloud Part Segmentation with 2D knowledge: python train_partseg_lwf.py
Point Cloud Object Detection (TBD)

