We conceptualize a wavelet attention-like backbone together with a ray-based encoding technique for Human-Object Interaction Detection. The proposed mechanism delivers competitive results with better efficiency.
This repository contains the PyTorch implementation.
In this section, we provide instructions for ImageNet classification experiments.
Create a new conda environment:
conda create -y -n ray-encoder python=3.12
conda activate ray-encoder
Install PyTorch>=2.4.0 and torchvision>=0.19.0 following the official instructions. For example:
conda install -y pytorch==2.4.0 torchvision==0.19.0 pytorch-cuda=11.8 -c pytorch -c nvidia
Clone this repo and install required packages.
git clone https://github.com/henry-pay/SpaRTAN.git
conda install -y timm
conda install -y hydra-core
conda install -y cupy pkg-config libjpeg-turbo opencv numba
pip install ffcv
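Once the packages above are installed, a quick check along the following lines can confirm that the environment meets the stated minimum versions. This is only a sketch: the helper names (`parse_version`, `check_environment`) are illustrative and not part of this repository, and it assumes each package registers under the distribution name listed in `REQUIREMENTS` (for example, that conda's `pytorch` package is visible as `torch`).

```python
import re
from importlib import metadata

# Minimum versions taken from the instructions above; None means "any version".
# The distribution names are assumptions about how each package registers itself.
REQUIREMENTS = {
    "torch": (2, 4, 0),
    "torchvision": (0, 19, 0),
    "timm": None,
    "hydra-core": None,
    "ffcv": None,
}


def parse_version(text):
    """Turn a string such as '2.4.0+cu118' into (2, 4, 0)."""
    numbers = []
    # Drop any local suffix after '+', then keep the leading digits of each part.
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def check_environment(requirements=REQUIREMENTS, lookup=metadata.version):
    """Return {package: status} where status is 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < minimum:
            report[name] = f"too old ({installed})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:12s} {status}")
```

The check reads installed package metadata only, so it does not import PyTorch or touch the GPU.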
Download the ImageNet-1k classification dataset and structure the data as follows. You can extract ImageNet with this script.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
Place the ImageNet dataset under the data directory within the repository.
│RayEncoder/
├──data/
│ ├── imagenet/
│ │ ├──train/
│ │ ├──val/
├──src/
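Before launching a long training run, it may help to confirm that the dataset sits where the layout above expects it. The sketch below counts class folders and `.JPEG` files in each split. The function names and the default `data/imagenet` path (relative to the repository root) are illustrative assumptions, not part of this repository.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return (number of class folders, number of .JPEG files) for one split."""
    split_dir = Path(split_dir)
    class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    n_images = sum(
        1
        for d in class_dirs
        for f in d.iterdir()
        if f.suffix.upper() == ".JPEG"
    )
    return len(class_dirs), n_images


def check_imagenet_layout(root="data/imagenet"):
    """Check that train/ and val/ exist under root and summarize each split."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"expected directory {split_dir}")
        summary[split] = summarize_split(split_dir)
    return summary
```

Calling `check_imagenet_layout()` from the repository root returns a dictionary with one `(classes, images)` pair per split. For a complete ImageNet-1k copy, both splits should report 1000 class folders, with 1,281,167 training images and 50,000 validation images.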
We provide ImageNet-1k training commands here.
Taking Wavelet+3 Rays as an example, you can use the following command to run the experiment on a single machine (4 GPUs):
OMP_NUM_THREADS=8 torchrun --nproc-per-node=4 image-classification/main.py
- Batch size scaling. The effective batch size is equal to --nproc-per-node * batch_size (which is specified in the dataset config). In the provided config file, the effective batch size is 4 * 512 = 2048. When running on a single machine, we can reduce batch_size and set the use_amp flag in the config to avoid OOM issues while keeping the total batch size unchanged.
- OMP_NUM_THREADS is the easiest switch for accelerating computation. It determines the number of threads used for OpenMP computations. Details can be found in the documentation.
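The batch size arithmetic above can be sketched in a few lines. The helper names here are illustrative and do not come from this repository. The sketch only restates the relation effective batch size = number of processes × per-process batch_size, and shows which per-process value keeps the total fixed when the process count changes.

```python
def effective_batch_size(nproc_per_node, batch_size):
    """Total samples per optimizer step across all processes on one node."""
    return nproc_per_node * batch_size


def per_process_batch_size(total_batch_size, nproc_per_node):
    """Per-process batch_size that keeps the total batch size unchanged."""
    if total_batch_size % nproc_per_node != 0:
        raise ValueError(
            f"{total_batch_size} is not divisible by {nproc_per_node} processes"
        )
    return total_batch_size // nproc_per_node


if __name__ == "__main__":
    # Setting from the provided config: 4 processes, batch_size 512 each.
    print(effective_batch_size(4, 512))
    # Same total on 8 processes needs a smaller per-process batch_size.
    print(per_process_batch_size(2048, 8))
```

With the provided setting of 4 processes and batch_size 512, the first call gives 2048. Keeping that total on 8 processes would mean setting batch_size to 256 in the dataset config.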
To train other model variants, parameters within the config need to be changed.
The experiment is carried out using FGAHOI. Please refer to the corresponding repository for installation and dataset preparation instructions. The training experiments can be run by following the instructions given in FGAHOI, after replacing its models directory with the models provided in this repository.
This project is licensed under the Apache 2.0 License.
If you find this repository helpful, please consider citing:
@inproceedings{pay2025conceptualizing,
title={Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction Detection},
author={Pay, Quan Bi and Baskaran, Vishnu Monn and Loo, Junn Yong and Wong, KokSheik and See, Simon},
booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},
pages={to appear},
year={2025},
organization={IEEE},
note={Accepted}
}