GitHub - henry-pay/RayEncoder

Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction Detection (IJCNN 2025)

We conceptualize a wavelet attention-like backbone together with a ray-based encoding technique for Human-Object Interaction Detection. The proposed mechanism delivers competitive results with better efficiency.

This repository contains the PyTorch implementation.

Image Classification

1. Installation

In this section, we provide instructions for ImageNet classification experiments.

1.1 Dependency Setup

Create a new conda environment:

conda create -y -n ray-encoder python=3.12
conda activate ray-encoder

Install PyTorch>=2.4.0 and torchvision>=0.19.0 following the official instructions. For example:

conda install -y pytorch==2.4.0 torchvision==0.19.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Clone this repo and install required packages.

git clone https://github.com/henry-pay/RayEncoder.git
conda install -y timm
conda install -y hydra-core
conda install -y cupy pkg-config libjpeg-turbo opencv numba
pip install ffcv

1.2 Dataset Preparation

Download the ImageNet-1k classification dataset and structure the data as follows. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Place the ImageNet dataset under the data directory within the repository.

│RayEncoder/
├──data/
│  ├── imagenet/
│  │   ├──train/
│  │   ├──val/
├──src/
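
The layout above can be sanity-checked before launching a long training run. The sketch below is an illustration only: the function names are hypothetical, and it assumes images carry a .JPEG or .jpg extension as in the listing above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def summarize_split(split_dir):
    """Count class folders and image files under one split (train or val)."""
    split_dir = Path(split_dir)
    if not split_dir.is_dir():
        return 0, 0
    class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    n_images = sum(
        1
        for d in class_dirs
        for f in d.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )
    return len(class_dirs), n_images


def check_imagenet_layout(repo_root="."):
    """Return {split: (num_classes, num_images)} for data/imagenet/{train,val}."""
    root = Path(repo_root) / "data" / "imagenet"
    return {split: summarize_split(root / split) for split in ("train", "val")}


if __name__ == "__main__":
    for split, (n_classes, n_images) in check_imagenet_layout().items():
        print(f"{split}: {n_classes} class folders, {n_images} images")
```

For a complete ImageNet-1k copy, each split should report 1000 class folders; a count of zero usually means the dataset sits at a different path than data/imagenet.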

2. Training

We provide ImageNet-1k training commands here.

Taking Wavelet+3 Rays as an example, you can use the following command to run the experiment on a single machine (4 GPUs):

OMP_NUM_THREADS=8 torchrun --nproc-per-node=4 image-classification/main.py

Batch size scaling. The effective batch size equals --nproc-per-node * batch_size, where batch_size is specified in the dataset config. In the provided config file, the effective batch size is 4 * 512 = 2048. On a different machine, you can reduce batch_size and set the use_amp flag in the config to avoid out-of-memory (OOM) issues. By the formula above, keeping the total batch size unchanged then requires raising --nproc-per-node in proportion.

OMP_NUM_THREADS is the easiest switch for accelerating computation. It sets the number of threads used for OpenMP computations. Details can be found in the documentation.
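
The batch size arithmetic above can be written out as a small helper. This is a sketch for a single machine; the function names are hypothetical and not part of the repository.

```python
def effective_batch_size(nproc_per_node, batch_size):
    """Effective batch size on one machine: processes (GPUs) times per-process batch."""
    return nproc_per_node * batch_size


def per_process_batch_size(target_effective, nproc_per_node):
    """Per-process batch_size that reproduces a target effective batch size."""
    if target_effective % nproc_per_node != 0:
        raise ValueError(
            f"{target_effective} is not divisible by {nproc_per_node} processes"
        )
    return target_effective // nproc_per_node


if __name__ == "__main__":
    # Provided config: 4 GPUs with batch_size 512.
    print(effective_batch_size(4, 512))
    # Same effective batch size on 8 GPUs.
    print(per_process_batch_size(2048, 8))
```

With the provided configuration this gives 4 * 512 = 2048; on 8 GPUs, a batch_size of 256 keeps the same effective batch size.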

To train other model variants, change the corresponding parameters in the config.

Human-Object Interaction Detection

The experiment is carried out using FGAHOI. Please refer to the FGAHOI repository for installation and dataset preparation instructions. To run the training experiments, follow the instructions given in FGAHOI after replacing its models directory with the models provided in hoi-detection/models.
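
Swapping the models directory can be scripted so that the original FGAHOI models are kept as a backup. The sketch below is an illustration under assumptions: the function name, the example paths, and the location of the models directory at the root of the FGAHOI checkout are all hypothetical, so adjust them to your local setup.

```python
import shutil
from pathlib import Path


def replace_models_dir(provided_models, fgahoi_root, backup_suffix=".orig"):
    """Copy the provided models into <fgahoi_root>/models, backing up the original once."""
    provided_models = Path(provided_models)
    target = Path(fgahoi_root) / "models"  # assumed location inside FGAHOI
    if not provided_models.is_dir():
        raise FileNotFoundError(f"no such directory: {provided_models}")

    backup = target.with_name(target.name + backup_suffix)
    if target.exists() and not backup.exists():
        shutil.move(str(target), str(backup))  # first run: keep the original
    elif target.exists():
        shutil.rmtree(target)  # later runs: the backup already exists

    shutil.copytree(provided_models, target)
    return target


if __name__ == "__main__":
    # Hypothetical paths; both checkouts are assumed to sit side by side.
    replace_models_dir("RayEncoder/hoi-detection/models", "FGAHOI")
```

The original directory is moved to models.orig on the first run, so it can be restored by renaming it back.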

License

This project is licensed under the Apache 2.0 License.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{pay2025conceptualizing,
    title={Conceptualizing Multi-scale Wavelet Attention and Ray-based Encoding for Human-Object Interaction Detection},
    author={Pay, Quan Bi and Baskaran, Vishnu Monn and Loo, Junn Yong and Wong, KokSheik and See, Simon},
    booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},
    pages={to appear},
    year={2025},
    organization={IEEE},
    note={Accepted}
}
