Accepted at KDD 2023: https://arxiv.org/pdf/2308.00454.pdf
EEGViT is a hybrid Vision Transformer (ViT) that incorporates depthwise convolution in its patch embedding layers. This work builds on Dosovitskiy et al.'s "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". After fine-tuning, EEGViT pretrained on ImageNet achieves a considerable improvement over the SOTA on the Absolute Position task of the EEGEyeNet dataset.
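To illustrate the key architectural idea, here is a minimal sketch of a depthwise convolution: unlike a standard convolution, each input channel is filtered by its own kernel, with no mixing across channels. This is a hypothetical illustration of the operation itself, not the EEGViT patch embedding implementation.

```python
import numpy as np

def depthwise_conv2d(x, kernels, stride=1):
    """Depthwise 2D convolution (valid padding): one kernel per input channel,
    no cross-channel mixing. Illustrative only -- real code would use
    nn.Conv2d(c, c, k, groups=c) in PyTorch."""
    c, h, w = x.shape
    kc, kh, kw = kernels.shape
    assert kc == c, "need exactly one kernel per input channel"
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    out = np.zeros((c, oh, ow))
    for ch in range(c):                      # each channel handled independently
        for i in range(oh):
            for j in range(ow):
                patch = x[ch, i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[ch, i, j] = np.sum(patch * kernels[ch])
    return out

x = np.random.randn(4, 8, 8)   # 4 channels of an 8x8 input (shapes are illustrative)
k = np.random.randn(4, 3, 3)   # one 3x3 kernel per channel
y = depthwise_conv2d(x, k)
print(y.shape)  # -> (4, 6, 6)
```

Because channels are processed independently, a depthwise layer uses far fewer parameters than a full convolution with the same kernel size, which is what makes it attractive inside a patch embedding.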
This repository provides four models: ViT (pretrained and non-pretrained) and EEGViT (pretrained and non-pretrained). The pretrained weights of the ViT layers are loaded from huggingface.co.
Download the data for the EEGEyeNet Absolute Position task:

```shell
wget -O "./dataset/Position_task_with_dots_synchronised_min.npz" "https://osf.io/download/ge87t/"
```
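Once downloaded, the `.npz` archive can be inspected with numpy. The key names (`EEG`, `labels`) and the array shapes below are assumptions for illustration; run `np.load(path).files` on the real file to see its actual keys. A tiny in-memory stand-in is used here so the example runs without the download.

```python
import io
import numpy as np

# Build a small stand-in archive (shapes are illustrative, not the real dataset's).
eeg = np.random.randn(10, 129, 500).astype(np.float32)  # (samples, electrodes, timepoints)
labels = np.random.rand(10, 3).astype(np.float32)       # per-sample target values
buf = io.BytesIO()
np.savez(buf, EEG=eeg, labels=labels)
buf.seek(0)

# Loading works the same way on the real file:
# data = np.load("./dataset/Position_task_with_dots_synchronised_min.npz")
data = np.load(buf)
print(data.files)         # lists the stored array names
print(data["EEG"].shape)  # -> (10, 129, 500)
```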
For more details about the EEGEyeNet dataset, please refer to "EEGEyeNet: a Simultaneous Electroencephalography and Eye-tracking Dataset and Benchmark for Eye Movement Prediction" and the OSF repository.
First, install the dependencies listed in general_requirements.txt:

```shell
pip3 install -r general_requirements.txt
```

Then install PyTorch (the command below targets CUDA 11.7):

```shell
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```
For other installation options and different CUDA versions, visit pytorch.org.