giocoal/algonauts2023-image-fMRI-encoding-model

Code for the Master's Thesis "Deep Neural Encoding Models of the Human Visual Cortex to Predict fMRI Responses to Natural Visual Scenes".

Research Internship - MSc in Data Science - University of Milano-Bicocca - Imaging and Vision Laboratory.

This repository contains my submission to the Algonauts Project 2023 Challenge (id: giorgiocarbone).


Table of contents

  • Introduction
  • Dataset
  • Requirements
  • Status
  • Contact

Introduction

One of the main objectives of computational neuroscience is to comprehend the biological mechanisms that enable humans to perceive, process, and understand complex visual scenes. Visual neural encoding models are computational models that mimic the hierarchical processes underlying the human visual system and aim to explain the relationship between visual stimuli and corresponding neural activations evoked in the human visual cortex. A visual encoder can serve as a structured system for testing biological hypotheses concerning how visual information is processed, represented, and organized in the human brain.

The main objective of this thesis is to develop a comprehensive voxel-based and subject-specific image-fMRI neural encoding model of the human visual cortex based on Deep Neural Networks (DNNs) and transfer learning for the prediction of local neural blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI) responses to complex visual stimuli.

We applied a two-step linearizing strategy to visual encoding, based on two separate computational models. The first performs the non-linear feature mapping of the stimulus image into its latent representations, employing pre-trained computer vision DNNs as feature extractors. The second performs the linear activity mapping of the visual features into the BOLD response amplitudes of the individual voxels: Principal Component Analysis (PCA) reduces the dimensionality of the visual features, and independent ridge regression models map the PCA components to the activity of each voxel.
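A minimal sketch of this two-step pipeline is shown below. The model, layer name, PCA size, ridge penalty, and the train_images / test_images / y_train inputs are illustrative assumptions, not the exact thesis configuration:

```python
import torch
from torchvision.models import alexnet, AlexNet_Weights
from torchvision.models.feature_extraction import create_feature_extractor
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Step 1: non-linear feature mapping with a pre-trained DNN
# (AlexNet and the "features.5" node are illustrative choices).
weights = AlexNet_Weights.IMAGENET1K_V1
extractor = create_feature_extractor(alexnet(weights=weights).eval(),
                                     return_nodes=["features.5"])
preprocess = weights.transforms()

def extract_features(images):
    """images: list of PIL images -> (n_images, n_features) NumPy array."""
    with torch.no_grad():
        batch = torch.stack([preprocess(img) for img in images])
        feats = extractor(batch)["features.5"]
    return feats.flatten(start_dim=1).numpy()

# train_images / test_images and the betas y_train, shape (n_train,
# n_voxels), are assumed to be loaded elsewhere.
X_train = extract_features(train_images)
X_test = extract_features(test_images)

# PCA reduces the dimensionality of the visual features.
pca = PCA(n_components=100).fit(X_train)
X_train_pc, X_test_pc = pca.transform(X_train), pca.transform(X_test)

# Step 2: linear activity mapping. With a 2-D target, sklearn's Ridge
# fits one independent linear model per voxel (per column).
ridge = Ridge(alpha=1e4).fit(X_train_pc, y_train)
pred = ridge.predict(X_test_pc)  # predicted betas, (n_test, n_voxels)
```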

Furthermore, in order to meet the criteria of mappability and predictivity that characterize a good encoding model, we adopted a ROI-wise and mixed encoding strategy: voxels belonging to different regions of interest (ROIs, groups of voxels that share functional properties) are encoded by separate models, so as to maximize accuracy both across the entire visual cortex and within individual ROIs. To determine the best feature mapping method for each ROI, we tested the extraction of visual features from layers at varying depths of several pre-trained Convolutional Neural Networks (AlexNet, ZFNet, RetinaNet, EfficientNet-B2, VGG-16, VGG-19) and Vision Transformers (ViTs), characterized by different training parameters (training goal, training dataset, and learning method). During this testing phase, a similarity and functional alignment between the hierarchical architecture of the pre-trained DNNs and the structure of the visual cortex emerged, a result that motivated the use of the ROI-wise strategy.
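The ROI-wise selection can be sketched as a validation loop: for each ROI, every candidate (model, layer) feature space is scored and the best one is kept. Here feature_spaces (name to PCA-reduced train/validation features), roi_masks (ROI name to boolean voxel mask), and the y_train / y_val betas are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

def voxelwise_corr(y_true, y_pred):
    """Pearson correlation per voxel (per column), vectorized."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / (
        np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0))

# feature_spaces: {"alexnet/features.5": (X_train_pc, X_val_pc), ...}
# roi_masks: {"V1v": boolean mask over voxel columns, ...}
best_space = {}
for roi, mask in roi_masks.items():
    scores = {}
    for name, (X_tr, X_va) in feature_spaces.items():
        ridge = Ridge(alpha=1e4).fit(X_tr, y_train[:, mask])
        r = voxelwise_corr(y_val[:, mask], ridge.predict(X_va))
        scores[name] = np.median(r)
    best_space[roi] = max(scores, key=scores.get)  # best layer for this ROI
```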

In predicting the neural responses to the images of the Algonauts Project 2023 Challenge test set, the proposed model achieves an overall accuracy score of 0.52, expressed as the Median Noise Normalized Squared Correlation (MNNSC) across all voxels of the cortical surfaces of all subjects, outperforming the baseline model proposed by the challenge organizers (which achieved a score of 0.41). These results demonstrate the effectiveness of mixed, ROI-wise, deep, and transfer-learning-based approaches to image-fMRI visual encoding modeling.
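For reference, a sketch of the score as described above: each voxel's squared Pearson correlation is divided by that voxel's noise-ceiling estimate (the challenge distributes such per-voxel estimates), and the median is taken over voxels; aggregation across subjects is omitted here.

```python
import numpy as np

def mnnsc(y_true, y_pred, noise_ceiling):
    """Median Noise Normalized Squared Correlation.

    y_true, y_pred: (n_stimuli, n_voxels); noise_ceiling: (n_voxels,),
    the maximum squared correlation attainable given measurement noise.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / (
        np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0))
    return np.median(r ** 2 / noise_ceiling)
```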

Dataset

The thesis project was developed using the Algonauts Project 2023 Challenge dataset, a large collection of eight subjects' fMRI responses to visual scenes. During the fMRI scans, each subject viewed 9,000-10,000 color images of natural scenes, and the corresponding activations for the 39,548 voxels of the visual cortex were encoded as betas: single-value estimates of the amplitude of the BOLD fMRI response that indirectly represent the stimulus-evoked activation or deactivation of the neurons within a voxel.
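Assuming the challenge's per-subject directory layout (the paths below are illustrative; adapt them to your local copy of the dataset), the betas are distributed as plain NumPy arrays, one per hemisphere:

```python
import os
import numpy as np

data_dir = "algonauts_2023_data/subj01"  # hypothetical local path
fmri_dir = os.path.join(data_dir, "training_split", "training_fmri")

# One (n_stimuli, n_voxels) array of betas per hemisphere.
lh = np.load(os.path.join(fmri_dir, "lh_training_fmri.npy"))
rh = np.load(os.path.join(fmri_dir, "rh_training_fmri.npy"))
betas = np.concatenate([lh, rh], axis=1)  # both hemispheres, all voxels
```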

Requirements

  • Python 3.9.16
  • CUDA Toolkit 11.6
  • cuDNN 8.3.2
  • Pillow 9.2.0
  • NiBabel 5.2.0
  • Nilearn 0.10.3
  • Plotly 5.14.1
  • torch 1.13.0
  • torchvision 0.14.0
  • Transformers 4.31.0
  • PyTorchCV 0.0.67
  • EfficientNet-PyTorch 0.7.1
  • matplotlib 3.5.2
  • numpy 1.22.4
  • pandas 1.5.3
  • scikit-learn 1.1.1
  • scipy 1.7.3
  • tqdm 4.64.1
  • torchmetrics 0.11.4

Status

Project is: Done

Contact

Feel free to contact me!
