Official codebase for **Know Thyself by Knowing Others: Learning Neuron Identity from Population Context**.
> **Note:** We will be updating and cleaning this repository regularly while this notice remains up. Apologies for any inconvenience.
This project was developed on Python 3.10 and uses `venv` to manage the environment.
Use the following utility script to create an environment and install all requirements:
```bash
source utils/venv_setup.sh
```

## 1. Preprocessing datasets
Please follow the steps in `preprocess/README.md`.
## 2. Downloading neuron metadata
Download the metadata (CSV files) about neurons in all four datasets from this link and unzip it into `./neuron_metadata`.
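As a quick sanity check after unzipping, the sketch below (which assumes nothing about the exact CSV filenames) lists every metadata file and its row count:

```python
from pathlib import Path

import pandas as pd

# List every metadata CSV that was unzipped into ./neuron_metadata and report
# how many rows (neurons) and which columns each file contains.
for csv_path in sorted(Path("neuron_metadata").rglob("*.csv")):
    df = pd.read_csv(csv_path)
    print(f"{csv_path}: {len(df)} rows, columns: {list(df.columns)}")
```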
## 3. Training

To train on the ephys datasets (IBL, Allen, Steinmetz et al.):
```bash
python train.py --config-name train_ephys \
    data=<data-config> \
    batch_size=128 \
    num_epochs=<num_epochs>
```

- Options for `<data-config>` can be found in `configs/data/*.yaml`, e.g. `data=ibl_bwm_probes_dev`.
- Set `num_epochs` such that the total number of training steps is roughly 50,000 (see the sketch after this list).
- Checkpoints are stored in `../ckpt` by default.
- Other available configurations can be found in `configs/train_ephys.yaml`.
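The README does not spell out how to turn the 50,000-step target into `num_epochs`, so here is a rough back-of-the-envelope sketch. It assumes one optimizer step per batch; `num_samples` is a hypothetical count that you should replace with the size of your training split:

```python
import math

def suggest_num_epochs(num_samples: int, batch_size: int = 128, target_steps: int = 50_000) -> int:
    """Pick num_epochs so that steps_per_epoch * num_epochs is roughly target_steps."""
    # Assumes one optimizer step per batch: steps_per_epoch = ceil(num_samples / batch_size).
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return max(1, round(target_steps / steps_per_epoch))

# Example with a hypothetical training split of 40,000 samples.
print(suggest_num_epochs(num_samples=40_000))  # -> 160
```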
To train on calcium imaging data (Bugeon et al.):
```bash
python train.py --config-name train_ca \
    data=<data-config> batch_size=128 num_epochs=<num_epochs>
```

- Options for `<data-config>` can be found in `configs/data/*.yaml`, e.g. `data=bugeon_dev`.
- Set `num_epochs` such that the total number of training steps is roughly 50,000 (as in the sketch above).
- Checkpoints are stored in `../ckpt` by default.
- Other available configurations can be found in `configs/train_ca.yaml`.
## 4. Forward pass for final embeddings

A final forward pass over the entire dataset is needed to get the embeddings from a particular checkpoint. The training script prints a `run_id` for the corresponding run. Use it in the following command:
```bash
bash utils/forward_all_epochs.sh <run_id> <data-config-name> [batch_size] [epoch_stride]
```

This stores the embeddings in `../embs/<run_id>/embs_epoch_*.pt`, depending on the `run_id` and the epoch numbers of the checkpoints used.
In most cases you will want to use the "transductive" version of each dataset, since we want to compute embeddings for all neurons here.
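To take a quick look at the saved embeddings afterwards, here is a minimal sketch. It assumes each `embs_epoch_*.pt` file can be opened with `torch.load`; the exact file layout (tensor vs. dict) is not documented here, so the branches below are only a guess:

```python
from pathlib import Path

import torch

run_id = "<run_id>"  # hypothetical placeholder; use the run_id printed by train.py

# Load every per-epoch embedding file written by utils/forward_all_epochs.sh.
for emb_path in sorted(Path("../embs", run_id).glob("embs_epoch_*.pt")):
    embs = torch.load(emb_path, map_location="cpu")
    if torch.is_tensor(embs):
        print(f"{emb_path.name}: tensor of shape {tuple(embs.shape)}")
    else:
        print(f"{emb_path.name}: {type(embs).__name__} with entries {list(embs)[:5]} ...")
```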
## 5. Run evaluation on the produced embeddings

Evaluation notebooks are provided and documented in the `eval_notebooks/` directory.
If you find this repository useful in your research, please consider giving it a star ⭐ and a citation:
```bibtex
@inproceedings{
  arora2025nuclr,
  title={Know Thyself by Knowing Others: Learning Neuron Identity from Population Context},
  author={Vinam Arora and Divyansha Lachi and Ian J Knight and Mehdi Azabou and Blake Richards and Cole Hurwitz and Joshua H Siegle and Eva L Dyer},
  booktitle={Thirty-ninth Conference on Neural Information Processing Systems},
  year={2025},
  url={https://neurips.cc/virtual/2025/poster/115008}
}
```