Official codebase for **Know Thyself by Knowing Others: Learning Neuron Identity from Population Context**.
> **Note:** We will be updating and cleaning this repository regularly while this notice remains up. Apologies for any inconvenience.
This project was developed on Python 3.10 and uses `venv` to manage the environment.
Use the following utility script to create an environment and install all requirements:
```bash
source utils/venv_setup.sh
```

## 1. Preprocessing datasets
Please follow the steps in `preprocess/README.md`.
## 2. Downloading neuron metadata
Download the metadata (CSV files) about neurons in all four datasets from this link and unzip it into `./neuron_metadata`.
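As a quick sanity check after unzipping, the sketch below (which assumes nothing about the exact CSV filenames) lists every metadata file and its row count:

```python
from pathlib import Path

import pandas as pd

# List every metadata CSV that was unzipped into ./neuron_metadata and report
# how many rows (neurons) and which columns each file contains.
for csv_path in sorted(Path("neuron_metadata").rglob("*.csv")):
    df = pd.read_csv(csv_path)
    print(f"{csv_path}: {len(df)} rows, columns: {list(df.columns)}")
```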
## 3. Training

To train on the ephys datasets (IBL, Allen, Steinmetz et al.):
```bash
python train.py --config-name train_ephys \
    data=<data-config> \
    batch_size=128 \
    num_epochs=<num_epochs>
```

- Options for `<data-config>` can be found in `configs/data/*.yaml`, e.g. `data=ibl_bwm_probes_dev`.
- Set `num_epochs` such that the total number of training steps is roughly 50,000 (see the sketch after this list).
- Checkpoints are stored in `../ckpt` by default.
- Other available configurations can be found in `configs/train_ephys.yaml`.
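The README does not spell out how to turn the 50,000-step target into `num_epochs`, so here is a rough back-of-the-envelope sketch. It assumes one optimizer step per batch; `num_samples` is a hypothetical count that you should replace with the size of your training split:

```python
import math

def suggest_num_epochs(num_samples: int, batch_size: int = 128, target_steps: int = 50_000) -> int:
    """Pick num_epochs so that steps_per_epoch * num_epochs is roughly target_steps."""
    # Assumes one optimizer step per batch: steps_per_epoch = ceil(num_samples / batch_size).
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return max(1, round(target_steps / steps_per_epoch))

# Example with a hypothetical training split of 40,000 samples.
print(suggest_num_epochs(num_samples=40_000))  # -> 160
```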
To train on calcium imaging data (Bugeon et al.):
```bash
python train.py --config-name train_ca \
    data=<data-config> batch_size=128 num_epochs=<num_epochs>
```

- Options for `<data-config>` can be found in `configs/data/*.yaml`, e.g. `data=bugeon_dev`.
- Set `num_epochs` such that the total number of training steps is roughly 50,000 (as in the sketch above).
- Checkpoints are stored in `../ckpt` by default.
- Other available configurations can be found in `configs/train_ca.yaml`.
## 4. Forward pass for final embeddings

A final forward pass over the entire dataset is needed to get the embeddings from a particular checkpoint. The training script prints a `run_id` for the corresponding run. Use it in the following command:
```bash
bash utils/forward_all_epochs.sh <run_id> <data-config-name> [batch_size] [epoch_stride]
```

This stores the embeddings in `../embs/<run_id>/embs_epoch_*.pt`, depending on the `run_id` and the epoch numbers of the checkpoints used.
In most cases you will want to use the "transductive" version of each dataset, since we want to compute embeddings for all neurons here.
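To take a quick look at the saved embeddings afterwards, here is a minimal sketch. It assumes each `embs_epoch_*.pt` file can be opened with `torch.load`; the exact file layout (tensor vs. dict) is not documented here, so the branches below are only a guess:

```python
from pathlib import Path

import torch

run_id = "<run_id>"  # hypothetical placeholder; use the run_id printed by train.py

# Load every per-epoch embedding file written by utils/forward_all_epochs.sh.
for emb_path in sorted(Path("../embs", run_id).glob("embs_epoch_*.pt")):
    embs = torch.load(emb_path, map_location="cpu")
    if torch.is_tensor(embs):
        print(f"{emb_path.name}: tensor of shape {tuple(embs.shape)}")
    else:
        print(f"{emb_path.name}: {type(embs).__name__} with entries {list(embs)[:5]} ...")
```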
## 5. Run evaluation on the produced embeddings

Evaluation notebooks are provided and documented in the `eval_notebooks/` directory.
If you find this repository useful in your research, please consider giving it a star ⭐ and a citation:
```bibtex
@inproceedings{
  arora2025nuclr,
  title={Know Thyself by Knowing Others: Learning Neuron Identity from Population Context},
  author={Vinam Arora and Divyansha Lachi and Ian J Knight and Mehdi Azabou and Blake Richards and Cole Hurwitz and Joshua H Siegle and Eva L Dyer},
  booktitle={Thirty-ninth Conference on Neural Information Processing Systems},
  year={2025},
  url={https://neurips.cc/virtual/2025/poster/115008}
}
```