Project investigating human and artificial neural representations of code.
This branch is currently under development and should be considered unstable. To replicate specific papers, `git checkout` the corresponding branch, e.g., `NeurIPS2022`, and follow the instructions in its `README.md`.
This pipeline supports several major functions.
- MVPA (multivariate pattern analysis) evaluates decoding of code properties or code model representations from their respective brain representations within a collection of canonical brain regions.
- RSA (representational similarity analysis) is also supported as an alternative to MVPA.
- VWEA (voxel-wise encoding analysis) evaluates prediction of voxel-level activation patterns using code properties and code model representations as features.
- NLEA (network-level encoding analysis) uses the same features to evaluate encoding of mean network-level activation strength.
- PRDA (program representation decoding analysis) evaluates decoding of code properties from code model representations.
- PREA (program representation encoding analysis) evaluates encoding of code model representations using the set of code properties explored in this work.
Note: VWEA and NLEA also support ceiling estimates at the network level, calculated via an identical pipeline but with the features being the representations of other participants to the same stimuli rather than the properties extracted from those stimuli. To invoke a ceiling analysis, prefix the requested analysis type with a "C", e.g., CNLEA.
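To make the decoding setting concrete, MVPA at its core fits a cross-validated classifier that predicts a stimulus property from voxel patterns. The sketch below is purely illustrative (synthetic data and scikit-learn), not this repository's internal implementation:

```python
# Illustrative MVPA-style decoding sketch (NOT this repo's API):
# predict a code property (e.g., a 3-way structure label) from voxel patterns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_programs, n_voxels = 72, 500
X = rng.normal(size=(n_programs, n_voxels))  # brain response per program
y = rng.integers(0, 3, size=n_programs)      # property labels, e.g., seq/for/if
X[y == 1] += 0.5                             # inject a weak class signal

# Cross-validated decoding accuracy; above-chance scores indicate the
# property is linearly decodable from the voxel patterns.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(round(scores.mean(), 3))
```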
Brain Networks
- `brain-md_lh` (Multiple Demand Network: Left Hemisphere)
- `brain-md_rh` (Multiple Demand Network: Right Hemisphere)
- `brain-lang_lh` (Language Network: Left Hemisphere)
- `brain-lang_rh` (Language Network: Right Hemisphere)
Code Properties
- `task-structure` (seq vs. for vs. if) *ControlFlow*
- `task-content` (math vs. str) *DataType*
- `task-nodes` (# of nodes in AST) *ASTNodes*
- `task-lines` (# of runtime steps during execution) *LinesExecuted*
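Static properties like the AST node count can be extracted directly from program text. A minimal sketch using Python's standard `ast` module, in the spirit of `task-nodes` (the repo's exact extraction may differ):

```python
# Count AST nodes for a small program (illustrative of the task-nodes
# property; not necessarily this repo's exact extraction).
import ast

def count_ast_nodes(source: str) -> int:
    """Number of nodes in the program's abstract syntax tree."""
    return sum(1 for _ in ast.walk(ast.parse(source)))

program = "total = 0\nfor x in range(3):\n    total = total + x\n"
print(count_ast_nodes(program))
```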
Code Models
Baseline:
- `code-tokens` (arbitrary projection encoding presence of individual tokens)

LLM Suite (CodeGen1):
- `code-llm_350m_nl`
- `code-llm_2b_nl`
- `code-llm_6b_nl`
- `code-llm_16b_nl`
- `code-llm_350m_mono`
- `code-llm_2b_mono`
- `code-llm_6b_mono`
- `code-llm_16b_mono`

Note: checkpoints vary in size and pre-training data (`nl`: ThePile; `mono`: ThePile+BigQuery+BigPython)
Requirements: Anaconda, GNU Make
```bash
git clone --branch main --depth 1 https://github.com/benlipkin/braincode
cd braincode
make setup
```
```
usage: __main__.py [-h] [-f FEATURE] [-t TARGET] [-m METRIC]
                   [-d CODE_MODEL_DIM] [-p BASE_PATH] [-s] [-b]
                   {mvpa,rsa,vwea,nlea,cvwea,cnlea,prda,prea}

run specified analysis type

positional arguments:
  {mvpa,rsa,vwea,nlea,cvwea,cnlea,prda,prea}

optional arguments:
  -h, --help            show this help message and exit
  -f FEATURE, --feature FEATURE
  -t TARGET, --target TARGET
  -m METRIC, --metric METRIC
  -d CODE_MODEL_DIM, --code_model_dim CODE_MODEL_DIM
  -p BASE_PATH, --base_path BASE_PATH
  -s, --score_only
  -b, --debug
```

Note: `BASE_PATH` must be specified to match `setup.sh` if changed from the default.
Sample calls

```bash
# basic examples
python -m braincode mvpa -f brain-md_lh -t task-structure     # brain -> {task, model}
python -m braincode rsa -f brain-lang_lh -t code-llm_2b_nl    # brain <-> {task, model}
python -m braincode vwea -f brain-md_rh -t code-tokens        # brain <- {task, model}
python -m braincode nlea -f brain-lang_rh -t task-content     # brain <- {task, model}
python -m braincode prda -f code-llm_350m_mono -t task-lines  # model -> task
python -m braincode prea -f code-tokens -t task-content       # model <- task

# more complex examples
python -m braincode cnlea -f all -m SpearmanRho --score_only  # check metrics module for all options
python -m braincode mvpa -f brain-lang_lh+brain-lang_rh -t code-tokens -d 64 -p $BASE_PATH
python -m braincode vwea -t task-content+task-structure+task-nodes+task-lines

# note how the `+` operator can be used to join multiple representations via concatenation
```
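For intuition about the RSA analyses (and metrics such as SpearmanRho), representational similarity analysis correlates the pairwise-dissimilarity structure of two representation spaces over the same stimuli. A minimal NumPy/SciPy sketch of that computation (illustrative only, not this repository's implementation):

```python
# Minimal RSA sketch (illustrative; not this repo's implementation):
# correlate the representational dissimilarity matrices (RDMs) of two
# representation spaces over the same stimuli.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

n_stimuli = 20
brain = rng.normal(size=(n_stimuli, 100))   # e.g., voxel responses per stimulus
model = brain @ rng.normal(size=(100, 64))  # correlated model features

# RDM = condensed vector of pairwise distances between stimulus responses.
rdm_brain = pdist(brain, metric="correlation")
rdm_model = pdist(model, metric="correlation")

# Spearman correlation between the RDMs quantifies shared geometry.
rho, _ = spearmanr(rdm_brain, rdm_model)
print(round(rho, 3))
```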
If you use this work, please cite XXX (under review)