<div align="center">
<h1> MetaBCR unveils a germinal center-derived atypical memory B cell subset expressing broadly neutralizing antibodies </h1>


<a href="https://github.com/jianqingzheng/meta_bcr"><img src="https://img.shields.io/github/stars/jianqingzheng/meta_bcr?style=social&label=Code+★" /></a>
\|
[![Explore MetaBCR in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jianqingzheng/meta_bcr/blob/main/meta_bcr.ipynb)
</div>


Code for the paper [MetaBCR unveils a germinal center-derived atypical memory B cell subset expressing broadly neutralizing antibodies]()

> This implementation includes the training and inference pipelines of MetaBCR, implemented in PyTorch.

---
#### Contents ####
- 1. Installation
- 2. Usage
  - 2.1. Training (optional)
  - 2.2a. Inference by entering data
  - 2.2b. Batch Inference
- 3. Citing this work
---

In [1]:
#@title 1. Installation {run: "auto"}
#@markdown Clone code from Github repo: https://github.com/jianqingzheng/meta_bcr.git

!git clone https://github.com/jianqingzheng/meta_bcr.git
%cd meta_bcr/

#@markdown and install the packages

#@markdown > `pytorch==1.12.1` was the version originally used; a different version is installed here because of Colab compatibility issues.\
#@markdown > Other versions of the packages may also work.


Cloning into 'meta_bcr'...
remote: Enumerating objects: 179, done.
remote: Counting objects: 100% (179/179), done.
remote: Compressing objects: 100% (80/80), done.
remote: Total 179 (delta 127), reused 132 (delta 94), pack-reused 0 (from 0)
Receiving objects: 100% (179/179), 124.74 KiB | 3.37 MiB/s, done.
Resolving deltas: 100% (127/127), done.
/content/meta_bcr


## 2. Usage

\* Setup
```
[$DOWNLOAD_DIR]/meta_bcr/
├── Analysis/
|   └── ...
├── Config/
|   |   # configuration files (.json)
|   └── config_[$data_name].json
|   └── ...
├── Data/
|   ├── /
|   └── ...
├── External/
|   ├── prot_bert/
|   └── ...
├── MetaBCR/
|   ├── /
|   └── ...
├── Models/
|   └── ...
└── ...
```
> The default model can also be downloaded from [Model](https://drive.google.com/drive/folders/1om6Rt9kvjuebvVd3TrouVkCuTKVWYAjX)
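
Before training or inference, it can help to confirm that the clone matches the layout above. The following is a minimal sketch, not part of the repository: the folder names are taken from the tree, while `missing_dirs` and `EXPECTED_DIRS` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Top-level folders copied from the layout above; the check itself is illustrative.
EXPECTED_DIRS = ("Analysis", "Config", "Data", "External", "MetaBCR", "Models")


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


# "meta_bcr" is the folder created by `git clone` in the installation step.
absent = missing_dirs("meta_bcr")
print("Missing folders:", absent if absent else "none")
```

If `External/` or `Models/` is reported as missing, the pretrained weights have probably not been downloaded yet.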

### 2.1. Training (optional)
1. Upload the experimental data to `/content/XBCR-net/data/binding/exper/` and the non-experimental data to `/content/XBCR-net/data/binding/nonexp/`

2. Run
```!python main_train.py --model_name XBCR_net --data_name binding --model_num $model_num --max_epochs $max_epochs --include_light [1/0]```

3. Check the saved model in `/content/XBCR-net/models/binding/binding-XBCR_net/`

<div align="center">

| Argument          | Description                                |
| ----------------- | ------------------------------------------ |
| `--data_name`     | Name of the data folder                    |
| `--model_name`    | Name of the model to use                   |
| `--model_num`     | Index number of the trained model          |
| `--max_epochs`    | Maximum number of training epochs          |
| `--include_light` | 1/0: include/exclude the light-chain input |

</div>
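
The arguments in the table can also be assembled programmatically, which avoids typos when several runs are launched. This is a hedged sketch: `build_train_command` is a hypothetical helper, the flag names come from the table above, and `max_epochs=100` is only an illustrative value.

```python
import shlex


def build_train_command(model_name, data_name, model_num, max_epochs, include_light=True):
    """Assemble the main_train.py call from the arguments listed in the table."""
    return [
        "python", "main_train.py",
        "--model_name", str(model_name),
        "--data_name", str(data_name),
        "--model_num", str(int(model_num)),
        "--max_epochs", str(int(max_epochs)),
        # The script expects 1 or 0 rather than True or False.
        "--include_light", "1" if include_light else "0",
    ]


# max_epochs=100 is an arbitrary example value, not a recommended setting.
cmd = build_train_command("XBCR_net", "binding", model_num=0, max_epochs=100)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` outside a notebook.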

In [3]:
#@markdown \* Example for training (optional):

data_name = 'flu-bind' #@param ['flu-bind','flu-neu','sars-bind','sars-neu']

!python train_semi_supervise.py --dataset {data_name}

2025-04-20 22:29:47.962312: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1745188188.002084    1632 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1745188188.013972    1632 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-20 22:29:48.053324: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO [ Config.config.get_config ] : Parameters from config file
    train_mode: flu-bind
    model: XBCR_ACNN
    batch_sz: 1
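
The log above shows that the training script reads its parameters from a JSON file in `Config/`. A minimal sketch for inspecting such a file follows. It assumes that keys such as `train_mode`, `model` and `batch_sz` sit at the top level of the JSON, which the log suggests but does not confirm; `load_config` and `describe_config` are hypothetical helpers, not functions from the repository.

```python
import json
from pathlib import Path


def load_config(config_path):
    """Read a JSON configuration file and return its content as a dict."""
    with Path(config_path).open("r", encoding="utf-8") as handle:
        return json.load(handle)


def describe_config(config, keys=("train_mode", "model", "batch_sz")):
    """Format selected entries in the style of the 'Parameters from config file' log."""
    return [f"{key}: {config.get(key, '<not set>')}" for key in keys]


# Stand-in values copied from the log above, used here instead of a real file.
example = {"train_mode": "flu-bind", "model": "XBCR_ACNN", "batch_sz": 1}
for line in describe_config(example):
    print(line)
```

To inspect a real configuration, pass the path of one of the `.json` files under `Config/` to `load_config`.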

### 2.2a. Inference by entering data ###

In [None]:
#@markdown \* Example for a single data point:

HEAVY='VQLVESGGGLVQPGGSLRLSCAASGFTFSSYDMHWVRQTTGKGLEWVSTIGTAGDTYYPDSVKGRFTISREDAKNSLYLQMNSLRAGDTAVYYCARGDSSGYYYYFDYWGQGTLLTVSS' #@param {type:"string"}
LIGHT='DIEMTQSPSSLSAAVGDRVTITCRASQSIGSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFAIYYCQQSYVSPTYTFGPGTKVDIK'      #@param {type:"string"}
ANTIG='RVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNF' #@param {type:"string"}

#@markdown > Set `LIGHT=""` or `LIGHT="_"` to exclude the light-chain input.

if LIGHT=='' or LIGHT=='_' or LIGHT=='*' or LIGHT==',':
  LIGHT = '_'

!python pred_bcr.py --heavy $HEAVY --light {LIGHT} --antig $ANTIG --model_name XBCR_net --data_name binding --model_num 0


In [None]:
#@markdown \* Example for multiple data points (split by ','):

HEAVY='VQLVESGGGLVQPGGSLRLSCAASGFTFSSYDMHWVRQTTGKGLEWVSTIGTAGDTYYPDSVKGRFTISREDAKNSLYLQMNSLRAGDTAVYYCARGDSSGYYYYFDYWGQGTLLTVSS,EVQLVESGGGLVQPGGSLRLSCAASGFTFNNYWMSWVRQAPGKGLEWVANINQDGSEKYYVDSVMGRFAISRDNAKNSLYLQMNSLRAEDTAVYYCARDQGYGDYFEYNWFDPWGQGTLVTVSS' #@param {type:"string"}
LIGHT='DIEMTQSPSSLSAAVGDRVTITCRASQSIGSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFAIYYCQQSYVSPTYTFGPGTKVDIK,DIQLTQSPSFLSASVGDRVTITCRASQGIYSYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTEFTLTISSLQPEDFATYYCQQLNSYPITFGQGTRLEIK' #@param {type:"string"}
ANTIG='RVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNF,RVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNF' #@param {type:"string"}

#@markdown > Set `LIGHT="XXX, ,XXX"` or `LIGHT="XXX,_,XXX"` to exclude the light chain for individual entries.\
#@markdown > Spaces (' '), underscores ('_') and line breaks ('\n') are not treated as part of a sequence.

if LIGHT=='' or LIGHT=='_' or LIGHT=='*' or LIGHT==',':
  LIGHT = '_'

!python pred_bcr.py --heavy $HEAVY --light $LIGHT --antig $ANTIG --model_name XBCR_net --data_name binding --model_num 0


<div align="center">

| Argument       | Description                      |
| -------------- | -------------------------------- |
| `--heavy`      | Heavy-chain sequence(s)          |
| `--light`      | Light-chain sequence(s)          |
| `--antig`      | Antigen sequence(s)              |
| `--data_name`  | Name of the data folder          |
| `--model_name` | Name of the model to use         |
| `--model_num`  | Index number of the model to use |

</div>
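
The input conventions described in the notes above (comma-separated entries, with spaces, underscores and line breaks ignored) can be illustrated with a short parser. This is a sketch under assumptions: how `pred_bcr.py` actually parses its arguments is not shown here, and both the element-wise pairing of heavy chain, light chain and antigen and the helper names `split_sequences` and `pair_inputs` are introduced for illustration only.

```python
def split_sequences(field):
    """Split a comma-separated field into sequences with ignored characters removed."""
    cleaned = []
    for item in field.split(","):
        # Spaces, underscores and line breaks are not part of a sequence.
        cleaned.append("".join(ch for ch in item if ch not in " _\n\r\t"))
    return cleaned


def pair_inputs(heavy, light, antig):
    """Pair the i-th heavy chain, light chain and antigen (assumed behavior)."""
    heavies = split_sequences(heavy)
    lights = split_sequences(light)
    antigens = split_sequences(antig)
    # A single empty light field means "no light chain" for every entry.
    if lights == [""]:
        lights = [""] * len(heavies)
    if not (len(heavies) == len(lights) == len(antigens)):
        raise ValueError("heavy, light and antig must contain the same number of entries")
    return list(zip(heavies, lights, antigens))


# Short placeholder strings stand in for real amino-acid sequences.
for h, l, a in pair_inputs("VQLV ESGG,EVQL", "DIEM,_", "RVQP,RVQP"):
    print(f"heavy={h} light={l or '<none>'} antigen={a}")
```

An empty string in the light-chain position marks an entry that is predicted from the heavy chain and antigen alone.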

### 2.2b. Batch Inference ###
1. Upload the antibody files to `/content/XBCR-net/data/binding/ab_to_pred/` and the antigen files to `/content/XBCR-net/data/binding/ag_to_pred/`

2. Run
```!python main_infer.py --model_name XBCR_net --data_name binding --model_num $model_num --include_light [1/0]```

3. Download the resulting Excel file from `/content/XBCR-net/data/binding/test/results/*`

<div align="center">

| Argument          | Description                                |
| ----------------- | ------------------------------------------ |
| `--data_name`     | Name of the data folder                    |
| `--model_name`    | Name of the model to use                   |
| `--model_num`     | Index number of the trained model          |
| `--include_light` | 1/0: include/exclude the light-chain input |

</div>

In [None]:
#@markdown \* Example for batch inference:

model_name = 'XBCR_net' #@param {type:"string"}
data_name = 'binding' #@param {type:"string"}
model_num = 0     #@param {type:"integer"}
include_light = True #@param {type:"boolean"}
include_light = int(include_light)

!python main_infer.py --model_name {model_name} --data_name {data_name} --model_num {model_num} --include_light {include_light}


In [None]:
#@markdown \* Download the result file from `/content/XBCR-net/data/binding/test/results/`.

from google.colab import files
import os
download_path = os.path.join('data',data_name,'test','results','results_rbd_'+model_name+'-'+str(model_num)+'.xlsx')
files.download(download_path)
print('Download the file: '+download_path)

## 3. Citing this work

Any publication that discloses findings arising from using this source code or the network model should cite

- Hantao Lou, Jianqing Zheng, Xiaohang Leo Fang, Zhu Liang, Meihan Zhang, Yu Chen, Chunmei Wang, Xuetao Cao, "Deep learning-based rapid generation of broadly reactive antibodies against SARS-CoV-2 and its Omicron variant." *Cell Research* 33.1 (2023): 80-82.

```bibtex
@article{lou2022deep,
  title={Deep learning-based rapid generation of broadly reactive antibodies against SARS-CoV-2 and its Omicron variant},
  author={Lou, Hantao and Zheng, Jianqing and Fang, Xiaohang Leo and Liang, Zhu and Zhang, Meihan and Chen, Yu and Wang, Chunmei and Cao, Xuetao},
  journal={Cell Research},
  volume={33},
  number={1},
  pages={80--82},
  year={2023},
  publisher={Nature Publishing Group},
  doi={10.1038/s41422-022-00727-6},
}
```
