# A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation

## Summary
The success of the self-attention mechanism in classical machine learning models has inspired the development of quantum analogs aimed at reducing computational overhead. Self-attention uses learnable query and key matrices to calculate attention scores between all pairs of tokens in a sequence. These scores are then multiplied with a learnable value matrix to obtain the output self-attention matrix, enabling the model to effectively capture long-range dependencies within the input sequence. Here, we propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder, the architecture underlying Large Language Models (LLMs). To demonstrate its utility in chemistry, we train this model on the QM9 dataset for conditional generation, providing SMILES strings as input, each labeled with a set of physicochemical properties that serve as conditions during inference. Our theoretical analysis shows that the time complexity of the query-key dot product is reduced from $\mathcal{O}(n^2 d)$ in a classical model to $\mathcal{O}(n^2\log d)$ in our quantum model, where $n$ and $d$ represent the sequence length and embedding dimension, respectively. We perform simulations using NVIDIA's CUDA-Q platform, which is designed for efficient GPU scalability. This work provides an avenue for quantum-enhanced Natural Language Processing (NLP).
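
For orientation, the classical baseline referred to above can be sketched in a few lines. This is a minimal, dependency-free illustration of causal scaled dot-product self-attention, not the code used in this repository. The function name `self_attention` and the square `d x d` weight matrices are assumptions made for the example. The nested loop over token pairs, each costing a `d`-term dot product, is where the classical $\mathcal{O}(n^2 d)$ cost comes from.

```python
import math


def self_attention(x, w_q, w_k, w_v):
    """Causal scaled dot-product self-attention on plain Python lists.

    x: n token embeddings, each of dimension d.
    w_q, w_k, w_v: d x d query, key, and value weight matrices (illustrative).
    """
    def matmul(a, b):
        return [[sum(row[k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for row in a]

    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Decoder (causal) masking: token i attends only to tokens 0..i.
        # Each score is a d-term dot product, giving O(n^2 d) in total.
        scores = [sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j][t] for j, w in enumerate(weights))
                    for t in range(d)])
    return out
```

Calling `self_attention(x, w_q, w_k, w_v)` with `n` embeddings returns `n` output vectors of the same dimension. The hybrid model described here targets only the query-key score computation inside the inner loop.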

## Installation

**Note**: If running on Google Colab, you must be connected to a GPU runtime to ensure this notebook works properly.

In [None]:
!git clone https://github.com/anthonysmaldone/Quantum-Transformer.git
%pip install cudaq==0.9.1
%pip install rdkit==2024.9.4
%pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
%pip install pandas==2.2.2
%pip install torchdata==0.10.1
%pip install tqdm==4.67.1
%pip install scikit-learn==1.5.1
%pip install seaborn==0.13.2
%pip install gdown==5.2.0

%cd Quantum-Transformer

## Reproducing Results & Figures

This repository provides a script, `reproduce.py`, to facilitate the reproduction of key results from the paper, including model training, figure generation, and inference.

The trained checkpoint files for the best and last (20th) epoch of each model are located in `model_checkpoints/`. The figure and inference reproducibility functions use these models to recreate the data from the paper.

### **1. Generate Figures**
To generate the figures used in the paper, use the `--figures` flag. No additional options are required.

In [None]:
!python reproduce.py --figures

### **2. Run Inference**

To perform inference on a trained model, use the `--inference-results` flag along with the required `--model` and `--mode` options.

Model Options (`--model`)
- `quantum` – The quantum model.
- `classical_eq` – The classical equivalent model.
- `classical` – The standard classical model.

Training Mode Options (`--mode`)
- `sequence` – Model trained on sequence-only data.
- `conditions` – Model trained on both sequence and condition data.

```
python reproduce.py --inference-results --model <MODEL_TYPE> --mode <MODE>
```

### Examples
- Run inference on the pre-trained classical model with SMILES data only:

In [None]:
!python reproduce.py --inference-results --model classical --mode sequence

- Run inference on the pre-trained quantum model with SMILES and physicochemical properties:

In [None]:
!python reproduce.py --inference-results --model quantum --mode conditions
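
To cover every combination of the `--model` and `--mode` values listed above, the commands can be generated in a loop. The sketch below is a convenience wrapper, not part of the repository. The helper names `build_commands` and `run_all` are hypothetical, and it assumes the script is launched from the repository root and that a checkpoint exists for each combination.

```python
import itertools
import subprocess
import sys

MODELS = ["quantum", "classical_eq", "classical"]
MODES = ["sequence", "conditions"]


def build_commands(flag="--inference-results"):
    """Return one reproduce.py command (as an argument list) per model/mode pair."""
    return [
        [sys.executable, "reproduce.py", flag, "--model", model, "--mode", mode]
        for model, mode in itertools.product(MODELS, MODES)
    ]


def run_all(flag="--inference-results", dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(flag):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
```

`run_all()` only prints the six commands, while `run_all(dry_run=False)` executes them one after another.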

### **3. Train Models**
To re-train a model, use the `--train-models` flag along with the required `--model` and `--mode` options as explained above.

```
python reproduce.py --train-models --model <MODEL_TYPE> --mode <MODE>
```

**Note**: These models were trained on Perlmutter using 4 NVIDIA A100 GPUs, and exact numerical reproducibility is not guaranteed across hardware architectures. Results may therefore differ slightly when running the inference and training reproducibility functions.

## Usage
A notebook tutorial is coming soon, and documentation on available functions can be found in `docs/`.

We have provided the pretrained model files to be used for inference.

### Inference

If a model was trained with molecular property embeddings, the `generate_smiles` function can sample from it conditionally, with target molecular properties passed as keyword arguments. The following properties can be specified:

- MW (molecular weight)
- HBA (number of hydrogen bond acceptors) 
- HBD (number of hydrogen bond donors) 
- nRot (number of rotatable bonds) 
- nRing (number of rings)
- nHet (number of heteroatoms)
- TPSA (topological polar surface area)
- LogP (octanol-water partition coefficient)
- StereoCenters (number of stereocenters)

Below is an example of a sampling experiment in which molecules are sampled from the quantum model with a target molecular weight of 120:

In [None]:
from src.analysis import generate_smiles

valid, unique, novel = generate_smiles(
    checkpoint_path="./model_checkpoints/quantum_conditions/model_epoch_20.pt",
    save_dir="./generated_molecules/quantum_conditions_MW_120.csv",
    num_of_model_queries=1000,
    sampling_batch_size=250,
    imputation_dataset_path="./dataset/train_dataset.csv",
    dataset_novelty_check_path="./dataset/train_dataset.csv",
    device="gpu",
    MW=120,
)

import pandas as pd
df = pd.read_csv('./generated_molecules/quantum_conditions_MW_120.csv')
df.head()
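
The names `valid`, `unique`, and `novel` suggest the validity, uniqueness, and novelty metrics commonly reported for molecular generators. This README does not state whether `generate_smiles` returns them as fractions, percentages, or counts, so the sketch below only illustrates the usual definitions. The function `generation_metrics` is hypothetical and assumes the SMILES strings have already been validated and canonicalized (for example with RDKit).

```python
def generation_metrics(valid_smiles, num_generated, training_smiles):
    """Return (validity, uniqueness, novelty) as fractions in [0, 1].

    valid_smiles: canonical SMILES of the generated molecules that parsed.
    num_generated: total number of model queries, valid or not.
    training_smiles: canonical SMILES used for the novelty check.
    """
    unique = set(valid_smiles)
    novel = unique - set(training_smiles)
    # Validity: share of all generated strings that are valid molecules.
    validity = len(valid_smiles) / num_generated if num_generated else 0.0
    # Uniqueness: share of valid molecules that are distinct.
    uniqueness = len(unique) / len(valid_smiles) if valid_smiles else 0.0
    # Novelty: share of distinct molecules absent from the training set.
    novelty = len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty
```

Comparing canonical strings matters here, because the same molecule can be written as several different SMILES strings.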

### Training

The `train_transformer` function trains a transformer model with either classical or quantum attention mechanisms. It supports various hyperparameters to customize training, including learning rate, batch size, number of epochs, and quantum-specific settings such as the number of qubits and quantum gradient calculation.

During training, the parameters for each epoch are saved in the specified `checkpoint_dir`, along with `most_recent_batch.pt` if the `save_every_n_batches` argument is specified.
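
When resuming an interrupted run, it can be handy to locate the newest epoch checkpoint automatically and pass it as `checkpoint_resume_path`. The sketch below assumes the per-epoch files follow the `model_epoch_<N>.pt` naming seen in the shipped checkpoints (for example `model_epoch_20.pt`). The helper `latest_epoch_checkpoint` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def latest_epoch_checkpoint(checkpoint_dir):
    """Return the path of the highest-numbered model_epoch_<N>.pt file, or None."""
    pattern = re.compile(r"model_epoch_(\d+)\.pt$")
    best = None
    for path in Path(checkpoint_dir).glob("model_epoch_*.pt"):
        match = pattern.search(path.name)
        if match is None:
            continue
        epoch = int(match.group(1))
        # Compare numerically so that epoch 10 ranks above epoch 2.
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return None if best is None else str(best[1])
```

The returned string can then be supplied as `checkpoint_resume_path`. A return value of `None` matches the fresh-start setting used in the training examples in this section.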

The example below trains a fully classical model whose number of parameters equals that of the quantum model. This is triggered by setting the `classical_parameter_reduction` argument to `True`.

In [None]:
from src.train import train_transformer

train_transformer(
    training_data="./dataset/qm9.csv",
    checkpoint_dir="./checkpoints/classical_example/",
    checkpoint_resume_path=None,
    learning_rate=0.005,
    weight_decay=0.1,
    batch_size=256,
    epochs=3,
    save_every_n_batches=0,
    validation_split=0.05,
    attn_type="classical",
    num_qubits=6,
    ansatz_layers=1,
    conditional_training=False,
    quantum_gradient_method="spsa",
    spsa_epsilon=0.01,
    sample_percentage=1.0,
    seed=42,
    classical_parameter_reduction=True,
    device="gpu",
    qpu_count=-1,
)

To train the model with molecular property embeddings, set `conditional_training` to `True`:

In [None]:
from src.train import train_transformer

train_transformer(
    training_data="./dataset/qm9.csv",
    checkpoint_dir="./checkpoints/classical_example_conditions/",
    checkpoint_resume_path=None,
    learning_rate=0.005,
    weight_decay=0.1,
    batch_size=256,
    epochs=3,
    save_every_n_batches=0,
    validation_split=0.05,
    attn_type="classical",
    num_qubits=6,
    ansatz_layers=1,
    conditional_training=True,
    quantum_gradient_method="spsa",
    spsa_epsilon=0.01,
    sample_percentage=1.0,
    seed=42,
    classical_parameter_reduction=True,
    device="gpu",
    qpu_count=-1,
)

The quantum model can be trained by switching the `attn_type` argument to `'quantum'`:

In [None]:
from src.train import train_transformer

train_transformer(
    training_data="./dataset/qm9.csv",
    checkpoint_dir="./checkpoints/quantum_example/",
    checkpoint_resume_path=None,
    learning_rate=0.005,
    weight_decay=0.1,
    batch_size=256,
    epochs=3,
    save_every_n_batches=0,
    validation_split=0.05,
    attn_type="quantum",
    num_qubits=6,
    ansatz_layers=1,
    conditional_training=False,
    quantum_gradient_method="spsa",
    spsa_epsilon=0.01,
    sample_percentage=1.0,
    seed=42,
    classical_parameter_reduction=True,
    device="gpu",
    qpu_count=-1,
)
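
The `quantum_gradient_method="spsa"` and `spsa_epsilon` arguments refer to simultaneous perturbation stochastic approximation (SPSA), which estimates a gradient from two loss evaluations regardless of the number of parameters. The sketch below illustrates the textbook estimator in plain Python. It is not the repository's implementation, and the function name `spsa_gradient` and its signature are assumptions made for the example.

```python
import random


def spsa_gradient(loss, params, epsilon=0.01, rng=random):
    """Estimate the gradient of `loss` at `params` with one SPSA sample.

    loss: callable mapping a list of floats to a float.
    epsilon: perturbation size (the role played by spsa_epsilon above).
    rng: source of randomness exposing choice(), e.g. random.Random(seed).
    """
    # Perturb every parameter at once with a random sign.
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = [p + epsilon * d for p, d in zip(params, delta)]
    minus = [p - epsilon * d for p, d in zip(params, delta)]
    # Two loss evaluations in total, independent of len(params).
    diff = loss(plus) - loss(minus)
    return [diff / (2.0 * epsilon * d) for d in delta]
```

A single sample is noisy, but its expectation approximates the true gradient. Averaging several samples reduces the variance at the cost of more circuit evaluations.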

**Note**: Other datasets may be specified in the `training_data` argument. However, you must ensure that the tokenization rules are correct for your task. Currently, these can be modified through `self.smiles_regex` in the `Transformer_Dataset` class, which is located in `src.train`.
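
As an illustration of what such tokenization rules look like, the sketch below splits a SMILES string with a regular expression of the kind commonly used for this purpose. The pattern is an assumption for demonstration and is not necessarily the value of `self.smiles_regex` in the repository. The name `tokenize_smiles` is likewise hypothetical.

```python
import re

# Illustrative pattern: bracket atoms, two-letter halogens, single-letter
# atoms (aliphatic and aromatic), ring-closure labels, then bond and
# branch symbols. Order matters: "Br" and "Cl" must precede "B" and "C".
SMILES_REGEX = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[()=#\-+\\/.:~@*$])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_REGEX.findall(smiles)
    # If joining the tokens does not reproduce the input, some character
    # was silently skipped by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in SMILES: {smiles!r}")
    return tokens
```

The round-trip check is the useful part when adapting the rules to a new dataset: it surfaces element symbols or notation that the pattern does not yet cover, rather than dropping them silently.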