# Setup

## Anaconda Setup

First, we set up a conda environment with condacolab so that we can install all necessary packages.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.1.0-1/Mambaforge-23.1.0-1-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:13
🔁 Restarting kernel...


Check that conda is installed properly and confirm its version.

In [None]:
!conda --version

conda 23.1.0


In [None]:
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


Now we can install the packages needed to run HBCVTr, using conda and pip.

In [None]:
!conda install -c conda-forge rdkit=2023.3.2 -y
!conda install -c conda-forge deepsmiles -y
!pip install transformers==4.31.0 SmilesPE==0.0.3
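
Before moving on, it can help to confirm that the packages landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the ones used in the install commands above (a conda-installed package may not always expose metadata under the same name).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names below are assumed to match the install commands above.
required = ["rdkit", "deepsmiles", "transformers", "SmilesPE"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT FOUND'}")
```

Any package reported as not found is worth reinstalling before running the demo cells.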

## Import Git Repository

Clone the repository from GitHub.

In [None]:
!git clone https://github.com/imeewan/HBCVTr

Cloning into 'HBCVTr'...
remote: Enumerating objects: 131, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 131 (delta 36), reused 0 (delta 0), pack-reused 70
Receiving objects: 100% (131/131), 170.07 KiB | 2.43 MiB/s, done.
Resolving deltas: 100% (70/70), done.


Change directory to the cloned repository.

In [None]:
%cd HBCVTr

/content/HBCVTr


## Download Models

Finally, we download the trained models from Google Drive.

In [None]:
!gdown --id 1hDDNY9kE3Y-IFJEeILDxwG5NbRWMCWA8
!gdown --id 1vAkxP3y-FD5N5BpbfXIzTn5-nORlnv4T

Downloading...
From: https://drive.google.com/uc?id=1hDDNY9kE3Y-IFJEeILDxwG5NbRWMCWA8
To: /content/HBCVTr/hbv_model.pt
100% 1.12G/1.12G [00:15<00:00, 71.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1vAkxP3y-FD5N5BpbfXIzTn5-nORlnv4T
To: /content/HBCVTr/hcv_model.pt
100% 1.12G/1.12G [00:13<00:00, 80.2MB/s]


In [None]:
!mv hbv_model.pt model/
!mv hcv_model.pt model/
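
Because each checkpoint is over 1 GB, an interrupted download is easy to miss. The sketch below checks that both files are present in `model/` and non-empty; the helper name `missing_or_empty` is hypothetical, and the check does not verify file integrity beyond size.

```python
from pathlib import Path


def missing_or_empty(model_dir, filenames):
    """Return the filenames that are absent from model_dir or have zero size."""
    model_dir = Path(model_dir)
    problems = []
    for name in filenames:
        path = model_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


# File names taken from the download output above.
problems = missing_or_empty("model", ["hbv_model.pt", "hcv_model.pt"])
print("Missing or empty:", problems if problems else "none")
```

If either file is listed, rerunning the download and move cells should resolve it.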

# Run Demo

In this demo, we will run a prediction using our HBCVTr model.

First, let's import all necessary packages.

In [None]:
from BartDataset import BartDataset
from CustomBart_Atomic_Tokenizer import CustomBart_Atomic_Tokenizer
from CustomBart_FG_Tokenizer import CustomBart_FG_Tokenizer
from TqdmWrap import TqdmWrap
from DualInputDataset import DualInputDataset
from DualBartModel import DualBartModel, CustomBartModel
import torch
from torch import nn
from torch.utils.data import DataLoader, RandomSampler, Dataset
import pandas as pd
import numpy as np
import random
import deepsmiles
from SmilesPE.tokenizer import *
from SmilesPE.pretokenizer import atomwise_tokenizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import codecs
# AdamW is taken from transformers; the earlier torch.optim import was shadowed by it.
from transformers import (
    AdamW,
    BartTokenizer,
    BartForConditionalGeneration,
    BartConfig,
    get_linear_schedule_with_warmup,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    PreTrainedTokenizer,
)
import re
from tqdm import tqdm
import itertools
import json
import os
from utils import *
from pretrained_utils import *
from rdkit import Chem
from rdkit.Chem import SaltRemover

Enter the SMILES string of the compound and the target virus ('hbv' or 'hcv') here.

In [None]:
# smiles = input("Enter the SMILES of the compound: ")
smiles = 'C[C@H](Cn1cnc2c(N)ncnc21)OCP(=O)(O)OP(=O)(O)CO[C@H](C)Cn1cnc2c(N)ncnc21'
# virus_choice = input("Do you want to predict the compound's activity against HBV or HCV? (Enter HBV or HCV): ").lower()
virus_choice = 'hbv'
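
If the interactive `input()` prompts are re-enabled, user input may arrive with stray whitespace or mixed case. A minimal sketch of normalizing the choice and mapping it to a checkpoint is shown here; `MODEL_PATHS` and `resolve_model_path` are hypothetical names, and the paths are the ones this notebook stores the checkpoints under.

```python
# Hypothetical lookup table; the paths match where this notebook places the checkpoints.
MODEL_PATHS = {
    "hbv": "model/hbv_model.pt",
    "hcv": "model/hcv_model.pt",
}


def resolve_model_path(virus_choice):
    """Normalize the user's choice and return (key, checkpoint path)."""
    key = virus_choice.strip().lower()
    if key not in MODEL_PATHS:
        raise ValueError("Invalid input. Please enter either 'HBV' or 'HCV'.")
    return key, MODEL_PATHS[key]


print(resolve_model_path(" HBV "))
```

A dictionary lookup keeps the valid choices in one place, so adding another virus model would only require one new entry.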

Finally, we run the prediction.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Analysis in progress ...")

if virus_choice == 'hbv':
  model_path = "model/hbv_model.pt"
  max_pact = max_pact_hbv
  min_pact = min_pact_hbv
elif virus_choice == 'hcv':
  model_path = "model/hcv_model.pt"
  max_pact = max_pact_hcv
  min_pact = min_pact_hcv
else:
  raise ValueError("Invalid input. Please enter either 'HBV' or 'HCV'.")

max_length = 250

model = DualBartModel(config1, config2, reg_mod)
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)
model.eval()  # switch to inference mode so dropout is disabled

# Strip counter-ions so the model sees only the parent compound.
smiles = remove_salt(smiles)

input_encoding1 = tokenizer1.encode_plus(smiles, truncation=True, max_length=max_length, padding='max_length', return_tensors="pt")
input_encoding2 = tokenizer2.encode_plus(smiles, truncation=True, max_length=max_length, padding='max_length', return_tensors="pt")

input_ids1 = input_encoding1['input_ids'].to(device)
attention_mask1 = input_encoding1['attention_mask'].to(device)
input_ids2 = input_encoding2['input_ids'].to(device)
attention_mask2 = input_encoding2['attention_mask'].to(device)


with torch.no_grad():
  output = model(input_ids1=input_ids1, attention_mask1=attention_mask1,
                  input_ids2=input_ids2, attention_mask2=attention_mask2)

prediction_value = output.cpu().numpy()[0]
# Rescale the normalized model output back to the pACT range of the chosen virus.
predicted_pact = prediction_value * (max_pact - min_pact) + min_pact
# Convert pACT to EC50 in nM (10**-pACT is in mol/L; 1 mol/L = 10**9 nM).
predicted_EC50 = 10**-predicted_pact * 10**9
print('SMILES: ', smiles)
print('Predicted pACT: ', predicted_pact)
print('Predicted EC50 :', predicted_EC50, 'nM')

Analysis in progress ...
SMILES:  C[C@H](Cn1cnc2c(N)ncnc21)OCP(=O)(O)OP(=O)(O)CO[C@H](C)Cn1cnc2c(N)ncnc21
Predicted pACT:  8.122957168817521
Predicted EC50 : 7.534298651631602 nM
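
The two-step conversion in the prediction cell can be isolated as a pair of small functions. This is a sketch under the assumption, implied by the conversion above, that pACT is the negative base-10 logarithm of EC50 in mol/L; the function names are hypothetical, and the `min_pact`/`max_pact` values in the example are illustrative only, not the constants defined in the repository.

```python
def denormalize_pact(scaled, min_pact, max_pact):
    """Map a model output in [0, 1] back to the pACT scale (min-max inverse)."""
    return scaled * (max_pact - min_pact) + min_pact


def pact_to_ec50_nm(pact):
    """Convert pACT to EC50 in nM: 10**(-pACT) mol/L times 10**9 nM per mol/L."""
    return 10.0 ** (9.0 - pact)


# Illustrative bounds only; the notebook reads the real ones from the repository.
print(denormalize_pact(0.5, min_pact=4.0, max_pact=10.0))

# The pACT reported in the output above.
print(pact_to_ec50_nm(8.122957168817521))
```

Applied to the pACT printed above, the second function gives roughly 7.53 nM, in line with the reported EC50. A higher pACT therefore corresponds to a lower EC50, that is, a more potent compound.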
