# üß¨ molGPT: Conditional Molecular Generation with Transformers

**molGPT** is an end-to-end pipeline for generating novel drug-like molecules using a transformer-based language model (GPT-style). This project focuses on **conditional SMILES generation**, where molecular properties such as LogP, QED, TPSA, and scaffold are used to guide the generation process. It's an exciting intersection of **natural language processing** and **computational drug discovery**.

---

## üöÄ Highlights

- ‚úÖ Trains a decoder-only transformer (GPT) to generate valid SMILES strings
- üéØ Conditioned on molecular properties like:
  - **LogP** (lipophilicity)
  - **QED** (quantitative estimate of drug-likeness)
  - **TPSA** (topological polar surface area)
  - **Scaffold** (molecular backbone)
- üìä Includes evaluation metrics:
  - SMILES validity
  - Molecular uniqueness
  - Structural novelty (Tanimoto similarity)
  - Property alignment

---

## üß™ Dataset

Uses the [MOSES](https://github.com/molecularsets/moses) dataset ‚Äî a curated collection of drug-like molecules, suitable for generative modeling.

---




In [1]:
# Dependencies

!pip install pandas rdkit transformers[torch] accelerate>=0.26.0
!pip install scikit-learn matplotlib tqdm pathos
!pip install torch --index-url https://download.pytorch.org/whl/cu118
# !pip install git+https://github.com/molecularsets/moses.git

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pylibcugraph-cu12 24.12.0 requires pylibraft-cu12==24.12.*, but you have pylibraft-cu12 25.2.0 which is incompatible.
pylibcugraph-cu12 24.12.0 requires rmm-cu12==24.12.*, but you have rmm-cu12 25.2.0 which is incompatible.[0m[31m
Looking in indexes: https://download.pytorch.org/whl/cu118


## 1. Import packages

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
from tqdm.auto import tqdm
from pathos.multiprocessing import ProcessingPool as Pool
from functools import partial

from rdkit import Chem
from rdkit.Chem import AllChem, Draw, Descriptors, QED, rdMolDescriptors
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Scaffolds.MurckoScaffold import GetScaffoldForMol

import torch
from torch.utils.data import Dataset, DataLoader

from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    GPT2Config,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling
)

2025-04-23 10:51:53.734325: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1745405513.959311      31 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1745405514.024374      31 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## 2. Data Loading and Filtering

 We'll use the MOSES dataset, which is a curated set of drug-like molecules specifically
 designed for machine learning applications. It's much smaller than ChEMBL
 (https://www.ebi.ac.uk/chembl/, database: https://chembl.gitbook.io/chembl-interface-documentation/)
 but still contains high-quality, drug-like compounds.

In [3]:
def load_moses_data():
    filtered_path = Path('/kaggle/input/dataset-v1-filtered/dataset_v1_filtered.csv')
    if filtered_path.exists():
      print(f"Loading pre-filtered dataset from {filtered_path}")
      df = pd.read_csv(filtered_path)
      print(f"Loaded {len(df)} pre-filtered drug-like molecules")
      return df
    train_path = Path('/kaggle/input/dataset-v1/dataset_v1.csv')
    print(f"Reading the original dataset from {train_path}")
    df = pd.read_csv(train_path)
    print(df.info)
    print("\nFirst few rows:")
    print(df.head())

    if 'smiles' in df.columns:
      df = df.rename(columns={'smiles': 'SMILES'})
    elif 'SMILES' not in df.columns:
      print("Available columns:", df.columns.tolist())
      raise ValueError("No 'smiles' or 'SMILES' column found in the dataset.")
    smiles_list = df['SMILES'].values
    print(f"\nFound {len(smiles_list)} SMILES strings in the dataset.")

    # Here we begin with filtering
    valid_mols = []
    for smi in tqdm(smiles_list, desc="Validating and filtering SMILES"):
      # Filter 1 ensures that all molecules are chemically and structurally valid
      mol = Chem.MolFromSmiles(smi)
      if mol is not None:
        # Filter 2 makes sure molecules with physicochemical properties out of a desirable range are removed from the list
        mw = Descriptors.ExactMolWt(mol) # Molecular weight (in Da units)
        logp = Descriptors.MolLogP(mol) # LogP(measured lipophilicity, i.e., how much a molecule likes to be solved in fat versus water)
        hbd = rdMolDescriptors.CalcNumHBD(mol) # Number of a molecule's hydrogen-bond donor heavy atoms
        hba = rdMolDescriptors.CalcNumHBA(mol) # Number of a molecule's hydrogen-bond acceptor heavy atoms

        if mw <= 500 and logp <=5 and hbd <= 5 and hba <= 10:
          # Filter 3 screens for problematic chemical groups shown to be associated with toxicity, carcinogenicity, etc.
          has_bad_groups = False
          patt_list = [
              '[N+]([O-])=O',  # Nitro groups: Highly reactive, can cause DNA damage and carcinogenicity
              '[S](=[O])(=[O])',  # Sulfonyl groups: Can be chemically reactive and cause skin/eye irritation
              '[P](=[O])',  # Phosphoryl groups: Potential toxicity and instability in biological systems
              '[As]'  # Arsenic: Highly toxic heavy metal with severe health risks and carcinogenic properties
          ]
          for patt in patt_list:
              if mol.HasSubstructMatch(Chem.MolFromSmarts(patt)):
                has_bad_groups = True
                break
          if not has_bad_groups:
            valid_mols.append(smi)

    # Create the filtered dataframe
    filtered_df = df[df['SMILES'].isin(valid_mols)]
    print(f"\nAfter filtering, {len(filtered_df)} molecules remain")
    return filtered_df

# Load and display filtered data
filtered_df = load_moses_data()
print("\nFirst few rows of filtered dataset:")
print(filtered_df.head())



Loading pre-filtered dataset from /kaggle/input/dataset-v1-filtered/dataset_v1_filtered.csv
Loaded 1735494 pre-filtered drug-like molecules

First few rows of filtered dataset:
                                   SMILES  SPLIT
0  CCCS(=O)c1ccc2[nH]c(=NC(=O)OC)[nH]c2c1  train
1    CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1  train
2  CC1C2CCC(C2)C1CN(CCO)C(=O)c1ccc(Cl)cc1   test
3     Cc1c(Cl)cccc1Nc1ncccc1C(=O)OCC(O)CO  train
4        Cn1cnc2c1c(=O)n(CC(O)CO)c(=O)n2C  train


## 3. Descriptor Calculation (Scaffolds, logP, QED, TPSA)
 We compute additional descriptors needed:
 - Murcko Scaffolds: Core molecular framework obtained by removing side chains and keeping only ring systems and linkers between rings
 - QED
 - TPSA
 - LogP

We'll store these in the DataFrame alongside the SMILES.

In [4]:
def calculate_descriptors(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    if not mol:
        return None, None, None, None

    try:
        scaffold = GetScaffoldForMol(mol)
        scaffold_smiles = Chem.MolToSmiles(scaffold)

        qed_val = QED.qed(mol)

        tpsa_val = rdMolDescriptors.CalcTPSA(mol)

        logp_val = Descriptors.MolLogP(mol)

        return scaffold_smiles, logp_val, qed_val, tpsa_val
    except:
        return None, None, None, None

def process_batch(smiles_batch):
    results = []
    for smi in smiles_batch:
        results.append(calculate_descriptors(smi))
    return results

def calculate_descriptors_parallel(df, parallel=True, batch_size=100):
    print("Calculating molecular descriptors...")

    if parallel:
        n_cores = Pool().ncpus
        print(f"Detected {n_cores} CPU cores")
        print(f"Running descriptor calculations in parallel across {n_cores} cores")

        smiles_list = df['SMILES'].tolist()
        n_batches = (len(smiles_list) + batch_size - 1) // batch_size
        batches = [smiles_list[i*batch_size:(i+1)*batch_size]
                  for i in range(n_batches)]

        print(f"Processing {len(smiles_list)} SMILES strings in {n_batches} batches")

        with Pool() as pool:
            results = list(tqdm(
                pool.imap(process_batch, batches),
                total=n_batches,
                desc="Processing batches"
            ))

        all_results = [item for batch in results for item in batch]

    else:
        print("Running descriptor calculations sequentially")
        all_results = []
        for smi in tqdm(df['SMILES'], desc="Calculating descriptors"):
            all_results.append(calculate_descriptors(smi))

    scaffolds, logps, qeds, tpsas = zip(*all_results)

    df['Scaffold'] = scaffolds
    df['LogP'] = logps
    df['QED'] = qeds
    df['TPSA'] = tpsas

    df = df.dropna(subset=['Scaffold', 'LogP', 'QED', 'TPSA'])
    print(f"Final dataset size after descriptor calculation: {len(df)}")

    print("\nDescriptor Statistics:")
    print(f"LogP range: {df['LogP'].min():.2f} to {df['LogP'].max():.2f}")
    print(f"QED range: {df['QED'].min():.2f} to {df['QED'].max():.2f}")
    print(f"TPSA range: {df['TPSA'].min():.2f} to {df['TPSA'].max():.2f}")

    return df

filtered_df = calculate_descriptors_parallel(filtered_df, parallel=True)
filtered_df.head()

Calculating molecular descriptors...
Detected 4 CPU cores
Running descriptor calculations in parallel across 4 cores
Processing 1735494 SMILES strings in 17355 batches


Processing batches:   0%|          | 0/17355 [00:00<?, ?it/s]

Final dataset size after descriptor calculation: 1735494

Descriptor Statistics:
LogP range: -4.16 to 5.00
QED range: 0.21 to 0.95
TPSA range: 0.00 to 206.50


Unnamed: 0,SMILES,SPLIT,Scaffold,LogP,QED,TPSA
0,CCCS(=O)c1ccc2[nH]c(=NC(=O)OC)[nH]c2c1,train,N=c1[nH]c2ccccc2[nH]1,1.6807,0.896898,87.31
1,CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1,train,c1ccc(OCn2ccnc2)cc1,3.7293,0.862259,44.12
2,CC1C2CCC(C2)C1CN(CCO)C(=O)c1ccc(Cl)cc1,test,O=C(NCC1CC2CCC1C2)c1ccccc1,3.4567,0.901948,40.54
3,Cc1c(Cl)cccc1Nc1ncccc1C(=O)OCC(O)CO,train,c1ccc(Nc2ccccn2)cc1,2.29702,0.701022,91.68
4,Cn1cnc2c1c(=O)n(CC(O)CO)c(=O)n2C,train,O=c1[nH]c(=O)c2[nH]cnc2[nH]1,-2.2131,0.646083,102.28


## 4. Preparing the Data for Conditional Generation

We will train a GPT model to generate the full SMILES given:
1. The scaffold SMILES
2. The desired LogP
3. The desired QED
4. The desired TPSA

One straightforward approach is to serialize these conditions into a single text string.
For example:

    "SCAFFOLD: Cc1ccccn1 | LOGP: 2.3 | QED: 0.72 | TPSA: 32.4 => FULL_SMILES"

We then train the model in a language modeling fashion to predict FULL_SMILES from
these inputs. Alternatively, we could build a more sophisticated approach that
incorporates the numeric data differently, but for demonstration, we'll do it text-based

In [5]:
def create_condition_text(row):
  scaffold_str = row['Scaffold']
  logp_str = f"{row['LogP']:.2f}"
  qed_str = f"{row['QED']:.2f}"
  tpsa_str = f"{row['TPSA']:.1f}"
  full_smiles = row['SMILES']

  input_text = f"SCAFFOLD: {scaffold_str} | LOGP: {logp_str} | QED: {qed_str} | TPSA: {tpsa_str} => f{full_smiles}"
  return input_text

filtered_df['conditional_text'] = filtered_df.apply(create_condition_text, axis=1)
filtered_df.head()

Unnamed: 0,SMILES,SPLIT,Scaffold,LogP,QED,TPSA,conditional_text
0,CCCS(=O)c1ccc2[nH]c(=NC(=O)OC)[nH]c2c1,train,N=c1[nH]c2ccccc2[nH]1,1.6807,0.896898,87.31,SCAFFOLD: N=c1[nH]c2ccccc2[nH]1 | LOGP: 1.68 |...
1,CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1,train,c1ccc(OCn2ccnc2)cc1,3.7293,0.862259,44.12,SCAFFOLD: c1ccc(OCn2ccnc2)cc1 | LOGP: 3.73 | Q...
2,CC1C2CCC(C2)C1CN(CCO)C(=O)c1ccc(Cl)cc1,test,O=C(NCC1CC2CCC1C2)c1ccccc1,3.4567,0.901948,40.54,SCAFFOLD: O=C(NCC1CC2CCC1C2)c1ccccc1 | LOGP: 3...
3,Cc1c(Cl)cccc1Nc1ncccc1C(=O)OCC(O)CO,train,c1ccc(Nc2ccccn2)cc1,2.29702,0.701022,91.68,SCAFFOLD: c1ccc(Nc2ccccn2)cc1 | LOGP: 2.30 | Q...
4,Cn1cnc2c1c(=O)n(CC(O)CO)c(=O)n2C,train,O=c1[nH]c(=O)c2[nH]cnc2[nH]1,-2.2131,0.646083,102.28,SCAFFOLD: O=c1[nH]c(=O)c2[nH]cnc2[nH]1 | LOGP:...


## 5. Train/Validation/Test Split

In [6]:
from sklearn.model_selection import train_test_split

train_df, valtest_df = train_test_split(filtered_df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(valtest_df, test_size=0.5, random_state=42)

print(f"Train size: {len(train_df)}")
print(f"Val size: {len(val_df)}")
print(f"Test size: {len(test_df)}")

Train size: 1388395
Val size: 173549
Test size: 173550


## 6. Defining the GPT Model and Tokenizer
We'll create a small GPT2 model from scratch (or you could fine-tune a pretrained GPT2).
For demonstration, we'll use a smaller config for faster training.

In [7]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

tokenizer.pad_token = tokenizer.eos_token

config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=256,
    n_ctx=256,
    n_embd=128,
    n_layer=4,
    n_head=4
)


model = GPT2LMHeadModel(config)



tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

## 7. Create a Custom Dataset for Language Modeling

In [8]:
class SmilesConditionalDataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_length=256):
        self.data = dataframe
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        txt = self.data.iloc[idx]['conditional_text']

        return self.tokenizer(
            txt,
            max_length=self.max_length,
            truncation=True,
            return_special_tokens_mask=True
        )


train_dataset = SmilesConditionalDataset(train_df, tokenizer)
val_dataset   = SmilesConditionalDataset(val_df, tokenizer)
test_dataset  = SmilesConditionalDataset(test_df, tokenizer)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

## 8. Training the model

In [9]:
# training_args = TrainingArguments(
#     output_dir="./conditional_gpt_smiles",
#     overwrite_output_dir=True,
#     num_train_epochs=1,
#     per_device_train_batch_size=2,
#     per_device_eval_batch_size=2,
#     eval_strategy="steps",
#     eval_steps=50,
#     save_steps=50,
#     logging_steps=50,
#     save_total_limit=1,
#     learning_rate=1e-4,
#     warmup_steps=100,
#     weight_decay=0.01,
#     run_name="gpt-smiles-cond-epoch1",                # <--- ÈÅøÂÖç W&B ÁöÑ warning
#     report_to="none",               # <--- ÂêØÁî® wandb ÈõÜÊàêÔºàËá™Âä®ËÆ∞ÂΩï metricsÔºâ
 
# )

training_args = TrainingArguments(
    output_dir="./conditional_gpt_smiles",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=16,        # Increased
    per_device_eval_batch_size=16,         # Increased
    gradient_accumulation_steps=8,        # Added
    fp16=True,                            # Added for mixed precision
    eval_strategy="steps",
    eval_steps=1500,                       # Reduced frequency
    save_steps=1500,                       # Reduced frequency
    logging_steps=1200,                    # Reduced frequency
    save_total_limit=1,
    learning_rate=1e-3,                   # Adjusted for larger batch
    warmup_steps=100,
    weight_decay=0.01,
    dataloader_num_workers=4,             # Added
    run_name="gpt-smiles-cond-epoch1",               
    report_to="none",
)

# training_args = TrainingArguments(
#     output_dir="./conditional_gpt_smiles",
#     overwrite_output_dir=True,
#     num_train_epochs=1,
#     per_device_train_batch_size=16,
#     per_device_eval_batch_size=16,
#     gradient_accumulation_steps=8,
#     fp16=True,
#     fp16_opt_level="O2",           # More aggressive mixed precision
#     eval_strategy="steps",
#     eval_steps=500,
#     save_steps=500,
#     logging_steps=200,
#     save_total_limit=1,
#     learning_rate=1e-3,            # Adjusted for larger effective batch
#     warmup_steps=100,
#     weight_decay=0.01,
#     dataloader_num_workers=8,
#     dataloader_pin_memory=True,    # Faster data transfer
#     gradient_checkpointing=True,   # Memory efficient training
#     run_name="gpt-smiles-cond-epoch1",
#     report_to="none",
# )

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)
print("Starting training...")
trainer.train()

Starting training...


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss,Validation Loss
1500,1.2926,0.555617
3000,0.5846,0.493483
4500,0.5339,0.464609
6000,0.4903,0.447951
7500,0.4775,0.436301
9000,0.4677,0.425407
10500,0.4589,0.419265
12000,0.4472,0.41348
13500,0.442,0.407734
15000,0.4378,0.403773


TrainOutput(global_step=32538, training_loss=0.47483191506879785, metrics={'train_runtime': 11973.771, 'train_samples_per_second': 347.859, 'train_steps_per_second': 2.717, 'total_flos': 1905046386204672.0, 'train_loss': 0.47483191506879785, 'epoch': 2.999919331604725})

## 9. Inference (Generating SMILES from Conditions)

During inference, we'll provide the scaffold and desired property values. For example:

"Scaffold: <scaffold> | LogP: <val> | QED: <val> | TPSA: <val> =>"

The model should complete the sequence by generating a SMILES string.

We'll generate multiple samples to test uniqueness and validity.

In [11]:

# def generate_smiles_from_conditions(model, tokenizer, scaffold, logp, qed, tpsa,
#                                     max_length=256, num_return_sequences=1):
#     prompt = f"Scaffold: {scaffold} | LogP: {logp:.2f} | QED: {qed:.2f} | TPSA: {tpsa:.2f} =>"

#     input_ids = tokenizer.encode(prompt, return_tensors='pt')
#     input_ids = input_ids.to(model.device)

#     with torch.no_grad():
#         outputs = model.generate(
#             input_ids=input_ids,
#             max_length=max_length,
#             num_return_sequences=num_return_sequences,
#             do_sample=True,           # Use sampling
#             top_k=50,                 # Adjust as desired
#             top_p=0.95,               # Adjust as desired
#             temperature=0.7,          # Adjust as desired
#             pad_token_id=tokenizer.eos_token_id
#         )

#     generated_texts = []
#     for output in outputs:
#         text = tokenizer.decode(output, skip_special_tokens=True)
#         # We want only the SMILES part after the "=>"
#         if "=>" in text:
#             smiles_part = text.split("=>")[-1].strip()
#             generated_texts.append(smiles_part)
#         else:
#             generated_texts.append(text)

#     return generated_texts

# trainer.model.eval()
# row = test_df.iloc[0]
# generated_smiles = generate_smiles_from_conditions(
#     model=trainer.model,
#     tokenizer=tokenizer,
#     scaffold=row['Scaffold'],
#     logp=row['LogP'],
#     qed=row['QED'],
#     tpsa=row['TPSA'],
#     num_return_sequences=5
# )
# # print(generated_smiles)


def generate_smiles_from_conditions(model, tokenizer, scaffold, logp, qed, tpsa,
                                    max_length=256, num_return_sequences=1):
    prompt = f"SCAFFOLD: {scaffold} | LOGP: {logp:.2f} | QED: {qed:.2f} | TPSA: {tpsa:.2f} =>"

    # ÊòæÂºèÂàõÂª∫Ê≥®ÊÑèÂäõÊé©Á†Å
    encoded_input = tokenizer(prompt, return_tensors='pt', padding=True)
    input_ids = encoded_input['input_ids'].to(model.device)
    attention_mask = encoded_input['attention_mask'].to(model.device)
    
    print(f"ÁîüÊàêÊù°‰ª∂: {prompt}")
    print(f"ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: {input_ids.shape}")

    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            max_length=max_length,
            num_return_sequences=num_return_sequences,
            do_sample=True,           # Use sampling
            top_k=100,                # Increased from 50 to 100
            top_p=0.9,                # Lowered from 0.95 to 0.9
            temperature=0.5,          # Lowered from 0.7 to 0.5
            pad_token_id=tokenizer.eos_token_id
        )
        #     print("Ê®°ÂûãÁîüÊàêÂÆåÊàê")
        # except Exception as e:
        #     print(f"ÁîüÊàêËøáÁ®ã‰∏≠ÂèëÁîüÈîôËØØ: {e}")
        #     return []

    generated_texts = []
    for output in outputs:
        text = tokenizer.decode(output, skip_special_tokens=True)
        print(f"ÂéüÂßãÁîüÊàêÊñáÊú¨: {text}")
        # Êàë‰ª¨Âè™ÈúÄË¶Å "=>" ÂêéÈù¢ÁöÑSMILESÈÉ®ÂàÜ
        if "=>" in text:
            smiles_part = text.split("=>")[-1].strip()
            generated_texts.append(smiles_part)
        else:
            generated_texts.append(text)

    return generated_texts

# Ê∑ªÂä†Ë∞ÉËØï‰ø°ÊÅØ
print("ÂáÜÂ§áËØÑ‰º∞Ê®°Âûã...")
trainer.model.eval()
print("Ê®°ÂûãÂ∑≤ËÆæÁΩÆ‰∏∫ËØÑ‰º∞Ê®°Âºè")

# try:
#     row = test_df.iloc[0]
#     print(f"ÊµãËØïÊ†∑Êú¨: Scaffold={row['Scaffold']}, LogP={row['LogP']:.2f}, QED={row['QED']:.2f}, TPSA={row['TPSA']:.2f}")
    
#     generated_smiles = generate_smiles_from_conditions(
#         model=trainer.model,
#         tokenizer=tokenizer,
#         scaffold=row['Scaffold'],
#         logp=row['LogP'],
#         qed=row['QED'],
#         tpsa=row['TPSA'],
#         num_return_sequences=1  # ÂÖàÂ∞ùËØïÂè™ÁîüÊàê‰∏Ä‰∏™Ê†∑Êú¨
#     )
    
#     print("ÁîüÊàêÁªìÊûú:")
#     for i, smi in enumerate(generated_smiles):
#         print(f"ÁîüÊàêÁöÑSMILES {i+1}: {smi}")
#         mol = Chem.MolFromSmiles(smi)
#         if mol:
#             print(f"ÊúâÊïàÁöÑSMILES: ÊòØ")
#         else:
#             print(f"ÊúâÊïàÁöÑSMILES: Âê¶")
# except Exception as e:
#     print(f"ÊâßË°åËøáÁ®ã‰∏≠ÂèëÁîüÈîôËØØ: {str(e)}")



ÂáÜÂ§áËØÑ‰º∞Ê®°Âûã...
Ê®°ÂûãÂ∑≤ËÆæÁΩÆ‰∏∫ËØÑ‰º∞Ê®°Âºè


## 10. Evaluation
We will:
1. Generate 1000 molecules with random conditions from the test set.
2. Check:
   - Valid SMILES: Can RDKit parse them?
   - Unique SMILES: How many are duplicates?
   - Tanimoto similarity to training set.
   - Distribution of predicted properties.

In [12]:
def is_valid_smiles(smi):
    mol = Chem.MolFromSmiles(smi)
    return mol is not None

def compute_tanimoto_similarity(smi1, smi2, radius=2, nBits=2048):
    mol1 = Chem.MolFromSmiles(smi1)
    mol2 = Chem.MolFromSmiles(smi2)
    if mol1 is None or mol2 is None:
        return None
    fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, radius, nBits=nBits)
    fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, radius, nBits=nBits)
    return rdMolDescriptors.TanimotoSimilarity(fp1, fp2)

# Let's define a quick function to evaluate.
# In practice, you might do more robust analysis.
def evaluate_model(
    model,
    tokenizer,
    reference_df,
    n_samples=1000
):
    model.eval()

    valid_count = 0
    unique_smiles = set()
    similarities = []

    # We'll store the property differences if we want to check property distribution.
    requested_logps, generated_logps = [], []
    requested_qeds, generated_qeds = [], []
    requested_tpsas, generated_tpsas = [], []

    # Convert train_df SMILES to a list for similarity reference
    train_smiles_list = train_df['SMILES'].tolist()

    for i in range(n_samples):
        # Randomly select a row from the reference set or sample property values
        row = reference_df.sample(n=1).iloc[0]
        scaffold = row['Scaffold']
        logp_req = row['LogP']
        qed_req = row['QED']
        tpsa_req = row['TPSA']

        gen_smiles_list = generate_smiles_from_conditions(
            model, tokenizer, scaffold, logp_req, qed_req, tpsa_req,
            num_return_sequences=1
        )

        gen_smi = gen_smiles_list[0]

        if is_valid_smiles(gen_smi):
            valid_count += 1
            unique_smiles.add(Chem.MolToSmiles(Chem.MolFromSmiles(gen_smi)))  # canonical

            # Tanimoto similarity (just to the original training data)
            # We'll compute the max similarity to any molecule in the training set
            # as a measure of novelty.
            best_sim = 0
            for train_smi in train_smiles_list[:1000]:  # limit to 1000 for speed
                sim = compute_tanimoto_similarity(gen_smi, train_smi)
                if sim is not None and sim > best_sim:
                    best_sim = sim
            similarities.append(best_sim)

            # Check the property distribution if desired
            gen_mol = Chem.MolFromSmiles(gen_smi)
            if gen_mol:
                gen_logp = Descriptors.MolLogP(gen_mol)
                gen_qed = QED.qed(gen_mol)
                gen_tpsa = Descriptors.TPSA(gen_mol)

                requested_logps.append(logp_req)
                generated_logps.append(gen_logp)
                requested_qeds.append(qed_req)
                generated_qeds.append(gen_qed)
                requested_tpsas.append(tpsa_req)
                generated_tpsas.append(gen_tpsa)

    validity_ratio = valid_count / n_samples
    uniqueness_ratio = len(unique_smiles) / n_samples
    avg_similarity = np.mean(similarities) if similarities else 0

    print(f"Validity: {validity_ratio:.2f}")
    print(f"Uniqueness: {uniqueness_ratio:.2f}")
    print(f"Average Tanimoto similarity to training set: {avg_similarity:.2f}")

    # Plot property distribution comparisons
    # For example, requested vs generated LogP
    if len(requested_logps) > 0:
        plt.figure()
        plt.scatter(requested_logps, generated_logps, alpha=0.5)
        plt.xlabel("Requested LogP")
        plt.ylabel("Generated LogP")
        plt.title("Requested vs. Generated LogP")
        plt.show()

        plt.figure()
        plt.scatter(requested_qeds, generated_qeds, alpha=0.5)
        plt.xlabel("Requested QED")
        plt.ylabel("Generated QED")
        plt.title("Requested vs. Generated QED")
        plt.show()

        plt.figure()
        plt.scatter(requested_tpsas, generated_tpsas, alpha=0.5)
        plt.xlabel("Requested TPSA")
        plt.ylabel("Generated TPSA")
        plt.title("Requested vs. Generated TPSA")
        plt.show()

In [13]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors, QED, rdMolDescriptors
import torch

# Assuming you have already loaded or have access to:
# 1. The trained model (model)
# 2. The tokenizer (tokenizer)
# 3. The test dataset (test_df)
# 4. The training dataset (train_df)

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Run the evaluation with 1000 samples
print("Starting model evaluation with 1000 samples...")
evaluate_model(
    model=model,
    tokenizer=tokenizer,
    reference_df=test_df,
    n_samples=10
)

# If you want to run with fewer samples for a quicker test
print("\nQuick evaluation with 100 samples...")
evaluate_model(
    model=model,
    tokenizer=tokenizer,
    reference_df=test_df,
    n_samples=10
)

# Optional: Save some example molecules as images
def visualize_examples(model, tokenizer, test_df, num_examples=5):
    print(f"\nGenerating {num_examples} example molecules...")
    example_rows = test_df.sample(n=num_examples)
    
    for i, row in enumerate(example_rows.itertuples()):
        scaffold = row.Scaffold
        logp_req = row.LogP
        qed_req = row.QED
        tpsa_req = row.TPSA
        
        print(f"\nExample {i+1}:")
        print(f"Scaffold: {scaffold}")
        print(f"Requested LogP: {logp_req:.2f}")
        print(f"Requested QED: {qed_req:.2f}")
        print(f"Requested TPSA: {tpsa_req:.2f}")
        
        gen_smiles_list = generate_smiles_from_conditions(
            model, tokenizer, scaffold, logp_req, qed_req, tpsa_req,
            num_return_sequences=3
        )
        
        for j, smi in enumerate(gen_smiles_list):
            if is_valid_smiles(smi):
                mol = Chem.MolFromSmiles(smi)
                print(f"  Generated SMILES {j+1}: {smi}")
                
                # Calculate actual properties
                gen_logp = Descriptors.MolLogP(mol)
                gen_qed = QED.qed(mol)
                gen_tpsa = Descriptors.TPSA(mol)
                
                print(f"  Actual LogP: {gen_logp:.2f} (diff: {gen_logp - logp_req:.2f})")
                print(f"  Actual QED: {gen_qed:.2f} (diff: {gen_qed - qed_req:.2f})")
                print(f"  Actual TPSA: {gen_tpsa:.2f} (diff: {gen_tpsa - tpsa_req:.2f})")
            else:
                print(f"  Generated SMILES {j+1}: {smi} (INVALID)")

# Run the visualization function
visualize_examples(model, tokenizer, test_df, num_examples=5)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Starting model evaluation with 1000 samples...
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: c1ccccc1 | LOGP: 2.35 | QED: 0.81 | TPSA: 38.77 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 34])
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccccc1 | LOGP: 2.35 | QED: 0.81 | TPSA: 38.77 => fCOC(=O)C(C)Nc1cccc(C(F)(F)F)c1F)C(C)C11C=O | TPSA: 81.2)ccc1F)O)N1CCO)C1)C111C1 | LOGP: TPSA:1)cccccc1)F)N)O)C#N)C1C1F)N)C1F)C1C#N1C#N1F)C#N1F)C#N1F)C#N1C#N1C#N1C#N1)C#N1F)C#N1F)C#N1F)C#N1C#N1F)C#N1F)C#N1F)C#N1F)C#N1F)C#N1F)C#N1F)C#N1FF#N1F)
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NC1CCCCC1)c1cccnn1 | LOGP: 2.10 | QED: 0.91 | TPSA: 72.11 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 44])


[14:54:01] SMILES Parse Error: syntax error while parsing: fCOC(=O)C(C)Nc1cccc(C(F)(F)F)c1F)C(C)C11C=O
[14:54:01] SMILES Parse Error: check for mistakes around position 1:
[14:54:01] fCOC(=O)C(C)Nc1cccc(C(F)(F)F)c1F)C(C)C11C
[14:54:01] ^
[14:54:01] SMILES Parse Error: Failed parsing SMILES 'fCOC(=O)C(C)Nc1cccc(C(F)(F)F)c1F)C(C)C11C=O' for input: 'fCOC(=O)C(C)Nc1cccc(C(F)(F)F)c1F)C(C)C11C=O'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NC1CCCCC1)c1cccnn1 | LOGP: 2.10 | QED: 0.91 | TPSA: 72.11 => fCC(=O)Nc1ccc(C(=O)NC2CCCCC2)nn1C(C)C)C1CC1F1 | TPSA:2)F11111111111111111111111111 | LOGP: 0.4 | T1)1)nn1)C11)C11)C11114444444444444421 | TPS3142 | TPS3142 | TPS42 | T42 | T4CO4CO4CO42CO42C3CO4CO4CO4CO4C2CO4CO4CO4CO4CO4CO4CO4CO4C2C2C2CO2C2CO4C1C2C2C2CO2CO1CO1CO4C2C2CO2C
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(Cc1cnoc1)N1CCC(Oc2ccccc2)CC1 | LOGP: 3.52 | QED: 0.85 | TPSA: 55.57 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 53])


[14:54:02] SMILES Parse Error: syntax error while parsing: fCC(=O)Nc1ccc(C(=O)NC2CCCCC2)nn1C(C)C)C1CC1F1
[14:54:02] SMILES Parse Error: check for mistakes around position 1:
[14:54:02] fCC(=O)Nc1ccc(C(=O)NC2CCCCC2)nn1C(C)C)C1C
[14:54:02] ^
[14:54:02] SMILES Parse Error: Failed parsing SMILES 'fCC(=O)Nc1ccc(C(=O)NC2CCCCC2)nn1C(C)C)C1CC1F1' for input: 'fCC(=O)Nc1ccc(C(=O)NC2CCCCC2)nn1C(C)C)C1CC1F1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Cc1cnoc1)N1CCC(Oc2ccccc2)CC1 | LOGP: 3.52 | QED: 0.85 | TPSA: 55.57 => fCc1noc(C)c1CC(=O)N1CCC(Oc2ccc(Cl)cc2)CC1C1C)C1111111111111111111111111FO2 | T1)C1C2 | TPS21F4O4CO2 | TPS21F2C2 | TPS31F4CO4CO4CO4CO4CO2F4CO4CO4C2F4CO4C2CO4CO4CO2C2C2CO2C2C2C2)C2)C2C2C2CO2C2C2CO2C2CO4CO2C2CO2)CO4CO2C2C2C2C2CO2C2C2CO2CO2F31CO2CO2)CO2C2F
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(Nc1ccccc1)NC1(c2ncon2)CCCC1 | LOGP: 3.37 | QED: 0.86 | TPSA: 89.28 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 52])


[14:54:03] SMILES Parse Error: syntax error while parsing: fCc1noc(C)c1CC(=O)N1CCC(Oc2ccc(Cl)cc2)CC1C1C)C1111111111111111111111111FO2
[14:54:03] SMILES Parse Error: check for mistakes around position 1:
[14:54:03] fCc1noc(C)c1CC(=O)N1CCC(Oc2ccc(Cl)cc2)CC1
[14:54:03] ^
[14:54:03] SMILES Parse Error: Failed parsing SMILES 'fCc1noc(C)c1CC(=O)N1CCC(Oc2ccc(Cl)cc2)CC1C1C)C1111111111111111111111111FO2' for input: 'fCc1noc(C)c1CC(=O)N1CCC(Oc2ccc(Cl)cc2)CC1C1C)C1111111111111111111111111FO2'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Nc1ccccc1)NC1(c2ncon2)CCCC1 | LOGP: 3.37 | QED: 0.86 | TPSA: 89.28 => fCOC(=O)c1cccc(NC(=O)NC2(c3noc(C)n3)CCCC2)c1C)C1C)C)C1C)C1C)C)C1C1C1C1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1 | LOGPPPPPPP: | LOGPP: 0.72 | TPSA: 0.82 | TPSA:)C1C1C1C1C1C1C1C1C1C1C1C1
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=c1[nH]c(Cc2cccc3cccnc23)nc2ccsc12 | LOGP: 3.12 | QED: 0.62 | TPSA: 58.64 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 54])


[14:54:04] SMILES Parse Error: syntax error while parsing: fCOC(=O)c1cccc(NC(=O)NC2(c3noc(C)n3)CCCC2)c1C)C1C)C)C1C)C1C)C)C1C1C1C1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1
[14:54:04] SMILES Parse Error: check for mistakes around position 1:
[14:54:04] fCOC(=O)c1cccc(NC(=O)NC2(c3noc(C)n3)CCCC2
[14:54:04] ^
[14:54:04] SMILES Parse Error: Failed parsing SMILES 'fCOC(=O)c1cccc(NC(=O)NC2(c3noc(C)n3)CCCC2)c1C)C1C)C)C1C)C1C)C)C1C1C1C1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1' for input: 'fCOC(=O)c1cccc(NC(=O)NC2(c3noc(C)n3)CCCC2)c1C)C1C)C)C1C)C1C)C)C1C1C1C1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=c1[nH]c(Cc2cccc3cccnc23)nc2ccsc12 | LOGP: 3.12 | QED: 0.62 | TPSA: 58.64 => fCc1cc2c(=O)[nH]c(Cc3cccc4cccnc34)nc2s1 | TPSA: 80.7 => fO)cc2n1Cc1ccsc1)O2)C1C1C1C1F)N1F)N1F)C1F)N1F)C1F)C1F)C1F)C1F)N1F)C1F)C1F)C1C1C1F)C1C1CC1CC1CC111F1C1F)C1C1CC1F1F)C1F1F1C1F11C1F1C1F1F1C11C1C1 | TPSA:F1F1 | TPSA:F)C1C1F)C1C1F
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(Cn1ccc2ccccc21)N=c1[nH]nc2ccccn12 | LOGP: 2.71 | QED: 0.61 | TPSA: 67.45 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 57])


[14:54:05] SMILES Parse Error: syntax error while parsing: fO)cc2n1Cc1ccsc1)O2)C1C1C1C1F)N1F)N1F)C1F)N1F)C1F)C1F)C1F)C1F)N1F)C1F)C1F)C1C1C1F)C1C1CC1CC1CC111F1C1F)C1C1CC1F1F)C1F1F1C1F11C1F1C1F1F1C11C1C1
[14:54:05] SMILES Parse Error: check for mistakes around position 1:
[14:54:05] fO)cc2n1Cc1ccsc1)O2)C1C1C1C1F)N1F)N1F)C1F
[14:54:05] ^
[14:54:05] SMILES Parse Error: Failed parsing SMILES 'fO)cc2n1Cc1ccsc1)O2)C1C1C1C1F)N1F)N1F)C1F)N1F)C1F)C1F)C1F)C1F)N1F)C1F)C1F)C1C1C1F)C1C1CC1CC1CC111F1C1F)C1C1CC1F1F)C1F1F1C1F11C1F1C1F1F1C11C1C1' for input: 'fO)cc2n1Cc1ccsc1)O2)C1C1C1C1F)N1F)N1F)C1F)N1F)C1F)C1F)C1F)C1F)N1F)C1F)C1F)C1C1C1F)C1C1CC1CC1CC111F1C1F)C1C1CC1F1F)C1F1F1C1F11C1F1C1F1F1C11C1C1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Cn1ccc2ccccc21)N=c1[nH]nc2ccccn12 | LOGP: 2.71 | QED: 0.61 | TPSA: 67.45 => fO=C(Cn1ccc2ccccc21)N=c1[nH]nc2cc(Cl)ccn12 | TPSA:)n1C(F)F)c1ccccn1F)F)F)F)F)FF)F | TPSA:FFF | TPSA:F)c1F | TPSA:F)F)cn1FFF)cn1FFFFF)FFF)cn1FFFFFFFFFFFFFF | TPSA:F1cccccccccccccccccc1F)F)F | TPSA:F)F | TPSA:F | TPSA:n1F)cn1F)cccc1FFF)n1FF)cn1FFFFFF)n
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(c1occc1-c1ccccc1)N1CCNCC1 | LOGP: 1.88 | QED: 0.86 | TPSA: 62.99 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 51])


[14:54:06] SMILES Parse Error: syntax error while parsing: fO=C(Cn1ccc2ccccc21)N=c1[nH]nc2cc(Cl)ccn12
[14:54:06] SMILES Parse Error: check for mistakes around position 1:
[14:54:06] fO=C(Cn1ccc2ccccc21)N=c1[nH]nc2cc(Cl)ccn1
[14:54:06] ^
[14:54:06] SMILES Parse Error: Failed parsing SMILES 'fO=C(Cn1ccc2ccccc21)N=c1[nH]nc2cc(Cl)ccn12' for input: 'fO=C(Cn1ccc2ccccc21)N=c1[nH]nc2cc(Cl)ccn12'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(c1occc1-c1ccccc1)N1CCNCC1 | LOGP: 1.88 | QED: 0.86 | TPSA: 62.99 => fCC(=O)N1CCN(C(=O)c2occc2-c2ccccc2)CC1C(=O)OC)C1C)C1C | TPSA:)C11C1C)C1C11 | TPSA:: 581C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1 | TPSA:11C11C1C11C1)C11C1C11C1C1C11C1C1)C1 | LOGP:1C11C1C1C1 | LOGP:11 | LOGP:1C11C1C1C1 | LOGP:::1)C1 | LOGP:1)C1
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: c1ccc(-n2ncc3c2ncn2cnnc32)cc1 | LOGP: 1.77 | QED: 0.52 | TPSA: 60.90 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 49])


[14:54:07] SMILES Parse Error: syntax error while parsing: fCC(=O)N1CCN(C(=O)c2occc2-c2ccccc2)CC1C(=O)OC)C1C)C1C
[14:54:07] SMILES Parse Error: check for mistakes around position 1:
[14:54:07] fCC(=O)N1CCN(C(=O)c2occc2-c2ccccc2)CC1C(=
[14:54:07] ^
[14:54:07] SMILES Parse Error: Failed parsing SMILES 'fCC(=O)N1CCN(C(=O)c2occc2-c2ccccc2)CC1C(=O)OC)C1C)C1C' for input: 'fCC(=O)N1CCN(C(=O)c2occc2-c2ccccc2)CC1C(=O)OC)C1C)C1C'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc(-n2ncc3c2ncn2cnnc32)cc1 | LOGP: 1.77 | QED: 0.52 | TPSA: 60.90 => fCc1nnc2c3c(nn2n1)c1cnn2-c1ccccc1F)C(F)F2F)FF1F | TPSA:F)F1FF)N1ccccc1F)F)F)F)F)F)F)F1FF1F1F1F)N1F)F1FF1F1F1F)F1F | TPSA:F1F1F1F1F1F1F | TPSA:F)N1F1F)N1F1F1F1F1F1F1CCO4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4O4
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NCc1ncon1)NCC1(c2ccccc2)CC1 | LOGP: 2.19 | QED: 0.88 | TPSA: 80.05 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 52])


[14:54:09] SMILES Parse Error: syntax error while parsing: fCc1nnc2c3c(nn2n1)c1cnn2-c1ccccc1F)C(F)F2F)FF1F
[14:54:09] SMILES Parse Error: check for mistakes around position 1:
[14:54:09] fCc1nnc2c3c(nn2n1)c1cnn2-c1ccccc1F)C(F)F2
[14:54:09] ^
[14:54:09] SMILES Parse Error: Failed parsing SMILES 'fCc1nnc2c3c(nn2n1)c1cnn2-c1ccccc1F)C(F)F2F)FF1F' for input: 'fCc1nnc2c3c(nn2n1)c1cnn2-c1ccccc1F)C(F)F2F)FF1F'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NCc1ncon1)NCC1(c2ccccc2)CC1 | LOGP: 2.19 | QED: 0.88 | TPSA: 80.05 => fCc1nc(CNC(=O)NCC2(c3ccc(F)cc3)CC2)no1 | TPSA: 88.0)no1 | TPSA: 101.0)0)0)C1)C1C1C1C1C1C1C1C1COCC1O1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1CO1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21 | LOGP: 2.95 | QED: 0.69 | TPSA: 51.02 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 57])


[14:54:10] SMILES Parse Error: syntax error while parsing: fCc1nc(CNC(=O)NCC2(c3ccc(F)cc3)CC2)no1
[14:54:10] SMILES Parse Error: check for mistakes around position 1:
[14:54:10] fCc1nc(CNC(=O)NCC2(c3ccc(F)cc3)CC2)no1
[14:54:10] ^
[14:54:10] SMILES Parse Error: Failed parsing SMILES 'fCc1nc(CNC(=O)NCC2(c3ccc(F)cc3)CC2)no1' for input: 'fCc1nc(CNC(=O)NCC2(c3ccc(F)cc3)CC2)no1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21 | LOGP: 2.95 | QED: 0.69 | TPSA: 51.02 => fO=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21F)N1CC1FFFF1FF | TFF)F1F1FF1FFF | TPSA:n1F)F)F | TPSA:F | TPSA:F1F)F)F1F1F1F1F1F | TPSA:1F1F1F1F | TPSA:1F)N1F1F1F1F1F1F11F1F)N11F1F1F1F1F1cncccccccccccccccccccccccccccccc1F1N1F)N1F)F1N1F1 | Tnn1F1F1F1 | TO4nn
Validity: 0.00
Uniqueness: 0.00
Average Tanimoto similarity to training set: 0.00

Quick evaluation with 100 samples...
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NC1CCNCC1)c1ccc2[nH]cnc2c1 | LOGP: 1.52 | QED: 0.88 | TPSA: 87.32 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 53])


[14:54:11] SMILES Parse Error: syntax error while parsing: fO=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21F)N1CC1FFFF1FF
[14:54:11] SMILES Parse Error: check for mistakes around position 1:
[14:54:11] fO=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21F)
[14:54:11] ^
[14:54:11] SMILES Parse Error: Failed parsing SMILES 'fO=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21F)N1CC1FFFF1FF' for input: 'fO=C(CSc1ncn(-c2ccccc2)n1)N1CCc2ccccc21F)N1CC1FFFF1FF'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NC1CCNCC1)c1ccc2[nH]cnc2c1 | LOGP: 1.52 | QED: 0.88 | TPSA: 87.32 => fCCOC(=O)N1CCC(NC(=O)c2ccc3[nH]cnc3c2)CC1COCOC2C1C1)C1C1C1C1 | | TA:A:C1 | TPSA:2)C1)C1C1)n1)N1)C1C1)N1F)C1C1F)N1C1 | TPSA:2)C1C1C1F)C1F)N1C1C1C1C1C1C1C1C1C1F)C1C1C1C1C1C1C1C1C1C1C1)C1C1C1C1C1 | LOGP:F)C1 | LOGP:1C1C1C1C1C1C1 | LOGP:1
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: c1ccc2c(c1)CCNC2 | LOGP: 3.13 | QED: 0.76 | TPSA: 12.03 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 41])


[14:54:12] SMILES Parse Error: syntax error while parsing: fCCOC(=O)N1CCC(NC(=O)c2ccc3[nH]cnc3c2)CC1COCOC2C1C1)C1C1C1C1
[14:54:12] SMILES Parse Error: check for mistakes around position 1:
[14:54:12] fCCOC(=O)N1CCC(NC(=O)c2ccc3[nH]cnc3c2)CC1
[14:54:12] ^
[14:54:12] SMILES Parse Error: Failed parsing SMILES 'fCCOC(=O)N1CCC(NC(=O)c2ccc3[nH]cnc3c2)CC1COCOC2C1C1)C1C1C1C1' for input: 'fCCOC(=O)N1CCC(NC(=O)c2ccc3[nH]cnc3c2)CC1COCOC2C1C1)C1C1C1C1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc2c(c1)CCNC2 | LOGP: 3.13 | QED: 0.76 | TPSA: 12.03 => fFc1ccc2c(c1Br)C(F)(F)NCC2COC(F)F)FF1F2F1F | TPSA:1F | TPSA:1F | TPSA:1F1F | QED:1F)N1F)F)N1CC1CCOCCO1F)F1F)N1F)F)N1F1F)N1CCO)F)N1CCOCCO1CCO1CCOCCO1F1F1F1F1F1F)N1F1F1F1CCOCCO1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NCCN1CCCCCC1=O)c1cncs1 | LOGP: 1.89 | QED: 0.92 | TPSA: 62.30 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 48])


[14:54:13] SMILES Parse Error: syntax error while parsing: fFc1ccc2c(c1Br)C(F)(F)NCC2COC(F)F)FF1F2F1F
[14:54:13] SMILES Parse Error: check for mistakes around position 1:
[14:54:13] fFc1ccc2c(c1Br)C(F)(F)NCC2COC(F)F)FF1F2F1
[14:54:13] ^
[14:54:13] SMILES Parse Error: Failed parsing SMILES 'fFc1ccc2c(c1Br)C(F)(F)NCC2COC(F)F)FF1F2F1F' for input: 'fFc1ccc2c(c1Br)C(F)(F)NCC2COC(F)F)FF1F2F1F'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NCCN1CCCCCC1=O)c1cncs1 | LOGP: 1.89 | QED: 0.92 | TPSA: 62.30 => fCc1ncsc1C(=O)NCCN1CCCCCC1=O | TPSA: 75.2 | TPSA: 82.3)CCCC1=O)c1cncs1 | TPSA:)N1CCCCCC1=O)N1 | TPSA: 62.3)C#N1C#N)C#N)C#N)C#N1C#N)C#C#C#N)C#N1C#N)C1C1C1C#N1CCO4 | TPSA:3 | TPSA:3)C#N1C1C#N1C#N1C#N)C2)C#N1C#N1C1C1C2)C2)C2)C2)C2)C#N2)C2)C2)C2)C2)C
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(CCNC(=O)C1CC1)NCC1CC=CCC1 | LOGP: 1.62 | QED: 0.72 | TPSA: 58.20 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 50])


[14:54:15] SMILES Parse Error: syntax error while parsing: fCc1ncsc1C(=O)NCCN1CCCCCC1=O
[14:54:15] SMILES Parse Error: check for mistakes around position 1:
[14:54:15] fCc1ncsc1C(=O)NCCN1CCCCCC1=O
[14:54:15] ^
[14:54:15] SMILES Parse Error: Failed parsing SMILES 'fCc1ncsc1C(=O)NCCN1CCCCCC1=O' for input: 'fCc1ncsc1C(=O)NCCN1CCCCCC1=O'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CCNC(=O)C1CC1)NCC1CC=CCC1 | LOGP: 1.62 | QED: 0.72 | TPSA: 58.20 => fO=C(CCNC(=O)C1CC1)NCC1CC=CCC1F)N(F)F)C1CC1 | TPSA:F)F)F1111111111111111111 | T1)F1)F1F1)F1F41F4CO4CO4CO4CO4CO4CO4CO4CO4CO4CO4CO4CO4CO4CO4)F31F4)CO4CO4CO4)CO4)CO4CO4)C2C2)CO4CO4CO4CO4C2)C2)F31F31F4)F31)C2)CO4CO2C2C2)F31F31)F4)CO4C2F31F31C2C2)C2)C2F31C2C2)
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(Cc1ccccc1)NCc1cn[nH]c1 | LOGP: 2.28 | QED: 0.84 | TPSA: 56.15 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 50])


[14:54:16] SMILES Parse Error: syntax error while parsing: fO=C(CCNC(=O)C1CC1)NCC1CC=CCC1F)N(F)F)C1CC1
[14:54:16] SMILES Parse Error: check for mistakes around position 1:
[14:54:16] fO=C(CCNC(=O)C1CC1)NCC1CC=CCC1F)N(F)F)C1C
[14:54:16] ^
[14:54:16] SMILES Parse Error: Failed parsing SMILES 'fO=C(CCNC(=O)C1CC1)NCC1CC=CCC1F)N(F)F)C1CC1' for input: 'fO=C(CCNC(=O)C1CC1)NCC1CC=CCC1F)N(F)F)C1CC1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Cc1ccccc1)NCc1cn[nH]c1 | LOGP: 2.28 | QED: 0.84 | TPSA: 56.15 => fC#CCOc1ccc(CC(=O)NCc2c(C)nn(C)c2C)cc1FOC)cc1FFA:1F | TPSA:2)cc1F)cc1F)F4111F4 | TPSA:FO4)O4O4O4 | TPSA:2)N1F)N1F)F)N1F)C1F)N1F)N1F)C1F)O4O4 | TPSA:1CCO4)N1F)N1F)N1CCO4)N1F)N1CCO4)C#N1CCO4)C1F)N1CCO4)N1CCO4)N1CCO4)N1F)N1F1CCO4)N1CCO4)C1CC
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1 | LOGP: 1.60 | QED: 0.79 | TPSA: 48.00 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 53])


[14:54:17] SMILES Parse Error: syntax error while parsing: fC#CCOc1ccc(CC(=O)NCc2c(C)nn(C)c2C)cc1FOC)cc1FFA:1F
[14:54:17] SMILES Parse Error: check for mistakes around position 1:
[14:54:17] fC#CCOc1ccc(CC(=O)NCc2c(C)nn(C)c2C)cc1FOC
[14:54:17] ^
[14:54:17] SMILES Parse Error: Failed parsing SMILES 'fC#CCOc1ccc(CC(=O)NCc2c(C)nn(C)c2C)cc1FOC)cc1FFA:1F' for input: 'fC#CCOc1ccc(CC(=O)NCc2c(C)nn(C)c2C)cc1FOC)cc1FFA:1F'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1 | LOGP: 1.60 | QED: 0.79 | TPSA: 48.00 => fO=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1)O2CC1)C2)C1F1FFFFFF4F)F1FF7F4F4F4F4F4F4F4F4F4F)F4F4F4F4F31F31 | TPSA:4CC1F4F31F21 | TPSA:4F21 | LOGP:4CCC1F4F4F31)F4C1F31)F31)F4C4F4C4C4C4C4C4F4C4C4F4O4C4C4C1C4C1CO4C1N1F31O4C1N1O1N1F4C2CO4C4C4CO1O4C4C4
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(Nc1nc[nH]n1)c1ccccc1 | LOGP: 1.19 | QED: 0.87 | TPSA: 88.91 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 49])


[14:54:18] SMILES Parse Error: syntax error while parsing: fO=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1)O2CC1)C2)C1F1FFFFFF4F)F1FF7F4F4F4F4F4F4F4F4F4F)F4F4F4F4F31F31
[14:54:18] SMILES Parse Error: check for mistakes around position 1:
[14:54:18] fO=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1)O2CC1)
[14:54:18] ^
[14:54:18] SMILES Parse Error: Failed parsing SMILES 'fO=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1)O2CC1)C2)C1F1FFFFFF4F)F1FF7F4F4F4F4F4F4F4F4F4F)F4F4F4F4F31F31' for input: 'fO=C(C1CCOCC1)N1CCOC2CCC1C2OCC1CC1)O2CC1)C2)C1F1FFFFFF4F)F1FF7F4F4F4F4F4F4F4F4F4F)F4F4F4F4F31F31'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Nc1nc[nH]n1)c1ccccc1 | LOGP: 1.19 | QED: 0.87 | TPSA: 88.91 => fCC(=O)Nc1ccc(C(=O)Nc2ncn(C)n2)cc1OC(F)F | TPSA:F)cc1F | TPSA: 99.1CC1)C1)C111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(CCc1ccccc1)NCC1CCCO1 | LOGP: 2.12 | QED: 0.90 | TPSA: 62.12 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 46])


[14:54:19] SMILES Parse Error: syntax error while parsing: fCC(=O)Nc1ccc(C(=O)Nc2ncn(C)n2)cc1OC(F)F
[14:54:19] SMILES Parse Error: check for mistakes around position 1:
[14:54:19] fCC(=O)Nc1ccc(C(=O)Nc2ncn(C)n2)cc1OC(F)F
[14:54:19] ^
[14:54:19] SMILES Parse Error: Failed parsing SMILES 'fCC(=O)Nc1ccc(C(=O)Nc2ncn(C)n2)cc1OC(F)F' for input: 'fCC(=O)Nc1ccc(C(=O)Nc2ncn(C)n2)cc1OC(F)F'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CCc1ccccc1)NCC1CCCO1 | LOGP: 2.12 | QED: 0.90 | TPSA: 62.12 => fCOc1cccc(CCC(=O)NCC2CCCO2)c1OC(C)C2C1C1)C2 | TPSA:2 | TPSA:2)C1C1O1C1O1)C1C1)O1C1F1F)C1F)C1C1C1C1C1C1C1C1C1C1C1C1)C1C1C1C1C1C1C1)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)C1C1C1 | LOGPPPP:C1C1C1C1O1C1C1C1)C1O1C1C1C1C1C1C1O1O1C1C1C1C1)C1C
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NC1CCNCC1)c1ccc[nH]1 | LOGP: 0.98 | QED: 0.82 | TPSA: 74.43 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 48])


[14:54:20] SMILES Parse Error: syntax error while parsing: fCOc1cccc(CCC(=O)NCC2CCCO2)c1OC(C)C2C1C1)C2
[14:54:20] SMILES Parse Error: check for mistakes around position 1:
[14:54:20] fCOc1cccc(CCC(=O)NCC2CCCO2)c1OC(C)C2C1C1)
[14:54:20] ^
[14:54:20] SMILES Parse Error: Failed parsing SMILES 'fCOc1cccc(CCC(=O)NCC2CCCO2)c1OC(C)C2C1C1)C2' for input: 'fCOc1cccc(CCC(=O)NCC2CCCO2)c1OC(C)C2C1C1)C2'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NC1CCNCC1)c1ccc[nH]1 | LOGP: 0.98 | QED: 0.82 | TPSA: 74.43 => fCOC(=O)C1CCN(C(=O)c2cc(C#N)cn2C)CC1C(C)C)CC1C)C111C)C111 | TC1)C1 | TPSA | TPSA:)N1)O)[n1)C1)N1)C1)O)=O)[n1)C1)C1)N1)N1)C1C1C1)C1C1C1)C1C1C1C1)C1)C1C1C1C1C1)C1C1C1C1C1C1CC1C1C1C1C1C1C1C1C1C1C1C#N1C1C1C1)C#N1C1O)=O)=O)=O1)N1C#N1)
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: O=C(NCc1nnc2n1CCCCC2)C1CCNCC1 | LOGP: 0.51 | QED: 0.84 | TPSA: 89.35 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 50])


[14:54:21] SMILES Parse Error: syntax error while parsing: fCOC(=O)C1CCN(C(=O)c2cc(C#N)cn2C)CC1C(C)C)CC1C)C111C)C111
[14:54:21] SMILES Parse Error: check for mistakes around position 1:
[14:54:21] fCOC(=O)C1CCN(C(=O)c2cc(C#N)cn2C)CC1C(C)C
[14:54:21] ^
[14:54:21] SMILES Parse Error: Failed parsing SMILES 'fCOC(=O)C1CCN(C(=O)c2cc(C#N)cn2C)CC1C(C)C)CC1C)C111C)C111' for input: 'fCOC(=O)C1CCN(C(=O)c2cc(C#N)cn2C)CC1C(C)C)CC1C)C111C)C111'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(NCc1nnc2n1CCCCC2)C1CCNCC1 | LOGP: 0.51 | QED: 0.84 | TPSA: 89.35 => fCOCC(=O)N1CCC(C(=O)NCc2nnc3n2CCCCC3)CC1C1CC1 | TPSA:)C1111111111111111111111111 | TPSA:)C1)C1)C1)C1)C1C1)C1)C111)C1)C111C11111111111111111111111111111111111111111111114CO2 | LOGP:C1C2CO2C2 | LOGP:C14CO2CO2CO2C2C2C2C2 | LOGP:F2CO2CO1C2CO2CO2C2 |
Validity: 0.00
Uniqueness: 0.00
Average Tanimoto similarity to training set: 0.00

Generating 5 example molecules...

Example 1:
Scaffold: c1ccc2c(c1)NCCO2
Requested LogP: 0.06
Requested QED: 0.81
Requested TPSA: 70.67
ÁîüÊàêÊù°‰ª∂: SCAFFOLD: c1ccc2c(c1)NCCO2 | LOGP: 0.06 | QED: 0.81 | TPSA: 70.67 =>
ËæìÂÖ•Â∫èÂàóÈïøÂ∫¶: torch.Size([1, 42])


[14:54:23] SMILES Parse Error: syntax error while parsing: fCOCC(=O)N1CCC(C(=O)NCc2nnc3n2CCCCC3)CC1C1CC1
[14:54:23] SMILES Parse Error: check for mistakes around position 1:
[14:54:23] fCOCC(=O)N1CCC(C(=O)NCc2nnc3n2CCCCC3)CC1C
[14:54:23] ^
[14:54:23] SMILES Parse Error: Failed parsing SMILES 'fCOCC(=O)N1CCC(C(=O)NCc2nnc3n2CCCCC3)CC1C1CC1' for input: 'fCOCC(=O)N1CCC(C(=O)NCc2nnc3n2CCCCC3)CC1C1CC1'


ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc2c(c1)NCCO2 | LOGP: 0.06 | QED: 0.81 | TPSA: 70.67 => fCNC(=O)CNC(=O)CN1CCOc2ccc(F)cc21 | TPSA: 87.7 => fOCC1Cc1ccccc1F)C1C(=O)NCCO1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)N1C1C1C1 | LOGP:1C1)O1)O1C1C11C1C1 | LOGP:: 3.77 | TPSA: 3.75 | TPSA::: 0.89 | TPSA:::: 0.75 | TPSA: 3.2)N1C1C1)N1)N1C1C1C1C1C1C1C1C1C1
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc2c(c1)NCCO2 | LOGP: 0.06 | QED: 0.81 | TPSA: 70.67 => fCC(=O)NCC(=O)N1CCOc2ccc(F)cc21 | TPSA: 111.9 => fOCCO)C1OCCOc1ccccc1N1CCO1)C(C)=O)O1)C111C1O1C1C1C#N1C1C1C1C1C#N1C#N1C1C#N1C1C1F)C1C1C1C1C1C1C1C1C1C1C1 | LOGP::1C1C1C1C1C1 | LOGP:: | LOGP::::::: 0.75 | TPSA:: TPSA:: TPSA: TPSA::: TPSA: TPSA: TPSA: TPSA: TPSA:O4C#N)C#
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc2c(c1)NCCO2 | LOGP: 0.06 | QED: 0.81 | TPSA: 70.67 => fCCNC(=O)CN1CCOc2ccc(C(=O)NCCO)cc21 | TPSA: 85.5 => fO)cc1F)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)C1)O1C1 | TPSA:::::: 0.4N1C1C1)C1C1C1C1C1C1C1C1C1C1C1C1)C1C1C1 | LOGP::::::::::::::::::::: 0.78 | TP

[14:54:24] SMILES Parse Error: syntax error while parsing: fOCC1Cc1ccccc1F)C1C(=O)NCCO1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)N1C1C1C1
[14:54:24] SMILES Parse Error: check for mistakes around position 1:
[14:54:24] fOCC1Cc1ccccc1F)C1C(=O)NCCO1C1C)C1C1C1C1C
[14:54:24] ^
[14:54:24] SMILES Parse Error: Failed parsing SMILES 'fOCC1Cc1ccccc1F)C1C(=O)NCCO1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)N1C1C1C1' for input: 'fOCC1Cc1ccccc1F)C1C(=O)NCCO1C1C)C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1C1)N1C1C1C1'
[14:54:24] SMILES Parse Error: syntax error while parsing: fOCCO)C1OCCOc1ccccc1N1CCO1)C(C)=O)O1)C111C1O1C1C1C#N1C1C1C1C1C#N1C#N1C1C#N1C1C1F)C1C1C1C1C1C1C1C1C1C1C1
[14:54:24] SMILES Parse Error: check for mistakes around position 1:
[14:54:24] fOCCO)C1OCCOc1ccccc1N1CCO1)C(C)=O)O1)C111
[14:54:24] ^
[14:54:24] SMILES Parse Error: Failed parsing SMILES 'fOCCO)C1OCCOc1ccccc1N1CCO1)C(C)=O)O1)C111C1O1C1C1C#N1C1C1C1C1C#N1C#N1C1C#N1C1C1F)C1C1C1C1C1C1C1C1C1C1C1' for input: 'fOCCO)C1OCC

ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CNc1nnnn1-c1ccccc1)N1CCCC1 | LOGP: 0.70 | QED: 0.89 | TPSA: 75.94 => fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFF)FFF1FF1F | TPSAA:44 | TPSA:74)nnnnnn1F)N1)F)N1F)F)F)N1F)F1F)N1F41F1F1F)N1F1F)F1F41F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F41F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CNc1nnnn1-c1ccccc1)N1CCCC1 | LOGP: 0.70 | QED: 0.89 | TPSA: 75.94 => fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFFFFFF | Tcc1FF | T4F)FF1F44F111F1F | TPSA:F)F1F4 | TPSA:F4 | TPSA:F4 | TPSA:F)N1F9F1F9F4cncccccccccccccccccc1O4cn1F)N1F)N1F)no1F1F)on1F1F)F4cn1F1F1F1F1F1O41F1F41F1F1F1O41F411F11F11F1F1111F1F1F411O4111
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(CNc1nnnn1-c1ccccc1)N1CCCC1 | LOGP: 0.70 | QED: 0.89 | TPSA: 75.94 => fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFF)FFFF1F1F | TPSA:F | TPSA:74)F1F1F4)N1F)N1ccccccccccccccn1F)N1F)F)F)F)F1F)F)N1F1F1F1F1F1F1F1F1F)N1F)F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F1F 4.111F11F 4.41F1F1F1F1F 4.41F1nnnn

[14:54:25] SMILES Parse Error: syntax error while parsing: fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFF)FFF1FF1F
[14:54:25] SMILES Parse Error: check for mistakes around position 1:
[14:54:25] fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1
[14:54:25] ^
[14:54:25] SMILES Parse Error: Failed parsing SMILES 'fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFF)FFF1FF1F' for input: 'fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFF)FFF1FF1F'
[14:54:25] SMILES Parse Error: syntax error while parsing: fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFFFFFF
[14:54:25] SMILES Parse Error: check for mistakes around position 1:
[14:54:25] fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1
[14:54:25] ^
[14:54:25] SMILES Parse Error: Failed parsing SMILES 'fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFFFFFF' for input: 'fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFFFFFFFF'
[14:54:25] SMILES Parse Error: syntax error while parsing: fO=C(CNc1nnnn1-c1ccccc1)N1CCCC1CO)N1CCCC1FFFF)FFFF1F1F
[14:54:25] SMILES Parse Error: chec

ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc(-c2cscn2)cc1 | LOGP: 3.89 | QED: 0.89 | TPSA: 71.09 => fCC(C)(C)OC(=O)Nc1nc(-c2ccccc2)cs1 | TCCO)c1C(C)C1ccccc1F)O1111111O1F 4.1F1F1F4O1F1F4)F4F4)F1F)F1F4F4 | T1F4O4O4F4)N1F4)F4)N44 | TPS1F4O4)C1F444 | TPSA:9F4CO444CO4CO42F4F4F4CO4O4)N42F1F4)N4N1F4N44O42)N4O4F42F42F42F4F4N4F4)N1F2)N42F1F1F2F2F4
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc(-c2cscn2)cc1 | LOGP: 3.89 | QED: 0.89 | TPSA: 71.09 => fCC(C)C(=O)Nc1nc(-c2ccc(Cl)cc2)cs1 | TPSA: 42.4)cc1C(=O)NC1C)C1C11C1C1C11 | LOGP: 3.1 | T1)C1C1C1C1C1C1C1C1C1C1C1C1C1)C1C1C1C1C1C1)C1)C1C1)C1C1C1C1C1C1C1C1C1C1C1C1C1C1)C1C1C1C1C1C1C1)C1C1C1C1)C1C1C1C1C11C1)C1C1C11C11C1 | LOGPPPPPPPPPPPPPP:
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: c1ccc(-c2cscn2)cc1 | LOGP: 3.89 | QED: 0.89 | TPSA: 71.09 => fCC(=O)Nc1nc(-c2ccc(NC(=O)CC(C)C)cc2)cs1 | TPSA: 42.0)cs1 | TPSA:)C1CC1)C11111111111111111111)F1 | LOGP:::1)F111111)C11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
  Generat

[14:54:26] SMILES Parse Error: syntax error while parsing: fCC(C)(C)OC(=O)Nc1nc(-c2ccccc2)cs1
[14:54:26] SMILES Parse Error: check for mistakes around position 1:
[14:54:26] fCC(C)(C)OC(=O)Nc1nc(-c2ccccc2)cs1
[14:54:26] ^
[14:54:26] SMILES Parse Error: Failed parsing SMILES 'fCC(C)(C)OC(=O)Nc1nc(-c2ccccc2)cs1' for input: 'fCC(C)(C)OC(=O)Nc1nc(-c2ccccc2)cs1'
[14:54:26] SMILES Parse Error: syntax error while parsing: fCC(C)C(=O)Nc1nc(-c2ccc(Cl)cc2)cs1
[14:54:26] SMILES Parse Error: check for mistakes around position 1:
[14:54:26] fCC(C)C(=O)Nc1nc(-c2ccc(Cl)cc2)cs1
[14:54:26] ^
[14:54:26] SMILES Parse Error: Failed parsing SMILES 'fCC(C)C(=O)Nc1nc(-c2ccc(Cl)cc2)cs1' for input: 'fCC(C)C(=O)Nc1nc(-c2ccc(Cl)cc2)cs1'
[14:54:26] SMILES Parse Error: syntax error while parsing: fCC(=O)Nc1nc(-c2ccc(NC(=O)CC(C)C)cc2)cs1
[14:54:26] SMILES Parse Error: check for mistakes around position 1:
[14:54:26] fCC(=O)Nc1nc(-c2ccc(NC(=O)CC(C)C)cc2)cs1
[14:54:26] ^
[14:54:26] SMILES Parse Error: Failed parsing 

ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C1CCN(CCC(=O)c2ccccc2)CC1 | LOGP: 2.55 | QED: 0.78 | TPSA: 46.61 => fO=C1CCN(CCC(=O)c2ccc(Cl)cc2)CC1=O | TPSA: 75.4)CC1C1C1C1C1C1C1C1C1C1C1C1 | LOGP:C1C1 | TPSA: 2.6 | TPSA:2 | TPSA:2=O4)C1C1C1)C1)C1C1)C1C1C1C1C1)C1C1C1C1C1C1)C1C1)C1C1C1)C1(N)=O)=O4C1C1(C#N)=O4C#N)=O)=O4N)=O4C#N)=O4C#N)=O4O4O4O4CO4O1O4O4O4O4O1
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C1CCN(CCC(=O)c2ccccc2)CC1 | LOGP: 2.55 | QED: 0.78 | TPSA: 46.61 => fC=CCOc1cccc(C(=O)CCN2CCC(=O)CC2)c1F)C1CC1C)C1C)C1C1C1 | TPSA: | TPSA:2)C1C1)C1C1C1C1C1C1)C1C1)C1C1)C1C1C1C1C1C1C1)C1C1C1C1C1)C1)C1C1C1C1C1C11C1)C11C1C1C1)C1C11C1C1C1)C1 | LOGP: | LOGP: | LOGP::::1C1C1C1C1 | LOGP::1C1C1C1C1C1C1C1C1)C1)C1
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C1CCN(CCC(=O)c2ccccc2)CC1 | LOGP: 2.55 | QED: 0.78 | TPSA: 46.61 => fC=CCOc1ccc(C(=O)CCN2CCC(=O)CC2)cc1F)cc1F | TPSA:FFOLD:FFOLD:)C1CC1 | TPSA:)C1CC1 | T1 | T1)cc1)F)nn1C1C1)N)C1C1C1)C1CC1)C1C1)C1C1C1C1C1C1C1C1)N1C1C1C1C1)C1C1C1C1 | TPSA:1C1C1O1C1C1C1C1C1C1C1C1C1C1C1)C1 | LOGP:1 | LOGP

[14:54:28] SMILES Parse Error: syntax error while parsing: fO=C1CCN(CCC(=O)c2ccc(Cl)cc2)CC1=O
[14:54:28] SMILES Parse Error: check for mistakes around position 1:
[14:54:28] fO=C1CCN(CCC(=O)c2ccc(Cl)cc2)CC1=O
[14:54:28] ^
[14:54:28] SMILES Parse Error: Failed parsing SMILES 'fO=C1CCN(CCC(=O)c2ccc(Cl)cc2)CC1=O' for input: 'fO=C1CCN(CCC(=O)c2ccc(Cl)cc2)CC1=O'
[14:54:28] SMILES Parse Error: syntax error while parsing: fC=CCOc1cccc(C(=O)CCN2CCC(=O)CC2)c1F)C1CC1C)C1C)C1C1C1
[14:54:28] SMILES Parse Error: check for mistakes around position 1:
[14:54:28] fC=CCOc1cccc(C(=O)CCN2CCC(=O)CC2)c1F)C1CC
[14:54:28] ^
[14:54:28] SMILES Parse Error: Failed parsing SMILES 'fC=CCOc1cccc(C(=O)CCN2CCC(=O)CC2)c1F)C1CC1C)C1C)C1C1C1' for input: 'fC=CCOc1cccc(C(=O)CCN2CCC(=O)CC2)c1F)C1CC1C)C1C)C1C1C1'
[14:54:28] SMILES Parse Error: syntax error while parsing: fC=CCOc1ccc(C(=O)CCN2CCC(=O)CC2)cc1F)cc1F
[14:54:28] SMILES Parse Error: check for mistakes around position 1:
[14:54:28] fC=CCOc1ccc(C(=O)CCN2CCC(=O)CC2)

ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Oc1ccccc1)C1CC(=O)N(Cc2ccco2)C1 | LOGP: 2.80 | QED: 0.63 | TPSA: 59.75 => fCc1ccc(C)c(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c1C(=O)N(C)C)C1C2C1C1C1C1C1C1C111C11C1C1C11C11C1C11C1C1C1C1C11C1C1C1C1C1C1111C1C11C11111C111C1111 | LOGP: 0. | LOGP: 0.96 | TPSA: 0.76 | TPSA: 0. | TPSA: 0.62 | TPSA:: 3. | TPSA:2 | TPSA:1C1C11111C1
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Oc1ccccc1)C1CC(=O)N(Cc2ccco2)C1 | LOGP: 2.80 | QED: 0.63 | TPSA: 59.75 => fCc1ccc(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c(C)c1C)C1C1CC1 | TPSA:)O1)O1)C1)C11)C11)C1111F)C1C1C1F)O1C111F)N1F)C1F)C1F)N1F)C1F)C1C1F)C1F)C1F)C1F)C1F1C1C1C1F)C1FF1C1FF4O4)C1C1C1C1CC1F)C1C1F1C1C1F4O4O4O4O1F)N1C1C1C1F)
ÂéüÂßãÁîüÊàêÊñáÊú¨: SCAFFOLD: O=C(Oc1ccccc1)C1CC(=O)N(Cc2ccco2)C1 | LOGP: 2.80 | QED: 0.63 | TPSA: 59.75 => fO=C(Oc1cccc(Cl)c1)C1CC(=O)N(Cc2ccco2)C1C1CC1 | TPSA: TPSA: 75.66111 | TPSA: 75.71)C1)C11)OCC1)N1)N11111F)N111F)C111111F)C1F)C1C111111111111F)N1C1C1F)C11C11C11C1C1C1C1CO1C1CO1CO1C1C1C1CO1CO1CO1CO1C1C1CO1CO1CO1CO1CO1CO1CO1CO1CO1C

[14:54:29] SMILES Parse Error: syntax error while parsing: fCc1ccc(C)c(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c1C(=O)N(C)C)C1C2C1C1C1C1C1C1C111C11C1C1C11C11C1C11C1C1C1C1C11C1C1C1C1C1C1111C1C11C11111C111C1111
[14:54:29] SMILES Parse Error: check for mistakes around position 1:
[14:54:29] fCc1ccc(C)c(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c
[14:54:29] ^
[14:54:29] SMILES Parse Error: Failed parsing SMILES 'fCc1ccc(C)c(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c1C(=O)N(C)C)C1C2C1C1C1C1C1C1C111C11C1C1C11C11C1C11C1C1C1C1C11C1C1C1C1C1C1111C1C11C11111C111C1111' for input: 'fCc1ccc(C)c(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c1C(=O)N(C)C)C1C2C1C1C1C1C1C1C111C11C1C1C11C11C1C11C1C1C1C1C11C1C1C1C1C1C1111C1C11C11111C111C1111'
[14:54:29] SMILES Parse Error: syntax error while parsing: fCc1ccc(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c(C)c1C)C1C1CC1
[14:54:29] SMILES Parse Error: check for mistakes around position 1:
[14:54:29] fCc1ccc(OC(=O)C2CC(=O)N(Cc3ccco3)C2)c(C)c
[14:54:29] ^
[14:54:29] SMILES Parse Error: Failed parsing SMILES 'fCc1ccc(OC(=O)C2CC(=O)N(