## Part 4: Designing improved inhibitors for the Zika virus NS2B–NS3 protease (PDB: 7I9O) 🦟❌

Following Part 3, where we introduced the NS2B–NS3 protease target and the resolved inhibitor (PDB: 7I9O), in this notebook you will make your own decisions, acting as an early-stage antiviral drug designer. The information below was already covered in Part 2 and is repeated here as a reminder.

Our biological target is the **Zika virus NS2B–NS3 protease**, a serine protease formed by the NS3 catalytic domain together with its NS2B cofactor. This protease is essential for ZIKV replication because it cleaves the viral polyprotein into the individual structural and non-structural proteins the virus needs to assemble and replicate, which makes it a high-value antiviral target. ([Nature][1])
We will use the experimentally solved crystal structure **7I9O**, which captures the ZIKV NS2B–NS3 protease bound to a small-molecule inhibitor. ([rcsb.org][2])

You are given a **core scaffold** derived from a weak hit in that pocket. The atoms in the core are “locked”: we assume they are important for binding. Certain positions on the scaffold are marked with `*`. Those `*` positions are attachment points where we are allowed to grow new **R-groups** to make the molecule bind better and look more like a viable lead.
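As a quick sanity check before running anything, the attachment points in a scaffold SMILES can be counted with a small regular expression. This is a minimal sketch: the helper name `attachment_points` and the example scaffold are hypothetical, and it assumes attachment points are written either as a bare `*` or as a bracketed dummy atom such as `[*]` or `[*:0]`. Check `data/scaffold.smi` to see which notation it actually uses.

```python
import re

# Matches a bracketed, optionally labelled dummy atom ("[*]" or "[*:0]") or a bare "*".
ATTACHMENT_RE = re.compile(r"\[\*(?::(\d+))?\]|\*")


def attachment_points(scaffold_smiles):
    """Return one entry per attachment point: its label as a string, or None if unlabelled."""
    return [match.group(1) for match in ATTACHMENT_RE.finditer(scaffold_smiles)]


# Hypothetical scaffold for illustration only (NOT the contents of data/scaffold.smi).
example = "[*:0]c1ccc(cc1)C(=O)N[*:1]"
points = attachment_points(example)
print(f"{len(points)} attachment point(s), labels: {points}")
```

Each entry in the returned list corresponds to one site where an R-group will be proposed, so the list length tells you how many substituents each generated molecule will carry.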

We will use **LibINVENT**, a scaffold-decorator model, to explore these R-groups. LibINVENT proposes substituents for the `*` sites and we will train/steer it with reinforcement learning. The key lever you control is the **scoring function** in `zika.toml`: you will define what “good” means (for example: reasonable molecular weight, acceptable physicochemical properties, no obvious liabilities, etc.), and LibINVENT will try to generate molecules that satisfy that profile.
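To build intuition for what a multi-property scoring function does, the sketch below maps each raw property onto a 0 to 1 score and combines the scores with a weighted geometric mean. It is plain Python for illustration, not the syntax of `zika.toml`. The function names, property windows, weights, and example values are all assumptions chosen for this example, and the geometric mean is one common aggregation choice rather than a statement of what your configuration uses.

```python
import math


def window_score(value, low, high, softness):
    """Return 1.0 inside [low, high]; decay smoothly (Gaussian tail) outside it."""
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return math.exp(-((distance / softness) ** 2))


def weighted_geometric_mean(scores, weights):
    """Combine scores in [0, 1]; a near-zero component pulls the total towards zero."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(max(s, 1e-12)) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical property values for a single generated molecule.
props = {"mol_weight": 520.0, "logp": 3.1, "hbd": 2}

# (score, weight) pairs; the windows and weights are illustrative, not recommendations.
components = [
    (window_score(props["mol_weight"], 250.0, 500.0, softness=50.0), 1.0),
    (window_score(props["logp"], 1.0, 4.0, softness=1.0), 1.0),
    (window_score(props["hbd"], 0, 5, softness=1.0), 0.5),
]
scores, weights = zip(*components)
total = weighted_geometric_mean(scores, weights)
print(f"total score: {total:.3f}")
```

With a geometric mean, one badly failing property cannot be compensated by the others, which is usually what you want when a single liability would rule a compound out.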

Your workflow in this notebook follows the same flow as Part 2, with many places where you can tweak things:

1. **Define a scoring function**
   Edit the TOML so that high score = “this looks like a plausible NS2B–NS3 protease inhibitor and a drug-like small molecule”.
2. **Generate candidates with LibINVENT**
   Run RL to sample decorated molecules starting from the given scaffold.
3. **Triage / down-select**
   You cannot dock thousands of structures. You should prioritise and filter the generated molecules (chemistry sanity, diversity, properties) and choose at most ~100 molecules that are worth docking into 7I9O.
4. **Nominate synthesis candidates**
   From the docked / prioritised set, choose your final **top 10 compounds**. These 10 are the ones you would hand to medicinal chemistry as proposed “next-step” lead ideas against the Zika NS2B–NS3 protease. Use py3Dmol to visualise the docked poses and ProLIF to calculate protein–ligand interactions, and use both in your proposal.
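The triage step (step 3) can be sketched as a small filter-and-rank function. This is a minimal illustration under stated assumptions: the record keys (`smiles`, `score`, `mol_weight`), the thresholds, and the example records are hypothetical, and deduplication by string only works if the SMILES are already canonical. A real triage would also apply substructure alerts and a diversity criterion.

```python
def triage(candidates, max_keep=100, mw_range=(250.0, 550.0), min_score=0.5):
    """Deduplicate, filter by simple property windows, and keep the best-scoring molecules."""
    # Keep only the best-scoring record for each (assumed canonical) SMILES string.
    best_by_smiles = {}
    for cand in candidates:
        previous = best_by_smiles.get(cand["smiles"])
        if previous is None or cand["score"] > previous["score"]:
            best_by_smiles[cand["smiles"]] = cand

    # Apply the property window and the minimum score cut-off.
    kept = [
        cand
        for cand in best_by_smiles.values()
        if mw_range[0] <= cand["mol_weight"] <= mw_range[1] and cand["score"] >= min_score
    ]

    # Rank by score (highest first) and cap the list at the docking budget.
    kept.sort(key=lambda cand: cand["score"], reverse=True)
    return kept[:max_keep]


# Hypothetical generated molecules, for illustration only.
generated = [
    {"smiles": "CCO", "score": 0.9, "mol_weight": 46.1},
    {"smiles": "CC(=O)Nc1ccc(O)cc1C(=O)NCc1ccccc1", "score": 0.7, "mol_weight": 284.3},
    {"smiles": "CC(=O)Nc1ccc(O)cc1C(=O)NCc1ccccc1", "score": 0.8, "mol_weight": 284.3},
    {"smiles": "O=C(NCc1ccccc1)c1ccc(Cl)cc1NC(=O)C1CC1", "score": 0.4, "mol_weight": 328.8},
]
shortlist = triage(generated)
print([cand["smiles"] for cand in shortlist])
```

Capping at `max_keep=100` mirrors the docking budget stated in step 3; tightening `mw_range` or `min_score` is a simple way to shrink the list further before docking.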

[1]: https://www.nature.com/articles/s41467-025-63602-z "Combined crystallographic fragment screening and deep ..."
[2]: https://www.rcsb.org/structure/7i9o "7I9O: Group deposition of ZIKV NS2B-NS3 protease in ..."


### Reminder on files to use:

- `data/scaffold.smi` - Scaffold with `*` attachment points for R-group decoration
- `data/7I9O-receptor.pdb` / `data/7I9O-receptor.pdbqt` - Receptor structure for docking
- `data/7I9O-ligand.sdf` - Original inhibitor ligand (reference structure)
- `config/zika.toml` - Template for scoring function configuration for LibINVENT RL training
- `priors/libinvent.prior` - LibINVENT prior model
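Before starting, it can help to confirm that all of the files listed above are present. The following is a minimal sketch; the helper name `missing_files` is hypothetical, and it assumes the notebook is run from the directory that contains `data/`, `config/`, and `priors/`.

```python
from pathlib import Path

# Relative paths taken from the file list above.
REQUIRED_FILES = [
    "data/scaffold.smi",
    "data/7I9O-receptor.pdb",
    "data/7I9O-receptor.pdbqt",
    "data/7I9O-ligand.sdf",
    "config/zika.toml",
    "priors/libinvent.prior",
]


def missing_files(root="."):
    """Return the required paths that do not exist as regular files under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```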


In [1]:
# Import all required libraries
import py3Dmol
import os
import subprocess
import pandas as pd
import numpy as np
import yaml
import gc

from pathlib import Path
from rdkit import Chem
from rdkit.Chem import Draw, Descriptors, Lipinski, FilterCatalog, rdMolDescriptors
from rdkit import RDLogger

# Disable RDKit warnings
RDLogger.DisableLog('rdApp.*')

# Set thread limits for compatibility
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Import MDAnalysis and prolif after setting thread limits
import MDAnalysis as mda
import prolif



In [3]:
# Continue!!