# Streamlined ALS → SOD1 Ligand Discovery Workflow

This notebook demonstrates a minimal pipeline to go from ALS (Amyotrophic lateral sclerosis) to the SOD1 target, retrieve known ligands, and export them for docking.

## Set Disease and Search for ALS Target Genes
ALS (Amyotrophic lateral sclerosis) and search for associated target genes.

In [59]:
# Set disease and search for ALS target genes using Open Targets Platform (robust to network errors)
import requests
import ipywidgets as widgets
from IPython.display import display, clear_output

disease_name = "Amyotrophic lateral sclerosis"
# EFO ID for ALS: Orphanet_803 (can also use EFO_0000253 for ALS)
efo_id = "Orphanet_803"  # Or use "EFO_0000253"

gene_hits = []
try:
    ot_url = f"https://platform-api.opentargets.io/v3/platform/public/association/filter?disease={efo_id}&size=30&fields=target.gene_info.symbol,target.gene_info.name,association_score.overall"
    resp = requests.get(ot_url, timeout=10)
    if resp.status_code == 200:
        data = resp.json()
        for hit in data.get('data', []):
            symbol = hit['target']['gene_info']['symbol']
            name = hit['target']['gene_info'].get('name', '')
            score = hit['association_score']['overall']
            gene_hits.append({'symbol': symbol, 'name': name, 'score': score})
    else:
        print('Failed to retrieve ALS target genes from Open Targets (bad status code).')
except Exception as e:
    print('Could not connect to Open Targets API. Using static ALS gene list.')

if gene_hits:
    # Sort by association score, descending
    gene_hits = sorted(gene_hits, key=lambda x: x['score'], reverse=True)
    als_genes = [g['symbol'] for g in gene_hits]
    print('Top ALS target genes:', als_genes)
else:
    als_genes = [
        "SOD1", "FUS", "TARDBP", "C9orf72", "UBQLN2", "OPTN", "VCP", "ANG", "SETX", "PFN1", "CHCHD10", "TBK1", "NEK1", "MATR3", "TIA1", "HNRNPA1", "HNRNPA2B1", "ALS2", "DAO", "FIG4", "SPG11", "ATXN2", "GRN", "SQSTM1", "SIGMAR1", "DCTN1", "KIF5A"
    ]
    print('Using static ALS gene list.')

gene_dropdown = widgets.Dropdown(
    options=als_genes,
    value=als_genes[0],
    description="ALS Gene:",
    style={'description_width': 'initial'}
 )

def on_gene_change(change):
    global target_gene
    target_gene = change['new']
    clear_output(wait=True)
    display(gene_dropdown)
    print(f"Selected ALS target gene: {target_gene}")

gene_dropdown.observe(on_gene_change, names='value')
display(gene_dropdown)
target_gene = gene_dropdown.value
print(f"Selected ALS target gene: {target_gene}")

Dropdown(description='ALS Gene:', index=1, options=('SOD1', 'FUS', 'TARDBP', 'C9orf72', 'UBQLN2', 'OPTN', 'VCP…

https://search.rcsb.org/query-editor.html



## Search PDB for SOD1 Co-crystal Structures and Ligands
We will search the RCSB PDB for SOD1 (UniProt: P00441) structures with co-crystallized ligands and extract ligand information for benchmarking and docking.

In [66]:
# Paste your PDB ID list here (from batch search):
pdb_ids = [
    '2RSQ', '4RFX', '7WWT', '7WWY', '7WX0', '7WX1', '6FN8', '2NAM', '3GTT', '3LTV', '3GTV', '4OJA', '2MP3', '6FLH', '6Z3V', '6Z4H', '6Z4J', '6Z4K', '6Z4L', '6Z4M', '6Z4O', '8ZD5', '8ZD6', '3CE1', '1T6I', '1T6Q', '4NCQ', '8IMD', '3ECU', '3ECV', '3ECW', '6Z4G', '6Z4I', '8CCX', '8Q6M', '2CRL', '3G4Z', '3G50', '3L9Y', '5J07', '3G4X', '3GZQ', '3HFF', '5O3Y', '5O40', '7VZF', '8IHU', '8IHV', '8K3A', '9IYD', '9IYJ', '9JBO', '6FON', '3L9E', '5K02', '3CQP', '3GQF', '3GZP', '3H2Q', '3RE0', '6DTK', '7NXX', '7XX3', '8K33', '8K3L', '8YAF', '9IYK', '9JBP', '3GZO', '5J0C', '5YTU', '6A9O', '6FOI', '8YAT', '3CQQ', '3H2P', '6FP6', '2AF2', '5DLI', '5IIW', '5WOR', '6B79', '1OZT', '1OZU', '4A7S', '4A7T', '4A7V', '5J0F', '7T8E', '7T8F', '7T8G', '7T8H', '1T6U', '3QQD', '4A7U', '5YTO', '5YUL', '6SPI', '6SPJ', '6SPK',
]
import requests
import pandas as pd

# DEBUG: Inspect all entity types for a few PDB IDs to understand the structure
def print_all_entities_for_pdb(pdb_id):
    url = f'https://data.rcsb.org/rest/v1/core/entry/{pdb_id}'
    try:
        resp = requests.get(url, timeout=10)
        if resp.status_code == 200:
            data = resp.json()
            print(f'\nEntities for PDB {pdb_id}:')
            # Print polymer entities (proteins, nucleic acids)
            for i, entity in enumerate(data.get('polymer_entities', [])):
                print(f'  Polymer entity {i+1}:', entity.get('rcsb_polymer_entity_container_identifiers', {}).get('entity_id', ''), entity.get('entity_poly', {}).get('type', ''), entity.get('rcsb_polymer_entity', {}).get('pdbx_description', ''))
            # Print nonpolymer entities (ligands, ions, etc.)
            for i, entity in enumerate(data.get('nonpolymer_entities', [])):
                chem_comp = entity.get('chem_comp', {})
                print(f'  Nonpolymer entity {i+1}:', chem_comp.get('id', ''), chem_comp.get('name', ''), chem_comp.get('type', ''))
            # Print branched entities (carbohydrates, etc.)
            for i, entity in enumerate(data.get('branched_entities', [])):
                chem_comp = entity.get('chem_comp', {})
                print(f'  Branched entity {i+1}:', chem_comp.get('id', ''), chem_comp.get('name', ''), chem_comp.get('type', ''))
        else:
            print(f'No entry found for PDB {pdb_id}')
    except Exception as e:
        print(f'Error fetching entry for PDB {pdb_id}:', e)

# Inspect the first 3 PDB IDs for all entity types
for pdb_id in pdb_ids[:3]:
    print_all_entities_for_pdb(pdb_id)

# --- Enhanced batch extraction code below ---
all_results = []
empty_entity_pdbs = []
for pdb_id in pdb_ids:
    lig_url = f'https://data.rcsb.org/rest/v1/core/entry/{pdb_id}'
    try:
        lig_resp = requests.get(lig_url, timeout=10)
        if lig_resp.status_code == 200:
            data = lig_resp.json()
            found = False
            # Extract all non-protein entities: ligands, ions, nucleic acids, etc.
            for entity in data.get('nonpolymer_entities', []):
                chem_comp = entity.get('chem_comp', {})
                ligand_id = chem_comp.get('id', '')
                ligand_name = chem_comp.get('name', '')
                ligand_type = chem_comp.get('type', '')
                all_results.append({'pdb_id': pdb_id, 'entity_id': ligand_id, 'entity_name': ligand_name, 'entity_type': ligand_type})
                print(f'PDB: {pdb_id}, Entity: {ligand_id}, Name: {ligand_name}, Type: {ligand_type}')
                found = True
            if not data.get('nonpolymer_entities'):
                print(f'PDB: {pdb_id} has NO nonpolymer entities (ligands/ions)')
            if not found:
                empty_entity_pdbs.append(pdb_id)
        else:
            print(f'No entry found for PDB {pdb_id}')
    except Exception as e:
        print(f'Error fetching entry for PDB {pdb_id}:', e)

# Export results to CSV if any found
if all_results:
    df_all = pd.DataFrame(all_results)
    df_all.to_csv('batch_als_pdb_entities.csv', index=False)
    print('Exported batch_als_pdb_entities.csv with all non-protein entities for your PDB list.')
else:
    print('No non-protein entities found in the PDB entries you provided.')

# Log PDB IDs with no nonpolymer entities found
if empty_entity_pdbs:
    print(f'PDB entries with NO nonpolymer entities: {empty_entity_pdbs}')


Entities for PDB 2RSQ:

Entities for PDB 4RFX:

Entities for PDB 7WWT:
PDB: 2RSQ has NO nonpolymer entities (ligands/ions)

Entities for PDB 7WWT:
PDB: 2RSQ has NO nonpolymer entities (ligands/ions)
PDB: 4RFX has NO nonpolymer entities (ligands/ions)
PDB: 7WWT has NO nonpolymer entities (ligands/ions)
PDB: 4RFX has NO nonpolymer entities (ligands/ions)
PDB: 7WWT has NO nonpolymer entities (ligands/ions)
PDB: 7WWY has NO nonpolymer entities (ligands/ions)
PDB: 7WX0 has NO nonpolymer entities (ligands/ions)
PDB: 7WWY has NO nonpolymer entities (ligands/ions)
PDB: 7WX0 has NO nonpolymer entities (ligands/ions)
PDB: 7WX1 has NO nonpolymer entities (ligands/ions)
PDB: 6FN8 has NO nonpolymer entities (ligands/ions)
PDB: 7WX1 has NO nonpolymer entities (ligands/ions)
PDB: 6FN8 has NO nonpolymer entities (ligands/ions)
PDB: 2NAM has NO nonpolymer entities (ligands/ions)
PDB: 3GTT has NO nonpolymer entities (ligands/ions)
PDB: 2NAM has NO nonpolymer entities (ligands/ions)
PDB: 3GTT has NO non

## How to Search PDB for ALS Structures with Ligands Only

To retrieve only ALS-related PDB entries that contain ligands (non-protein entities), you can use the RCSB PDB Advanced Search with a query that combines your gene/protein of interest and a filter for the presence of nonpolymer entities.

**Example: RCSB Query JSON**

You can use the [RCSB Query Editor](https://search.rcsb.org/query-editor.html) to build this, or use the following JSON as a template:

```json
{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "full_text",
        "parameters": { "value": "SOD1" }
      },
      {
        "type": "terminal",
        "service": "exists",
        "parameters": { "attribute": "rcsb_nonpolymer_instance_feature_summary.comp_id" }
      }
    ]
  },
  "return_type": "entry"
}
```

- Replace `"SOD1"` with any ALS gene/protein of interest.
- The `exists` node ensures only entries with at least one nonpolymer (ligand/ion) are returned.

**How to use:**
1. Go to the [RCSB Query Editor](https://search.rcsb.org/query-editor.html).
2. Paste the JSON above into the "JSON" tab.
3. Run the query to get a list of PDB IDs with ligands for your target.
4. Use these filtered PDB IDs in your batch extraction code.

This approach ensures you only analyze structures with co-crystallized ligands or small molecules.

## Retrieve UniProt Accession for SOD1

import requests
uniprot_url = f'https://rest.uniprot.org/uniprotkb/search?query=gene_exact:{target_gene}+AND+organism_id:9606&fields=accession,protein_name,gene_names,organism_name,length&format=tsv'
resp = requests.get(uniprot_url)
lines = resp.text.strip().split('\n')
if len(lines) > 1:
    acc = lines[1].split('\t')[0]
    print(f'UniProt accession for {target_gene}: {acc}')
else:
    acc = None
    print('No UniProt entry found.')

In [47]:
## Find Known Ligands for SOD1 in ChEMBL

In [48]:
from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
if acc:
    targets = list(target.filter(target_components__accession=acc))
    if targets:
        chembl_id = targets[0]['target_chembl_id']
        print(f'ChEMBL target ID: {chembl_id}')
        acts = activity.filter(target_chembl_id=chembl_id, standard_type='IC50')
        ligands = []
        for a in list(acts)[:10]:
            mol_id = a.get('molecule_chembl_id', 'NA')
            ic50 = a.get('standard_value', 'NA')
            print(f'Molecule: {mol_id}, IC50: {ic50}')
            ligands.append(mol_id)
    else:
        print('No ChEMBL target found for this UniProt accession.')
else:
    print('No UniProt accession available for ligand search.')

ChEMBL target ID: CHEMBL2354
Molecule: CHEMBL273030, IC50: 365800.0
Molecule: CHEMBL272808, IC50: 786900.0
Molecule: CHEMBL272641, IC50: 67900.0
Molecule: CHEMBL405899, IC50: 129300.0
Molecule: CHEMBL1672028, IC50: 96600.0
Molecule: CHEMBL1672029, IC50: 103300.0
Molecule: CHEMBL2179266, IC50: 83120.0
Molecule: CHEMBL2179265, IC50: 65980.0
Molecule: CHEMBL2179268, IC50: 71460.0
Molecule: CHEMBL2179267, IC50: 125320.0


In [49]:
## Export Ligand SMILES for Docking

import pandas as pd
molecule = new_client.molecule
ligand_smiles = []
for mol_id in ligands:
    mol_data = molecule.get(mol_id)
    smiles = mol_data.get('molecule_structures', {}).get('canonical_smiles', None)
    if smiles:
        ligand_smiles.append({'name': mol_id, 'smiles': smiles})
if ligand_smiles:
    df = pd.DataFrame(ligand_smiles)
    df.to_csv('ligands_for_docking.csv', index=False)
    print('Exported ligands_for_docking.csv with', len(df), 'ligands.')
else:
    print('No valid SMILES found for ligands.')

In [50]:
# Step 3: Find known ligands in ChEMBL
from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
if acc:
    # Find ChEMBL target for UniProt accession
    targets = target.filter(target_components__accession=acc)
    targets = list(targets)
    if targets:
        chembl_id = targets[0]['target_chembl_id']
        print(f'ChEMBL target ID: {chembl_id}')
        # Find bioactive molecules for this target
        acts = activity.filter(target_chembl_id=chembl_id, standard_type='IC50')
        print('Top 5 ligands with IC50:')
        ligands = []
        for a in list(acts)[:5]:
            mol_id = a.get('molecule_chembl_id', 'NA')
            ic50 = a.get('standard_value', 'NA')
            units = a.get('standard_units', '')
            print(f'Molecule: {mol_id}, IC50: {ic50} {units}')
            ligands.append(mol_id)
    else:
        print('No ChEMBL target found for this UniProt accession.')
else:
    print('No UniProt accession available for ligand search.')

ChEMBL target ID: CHEMBL2354
Top 5 ligands with IC50:
Molecule: CHEMBL273030, IC50: 365800.0 nM
Molecule: CHEMBL272808, IC50: 786900.0 nM
Molecule: CHEMBL272641, IC50: 67900.0 nM
Molecule: CHEMBL405899, IC50: 129300.0 nM
Molecule: CHEMBL1672028, IC50: 96600.0 nM


## Step 4: Analyze a Ligand's Molecular Properties
Let's pick a ligand from the previous step (e.g., CHEMBL311039) and analyze its structure and properties using RDKit.

In [51]:
# Step 4: Analyze a ligand's properties
ligand_chembl_id = 'CHEMBL311039'  # Replace with a ligand from above if desired
from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
mol_data = molecule.get(ligand_chembl_id)
smiles = mol_data.get('molecule_structures', {}).get('canonical_smiles', None)
print(f'Ligand {ligand_chembl_id} SMILES:', smiles)
if smiles:
    mol = Chem.MolFromSmiles(smiles)
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    print(f'Molecular Weight: {mw:.2f}')
    print(f'LogP: {logp:.2f}')
else:
    print('No SMILES found for this ligand.')

Ligand CHEMBL311039 SMILES: CC12CCC(C1)C(C)(C)C2NS(=O)(=O)c1ccc(F)cc1
Molecular Weight: 311.42
LogP: 3.32


## Step 5: Export Ligands for Docking
Now that we have identified ligands from ChEMBL, let's export their SMILES to a CSV file for use in the batch docking pipeline.

In [52]:
# Export top ligands to CSV for docking pipeline
import pandas as pd
from chembl_webresource_client.new_client import new_client
molecule = new_client.molecule
ligand_smiles = []
for mol_id in ligands:  # 'ligands' from previous ChEMBL query
    mol_data = molecule.get(mol_id)
    smiles = mol_data.get('molecule_structures', {}).get('canonical_smiles', None)
    if smiles:
        ligand_smiles.append({'name': mol_id, 'smiles': smiles})
if ligand_smiles:
    df = pd.DataFrame(ligand_smiles)
    df.to_csv('ligands_for_docking.csv', index=False)
    print('Exported ligands_for_docking.csv with', len(df), 'ligands.')
else:
    print('No valid SMILES found for ligands.')

Exported ligands_for_docking.csv with 5 ligands.


## Step 6: Run Docking Pipeline Script from Notebook
You can now run the docking pipeline script on the exported CSV. The following cell demonstrates how to call the script using Python's `subprocess` module. Make sure you have prepared your receptor PDBQT file and set the correct docking box coordinates.

In [53]:
# Example: Run docking pipeline script from notebook
import subprocess
import sys
import os

# Set these paths and parameters as needed
smiles_csv = 'ligands_for_docking.csv'
receptor_pdbqt = 'receptor.pdbqt'  # Path to your prepared receptor
outdir = './docking_run'
center = '10.5,-3.2,8.9'  # Example center coordinates
size = '22,22,22'         # Example box size
exhaustiveness = '8'
cpus = '4'
docking_script = 'docking_pipeline.py'  # Path to your script
vina_exec = 'vina'  # or 'smina' if preferred

# Build the command
cmd = [sys.executable, docking_script,
       '--smiles_csv', smiles_csv,
       '--smiles_column', 'smiles',
       '--name_column', 'name',
       '--outdir', outdir,
       '--receptor', receptor_pdbqt,
       '--center', center,
       '--size', size,
       '--exhaustiveness', exhaustiveness,
       '--cpus', cpus,
       '--vina_exec', vina_exec]

# Print the command for review
print('Running docking pipeline command:')
print(' '.join(cmd))

# Run the docking pipeline (uncomment to execute)
# result = subprocess.run(cmd)
# print('Docking pipeline finished with return code:', result.returncode)

Running docking pipeline command:
/Users/justin/drug-discovery-ai/venv/bin/python docking_pipeline.py --smiles_csv ligands_for_docking.csv --smiles_column smiles --name_column name --outdir ./docking_run --receptor receptor.pdbqt --center 10.5,-3.2,8.9 --size 22,22,22 --exhaustiveness 8 --cpus 4 --vina_exec vina


---

**Summary:**
This workflow demonstrates how to start from a gene (e.g., APP), find the corresponding protein (UniProt), retrieve known small-molecule ligands (ChEMBL), and analyze their molecular properties (RDKit). This bridges genomics and cheminformatics, enabling rational drug discovery starting from genetic information.

## Link to Book Chapter

For more details and explanations, see [Chapter 1: Genomics to Molecules](../chapters/chapter1-genomics-to-molecules.qmd) in the book.

In [67]:
# --- Batch extraction using RCSB GraphQL API for ligands/ions ---
import requests
import pandas as pd
import time

graphql_url = "https://data.rcsb.org/graphql"

graphql_query = '''
query getEntry($id: String!) {
  entry(entry_id: $id) {
    nonpolymer_entities {
      rcsb_nonpolymer_entity_container_identifiers {
        entry_id
        entity_id
        nonpolymer_comp_id
      }
      nonpolymer_comp {
        chem_comp {
          id
          name
          type
        }
      }
    }
  }
}
'''

def fetch_nonpolymer_entities_graphql(pdb_id):
    variables = {"id": pdb_id}
    try:
        resp = requests.post(
            graphql_url,
            json={"query": graphql_query, "variables": variables},
            timeout=10
        )
        if resp.status_code == 200:
            data = resp.json()
            entities = data.get("data", {}).get("entry", {}).get("nonpolymer_entities", [])
            results = []
            for entity in entities:
                comp = entity.get("nonpolymer_comp", {}).get("chem_comp", {})
                comp_id = comp.get("id", "")
                comp_name = comp.get("name", "")
                comp_type = comp.get("type", "")
                results.append({
                    "pdb_id": pdb_id,
                    "entity_id": entity.get("rcsb_nonpolymer_entity_container_identifiers", {}).get("entity_id", ""),
                    "entity_name": comp_name,
                    "entity_type": comp_type,
                    "comp_id": comp_id
                })
            return results
        else:
            print(f"GraphQL error for {pdb_id}: status {resp.status_code}")
            return []
    except Exception as e:
        print(f"GraphQL exception for {pdb_id}: {e}")
        return []

all_graphql_results = []
empty_graphql_pdbs = []
for pdb_id in pdb_ids:
    entities = fetch_nonpolymer_entities_graphql(pdb_id)
    if entities:
        for ent in entities:
            print(f"PDB: {ent['pdb_id']}, Entity: {ent['entity_id']}, Name: {ent['entity_name']}, Type: {ent['entity_type']}, CompID: {ent['comp_id']}")
        all_graphql_results.extend(entities)
    else:
        print(f"PDB: {pdb_id} has NO nonpolymer entities (GraphQL)")
        empty_graphql_pdbs.append(pdb_id)
    time.sleep(0.1)  # be polite to the API

if all_graphql_results:
    df_graphql = pd.DataFrame(all_graphql_results)
    df_graphql.to_csv('batch_als_pdb_entities_graphql.csv', index=False)
    print('Exported batch_als_pdb_entities_graphql.csv with all non-protein entities (GraphQL) for your PDB list.')
else:
    print('No non-protein entities found in the PDB entries you provided (GraphQL).')

if empty_graphql_pdbs:
    print(f'PDB entries with NO nonpolymer entities (GraphQL): {empty_graphql_pdbs}')


PDB: 2RSQ, Entity: 2, Name: COPPER (I) ION, Type: non-polymer, CompID: CU1
GraphQL exception for 4RFX: 'NoneType' object is not iterable
PDB: 4RFX has NO nonpolymer entities (GraphQL)
GraphQL exception for 4RFX: 'NoneType' object is not iterable
PDB: 4RFX has NO nonpolymer entities (GraphQL)
PDB: 7WWT, Entity: 2, Name: ZINC ION, Type: non-polymer, CompID: ZN
PDB: 7WWT, Entity: 3, Name: COPPER (II) ION, Type: non-polymer, CompID: CU
PDB: 7WWT, Entity: 2, Name: ZINC ION, Type: non-polymer, CompID: ZN
PDB: 7WWT, Entity: 3, Name: COPPER (II) ION, Type: non-polymer, CompID: CU
PDB: 7WWY, Entity: 2, Name: ZINC ION, Type: non-polymer, CompID: ZN
PDB: 7WWY, Entity: 3, Name: COPPER (II) ION, Type: non-polymer, CompID: CU
PDB: 7WWY, Entity: 2, Name: ZINC ION, Type: non-polymer, CompID: ZN
PDB: 7WWY, Entity: 3, Name: COPPER (II) ION, Type: non-polymer, CompID: CU
PDB: 7WX0, Entity: 3, Name: COPPER (II) ION, Type: non-polymer, CompID: CU
PDB: 7WX0, Entity: 2, Name: ZINC ION, Type: non-polymer, Com

Looking over the PDB results you shared, here’s a structured assessment regarding potential relevance for drug discovery:

---

### 1️⃣ Key observations

* Many of your PDB entries contain **metal ions** (ZINC, COPPER, NICKEL, CALCIUM) as non-polymer entities.
* Some entries contain **small-molecule ligands** (e.g., benzamide derivatives, Cisplatin, glycerol, MES, butane-1,4-dithiol).
* Several entries contain **common crystallization additives or buffers** (sulfate, acetate, chloride, DMSO, glycerol).
* Some entries had **GraphQL errors** or no non-polymer entities, so those likely are **not useful for small-molecule binding studies**.

---

### 2️⃣ Metal ions in drug discovery

* **Zinc, Copper, Nickel ions** are often cofactors for enzymes and can be targeted by **metal-chelating drugs**.
* If a PDB shows a ligand coordinated to these metal ions, it could indicate a **druggable active site**, especially for **metalloenzymes**.
* However, metal ions alone (without a bound organic ligand) are **less informative** for direct drug discovery—they just indicate a potential metal-binding site.

---

### 3️⃣ Small-molecule ligands worth noting

* Examples:

  * **Cisplatin (3RE0)** → known chemotherapy drug, demonstrates metal-based drug binding.
  * **N-phenyl-2-selanylbenzamide / related selenyl/fluoranyl-benzamides (6Z3V, 6Z4H, 6Z4I, etc.)** → these are potential inhibitors and could be explored in **enzyme inhibition assays**.
  * **Butane-1,4-dithiol, DIMETHYL SULFOXIDE** → often crystallization additives, usually not drug candidates.

**Takeaway:** PDBs with bound **organic inhibitors** are the most promising for drug discovery; metal ions alone are secondary targets.

---

### 4️⃣ Recommendations for prioritization

1. **Focus on entries with both metal ions and bound inhibitors**:

   * Examples: 6Z3V, 6Z4H, 6Z4I, 5O3Y, 5O40. These show ligands that could be optimized for drug design.
2. **Ignore PDBs with only buffer/metal ions** (SO4, Cl, MES, GOL) unless targeting a **metal-binding enzyme**.
3. **Check binding site conservation**:

   * Use software like PyMOL, ChimeraX, or MOE to visualize ligand-metal interactions.
   * Determine if the metal-coordinating residues are conserved across species, indicating **drug relevance**.
4. **Validate known inhibitors**:

   * For example, selenyl-benzamides could be explored for **enzyme inhibition** if your target protein is a metalloenzyme.

---

✅ **Summary:**

* **Most promising PDBs** for drug discovery: those with **organic ligands bound to metal ions**.
* **Less promising**: entries with only metal ions or buffer molecules.
* If your goal is **hit identification**, focus on ligand-bound PDBs and consider metal-coordinating interactions as part of structure-based drug design.

---

If you want, I can **make a clean table of your PDBs ranked by potential drug discovery relevance**, highlighting which ones are likely worth pursuing first. This will make it much easier to focus your efforts.

Do you want me to do that?
