This Python script prepares RNA sequences for AlphaFold3 structure prediction with AlphaFold Server. It reads a multi-sequence FASTA file and writes the JSON input format that the server accepts for batch job submission.

In [None]:
"""
Generate AlphaFold3-compatible JSON input from a multi-FASTA RNA file.

Each RNA sequence is converted into a JSON entry following the
'alphafoldserver' dialect and combined into a single JSON array.
"""

import copy
import json
import os

from Bio import SeqIO

# =========================
# JSON template for AF3
# =========================
AF3_TEMPLATE = {
    "name": None,
    "modelSeeds": ["1615246542"],
    "sequences": [
        {
            "rnaSequence": {
                "sequence": None,
                "count": 1
            }
        }
    ],
    "dialect": "alphafoldserver",
    "version": 1
}


def build_af3_entry(name: str, sequence: str) -> dict:
    """
    Build a single AlphaFold3 JSON entry for one RNA sequence.

    Parameters
    ----------
    name : str
        RNA identifier (from FASTA header)
    sequence : str
        RNA sequence (A/U/G/C)

    Returns
    -------
    dict
        JSON-compatible dictionary for AF3 input
    """
    # Deep copy is required: a shallow .copy() would share the nested
    # "sequences" list between entries, so every entry would end up
    # holding the last sequence that was written.
    entry = copy.deepcopy(AF3_TEMPLATE)
    entry["name"] = name
    entry["sequences"][0]["rnaSequence"]["sequence"] = sequence
    return entry


# =========================
# Input / Output paths
# =========================
input_fasta = "data/dataset1/dataset1.fasta"
output_base = "data/dataset1/Alphafold3"
output_json = os.path.join(output_base, "dataset1.json")

# Ensure output directory exists
os.makedirs(output_base, exist_ok=True)

af3_entries = []

# =========================
# Parse FASTA and build JSON
# =========================
for record in SeqIO.parse(input_fasta, "fasta"):
    seq_id = record.id
    seq_str = str(record.seq).upper()

    # Create a directory for each RNA (optional, for future outputs)
    seq_dir = os.path.join(output_base, seq_id)
    os.makedirs(seq_dir, exist_ok=True)

    # Build AF3 JSON entry
    af3_entries.append(build_af3_entry(seq_id, seq_str))

print(f"Parsed {len(af3_entries)} RNA sequences.")

# =========================
# Write combined JSON file
# =========================
with open(output_json, "w") as f:
    json.dump(af3_entries, f, indent=2)

print(f"AF3 JSON file written to: {output_json}")

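When entries are built from a shared template that contains a nested `sequences` list, it matters how the template is copied: `dict.copy()` duplicates only the top level, so the inner dictionaries stay shared between entries. The sketch below is a minimal, self-contained illustration of that behaviour. `TEMPLATE` is a cut-down stand-in for `AF3_TEMPLATE`, and `build_shallow`, `build_deep`, and `seqs` are hypothetical names used only for this example.

```python
import copy

# Cut-down stand-in for AF3_TEMPLATE (hypothetical, for illustration only).
TEMPLATE = {
    "name": None,
    "sequences": [{"rnaSequence": {"sequence": None, "count": 1}}],
}


def build_shallow(name, sequence):
    """Shallow copy: the nested 'sequences' list is shared by every entry."""
    entry = TEMPLATE.copy()
    entry["name"] = name
    entry["sequences"][0]["rnaSequence"]["sequence"] = sequence
    return entry


def build_deep(name, sequence):
    """Deep copy: every entry owns its own nested structure."""
    entry = copy.deepcopy(TEMPLATE)
    entry["name"] = name
    entry["sequences"][0]["rnaSequence"]["sequence"] = sequence
    return entry


def seqs(entries):
    """Collect the RNA sequence stored in each entry."""
    return [e["sequences"][0]["rnaSequence"]["sequence"] for e in entries]


shallow = [build_shallow("rna1", "GGAU"), build_shallow("rna2", "CCUA")]
deep = [build_deep("rna1", "GGAU"), build_deep("rna2", "CCUA")]

print(seqs(shallow))  # both entries report the last sequence written
print(seqs(deep))     # each entry keeps its own sequence
```

The top-level `name` field is unaffected in both cases because it is assigned on the copied dictionary itself; only the nested sequence is at risk.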

To evaluate AlphaFold3 predictions, we convert the output mmCIF files to PDB format with Biopython, since many structure assessment tools accept only PDB input.

In [6]:
from utils import convert_cif_to_pdb

convert_cif_to_pdb("../demo/fold_sars_model_0.cif", "../demo/fold_sars_model_0.pdb")

Saved PDB to ../demo/fold_sars_model_0.pdb

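For readers without access to `utils.py`, the following is a rough sketch of how such a conversion can be done with Biopython's `MMCIFParser` and `PDBIO`. It is not the actual implementation of `convert_cif_to_pdb`; the names `cif_to_pdb_sketch` and `default_pdb_path` are hypothetical, and defaulting the output path to the input name with a `.pdb` suffix is an assumption made for this example.

```python
from pathlib import Path


def default_pdb_path(cif_path):
    """Return the input path with its suffix replaced by '.pdb'."""
    return Path(cif_path).with_suffix(".pdb").as_posix()


def cif_to_pdb_sketch(cif_path, pdb_path=None):
    """Read an mmCIF file and write the same structure in PDB format."""
    # Imported lazily so this module still loads where Biopython is absent.
    from Bio.PDB import MMCIFParser, PDBIO

    pdb_path = pdb_path or default_pdb_path(cif_path)

    parser = MMCIFParser(QUIET=True)  # QUIET suppresses parser warnings
    structure = parser.get_structure("model", cif_path)

    writer = PDBIO()
    writer.set_structure(structure)
    writer.save(pdb_path)
    return pdb_path
```

Under these assumptions, `cif_to_pdb_sketch("../demo/fold_sars_model_0.cif")` would write `../demo/fold_sars_model_0.pdb`. The PDB format has fixed column widths, so a conversion like this may fail for structures with multi-character chain IDs or very large atom counts.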

PDB structures from experiments or predictions often have missing residues, atoms, or hydrogens, or incomplete connectivity, which can cause failures or inaccuracies in downstream analyses such as energy minimization, molecular simulation, and structure evaluation. A PDB fixing step produces chemically complete, well-defined atomic models, which improves compatibility, robustness, and reproducibility across structure-based workflows. `utils.py` provides two utility functions for PDB preprocessing: **`fix_single_pdb`** for a single file and **`fix_pdb_directory_recursive`** for a directory tree.
 
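To make the "missing residues" case concrete, here is a small diagnostic sketch that scans `ATOM` records for jumps in residue numbering within a chain. It is not part of `utils.py` and it repairs nothing; `find_residue_gaps` and `atom_line` are hypothetical names. A numbering jump is only a hint that residues may be missing, since some files use non-contiguous numbering on purpose.

```python
def find_residue_gaps(pdb_lines):
    """Return (chain, previous_resseq, next_resseq) for each numbering jump."""
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen in that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]           # column 22: chain identifier
        resseq = int(line[22:26])  # columns 23-26: residue sequence number
        prev = last_seen.get(chain)
        if prev is not None and resseq - prev > 1:
            gaps.append((chain, prev, resseq))
        last_seen[chain] = resseq
    return gaps


def atom_line(serial, resname, chain, resseq):
    """Build a minimal fixed-column ATOM record (coordinates are dummies)."""
    return (
        f"ATOM  {serial:>5}  P   {resname:>3} {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


# Chain A with residues 1, 2, 5, 6: residues 3 and 4 are absent.
example = [atom_line(i + 1, "G", "A", n) for i, n in enumerate([1, 2, 5, 6])]
print(find_residue_gaps(example))  # [('A', 2, 5)]
```

On a real file, the same function can be fed the lines of an opened PDB file, for example `find_residue_gaps(open(path))`.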

In [7]:
from utils import fix_single_pdb

fix_single_pdb("../demo/fold_sars_model_0.pdb")

Processing PDB: ../demo/fold_sars_model_0.pdb -> ../demo/fold_sars_model_0.fixed.pdb
Finished fixing: ../demo/fold_sars_model_0.fixed.pdb


'../demo/fold_sars_model_0.fixed.pdb'

In [None]:
from utils import fix_pdb_directory_recursive

fix_pdb_directory_recursive("../datasets/dataset1/FARFAR2")
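
The directory variant can be pictured as a recursive file walk around the single-file fixer. The sketch below is an assumption about that general pattern, not the code in `utils.py`: `fixed_path_for`, `iter_unfixed_pdbs`, and `fix_directory_sketch` are hypothetical names, the `.fixed.pdb` suffix is taken from the log output of `fix_single_pdb` shown earlier, and skipping files that already end in `.fixed.pdb` is a design choice made here so that re-running the walk does not fix outputs a second time.

```python
from pathlib import Path


def fixed_path_for(pdb_path):
    """Map 'model.pdb' to 'model.fixed.pdb', following the log output."""
    p = Path(pdb_path)
    return p.with_name(p.stem + ".fixed.pdb")


def iter_unfixed_pdbs(root):
    """Yield every *.pdb under root, skipping files that are already fixed."""
    for path in sorted(Path(root).rglob("*.pdb")):
        if path.name.endswith(".fixed.pdb"):
            continue
        yield path


def fix_directory_sketch(root, fix_one):
    """Apply fix_one(path) to each unfixed PDB and collect its return values."""
    return [fix_one(str(path)) for path in iter_unfixed_pdbs(root)]
```

With the real fixer, a call might look like `fix_directory_sketch("../datasets/dataset1/FARFAR2", fix_single_pdb)`. Passing the fixer in as an argument keeps the walk easy to test with a stand-in function.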