DiffDock v1.1 cannot read Amber residue names for the histidine amino acid residue #190

Open
polo9719 opened this issue Mar 4, 2024 · 1 comment

polo9719 commented Mar 4, 2024

Amber can give histidine different residue names depending on which protons are present: HID, HIE, or HIP instead of HIS.

This causes a problem when featurizing the protein in DiffDock, because those residues are mapped to the one-letter code X instead of H.

three_to_one = {'ALA': 'A',
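
To illustrate the symptom, here is a minimal sketch of how such a lookup could end up producing X. It assumes that the featurizer falls back to 'X' for any residue name missing from the table; the fallback, the abbreviated table and the helper name `to_one_letter` are assumptions for illustration, not DiffDock's actual code.

```python
# Abbreviated table for illustration; the real three_to_one dictionary is longer.
three_to_one = {'ALA': 'A', 'HIS': 'H', 'GLY': 'G'}


def to_one_letter(resname):
    # Assumed behaviour: residue names missing from the table fall back to 'X'.
    return three_to_one.get(resname, 'X')


sequence = ''.join(to_one_letter(r) for r in ['ALA', 'HID', 'GLY', 'HIS'])
print(sequence)  # AXGH
```

Under that assumption, HID is not a key of the table, so it becomes X while a plain HIS becomes H.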

It can easily be fixed by renaming all HID, HIE and HIP residues to HIS.
Is this a good way to fix it? If so, maybe it could be done automatically in the inference code.
Otherwise, is there a way to read the PDB file that takes these amino acid variants into account?
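
The renaming described above can be sketched without any extra dependency by editing the fixed-width residue name field of each ATOM/HETATM record (columns 18-20 in the PDB format). This is only a sketch: the function name `rename_histidines` is hypothetical, and it changes the residue name only, leaving the hydrogen atoms in the file as they are.

```python
AMBER_HISTIDINES = {'HID', 'HIE', 'HIP'}


def rename_histidines(pdb_text):
    """Rename Amber histidine variants to HIS in ATOM/HETATM records."""
    out = []
    for line in pdb_text.splitlines():
        # The residue name occupies columns 18-20, i.e. the slice 17:20.
        if line.startswith(('ATOM', 'HETATM')) and line[17:20] in AMBER_HISTIDINES:
            line = line[:17] + 'HIS' + line[20:]
        out.append(line)
    return '\n'.join(out)


# Start of a fixed-width ATOM record, built with format specs to keep columns aligned.
example = f"{'ATOM':<6}{1:>5} {' N':<4} {'HID':>3} A{1:>4}"
print(rename_histidines(example)[17:20])  # HIS
```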

PS-1: Running DiffDock v1 on the same protein works fine. That's why I suspect that the mapping of these modified histidines to X comes from the new ProDy package.

PS-2: I had this issue specifically with histidine, but maybe it also happens with other amino acids?
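
One way to check whether other residues are affected is to list every residue name in the file that is not one of the 20 standard amino acids. The sketch below assumes three-letter names in columns 18-20 of ATOM records; `nonstandard_residue_names` and `atom_line` are hypothetical helper names used only for this example.

```python
STANDARD_RESIDUES = {
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
}


def nonstandard_residue_names(pdb_text):
    """Return the sorted residue names in ATOM records that are not standard."""
    found = set()
    for line in pdb_text.splitlines():
        if line.startswith('ATOM'):
            name = line[17:20].strip()
            if name and name not in STANDARD_RESIDUES:
                found.add(name)
    return sorted(found)


def atom_line(resname):
    # Builds the start of a fixed-width ATOM record for the example below.
    return f"{'ATOM':<6}{1:>5} {' CA':<4} {resname:>3} A{1:>4}"


example = '\n'.join(atom_line(name) for name in ['ALA', 'HIE', 'CYX', 'GLY'])
print(nonstandard_residue_names(example))  # ['CYX', 'HIE']
```

Any name reported this way is a candidate for being mapped to X and may need an entry in a renaming table.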

polo9719 (Author) commented

FYI, I added this pre-processing script to fix the issue:

import argparse
from Bio.PDB import PDBParser, PDBIO


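# Amber-specific residue names mapped to their standard PDB equivalents
residue_renaming_map = {
    'HID': 'HIS',
    'HIE': 'HIS',
    'HIP': 'HIS',
    'GLH': 'GLU',
    'ASH': 'ASP',
    'CYM': 'CYS',
    'CYX': 'CYS',
    'LYN': 'LYS',
}

# Standard residue names, used to validate a name after stripping a terminal marker
standard_residues = {
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
}


def rename_residues(input_filename, output_filename):
    parser = PDBParser()
    structure = parser.get_structure("structure", input_filename)

    for model in structure:
        for chain in model:
            for residue in chain:
                res_name = residue.get_resname().strip()
                # Rename Amber-specific variants to the standard residue name
                if res_name in residue_renaming_map:
                    residue.resname = residue_renaming_map[res_name]
                # Handle N-terminal ("N" prefix) and C-terminal ("C" suffix) residues,
                # but only when the remainder is a standard residue name, so that
                # unrelated names such as NAG or NME are left untouched
                elif res_name.startswith("N") and res_name[1:] in standard_residues:
                    residue.resname = res_name[1:]
                elif res_name.endswith("C") and res_name[:-1] in standard_residues:
                    residue.resname = res_name[:-1]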

    io = PDBIO()
    io.set_structure(structure)
    io.save(output_filename)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file", type=str)
    parser.add_argument("output_file", type=str)

    args = parser.parse_args()
    
    rename_residues(
        args.input_file,
        args.output_file
    )
