PDBFixer API cannot fix mmCIF file #275

Closed
locitran opened this issue Jul 12, 2023 · 5 comments

locitran commented Jul 12, 2023

Hi all,
I am trying to use PDBFixer to fix an mmCIF file, but it fails while reading the file.

from pdbfixer import PDBFixer
from openmm.app import PDBFile

def pdbfixer(in_path, out_path):
    with open(in_path) as in_f:
        fixer = PDBFixer(pdbfile=in_f)
        fixer.findMissingResidues()
        chains = list(fixer.topology.chains())
        keys = fixer.missingResidues.keys()
        for key in keys:
            chain = chains[key[0]]
            if key[1] == 0 or key[1] == len(list(chain.residues())):
                del fixer.missingResidues[key]
        fixer.findNonstandardResidues()
        fixer.replaceNonstandardResidues()
        fixer.removeHeterogens(keepWater=False)
        fixer.findMissingAtoms()
        fixer.addMissingAtoms()
        with open(out_path, 'w') as out_f:
            PDBFile.writeFile(fixer.topology, fixer.positions, out_f, keepIds=True)

    
in_file = './4p42-assembly1.cif'
out_file = 'fix4p42.pdb'
pdbfixer(in_file, out_file)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 28
     26 in_file = './4p42-assembly1.cif'
     27 out_file = 'fix4p42.pdb'
---> 28 pdbfixer(in_file, out_file)

Cell In[3], line 9, in pdbfixer(in_path, out_path)
      7 def pdbfixer(in_path, out_path):
      8     with open(in_path) as in_f:
----> 9         fixer = PDBFixer(pdbfile=in_f)
     10         fixer.findMissingResidues()
     11         chains = list(fixer.topology.chains())

File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:251, in PDBFixer.__init__(self, filename, pdbfile, pdbxfile, url, pdbid)
    248     file.close()
    249 elif pdbfile:
    250     # A file-like object has been specified.
--> 251     self._initializeFromPDB(pdbfile)
    252 elif pdbxfile:
    253     # A file-like object has been specified.
    254     self._initializeFromPDBx(pdbxfile)

File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:284, in PDBFixer._initializeFromPDB(self, file)
    281 def _initializeFromPDB(self, file):
...
    743     self.residue_name_with_spaces += possible_fourth_character
    744 self.residue_name = self.residue_name_with_spaces.strip()

ValueError: Misaligned residue name: ATOM   1    N N   . ASP A 1 3   ? -52.691  -92.622  29.836  1.00 58.49  ? ?
@peastman
Member

fixer = PDBFixer(pdbfile=in_f)

That needs to be pdbxfile=in_f. You're telling it to parse the PDBx/mmCIF file as a PDB file.
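
A minimal sketch of one way to avoid this mix-up: pick the constructor keyword from the file extension before opening the file. The helper names `fixer_keyword` and `load_fixer` are hypothetical, and treating `.cif`/`.mmcif` as PDBx/mmCIF and everything else as legacy PDB is an assumption. Only the `pdbfile` and `pdbxfile` keywords come from the `PDBFixer.__init__` signature shown in the traceback above.

```python
from pathlib import Path


def fixer_keyword(path):
    """Return the PDBFixer constructor keyword that matches the file format.

    Assumption: the extension reliably indicates the format.
    """
    suffix = Path(path).suffix.lower()
    # .cif / .mmcif are PDBx/mmCIF; anything else is treated as legacy PDB.
    return "pdbxfile" if suffix in {".cif", ".mmcif"} else "pdbfile"


def load_fixer(path):
    """Open `path` and build a PDBFixer with the matching keyword (hypothetical helper)."""
    # Imported lazily so fixer_keyword() is usable without pdbfixer installed.
    from pdbfixer import PDBFixer

    with open(path) as handle:
        # Equivalent to PDBFixer(pdbxfile=handle) for an mmCIF input.
        return PDBFixer(**{fixer_keyword(path): handle})
```

With the file from the report, `load_fixer('./4p42-assembly1.cif')` would pass the handle as `pdbxfile`, so the mmCIF parser is used instead of the fixed-column PDB parser that raised `Misaligned residue name`.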

@locitran
Author

locitran commented Jul 13, 2023

Thank you, Peter. It's working now :-)

May I ask about another problem, concerning how PDBFixer models the N/C-terminus?
[image]

As you can see, there is a very long tail at the N/C-terminus. I see that your code runs a short energy minimization, which should be fine when addMissingResidues fills gaps inside the structure. However, the result for terminal residues or for long continuous stretches of missing residues may not be reasonable. What do you think?

Best regards,

@peastman
Member

It's common for proteins to have flexible tails. Because they don't have a fixed rigid conformation, they can't be resolved with crystallography and they're missing from crystal structures. PDBFixer is adding them stretched outward just because it's convenient, but don't take that literally. The whole point is that they're flexible and don't have a fixed conformation. As soon as you start simulating they'll begin moving around.

Sometimes people omit the tails from their simulations. You'll need to rely on your own biological knowledge to determine whether the tails are functionally important for your protein, or if they can be safely omitted.
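
If the tails are to be omitted, one option is to drop the terminal gaps before any atoms are added. The sketch below assumes `fixer.missingResidues` is a dict keyed by `(chain_index, insert_position)`, as the loop in the original post treats it. The function name `drop_terminal_gaps` is hypothetical. It builds a new dict rather than deleting entries while iterating, since deleting from a dict during iteration raises `RuntimeError` in Python 3.

```python
def drop_terminal_gaps(missing_residues, chain_lengths):
    """Return a copy of `missing_residues` without N-/C-terminal gaps.

    missing_residues: dict mapping (chain_index, insert_position) to a list
        of residue names (assumed layout of fixer.missingResidues).
    chain_lengths: number of resolved residues in each chain, by chain index.
    """
    kept = {}
    for (chain_index, insert_at), names in missing_residues.items():
        is_n_terminal = insert_at == 0
        is_c_terminal = insert_at == chain_lengths[chain_index]
        # Keep only gaps that sit strictly inside the chain.
        if not (is_n_terminal or is_c_terminal):
            kept[(chain_index, insert_at)] = names
    return kept


# Intended use with a real fixer, after fixer.findMissingResidues():
#   lengths = [len(list(c.residues())) for c in fixer.topology.chains()]
#   fixer.missingResidues = drop_terminal_gaps(fixer.missingResidues, lengths)
```

Internal loops are still rebuilt with this filter in place; only the unresolved ends are left out, so whether it is appropriate depends on the biological question.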

@locitran
Author

Thank you, Peter. I get your idea.

@peastman
Member

Ok, great. I'm closing this issue, since the question has been answered.
