# Four ways to replace the chain ID in a PDB file in Python

The chain ID sits in column 22 of `ATOM` and `HETATM` lines in a PDB file.
We must make sure that no characters are lost and nothing shifts within a line.
Otherwise the file loses its integrity, or atom names are misinterpreted: `CA` can mean C-alpha
or calcium, depending on how it is aligned in the atom-name field.
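PDB columns are counted from 1 and Python strings are indexed from 0, so column 22 is `line[21]`. Below is a minimal sketch of the fixed-column approach. It assumes the lines are read as text from the file. `set_chain_id` and its parameters are hypothetical names, not part of any library. The `records` default mirrors the task (ATOM lines only); pass `("ATOM", "HETATM")` to include hetero atoms as well.

```python
def set_chain_id(line, new_id, old_id=None, records=("ATOM",)):
    """Return `line` with its chain ID (column 22, index 21) set to `new_id`.

    Hypothetical helper: only lines whose record name (columns 1-6) is in
    `records`, and whose current chain ID equals `old_id` (if given), are
    changed; every other line is returned unchanged.
    """
    if len(new_id) != 1:
        raise ValueError("a PDB chain ID is exactly one character")
    if len(line) < 22 or line[:6].strip() not in records:
        return line
    if old_id is not None and line[21] != old_id:
        return line
    # replace exactly one character; every other character keeps its column
    return line[:21] + new_id + line[22:]


atom = "ATOM   1001  OXT LEU A 129      43.232  12.675   6.905  1.00 34.20           O  \n"
ter = "TER    1002      LEU A 129\n"
print(set_chain_id(atom, "Z", old_id="A")[21])  # Z
print(set_chain_id(ter, "Z", old_id="A")[21])   # A (TER is not in `records`)
```

Slicing around a single index, rather than using `str.replace`, guarantees that an `A` elsewhere in the line (for example in a residue or atom name) is never touched.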

## Task

Replace the chain ID of all `ATOM` lines of chain `A` with chain ID `Z`.


## 0) using sed
```bash
$ sed 's/^ATOM\(.\{17\}\)A/ATOM\1Z/' 1AKI.pdb > chain_z_sed.pdb
$ egrep -C5 "^TER" chain_z_sed.pdb
ATOM    997  CB  LEU Z 129      39.635  12.335   6.646  1.00 26.31           C  
ATOM    998  CG  LEU Z 129      38.689  12.917   5.620  1.00 23.49           C  
ATOM    999  CD1 LEU Z 129      39.112  12.657   4.191  1.00 26.43           C  
ATOM   1000  CD2 LEU Z 129      37.310  12.325   5.886  1.00 25.15           C  
ATOM   1001  OXT LEU Z 129      43.232  12.675   6.905  1.00 34.20           O  
TER    1002      LEU A 129                                                      
HETATM 1003  O   HOH A 130      23.434  40.063  -6.661  1.00 19.48           O  
HETATM 1004  O   HOH A 131      31.994  26.416  -6.047  0.90 22.43           O  
HETATM 1005  O   HOH A 132      30.250  13.337   9.787  0.98 20.93           O  
HETATM 1006  O   HOH A 133      22.384  42.331  -8.165  0.90 21.85           O  
HETATM 1007  O   HOH A 134      29.239  27.621  -3.670  1.00 17.47           O  
```

## 1) using string slicing

```python
with open('1AKI.pdb') as pdbfile:
    lines = []
    for line in pdbfile:
        # column 22 (index 21) holds the chain ID; rebuild the line around it
        if line.startswith('ATOM') and line[21] == 'A':
            line = line[:21] + 'Z' + line[22:]
        lines.append(line)

with open('chain_z_py1.pdb', 'w') as outfile:
    outfile.writelines(lines)
```

```bash
$ egrep -C5 "^TER" chain_z_py1.pdb
ATOM    997  CB  LEU Z 129      39.635  12.335   6.646  1.00 26.31           C  
ATOM    998  CG  LEU Z 129      38.689  12.917   5.620  1.00 23.49           C  
ATOM    999  CD1 LEU Z 129      39.112  12.657   4.191  1.00 26.43           C  
ATOM   1000  CD2 LEU Z 129      37.310  12.325   5.886  1.00 25.15           C  
ATOM   1001  OXT LEU Z 129      43.232  12.675   6.905  1.00 34.20           O  
TER    1002      LEU A 129                                                      
HETATM 1003  O   HOH A 130      23.434  40.063  -6.661  1.00 19.48           O  
HETATM 1004  O   HOH A 131      31.994  26.416  -6.047  0.90 22.43           O  
HETATM 1005  O   HOH A 132      30.250  13.337   9.787  0.98 20.93           O  
HETATM 1006  O   HOH A 133      22.384  42.331  -8.165  0.90 21.85           O  
HETATM 1007  O   HOH A 134      29.239  27.621  -3.670  1.00 17.47           O  
```
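The `egrep` excerpt only shows a few lines around `TER`. A stricter check compares the input and output files line by line. Below is a minimal sketch that assumes both files fit in memory; `changed_columns` is a hypothetical helper name. It raises an error if any line changed length (i.e. characters were lost or shifted) and otherwise returns the set of 1-based columns that differ. For a correct chain rename, that set should contain only column 22.

```python
def changed_columns(original_path, modified_path):
    """Return the set of 1-based columns that differ between two text files.

    Hypothetical helper: raises ValueError if the line counts or any
    line length differ, since that means characters were lost or shifted.
    """
    with open(original_path) as f:
        original = f.readlines()
    with open(modified_path) as f:
        modified = f.readlines()
    if len(original) != len(modified):
        raise ValueError("files have a different number of lines")
    columns = set()
    for lineno, (old, new) in enumerate(zip(original, modified), start=1):
        if len(old) != len(new):
            raise ValueError(f"line {lineno} changed length")
        columns.update(i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b)
    return columns


# usage with the files from this notebook:
# print(changed_columns('1AKI.pdb', 'chain_z_py1.pdb'))  # should print {22}
```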


## 2) using Python regular expressions

```python
import re
with open('1AKI.pdb') as pdbfile:
    lines = []
    for line in pdbfile:
        line = re.sub(r'^ATOM(.{17})A', r'ATOM\1Z', line)
        lines.append(line)

with open('chain_z_py2.pdb', 'w') as outfile:
    outfile.writelines(lines)
```

```bash
$ egrep -C5 "^TER" chain_z_py2.pdb
ATOM    997  CB  LEU Z 129      39.635  12.335   6.646  1.00 26.31           C  
ATOM    998  CG  LEU Z 129      38.689  12.917   5.620  1.00 23.49           C  
ATOM    999  CD1 LEU Z 129      39.112  12.657   4.191  1.00 26.43           C  
ATOM   1000  CD2 LEU Z 129      37.310  12.325   5.886  1.00 25.15           C  
ATOM   1001  OXT LEU Z 129      43.232  12.675   6.905  1.00 34.20           O  
TER    1002      LEU A 129                                                      
HETATM 1003  O   HOH A 130      23.434  40.063  -6.661  1.00 19.48           O  
HETATM 1004  O   HOH A 131      31.994  26.416  -6.047  0.90 22.43           O  
HETATM 1005  O   HOH A 132      30.250  13.337   9.787  0.98 20.93           O  
HETATM 1006  O   HOH A 133      22.384  42.331  -8.165  0.90 21.85           O  
HETATM 1007  O   HOH A 134      29.239  27.621  -3.670  1.00 17.47           O  
```
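The pattern works because `ATOM` covers columns 1 to 4 and `.{17}` covers columns 5 to 21, so the literal `A` has to sit in column 22. Below is a variant of the regex cell that compiles the pattern once and uses `re.subn` to count how many lines were rewritten, which is a cheap sanity check against the number of chain-A `ATOM` records you expect. The function name `rename_chain` is hypothetical. It is shown on a few sample lines rather than the whole file.

```python
import re

# 'ATOM' = columns 1-4, '.{17}' = columns 5-21, so the literal 'A' is column 22.
# The group keeps the first 21 characters, which are written back unchanged.
CHAIN_A_ATOM = re.compile(r'^(ATOM.{17})A')


def rename_chain(lines, new_id='Z'):
    """Return (new_lines, n_changed) for an iterable of PDB lines."""
    new_lines, n_changed = [], 0
    for line in lines:
        # \g<1> instead of \1 stays unambiguous even if new_id is a digit
        line, n = CHAIN_A_ATOM.subn(r'\g<1>' + new_id, line)
        new_lines.append(line)
        n_changed += n
    return new_lines, n_changed


sample = [
    "ATOM   1001  OXT LEU A 129      43.232  12.675   6.905  1.00 34.20           O  \n",
    "TER    1002      LEU A 129\n",
    "HETATM 1003  O   HOH A 130      23.434  40.063  -6.661  1.00 19.48           O  \n",
]
new_lines, n_changed = rename_chain(sample)
print(n_changed)  # 1
```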


## 3) using MDAnalysis
https://www.mdanalysis.org/

```python
import MDAnalysis as mda

# Read Universe from PDB
u = mda.Universe('1AKI.pdb')
```

```python
# explore the Universe object
print(u)
print(u.residues)
print(u.segments)
```

```
<Universe with 1079 atoms>
<ResidueGroup [<Residue LYS, 1>, <Residue VAL, 2>, <Residue PHE, 3>, ..., <Residue HOH, 205>, <Residue HOH, 206>, <Residue HOH, 207>]>
<SegmentGroup [<Segment A>]>
```


```python
# loop over segments (chains)
for seg in u.segments:
    print("before:   segindex: {}  segid: {}".format(seg.segindex, seg.segid))

    # rename the segment; this applies to every atom in it, waters included
    seg.segid = 'Z'

    print("after:    segindex: {}  segid: {}\n".format(seg.segindex, seg.segid))
```

```
before:   segindex: 0  segid: A
after:    segindex: 0  segid: Z
```
```python
# export fixed structure to PDB file
u.atoms.write("chain_z_mda.pdb")
```
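The notebook doesn't show the file MDAnalysis wrote, so it is worth checking what actually ended up in column 22. 1AKI has a single segment that also holds the water molecules, so renaming it most likely relabels the `HETATM` waters too, unlike methods 0 to 2. Depending on the MDAnalysis version, the PDB writer may also take the chain ID from the segid or from a separate `chainIDs` atom attribute. Below is a small standard-library sketch; `chain_id_summary` is a hypothetical helper name. It counts chain IDs per record type, so two output files can be compared directly.

```python
from collections import Counter


def chain_id_summary(path):
    """Count (record name, chain ID) pairs over the ATOM/HETATM lines of a PDB file."""
    counts = Counter()
    with open(path) as pdbfile:
        for line in pdbfile:
            record = line[:6].strip()
            if record in ("ATOM", "HETATM") and len(line) > 21:
                counts[(record, line[21])] += 1
    return counts


# usage: compare the sed output with the MDAnalysis output
# for name in ('chain_z_sed.pdb', 'chain_z_mda.pdb'):
#     print(name, dict(chain_id_summary(name)))
```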

## 4) using PDB module in Biopython

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc150

```python
from Bio import PDB

# Read structure from PDB
parser = PDB.PDBParser()
structure = parser.get_structure('1aki', '1AKI.pdb')
```

```python
# explore the chains:
print(structure)
for chain in structure.get_chains():
    print(chain)
    print(chain.id)
```

```
<Structure id=1aki>
<Chain id=A>
A
```


```python
for chain in structure.get_chains():
    # set the chain ID to 'Z' (renames the whole chain, including its waters)
    chain.id = 'Z'
```

```python
# export fixed structure to PDB file
io = PDB.PDBIO()
io.set_structure(structure)
io.save('chain_z_biopython.pdb')
```
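Unlike the text-based methods 0 to 2, both library approaches write a new file from their in-memory structure instead of editing the original lines. Records the library does not model (for example header or `REMARK` lines) may therefore not be written back. Below is a minimal standard-library sketch for comparing which record types are present before and after; `record_counts` is a hypothetical helper name.

```python
from collections import Counter


def record_counts(path):
    """Count non-empty lines per PDB record name (columns 1-6) in a file."""
    with open(path) as pdbfile:
        return Counter(line[:6].strip() for line in pdbfile if line.strip())


# usage: record types present in the input but missing from the output
# before = record_counts('1AKI.pdb')
# after = record_counts('chain_z_biopython.pdb')
# print(sorted(set(before) - set(after)))
```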