# Chapter 22 – Querying for Potential Redox-Regulated Enzymes
This example comes from biochemistry: some enzymes are regulated by the formation or cleavage of disulfide bonds near the protein surface. Thioredoxins are small proteins that reduce such disulfide bonds and thereby regulate, for example, enzymes of the Calvin cycle. A comparison of primary structures shows that most thioredoxin-regulated enzymes share no consensus motif, so potential thioredoxin targets have to be identified from protein structure data.

The approach in this project is to find pairs of cysteine sulfur atoms that lie less than 3 Å apart and close to the protein surface. To do this, we combine Jmol, AWK, and shell programming. Judging surface proximity requires an algorithm that computes the accessible surface area of a macromolecule; Jmol has such an algorithm built in.

## Installations

Install Jmol with: `sudo apt install -y jmol`

## Download 1FRF and 1FRV

In [None]:
wget 'https://files.rcsb.org/download/1FRF.pdb'

In [None]:
wget 'https://files.rcsb.org/download/1FRV.pdb'

## Visual Inspection

Run `jmol` and use the following commands:
```
select all
spacefill 50
select cys and sulfur
color yellow
spacefill 250
monitor 3914 5273
```
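
The two numbers passed to `monitor` are presumably atom serial numbers, that is, the second column of the `ATOM` records. As a minimal sketch, the hypothetical helper `list_sg_serials` (not part of the chapter's scripts) lists the serial number, chain, and residue of every cysteine SG atom, so that suitable atom pairs can be looked up. The demo records are made up and only illustrate the column layout.

```shell
# list_sg_serials FILE: hypothetical helper, prints "serial chain residue"
# for every cysteine SG atom (whitespace-separated fields, as in distance.awk)
list_sg_serials() {
  awk '$1 == "ATOM" && $4 == "CYS" && $3 == "SG" { print $2, $5, $4 $6 }' "$1"
}

# Made-up demo records in PDB column layout (not taken from 1FRF or 1FRV)
demo=$(mktemp)
fmt='ATOM  %5d %-4s %3s %1s%4d    %8s%8s%8s\n'
printf "$fmt" 101 ' SG ' CYS A 75  10.000 10.000 10.000 >  "$demo"
printf "$fmt" 102 ' CB ' CYS A 75  11.000 10.000 10.000 >> "$demo"
printf "$fmt" 250 ' SG ' CYS B 546 10.000 10.000 12.000 >> "$demo"

list_sg_serials "$demo"
```

For a real structure, `list_sg_serials 1FRF.pdb` would give the numbers to use with `monitor`.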

## Computational Inspection
This step requires the following AWK script:
```
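# save as distance.awk
# searches for cysteine sulfur (SG) atoms that lie close together in a structure
# requires a structure file (*.pdb)
# usage: awk -f distance.awk structure.pdb
# note: atoms are keyed by residue number ($6) only, so equal residue
#       numbers in different chains overwrite each other

BEGIN{print "Cysteines in the Structure..."; ORS=""}

# collect the coordinates of every cysteine SG atom
$1=="ATOM" && $4=="CYS" && $3=="SG" {
print $4$6", "
cys_x[$6]=$7; cys_y[$6]=$8; cys_z[$6]=$9
}

END{ ORS="\n"
for (key1 in cys_x) {
  for (key2 in cys_x) {
      dx=cys_x[key1]-cys_x[key2]
      dy=cys_y[key1]-cys_y[key2]
      dz=cys_z[key1]-cys_z[key2]
      distance=sqrt(dx^2+dy^2+dz^2)
      # array subscripts are strings; key+0 forces a numeric comparison,
      # so each pair is reported once with the smaller number first
      if (distance < 3 && distance != 0 && key1+0 < key2+0) {
        i++
        candidate[i]=key1"-"key2": "distance
      }
  }
}
print "\nCandidates ..."
# print the candidates in the order in which they were found
for (n=1; n<=i; n++) {print candidate[n]}
}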
```
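
The script above splits records on whitespace and keys atoms by residue number alone. PDB files are fixed-column, so a variant can read the columns directly and include the chain identifier in the key. The following is a minimal sketch under these assumptions, not the book's method: `close_sg_pairs` is a hypothetical helper, the column positions follow the standard `ATOM` record layout, alternate conformations are not handled, and the demo records are made up.

```shell
# close_sg_pairs FILE [CUTOFF]: hypothetical helper, prints SG pairs closer
# than CUTOFF (default 3 Å) as "chain:residue-chain:residue distance"
close_sg_pairs() {
  awk -v cutoff="${2:-3}" '
    # fixed columns: record 1-6, atom name 13-16, residue name 18-20,
    # chain 22, residue number 23-26, x 31-38, y 39-46, z 47-54
    substr($0, 1, 6) == "ATOM  " && substr($0, 18, 3) == "CYS" &&
    substr($0, 13, 4) == " SG " {
      n++
      id[n] = substr($0, 22, 1) ":" (substr($0, 23, 4) + 0)
      x[n] = substr($0, 31, 8) + 0
      y[n] = substr($0, 39, 8) + 0
      z[n] = substr($0, 47, 8) + 0
    }
    END {
      # visit every unordered pair exactly once
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          d = sqrt((x[i]-x[j])^2 + (y[i]-y[j])^2 + (z[i]-z[j])^2)
          if (d < cutoff) printf "%s-%s %.2f\n", id[i], id[j], d
        }
    }' "$1"
}

# Made-up demo records: residue 75 occurs in chains A and B
demo=$(mktemp)
fmt='ATOM  %5d %-4s %3s %1s%4d    %8s%8s%8s\n'
printf "$fmt" 1 ' SG ' CYS A 75  10.000 10.000 10.000 >  "$demo"
printf "$fmt" 2 ' SG ' CYS B 75  30.000 30.000 30.000 >> "$demo"
printf "$fmt" 3 ' SG ' CYS B 546 10.000 10.000 12.000 >> "$demo"

close_sg_pairs "$demo"    # prints: A:75-B:546 2.00
```

Because the key contains the chain, the two residues numbered 75 are kept apart, which the whitespace-based script cannot do.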

In [1]:
awk -f distance.awk 1FRF.pdb

Cysteines in the Structure...
CYS17, CYS20, CYS98, CYS110, CYS114, CYS147, CYS187, CYS212, CYS218, CYS227, CYS245, CYS248, CYS72, CYS75, CYS86, CYS237, CYS259, CYS265, CYS436, CYS457, CYS543, CYS546, 
Candidates ...
75-546: 2.41976
259-436: 2.55824


In [2]:
awk -f distance.awk 1FRV.pdb

Cysteines in the Structure...
CYS17, CYS20, CYS70, CYS96, CYS112, CYS148, CYS188, CYS213, CYS219, CYS228, CYS246, CYS249, CYS65, CYS68, CYS83, CYS228, CYS283, CYS418, CYS494, CYS530, CYS533, CYS17, CYS20, CYS70, CYS96, CYS112, CYS148, CYS188, CYS213, CYS219, CYS228, CYS246, CYS249, CYS65, CYS68, CYS83, CYS228, CYS283, CYS418, CYS494, CYS530, CYS533, 
Candidates ...
65-530: 2.91756
68-533: 2.94816


Highlight the candidate sulfur atoms of 1FRF in Jmol:
```
load =1FRF
hide water                    # hide water molecules
spacefill off                 # switch off all atom spheres
select cys436.sg, cys259.sg, cys75.sg, cys546.sg   # select SG atoms
spacefill 300
```
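
The `select` line can also be generated from the candidate list instead of being typed by hand. This is a minimal sketch: `candidates_to_jmol` is a hypothetical helper that reads the output of `distance.awk` on standard input and reuses the `cysN.sg` atom expressions shown above. Note that such an expression does not name a chain, so it matches that residue number in every chain.

```shell
# candidates_to_jmol: hypothetical helper, turns candidate lines such as
# "75-546: 2.41976" into Jmol commands; all other input lines are ignored
candidates_to_jmol() {
  awk -F'[-:]' '
    /^[0-9]+-[0-9]+:/ {
      list = list (list == "" ? "" : ", ") "cys" $1 ".sg, cys" $2 ".sg"
    }
    END {
      if (list != "") { print "select " list; print "spacefill 300" }
    }'
}

# Candidate lines as reported for 1FRF above
printf 'Candidates ...\n75-546: 2.41976\n259-436: 2.55824\n' | candidates_to_jmol
```

With the real script in place, the call would be `awk -f distance.awk 1FRF.pdb | candidates_to_jmol`.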

### Download PDB IDs
Download, as described in the book, the IDs of all structures that:
- were derived from *Escherichia* (either directly or via over-expression)
- have an X-ray resolution better than 2 Å (a resolution value below 2 Å)
- consist of a single protein chain

Save the IDs as a comma-separated list in *ids.txt*.

### Download PDB Script

In [None]:
wget "https://www.rcsb.org/scripts/batch_download.sh"

In [None]:
chmod u+x batch_download.sh

In [None]:
mkdir EcoliStructures

In [None]:
cut -c 1-499 ids.txt > ids-100.txt # first 100 IDs: 100 x 4 characters + 99 commas = 499 characters
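
Cutting by character position relies on every ID being exactly four characters long. As a hedged alternative, the IDs can be cut by comma-separated field, which does not depend on the ID length. `first_n_ids` is a hypothetical helper, and the demo list below consists of made-up four-character IDs, not real PDB entries.

```shell
# first_n_ids N FILE: hypothetical helper, keeps the first N comma-separated IDs
first_n_ids() {
  cut -d, -f "1-$1" "$2"
}

# Made-up demo list of 150 IDs (1001 ... 1150) on a single line
demo_ids=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 150; i++) printf "%s%d", (i > 1 ? "," : ""), 1000 + i; print "" }' > "$demo_ids"

# Count the IDs that remain after cutting
first_n_ids 100 "$demo_ids" | tr ',' '\n' | grep -c .
```

For four-character IDs, both approaches yield the same list.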

In [None]:
./batch_download.sh -f ids-100.txt -o EcoliStructures -p
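
Assuming that `batch_download.sh -p` stores the structures as compressed `.pdb.gz` files, they have to be decompressed before AWK can read them. The following sketch uses a hypothetical helper, `unpack_structures`, and a made-up demo file to show the step.

```shell
# unpack_structures DIR: hypothetical helper, decompresses every *.pdb.gz in DIR
unpack_structures() {
  for f in "$1"/*.pdb.gz; do
    [ -e "$f" ] || continue    # the glob matched nothing
    gunzip -f "$f"             # replaces FILE.pdb.gz with FILE.pdb
  done
}

# Made-up demo file, not a real structure
demo_dir=$(mktemp -d)
printf 'HEADER    DEMO\n' | gzip > "$demo_dir/1abc.pdb.gz"

unpack_structures "$demo_dir"
ls "$demo_dir"
```

For the directory used above, the call would be `unpack_structures EcoliStructures`. Alternatively, a single file can be streamed without unpacking it, for example with `gunzip -c FILE.pdb.gz | awk -f distance.awk`.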

In [None]:
for i in $(sed 's/,/ /g' ids-100.txt); do wget -P EcoliStructures2 "https://files.rcsb.org/download/$i.pdb" ; done

### Batch Analysis

In [None]:
for i in EcoliStructures2/*.pdb; do awk -f distance-batch.awk "$i"; done

In [None]:
for i in *.pdb; do awk -f distance-batch.awk "$i"; done
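
When many files are processed, it helps to label each result with its file name and to skip the loop cleanly if a directory contains no structures. The sketch below wraps the loop in a hypothetical helper, `batch_run`; the AWK script used in the demo is a made-up stand-in that only counts records and is not `distance-batch.awk`.

```shell
# batch_run SCRIPT DIR: hypothetical helper, runs an AWK script on every
# *.pdb file in DIR and prints the file name before each result
batch_run() {
  for f in "$2"/*.pdb; do
    [ -e "$f" ] || continue          # the glob matched nothing
    printf '== %s\n' "${f##*/}"      # file name without the directory
    awk -f "$1" "$f"
  done
}

# Made-up stand-in script and demo file
demo_dir=$(mktemp -d)
printf 'END { print NR " records" }\n' > "$demo_dir/count.awk"
printf 'HEADER    DEMO\nEND\n' > "$demo_dir/1abc.pdb"

batch_run "$demo_dir/count.awk" "$demo_dir"
```

With the chapter's files, the call would be `batch_run distance-batch.awk EcoliStructures2`.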

### Open in Jmol
Use the command `jmol -s EcoliStructures2/1B8J.pdb.script` to view a structure in Jmol.