# Querying for Potential Redox-Regulated Enzymes

## Installation

Install Jmol with: `sudo apt install -y jmol`

## Download 1FRF and 1FRV

In [2]:
wget 'https://files.rcsb.org/download/1FRF.pdb'

--2024-03-10 11:18:17--  https://files.rcsb.org/download/1FRF.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.245
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘1FRF.pdb’

1FRF.pdb                [    <=>             ] 562.02K   745KB/s    in 0.8s    

2024-03-10 11:18:18 (745 KB/s) - ‘1FRF.pdb’ saved [575505]



In [3]:
wget 'https://files.rcsb.org/download/1FRV.pdb'

--2024-03-10 11:18:36--  https://files.rcsb.org/download/1FRV.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.245
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘1FRV.pdb’

1FRV.pdb                [    <=>             ]   1.04M  1.11MB/s    in 0.9s    

2024-03-10 11:18:38 (1.11 MB/s) - ‘1FRV.pdb’ saved [1091718]



## Visual Inspection

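Run `jmol`, open one of the downloaded structures, and enter the following commands. They shrink all atoms, highlight the cysteine sulfur atoms in yellow, and display the distance between the two atoms numbered 3914 and 5273: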
```
select all
spacefill 50
select cys and sulfur
color yellow
spacefill 250
monitor 3914 5273
```
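
The two numbers passed to `monitor` are atom numbers taken from the structure file. A minimal sketch for looking them up, assuming the standard fixed-column PDB layout (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26). The function name `list_cys_sg` is made up for this example and is not part of the notebook:

```shell
# list_cys_sg FILE
# Print "serial chain residue-number" for every cysteine SG atom in FILE.
# Hypothetical helper; assumes fixed-column PDB ATOM records.
list_cys_sg() {
  awk 'substr($0,1,4)=="ATOM" && substr($0,18,3)=="CYS" {
         name = substr($0,13,4); gsub(/ /, "", name)   # strip padding from atom name
         if (name == "SG") {
           serial = substr($0,7,5) + 0                 # "+ 0" drops leading blanks
           chain  = substr($0,22,1)
           resnum = substr($0,23,4) + 0
           print serial, chain, resnum
         }
       }' "$1"
}
```

Calling `list_cys_sg 1FRF.pdb` would then list one line per cysteine sulfur, and the first column can be used as an argument to `monitor`.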

## Computational Inspection
This step requires the following awk script:
```
# save as distance.awk
# searches for close cysteine sulfur (SG) atoms in a structure
# requires a structure file (*.pdb) with whitespace-separated ATOM fields
# usage: awk -f distance.awk structure.pdb

BEGIN{print "Cysteines in the Structure..."; ORS=""}

# store the x, y, z coordinates of each cysteine SG atom, indexed by residue number ($6)
$1=="ATOM" && $4=="CYS" && $3=="SG" {
print $4$6", "
cys_x[$6]=$7; cys_y[$6]=$8; cys_z[$6]=$9
}

END{ ORS="\n"
for (key1 in cys_x) {
  for (key2 in cys_x) {
      dx=cys_x[key1]-cys_x[key2]
      dy=cys_y[key1]-cys_y[key2]
      dz=cys_z[key1]-cys_z[key2]
      distance=sqrt(dx^2+dy^2+dz^2)
      # keep pairs closer than 3 Angstrom; key1<key2 reports each pair only once
      # (array keys are strings, so the comparison is alphabetical, e.g. "546" < "75")
      if (distance < 3 && distance != 0 && key1<key2) {
        i++
        candidate[i]=key1"-"key2": "distance
      }
  }
}
print "\nCandidates ..."
for (keys in candidate) {print candidate[keys]}
}
```

In [8]:
awk -f distance.awk 1FRF.pdb

Cysteines in the Structure...
CYS17, CYS20, CYS98, CYS110, CYS114, CYS147, CYS187, CYS212, CYS218, CYS227, CYS245, CYS248, CYS72, CYS75, CYS86, CYS237, CYS259, CYS265, CYS436, CYS457, CYS543, CYS546, 
Candidates ...
259-436: 2.55824
546-75: 2.41976


In [9]:
awk -f distance.awk 1FRV.pdb

Cysteines in the Structure...
CYS17, CYS20, CYS70, CYS96, CYS112, CYS148, CYS188, CYS213, CYS219, CYS228, CYS246, CYS249, CYS65, CYS68, CYS83, CYS228, CYS283, CYS418, CYS494, CYS530, CYS533, CYS17, CYS20, CYS70, CYS96, CYS112, CYS148, CYS188, CYS213, CYS219, CYS228, CYS246, CYS249, CYS65, CYS68, CYS83, CYS228, CYS283, CYS418, CYS494, CYS530, CYS533, 
Candidates ...
530-65: 2.91756
533-68: 2.94816
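
In the 1FRV output the cysteine list repeats, which suggests that several chains reuse the same residue numbers. Because `distance.awk` indexes coordinates by residue number alone, a later chain overwrites an earlier one, and because the keys are compared as strings, pairs such as `546-75` appear in alphabetical rather than numerical order. The following is a sketch of a chain-aware variant, assuming the standard fixed-column PDB layout. The function name `close_cys_pairs` and its optional cutoff argument are hypothetical:

```shell
# close_cys_pairs FILE [CUTOFF]
# Print cysteine SG pairs closer than CUTOFF (default 3) Angstrom.
# Hypothetical helper; assumes fixed-column PDB ATOM records.
close_cys_pairs() {
  awk -v cutoff="${2:-3}" '
    substr($0,1,4)=="ATOM" && substr($0,18,3)=="CYS" {
      name = substr($0,13,4); gsub(/ /, "", name)
      if (name != "SG") next
      n++
      # label = chain ID + residue number, so equal numbers in
      # different chains stay distinct
      res = substr($0,23,4) + 0
      label[n] = substr($0,22,1) res
      x[n] = substr($0,31,8) + 0
      y[n] = substr($0,39,8) + 0
      z[n] = substr($0,47,8) + 0
    }
    END {
      # numeric indices i < j visit every unordered pair exactly once
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          d = sqrt((x[i]-x[j])^2 + (y[i]-y[j])^2 + (z[i]-z[j])^2)
          # skip alternate conformations of the same residue
          if (label[i] != label[j] && d < cutoff)
            printf "%s-%s: %.3f\n", label[i], label[j], d
        }
    }' "$1"
}
```

For example, `close_cys_pairs 1FRV.pdb` uses the 3 Å cutoff of the original script, while `close_cys_pairs 1FRV.pdb 2.5` tightens it.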


Run in Jmol:
```
load =1FRF
hide water                    # remove water molecules
spacefill off                 # hide atom spheres
select cys436.sg, cys259.sg, cys75.sg, cys546.sg   # select SG atoms
spacefill 300
```
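
The same Jmol commands can be generated for any list of residue numbers. A small sketch, assuming the residue numbers come from the candidate list printed by `distance.awk`; the function name `jmol_highlight` is invented for this example:

```shell
# jmol_highlight ID RES...
# Print Jmol commands that load structure ID and enlarge the SG atoms
# of the given cysteine residues. Hypothetical helper.
jmol_highlight() {
  id=$1; shift
  sel=""
  for res in "$@"; do
    # build "cys436.sg, cys259.sg, ..." with a comma between entries
    sel="${sel:+$sel, }cys$res.sg"
  done
  printf 'load =%s\nhide water\nspacefill off\nselect %s\nspacefill 300\n' "$id" "$sel"
}
```

Redirecting the output to a file, for example `jmol_highlight 1FRF 436 259 75 546 > 1FRF.script`, gives a script that can be passed to `jmol -s`.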

### Download PDB IDs
Download the IDs of all structures that meet the following criteria, as described in the book:
- derived from *Escherichia* (either directly or via over-expression)
- X-ray resolution below 2 Å
- consisting of a single protein chain

Save the IDs as a comma-separated list in *ids.txt*.

### Download PDB Script

In [10]:
wget "https://www.rcsb.org/scripts/batch_download.sh"

--2024-03-14 11:38:53--  https://www.rcsb.org/scripts/batch_download.sh
Resolving www.rcsb.org (www.rcsb.org)... 128.6.159.248
Connecting to www.rcsb.org (www.rcsb.org)|128.6.159.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2360 (2.3K) [application/x-sh]
Saving to: ‘batch_download.sh’


2024-03-14 11:38:54 (38.8 MB/s) - ‘batch_download.sh’ saved [2360/2360]



In [11]:
chmod u+x batch_download.sh

In [12]:
mkdir EcoliStructures

In [18]:
cut -c 1-499 ids.txt > ids-100.txt # extract first 100 IDs
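
Cutting at character 499 works because every ID has four characters followed by a comma, so 100 IDs occupy 100 × 5 − 1 = 499 characters. A sketch that selects by field instead and therefore does not depend on the ID length; the function name `first_ids` is hypothetical:

```shell
# first_ids N FILE
# Print the first N comma-separated IDs of FILE (hypothetical helper).
first_ids() {
  cut -d, -f "1-$1" "$2"
}
```

`first_ids 100 ids.txt > ids-100.txt` should give the same file as the `cut -c` call above as long as all IDs are four characters long.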

In [19]:
./batch_download.sh -f ids-100.txt -o EcoliStructures -p

Downloading https://files.rcsb.org/download/1A3A.pdb.gz to EcoliStructures/1A3A.pdb.gz
Downloading https://files.rcsb.org/download/1A40.pdb.gz to EcoliStructures/1A40.pdb.gz
Downloading https://files.rcsb.org/download/1A54.pdb.gz to EcoliStructures/1A54.pdb.gz
Downloading https://files.rcsb.org/download/1A62.pdb.gz to EcoliStructures/1A62.pdb.gz
Downloading https://files.rcsb.org/download/1A82.pdb.gz to EcoliStructures/1A82.pdb.gz
Downloading https://files.rcsb.org/download/1A9Y.pdb.gz to EcoliStructures/1A9Y.pdb.gz
Downloading https://files.rcsb.org/download/1A9Z.pdb.gz to EcoliStructures/1A9Z.pdb.gz
Downloading https://files.rcsb.org/download/1ABE.pdb.gz to EcoliStructures/1ABE.pdb.gz
Downloading https://files.rcsb.org/download/1ABF.pdb.gz to EcoliStructures/1ABF.pdb.gz
Downloading https://files.rcsb.org/download/1ACV.pdb.gz to EcoliStructures/1ACV.pdb.gz
Downloading https://files.rcsb.org/download/1AG9.pdb.gz to EcoliStructures/1AG9.pdb.gz
Downloading https://files.rcsb.org/download

In [20]:
for i in $(sed 's/,/ /g' ids-100.txt); do wget -P EcoliStructures2 "https://files.rcsb.org/download/$i.pdb" ; done

--2024-03-14 12:02:26--  https://files.rcsb.org/download/1A3A.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.245
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘EcoliStructures2/1A3A.pdb’

1A3A.pdb                [    <=>             ] 431.97K   558KB/s    in 0.8s    

2024-03-14 12:02:28 (558 KB/s) - ‘EcoliStructures2/1A3A.pdb’ saved [442341]

--2024-03-14 12:02:28--  https://files.rcsb.org/download/1A40.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.245
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘EcoliStructures2/1A40.pdb’

1A40.pdb                [   <=>              ] 255.81K   498KB/s    in 0.5s    

2024-03-14 12:02:29 (498 KB/s) - ‘EcoliStructures2/1A40.pdb’ saved [261954]

--2024-0

### Batch Analysis

In [23]:
for i in EcoliStructures2/*.pdb; do awk -f distance-batch.awk $i; done

EcoliStructures2/1ACV.pdb is a candidate:
30-33: 2.03644
EcoliStructures2/1B12.pdb is a candidate:
170-176: 2.02048
EcoliStructures2/1B8J.pdb is a candidate:
168-178: 2.03851
286-336: 2.01802
EcoliStructures2/1BT5.pdb is a candidate:
123-77: 2.05451
EcoliStructures2/1BTL.pdb is a candidate:
123-77: 2.01439
EcoliStructures2/1DJR.pdb is a candidate:
86-9: 2.06412


In [24]:
for i in *.pdb; do awk -f distance-batch.awk $i; done

1FRF.pdb is a candidate:
259-436: 2.55824
546-75: 2.41976
1FRV.pdb is a candidate:
530-65: 2.91756
533-68: 2.94816
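
`distance-batch.awk` is not listed in this notebook. The following is only a guess at a script that prints output of the same shape, assuming it is `distance.awk` without the cysteine listing and with the file name printed once when at least one pair is found. It is written to the hypothetical file `distance-batch-sketch.awk` so that no existing file is overwritten:

```shell
# Write a hypothetical batch variant of distance.awk
cat > distance-batch-sketch.awk <<'EOF'
# collect SG coordinates of cysteines, indexed by residue number (as in distance.awk)
$1=="ATOM" && $4=="CYS" && $3=="SG" {
  cys_x[$6]=$7; cys_y[$6]=$8; cys_z[$6]=$9
}
END {
  for (key1 in cys_x) {
    for (key2 in cys_x) {
      dx=cys_x[key1]-cys_x[key2]
      dy=cys_y[key1]-cys_y[key2]
      dz=cys_z[key1]-cys_z[key2]
      distance=sqrt(dx^2+dy^2+dz^2)
      # same test as distance.awk: closer than 3 Angstrom, each pair once
      if (distance < 3 && distance != 0 && key1<key2) {
        # print the header only before the first hit of this file
        if (!found++) print FILENAME " is a candidate:"
        print key1"-"key2": "distance
      }
    }
  }
}
EOF
```

It would be used in the same loop as above, with `awk -f distance-batch-sketch.awk "$i"`. Files without a close cysteine pair produce no output.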


### Open in Jmol
Use the command `jmol -s EcoliStructures2/1B8J.pdb.script` to open a 