# Assessment of nucleotide-binding site configurations in structures of P-loop NTPases

This notebook implements the analysis pipeline described in [@kozlovaUnitedDiversityPatterns2022] and [@kozlovaUnitedDiversityCommonality2022].
The routine uses the Biopython `Bio.PDB` module to read PDB files and Pfam/UniProt mappings to assign superfamily information.

## Imports
Running the routine requires Biopython v. 1.79 and pandas v. 1.2.2.

In [159]:
import os
import pandas as pd

## 0. Input files & setup

### 0.1 Path definitions
The input files are provided in this repo in the "ploop_input" folder.

In [5]:
### Working directory ###
ploop_wdir="/home/servalli/Documents/projects/Ploop_autogenerated"

##subdirs
pdb_dir="PDB"
log_dir="logs"

os.makedirs(ploop_wdir, exist_ok=True)

for dir_p in [pdb_dir,log_dir]:
    os.makedirs(os.path.join(ploop_wdir,dir_p), exist_ok=True)


#######PATH DEFINITIONS
p_dir=os.path.join(ploop_wdir,pdb_dir) #PDB folder 

###Input files ###

ploop_in_dir="ploop_input" #folder with all input text files
ploop_chain_list_p=os.path.join(ploop_in_dir, "ploop_list.txt")  #Interpro mapping by PDB ID and chain, tab-delimited table: 121p	A	IPR027417

compound_dir="compound_lists" #folder with lists of PDBs with a given ligand, taken from RCSB PDB (no headers allowed!)
ion_cont_p="MG_MN_SR_CA.txt" #All structures containing Mg2+, Mn2+, Sr2+, Ca2+, retrieved from RCSB PDB
###Family mappings 
pf_dir="ploop_input/pf_ploop_assignment" #Subfolder with Pfam Mappings
pth_pdb_mapping="pdbmap"   #from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/pdbmap.gz
pth_suprfam="ploop_pfam_to_superfam.csv" #Pfam domains manually assigned to major classes of P-loop NTPases
up_txt="uniprot-pdb.tab" ###Uniprot descriptions as table retrieved from Uniprot website 13/10/2020
up_mapping="pdb_chain_uniprot.lst" ###NEW MAPPING RETRIEVED 10/10/2020 from ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/text/pdb_chain_uniprot.lst
pf_mapping="pdb_chain_pfam.lst" ###RETRIEVED 13/10/2020 from ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/text/pdb_chain_pfam.lst
pf_hmmer="hmmer_pdb_all.txt" ###retrieved from http://www.rcsb.org/pdb/rest/hmmer?file=hmmer_pdb_all.txt used for Pfam domain coordinates
interpro_mapping="pdb_chain_interpro.tsv"
sites_checked="sites_checked_manually.txt" #Table listing sites we checked manually on the previous run, listing the information on Arg finger binding type

### 0.2 Setup for a run
Read input id lists, select PDBs to be downloaded:

In [6]:

d_ip_mapping=pd.read_csv(os.path.join(pf_dir, interpro_mapping), sep="\t", header=1, names=["pdbid","chain","ipr"])

ploop_all_chains=d_ip_mapping[d_ip_mapping.ipr=="IPR027417"].copy()
ploop_all_chains['ind_ch']=['_'.join([x, y]) for x, y in zip(ploop_all_chains['pdbid'], ploop_all_chains['chain'])]


Filter P-loop structures by ligand type: only structures with a bound nucleotide or nucleotide analog and a divalent cation are kept.<br>
Structures with ADP/GDP are selected only if a γ-phosphate mimic is bound in the same structure.<br>
The nucleotides and analogs analyzed include the following compounds:
* ATP
* GTP
* ATP-γ-S (AGS)
* AMP-PNP (ANP)
* AMP-PCP (ACP)
* GTP-γ-S (GSP)
* GMP-PNP (GNP)
* GMP-PCP (GCP)
* ADP and GDP if complexed with:
    * BeF<sub>3</sub> (BEF)
    * MgF<sub>3</sub><sup>-</sup> (MGF)
    * AlF<sub>3</sub> (AF3)
    * AlF<sub>4</sub><sup>-</sup> (ALF)
    * VO<sub>4</sub><sup>3-</sup> (VO4)


In [36]:
#PDBs with ligands lists, taken from RCSB PDB website (9/10/20)

with open(os.path.join(ploop_in_dir,compound_dir,ion_cont_p),"r") as f:
    has_ion=set(f.readline().strip("\n").split(","))

#with open(os.path.join(ploop_in_dir,interpro_p),"r") as f:   interpro_ids={l.strip("\n").split("\t")[0].upper() for l in f.readlines()}
interpro_ids=set(ploop_all_chains.pdbid.str.upper().to_list())

trinuc_ids=["ATP","ACP","AGS","ANP","GTP","GNP","GSP","GCP"]
compound_pdb={}
for c_id in trinuc_ids:
    with open(os.path.join(ploop_in_dir,compound_dir,f"{c_id}.txt"),"r") as f:
        pdbs=set(f.readline().strip("\n").split(","))
        pdbs_suitable=sorted(list(pdbs&has_ion&interpro_ids))
        compound_pdb[c_id]=pdbs_suitable

gamma_ids=["MGF","ALF","AF3","BEF","VO4"]
gamma_list=[]
for gamma in gamma_ids:
    with open(os.path.join(ploop_in_dir,compound_dir,f"{gamma}.txt"),"r") as f:
        cur_gamma=f.readline().strip("\n").split(",")
    gamma_list.extend(cur_gamma)
gamma_list=set(gamma_list)
dinuc_ids=["ADP","GDP"]
for c_id in dinuc_ids:
    with open(os.path.join(ploop_in_dir,compound_dir,f"{c_id}.txt"),"r") as f:
        pdbs=set(f.readline().strip("\n").split(","))
        pdbs_suitable=sorted(list(pdbs&has_ion&interpro_ids&gamma_list))
        compound_pdb[c_id]=pdbs_suitable
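
The selection rule applied in the cell above can be illustrated on toy data. The sketch below is a minimal, self-contained restatement that uses in-memory sets instead of the RCSB list files. The helper name `select_structures` and the four-letter IDs are placeholders made up for illustration; they are not part of `pyploop` and do not refer to real PDB entries.

```python
def select_structures(ligand_pdbs, has_ion, ploop_ids, gamma_mimic_pdbs=None):
    """Return sorted IDs that contain the ligand, a divalent cation and a P-loop domain.

    If gamma_mimic_pdbs is given (the ADP/GDP case), a gamma-phosphate mimic
    must also be present in the same structure.
    """
    selected = set(ligand_pdbs) & set(has_ion) & set(ploop_ids)
    if gamma_mimic_pdbs is not None:
        selected &= set(gamma_mimic_pdbs)
    return sorted(selected)


# Toy ID sets (hypothetical, for illustration only)
has_ion_demo = {"AAAA", "BBBB", "CCCC"}
ploop_demo = {"AAAA", "BBBB", "CCCC", "DDDD"}
atp_demo = {"AAAA", "DDDD"}
adp_demo = {"BBBB", "CCCC"}
gamma_demo = {"CCCC"}

atp_selected = select_structures(atp_demo, has_ion_demo, ploop_demo)
adp_selected = select_structures(adp_demo, has_ion_demo, ploop_demo, gamma_demo)
print(atp_selected)
print(adp_selected)
```

In this toy case the triphosphate entry only needs the cation and the P-loop annotation, while the diphosphate entry is kept only when the γ-phosphate mimic list contains it as well.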


## 1. Download structures

In [47]:
from pyploop.process_pdb import download_pdbs
downloaded=download_pdbs(compound_pdb, ploop_wdir, pdb_dir, log_dir)

In [168]:
print (f"Total {downloaded.shape[0]}, unique {downloaded.PDBID.unique().shape[0]}") 
downloaded.groupby("nucl").count()

Total 1615, unique 1566


nucl,PDBID
ACP,22
ADP,144
AGS,64
ANP,191
ATP,285
GCP,69
GDP,89
GNP,416
GSP,75
GTP,260


## 2. Calculate the distances

### For each PDB,
* Structure metadata was retrieved; low-resolution models (>5 Å) were skipped.
* For each structure, the surroundings of each nucleotide (or nucleotide-like compound) were analyzed.

#### Each nucleotide was processed in the following way:
* Find the γ-phosphate or a moiety mimicking it.
* Find Mg<sup>2+</sup> or another divalent cation (Ca, Mn, Sr...).
* Find the Walker A Lys by proximity to the β-phosphate and a sequence check.
* Find the Walker A Ser/Thr by sequence and measure its distance to Mg<sup>2+</sup>.
* Record whether H<sub>2</sub>O is present in the structure.
* Find the Walker B Asp/Glu by its distance to [S/T]<sup>K+1</sup>, with a hydrophobicity check on the three preceding residues (E, D, S, T, Y, K, R, H are not allowed). If no suitable Asp/Glu is found within 5 Å of [S/T]<sup>K+1</sup>, the closest residue is listed. The distance to [S/T]<sup>K+1</sup> is recorded.
* Find putative finger residues: list the closest Arg, non-P-loop Lys, and Asn residues (by proximity to the β-phosphate).
* List all protein nitrogen atoms within 4 Å of the γ-phosphate.
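
The hydrophobicity check on the three residues preceding the Walker B Asp/Glu can be sketched as follows. This is a minimal illustration of the rule as stated in the list above, not the code from `pyploop`; the function name `is_hydrophobic_stretch` is hypothetical. The three example strings are taken from the `preceding_res` column of the output tables in this notebook, where `is_hydro` is reported as True, True and False, respectively, which is consistent with this reading of the rule.

```python
# Residues not allowed in the three positions preceding the Walker B Asp/Glu,
# as listed in the text above (one-letter codes).
NON_HYDROPHOBIC = set("EDSTYKRH")


def is_hydrophobic_stretch(preceding_res):
    """Return True if none of the preceding residues is in the disallowed set."""
    return not any(res in NON_HYDROPHOBIC for res in preceding_res.upper())


for tripeptide in ("LIG", "MLL", "VLT"):
    print(tripeptide, is_hydrophobic_stretch(tripeptide))
```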

### 2.1 Definitions for distance calculations:

In [160]:

comps=["ADP","ATP","ANP","ACP","AGS","GNP","GCP","GSP","GTP","GDP"] #Nucleotide analogs types to be processed
comps_F=["GDP","ADP"] #nucleoside diphosphates
###Atom names
beta=["O1B","O2B","O3B","N3B","C3B"]
alpha=["O1A","O2A","O3A"] 
gamma_N=["O1G","O2G","O3G","S1G"]
gamma_F=["F1","F2","F3","F4","O1","O2","O3","O4"]
#Types of γ-phosphate mimics
gamma_types=["ALF","AF3","BEF","MGF","VO4"]
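
To show how atom-name groups such as `beta` could be used, the sketch below finds the ligand atom from a given group that lies closest to a query atom. It works on plain coordinate tuples and the standard library only, whereas the actual calculations are done inside `pyploop.process_pdb` on Biopython structures. The function `closest_atom` and all coordinates are assumptions made up for illustration.

```python
import math

beta = ["O1B", "O2B", "O3B", "N3B", "C3B"]  # same atom names as in the cell above


def closest_atom(query_xyz, ligand_atoms, names):
    """Return (atom_name, distance) for the ligand atom from `names` nearest to query_xyz.

    ligand_atoms maps atom name -> (x, y, z); names absent from the ligand are skipped.
    Returns (None, None) if no atom from `names` is present.
    """
    candidates = [
        (math.dist(query_xyz, ligand_atoms[n]), n) for n in names if n in ligand_atoms
    ]
    if not candidates:
        return None, None
    dist, name = min(candidates)
    return name, dist


# Hypothetical coordinates in Å, not taken from any PDB entry
ligand_demo = {"O1B": (0.0, 0.0, 0.0), "O2B": (3.0, 0.0, 0.0), "O1A": (0.0, 5.0, 0.0)}
lys_nz_demo = (3.0, 4.0, 0.0)

name, dist = closest_atom(lys_nz_demo, ligand_demo, beta)
print(name, round(dist, 2))
```

Passing a different name list (for example `alpha` or `gamma_N`) would give the atom/distance pairs that the `*-alpha-atom`, `*-alpha-dist`, `*-gamma-atom` and `*-gamma-dist` columns of the output table appear to record.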


### 2.2 Calculate distances

In [61]:
from pyploop.process_pdb import process_pdb_dir
dists=process_pdb_dir(p_dir,comps, comps_F, alpha, beta,  gamma_F, gamma_N, gamma_types,ploop_all_chains)

ADP 144 structures of P-loop NTPases available
0 1br1 ADP
Gamma not found! 1h8e ADP A600, skipping
Gamma not found! 1h8e ADP B600, skipping
Gamma not found! 1h8e ADP C600, skipping
Gamma not found! 1h8e ADP E600, skipping
Gamma not found! 1ihu ADP A590, skipping
Gamma not found! 1vfz ADP A500, skipping
Gamma not found! 1w0j ADP A1511, skipping
Gamma not found! 1w0j ADP B1511, skipping
Gamma not found! 1w0j ADP C1511, skipping
50 3kql ADP
100 5lta ADP
Gamma not found! 6ap1 ADP D701, skipping
Gamma not found! 6ap1 ADP E701, skipping
Gamma not found! 6bmf ADP D501, skipping
Gamma not found! 6bmf ADP E501, skipping
Gamma not found! 6gej ADP T501, skipping
Gamma not found! 6gej ADP U502, skipping
Gamma not found! 6gej ADP V501, skipping
Gamma not found! 6gej ADP W501, skipping
Gamma not found! 6gej ADP X501, skipping
Gamma not found! 6gej ADP Y501, skipping
Gamma not found! 6gen ADP T501, skipping
Gamma not found! 6gen ADP U502, skipping
Gamma not found! 6gen ADP V501, skipping
Gamma not fo

In [62]:
dists_raw=dists.copy()
year=2022
spec="Ver5_hy_fallback"
dists.to_csv(os.path.join(ploop_wdir,f"{year}_{spec}_t1_raw_dist.tsv"),sep="\t")

This is what we get:

In [68]:
dists[dists.columns[8:]].head()


Unnamed: 0,PDBID,model,nuc_type,nuc_chain,nuc_id,nuc_gamma_moiety_type,nuc_gamma_moiety_chain,nuc_gamma_moiety_id,mg_in_site_4a_from_beta,AG-site,G-site,TYPE_ARG,LYS-site,asn_TYPE,Lys_res_ch,Lys_res_id,nz-alpha-atom,nz-alpha-dist,nz-gamma-atom,nz-gamma-dist,Arg_res_ch,Arg_res_id,nh1-alpha-atom,nh1-alpha-dist,nh1-gamma-atom,nh1-gamma-dist,nh2-alpha-atom,nh2-alpha-dist,nh2-gamma-atom,nh2-gamma-dist,ne-alpha-atom,ne-alpha-dist,ne-gamma-atom,ne-gamma-dist,asn_ID,asn_ne2-alpha-atom,asn_ne2-alpha-dist,asn_ne2-gamma-atom,asn_ne2-gamma-dist,Surr_N_BB,Surr_N_SCh,gly13-chain,gly13-type,gly13-id,nuc-to-g13-atom,dist-gly13,lys-ploop-info,ploopk-dist,resolution,method,pdbname,pdbdate,include,SerK+1-Mg,WB-Asp/Glu,WBD-SerK+1_dist,WBD-Mg,water_present,is_hydro,preceding_res
0,1br1,0,ADP,A,998,ALF,A,999,MG,,,,,,A,250,O1A,8.639585,F4,7.628103,A,247,O2A,10.024054,F3,5.33214,O2A,11.278693,F3,6.160625,O2A,10.615997,F3,6.615,A242,O2A,2.916646,F2,3.817497,GLY_A468..F1(2.55); SER_A246..F2(2.70),,A,GLY,180,F1,3.655413,A183,2.780986,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.159976,ASP_A465,3.946227,3.889449,True,True,LIG
1,1br1,0,ADP,C,998,ALF,C,999,MG,,,,,,C,250,O1A,8.718588,F4,7.675519,C,247,O2A,9.955953,F3,5.308627,O2A,11.298747,F3,6.233797,O2A,10.617236,F3,6.68114,C242,O2A,2.969201,F2,3.955435,GLY_C468..F1(2.46); SER_C246..F2(2.75),,C,GLY,180,F1,3.738719,C183,2.751563,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.166397,ASP_C465,3.922912,3.763648,True,True,LIG
2,1br1,0,ADP,E,998,ALF,E,999,MG,,,,,,E,250,O1A,8.767497,F4,7.666852,E,247,O2A,10.116271,F3,5.496902,O2A,11.47034,F3,6.434052,O2A,10.82166,F3,6.89429,E242,O2A,3.156481,F2,3.965111,GLY_E468..F1(2.48); SER_E179..F1(3.99); SER_E2...,,E,GLY,180,F1,3.665947,E183,2.710544,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.076092,ASP_E465,3.938586,3.718373,True,True,LIG
3,1br1,0,ADP,G,998,ALF,G,999,MG,,,,,,G,250,O1A,8.785262,F4,7.654045,G,247,O2A,10.132792,F3,5.477052,O2A,11.498631,F3,6.433871,O2A,10.795943,F3,6.833364,G242,O2A,3.129764,F2,3.859841,GLY_G468..F1(2.53); SER_G246..F2(2.78),,G,GLY,180,F1,3.664732,G183,2.774285,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.066263,ASP_G465,3.922849,3.771162,True,True,LIG
4,1br2,0,ADP,A,998,ALF,A,999,MG,,,,,,A,250,O1A,8.386499,F4,7.087786,A,247,O3A,9.803608,F3,4.609398,O3A,12.066981,F3,6.799509,O2A,11.023004,F3,6.514483,A242,O2A,2.941894,F2,3.716788,GLY_A468..F1(2.54); SER_A246..F2(2.58),,A,GLY,180,F3,3.980544,A183,2.752079,2.9,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN COMPLEXED WI...,1998-08-26,,2.013697,ASP_A465,2.795241,3.434139,True,True,LIG


## 3. Add Annotations

### 3.1 Read tables for annotation

In [164]:
#Mapping from PDB to Pfam
d_pfam_mapping=pd.read_csv(os.path.join(pf_dir, pf_mapping), sep="\t", header=1)
#Results of HMMer run against sequences from PDB, with coordinates
d_pfam_hmm=pd.read_csv(os.path.join(pf_dir, pf_hmmer), sep="\t", header=0)

d_pfam_hmm["PFAM"]=d_pfam_hmm.PFAM_ACC.str.split(".",expand=True)[0]
d_pfam_hmm[["start","end"]]=-10000000,10000000
isnumeric_start=d_pfam_hmm.PdbResNumStart.str.lstrip("-").str.isdigit()
isnumeric_end=d_pfam_hmm.PdbResNumEnd.str.lstrip("-").str.isdigit()
d_pfam_hmm.loc[isnumeric_start,"start"]=d_pfam_hmm.loc[isnumeric_start,"PdbResNumStart"].astype(int)
d_pfam_hmm.loc[isnumeric_end,"end"]=d_pfam_hmm.loc[isnumeric_end,"PdbResNumEnd"].astype(int)
d_pfam_hmm.sort_values(by="eValue", inplace=True)
col_ns_pdb_pf=['PDB_ID','CHAIN_ID',2,"PFAM_Name", "PFAM_ACC","Accesion", "Position"]
#Another Pfam-PDB mapping
pf_to_pdb = pd.read_csv(os.path.join(pf_dir, pth_pdb_mapping), sep="\t", header=None, names=col_ns_pdb_pf)
for col in pf_to_pdb:
    if col!=2:
        pf_to_pdb[col] = pf_to_pdb[col].map(lambda x: x.strip(';'))
#Pfam domains to P-loop NTPase classes (superfamilies)
pf_to_suprf=pd.read_csv(os.path.join(pf_dir, pth_suprfam), sep=",", header=0)
accesions=list(pf_to_suprf['pf_id'])
pdbs_ploop=pf_to_pdb.loc[pf_to_pdb["PFAM_ACC"].isin(accesions)].copy()
pdbs_ploop['pdb_id']=pdbs_ploop['PDB_ID'].str.lower()
pdb_to_up={}

#Uniprot info & mapping to PDB
d_up_info=pd.read_csv(os.path.join(pf_dir, up_txt), sep="\t", header=0)
d_up_mapping=pd.read_csv(os.path.join(pf_dir, up_mapping), sep="\t", header=1)
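
PDB residue numbers may carry insertion codes or be missing, which is why the cell above falls back to the wide bounds -10000000 and 10000000 whenever `PdbResNumStart` or `PdbResNumEnd` is not a plain integer. The sketch below restates that parsing rule in plain Python (with a regular expression as a slightly stricter test than `lstrip("-")` plus `isdigit`) and shows one way such bounds could be used to pick the lowest-e-value domain covering a residue. The function names, the example rows and the lookup itself are assumptions for illustration; the actual assignment happens inside `add_identifiers` and may differ.

```python
import re

START_FALLBACK, END_FALLBACK = -10000000, 10000000  # same fallback bounds as in the cell above


def parse_bound(value, fallback):
    """Convert a residue-number string to int; return the fallback if it is not a plain integer."""
    text = str(value).strip()
    return int(text) if re.fullmatch(r"-?[0-9]+", text) else fallback


def best_domain(domains, resnum):
    """Return the name of the lowest-e-value domain whose range covers resnum, or None."""
    covering = [
        d for d in domains
        if parse_bound(d["start"], START_FALLBACK) <= resnum <= parse_bound(d["end"], END_FALLBACK)
    ]
    return min(covering, key=lambda d: d["evalue"])["pfam"] if covering else None


# Hypothetical rows, for illustration only
domains_demo = [
    {"pfam": "DOM_A", "start": "10", "end": "200", "evalue": 1e-30},
    {"pfam": "DOM_B", "start": "150A", "end": "400", "evalue": 1e-50},
    {"pfam": "DOM_C", "start": "300", "end": "500", "evalue": 1e-10},
]
print(best_domain(domains_demo, 50))
print(best_domain(domains_demo, 450))
print(best_domain(domains_demo, 600))
```

Note the side effect of the fallback: a domain whose start cannot be parsed (here `150A`) is treated as covering every residue up to its end, so it can win over a domain with well-defined bounds.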






### 3.2 Filter site table and add protein descriptions

In [120]:
from pyploop.table_features import mark_lowqual_sites, add_identifiers

Assign UniProt IDs and superfamily data. The Pfam mapping is used to assign sites to the major classes of P-loop NTPases; for this, Pfam domains of clan CL0023 were manually assigned to classes (e.g. "TRAFAC" or "AAA+"), see the example below.

In [138]:
pf_to_suprf.head(10)

Unnamed: 0,n,pf_id,domain,syn,domain_name,superfamily
0,,PF06431,Polyoma_lg_T_C,,Polyomavirus large T antigen C-terminus,AAA/SF3
1,s,PF00004,AAA,,ATPase family associated with various cellular...,AAA+
2,s,PF13191,AAA_16,,AAA ATPase domain,AAA+
3,s,PF13238,AAA_18,,AAA domain,AAA+
4,s,PF13401,AAA_22,,AAA domain,AAA+
5,s,PF12775,AAA_7,,P-loop containing dynein motor region,AAA+
6,s,PF00308,Bac_DnaA,bac_dnaA;,Bacterial dnaA protein,AAA+
7,s,PF06144,DNA_pol3_delta,,"DNA polymerase III, delta subunit",AAA+
8,s,PF13177,DNA_pol3_delta2,,"DNA polymerase III, delta subunit",AAA+
9,s,PF01695,IstB_IS21,IstB;,IstB-like ATP binding protein,AAA+
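
The lookup that this table supports can be sketched with a plain dictionary. The accession/class pairs below are copied from the tables shown in this notebook; the helper `superfamily_for`, the version suffix in the example call and the `"unassigned"` default are hypothetical and only illustrate the idea. The version suffix is dropped in the same way as in the `PFAM` column built in section 3.1.

```python
# Accession -> class pairs copied from the tables shown in this notebook
pfam_to_class = {
    "PF06431": "AAA/SF3",
    "PF00004": "AAA+",
    "PF00063": "TRAFAC",
}


def superfamily_for(pfam_acc, mapping, default="unassigned"):
    """Drop the Pfam version suffix (the part after '.') and look up the class."""
    base_acc = pfam_acc.split(".")[0]
    return mapping.get(base_acc, default)


print(superfamily_for("PF00004.99", pfam_to_class))  # version suffix is made up
print(superfamily_for("PF99999", pfam_to_class))     # accession absent from the mapping
```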


In [121]:
add_identifiers(dists, d_up_mapping, d_up_info, d_pfam_hmm,
                 d_pfam_mapping, pf_to_suprf,
                 ploop_all_chains,pdbs_ploop)                           
                

No Uniprot ID,  5fhd A
No Uniprot ID,  5fhd B
No Uniprot ID,  5fhe A
No Uniprot ID,  4qc2 A
No Uniprot ID,  4qc2 B
No Uniprot ID,  6tdu AA
No Uniprot ID,  6tdu AB
No Uniprot ID,  6tdu AC
No Uniprot ID,  6tdu AF
No Uniprot ID,  6tdu BA
No Uniprot ID,  6tdu BB
No Uniprot ID,  6tdu BC
No Uniprot ID,  6tdu BF
No Uniprot ID,  6tdy A
No Uniprot ID,  6tdy B
No Uniprot ID,  6tdy C
No Uniprot ID,  6tdy F
No Uniprot ID,  6tdz B
No Uniprot ID,  6tdz C
No Uniprot ID,  6tdz A
No Uniprot ID,  6tdz D
No Uniprot ID,  6te0 C
No Uniprot ID,  6te0 A
No Uniprot ID,  6te0 B
No Uniprot ID,  6te0 E
No Uniprot ID,  6gz3 Ct
No Uniprot ID,  6gz4 Ct
No Uniprot ID,  6gz5 Ct
No Uniprot ID,  5it7 1
No Uniprot ID,  3wyf A
No Uniprot ID,  3wyf D
No Uniprot ID,  5xoj A


In [122]:
mark_lowqual_sites(dists)
dists.to_csv(os.path.join(ploop_wdir,f"{year}_{spec}_t2_with_family.tsv"),sep="\t")
dists2=dists.copy()

The annotated dataframe looks like this:

In [124]:
dists.head()

Unnamed: 0,superfamily,pfam_acc,pfam_domain,domain_name,Uniprot_Ac,Uniprot_Id,Protein_name_up,Gene_name_up,PDBID,model,nuc_type,nuc_chain,nuc_id,nuc_gamma_moiety_type,nuc_gamma_moiety_chain,nuc_gamma_moiety_id,mg_in_site_4a_from_beta,AG-site,G-site,TYPE_ARG,LYS-site,asn_TYPE,Lys_res_ch,Lys_res_id,nz-alpha-atom,nz-alpha-dist,nz-gamma-atom,nz-gamma-dist,Arg_res_ch,Arg_res_id,nh1-alpha-atom,nh1-alpha-dist,nh1-gamma-atom,nh1-gamma-dist,nh2-alpha-atom,nh2-alpha-dist,nh2-gamma-atom,nh2-gamma-dist,ne-alpha-atom,ne-alpha-dist,ne-gamma-atom,ne-gamma-dist,asn_ID,asn_ne2-alpha-atom,asn_ne2-alpha-dist,asn_ne2-gamma-atom,asn_ne2-gamma-dist,Surr_N_BB,Surr_N_SCh,gly13-chain,gly13-type,gly13-id,nuc-to-g13-atom,dist-gly13,lys-ploop-info,ploopk-dist,resolution,method,pdbname,pdbdate,include,SerK+1-Mg,WB-Asp/Glu,WBD-SerK+1_dist,WBD-Mg,water_present,is_hydro,preceding_res,pfam_comm
0,TRAFAC,PF00063,Myosin_head,Myosin head (motor domain),P10587,MYH11_CHICK,Myosin-11,MYH11,1br1,0,ADP,A,998,ALF,A,999,MG,,,,,,A,250,O1A,8.639585,F4,7.628103,A,247,O2A,10.024054,F3,5.33214,O2A,11.278693,F3,6.160625,O2A,10.615997,F3,6.615,A242,O2A,2.916646,F2,3.817497,GLY_A468..F1(2.55); SER_A246..F2(2.70),,A,GLY,180,F1,3.655413,A183,2.780986,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.159976,ASP_A465,3.946227,3.889449,True,True,LIG,Ok
1,TRAFAC,PF00063,Myosin_head,Myosin head (motor domain),P10587,MYH11_CHICK,Myosin-11,MYH11,1br1,0,ADP,C,998,ALF,C,999,MG,,,,,,C,250,O1A,8.718588,F4,7.675519,C,247,O2A,9.955953,F3,5.308627,O2A,11.298747,F3,6.233797,O2A,10.617236,F3,6.68114,C242,O2A,2.969201,F2,3.955435,GLY_C468..F1(2.46); SER_C246..F2(2.75),,C,GLY,180,F1,3.738719,C183,2.751563,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.166397,ASP_C465,3.922912,3.763648,True,True,LIG,Ok
2,TRAFAC,PF00063,Myosin_head,Myosin head (motor domain),P10587,MYH11_CHICK,Myosin-11,MYH11,1br1,0,ADP,E,998,ALF,E,999,MG,,,,,,E,250,O1A,8.767497,F4,7.666852,E,247,O2A,10.116271,F3,5.496902,O2A,11.47034,F3,6.434052,O2A,10.82166,F3,6.89429,E242,O2A,3.156481,F2,3.965111,GLY_E468..F1(2.48); SER_E179..F1(3.99); SER_E2...,,E,GLY,180,F1,3.665947,E183,2.710544,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.076092,ASP_E465,3.938586,3.718373,True,True,LIG,Ok
3,TRAFAC,PF00063,Myosin_head,Myosin head (motor domain),P10587,MYH11_CHICK,Myosin-11,MYH11,1br1,0,ADP,G,998,ALF,G,999,MG,,,,,,G,250,O1A,8.785262,F4,7.654045,G,247,O2A,10.132792,F3,5.477052,O2A,11.498631,F3,6.433871,O2A,10.795943,F3,6.833364,G242,O2A,3.129764,F2,3.859841,GLY_G468..F1(2.53); SER_G246..F2(2.78),,G,GLY,180,F1,3.664732,G183,2.774285,3.5,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN-ESSENTIAL LI...,1998-08-26,,2.066263,ASP_G465,3.922849,3.771162,True,True,LIG,Ok
4,TRAFAC,PF00063,Myosin_head,Myosin head (motor domain),P10587,MYH11_CHICK,Myosin-11,MYH11,1br2,0,ADP,A,998,ALF,A,999,MG,,,,,,A,250,O1A,8.386499,F4,7.087786,A,247,O3A,9.803608,F3,4.609398,O3A,12.066981,F3,6.799509,O2A,11.023004,F3,6.514483,A242,O2A,2.941894,F2,3.716788,GLY_A468..F1(2.54); SER_A246..F2(2.58),,A,GLY,180,F3,3.980544,A183,2.752079,2.9,x-ray diffraction,SMOOTH MUSCLE MYOSIN MOTOR DOMAIN COMPLEXED WI...,1998-08-26,,2.013697,ASP_A465,2.795241,3.434139,True,True,LIG,Ok


### 3.3 Assign interaction types to finger residues

In [134]:
#Load table with sites we have checked manually
d_sites_checked=pd.read_csv(os.path.join(ploop_in_dir,sites_checked),sep="\t")
d_sites_checked.head()

Unnamed: 0,PDB,nuc,arg,argtype
0,1kof,B501,B124,NH2 weak*
1,1qhx,A501,A133,NH2 weak*
2,1wq1,R167,G789,FORK*
3,2bc9,A593,A48,FORK*
4,2g83,B355,B178,NH1 weak*


In [128]:
from pyploop.describe_finger import assign_arg_type, assign_mono_type, assign_type

In [129]:
#Assign types only to "high-quality" sites
dists_selected=dists2[dists2.include.isna()].copy()
assign_arg_type(dists_selected, d_sites_checked)
assign_mono_type(dists_selected)
assign_type(dists_selected)
dists_selected.to_csv(os.path.join(ploop_wdir,f"{year}_{spec}_t3_dists_with_type_.tsv"), sep="\t")

In [132]:
dists_selected.tail()

Unnamed: 0,superfamily,pfam_acc,pfam_domain,domain_name,Uniprot_Ac,Uniprot_Id,Protein_name_up,Gene_name_up,PDBID,model,nuc_type,nuc_chain,nuc_id,nuc_gamma_moiety_type,nuc_gamma_moiety_chain,nuc_gamma_moiety_id,mg_in_site_4a_from_beta,AG-site,G-site,TYPE_ARG,LYS-site,asn_TYPE,Lys_res_ch,Lys_res_id,nz-alpha-atom,nz-alpha-dist,nz-gamma-atom,nz-gamma-dist,Arg_res_ch,Arg_res_id,nh1-alpha-atom,nh1-alpha-dist,nh1-gamma-atom,nh1-gamma-dist,nh2-alpha-atom,nh2-alpha-dist,nh2-gamma-atom,nh2-gamma-dist,ne-alpha-atom,ne-alpha-dist,ne-gamma-atom,ne-gamma-dist,asn_ID,asn_ne2-alpha-atom,asn_ne2-alpha-dist,asn_ne2-gamma-atom,asn_ne2-gamma-dist,Surr_N_BB,Surr_N_SCh,gly13-chain,gly13-type,gly13-id,nuc-to-g13-atom,dist-gly13,lys-ploop-info,ploopk-dist,resolution,method,pdbname,pdbdate,include,SerK+1-Mg,WB-Asp/Glu,WBD-SerK+1_dist,WBD-Mg,water_present,is_hydro,preceding_res,pfam_comm
3662,TRAFAC,PF02263,GBP,"Guanylate-binding protein, N-terminal domain",Q8WXF7,ATLA1_HUMAN,Atlastin-1,ATL1 GBP3 SPG3A,6b9f,0,GDP,B,506,ALF,B,507,MG,ARG,NONE,NH2.,NONE,NONE,B,78,O3A,9.275437,F1,9.620522,B,77,O1A,4.430065,F4,4.947354,O1A,2.835662,F4,2.957734,O3A,4.683665,F1,2.908955,B177,O3A,7.003891,F3,7.95662,GLU_B119..F1(3.91); THR_B120..F2(3.59); GLY_B1...,,B,ARG,77,F1,2.910028,B80,2.570031,1.9,x-ray diffraction,Human ATL1 mutant - F151S bound to GDPAlF4-,2017-10-10,,2.148698,ASP_B146,2.667157,4.004481,True,True,MLL,Ok
3663,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,A,1001,ALF,A,1002,MG,NONE,NONE,NONE.,NONE,NONE,A,240,O3A,6.848152,F3,8.733605,A,71,O1A,7.048779,F4,10.52467,O1A,7.509537,F1,11.5825,O1A,5.268142,F4,9.685854,B217,O3A,9.367575,F3,7.095841,VAL_A76..F1(2.59); THR_A77..F1(2.98); GLY_A173...,,A,SER,53,F3,2.8908,A56,2.572303,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.087833,ASP_A170,2.357399,3.649189,True,False,VLT,Ok
3664,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,B,1001,ALF,B,1002,MG,NONE,NONE,NONE.,NONE,NONE,B,240,O3A,5.68091,F3,8.498302,B,71,O2A,6.883111,F1,11.194972,O2A,8.333436,F1,12.124759,O2A,6.16356,F4,10.001845,A217,O3A,9.179136,F3,7.005194,THR_B77..F1(3.01); VAL_B76..F1(2.72); GLY_B173...,,B,SER,53,F3,2.925606,B56,2.678724,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.138869,ASP_B170,2.477235,3.904188,True,False,VLT,Ok
3665,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,C,1001,ALF,C,1002,MG,NONE,NONE,NONE.,NONE,NONE,C,240,O3A,7.143929,F3,8.903456,C,71,O1A,7.163232,F4,10.982876,O1A,8.035327,F1,11.978857,O1A,5.742301,F1,10.138892,D217,O3A,9.486238,F3,7.20348,THR_C77..F1(3.10); VAL_C76..F1(2.58); GLY_C173...,,C,SER,53,F3,3.113955,C56,2.577422,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.35699,ASP_C170,2.457356,4.060771,True,False,VLT,Ok
3666,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,D,1001,ALF,D,1002,MG,NONE,NONE,NONE.,NONE,NONE,D,240,O3A,6.579934,F3,8.465226,D,71,O2A,6.032832,F1,10.871549,O2A,7.717821,F1,11.878605,O2A,5.749921,F4,9.888748,C217,O3A,9.348441,F3,7.189288,VAL_D76..F1(2.74); THR_D77..F1(2.96); GLY_D173...,,D,SER,53,F3,2.800873,D56,2.705607,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.096805,ASP_D170,2.442405,3.636164,True,False,VLT,Ok


#### <i> A more compact version of the table for Supplementary </i>

In [154]:
dists_selected.index.name="Site_Id"
dists_selected.tail(5)

Site_Id,superfamily,pfam_acc,pfam_domain,domain_name,Uniprot_Ac,Uniprot_Id,Protein_name_up,Gene_name_up,PDBID,model,nuc_type,nuc_chain,nuc_id,nuc_gamma_moiety_type,nuc_gamma_moiety_chain,nuc_gamma_moiety_id,mg_in_site_4a_from_beta,AG-site,G-site,TYPE_ARG,LYS-site,asn_TYPE,Lys_res_ch,Lys_res_id,nz-alpha-atom,nz-alpha-dist,nz-gamma-atom,nz-gamma-dist,Arg_res_ch,Arg_res_id,nh1-alpha-atom,nh1-alpha-dist,nh1-gamma-atom,nh1-gamma-dist,nh2-alpha-atom,nh2-alpha-dist,nh2-gamma-atom,nh2-gamma-dist,ne-alpha-atom,ne-alpha-dist,ne-gamma-atom,ne-gamma-dist,asn_ID,asn_ne2-alpha-atom,asn_ne2-alpha-dist,asn_ne2-gamma-atom,asn_ne2-gamma-dist,Surr_N_BB,Surr_N_SCh,gly13-chain,gly13-type,gly13-id,nuc-to-g13-atom,dist-gly13,lys-ploop-info,ploopk-dist,resolution,method,pdbname,pdbdate,include,SerK+1-Mg,WB-Asp/Glu,WBD-SerK+1_dist,WBD-Mg,water_present,is_hydro,preceding_res,pfam_comm
3662,TRAFAC,PF02263,GBP,"Guanylate-binding protein, N-terminal domain",Q8WXF7,ATLA1_HUMAN,Atlastin-1,ATL1 GBP3 SPG3A,6b9f,0,GDP,B,506,ALF,B,507,MG,ARG,NONE,NH2.,NONE,NONE,B,78,O3A,9.275437,F1,9.620522,B,77,O1A,4.430065,F4,4.947354,O1A,2.835662,F4,2.957734,O3A,4.683665,F1,2.908955,B177,O3A,7.003891,F3,7.95662,GLU_B119..F1(3.91); THR_B120..F2(3.59); GLY_B1...,,B,ARG,77,F1,2.910028,B80,2.570031,1.9,x-ray diffraction,Human ATL1 mutant - F151S bound to GDPAlF4-,2017-10-10,,2.148698,ASP_B146,2.667157,4.004481,True,True,MLL,Ok
3663,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,A,1001,ALF,A,1002,MG,NONE,NONE,NONE.,NONE,NONE,A,240,O3A,6.848152,F3,8.733605,A,71,O1A,7.048779,F4,10.52467,O1A,7.509537,F1,11.5825,O1A,5.268142,F4,9.685854,B217,O3A,9.367575,F3,7.095841,VAL_A76..F1(2.59); THR_A77..F1(2.98); GLY_A173...,,A,SER,53,F3,2.8908,A56,2.572303,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.087833,ASP_A170,2.357399,3.649189,True,False,VLT,Ok
3664,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,B,1001,ALF,B,1002,MG,NONE,NONE,NONE.,NONE,NONE,B,240,O3A,5.68091,F3,8.498302,B,71,O2A,6.883111,F1,11.194972,O2A,8.333436,F1,12.124759,O2A,6.16356,F4,10.001845,A217,O3A,9.179136,F3,7.005194,THR_B77..F1(3.01); VAL_B76..F1(2.72); GLY_B173...,,B,SER,53,F3,2.925606,B56,2.678724,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.138869,ASP_B170,2.477235,3.904188,True,False,VLT,Ok
3665,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,C,1001,ALF,C,1002,MG,NONE,NONE,NONE.,NONE,NONE,C,240,O3A,7.143929,F3,8.903456,C,71,O1A,7.163232,F4,10.982876,O1A,8.035327,F1,11.978857,O1A,5.742301,F1,10.138892,D217,O3A,9.486238,F3,7.20348,THR_C77..F1(3.10); VAL_C76..F1(2.58); GLY_C173...,,C,SER,53,F3,3.113955,C56,2.577422,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.35699,ASP_C170,2.457356,4.060771,True,False,VLT,Ok
3666,TRAFAC,PF00350,Dynamin_N,Dynamin family,G0SFF0,G0SFF0_CHATD,Putative sorting protein,CTHT_0061810,6djq,0,GDP,D,1001,ALF,D,1002,MG,NONE,NONE,NONE.,NONE,NONE,D,240,O3A,6.579934,F3,8.465226,D,71,O2A,6.032832,F1,10.871549,O2A,7.717821,F1,11.878605,O2A,5.749921,F4,9.888748,C217,O3A,9.348441,F3,7.189288,VAL_D76..F1(2.74); THR_D77..F1(2.96); GLY_D173...,,D,SER,53,F3,2.800873,D56,2.705607,3.1,x-ray diffraction,Vps1 GTPase-BSE fusion complexed with GDP.AlF4-,2018-05-25,,2.096805,ASP_D170,2.442405,3.636164,True,False,VLT,Ok


In [155]:
from pyploop.table_features import format_for_excel


In [156]:
df_dist_formatted_for_excel=format_for_excel(dists_selected)
df_dist_formatted_for_excel.to_csv(os.path.join(ploop_wdir,f"{year}_{spec}_dists_FOR_XLSX.tsv"), sep="\t")
df_dist_formatted_for_excel.head(15)

Site_Id,superfamily,pfam_acc,pfam_domain,Uniprot_Id,Protein_name_up,PDBID,resolution,method,nucleotide_type,nucleotide_id,model,mg_in_site_4a_from_beta,AG-site,G-site,lys_id,LYS-site,nz-alpha-atom,nz-alpha-dist,nz-gamma-atom,nz-gamma-dist,arg_id,TYPE_ARG,nh1-alpha-atom,nh1-alpha-dist,nh1-gamma-atom,nh1-gamma-dist,nh2-alpha-atom,nh2-alpha-dist,nh2-gamma-atom,nh2-gamma-dist,ne-alpha-atom,ne-alpha-dist,ne-gamma-atom,ne-gamma-dist,asn_ID,asn_TYPE,asn_ne2-alpha-atom,asn_ne2-alpha-dist,asn_ne2-gamma-atom,asn_ne2-gamma-dist,Surr_N_BB,Surr_N_SCh,gly13,nuc-to-g13-atom,dist-gly13,lys-ploop-info,ploopk-dist,SerK+1-Mg,WB-Asp/Glu,WBD-SerK+1_dist,WBD-Mg,is_hydro,preceding_res,water_present
0,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br1,3.5,x-ray diffraction,ADP-ALF,A998,0,MG,ASN,NONE,A250,NONE,O1A,8.639585,F4,7.628103,A247,NONE.,O2A,10.024054,F3,5.33214,O2A,11.278693,F3,6.160625,O2A,10.615997,F3,6.615,A242,AG_weak,O2A,2.916646,F2,3.817497,GLY_A468..F1(2.55); SER_A246..F2(2.70),,GLY A180,F1,3.655413,A183,2.780986,2.159976,ASP_A465,3.946227,3.889449,True,LIG,True
1,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br1,3.5,x-ray diffraction,ADP-ALF,C998,0,MG,ASN,NONE,C250,NONE,O1A,8.718588,F4,7.675519,C247,NONE.,O2A,9.955953,F3,5.308627,O2A,11.298747,F3,6.233797,O2A,10.617236,F3,6.68114,C242,AG_weak,O2A,2.969201,F2,3.955435,GLY_C468..F1(2.46); SER_C246..F2(2.75),,GLY C180,F1,3.738719,C183,2.751563,2.166397,ASP_C465,3.922912,3.763648,True,LIG,True
2,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br1,3.5,x-ray diffraction,ADP-ALF,E998,0,MG,ASN,NONE,E250,NONE,O1A,8.767497,F4,7.666852,E247,NONE.,O2A,10.116271,F3,5.496902,O2A,11.47034,F3,6.434052,O2A,10.82166,F3,6.89429,E242,AG_weak,O2A,3.156481,F2,3.965111,GLY_E468..F1(2.48); SER_E179..F1(3.99); SER_E2...,,GLY E180,F1,3.665947,E183,2.710544,2.076092,ASP_E465,3.938586,3.718373,True,LIG,True
3,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br1,3.5,x-ray diffraction,ADP-ALF,G998,0,MG,ASN,NONE,G250,NONE,O1A,8.785262,F4,7.654045,G247,NONE.,O2A,10.132792,F3,5.477052,O2A,11.498631,F3,6.433871,O2A,10.795943,F3,6.833364,G242,AG_weak,O2A,3.129764,F2,3.859841,GLY_G468..F1(2.53); SER_G246..F2(2.78),,GLY G180,F1,3.664732,G183,2.774285,2.066263,ASP_G465,3.922849,3.771162,True,LIG,True
4,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,A998,0,MG,ASN,NONE,A250,NONE,O1A,8.386499,F4,7.087786,A247,NONE.,O3A,9.803608,F3,4.609398,O3A,12.066981,F3,6.799509,O2A,11.023004,F3,6.514483,A242,AG_weak,O2A,2.941894,F2,3.716788,GLY_A468..F1(2.54); SER_A246..F2(2.58),,GLY A180,F3,3.980544,A183,2.752079,2.013697,ASP_A465,2.795241,3.434139,True,LIG,True
5,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,B998,0,MG,ASN,NONE,B250,NONE,O1A,8.40833,F4,7.23611,B247,NONE.,O3A,9.711628,F3,4.545388,O3A,11.985205,F3,6.734874,O2A,10.912102,F3,6.481107,B242,AG_weak,O2A,2.807305,F2,3.755559,GLY_B468..F1(2.43); SER_B246..F2(2.62),,GLY B180,F3,4.008591,B183,2.634789,2.177655,ASP_B465,2.802485,3.405205,True,LIG,True
6,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,C998,0,MG,ASN,NONE,C250,NONE,O1A,8.46063,F4,7.15581,C247,NONE.,O3A,9.787507,F3,4.580737,O3A,12.043881,F3,6.769961,O2A,11.005755,F3,6.470978,C242,AG_weak,O2A,2.85783,F2,3.6093,GLY_C468..F1(2.62); SER_C246..F2(2.55),,GLY C180,F3,3.942289,C183,2.83058,2.067558,ASP_C465,2.815989,3.559766,True,LIG,True
7,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,D998,0,MG,ASN,NONE,D250,NONE,O1A,8.38424,F4,7.226902,D247,NONE.,O3A,9.689243,F3,4.517084,O3A,11.947335,F3,6.694661,O2A,10.901466,F3,6.465443,D242,AG_weak,O2A,2.823901,F2,3.815986,GLY_D468..F1(2.40); SER_D246..F2(2.62),,GLY D180,F3,4.026449,D183,2.592428,2.151943,ASP_D465,2.802868,3.383194,True,LIG,True
8,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,E998,0,MG,ASN,NONE,E250,NONE,O1A,8.411736,F4,7.183222,E247,NONE.,O2A,9.792457,F3,4.501697,O2A,12.063276,F3,6.689701,O2A,10.941878,F3,6.426057,E242,AG_weak,O2A,2.87551,F2,3.830085,GLY_E468..F1(2.46); SER_E246..F2(2.57),,GLY E180,F3,4.025989,E183,2.768176,2.075293,ASP_E465,2.807995,3.319242,True,LIG,True
9,TRAFAC,PF00063,Myosin_head,MYH11_CHICK,Myosin-11,1br2,2.9,x-ray diffraction,ADP-ALF,F998,0,MG,ASN,NONE,F250,NONE,O1A,8.376718,F4,7.21244,F247,NONE.,O2A,9.781414,F3,4.509281,O2A,12.049294,F3,6.697526,O2A,10.925579,F3,6.439191,F242,AG_weak,O2A,2.883935,F2,3.827353,GLY_F468..F1(2.47); SER_F246..F2(2.57),,GLY F180,F3,4.008728,F183,2.715567,2.071716,ASP_F465,2.809554,3.334862,True,LIG,True


In [179]:
pdbs_ploop

Unnamed: 0,PDB_ID,CHAIN_ID,2,PFAM_Name,PFAM_ACC,Accesion,Position,pdb_id
367838,5HCI,A,,ATP_bind_1,PF03029,P47122,8-258,5hci
367839,5HCI,D,,ATP_bind_1,PF03029,P47122,8-258,5hci
367840,5HCI,E,,ATP_bind_1,PF03029,P47122,8-258,5hci
367841,5HCI,F,,ATP_bind_1,PF03029,P47122,8-258,5hci
367842,5HCI,C,,ATP_bind_1,PF03029,P47122,8-258,5hci
...,...,...,...,...,...,...,...,...
388775,4ETP,B,,Microtub_bd,PF16796,Q12045,354-494,4etp
388776,4GKQ,A,,Microtub_bd,PF16796,Q6FSG8,303-440,4gkq
388777,4GKQ,B,,Microtub_bd,PF16796,Q6FSG8,311-440,4gkq
388778,4GKP,A,,Microtub_bd,PF16796,Q6FSG8,325-440,4gkp
