# Working with PDBePISA interface lists/reports in Jupyter Basics

Usually you'll want to get some data from PDBePISA and analyze it. For the first examples in this series of notebooks, I'll cover how to bring in a file listing interface details for a macromolecular complex and then progress through using that in combination with Python to analyze the results and ultimately compare the results to a different structure.  
The first few notebooks are introductory and targeted at biologists who are familiar with structure files but may not be used to using Python and Jupyter notebooks for convenient & reproducible analysis.

This particular notebook largely parallels the first notebook you get when you launch a session from [pdbsum-binder](https://github.com/fomightez/pdbsum-binder). That notebook also deals with data from complexes, so you may be interested in it for the scientific content as well as for the fact that it uses similar data via a script to take advantage of Jupyter/Python/Pandas.

If you are familiar with using Python scripts in Jupyter & want to quickly get an overview of what the script `pisa_interface_list_to_df.py` demonstrated in the next few notebooks can do, see the notebook [Single notebook converting a variety of formats of interface lists/reports to test handling by pisa_interface_list_to_df.py](notebooks/tests_of_pisa_interface_list_to_df.py.ipynb) as it is along the lines of a 'quick-start' for the script.

-----

<div class="alert alert-block alert-warning">
<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>

<p>
    Some tips:
    <ul>
        <li>Code cells have boxes around them.</li>
        <li>To run a code cell, click on the cell and either click the <i class="fa-play fa"></i> button on the toolbar above or hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook. Selecting <b>Cell</b> > <b>Run All</b> from the menu above the toolbar is a shortcut to attempt running all the cells in the notebook.</li>
        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.</li>
        <li>In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.</li>
        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>
    </ul>
</p>
</div>

----

## Demonstrating the script to make a dataframe from PDBePISA interface lists/reports

When you go to https://www.ebi.ac.uk/pdbe/pisa/pistart.html, press the `Launch PDBePISA` button, enter your favorite PDB identifier code, and then press the 'Interfaces' button, you'll get an interface report/list in the form of a table with the information for the structure of the complex linked to that code. The interface report/list you see looks great; however, that table isn't set up for easy use in downstream analysis.  
The script that will be demonstrated here overcomes that issue and provides you with a Pandas dataframe that represents the data you'd see at the site. Pandas dataframes are computational objects you can easily use in subsequent analysis or save into forms you can easily bring into Excel.

If you haven't encountered Pandas dataframes before, I suggest you see the first two notebooks that come up when you launch a session from my [blast-binder](https://github.com/fomightez/blast-binder) site. Those first two notebooks cover some ways of using a dataframe containing BLAST results.


### Preparation: What is needed to use the script?

Three things are needed:

- An environment where you can run the script.

- The script itself.

- A PDB identifier code for an entry at the Protein Data Bank.

These demonstration notebooks provide the first two requirements right in your browser.  

So if you got here by clicking `launch binder` at some point recently, **the only thing you really need is the PDB identifier code for the complex of interest to you.** 

Once you see how use of the script works, you can change the demonstration PDB identifier codes used here to get the relevant information for your favorite proteins.

You'll want to download any useful information you generate to your local computer, as this session provided via MyBinder.org is temporary. The demonstration will cover doing that once an example is generated. 

Subsequent Jupyter notebooks in this series will cover using this with a lot of PDB identifier codes to scale up analysis of a lot of complexes.

Let's finish the preparation and then begin with an example.

### Preparation: Fetch the script.

The script is stored on GitHub, and running the next cell will bring a copy of it into the working directory here. (It is not included in the repository where this launches from, to ensure you always get the most current version, which is assumed to be the best available at the time.)

In [1]:
# Get a file if not yet retrieved / check if file exists
import os
file_needed = "pisa_interface_list_to_df.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbepisa-utilities/{file_needed}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 42992  100 42992    0     0   228k      0 --:--:-- --:--:-- --:--:--  228k
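
For readers who would rather stay in Python than shell out to `curl`, the same "fetch only if missing" idea can be sketched with the standard library. The helper name `download_if_missing` is hypothetical (it is not part of the script or the notebook); only the URL comes from the cell above.

```python
import os
import urllib.request

# URL taken from the curl command in the cell above.
SCRIPT_URL = (
    "https://raw.githubusercontent.com/fomightez/structurework/"
    "master/pdbepisa-utilities/pisa_interface_list_to_df.py"
)


def download_if_missing(url, dest_dir="."):
    """Download `url` into `dest_dir` unless that file is already there.

    Hypothetical helper: returns True if a download happened,
    False if the file already existed and nothing was fetched.
    """
    file_name = url.rsplit("/", 1)[-1]
    dest_path = os.path.join(dest_dir, file_name)
    if os.path.isfile(dest_path):
        return False
    urllib.request.urlretrieve(url, dest_path)
    return True


# Usage (needs network access, so it is left commented out here):
# download_if_missing(SCRIPT_URL)
```

Either route should leave `pisa_interface_list_to_df.py` in the working directory; the `curl` cell above is what the rest of this notebook assumes.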


### Using the script: Using the script as you would on the command line.

First, I'll cover an example of using the script much like you would on the command line in a terminal interface on your computer. If that doesn't yet mean much, that's okay because an interface much like that is provided right here.

If you've run the cells above, we have the script now. To process the structure, run the next command, where we use Python to run the script `pisa_interface_list_to_df.py` and tell it we want it to process the information for the structure that corresponds to the PDB identifier code [4fgf](https://www.rcsb.org/structure/4FGF) by providing that as text after calling the script with `%run` in front of it.

In [2]:
%run pisa_interface_list_to_df.py 4fgf

Note that it gives you feedback at the end that says:

```text
RESULTING DATAFRAME is stored as ==> 
'4fgf_PISAinterface_summary_pickled_df.pkl'
```

Where is the result? 

We'll get to that soon, but I wanted to point out a few things first.

Note that the script doesn't care about the case of the PDB code. You could have used the following and it would work:

```shell
%run pisa_interface_list_to_df.py 4FGF
```

I prefer lower case because PDB codes written that way are often easier to read. For example, `4FG0` and `4FGO` look very similar when written in some contexts, whereas with `4fg0` and `4fgo` the difference is easier to discern.
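
If you collect PDB codes from several sources, it can help to normalize them before use. The following is only a sketch: `normalize_pdb_code` is a hypothetical helper, not something the script provides, and it assumes the classic four-character identifier format (a digit followed by three letters or digits).

```python
def normalize_pdb_code(code):
    """Return a 4-character PDB code in lower case, or raise ValueError.

    Hypothetical helper; assumes the classic format of a digit
    followed by three alphanumeric characters.
    """
    cleaned = code.strip().lower()
    if len(cleaned) != 4 or not cleaned[0].isdigit() or not cleaned.isalnum():
        raise ValueError(f"{code!r} does not look like a 4-character PDB code")
    return cleaned


print(normalize_pdb_code("4FGF"))  # prints 4fgf
```

Since the script itself accepts either case, this is purely a convenience for keeping file names and notes consistent.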

Part of the script running is retrieving the information corresponding to the provided PDB identification code. There is a way discussed below to actually feed the script the text that you may have already copied from the PISA interface report/list table. While that may not be necessary for most researchers interested in this script, it may be of interest if you need to use this offline and already have the text.

*What about the result?*

The script saves the dataframe produced in a binary, compact 'pickled' format that is portable and that Python can recognize and read. We'll read that form of it back in to view it here. (Below, we'll demonstrate using the main function right in the notebook, and then we'll see we can skip this step if we prefer.)

We'll use a Python package called [Pandas](https://pandas.pydata.org/) that is for Python data analysis. Much of what it does is bring the concept of dataframes or panel data into Python. Anyone who has used the language R will know in that language dataframes are part of the core list of data types. Python is a broad language and doesn't have that data type built in because a lot of Python users don't need it. Pandas is already installed in this environment, and so we can use it here to bring in use of this data type in the Python available here.  
Specifically, the import statement at the start of the cell below does that.    
Once Pandas is imported as `pd` we use it to read in the 'pickled' dataframe and then display it.

In [3]:
import pandas as pd
df = pd.read_pickle("4fgf_PISAinterface_summary_pickled_df.pkl")
df

Unnamed: 0_level_0,Unnamed: 1_level_0,Chain 1,Chain 1,Chain 1,Chain 1,x,Chain 2,Chain 2,Chain 2,Chain 2,Chain 2,Chain 2,Interface,Interface,Interface,Interface,Interface,Interface,Interface
Unnamed: 0_level_1,row #,Chain label,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Unnamed: 6_level_1,Chain label,SymOp,SymID,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Area (Å$^2$),Solvation free energy gain,Solvation gain P-value,Hydrogen bonds,Salt Bridges,Disuflides,CSS
0,1,A,31,9,6453,x,A,"x-1,y,z",1_455,38,15,6453,340.7,-1.4,0.359,4,2,0,0.0
1,2,A,22,6,6453,x,A,"x,y-1,z+1",1_546,34,11,6453,257.9,2.2,0.685,1,3,0,0.0
2,3,A,22,8,6453,x,A,"x,y-1,z",1_545,17,6,6453,167.5,-0.7,0.438,2,0,0,0.0
3,4,A,19,4,6453,x,A,"x,y,z-1",1_554,17,5,6453,146.3,0.0,0.477,2,4,0,0.0
4,5,[BME]A:149,4,1,208,◊,A,"x,y,z",1_555,13,6,6453,107.4,-0.6,0.358,2,0,0,0.073
5,6,[SO4]A:147,5,1,185,f,A,"x,y,z",1_555,18,5,6453,101.4,-14.0,0.853,5,0,0,0.783
6,7,[BME]A:148,4,1,208,◊,A,"x,y,z",1_555,10,4,6453,72.4,-2.5,0.139,1,0,0,0.0
7,8,A,10,4,6453,f,[BME]A:148,"x,y-1,z",1_545,4,1,208,63.1,0.5,0.465,0,0,0,0.0
8,9,A,4,2,6453,x,A,"x-1,y,z+1",1_456,6,3,6453,53.9,1.4,0.794,1,2,0,0.0
9,10,[SO4]A:147,4,1,185,f,[BME]A:148,"x,y-1,z",1_545,4,1,208,37.7,-3.5,0.747,0,0,0,0.167


Jupyter has a viewing mode for dataframes built right in and that is why the dataframe renders nicely above.  
Jupyter also has a convenience feature that it will try to display any Python object that is on the last line of a cell. That's why we can just reference that dataframe that's been assigned on that last line of the cell.

You can see the resulting dataframe above. You'll need to scroll to the right to see it all unless you have a really wide screen. There are a lot of columns.
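
With that many columns, it helps to know that the two header rows correspond to two-level column labels. The sketch below builds a small stand-in dataframe by hand, using a few values from the table above, so it runs without the pickled file. The labels are copied from the display; that the real dataframe uses exactly this structure is an assumption based on how it renders.

```python
import pandas as pd

# Small stand-in with two-level column labels, mimicking the display above.
columns = pd.MultiIndex.from_tuples([
    ("Chain 1", "Chain label"),
    ("Chain 2", "Chain label"),
    ("Interface", "Area (Å$^2$)"),
    ("Interface", "Hydrogen bonds"),
])
rows = [
    ["A", "A", 340.7, 4],
    ["A", "A", 257.9, 1],
    ["[SO4]A:147", "A", 101.4, 5],
]
demo_df = pd.DataFrame(rows, columns=columns)

# All column labels, as (top level, second level) tuples.
print(demo_df.columns.tolist())

# One column is selected with a tuple naming both levels.
areas = demo_df[("Interface", "Area (Å$^2$)")]

# A whole group of columns is selected with the top-level label alone.
interface_block = demo_df["Interface"]

# Boolean filtering works as usual: rows with at least 4 hydrogen bonds.
many_hbonds = demo_df[demo_df[("Interface", "Hydrogen bonds")] >= 4]
print(many_hbonds)
```

The same selections should carry over to `df` if its columns are labeled as shown, which `df.columns.tolist()` will confirm.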

It probably looks a lot like what you'd see displayed as a table if you went to https://www.ebi.ac.uk/pdbe/pisa/pistart.html, pressed the `Launch PDBePISA` button, and brought up the interfaces page for the PDB identifier code [4fgf](https://www.rcsb.org/structure/4FGF). However, the two tables only look similar. The rendering of the dataframe hides that it is a full computational object. We'll explore this more in the second notebook in the series to show some of the power that having the data in a dataframe provides. First, though, I'll cover running the script some other ways.

#### What about on your machine without Jupyter (or IPython)?

**IMPORTANTLY:**  
On your own machine, outside of Jupyter (or IPython), you'd replace `%run` with `python` (or perhaps `python3`, depending on your Python installation) if you wanted to run the script on your typical command line. So using the script in a terminal would look something like the following after the prompt:

```shell
python pisa_interface_list_to_df.py 4fgf
```

That is telling your system to use Python to run the script and the rest is the name of the script and then the PDB code to pass to the script. Substitute the PDB code of your favorite complex.

Note: you'd have to have the script placed in that working directory. If it's not already there, you may be able to use the following command to get it:

```shell
curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbepisa-utilities/pisa_interface_list_to_df.py
```

If that fails, try the following to use `wget`:

```shell
wget https://raw.githubusercontent.com/fomightez/structurework/master/pdbepisa-utilities/pisa_interface_list_to_df.py
```

Both `wget` and `curl` do pretty much the same thing; however, some machines only have one or the other installed.

*OPTIONAL*: Jupyter actually has a terminal as part of it, so if you wanted you could use `python pisa_interface_list_to_df.py 3gcb` to run the script in this active session. You can get the terminal by pressing the Jupyter icon in the upper left side and then opening a terminal from the page that comes up. You'd need to use `cd` to change into the `notebooks` directory where this notebook is currently being run. 

**That covers the basics of using the script.**  
If you are interested in what you can do with the dataframe, continue on to the next notebook in the series, [Making dataframes derived from PDBePISA interface lists/reports clearer by adding protein names](working_with_dataframes_and_making_clearer.ipynb), which includes some light coverage of how to work with the dataframe along with making it better by replacing chain designations with clear names. Before that, though, other ways to run the script to produce dataframes are covered here, including one that uses local text copied from the table. Additionally, besides showing how the script can use copied text, the end of this notebook provides a sense of what is happening in the script by featuring the route to fetching the data corresponding to the provided PDB code. Feel free to skip the sections below if you are already happy with what has been shown and want to work with the produced dataframe.

----

### Using the script: Using the main function imported into Python

The script was written so the main function can be imported into Python and then used by providing the PDB identifier code in a call to the function.

This is going to rely on approaches very similar to those illustrated [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/PatMatch%20with%20more%20Python.ipynb#Passing-results-data-into-active-memory-without-a-file-intermediate) and [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/Sending%20PatMatch%20output%20directly%20to%20Python.ipynb##Running-Patmatch-and-passing-the-results-to-Python-without-creating-an-output-file-intermediate).

This Jupyter notebook session is an active Python session and so this can be demonstrated here.

Running the next line will import the main function of the script.

In [4]:
from pisa_interface_list_to_df import pisa_interface_list_to_df

Since it is imported into the current notebook namespace, we can use the main function here and assign the output to a variable and display the result without needing to read in the file intermediate.

In [5]:
dfb = pisa_interface_list_to_df('1trn')
dfb

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Chain 1,Chain 1,Chain 1,Chain 1,x,Chain 2,Chain 2,Chain 2,Chain 2,Chain 2,Chain 2,Interface,Interface,Interface,Interface,Interface,Interface,Interface
Unnamed: 0_level_1,Id,row #,Chain label,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Unnamed: 7_level_1,Chain label,SymOp,SymID,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Area (Å$^2$),Solvation free energy gain,Solvation gain P-value,Hydrogen bonds,Salt Bridges,Disuflides,CSS
0,1.0,1.0,A,68.0,19.0,9795.0,x,A,"-y,x,z",3_555,81.0,20.0,9795,726.0,-3.3,0.416,8,1,0,0.0
1,,2.0,B,79.0,19.0,9608.0,x,B,"-y+1,x,z",3_655,65.0,20.0,9608,698.3,-2.2,0.553,10,2,0,0.0
2,,,,,,,,,,,,,**_Average:_**,712.1,-2.7,0.484,9,2,0,0.0
3,2.0,3.0,B,26.0,11.0,9608.0,◊,A,"x,y,z",1_555,29.0,13.0,9795,238.1,-1.8,0.399,2,0,0,0.0
4,3.0,4.0,A,27.0,12.0,9795.0,◊,B,"x,y,z-1",1_554,25.0,8.0,9608,216.4,-3.3,0.13,3,0,0,0.0
5,4.0,5.0,B,26.0,9.0,9608.0,◊,A,"-y+1,x,z+1",3_656,19.0,6.0,9795,185.2,-0.8,0.505,3,1,0,0.0
6,5.0,6.0,[ISP]A:301,7.0,1.0,264.0,cf,A,"x,y,z",1_555,30.0,14.0,9795,173.7,1.5,0.627,3,0,0,0.1
7,,7.0,[ISP]B:301,7.0,1.0,265.0,cf,B,"x,y,z",1_555,29.0,13.0,9608,163.6,1.1,0.586,3,0,0,0.1
8,,,,,,,,,,,,,**_Average:_**,168.6,1.3,0.606,3,0,0,0.1
9,6.0,8.0,B,14.0,8.0,9608.0,◊,A,"-y+1,x,z",3_655,10.0,6.0,9795,103.1,2.1,0.878,2,2,0,0.0


Note the file intermediate still gets made so that it can be saved and the stored dataframe read in and used elsewhere without needing to run the script again from the very start. **Download and keep that pickled dataframe since you'll find it convenient for reading and getting back into an analysis without needing to rerun earlier steps.**
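
As a minimal sketch of keeping results, the following saves a small stand-in dataframe both as a pickle (for getting back into Python) and as CSV (which Excel can open), then reads the pickle back. The stand-in data, the file names, and the temporary directory are just for illustration; in the notebook you would use `dfb` and write into the working directory.

```python
import os
import tempfile

import pandas as pd

# Stand-in data so the sketch runs on its own; use your real dataframe instead.
demo_df = pd.DataFrame(
    {"Chain 1": ["A", "B"], "Chain 2": ["A", "B"], "Area": [726.0, 698.3]}
)

out_dir = tempfile.mkdtemp()  # in the notebook, "." would be the working directory
pkl_path = os.path.join(out_dir, "demo_pickled_df.pkl")
csv_path = os.path.join(out_dir, "demo.csv")

demo_df.to_pickle(pkl_path)            # compact form that Python reads back directly
demo_df.to_csv(csv_path, index=False)  # plain text form that spreadsheets can open

restored = pd.read_pickle(pkl_path)
print(restored.equals(demo_df))
```

Files written to the working directory should then show up in the Jupyter file browser, from which they can usually be downloaded to your own machine.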

Beyond eliminating the need to read in the intermediate file, using the function allows additional options to be set. 

Most people won't need these. The PDB code is required; the following optional arguments (shown with their defaults) can also be passed when calling `pisa_interface_list_to_df()`:

- `return_df` (`True`): provide `return_df = False` to not return the dataframe.
- `pickle_df` (`True`): provide `pickle_df = False` to not save the pickled dataframe file.
- `return_pdb_code` (`False`): provide `return_pdb_code = True` to also return the PDB code as a string along with anything else returned by the function.
- `adv_debugging` (`False`): provide `adv_debugging = True` to get an expanded traceback that is better for debugging during development since it reports local variables in the traceback.

As an example, here is a call to get the PDB code back in addition to the dataframe:

```python
pdb_code, df = pisa_interface_list_to_df('1trn', return_pdb_code = True)
```

That covers using the core function of the script to produce a dataframe from a PDB code. This may be how you prefer to use the script, or you may prefer the `%run` route shown earlier; both options exist.
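
To illustrate how the keyword arguments described above might be used in a loop, here is a sketch that takes the converting function as a parameter. `collect_dataframes` and `fake_converter` are hypothetical names; the stand-in exists only so the example runs without network access. In the notebook you would pass the imported `pisa_interface_list_to_df` instead.

```python
def collect_dataframes(pdb_codes, converter):
    """Call `converter` once per PDB code and return {code: result}.

    `converter` is expected to accept a PDB code plus the `return_df`
    and `pickle_df` keyword arguments described above.
    """
    results = {}
    for code in pdb_codes:
        # pickle_df=False skips writing the .pkl file for each code.
        results[code] = converter(code, return_df=True, pickle_df=False)
    return results


# Stand-in used only so this sketch runs on its own.
def fake_converter(pdb_code, return_df=True, pickle_df=True):
    return f"dataframe for {pdb_code}" if return_df else None


print(collect_dataframes(["4fgf", "1trn"], fake_converter))
```

With the real function, something like `collect_dataframes(["4fgf", "1trn"], pisa_interface_list_to_df)` would contact PDBePISA once per code, so keep such lists modest.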

------

## Using local data from PDBePISA that you obtained from copying text off a page to make a dataframe with the script

The script should also be usable to convert text hand-copied from most interface reports/lists of interactions at PDBePISA, as it was originally developed, in a nascent form, to do that.

This section will illustrate an example of that.

For this example, I got the PISA interface list by hand-copying it from the 'Interfaces' page I got when analysing a PDB entry at https://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver , which you get to by pressing `Launch PDBePISA` at https://www.ebi.ac.uk/pdbe/pisa/pistart.html .

After launch, I entered `6agb` and brought up the interface report by pressing 'Interfaces' near the bottom. I then used the mouse to highlight the table and copied the text.  
I then pasted that between the sets of triple quotes below, so that running that cell assigns the string to the variable `s`.

In [6]:
s =''' ## 	 Structure 1 	 × 	 Structure 2 	 interface 
 area, Å2 	 ΔiG 
 kcal/mol 	 ΔiG 
 P-value 	 NHB 	 NSB 	 NDS 	 CSS 
 NN 	 «» 	 Range 	 iNat 	 iNres 	 Surface Å2 	 Range 	 iNat 	 iNres 	 Surface Å2 
 1 		B	 726 	 187 	 45544 	 ◊ 	A	 880 	 117 	 67556 	  7353.3 	  -72.7 	 0.996 	 95 	 0 	 0 	 0.000 
 2 		G	 238 	 65 	 9059 	 ◊ 	A	 266 	 32 	 67556 	  2325.0 	  -24.6 	 0.739 	 23 	 0 	 0 	 0.000 
 3 		D	 244 	 65 	 17023 	 ◊ 	J	 222 	 56 	 16869 	  2290.1 	  -20.0 	 0.174 	 18 	 3 	 0 	 0.000 
 4 		C	 207 	 52 	 12572 	 ◊ 	K	 222 	 58 	 10410 	  2114.0 	  -25.4 	 0.157 	 15 	 0 	 0 	 0.000 
 5 		D	 173 	 48 	 17023 	 ◊ 	A	 231 	 40 	 67556 	  1691.5 	  -18.9 	 0.920 	 24 	 0 	 0 	 0.000 
 6 		D	 154 	 39 	 17023 	 ◊ 	K	 151 	 34 	 10410 	  1475.6 	  -13.5 	 0.211 	 13 	 2 	 0 	 0.000 
 7 		E	 123 	 34 	 9604 	 ◊ 	I	 129 	 30 	 13215 	  1257.1 	  -17.9 	 0.058 	 7 	 0 	 0 	 0.000 
 8 		B	 130 	 39 	 45544 	 ◊ 	G	 126 	 33 	 9059 	  1234.7 	  -10.2 	 0.217 	 10 	 8 	 0 	 0.000 
 9 		E	 106 	 30 	 9604 	 ◊ 	A	 138 	 22 	 67556 	  1186.0 	  -9.8 	 0.916 	 23 	 0 	 0 	 0.000 
 10 		F	 119 	 32 	 9964 	 ◊ 	A	 134 	 23 	 67556 	  1120.8 	  -10.9 	 0.755 	 14 	 0 	 0 	 0.000 
 11 		B	 88 	 21 	 45544 	 ◊ 	E	 123 	 33 	 9604 	  1013.5 	  -16.5 	 0.026 	 7 	 1 	 0 	 0.000 
 12 		B	 119 	 30 	 45544 	 ◊ 	D	 84 	 18 	 17023 	  917.6 	  -11.5 	 0.106 	 4 	 0 	 0 	 0.000 
 13 		H	 94 	 26 	 9292 	 ◊ 	J	 81 	 20 	 16869 	  852.1 	  -7.2 	 0.289 	 9 	 0 	 0 	 0.000 
 14 		H	 81 	 20 	 9292 	 ◊ 	I	 96 	 25 	 13215 	  841.1 	  -8.9 	 0.214 	 4 	 4 	 0 	 0.000 
 15 		F	 91 	 27 	 9964 	 ◊ 	G	 94 	 26 	 9059 	  835.7 	  -9.9 	 0.100 	 13 	 0 	 0 	 0.000 
 16 		K	 93 	 27 	 10410 	 ◊ 	A	 93 	 17 	 67556 	  775.2 	  -9.1 	 0.750 	 5 	 0 	 0 	 0.000 
 17 		E	 69 	 17 	 9604 	 ◊ 	J	 62 	 15 	 16869 	  619.6 	  0.1 	 0.662 	 8 	 4 	 0 	 0.000 
 18 		I	 56 	 17 	 13215 	 ◊ 	A	 54 	 10 	 67556 	  475.6 	  -1.9 	 0.778 	 6 	 0 	 0 	 0.000 
 19 		F	 28 	 9 	 9964 	 ◊ 	I	 33 	 14 	 13215 	  316.5 	  -2.0 	 0.361 	 2 	 2 	 0 	 0.000 
 20 		J	 30 	 9 	 16869 	 ◊ 	A	 34 	 12 	 67556 	  307.0 	  -4.7 	 0.449 	 7 	 0 	 0 	 0.000 
 21 		H	 30 	 7 	 9292 	 ◊ 	A	 41 	 9 	 67556 	  304.2 	  -6.2 	 0.537 	 6 	 0 	 0 	 0.000 
 22 		E	 35 	 12 	 9604 	 ◊ 	H	 35 	 10 	 9292 	  283.0 	  -4.2 	 0.275 	 4 	 0 	 0 	 0.000 
 23 		C	 22 	 6 	 12572 	 ◊ 	D	 26 	 7 	 17023 	  226.6 	  -3.6 	 0.280 	 3 	 0 	 0 	 0.000 
 24 		B	 22 	 6 	 45544 	 ◊ 	I	 25 	 7 	 13215 	  207.1 	  -3.1 	 0.245 	 1 	 0 	 0 	 0.000 
 25 		C	 17 	 8 	 12572 	 ◊ 	A	 28 	 5 	 67556 	  190.7 	  -2.2 	 0.677 	 1 	 0 	 0 	 0.000 
 26 		B	 19 	 6 	 45544 	 ◊ 	F	 19 	 4 	 9964 	  171.3 	  -0.8 	 0.428 	 3 	 0 	 0 	 0.000 
 27 		E	 18 	 5 	 9604 	 ◊ 	G	 20 	 6 	 9059 	  155.9 	  -1.8 	 0.279 	 5 	 0 	 0 	 0.000 
 28 		K	 2 	 2 	 10410 	 ◊ 	[ZN]K:201	 1 	 1 	 98 	  49.0 	  -39.1 	 0.000 	 0 	 0 	 0 	 0.000 
 29 		E	 1 	 1 	 9604 	 ◊ 	F	 1 	 1 	 9964 	  7.7 	  0.2 	 0.797 	 0 	 0 	 0 	 0.000'''

Next, we use the Jupyter magic command `%store` to save the string as a file.

In [7]:
%store s >"6agb_interface_list.txt"

Writing 's' (str) to file '6agb_interface_list.txt'.
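
Outside Jupyter, where `%store` is not available, plain Python can write the string to a file. This sketch uses a shortened stand-in for `s` and a temporary directory so it runs on its own; in practice you would write the full copied text to `6agb_interface_list.txt` in the working directory.

```python
import os
import tempfile

# Shortened stand-in for the copied text assigned to `s` above.
s = " 1 \t\tB\t 726 \t 187 \t 45544 \t ◊ \tA\t 880 \t 117 \t 67556 "

out_path = os.path.join(tempfile.mkdtemp(), "6agb_interface_list.txt")

# An explicit encoding keeps characters such as ◊ and Å intact.
with open(out_path, "w", encoding="utf-8") as handle:
    handle.write(s)

# Read it back to confirm nothing was lost.
with open(out_path, encoding="utf-8") as handle:
    round_trip = handle.read()
print(round_trip == s)
```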


We can see that the assigned content has been saved as a file by running the next cell to see the first few lines.

In [8]:
!head 6agb_interface_list.txt

 ## 	 Structure 1 	 × 	 Structure 2 	 interface 
 area, Å2 	 ΔiG 
 kcal/mol 	 ΔiG 
 P-value 	 NHB 	 NSB 	 NDS 	 CSS 
 NN 	 «» 	 Range 	 iNat 	 iNres 	 Surface Å2 	 Range 	 iNat 	 iNres 	 Surface Å2 
 1 		B	 726 	 187 	 45544 	 ◊ 	A	 880 	 117 	 67556 	  7353.3 	  -72.7 	 0.996 	 95 	 0 	 0 	 0.000 
 2 		G	 238 	 65 	 9059 	 ◊ 	A	 266 	 32 	 67556 	  2325.0 	  -24.6 	 0.739 	 23 	 0 	 0 	 0.000 
 3 		D	 244 	 65 	 17023 	 ◊ 	J	 222 	 56 	 16869 	  2290.1 	  -20.0 	 0.174 	 18 	 3 	 0 	 0.000 
 4 		C	 207 	 52 	 12572 	 ◊ 	K	 222 	 58 	 10410 	  2114.0 	  -25.4 	 0.157 	 15 	 0 	 0 	 0.000 
 5 		D	 173 	 48 	 17023 	 ◊ 	A	 231 	 40 	 67556 	  1691.5 	  -18.9 	 0.920 	 24 	 0 	 0 	 0.000 


Now if we call the script to run it with the PDB code 6agb, it will first check if there's a file `6agb_interface_list.txt` in the working directory and use that if there is.

In [9]:
%run pisa_interface_list_to_df.py 6agb
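
The "use a local file if one is present" behaviour described above amounts to a check along the following lines. This is only a sketch of the idea, not the script's actual code; `local_interface_list_path` is a hypothetical helper, and only the `<PDB code>_interface_list.txt` naming pattern is taken from the text.

```python
import os
import tempfile


def local_interface_list_path(pdb_code, directory="."):
    """Return the path of `<pdb_code>_interface_list.txt` if it exists, else None."""
    candidate = os.path.join(directory, f"{pdb_code}_interface_list.txt")
    return candidate if os.path.isfile(candidate) else None


work_dir = tempfile.mkdtemp()

# No file yet, so a fetch from PDBePISA would be needed.
print(local_interface_list_path("6agb", work_dir))

# After the text file is saved, the local copy is found instead.
open(os.path.join(work_dir, "6agb_interface_list.txt"), "w").close()
print(local_interface_list_path("6agb", work_dir))
```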

In [10]:
import pandas as pd
dfl = pd.read_pickle("6agb_PISAinterface_summary_pickled_df.pkl")

In [11]:
dfl

Unnamed: 0_level_0,Unnamed: 1_level_0,Chain 1,Chain 1,Chain 1,Chain 1,x,Chain 2,Chain 2,Chain 2,Chain 2,Interface,Interface,Interface,Interface,Interface,Interface,Interface
Unnamed: 0_level_1,row #,Chain label,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Unnamed: 6_level_1,Chain label,Number_InterfacingAtoms,Number_InterfacingResidues,Surface (Å$^2$),Area (Å$^2$),Solvation free energy gain,Solvation gain P-value,Hydrogen bonds,Salt Bridges,Disuflides,CSS
0,1,B,726,187,45544,◊,A,880,117,67556,7353.3,-72.7,0.996,95,0,0,0.0
1,2,G,238,65,9059,◊,A,266,32,67556,2325.0,-24.6,0.739,23,0,0,0.0
2,3,D,244,65,17023,◊,J,222,56,16869,2290.1,-20.0,0.174,18,3,0,0.0
3,4,C,207,52,12572,◊,K,222,58,10410,2114.0,-25.4,0.157,15,0,0,0.0
4,5,D,173,48,17023,◊,A,231,40,67556,1691.5,-18.9,0.92,24,0,0,0.0
5,6,D,154,39,17023,◊,K,151,34,10410,1475.6,-13.5,0.211,13,2,0,0.0
6,7,E,123,34,9604,◊,I,129,30,13215,1257.1,-17.9,0.058,7,0,0,0.0
7,8,B,130,39,45544,◊,G,126,33,9059,1234.7,-10.2,0.217,10,8,0,0.0
8,9,E,106,30,9604,◊,A,138,22,67556,1186.0,-9.8,0.916,23,0,0,0.0
9,10,F,119,32,9964,◊,A,134,23,67556,1120.8,-10.9,0.755,14,0,0,0.0


--------

The script `pisa_interface_list_to_df.py` demonstrated above will get the interface report/list of interactions from PDBePISA, if you don't already have it, and then generate a Pandas dataframe with that information. In the next few notebooks I expand on working with the dataframes that `pisa_interface_list_to_df.py` produces and cover scaling up use of it for many PDB codes. 

If first you want to see the script `pisa_interface_list_to_df.py` put through its paces, check out the notebook [Single notebook converting a variety of formats of interface lists/reports to test handling by pisa_interface_list_to_df.py](tests_of_pisa_interface_list_to_df.py.ipynb). It's meant as more of a 'quick-start' for those introduced to the script already or for those very familiar with running scripts in Python using Jupyter.

The next notebook in the series is, specifically, [Making dataframes derived from PDBePISA interface lists/reports clearer by adding protein names and filtering to nucleic acid chains](working_with_dataframes_and_making_clearer.ipynb). Feel free to skip on to that notebook now, as the final short section of this notebook is meant more for those looking to access PDBePISA data directly and/or those curious how the script demonstrated above did that.   
  


The script `pisa_interface_list_to_df.py` is capable of getting information from the PDBePISA interfaces pages and making a Pandas dataframe with that. However, you may just want a smaller piece of information, or you may seek information elsewhere on the interactions page and wonder how you can get it, given that the script is somehow accessing related data.  
Or maybe you are just wondering more about what is going on in the retrieval step of the script demonstrated above?  Then read on...

### Retrieving interface reports/ the list of interactions by hand

Say you are used to pressing `Launch PDBePISA` at [PDBePISA](https://www.ebi.ac.uk/pdbe/pisa/pistart.html) and then entering the PDB identifier code for the Protein Data Bank entry in which you are interested. That's great when you are interested in one or two. But when you want to scale up, you need to be able to do that programmatically.

PDBePISA offers three utilities helpful for accessing PDBePISA programmatically.

There's the [PDBe REST API - Programmatic access to PDBe data](https://www.ebi.ac.uk/pdbe/api/doc/pisa.html) for certain queries such as getting returned the "number of interfaces for a given pdbid/assemblyid."

There are also URLs that you can use to get XML of interfaces and descriptions of assemblies, and even "Coordinate (PDB-formatted) files of macromolecular assemblies:", as described [here](https://www.ebi.ac.uk/pdbe/pisa/pi_download.html) under 'Download PISA Data'.

Finally, on **yet another page** of the PDBePISA documentation about getting data from the service, they describe under a page entitled [Linking to PISA](https://www.ebi.ac.uk/pdbe/pisa/pi_link.html), how "PISA may run queries launched from any Web site. Simply make hyperlink with the following URL." They go on to show the link and how you can add a signaling token to specify the corresponding table to get and the PDB identifier:

```text
 Token 	 Description 
qi	PISA retrieves precalculated results for the given PDB code and displays the corresponding interface table
qs	PISA retrieves precalculated results for the given PDB code and displays the corresponding structure table
qa	PISA retrieves precalculated results for the given PDB code and displays the corresponding assembly table
```

Thus `http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/piserver?qi=1stm` will get the interface table of PDB entry 1stm.

**Specifically**, the `qi=1stm` at the end is the part coming from the tokens and PDB identifier.

This approach can be used to retrieve the HTML for the same page you'd get if you went to `Launch PDBePISA` at [PDBePISA](https://www.ebi.ac.uk/pdbe/pisa/pistart.html) and then entered the PDB identifier code 1stm.


Putting that into action in Jupyter (using command line magic, that is, prefacing Unix commands with an exclamation point) to fetch the interactions list for the example as a text file:

In [12]:
!curl -o 1stm.txt -L http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/piserver?qi=1stm

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 75207    0 75207    0     0   7689      0 --:--:--  0:00:09 --:--:-- 15471


That got the HTML and saved it as `1stm.txt`.
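
The same URL can be assembled, and the page fetched, from Python. In this sketch the server address and the `qi`/`qs`/`qa` tokens come from the text above, while the function names and the `TOKENS` mapping are hypothetical. The fetch function is defined but not called, since it needs network access.

```python
import urllib.request
from urllib.parse import urlencode

PISA_SERVER = "http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/piserver"

# Tokens from the 'Linking to PISA' table quoted above.
TOKENS = {"interface": "qi", "structure": "qs", "assembly": "qa"}


def pisa_table_url(pdb_code, table="interface"):
    """Build the URL for the chosen precalculated table of a PDB entry."""
    return f"{PISA_SERVER}?{urlencode({TOKENS[table]: pdb_code.lower()})}"


def fetch_pisa_html(pdb_code, table="interface"):
    """Return the page HTML as text (needs network access; not called here)."""
    with urllib.request.urlopen(pisa_table_url(pdb_code, table)) as response:
        return response.read().decode("utf-8", errors="replace")


print(pisa_table_url("1stm"))
# http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/piserver?qi=1stm
```

Calling `fetch_pisa_html("1stm")` in a session with network access should return roughly the same HTML that the `curl` command saved.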

Let's look at the top part of that. We'll show the first 20 lines of what was retrieved by running the next cell.

In [13]:
!head -20 1stm.txt


<!doctype html>
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ -->
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en"> <![endif]-->
<!-- Consider adding an manifest.appcache: h5bp.com/d/Offline -->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
    <head>
        <meta charset="utf-8">

        <!-- Use the .htaccess and remove these lines to avoid edge case issues.
             More info: h5bp.com/b/378 -->
        <!-- <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> --> <!-- Not yet implemented -->

        <title>PDBe &lt; PISA &lt; EMBL-EBI</title>
        <meta name="description" content="EMBL-EBI"><!-- Describe what this page is about -->
        <meta name="keywords" content="bioinformatics, europe, institute"><!-- A few keywords that relate to 

Using Python you can read and parse this HTML. That's what the script does when you give it a PDB code for data you want.  

In the end it takes the parsed table and formats it into a Pandas dataframe.
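
As a toy illustration of reading and parsing HTML with Python, the sketch below pulls the cell text out of a small hand-written table using only the standard library. The HTML here is made up for the example and is much simpler than the real PISA page, and this is not a claim about how the script does its parsing.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th") and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._current_row[-1] += data.strip()


# Made-up miniature table standing in for the real page.
toy_html = """
<table>
  <tr><th>NN</th><th>Structure 1</th><th>Structure 2</th><th>Area</th></tr>
  <tr><td>1</td><td>B</td><td>A</td><td>7353.3</td></tr>
  <tr><td>2</td><td>G</td><td>A</td><td>2325.0</td></tr>
</table>
"""

collector = TableCellCollector()
collector.feed(toy_html)
print(collector.rows)
```

Pandas also offers `pd.read_html()`, which can turn HTML tables into dataframes directly, provided an HTML parsing library such as lxml is installed.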

--------

Continue on with the next notebook in the series, [Making dataframes derived from PDBePISA interface lists/reports clearer by adding protein names and filtering to nucleic acid chains](working_with_dataframes_and_making_clearer.ipynb). In the next three notebooks, I cover accessing the dataframes' contents in useful ways, enhancing the dataframes with the names of the macromolecules in place of the character designations of the chains so that the interface table dataframe is more informative, and scaling up to make dataframes for a lot of PDB identifiers.  
Go to the index page and click through to notebooks after the next in the series if you prefer.

------
