# Python API tutorial

## Installation

### Getting `plinder`

Because some of its dependencies cannot be installed via `pip`, `plinder` is currently
not available on PyPI.
Instead, you can download the official
[_GitHub_ repository](https://github.com/plinder-org/plinder/),
for example via `git`:

```console
$ git clone https://github.com/plinder-org/plinder.git
```

### Creating the Conda environment

The most convenient way to install these extra dependencies is a _Conda_
environment.
If you do not have _Conda_ installed yet, we recommend installing it via
[miniforge](https://github.com/conda-forge/miniforge).
Afterwards, the environment can be created from the `environment.yml` file in your local
clone of the repository.

:::{note}
We currently only support Linux.
`plinder` uses `openstructure` for some of its functionality, which is available from
the `aivant` conda channel via `conda install aivant::openstructure`, but it is only
built for Linux.
Windows and macOS users should instead use the
[_Docker_](#docker-target) resources.
:::

```console
$ mamba env create -f environment.yml
$ mamba activate plinder
```

### Installing `plinder`

Now `plinder` can be installed into the created environment:

```console
$ pip install .
```

(docker-target)=
### Alternative: Using a Docker container

As an alternative to the _Conda_-based installation, we also publish `plinder` as a
_Docker_ container, which provides the highest level of compatibility with non-Linux
platforms.
The relevant _Docker_ resources in the repository are:

- `docker-compose.yml`: defines a `base` image, the `plinder` "app" and a `test`
  container
- `dockerfiles/base/`: contains the files for the `base` image
- `dockerfiles/main/`: contains the files for the `plinder` "app" image

### Configure dataset environment variables

Next, set environment variables that point to the release and iteration of your choice.
For demonstration purposes, we point them to a smaller tutorial dataset, i.e.
`PLINDER_RELEASE=2024-06` and `PLINDER_ITERATION=tutorial`.

:::{note}
The version used for the preprint is `PLINDER_RELEASE=2024-04` and
`PLINDER_ITERATION=v1`, while the current version with updated annotations, to be used
for the MLSB challenge, is `PLINDER_RELEASE=2024-06` and `PLINDER_ITERATION=v2`.
:::

```python
import os
from pathlib import Path

# Select the tutorial subset of the 2024-06 release
release = "2024-06"
iteration = "tutorial"
os.environ["PLINDER_RELEASE"] = release
os.environ["PLINDER_ITERATION"] = iteration
os.environ["PLINDER_REPO"] = str(Path.home() / "plinder-org/plinder")
os.environ["PLINDER_LOCAL_DIR"] = str(Path.home() / ".local/share/plinder")
os.environ["GCLOUD_PROJECT"] = "plinder"
version = f"{release}/{iteration}"
```

Alternatively, these variables can be set in the terminal via `export` (*UNIX*) or
`set` (*Windows*).

## Overview

The user-facing subpackage of `plinder` is {mod}`plinder.core`.
It provides access to the underlying utility functions for accessing the dataset,
splits and annotations, including the following top-level functions:

:::{currentmodule} plinder.core
:::

- {func}`get_config()`: access the *PLINDER* global configuration
- {func}`get_plindex()`: access the full annotation table
- {func}`get_manifest()`: map *PLINDER* systems to PDB IDs
- {func}`get_split()`: access the full split table

:::{currentmodule} plinder
:::

In addition, it provides access to the data class {class}`PlinderSystem` for
reconstituting a *PLINDER* system from its `system_id`.

To supplement these data, {mod}`plinder.core.scores` provides functionality for
querying metrics, such as protein/ligand similarity and cluster identity.

## Getting the configuration

First, we get the configuration to check that all parameters are set correctly.
The snippet below checks whether the local and remote *PLINDER* paths point to
the expected locations.

```python
import plinder.core.utils.config

cfg = plinder.core.get_config()
print(f"local cache directory: {cfg.data.plinder_dir}")
print(f"remote data directory: {cfg.data.plinder_remote}")
```

```text
local cache directory: /Users/yusuf/.local/share/plinder/2024-06/tutorial
remote data directory: gs://plinder/2024-06/tutorial
```
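
The paths printed above appear to follow the pattern `<PLINDER_LOCAL_DIR>/<release>/<iteration>` locally and `gs://plinder/<release>/<iteration>` remotely. If you want to check your settings programmatically, the following minimal sketch derives the expected locations from the environment variables. The helper `expected_plinder_paths` is hypothetical, and the layout is inferred from the output above rather than read from `plinder` itself; pass `os.environ` instead of the demo dictionary to check your own setup.

```python
from collections.abc import Mapping
from pathlib import Path


def expected_plinder_paths(env: Mapping[str, str]) -> tuple[Path, str]:
    """Hypothetical helper: derive the local and remote dataset locations.

    Assumes the layout seen in the tutorial output:
    <PLINDER_LOCAL_DIR>/<release>/<iteration> and gs://plinder/<release>/<iteration>.
    """
    required = ("PLINDER_RELEASE", "PLINDER_ITERATION", "PLINDER_LOCAL_DIR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"unset environment variables: {', '.join(missing)}")
    release, iteration = env["PLINDER_RELEASE"], env["PLINDER_ITERATION"]
    local_dir = Path(env["PLINDER_LOCAL_DIR"]) / release / iteration
    remote_dir = f"gs://plinder/{release}/{iteration}"
    return local_dir, remote_dir


# Demo values mirroring the tutorial settings; use os.environ for a real check
demo_env = {
    "PLINDER_RELEASE": "2024-06",
    "PLINDER_ITERATION": "tutorial",
    "PLINDER_LOCAL_DIR": "/home/user/.local/share/plinder",
}
local_dir, remote_dir = expected_plinder_paths(demo_env)
print(local_dir)   # /home/user/.local/share/plinder/2024-06/tutorial
print(remote_dir)  # gs://plinder/2024-06/tutorial
```

With the real configuration, you could then compare `local_dir` against `cfg.data.plinder_dir`.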


## Query annotations

:::{currentmodule} plinder.core
:::

### Full dataset

The annotation table is also called *PLINDER index*, or *PLINDEX* for short.
{func}`get_plindex()` loads the entire annotation table as a
[`pandas`](https://pandas.pydata.org) data frame.
A description of all columns is available in the
[Dataset Reference](#annotation-table-target).

```python
from plinder.core import get_plindex

annotation_df = get_plindex()
annotation_df.head()
```

```text
2024-08-26 12:05:49,299 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.14s
2024-08-26 12:05:49,436 | plinder.core.index.utils:49 | INFO : reading /Users/yusuf/.local/share/plinder/2024-06/tutorial/index/annotation_table.parquet
2024-08-26 12:06:01,231 | plinder.core.index.utils.get_plindex:24 | INFO : runtime succeeded: 12.92s
```

| | entry_pdb_id | entry_release_date | entry_oligomeric_state | entry_determination_method | entry_keywords | entry_pH | entry_resolution | entry_rfree | entry_r | entry_clashscore | ... | ligand_interacting_ligand_chains_UniProt | system_ligand_chains_PANTHER | ligand_interacting_ligand_chains_Pfam | ligand_neighboring_ligand_chains_Pfam | ligand_interacting_ligand_chains_PANTHER | ligand_neighboring_ligand_chains_PANTHER | system_ligand_chains_SCOP2 | system_ligand_chains_SCOP2B | pli_qcov__100__strong__component | protein_lddt_qcov_weighted_sum__100__strong__component |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3grt | 1997-02-12 | dimeric | X-RAY DIFFRACTION | OXIDOREDUCTASE | 8.0 | 2.5 | | 0.17 | 12.9 | ... | | | | | | | | | c243140 | c635 |
| 1 | 3grt | 1997-02-12 | dimeric | X-RAY DIFFRACTION | OXIDOREDUCTASE | 8.0 | 2.5 | | 0.17 | 12.9 | ... | | | | | | | | | c169758 | c635 |
| 2 | 3grt | 1997-02-12 | dimeric | X-RAY DIFFRACTION | OXIDOREDUCTASE | 8.0 | 2.5 | | 0.17 | 12.9 | ... | | | | | | | | | c242976 | c635 |
| 3 | 3grt | 1997-02-12 | dimeric | X-RAY DIFFRACTION | OXIDOREDUCTASE | 8.0 | 2.5 | | 0.17 | 12.9 | ... | | | | | | | | | c173553 | c635 |
| 4 | 1grx | 1993-10-01 | monomeric | SOLUTION NMR | ELECTRON TRANSPORT | | | | | | ... | | | | | | | | | c186761 | c167274 |


### Query specific columns 

:::{currentmodule} plinder.core.scores
:::

To query the annotation table for specific columns or filter it by specific criteria,
use {func}`query_index()`.
Called without any arguments, the function returns a table of `system_id` and
`entry_pdb_id`.
To select other columns, pass a list of
[column names](#annotation-table-target) as the `columns` argument.

```python
from plinder.core.scores import query_index

# Get the system_id and entry_pdb_id columns
query_index()
```

```text
2024-08-26 12:06:02,451 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.14s
```

| | system_id | entry_pdb_id |
|---|---|---|
| 0 | 3grt__1__1.A_2.A__1.B | 3grt |
| 1 | 3grt__1__1.A_2.A__1.C | 3grt |
| 2 | 3grt__1__1.A_2.A__2.B | 3grt |
| 3 | 3grt__1__1.A_2.A__2.C | 3grt |
| 4 | 1grx__1__1.A__1.B | 1grx |
| ... | ... | ... |
| 1357899 | 4lpn__1__10.A_24.A_3.A__24.X | 4lpn |
| 1357900 | 2lp3__1__1.A__1.C | 2lp3 |
| 1357901 | 2lp3__1__1.A__1.D | 2lp3 |
| 1357902 | 2lp3__1__1.B__1.E | 2lp3 |


```python
# Get specific columns from the annotation table
cols_of_interest = ["system_id", "entry_pdb_id", "entry_release_date",
                    "entry_oligomeric_state", "entry_clashscore", "entry_resolution"]
query_index(columns=cols_of_interest)
```

```text
2024-08-26 12:25:33,442 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.14s
```

| | system_id | entry_pdb_id | entry_release_date | entry_oligomeric_state | entry_clashscore | entry_resolution |
|---|---|---|---|---|---|---|
| 0 | 3grt__1__1.A_2.A__1.B | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 1 | 3grt__1__1.A_2.A__1.C | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 2 | 3grt__1__1.A_2.A__2.B | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 3 | 3grt__1__1.A_2.A__2.C | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 4 | 1grx__1__1.A__1.B | 1grx | 1993-10-01 | monomeric | | |
| ... | ... | ... | ... | ... | ... | ... |
| 1357899 | 4lpn__1__10.A_24.A_3.A__24.X | 4lpn | 2013-07-16 | 24-meric | 3.34 | 1.66 |
| 1357900 | 2lp3__1__1.A__1.C | 2lp3 | 2012-01-31 | dimeric | | |
| 1357901 | 2lp3__1__1.A__1.D | 2lp3 | 2012-01-31 | dimeric | | |
| 1357902 | 2lp3__1__1.B__1.E | 2lp3 | 2012-01-31 | dimeric | | |


### Query annotations with specific filters

You can also pass `filters`, where each filter is a `(column, operator, value)` tuple
that compares a column with a given value.
Only rows that fulfill all conditions are returned.
See the description of
[`pandas.read_parquet()`](https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html)
for more information on the filter syntax.
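
To make the "all conditions" semantics concrete, here is a minimal pure-Python sketch of how a list of `(column, operator, value)` filters combines with a logical AND. It is an illustration only, not how `query_index()` is implemented; the helper `apply_filters`, the set of supported operators and the choice to drop rows with missing values are assumptions of this sketch. The toy rows reuse values from the tables above.

```python
import operator
from typing import Any, Callable

# Operators supported by this sketch (a subset of the Parquet filter syntax)
_OPS: dict[str, Callable[[Any, Any], Any]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def apply_filters(rows: list[dict[str, Any]], filters: list[tuple[str, str, Any]]) -> list[dict[str, Any]]:
    """Keep the rows that satisfy every (column, operator, value) filter."""

    def matches(row: dict[str, Any]) -> bool:
        for column, op, value in filters:
            cell = row.get(column)
            # In this sketch, a missing value never satisfies a comparison
            if cell is None or not _OPS[op](cell, value):
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [
    {"system_id": "3grt__1__1.A_2.A__1.B", "entry_oligomeric_state": "dimeric", "entry_resolution": 2.50},
    {"system_id": "1grx__1__1.A__1.B", "entry_oligomeric_state": "monomeric", "entry_resolution": None},
    {"system_id": "4lpn__1__10.A_24.A_3.A__24.X", "entry_oligomeric_state": "24-meric", "entry_resolution": 1.66},
]
high_res = apply_filters(rows, [("entry_resolution", "<", 2.0), ("entry_oligomeric_state", "!=", "monomeric")])
print([row["system_id"] for row in high_res])  # ['4lpn__1__10.A_24.A_3.A__24.X']
```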

```python
# Query for single-ligand systems
filters = [("system_num_ligand_chains", "==", "1")]
query_index(columns=cols_of_interest, filters=filters)
```

```text
2024-08-26 12:26:49,582 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.15s
```

| | system_id | entry_pdb_id | entry_release_date | entry_oligomeric_state | entry_clashscore | entry_resolution |
|---|---|---|---|---|---|---|
| 0 | 3grt__1__1.A_2.A__1.B | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 1 | 3grt__1__1.A_2.A__1.C | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 2 | 3grt__1__1.A_2.A__2.B | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 3 | 3grt__1__1.A_2.A__2.C | 3grt | 1997-02-12 | dimeric | 12.90 | 2.50 |
| 4 | 1grx__1__1.A__1.B | 1grx | 1993-10-01 | monomeric | | |
| ... | ... | ... | ... | ... | ... | ... |
| 809504 | 4lpn__1__10.A_24.A_3.A__24.X | 4lpn | 2013-07-16 | 24-meric | 3.34 | 1.66 |
| 809505 | 2lp3__1__1.A__1.C | 2lp3 | 2012-01-31 | dimeric | | |
| 809506 | 2lp3__1__1.A__1.D | 2lp3 | 2012-01-31 | dimeric | | |
| 809507 | 2lp3__1__1.B__1.E | 2lp3 | 2012-01-31 | dimeric | | |


## Inspect manifest table

The manifest table maps each PLINDER system ID to its respective PDB entry.

```python
from plinder.core import get_manifest

get_manifest()
```

| | system_id | entry_pdb_id |
|---|---|---|
| 0 | 3grt__1__1.A_2.A__1.B | 3grt |
| 1 | 3grt__1__1.A_2.A__1.C | 3grt |
| 2 | 3grt__1__1.A_2.A__2.B | 3grt |
| 3 | 3grt__1__1.A_2.A__2.C | 3grt |
| 4 | 1grx__1__1.A__1.B | 1grx |
| ... | ... | ... |
| 1357899 | 4lpn__1__10.A_24.A_3.A__24.X | 4lpn |
| 1357900 | 2lp3__1__1.A__1.C | 2lp3 |
| 1357901 | 2lp3__1__1.A__1.D | 2lp3 |
| 1357902 | 2lp3__1__1.B__1.E | 2lp3 |
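
Because one PDB entry can yield several systems (`3grt` contributes four of the rows above), it is often useful to group the manifest by entry. The sketch below does this in plain Python on the first rows shown above; `systems_by_pdb` is a hypothetical helper. On the data frame itself, `get_manifest().groupby("entry_pdb_id")["system_id"].apply(list)` should give an equivalent grouping.

```python
from collections import defaultdict


def systems_by_pdb(manifest: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (system_id, entry_pdb_id) pairs from the manifest by PDB entry."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for system_id, pdb_id in manifest:
        grouped[pdb_id].append(system_id)
    return dict(grouped)


# The first rows of the manifest shown above
manifest = [
    ("3grt__1__1.A_2.A__1.B", "3grt"),
    ("3grt__1__1.A_2.A__1.C", "3grt"),
    ("3grt__1__1.A_2.A__2.B", "3grt"),
    ("3grt__1__1.A_2.A__2.C", "3grt"),
    ("1grx__1__1.A__1.B", "1grx"),
]
for pdb_id, system_ids in systems_by_pdb(manifest).items():
    print(pdb_id, len(system_ids))  # prints "3grt 4", then "1grx 1"
```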


## Query protein similarity

We provide three kinds of similarity datasets:

- Similarity between ligand-bound structures (`holo`)
- Similarity between ligand-bound and unbound protein structures (`apo`)
- Similarity between ligand-bound structures and structures predicted by _AlphaFold_ (`pred`)

Any of these can be selected via the `search_db` argument of
{func}`query_protein_similarity()`.

```python
from plinder.core.scores import query_protein_similarity

query_protein_similarity(
    search_db="apo",
    filters=[("similarity", ">", "50")]
)
```

```text
2024-08-26 12:06:16,744 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.20s
2024-08-26 12:06:16,926 | plinder.core.scores.protein.query_protein_similarity:24 | INFO : runtime succeeded: 1.35s
```

| | query_system | target_system | protein_mapping | mapping | protein_mapper | source | metric | similarity |
|---|---|---|---|---|---|---|---|---|
| 0 | 1b5d__1__1.A_1.B__1.D | 1b49_A | 1.A:0.A | 1.A:0.A | foldseek | mmseqs | protein_qcov_weighted_max | 100 |
| 1 | 1b5d__1__1.A_1.B__1.D | 1b49_A | 1.A:0.A | 1.A:0.A | foldseek | mmseqs | protein_qcov_max | 100 |
| 2 | 1b5d__1__1.A_1.B__1.D | 1b49_A | 1.A:0.A | 1.A:0.A | foldseek | both | protein_fident_weighted_max | 100 |
| 3 | 1b5d__1__1.A_1.B__1.D | 1b49_A | 1.A:0.A | 1.A:0.A | foldseek | both | protein_fident_max | 100 |
| 4 | 1b5d__1__1.A_1.B__1.D | 1b49_A | 1.A:0.A | 1.A:0.A | foldseek | mmseqs | protein_fident_qcov_weighted_max | 100 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49671 | 4n7m__1__1.A_1.B__1.C | 7usq__1__1.C_1.D__1.F | 1.B:1.C;1.A:1.D | 1.A:1.D | foldseek | foldseek | protein_fident_max | 51 |
| 49672 | 4n7m__1__1.A_1.B__1.C | 8dgz__1__1.A__1.C | 1.B:1.A | 1.B:1.A | foldseek | foldseek | protein_seqsim_qcov_weighted_max | 51 |
| 49673 | 4n7m__1__1.A_1.B__1.C | 8dgz__1__1.A__1.C | 1.B:1.A | 1.B:1.A | foldseek | foldseek | protein_seqsim_qcov_max | 51 |
| 49674 | 4n7m__1__1.A_1.B__1.C | 8dj3__1__1.A_1.B__1.C | 1.B:1.B;1.A:1.A | 1.B:1.A | foldseek | foldseek | protein_seqsim_qcov_weighted_max | 51 |
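
The result contains one row per query system, target and metric, so the same pair can appear several times with different metrics. A minimal sketch, assuming you want the targets of each query for a single metric at or above a threshold, using a few rows copied from the table above. `hits_by_query` is a hypothetical helper; with the data frame you would filter on the `metric` and `similarity` columns instead.

```python
from collections import defaultdict

# (query_system, target_system, metric, similarity) rows taken from the output above
rows = [
    ("1b5d__1__1.A_1.B__1.D", "1b49_A", "protein_qcov_max", 100),
    ("1b5d__1__1.A_1.B__1.D", "1b49_A", "protein_fident_max", 100),
    ("4n7m__1__1.A_1.B__1.C", "7usq__1__1.C_1.D__1.F", "protein_fident_max", 51),
    ("4n7m__1__1.A_1.B__1.C", "8dgz__1__1.A__1.C", "protein_seqsim_qcov_max", 51),
]


def hits_by_query(rows, metric, min_similarity):
    """Map each query system to its (target, similarity) pairs for one metric."""
    hits = defaultdict(list)
    for query, target, row_metric, similarity in rows:
        if row_metric == metric and similarity >= min_similarity:
            hits[query].append((target, similarity))
    # Sort the targets of each query by decreasing similarity
    return {query: sorted(pairs, key=lambda pair: pair[1], reverse=True) for query, pairs in hits.items()}


print(hits_by_query(rows, "protein_fident_max", 60))
# {'1b5d__1__1.A_1.B__1.D': [('1b49_A', 100)]}
```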


## Working with PLINDER systems
To reconstitute *PLINDER* systems directly from a set of IDs (system IDs, PDB IDs or the
two middle characters of PDB IDs), use {func}`~plinder.core.system.utils.load_systems`.
This gives you access to `PlinderSystem` objects, which expose all attributes and
properties of the {class}`Entry` and {class}`System` classes, as well as the structures
of the system components.


### Load systems from string IDs
Systems can be loaded directly from a single ID or a list of IDs.
The result can be indexed by system ID to get the corresponding `PlinderSystem`.
All arguments of {func}`~plinder.core.system.utils.load_systems` are optional and
keyword-only, with the following signature:
```
load_systems(
    *,
    system_ids: Optional[Union[str, list[str]]] = None,
    pdb_ids: Optional[Union[str, list[str]]] = None,
    two_char_codes: Optional[Union[str, list[str]]] = None,
    cfg: Optional[DictConfig] = None,
)
```
Users can choose the granularity of the input: a single `str` or a `list` of
`system_ids`, `pdb_ids` or `two_char_codes` (the two middle characters of a PDB ID).
With `pdb_ids`, all *PLINDER* systems belonging to the given PDB entries are loaded,
while `two_char_codes` loads all *PLINDER* systems of the PDB entries whose two middle
characters match the given codes.
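
The system IDs used throughout this tutorial appear to follow the pattern `<pdb_id>__<biounit_id>__<receptor chains>__<ligand chains>`, with multiple chains joined by `_` (for example `4lpn__1__10.A_24.A_3.A__24.X`). Assuming that pattern, a small parser can derive the inputs for the coarser granularity levels, e.g. the two-character code `ag` of `4agi`, which matches the `entries/ag.zip` archive fetched later in this tutorial. `SystemId` and `parse_system_id` are hypothetical helpers, not part of `plinder`.

```python
from typing import NamedTuple


class SystemId(NamedTuple):
    pdb_id: str
    biounit_id: str
    receptor_chains: list[str]
    ligand_chains: list[str]

    @property
    def two_char_code(self) -> str:
        # Middle two characters of the PDB ID, e.g. "4agi" -> "ag"
        return self.pdb_id[1:3]


def parse_system_id(system_id: str) -> SystemId:
    """Split a system ID into its parts (format inferred from the examples)."""
    parts = system_id.split("__")
    if len(parts) != 4:
        raise ValueError(f"unexpected system ID format: {system_id!r}")
    pdb_id, biounit_id, receptors, ligands = parts
    return SystemId(pdb_id, biounit_id, receptors.split("_"), ligands.split("_"))


sid = parse_system_id("4agi__1__1.C__1.W")
print(sid.pdb_id, sid.two_char_code, sid.receptor_chains, sid.ligand_chains)
# 4agi ag ['1.C'] ['1.W']

ids = ["7eek__1__1.A__1.I", "4agi__1__1.C__1.W"]
# Candidate values for load_systems(two_char_codes=...)
print(sorted({parse_system_id(s).two_char_code for s in ids}))  # ['ag', 'ee']
```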

```python
from plinder.core.system.utils import load_systems

plinder_systems = load_systems(
    system_ids=["7eek__1__1.A__1.I", "4agi__1__1.C__1.W"]
)
```

### Get entry-level annotation
Here, we list the available categories of entry annotations and show how to access
the oligomeric state of a given system.

```python
# Get the top-level categories of the entry-level annotations
plinder_systems["4agi__1__1.C__1.W"].entry.keys()
```

```text
dict_keys(['pdb_id', 'release_date', 'oligomeric_state', 'determination_method', 'keywords', 'pH', 'resolution', 'chains', 'ligand_like_chains', 'systems', 'covalent_bonds', 'chain_to_seqres', 'validation', 'pass_criteria', 'water_chains', 'symmetry_mate_contacts'])
```

```python
# Get the oligomeric state of a given system
plinder_systems["4agi__1__1.C__1.W"].entry["oligomeric_state"]
```

```text
'dimeric'
```

### Get system-level annotation
Here, we list the available categories of system annotations and show how to access
the SMILES of the first ligand of a given system.

```python
# Get the top-level categories of the system-level annotations
plinder_systems["4agi__1__1.C__1.W"].system.keys()
```

```text
dict_keys(['pdb_id', 'biounit_id', 'ligands', 'ligand_validation', 'pocket_validation', 'pass_criteria'])
```

```python
# Show the SMILES of the first ligand of a given system
plinder_systems["4agi__1__1.C__1.W"].system['ligands'][0]['smiles']
```

```text
[GSPath('gs://plinder/2024-06/tutorial/entries/ag.zip')]
2024-08-26 12:10:29,758 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.65s
2024-08-26 12:10:29,905 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.15s
2024-08-26 12:10:29,906 | plinder.core.index.utils:156 | INFO : loading entries from 1 zips
2024-08-26 12:10:29,917 | plinder.core.index.utils:171 | INFO : loaded 1 entries
2024-08-26 12:10:29,918 | plinder.core.index.utils.load_entries:24 | INFO : runtime succeeded: 2.40s
```

```text
'C[Se][C@@H]1O[C@@H](C)[C@@H](O)[C@@H](O)[C@@H]1O'
```
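
Since `system["ligands"]` is a list, a system with several ligands has one entry per ligand. A minimal sketch that collects the SMILES of all ligands, assuming every ligand entry is a dictionary that may contain a `"smiles"` key as in the cell above; `ligand_smiles` is a hypothetical helper and the stand-in dictionary keeps only the keys used here.

```python
from typing import Any


def ligand_smiles(system_annotation: dict[str, Any]) -> list[str]:
    """Return the SMILES of every ligand that has one, in the stored order."""
    return [
        ligand["smiles"]
        for ligand in system_annotation.get("ligands", [])
        if ligand.get("smiles")
    ]


# Stand-in for plinder_systems["4agi__1__1.C__1.W"].system, reduced to the keys used here
system_annotation = {
    "pdb_id": "4agi",
    "ligands": [{"smiles": "C[Se][C@@H]1O[C@@H](C)[C@@H](O)[C@@H](O)[C@@H]1O"}],
}
print(ligand_smiles(system_annotation))
```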

### Get ligand structure path
This can be helpful for loading structures to train a model or to perform other
calculations that require ligand structural information.
The `ligands` property maps each ligand chain ID to the path of its SDF file.

```python
plinder_systems["4agi__1__1.C__1.W"].ligands
```

```text
{'1.W': '/Users/yusuf/.local/share/plinder/2024-06/tutorial/systems/4agi__1__1.C__1.W/ligand_files/1.W.sdf'}
```

### Get receptor structure path
As above, but `receptor_pdb` points to the receptor structure in PDB format.

```python
plinder_systems["4agi__1__1.C__1.W"].receptor_pdb
```

```text
'/Users/yusuf/.local/share/plinder/2024-06/tutorial/systems/4agi__1__1.C__1.W/receptor.pdb'
```
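
The two paths above suggest a per-system layout under the local cache: `systems/<system_id>/receptor.pdb` and `systems/<system_id>/ligand_files/<ligand chain>.sdf`. The sketch below rebuilds these paths, e.g. to check which files are already present before a training run. The layout is inferred from the printed output, `system_files` is a hypothetical helper, and in normal use the `ligands` and `receptor_pdb` properties remain the authoritative source.

```python
from pathlib import Path


def system_files(plinder_dir: Path, system_id: str, ligand_chains: list[str]) -> dict[str, Path]:
    """Hypothetical helper: expected structure files of one system.

    Layout inferred from the tutorial output:
    <plinder_dir>/systems/<system_id>/receptor.pdb
    <plinder_dir>/systems/<system_id>/ligand_files/<chain>.sdf
    """
    system_dir = plinder_dir / "systems" / system_id
    files = {"receptor": system_dir / "receptor.pdb"}
    for chain in ligand_chains:
        files[chain] = system_dir / "ligand_files" / f"{chain}.sdf"
    return files


files = system_files(Path("/home/user/.local/share/plinder/2024-06/tutorial"), "4agi__1__1.C__1.W", ["1.W"])
for name, path in files.items():
    # Path.exists() checks whether the file is present locally
    print(f"{name}: {path} (exists: {path.exists()})")
```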

### Inspect apo and predicted annotations
For users interested in using apo and predicted structures for model training, the
snippet below maps holo system IDs (`reference_system_id`) to apo or predicted structure
IDs (`id`), together with their similarity measures.
Alternatively, this information can be accessed directly with {func}`query_links()`,
described under **Working with apo/predicted structures**.

```python
plinder_systems["4agi__1__1.C__1.W"].linked_structures
```

| | reference_system_id | id | pocket_fident | pocket_lddt | protein_fident_qcov_weighted_sum | protein_fident_weighted_sum | protein_lddt_weighted_sum | target_id | sort_score | receptor_file | ... | posebusters_volume_overlap_with_inorganic_cofactors | posebusters_volume_overlap_with_waters | fraction_reference_proteins_mapped | fraction_model_proteins_mapped | lddt | bb_lddt | per_chain_lddt_ave | per_chain_bb_lddt_ave | filename | kind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4agi__1__1.C__1.W | 4uou_B | 100.0 | 100.0 | 100.0 | 100.0 | 99.0 | 4uou | 2.4 | /plinder/2024-06/assignments/apo/4agi__1__1.C_... | ... | True | True | 1.0 | 1.0 | 0.972682 | 0.994065 | 0.965793 | 0.990783 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 1 | 4agi__1__1.C__1.W | 4uou_C | 100.0 | 99.0 | 100.0 | 100.0 | 99.0 | 4uou | 2.4 | /plinder/2024-06/assignments/apo/4agi__1__1.C_... | ... | True | True | 1.0 | 1.0 | 0.973562 | 0.994687 | 0.966137 | 0.991653 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 2 | 4agi__1__1.C__1.W | 4uou_D | 100.0 | 100.0 | 100.0 | 100.0 | 99.0 | 4uou | 2.4 | /plinder/2024-06/assignments/apo/4agi__1__1.C_... | ... | True | True | 1.0 | 1.0 | 0.973604 | 0.994235 | 0.966834 | 0.990844 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 3 | 4agi__1__1.C__1.W | 4uou_A | 100.0 | 99.0 | 100.0 | 100.0 | 99.0 | 4uou | 2.4 | /plinder/2024-06/assignments/apo/4agi__1__1.C_... | ... | True | True | 1.0 | 1.0 | 0.967257 | 0.9948 | 0.961169 | 0.991704 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 4 | 4agi__1__1.C__1.W | Q4WW81_A | 100.0 | 100.0 | 99.0 | 99.0 | 100.0 | Q4WW81 | 98.57 | /plinder/2024-06/assignments/pred/4agi__1__1.C... | ... | True | True | 1.0 | 1.0 | 0.982275 | 0.998587 | 0.977748 | 0.997611 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | pred |


### Get path to linked structures
The `linked_archive` property points to the local directory where the linked structures
are stored.
Once you have derived the `reference_system_id` to `id` mapping from the table above,
you can use it to load the corresponding apo or predicted structures from this archive.

```python
plinder_systems["4agi__1__1.C__1.W"].linked_archive
```

```text
PosixPath('/Users/yusuf/.local/share/plinder/2024-06/tutorial/linked_structures')
```

### Other useful properties
For a more comprehensive list of attributes and properties, see the
{class}`PlinderSystem` class.

## Working with split data

### Get split dataframe
The split table sorts each PLINDER system into a cluster and defines the split it is
part of.
To access the splits, use {func}`~plinder.core.get_split`.

```python
from plinder.core import get_split

split_df = get_split()
split_df
```

```text
2024-08-26 12:06:43,189 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.16s
2024-08-26 12:06:43,328 | plinder.core.split.utils:40 | INFO : reading /Users/yusuf/.local/share/plinder/2024-06/tutorial/splits/split.parquet
2024-08-26 12:06:43,541 | plinder.core.split.utils.get_split:24 | INFO : runtime succeeded: 1.31s
```

| | system_id | uniqueness | split | cluster | cluster_for_val_split | system_pass_validation_criteria | system_pass_statistics_criteria | system_proper_num_ligand_chains | system_proper_pocket_num_residues | system_proper_num_interactions | system_proper_ligand_max_molecular_weight | system_has_binding_affinity | system_has_apo_or_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101m__1__1.A__1.C_1.D | 101m__A__C_D_c188899 | train | c14 | c0 | True | True | 1 | 27 | 20 | 616.177293 | False | False |
| 1 | 102m__1__1.A__1.C | 102m__A__C_c237197 | train | c14 | c0 | True | True | 1 | 26 | 20 | 616.177293 | False | True |
| 2 | 103m__1__1.A__1.C_1.D | 103m__A__C_D_c252759 | train | c14 | c0 | False | True | 1 | 26 | 16 | 616.177293 | False | False |
| 3 | 104m__1__1.A__1.C_1.D | 104m__A__C_D_c274687 | train | c14 | c0 | False | True | 1 | 27 | 21 | 616.177293 | False | False |
| 4 | 105m__1__1.A__1.C_1.D | 105m__A__C_D_c221688 | train | c14 | c0 | False | True | 1 | 28 | 20 | 616.177293 | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 409721 | 9xia__1__2.A_4.A__4.B_4.D | 9xia__A_A__B_D_c20731 | train | c256 | c126 | False | False | 1 | 23 | 6 | 178.084124 | False | False |
| 409722 | 9xim__1__1.A_1.B__1.E_1.F_1.G | 9xim__A_B__E_F_G_c240203 | train | c256 | c126 | False | False | 1 | 21 | 6 | 150.052823 | False | False |
| 409723 | 9xim__1__1.A_1.B__1.H_1.I_1.J | 9xim__A_B__H_I_J_c313183 | train | c256 | c126 | False | False | 1 | 19 | 5 | 150.052823 | False | False |
| 409724 | 9xim__1__1.C_1.D__1.K_1.L_1.M | 9xim__C_D__K_L_M_c215891 | train | c256 | c126 | False | False | 1 | 20 | 3 | 150.052823 | False | False |


### Inspect test set system ids
List the system IDs of the test set:

```python
split_df[split_df.split == "test"].system_id.to_list()
```

```text
['1b5d__1__1.A_1.B__1.D',
 '1s2g__1__1.A_2.C__1.D',
 '4agi__1__1.C__1.W',
 '4n7m__1__1.A_1.B__1.C',
 '7eek__1__1.A__1.I']
```
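
The same selection works for every split. As a minimal sketch, the hypothetical helper below groups `(system_id, split)` pairs by split in plain Python, using a few rows from the outputs above; on the data frame, `split_df.groupby("split")["system_id"].apply(list)` or `split_df["split"].value_counts()` gives the equivalent grouping or counts.

```python
from collections import defaultdict


def ids_by_split(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (system_id, split) pairs by split, keeping the input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for system_id, split in rows:
        groups[split].append(system_id)
    return dict(groups)


rows = [
    ("101m__1__1.A__1.C_1.D", "train"),
    ("102m__1__1.A__1.C", "train"),
    ("4agi__1__1.C__1.W", "test"),
    ("7eek__1__1.A__1.I", "test"),
]
groups = ids_by_split(rows)
print({split: len(ids) for split, ids in groups.items()})  # {'train': 2, 'test': 2}
```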

## Working with apo/predicted structures

### Load links table
{func}`query_links()` returns the table of apo and predicted structures linked to each
system, together with all similarity data.
This includes protein and pocket similarity (see the description [here](docs/eval.md)),
as well as _PoseBusters_ checks of the ligand after transplanting it into the apo or
predicted structure.

```python
from plinder.core.scores import query_links

links = query_links()
links
```

```text
2024-08-26 12:06:49,841 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.18s
2024-08-26 12:06:54,217 | plinder.core.scores.links.query_links:24 | INFO : runtime succeeded: 5.54s
```

| | reference_system_id | id | pocket_fident | pocket_lddt | protein_fident_qcov_weighted_sum | protein_fident_weighted_sum | protein_lddt_weighted_sum | target_id | sort_score | receptor_file | ... | posebusters_volume_overlap_with_inorganic_cofactors | posebusters_volume_overlap_with_waters | fraction_reference_proteins_mapped | fraction_model_proteins_mapped | lddt | bb_lddt | per_chain_lddt_ave | per_chain_bb_lddt_ave | filename | kind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6pl9__1__1.A__1.C | 2vb1_A | 100.0 | 86.0 | 100.0 | 100.0 | 96.0 | 2vb1 | 0.65 | /plinder/2024-06/assignments/apo/6pl9__1__1.A_... | ... | True | True | 1.0 | 1.0 | 0.903772 | 0.968844 | 0.890822 | 0.959674 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 1 | 6ahh__1__1.A__1.G | 2vb1_A | 100.0 | 98.0 | 100.0 | 100.0 | 95.0 | 2vb1 | 0.65 | /plinder/2024-06/assignments/apo/6ahh__1__1.A_... | ... | True | True | 1.0 | 1.0 | 0.894349 | 0.962846 | 0.883217 | 0.954721 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 2 | 5b59__1__1.A__1.B | 2vb1_A | 100.0 | 91.0 | 100.0 | 100.0 | 96.0 | 2vb1 | 0.65 | /plinder/2024-06/assignments/apo/5b59__1__1.A_... | ... | True | True | 1.0 | 1.0 | 0.903266 | 0.962318 | 0.890656 | 0.955258 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 3 | 3ato__1__1.A__1.B | 2vb1_A | 100.0 | 99.0 | 100.0 | 100.0 | 95.0 | 2vb1 | 0.65 | /plinder/2024-06/assignments/apo/3ato__1__1.A_... | ... | True | True | 1.0 | 1.0 | 0.890530 | 0.954696 | 0.879496 | 0.946326 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| 4 | 6mx9__1__1.A__1.K | 2vb1_A | 100.0 | 98.0 | 100.0 | 100.0 | 95.0 | 2vb1 | 0.65 | /plinder/2024-06/assignments/apo/6mx9__1__1.A_... | ... | True | True | 1.0 | 1.0 | 0.904116 | 0.964309 | 0.892434 | 0.955853 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | apo |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 597774 | 6x3q__1__1.A__1.B | A8AWU7_A | 100.0 | 79.0 | 99.0 | 99.0 | 88.0 | A8AWU7 | 38.90 | /plinder/2024-06/assignments/pred/6x3q__1__1.A... | ... | True | True | 1.0 | 1.0 | 0.815736 | 0.877814 | 0.806444 | 0.871054 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | pred |
| 597775 | 8st5__1__1.A__1.B | A8AWU7_A | 100.0 | 95.0 | 99.0 | 99.0 | 88.0 | A8AWU7 | 38.90 | /plinder/2024-06/assignments/pred/8st5__1__1.A... | ... | True | True | 1.0 | 1.0 | 0.814876 | 0.885938 | 0.814176 | 0.881858 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | pred |
| 597776 | 6efd__1__1.A__1.B | A8AWU7_A | 100.0 | 81.0 | 99.0 | 99.0 | 87.0 | A8AWU7 | 38.90 | /plinder/2024-06/assignments/pred/6efd__1__1.A... | ... | True | True | 1.0 | 1.0 | 0.814404 | 0.879823 | 0.810680 | 0.872417 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | pred |
| 597777 | 8st6__1__1.A__1.D | A8AWU7_A | 100.0 | 80.0 | 99.0 | 99.0 | 88.0 | A8AWU7 | 38.90 | /plinder/2024-06/assignments/pred/8st6__1__1.A... | ... | True | True | 1.0 | 1.0 | 0.816566 | 0.884372 | 0.813010 | 0.877505 | /Users/yusuf/.local/share/plinder/2024-06/tuto... | pred |


### Extract linked apo structures
Get the IDs (`<PDB_ID>_<chain_id>`) of the apo structures linked to a given system ID:

```python
links[(links.reference_system_id == "4agi__1__1.C__1.W") & (links.kind == "apo")].id.to_list()
```

```text
['4uou_B', '4uou_C', '4uou_D', '4uou_A']
```
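
To get both the apo and the predicted links of a system in one pass, you can group the `id` column by `kind`. The sketch below does this in plain Python on the five rows for `4agi__1__1.C__1.W` shown earlier plus one unrelated row, and splits each ID into its target and chain parts, assuming the `<target_id>_<chain_id>` form (the `target_id` column suggests that predicted structures use a UniProt-style accession such as `Q4WW81` instead of a PDB ID). Both helpers are hypothetical.

```python
from collections import defaultdict

# (reference_system_id, id, kind) rows taken from the tables above
links_rows = [
    ("4agi__1__1.C__1.W", "4uou_B", "apo"),
    ("4agi__1__1.C__1.W", "4uou_C", "apo"),
    ("4agi__1__1.C__1.W", "4uou_D", "apo"),
    ("4agi__1__1.C__1.W", "4uou_A", "apo"),
    ("4agi__1__1.C__1.W", "Q4WW81_A", "pred"),
    ("6pl9__1__1.A__1.C", "2vb1_A", "apo"),
]


def linked_ids(rows, reference_system_id):
    """Group the linked structure IDs of one reference system by kind."""
    by_kind = defaultdict(list)
    for reference, linked_id, kind in rows:
        if reference == reference_system_id:
            by_kind[kind].append(linked_id)
    return dict(by_kind)


def split_linked_id(linked_id):
    """Split '<target>_<chain>' at the last underscore."""
    target, chain = linked_id.rsplit("_", 1)
    return target, chain


print(linked_ids(links_rows, "4agi__1__1.C__1.W"))
# {'apo': ['4uou_B', '4uou_C', '4uou_D', '4uou_A'], 'pred': ['Q4WW81_A']}
print(split_linked_id("Q4WW81_A"))  # ('Q4WW81', 'A')
```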