# General

The package code lives in the `src` directory, and `from src import *` imports the main elements defined in `src/__init__.py`.

The main elements are `db` (the database) and `PDB` and `Site` (the objects to work with).

In [1]:
from src import *

The "connection" to the database is "opened" with:

In [2]:
db.init('database.db')

To explore the information/tables on the database:

In [3]:
db.get_tables()

['pdb', 'site']

Each table of the database corresponds to a Python class (`PDB` and `Site`), which is used to work with the information in that table. Both are child classes of the `Structure` class and have a few things in common: they are defined by atoms and/or residues, have a .cif file, and can be visualized with a viewer.

Rows are accessed/iterated through the class:

In [4]:
f"There are \
{ len(PDB.select()) } \
PDBs \
and \
{ len(Site.select()) } \
Sites in the database"

'There are 3078 PDBs and 3219 Sites in the database'

Each class or object has a number of attributes ("columns" of information in the table) and associated methods to manage the information. The attributes are described in detail in the docs of each class (see `help(PDB)` and `help(Site)`). As an example:

In [5]:
print(
    [k for k in PDB.__dict__.keys() if not k.startswith("_")]
)
print(
    [k for k in Site.__dict__.keys() if not k.startswith("_")]
)

['entry_id', 'cif', 'atoms', 'assembly', 'minimal_pdb', 'view', 'DoesNotExist', 'orthosites', 'sites']
['pdb', 'modulator', 'site', 'info', 'related_sites', 'updated', 'assembly_site', 'nonredundant_site_pdb', 'nonredundant_site', 'id', 'pdb_id', 'DoesNotExist']


These main classes are defined in the main module of the package. To explore all the modules, import `src` itself:

In [6]:
import src

In [7]:
print(src.allodb.__name__, src.allodb.__doc__)

src.allodb 
Main module of the database

Defines the Classes/Database tables that structure the database, inheriting and using methods from the utils and siteutils (and cifutils) modules.



## Creation

New rows are created from each table/class. Tests can be run without modifying the database by wrapping them in `db.atomic()`: changes made inside the wrapper are committed only if the code succeeds, and the `rollback` method can be called at any point to discard them.

In [8]:
with db.atomic() as transaction:
    pdb = PDB.create(entry_id="1znf", _cif_hash="0")
    print(pdb, pdb.entry_id)
    transaction.rollback()

1znf 1znf
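The rollback behaviour can also be illustrated outside the package. The following is a minimal sketch that uses Python's standard `sqlite3` module instead of peewee's `db.atomic()`; the in-memory table and its two columns are assumptions made for illustration, not the package's actual schema.

```python
import sqlite3

# Hypothetical in-memory table; not the real database file or schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pdb (entry_id TEXT PRIMARY KEY, _cif_hash TEXT)")
conn.commit()


def count_rows(connection):
    """Number of rows currently visible in the toy 'pdb' table."""
    return connection.execute("SELECT COUNT(*) FROM pdb").fetchone()[0]


# The INSERT implicitly opens a transaction on this connection.
conn.execute("INSERT INTO pdb VALUES (?, ?)", ("1znf", "0"))
rows_inside = count_rows(conn)  # the new row is visible before rollback

# Discard everything done since the last commit.
conn.rollback()
rows_after = count_rows(conn)

print(rows_inside, rows_after)
```

The row is visible while the transaction is open and gone after the rollback, which mirrors what the cell above does with `transaction.rollback()`.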


# PDB

In [9]:
print(PDB.__doc__)


    Class/Objects of PDB have to inherit from peewee's "Model", and they inherit some common methods from utils.Structure.
    
    Each PDB is defined by its entry_id and the hash of the cif file with which it was created.
    


A PDB object can be obtained for exploration by selecting from the `PDB` class:

In [10]:
pdb = PDB.select()[0]
pdb

<PDB: 5lvp>

They can also be retrieved by the specific PDB ID (`entry_id`):

In [11]:
pdb = PDB.get(entry_id = '5lvp')
pdb

<PDB: 5lvp>

The `.get` method raises a `DoesNotExist` error if no row in the database matches the query. Other retrieval methods exist, e.g. `.get_or_none`, which returns `None` instead:

In [12]:
PDB.get(entry_id = "1znf")

PDBDoesNotExist: <Model: PDB> instance matching query does not exist:
SQL: SELECT "t1"."entry_id", "t1"."_cif_hash" FROM "pdb" AS "t1" WHERE ("t1"."entry_id" = ?) LIMIT ? OFFSET ?
Params: ['1znf', 1, 0]

## Attributes

As seen above, PDB objects have many attributes. However, only two of them are stored in the database, and both are required to create a new row/object: `entry_id` and `_cif_hash`. These two are defined in the database Python module as class-level attributes.

The rest are methods of the class that use the database-stored columns (e.g., `entry_id`) to run computations, retrieve information, and/or apply transformations that return specific data. These methods carry the `@property` (or a similar) decorator, which makes them accessible as attributes. In most cases their results are cached, so the underlying functions only have to run once per session.
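As a rough illustration of this pattern, the sketch below uses `functools.cached_property` from the standard library. The `ToyEntry` class and its `atoms` property are hypothetical stand-ins, and the package may implement its caching differently.

```python
from functools import cached_property


class ToyEntry:
    """Hypothetical stand-in for a database-backed object (not the real PDB class)."""

    def __init__(self, entry_id):
        self.entry_id = entry_id  # plays the role of the stored column
        self.loads = 0            # counts how often the expensive step runs

    @cached_property
    def atoms(self):
        # Placeholder for an expensive step such as parsing a .cif file.
        self.loads += 1
        return [f"{self.entry_id}:atom{i}" for i in range(3)]


entry = ToyEntry("5lvp")
first = entry.atoms   # computed on first access
second = entry.atoms  # served from the cache, no recomputation
print(entry.loads, first is second)
```

The derived value is accessed like a plain attribute, and the function body runs only on the first access.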

Moreover, the `PDB` class inherits from other base classes besides peewee's `Model` (they are listed in its `__mro__` attribute), and these also define methods that work in the same way.

In [13]:
PDB.__mro__

(<Model: PDB>,
 <Model: Model>,
 <Model: Model>,
 <Model: _metaclass_helper_>,
 peewee.Node,
 src.utils.Structure,
 object)
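To make the relationship between `__mro__` and `__dict__` concrete, here is a small sketch with hypothetical toy classes (`ToyModel`, `ToyStructure`, `ToyPDB`); they only imitate the inheritance layout and are not the package's classes.

```python
class ToyStructure:
    """Hypothetical base class providing shared behaviour."""

    def view(self):
        return f"viewing {self.entry_id}"


class ToyModel:
    """Hypothetical stand-in for an ORM base class."""


class ToyPDB(ToyModel, ToyStructure):
    def __init__(self, entry_id):
        self.entry_id = entry_id


# Classes are searched for attributes in this order.
mro_names = [cls.__name__ for cls in ToyPDB.__mro__]

# __dict__ only lists names defined on the class itself,
# while attribute lookup also walks the rest of the MRO.
defined_here = "view" in ToyPDB.__dict__
inherited = hasattr(ToyPDB, "view")

print(mro_names)
print(defined_here, inherited)
```

This is why inspecting `__dict__` on a single class shows only part of what its objects can do; the remaining names come from the base classes in the MRO.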

In [14]:
print(
    [k for k in PDB.__dict__.keys() if not k.startswith("_")]
)
print(
    [k for k in src.utils.Structure.__dict__.keys() if not k.startswith("_")]
)

pdb.__data__

['entry_id', 'cif', 'atoms', 'assembly', 'minimal_pdb', 'view', 'DoesNotExist', 'orthosites', 'sites']
['atoms', 'residues', 'cif', 'view']


{'entry_id': '5lvp',
 '_cif_hash': '21f418365b876c844c6124be956253f1cc46acfd6c9e9dda8fa77540a91b5ec3'}

In [15]:
pdb.entry_id

'5lvp'

The `_cif_hash` makes it possible to compare the .cif file with which the database entry was created against the .cif file used afterwards by the class's methods, so that if some information has changed with respect to the database, the source of the change can be identified:

In [16]:
pdb._cif_hash

'21f418365b876c844c6124be956253f1cc46acfd6c9e9dda8fa77540a91b5ec3'
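The comparison idea can be sketched with `hashlib`. This example assumes SHA-256, which is consistent with the 64-hex-digit value shown above but is not confirmed by the source; the file contents and the `content_hash` helper are made up for illustration.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumption here)."""
    return hashlib.sha256(content).hexdigest()


# Made-up file contents, only for illustration.
original = b"data_5LVP\n_entry.id 5LVP\n"
stored_hash = content_hash(original)  # what would be saved with the entry

revised = original + b"# revised upstream\n"

unchanged = content_hash(original) == stored_hash
changed = content_hash(revised) != stored_hash

print(len(stored_hash), unchanged, changed)
```

Any change in the file content yields a different digest, so a mismatch with the stored value signals that the file differs from the one used at creation time.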

The `entry_id` is used to retrieve or download the corresponding .cif file and turn it into an object through which its data can be accessed and manipulated. See the `Cif` class and its subclasses in the `utils` module:

In [17]:
print(src.utils.PDBCif.__doc__)
pdb.cif


    Child class of Cif that defines its own ._text private attribute pointing to the disk-saved original .cif file from the PDB (SIFTS updated .cif.gz file from PDBe) (or is downlaoded if it doesn't exist

    pdb : allodb.PDB object
        PDB object for which to get the PDBe SIFTS-updated .cif.gz file (saved in allodb.datapath or retrieved online otherwise)
    update : bool
        Whether to update the saved .cif.gz file if it has changed w.r.t. the saved .cif.gz file (according to the hash of the saved DB entry (PDB._cif_hash) vs. the hash of the online file) or not
    save : bool
        Whether to save the downloaded .cif.gz file if it cannot be retrieved from allodb.datapath (saved in allodb.datapath); is True if update is True
    


<src.utils.PDBCif at 0x7ee455f1ae30>

In [18]:
pdb.cif.data

{'_entry': {'id': ['5LVP']},
 '_citation': {'abstract': ['?'],
  'abstract_id_CAS': ['?'],
  'book_id_ISBN': ['?'],
  'book_publisher': ['?'],
  'book_publisher_city': ['?'],
  'book_title': ['?'],
  'coordinate_linkage': ['?'],
  'country': ['US'],
  'database_id_Medline': ['?'],
  'details': ['?'],
  'id': ['primary'],
  'journal_abbrev': ['Cell Chem Biol'],
  'journal_id_ASTM': ['?'],
  'journal_id_CSD': ['?'],
  'journal_id_ISSN': ['2451-9456'],
  'journal_full': ['?'],
  'journal_issue': ['?'],
  'journal_volume': ['23'],
  'language': ['?'],
  'page_first': ['1193'],
  'page_last': ['1205'],
  'title': ['Bidirectional Allosteric Communication between the ATP-Binding Site and the Regulatory PIF Pocket in PDK1 Protein Kinase.'],
  'year': ['2016'],
  'database_id_CSD': ['?'],
  'pdbx_database_id_DOI': ['10.1016/j.chembiol.2016.06.017'],
  'pdbx_database_id_PubMed': ['27693059'],
  'unpublished_flag': ['?']},
 '_citation_author': {'citation_id': ['primary',
   'primary',
   'primary

Table of atoms in the structure as it appears on the .cif file (`_atom_site`):

In [19]:
pdb.atoms

Unnamed: 0,group_PDB,id,type_symbol,label_atom_id,label_alt_id,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,...,auth_seq_id,auth_comp_id,auth_asym_id,auth_atom_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,ATOM,1,N,N,.,ARG,A,1,27,?,...,75,ARG,A,N,1,27,UNP,O15530,75,R
1,ATOM,2,C,CA,.,ARG,A,1,27,?,...,75,ARG,A,CA,1,27,UNP,O15530,75,R
2,ATOM,3,C,C,.,ARG,A,1,27,?,...,75,ARG,A,C,1,27,UNP,O15530,75,R
3,ATOM,4,O,O,.,ARG,A,1,27,?,...,75,ARG,A,O,1,27,UNP,O15530,75,R
4,ATOM,5,C,CB,.,ARG,A,1,27,?,...,75,ARG,A,CB,1,27,UNP,O15530,75,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9533,HETATM,9534,O,O,.,HOH,R,5,.,?,...,661,HOH,D,O,1,661,?,?,?,?
9534,HETATM,9535,O,O,.,HOH,R,5,.,?,...,662,HOH,D,O,1,662,?,?,?,?
9535,HETATM,9536,O,O,.,HOH,R,5,.,?,...,663,HOH,D,O,1,663,?,?,?,?
9536,HETATM,9537,O,O,.,HOH,R,5,.,?,...,664,HOH,D,O,1,664,?,?,?,?


Table of unique residues from the structure obtained from `.atoms`:

In [20]:
pdb.residues

Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
5,LYS,A,1,28,?,76,LYS,A,1,28,UNP,O15530,76,K
10,LYS,A,1,29,?,77,LYS,A,1,29,UNP,O15530,77,K
19,ARG,A,1,30,?,78,ARG,A,1,30,UNP,O15530,78,R
30,PRO,A,1,31,?,79,PRO,A,1,31,UNP,O15530,79,P
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9533,HOH,R,5,.,?,661,HOH,D,1,661,?,?,?,?
9534,HOH,R,5,.,?,662,HOH,D,1,662,?,?,?,?
9535,HOH,R,5,.,?,663,HOH,D,1,663,?,?,?,?
9536,HOH,R,5,.,?,664,HOH,D,1,664,?,?,?,?
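The index values of the residue table (0, 5, 10, 19, 30, ...) are consistent with keeping the first atom row of each residue. The sketch below reproduces that idea with plain dictionaries; the `unique_residues` helper and the choice of identifying fields are assumptions, and the package's own implementation is not shown here.

```python
# Fields assumed to identify a residue; the package may use a different set.
RESIDUE_FIELDS = ("label_asym_id", "label_seq_id", "label_comp_id")


def unique_residues(atom_rows):
    """Map each residue key to the index of its first atom, in atom order."""
    first_atom_index = {}
    for index, row in enumerate(atom_rows):
        key = tuple(row[field] for field in RESIDUE_FIELDS)
        first_atom_index.setdefault(key, index)  # keep the first occurrence only
    return first_atom_index


# A tiny, made-up atom table: two atoms of ARG 27 and two atoms of LYS 28.
atoms = [
    {"label_atom_id": "N", "label_comp_id": "ARG", "label_asym_id": "A", "label_seq_id": "27"},
    {"label_atom_id": "CA", "label_comp_id": "ARG", "label_asym_id": "A", "label_seq_id": "27"},
    {"label_atom_id": "N", "label_comp_id": "LYS", "label_asym_id": "A", "label_seq_id": "28"},
    {"label_atom_id": "CA", "label_comp_id": "LYS", "label_asym_id": "A", "label_seq_id": "28"},
]

residues = unique_residues(atoms)
print(residues)
```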


Visualization of the structure (in this case, the crystallographic model):

In [21]:
pdb.view()

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

The attributes `assembly` and `minimal_pdb` are objects of child classes of the `Structure` class with different uses; see the `utils` module:

In [22]:
print(src.utils.Assembly.__doc__)
pdb.assembly


    Structure-type object that reconstructs and stores the assembly from the information in the PDB Cif.
    
    The main method _get_assembly provides an ._atoms DataFrame where reptitions of the asymmetric unit have remapped, unique auth_asym_id and label_asym_id fields (chain identifiers), to avoid visualization and other processing problems with common suites for protein/biomolecular structural data and take into account all "repeated" atoms in their transformed coordinates. _get_assembly also generates the ._repetitions attribute that stores a DataFrame with the number of times each chain/molecule (unique label_asym_id) is repeated.

    pdb : PDB
    assembly_id : int
        Assembly ID to reconstruct, as PDBs can have many assemblies and the annotated allosteric modulators might be in any of them
    


<src.utils.Assembly at 0x7ee43f697790>

In [23]:
print(
    "Number of residues in the PDB model", len(pdb.residues), "\n"
    "Number of residues in the PDB assembly", len(pdb.assembly.residues),
)
pdb.assembly.residues

Number of residues in the PDB model 1428 
Number of residues in the PDB assembly 360


Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
5,LYS,A,1,28,?,76,LYS,A,1,28,UNP,O15530,76,K
10,LYS,A,1,29,?,77,LYS,A,1,29,UNP,O15530,77,K
19,ARG,A,1,30,?,78,ARG,A,1,30,UNP,O15530,78,R
30,PRO,A,1,31,?,79,PRO,A,1,31,UNP,O15530,79,P
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2374,HOH,O,5,.,?,671,HOH,A,1,671,?,?,?,?
2375,HOH,O,5,.,?,672,HOH,A,1,672,?,?,?,?
2376,HOH,O,5,.,?,673,HOH,A,1,673,?,?,?,?
2377,HOH,O,5,.,?,674,HOH,A,1,674,?,?,?,?


Assembly visualization:

In [24]:
pdb.assembly.view()

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

The `.view` method of PDB objects is the only one that accepts an extra keyword to specify the assembly to visualize, as defined in the original PDBe .cif file (whereas `pdb.assembly` is the reconstruction of the assembly obtained by selecting or remapping repetitions of the crystallographic model):

In [25]:
v = pdb.view(assembly_id='1')
v

Viz(assembly_id='1', bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, '…

In [63]:
print(src.utils.Minimal_pdb.__doc__)
pdb.minimal_pdb


    Structure-type class to make objects that will only contain the minimum amount necessary of "molecules" (i.e., label_entity_id) from the object from which it is intialized in order to comprise a complete protein chain(s) + the ligand(s) it(they) is(are) directly interacting with.

    TODO: should this be deprecated/deleted?
    


<src.utils.Minimal_pdb at 0x7f46546ae2c0>

In [26]:
pdb.minimal_pdb.view()

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

Finally, every `Site` object created for this PDB/`entry_id` is back-referenced through the PDB's `.sites` attribute. It is not a list per se but a query that, when executed, retrieves all the sites associated with the PDB from the database; it can therefore be converted to a list to inspect them:

In [27]:
pdb.sites

<peewee.ModelSelect at 0x7ee437ffd330>

In [28]:
list(pdb.sites)

[<Site: 5>]
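The difference between holding a query and holding its results can be sketched with a toy class. `LazySelect` below is a hypothetical stand-in written for this example; it is not peewee's `ModelSelect` and does not reproduce its internals, and the second entry ID is made up.

```python
class LazySelect:
    """Hypothetical lazily evaluated query (not peewee's ModelSelect)."""

    def __init__(self, rows, predicate):
        self._rows = rows
        self._predicate = predicate
        self.executions = 0  # how many times the filter has actually run

    def __iter__(self):
        # The work happens only when something iterates over the query.
        self.executions += 1
        return iter([row for row in self._rows if self._predicate(row)])


# Made-up rows standing in for the 'site' table.
site_table = [{"id": 5, "pdb": "5lvp"}, {"id": 6, "pdb": "1abc"}]

query = LazySelect(site_table, lambda row: row["pdb"] == "5lvp")
runs_before = query.executions  # building the query does not run it

results = list(query)           # coercing to a list executes it
runs_after = query.executions

print(runs_before, runs_after, results)
```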

In [30]:
v = pdb.view()
v

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [31]:
v.color_sites(pdb.sites)

# Site

In [56]:
print(Site.__doc__)


    Class/Objects of Site have to inherit from peewee's "Model", and they inherit some common methods from siteutils.BaseSite.
    
    Each Site is defined by, mainly, the pdb it belongs to, the annotated modulator molecule, and a list of residues of the site retrieved upon creation.

    In addition, it contains an information dictionary of the data source and fixes, the last date in which it was updated, and a dictionary containing
    information about related_sites (other molecules of the annotated modulator(s) present in the structure that can be equivalent (they form the same site 
    in the same protein entity but a different chain), or nonequivalent (situated elsewhere and not annotated as modulator/allosteric site)).
    


A Site object can be obtained for exploration by selecting from the `Site` class:

In [32]:
site = Site.select()[0]
site

<Site: 5>

They can also be retrieved by their specific ID. Sites do not have an informative unique identifier like the PDB ID, so they simply receive a numeric ID (`id`) in order of creation:

In [33]:
site = Site.get(id = 5)
site

<Site: 5>

More complicated queries can be performed taking advantage of `peewee`'s functionalities (see its documentation), and executing them by coercing them to a list:

In [34]:
list(
    Site.select().where(Site.id.in_(range(10)))
)

[<Site: 5>, <Site: 6>, <Site: 7>, <Site: 8>, <Site: 9>]

In [35]:
list(
    Site.select().where(Site.pdb.in_(["5lvp", "1znf"]))
)

[<Site: 5>]

In [36]:
list(
    Site.select().where(Site.pdb == '5lvp')
)

[<Site: 5>]

## Attributes

As seen above, Site objects, like PDB objects, have many attributes, and not all of them are stored in the database. The stored ones are defined in the database Python module as class-level attributes. The main, required attributes are `pdb` and `modulator`, which define the PDB the Site belongs to and the modulator molecule(s) that define it. `updated` is a date (e.g., when the Site was created), and `info` and `related_sites` can store extra information (or be left empty).

The attribute `site` is filled in automatically upon creation of a Site, using the information from the associated PDB and the modulator that forms the site (see the function `_process_site` in the main module, and the `siteutils.get_site_res` function).

The rest are, again, methods of the class (with `@property` and cached) that use the stored attributes to return specific information, and the `Site` class also inherits from other base classes that define further methods.

In [37]:
Site.__mro__

(<Model: Site>,
 <Model: Model>,
 <Model: Model>,
 <Model: _metaclass_helper_>,
 peewee.Node,
 src.siteutils.BaseAllosite,
 src.siteutils.BaseSite,
 src.utils.Structure,
 object)

In [38]:
print(
    [k for k in Site.__dict__.keys() if not k.startswith("_")]
)
print(
    [k for k in src.siteutils.BaseSite.__dict__.keys() if not k.startswith("_")]
)
print(
    [k for k in src.utils.Structure.__dict__.keys() if not k.startswith("_")]
)

site.__data__

['pdb', 'modulator', 'site', 'info', 'related_sites', 'updated', 'assembly_site', 'nonredundant_site_pdb', 'nonredundant_site', 'id', 'pdb_id', 'DoesNotExist']
['residues', 'protein_residues']
['atoms', 'residues', 'cif', 'view']


{'id': 5,
 'pdb': '5lvp',
 'modulator': {'label_asym_id': ['F', 'G']},
 'site': {'label_asym_id': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
   'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
   'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
   'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
   'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F',
   'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G',
   'O', 'P'],
  'auth_seq_id': ['75', '76', '77', '115', '118', '119', '123', '124', '127', '128', '131', '134',
   '135', '136', '144', '145', '146', '147', '148', '149', '150', '155', '156', '157',
   '74', '75', '76', '77', '113', '115', '118', '119', '124', '127', '128', '131',
   '134

### Database-stored

In [29]:
site.id

5

In [30]:
site.updated

datetime.date(2024, 3, 21)

In [31]:
site.pdb

<PDB: 5lvp>

Independently of what is passed as `modulator` at the moment of creation, the identifiers of the modulator molecule(s) that are stored are standardized through the `utils.simplify_residues` function to retrieve the minimal fields needed for their unequivocal identification (see its docs):

In [32]:
site.modulator

{'label_asym_id': ['F', 'G']}

At the moment of creation, the information of the passed pdb and modulator molecule(s) is used to retrieve the site formed by said modulator(s), which is stored, again, in a minimal and standardized way through `utils.simplify_residues`:

In [72]:
print(src.siteutils.get_site_res.__name__, src.siteutils.get_site_res.__doc__)
site.site

get_site_res 
    Function to, given a site (BaseSite (or child class)-type object and its parent structure in attribute .pdb), return a standardized list of residues (with utils.simplify_residues) from the parent structure (from proteins and any other elements) that define the site (according to the passed threshold distance) with the Python interface of open-source PyMOL
    


{'label_asym_id': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
  'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
  'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
  'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
  'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F',
  'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G',
  'O', 'P'],
 'auth_seq_id': ['75', '76', '77', '115', '118', '119', '123', '124', '127', '128', '131', '134',
  '135', '136', '144', '145', '146', '147', '148', '149', '150', '155', '156', '157',
  '74', '75', '76', '77', '113', '115', '118', '119', '124', '127', '128', '131',
  '134', '135', '136', '145', '146', '147', '148', '149', '150', '155', '156', '157',
  '8', '9', '10', '11', '12', '13', '14', '15',
  '8', '9',
  '1

`info` holds information about the modulator, about the interacting chains of the PDB that form the site, and about the source of the information/annotation. **Don't forget to appropriately reference the information sources.**

In [33]:
site.info

{'modulator_info': [{'modulator': [{'label_asym_id': 'F'}],
   'label_entity_id': '2',
   'type': 'polymer',
   'pdbx_description': 'hydrophobic-motif peptide of PKB/Akt',
   'polymer_type': 'polypeptide(L)',
   'length': 8},
  {'modulator': [{'label_asym_id': 'G'}],
   'label_entity_id': '2',
   'type': 'polymer',
   'pdbx_description': 'hydrophobic-motif peptide of PKB/Akt',
   'polymer_type': 'polypeptide(L)',
   'length': 8}],
 'interacting_chains_info': [{'label_entity_id': '1',
   'interacting_chains': {'label_asym_id': ['A', 'B']},
   'polymer_type': 'polypeptide(L)',
   'Uniprot': ['O15530']}],
 'source': {'allosteric_database': [{'entry': [{'target_id': 'ASD00060000_1',
      'target_gene': 'PDPK1',
      'organism': 'Homo sapiens',
      'pdb_uniprot': 'O15530',
      'allosteric_pdb': '5LVP',
      'modulator_serial': None,
      'modulator_alias': 'HM-peptide',
      'modulator_chain': 'E',
      'modulator_class': 'Pep',
      'modulator_feature': 'Activator',
      'modul

In [34]:
site.info.keys()

dict_keys(['modulator_info', 'interacting_chains_info', 'source'])

`related_sites` holds information about other molecules of the modulator that bind elsewhere in the structure, forming either an equivalent site (and thus not annotated separately) or a non-equivalent one (which may constitute a different Site, or is not annotated if the source does not consider it an allosteric site). These are identified during creation; see the `creation` module:

In [35]:
site.related_sites

{'equivalent': [{'other_site': {'label_asym_id': ['E']},
   'res_of_other_in_site': 1.0,
   'res_of_site_in_other': 0.875},
  {'other_site': {'label_asym_id': ['H']},
   'res_of_other_in_site': 0.9583333333333334,
   'res_of_site_in_other': 0.9583333333333334}],
 'nonequivalent': []}
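One plausible reading of the two fractions, based only on their field names, is the share of each site's residues that also appears in the other. The sketch below computes them for two made-up residue sets under that assumption; the `overlap_fractions` helper is hypothetical and the package's actual calculation may differ.

```python
def overlap_fractions(site_residues, other_residues):
    """Fractions of shared residues relative to each set (assumed interpretation)."""
    shared = site_residues & other_residues
    return {
        "res_of_other_in_site": len(shared) / len(other_residues),
        "res_of_site_in_other": len(shared) / len(site_residues),
    }


# Made-up residue numbers: the other site lacks one residue of the annotated site.
site_residues = {75, 76, 77, 115, 118, 119, 123, 124}
other_residues = {75, 76, 77, 115, 118, 119, 123}

fractions = overlap_fractions(site_residues, other_residues)
print(fractions)
```

With eight residues in one set and seven of them in the other, the two fractions are 7/7 and 7/8, the same pair of values reported for the first equivalent site above.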

### Others

The `.site` attribute together with `.modulator` defines all of the residues that are present in the site when viewed as a structure, and provides the information needed to define other properties such as `.atoms` and `.residues` (taken from the structural information of the parent PDB), `.modulator_residues`, `.protein_residues` (if the site is formed on a protein), and even a `.cif` with only the residues/atoms/structure of the site:

In [73]:
site.residues

Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
1,LYS,A,1,28,?,76,LYS,A,1,28,UNP,O15530,76,K
2,LYS,A,1,29,?,77,LYS,A,1,29,UNP,O15530,77,K
3,LYS,A,1,67,?,115,LYS,A,1,67,UNP,O15530,115,K
4,ILE,A,1,70,?,118,ILE,A,1,70,UNP,O15530,118,I
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,TYR,G,2,13,?,13,TYR,G,1,13,?,?,?,?
62,SER,G,2,14,?,14,SER,G,1,14,?,?,?,?
63,ALA,G,2,15,?,15,ALA,G,1,15,?,?,?,?
64,HOH,O,5,.,?,607,HOH,A,1,607,?,?,?,?


In [74]:
site.modulator_residues # multiple residues because it is a peptide

Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,PHE,F,2,8,?,8,PHE,F,1,8,?,?,?,?
1,PRO,F,2,9,?,9,PRO,F,1,9,?,?,?,?
2,GLN,F,2,10,?,10,GLN,F,1,10,?,?,?,?
3,PHE,F,2,11,?,11,PHE,F,1,11,?,?,?,?
4,SEP,F,2,12,?,12,SEP,F,1,12,?,?,?,?
5,TYR,F,2,13,?,13,TYR,F,1,13,?,?,?,?
6,SER,F,2,14,?,14,SER,F,1,14,?,?,?,?
7,ALA,F,2,15,?,15,ALA,F,1,15,?,?,?,?
8,PHE,G,2,8,?,8,PHE,G,1,8,?,?,?,?
9,PRO,G,2,9,?,9,PRO,G,1,9,?,?,?,?


In [39]:
site.protein_residues # even though the modulator is a peptide, only the "target"/host protein's residues are included

Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
2,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
3,ARG,A,1,83,?,131,ARG,A,1,83,UNP,O15530,131,R
4,ARG,A,1,88,?,136,ARG,A,1,88,UNP,O15530,136,R
5,ARG,B,1,27,?,75,ARG,B,1,27,UNP,O15530,75,R
6,ARG,B,1,83,?,131,ARG,B,1,83,UNP,O15530,131,R
7,ARG,B,1,88,?,136,ARG,B,1,88,UNP,O15530,136,R
8,GLN,A,1,102,?,150,GLN,A,1,102,UNP,O15530,150,Q
9,GLN,B,1,102,?,150,GLN,B,1,102,UNP,O15530,150,Q
12,ILE,A,1,70,?,118,ILE,A,1,70,UNP,O15530,118,I
13,ILE,A,1,71,?,119,ILE,A,1,71,UNP,O15530,119,I


A Site is also a child class of `Structure` and thus has a .cif file with its information and can be visualized:

In [76]:
print(site.cif.text)

data_Site
#
_entry.id       Site

#
loop_
_atom_site.label_comp_id                 
_atom_site.label_asym_id                 
_atom_site.label_entity_id               
_atom_site.label_seq_id                  
_atom_site.pdbx_PDB_ins_code             
_atom_site.auth_seq_id                   
_atom_site.auth_comp_id                  
_atom_site.auth_asym_id                  
_atom_site.pdbx_PDB_model_num            
_atom_site.pdbx_label_index              
_atom_site.pdbx_sifts_xref_db_name       
_atom_site.pdbx_sifts_xref_db_acc        
_atom_site.pdbx_sifts_xref_db_num        
_atom_site.pdbx_sifts_xref_db_res        
_atom_site.group_PDB                     
_atom_site.id                            
_atom_site.type_symbol                   
_atom_site.label_atom_id                 
_atom_site.label_alt_id                  
_atom_site.Cartn_x                       
_atom_site.Cartn_y                       
_atom_site.Cartn_z                       
_atom_site.occupancy              

In [40]:
site.view()

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [41]:
v = site.view()
v

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [42]:
v.color_site(site)

### Site-derived

Other properties are `.assembly_site`, `.nonredundant_site_pdb` (and `.nonredundant_site`, derived from it), and `.minimal_site_pdb` (see their docs). All of them are `Structure` objects themselves, with `@property`-decorated (and cached) methods that transform and return different information:

In [77]:
print(site.assembly_site.__doc__)
site.assembly_site


    BaseSite-type class to obtain and manage the site formed by an annotated modulator(s) in the structure of its PDB assembly
    


<src.siteutils.AssemblySite at 0x7f4652bcba60>

In [82]:
print(
    "Number of site residues in PDB model:", len(site.residues), "\n"
    "Number of site residues in PDB assembly:", len(site.assembly_site.residues)
)
site.assembly_site.residues

Number of site residues in PDB model: 66 
Number of site residues in PDB assembly: 33


Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
1,LYS,A,1,28,?,76,LYS,A,1,28,UNP,O15530,76,K
2,LYS,A,1,29,?,77,LYS,A,1,29,UNP,O15530,77,K
3,LYS,A,1,67,?,115,LYS,A,1,67,UNP,O15530,115,K
4,ILE,A,1,70,?,118,ILE,A,1,70,UNP,O15530,118,I
5,ILE,A,1,71,?,119,ILE,A,1,71,UNP,O15530,119,I
6,LYS,A,1,75,?,123,LYS,A,1,75,UNP,O15530,123,K
7,VAL,A,1,76,?,124,VAL,A,1,76,UNP,O15530,124,V
8,VAL,A,1,79,?,127,VAL,A,1,79,UNP,O15530,127,V
9,THR,A,1,80,?,128,THR,A,1,80,UNP,O15530,128,T


In [43]:
site.assembly_site.view()

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [44]:
v = pdb.assembly.view()
v

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [45]:
v.color_site(site.assembly_site)

In [46]:
print(site.nonredundant_site_pdb.__doc__)
site.nonredundant_site_pdb


    Structure-type class to obtain a structure with the passed site modulator(s) and the minimum non-redundant amount of protein chains that directly interact with it. Redundancy in a site is defined as different protein chains (different label_asym_id) of the same entity (label_entity_id) interacting with the modulator(s) through the same site in their surface (the same residue names and sequence IDs (% of intersection above the utils.threshold limit)).
    


<src.siteutils.Nonredundant_Site_pdb at 0x7ee43502f7f0>

In [47]:
print(
    "Number of residues in PDB model (not only site):", len(site.pdb.residues), "\n"
    "Number of residues in nonredundant_site_pdb made from the site:", len(site.nonredundant_site_pdb.residues)
)
site.nonredundant_site_pdb.residues

Number of residues in PDB model (not only site): 1428 
Number of residues in nonredundant_site_pdb made from the site: 292


Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,PHE,F,2,8,?,8,PHE,F,1,8,?,?,?,?
1,PRO,F,2,9,?,9,PRO,F,1,9,?,?,?,?
2,GLN,F,2,10,?,10,GLN,F,1,10,?,?,?,?
3,PHE,F,2,11,?,11,PHE,F,1,11,?,?,?,?
4,SEP,F,2,12,?,12,SEP,F,1,12,?,?,?,?
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2170,PRO,A,1,307,?,355,PRO,A,1,307,UNP,O15530,355,P
2177,PRO,A,1,308,?,356,PRO,A,1,308,UNP,O15530,356,P
2184,LYS,A,1,309,?,357,LYS,A,1,309,UNP,O15530,357,K
2193,LEU,A,1,310,?,358,LEU,A,1,310,UNP,O15530,358,L
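The redundancy criterion described in the `Nonredundant_Site_pdb` docstring (chains of the same entity contacting the modulator through the same residues, above a threshold) can be sketched as follows. The helper names, the way the intersection is normalized, and the `0.8` threshold are all assumptions for illustration; the real value lives in `utils.threshold` and is not shown in this document.

```python
def intersection_fraction(a, b):
    """Shared residues relative to the smaller set (assumed normalization)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def nonredundant_chains(chain_residues, threshold=0.8):
    """Keep a chain only if it is not redundant with any chain already kept.

    All chains are assumed to belong to the same entity (label_entity_id).
    """
    kept = []
    for chain, residues in chain_residues.items():
        if all(
            intersection_fraction(residues, chain_residues[other]) < threshold
            for other in kept
        ):
            kept.append(chain)
    return kept


# Made-up contacts as (residue name, sequence ID) pairs per chain.
contacts = {
    "A": {("ARG", "27"), ("LYS", "28"), ("LYS", "29"), ("ILE", "70")},
    "B": {("ARG", "27"), ("LYS", "28"), ("LYS", "29"), ("ILE", "70"), ("THR", "26")},
    "C": {("GLU", "150"), ("PHE", "157")},
}

kept_chains = nonredundant_chains(contacts)
print(kept_chains)
```

Chain B contacts the modulator through the same residues as chain A and is dropped, while chain C uses a different surface and is kept.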


In [48]:
print(site.nonredundant_site.__doc__)
site.nonredundant_site

BaseAllosite-type object with the (nonredundant) site that the annotated modulator forms in the present structure


<src.siteutils.BaseAllosite at 0x7ee43c1babf0>

In [49]:
print(
    "Number of site residues in PDB model:", len(site.residues), "\n"
    "Number of site residues in PDB assembly:", len(site.assembly_site.residues), "\n"
    "Number of site residues in PDB model's nonredundant site (without modulator residues):", len(site.nonredundant_site.residues)
)
site.nonredundant_site.residues

Number of site residues in PDB model: 66 
Number of site residues in PDB assembly: 33 
Number of site residues in PDB model's nonredundant site (without modulator residues): 40


Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,PHE,F,2,8,?,8,PHE,F,1,8,?,?,?,?
1,PRO,F,2,9,?,9,PRO,F,1,9,?,?,?,?
2,GLN,F,2,10,?,10,GLN,F,1,10,?,?,?,?
3,PHE,F,2,11,?,11,PHE,F,1,11,?,?,?,?
4,SEP,F,2,12,?,12,SEP,F,1,12,?,?,?,?
5,TYR,F,2,13,?,13,TYR,F,1,13,?,?,?,?
6,SER,F,2,14,?,14,SER,F,1,14,?,?,?,?
7,ALA,F,2,15,?,15,ALA,F,1,15,?,?,?,?
8,PHE,G,2,8,?,8,PHE,G,1,8,?,?,?,?
9,PRO,G,2,9,?,9,PRO,G,1,9,?,?,?,?


In [50]:
v = site.nonredundant_site_pdb.view()
v

Viz(bg_color='#F7F7F7', color_data={'data': [{'color': 'white'}], 'nonSelectedColor': None, 'keepColors': Fals…

In [51]:
v.color_site(site.nonredundant_site) # the redundancy is eliminated only among the target/host protein chains

Additionally, since `.assembly_site` is a Site-type object itself, it also has derived methods/properties:

In [52]:
print(
    "Number of site residues in PDB model:", len(site.residues), "\n"
    "Number of site residues in PDB assembly:", len(site.assembly_site.residues), "\n"
    "Number of site residues in PDB model's nonredundant site (without modulator residues):", len(site.nonredundant_site.residues), "\n"
    "Number of site residues in PDB assembly's nonredundant site (without modulator residues):", len(site.assembly_site.nonredundant_site.residues)
)
site.assembly_site.nonredundant_site.residues

Number of site residues in PDB model: 66 
Number of site residues in PDB assembly: 33 
Number of site residues in PDB model's nonredundant site (without modulator residues): 40 
Number of site residues in PDB assembly's nonredundant site (without modulator residues): 32


Unnamed: 0,label_comp_id,label_asym_id,label_entity_id,label_seq_id,pdbx_PDB_ins_code,auth_seq_id,auth_comp_id,auth_asym_id,pdbx_PDB_model_num,pdbx_label_index,pdbx_sifts_xref_db_name,pdbx_sifts_xref_db_acc,pdbx_sifts_xref_db_num,pdbx_sifts_xref_db_res
0,PHE,F,2,8,?,8,PHE,F,1,8,?,?,?,?
1,PRO,F,2,9,?,9,PRO,F,1,9,?,?,?,?
2,GLN,F,2,10,?,10,GLN,F,1,10,?,?,?,?
3,PHE,F,2,11,?,11,PHE,F,1,11,?,?,?,?
4,SEP,F,2,12,?,12,SEP,F,1,12,?,?,?,?
5,TYR,F,2,13,?,13,TYR,F,1,13,?,?,?,?
6,SER,F,2,14,?,14,SER,F,1,14,?,?,?,?
7,ALA,F,2,15,?,15,ALA,F,1,15,?,?,?,?
8,ARG,A,1,27,?,75,ARG,A,1,27,UNP,O15530,75,R
9,LYS,A,1,28,?,76,LYS,A,1,28,UNP,O15530,76,K


In [53]:
print(site.minimal_site_pdb.__doc__)
site.minimal_site_pdb


    Structure-type class to make objects that will only contain the minimum amount necessary of "molecules" (i.e., label_entity_id) from the object from which it is intialized in order to comprise a complete protein chain(s) + the ligand(s) it(they) is(are) directly interacting with.

    TODO: should this be deprecated/deleted?
    


<src.utils.Minimal_pdb at 0x7ee43f643e80>

In [54]:
print(site.minimal_site_pdb.cif.text)

data_minimal_site_pdb
#
_entry.id       minimal_site_pdb

#
loop_
_atom_site.label_comp_id                 
_atom_site.label_asym_id                 
_atom_site.label_entity_id               
_atom_site.label_seq_id                  
_atom_site.pdbx_PDB_ins_code             
_atom_site.auth_seq_id                   
_atom_site.auth_comp_id                  
_atom_site.auth_asym_id                  
_atom_site.pdbx_PDB_model_num            
_atom_site.pdbx_label_index              
_atom_site.pdbx_sifts_xref_db_name       
_atom_site.pdbx_sifts_xref_db_acc        
_atom_site.pdbx_sifts_xref_db_num        
_atom_site.pdbx_sifts_xref_db_res        
_atom_site.group_PDB                     
_atom_site.id                            
_atom_site.type_symbol                   
_atom_site.label_atom_id                 
_atom_site.label_alt_id                  
_atom_site.Cartn_x                       
_atom_site.Cartn_y                       
_atom_site.Cartn_z                       
_atom_site