---
badges: true
author: "Samdani Ansar"
categories:
- Structural Bioinformatics
date: '2023-04-23'
title: RCSB-PDB API
description: Using the RCSB-PDB APIs to search and retrieve data from the PDB database
toc: true
image: images/edia.png

---

Parse an input PDB/CIF file and:

* get information on missing atoms
* get information on missing residues
* map residue numbers (using SIFTS)
* identify covalent interactions
* list ligand IDs
* find missing ligand atoms
* get symmetry information
* parse EDIA scores and propose probable low-density regions
* split the PDB structure by chain, alternate conformation, etc.
* generate symmetry

The [PDB-101](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction) website describes the different kinds of data available in the PDB database.

An introduction to the RCSB-PDB APIs is available [here](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction-to-rcsb-pdb-apis). In this post we will see which APIs are available and how to use them to get data from the PDB database.

More detailed information on programmatic access to the PDB database can be found [here](https://www.rcsb.org/docs/programmatic-access/web-services-overview).

# DATA API

The DATA API tutorial can be found [here](https://data.rcsb.org/index.html).


The RCSB PDB documentation describes the Data API as follows. All static data exposed on rcsb.org is available through the Data API. The schema follows the mmCIF dictionary, extended with annotations from external resources. The core PDB data is split into core objects, one per level of the structural data hierarchy, with entity subdivided into polymeric and non-polymeric subschemas (which differs from the mmCIF dictionary). These are some of the core objects:

* `core_entry`: data that relates to a PDB entry or Computed Structure Model (CSM). Identified by an `entry_id`, which can be an alphanumeric PDB ID or a CSM ID that starts with `AF_` or `MA_`.
* `core_polymer_entity`: data for each polymeric molecular entity in an entry (e.g., protein, DNA, and RNA). Identified by entry ID and entity ID separated by a `_` character, e.g. `3PQR_1`.
* `core_nonpolymer_entity`: data for each non-polymeric small chemical entity in an entry (e.g., enzyme cofactors, ligands, ions). Identified by entry ID and entity ID separated by a `_` character.
* `core_branched_entity`: data for branched molecules (e.g., oligosaccharides). Identified by entry ID and entity ID separated by a `_` character.
* `core_assembly`: data for each biological assembly in an entry. Identified by entry ID and assembly ID separated by a `-` character.
* `core_polymer_entity_instance`: an instance of a certain polymeric molecular entity, also known as a chain. Identified by entry ID and asym ID separated by a `.` character.
* `core_chem_comp`: a chemical component. Identified by a unique alphanumeric code, `chem_comp_id`.

Both internal additions to the mmCIF dictionary and annotations from external resources are prefixed with `rcsb_`. In each core object, the `rcsb_<core_object>_container_identifiers` field holds the cardinal identifiers for the object and any parent or child objects. Additionally, every core object contains a single string identifier in the field `rcsb_id`.
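
As a rough illustration of how these identifiers map onto REST requests, the sketch below builds the URL for a core object from its identifiers. It assumes the `/rest/v1` server prefix and the `/core/<object>/<ids...>` path layout reported by the API docs shown later in this post. The helper names `core_object_url` and `fetch_core_object` are made up for this example, and the standard library is used so the sketch has no extra dependencies.

```python
import json
import urllib.request

# Assumed base URL: host plus the '/rest/v1' server prefix from the API docs.
BASE_URL = "https://data.rcsb.org/rest/v1/core"


def core_object_url(object_type, *ids):
    """Build the REST URL for one core object, e.g. ('polymer_entity', '3PQR', '1')."""
    if not ids:
        raise ValueError("at least one identifier is required")
    return "/".join([BASE_URL, object_type, *ids])


def fetch_core_object(object_type, *ids, timeout=30):
    """Download one core object and return it as a Python dict."""
    url = core_object_url(object_type, *ids)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


entry_url = core_object_url("entry", "3PQR")
entity_url = core_object_url("polymer_entity", "3PQR", "1")
print(entry_url)
print(entity_url)

# Network call, left commented out so the sketch runs offline:
# entry = fetch_core_object("entry", "3PQR")
# print(entry["rcsb_id"])
```

Each call returns a single object, which matches the one-object-at-a-time behaviour of the REST interface.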


There are two main APIs available for the RCSB PDB database:
* **Data API** - retrieves data when the PDB identifiers are known.
* **Search API** - finds entries that match the search conditions provided.

The data can be obtained through two interfaces:
* **[REST API](https://data.rcsb.org/redoc/index.html)** - allows data retrieval for one object at a time.
* **[GraphQL API](https://data.rcsb.org/graphql/index.html)** - offers a flexible interface for data retrieval. To use it programmatically, POST your GraphQL queries to the data.rcsb.org/graphql endpoint.

The output from both interfaces is in JSON format.
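
A minimal sketch of a programmatic GraphQL request might look like the following. The field selection (`struct.title`, `exptl.method`) and the helper names `build_request` and `run_query` are illustrative choices rather than anything prescribed above, and the actual network call is left commented out.

```python
import json
import urllib.request

GRAPHQL_URL = "https://data.rcsb.org/graphql"

# Illustrative query: title and experimental method for one entry.
QUERY = """
query ($id: String!) {
  entry(entry_id: $id) {
    rcsb_id
    struct { title }
    exptl { method }
  }
}
"""


def build_request(entry_id):
    """Prepare a POST request carrying the GraphQL query as a JSON body."""
    body = json.dumps({"query": QUERY, "variables": {"id": entry_id}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(entry_id, timeout=30):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(entry_id), timeout=timeout) as resp:
        return json.load(resp)


request = build_request("4HHB")
print(request.get_method(), request.full_url)

# Network call, left commented out so the sketch runs offline:
# print(run_query("4HHB"))
```

Unlike the REST interface, one GraphQL request can select exactly the fields needed, which is why it suits bulk or nested lookups.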

The API docs are also available as JSON [here](https://data.rcsb.org/redoc/rcsb-restful-api-docs.json).

In [16]:
import requests

# Download the OpenAPI description of the REST API
response = requests.get('https://data.rcsb.org/redoc/rcsb-restful-api-docs.json', timeout=30)
response.raise_for_status()  # stop early on an HTTP error

# Parse the response body as JSON (JSON is UTF-8 encoded, not ISO-8859-1)
json_data = response.json()

# Access and work with the JSON data
print(json_data)


{'openapi': '3.0.1', 'info': {'title': 'RCSB RESTful API', 'description': 'Provides programmatic access to information and annotations stored in the Protein Data Bank. <br>Models are generated from JSON schema version: <b>1.42.0</b>. <br>API services deployed on: Sun, 21 May 2023 10:19:27 -0700', 'contact': {'name': 'RCSB PDB', 'url': 'www.rcsb.org', 'email': 'info@rcsb.org'}, 'version': '1.42.0'}, 'servers': [{'url': '/rest/v1'}], 'tags': [{'name': 'Assembly Service', 'description': 'provides access to information about structures at the quaternary structure level'}, {'name': 'Entity Service', 'description': 'provides access to information about structures at the level of unique molecular entities'}, {'name': 'Entity Instance Service', 'description': 'provides access to information about structures at the level of unique molecular instances (chains)'}, {'name': 'Chemical Component Service', 'description': 'provides access to information about chemical components from which the relevan

In [17]:
json_data.keys()

dict_keys(['openapi', 'info', 'servers', 'tags', 'paths', 'components'])

In [18]:
json_data['openapi']

'3.0.1'

In [19]:
json_data['info']

{'title': 'RCSB RESTful API',
 'description': 'Provides programmatic access to information and annotations stored in the Protein Data Bank. <br>Models are generated from JSON schema version: <b>1.42.0</b>. <br>API services deployed on: Sun, 21 May 2023 10:19:27 -0700',
 'contact': {'name': 'RCSB PDB',
  'url': 'www.rcsb.org',
  'email': 'info@rcsb.org'},
 'version': '1.42.0'}

In [20]:
json_data['servers']

[{'url': '/rest/v1'}]

In [21]:
json_data['tags']

[{'name': 'Assembly Service',
  'description': 'provides access to information about structures at the quaternary structure level'},
 {'name': 'Entity Service',
  'description': 'provides access to information about structures at the level of unique molecular entities'},
 {'name': 'Entity Instance Service',
  'description': 'provides access to information about structures at the level of unique molecular instances (chains)'},
 {'name': 'Chemical Component Service',
  'description': 'provides access to information about chemical components from which the relevant chemical structures can be constructed.'},
 {'name': 'Entry Service',
  'description': 'provides access to information about structures at the top entry level'},
 {'name': 'Groups Service',
  'description': 'provides access to groups formed by aggregating individual structures, sequences or assemblies that share a degree of similarity'},
 {'name': 'Interface Service',
  'description': 'provides access to information about pairw

In [24]:
json_data['paths']

{'/core/assembly/{entry_id}/{assembly_id}': {'get': {'tags': ['Assembly Service'],
   'summary': 'Get structural assembly description by ENTRY ID and ASSEMBLY ID.',
   'operationId': 'getAssemblyById',
   'parameters': [{'name': 'entry_id',
     'in': 'path',
     'description': 'ENTRY ID of the entry.',
     'required': True,
     'schema': {'type': 'string', 'example': '1RH7'}},
    {'name': 'assembly_id',
     'in': 'path',
     'description': 'ASSEMBLY ID of the biological assembly candidate.',
     'required': True,
     'schema': {'type': 'string', 'example': '1'}}],
   'responses': {'200': {'description': 'OK',
     'content': {'application/json;charset=utf-8': {'schema': {'$ref': '#/components/schemas/CoreAssembly'}}}},
    '404': {'description': 'Not Found'}}}},
 '/core/branched_entity/{entry_id}/{entity_id}': {'get': {'tags': ['Entity Service'],
   'summary': 'Get branched entity description by ENTRY ID and ENTITY ID.',
   'operationId': 'getBranchedEntityById',
   'parameter
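
The raw `paths` dictionary is long, so a compact summary can be easier to read. The sketch below uses a small hand-copied excerpt shaped like the output above instead of the live `json_data`, and the helper name `summarize_paths` is made up for this example. The parameters of the second endpoint are omitted because the output above is truncated at that point.

```python
# Excerpt shaped like json_data['paths']; only the fields used here are kept.
paths = {
    "/core/assembly/{entry_id}/{assembly_id}": {
        "get": {
            "tags": ["Assembly Service"],
            "summary": "Get structural assembly description by ENTRY ID and ASSEMBLY ID.",
            "parameters": [
                {"name": "entry_id", "in": "path", "required": True},
                {"name": "assembly_id", "in": "path", "required": True},
            ],
        }
    },
    "/core/branched_entity/{entry_id}/{entity_id}": {
        "get": {
            "tags": ["Entity Service"],
            "summary": "Get branched entity description by ENTRY ID and ENTITY ID.",
            # parameters not reproduced: the printed output is cut off here
        }
    },
}


def summarize_paths(paths):
    """Return one (METHOD, path, tags, parameter names) tuple per operation."""
    rows = []
    for path, methods in paths.items():
        for method, operation in methods.items():
            names = [p["name"] for p in operation.get("parameters", [])]
            rows.append((method.upper(), path, operation.get("tags", []), names))
    return rows


rows = summarize_paths(paths)
for method, path, tags, names in rows:
    print(method, path, tags, names)
```

With the full document loaded, `summarize_paths(json_data['paths'])` should give the same kind of table for every endpoint.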

In [27]:
json_data['components']

{'schemas': {'ClustersMembers': {'required': ['asym_id'],
   'type': 'object',
   'properties': {'asym_id': {'type': 'string',
     'description': 'Internal chain ID used in mmCIF files to uniquely identify structural elements in the asymmetric unit.'},
    'pdbx_struct_oper_list_ids': {'maxItems': 2147483647,
     'minItems': 1,
     'type': 'array',
     'items': {'type': 'string'}}},
   'description': 'Subunits that belong to the cluster, identified by asym_id and optionally by assembly operator id(s).'},
  'CoreAssembly': {'required': ['rcsb_assembly_container_identifiers',
    'rcsb_id'],
   'type': 'object',
   'properties': {'pdbx_struct_assembly': {'$ref': '#/components/schemas/PdbxStructAssembly'},
    'pdbx_struct_assembly_auth_evidence': {'maxItems': 2147483647,
     'minItems': 1,
     'uniqueItems': True,
     'type': 'array',
     'items': {'$ref': '#/components/schemas/PdbxStructAssemblyAuthEvidence'}},
    'pdbx_struct_assembly_gen': {'maxItems': 2147483647,
     'minIt
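
The responses in `paths` point at schemas through `$ref` strings such as `#/components/schemas/CoreAssembly`. A minimal sketch of following such a local reference is given below. It works on a tiny excerpt shaped like the output above, and `resolve_ref` is a hypothetical helper, not part of any RCSB library.

```python
# Excerpt shaped like the API docs; only the 'required' lists are kept.
spec = {
    "components": {
        "schemas": {
            "ClustersMembers": {"required": ["asym_id"], "type": "object"},
            "CoreAssembly": {
                "required": ["rcsb_assembly_container_identifiers", "rcsb_id"],
                "type": "object",
            },
        }
    }
}


def resolve_ref(spec, ref):
    """Follow a local JSON reference like '#/components/schemas/Name'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = spec
    for key in ref[2:].split("/"):
        node = node[key]  # raises KeyError if the reference is dangling
    return node


schema = resolve_ref(spec, "#/components/schemas/CoreAssembly")
print(schema["required"])
```

The same function can be applied to the full `json_data` to look up the schema behind any endpoint response.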

# Search API

The Search API programmatically exposes all search functionality available at rcsb.org. Queries with arbitrary Boolean logic can be run across all data available in the RCSB PDB Data API using a JSON-format query language. At the root level, text-based searches (any text or numerical field in the RCSB PDB Data API) can also be combined with protein/nucleotide sequence searches (MMseqs2 software) and structure similarity searches (BioZernike software, described in Guzenko et al. 2020). All output from the Search API is in JSON format.

* [SearchAPI Tutorial](https://search.rcsb.org/index.html)
* [SearchAPI Reference](https://search.rcsb.org/redoc/index.html)

An example of using the [SearchAPI](https://search.rcsb.org/#search-example-15)
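
To make the JSON query language a little more concrete, here is a hedged sketch that combines two attribute searches with Boolean AND. The endpoint path, the attribute names (`exptl.method`, `rcsb_entry_info.resolution_combined`), and the operators are assumptions based on the Search API reference and should be checked there. The helper names `terminal`, `group`, and `run_search` are made up for this example.

```python
import json
import urllib.request

# Assumed endpoint; check the SearchAPI reference for the current version.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def terminal(attribute, operator, value):
    """One attribute condition handled by the text search service."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Combine several query nodes with 'and' or 'or'."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# X-ray entries with a resolution of 2.0 Å or better.
query = {
    "query": group(
        "and",
        terminal("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
        terminal("rcsb_entry_info.resolution_combined", "less_or_equal", 2.0),
    ),
    "return_type": "entry",
}


def run_search(query, timeout=30):
    """POST the query and return the matching identifiers."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status == 204:  # no content: the search matched nothing
            return []
        return [hit["identifier"] for hit in json.load(resp).get("result_set", [])]


print(json.dumps(query, indent=2))

# Network call, left commented out so the sketch runs offline:
# print(run_search(query)[:10])
```

Nesting `group` calls inside each other gives arbitrary Boolean expressions, which is the main point of the query language.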

# ModelServer API

The ModelServer is a service for accessing subsets of macromolecular model data. It delivers atomic coordinates together with annotations in the primary data files in a compressed BinaryCIF encoding (BCIF). Structure data can be served at different levels of granularity (e.g., assembly, polymer chain, ligand), and ligand data may also be delivered in popular chemical informatics formats (e.g., SDF, MOL, MOL2).

More details on the ModelServer API are available [here](https://models.rcsb.org/)
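
As a rough sketch of what requests at different levels of granularity could look like, the snippet below only builds URLs and does not download anything. The `/v1/<id>/<query>` layout, the `full` and `ligand` query names, and the `encoding` and `label_comp_id` parameters are assumptions drawn from the ModelServer documentation linked above and should be verified there. `model_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout; verify against the ModelServer docs.
MODEL_SERVER = "https://models.rcsb.org/v1"


def model_url(entry_id, query="full", **params):
    """Build a ModelServer URL for one entry, query type and optional parameters."""
    url = f"{MODEL_SERVER}/{entry_id.lower()}/{query}"
    return f"{url}?{urlencode(params)}" if params else url


# Whole entry in BinaryCIF, and a single ligand in SDF format.
full_bcif = model_url("1TQN", "full", encoding="bcif")
ligand_sdf = model_url("1TQN", "ligand", label_comp_id="HEM", encoding="sdf")
print(full_bcif)
print(ligand_sdf)
```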

# VolumeServer API

The VolumeServer is a service for accessing subsets of volumetric data. It automatically downsamples the data depending on the volume of the requested region to reduce the bandwidth requirements and provide near-instant access to even the largest data sets.

More details about the VolumeServer API are available [here](https://maps.rcsb.org/)

# 1D Coordinate Server

The RCSB PDB 1D Coordinate Server compiles alignments between structural and sequence databases and integrates protein positional features from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.

A tutorial for the 1D Coordinate Server is available [here](https://1d-coordinates.rcsb.org/)