<a href="https://colab.research.google.com/github/rcsb/py-rcsb-api/blob/master/notebooks/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install rcsb-api

In [55]:
from rcsbapi.data import Schema, Query
from pprint import pprint

## RCSB PDB Data API: Quick-start

This quick-start notebook walks through the basics of making queries with this package using a simple example. For more in-depth documentation, refer to the [readthedocs page](https://py-rcsb-api.readthedocs.io/en/latest/index.html).

\
Install the package:

```pip install rcsb-api```

\
In this notebook, we will work with the query below and a few related queries. This GraphQL query requests the non-polymer components bound to a structure (ions, cofactors, etc.).

```
{
  entry(entry_id: "4HHB") {
    rcsb_entry_info {
      nonpolymer_bound_components
    }
  }
}
```
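For comparison, sending this GraphQL query without the package means wrapping it in a JSON body yourself. The sketch below only builds that body and does not send anything. The endpoint URL and the helper name `build_payload` are assumptions for illustration and are not taken from this notebook.

```python
import json

# Assumed endpoint of the RCSB PDB Data API (not stated in this notebook).
DATA_API_URL = "https://data.rcsb.org/graphql"

GRAPHQL_QUERY = """
{
  entry(entry_id: "4HHB") {
    rcsb_entry_info {
      nonpolymer_bound_components
    }
  }
}
"""


def build_payload(query_text: str) -> str:
    """Serialize a GraphQL query into the JSON body conventionally sent via HTTP POST.

    Hypothetical helper: it is not part of the rcsb-api package.
    """
    return json.dumps({"query": query_text.strip()})


payload = build_payload(GRAPHQL_QUERY)
print(f"POST {DATA_API_URL}")
print(payload)
```

The package takes care of this step, along with writing the query text itself.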

## Making Queries

To make the equivalent query with this package, create a Query object as shown below.

The Query object automatically generates the GraphQL query, and calling `exec()` sends the request to our Data API. The JSON response can then be accessed with the `get_response()` method.

In [None]:
# "entry_id" is the key in the input_ids dictionary
query = Query(input_type="entry", input_ids={"entry_id":"4HHB"}, return_data_list=["nonpolymer_bound_components"])
query.exec()
pprint(query.get_response())
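The response is nested JSON, so reading a single value means walking down through several keys. Below is a minimal sketch of that, assuming the response mirrors the GraphQL query under a top-level `data` key, as GraphQL responses conventionally do. The dictionary is a hand-written stand-in rather than captured output, and `dig` is a hypothetical helper that is not part of the package.

```python
from functools import reduce

# Hand-written stand-in for a response; the component value is a placeholder,
# not output captured from the Data API.
sample_response = {
    "data": {
        "entry": {
            "rcsb_entry_info": {
                "nonpolymer_bound_components": ["HEM"]
            }
        }
    }
}


def dig(response: dict, dotted_path: str):
    """Follow a dot-separated key path through nested dictionaries."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), response)


components = dig(sample_response, "data.entry.rcsb_entry_info.nonpolymer_bound_components")
print(components)
```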

Making a query requires three arguments: input_type, input_ids, and return_data_list.

## input_types
input_types are the designated points where a query can begin. Some examples are entry, polymer_entity, and polymer_entity_instance. You can also begin a query at uniprot or pubmed using their IDs. For a full list of input_types, see the [README](https://github.com/rcsb/py-rcsb-api/blob/dev-it-schema-parse/README.md#input_types).

If you're unsure of which input_type would be best and are using a PDB ID (4HHB, 4HHB_1, 4HHB.A, 4HHB-1), you can generally begin at entry. This may produce a more verbose query that can later be refined.

## input_ids
input_ids are accepted as a dictionary or a list of PDB-format IDs. input_id dictionaries have specific keys depending on the input_type (entry, polymer_entity, etc). To get the keys associated with an input_type, use the `get_input_id_dict(<input_type>)` method.

In [None]:
# requires multiple keys to specify a polymer_entity_instance
query = Query(input_type="polymer_entity_instance", input_ids={"entry_id":"4HHB", "asym_id":"A"}, return_data_list=["nonpolymer_bound_components"])
query.exec()
pprint(query.get_response())
# Note that this query returns the same information but has to go back to entry to get it. It is more efficient to use the entry input_type, as above.

In [None]:
# to get the dictionary keys and descriptions for a given input_type, use the get_input_id_dict method
schema = Schema()  # create an instance of the API Schema
pprint(schema.get_input_id_dict("polymer_entity_instance"))

input_id lists must be passed in PDB ID format:

|Type | Format | Example |
|---|---|---|
|entries | [entry_id] | 4HHB |
|polymer, branched, or non-polymer entities | [entry_id]_[entity_id] | 4HHB_1 |
|polymer, branched, or non-polymer entity instances | [entry_id].[asym_id] | 4HHB.A |
|biological assemblies | [entry_id]-[assembly_id] | 4HHB-1 |
|interface | [entry_id]-[assembly_id].[interface_id] | 4HHB-1.1 |
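As a rough illustration of how these formats differ, the sketch below classifies an ID string by its separators, following the Example column of the table. The regular expressions are simplifying assumptions (4-character entry IDs, numeric entity, assembly, and interface IDs), and `classify_pdb_id` is a hypothetical helper, not a function of the package.

```python
import re

# Patterns follow the Example column of the table above. The 4-character
# entry ID and the numeric ID parts are simplifying assumptions.
ENTRY = r"[0-9A-Za-z]{4}"
ID_PATTERNS = [
    ("interface", re.compile(rf"{ENTRY}-\d+\.\d+")),
    ("assembly", re.compile(rf"{ENTRY}-\d+")),
    ("entity", re.compile(rf"{ENTRY}_\d+")),
    ("entity instance", re.compile(rf"{ENTRY}\.[0-9A-Za-z]+")),
    ("entry", re.compile(ENTRY)),
]


def classify_pdb_id(pdb_id: str) -> str:
    """Return the kind of object a PDB-format ID string refers to."""
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(pdb_id):
            return label
    raise ValueError(f"Unrecognized PDB ID format: {pdb_id!r}")


for example in ["4HHB", "4HHB_1", "4HHB.A", "4HHB-1", "4HHB-1.1"]:
    print(example, "->", classify_pdb_id(example))
```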

The examples below pass lists for input_ids and are equivalent to the dictionary examples above. Note that even when there is only one input ID, the argument must be a list, not a string.

In [None]:
query = Query(input_type="entry", input_ids=["4HHB"], return_data_list=["nonpolymer_bound_components"])
query.exec()
pprint(query.get_response())

In [None]:
# uses PDB ID format
query = Query(input_type="polymer_entity_instance", input_ids=["4HHB.A"], return_data_list=["nonpolymer_bound_components"])
query.exec()
pprint(query.get_response())

## return_data_list
return_data_list contains the fields (data) you are requesting in your query.

Some field names are not specific enough on their own and must be given as a path of multiple fields separated by dots. You can search for the dot notation of a field by using the `find_paths(input_type, field_name)` method.

In [None]:
# This return_data_list isn't specific enough, so it raises a ValueError. The ValueError will list up to 10 valid paths.
query = Query(input_type="polymer_entity_instance", input_ids=["4HHB.A"], return_data_list=["polymer_composition"])

In [None]:
# run find_paths("polymer_entity_instance", "polymer_composition") to list the valid paths
schema = Schema()
schema.find_paths("polymer_entity_instance", "polymer_composition")

In [None]:
# By looking through the list, find the intended field
query = Query(input_type="polymer_entity_instance", input_ids=["4HHB.A"], return_data_list=["polymer_entity.entry.rcsb_entry_info.polymer_composition"])
query.exec()
pprint(query.get_response())
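The dot notation corresponds to nesting in the underlying GraphQL query: each dot descends one level. The sketch below renders a dotted path as a nested selection to make that correspondence visible. It is only an illustration, not the package's actual query generator, and `path_to_selection` is a hypothetical helper.

```python
def path_to_selection(dotted_path: str, indent: int = 2) -> str:
    """Render a dot-separated field path as a nested GraphQL selection."""
    fields = dotted_path.split(".")
    last = len(fields) - 1
    lines = []
    # Open one level per field; only the final (leaf) field has no braces.
    for depth, field in enumerate(fields):
        pad = " " * (indent * depth)
        lines.append(f"{pad}{field}" if depth == last else f"{pad}{field} {{")
    # Close the opened levels from the innermost outward.
    for depth in range(last - 1, -1, -1):
        lines.append(" " * (indent * depth) + "}")
    return "\n".join(lines)


selection = path_to_selection("polymer_entity.entry.rcsb_entry_info.polymer_composition")
print(selection)
```

Read this way, the path starts at the polymer entity instance, steps up to its polymer entity and entry, and then down to the entry-level `polymer_composition` field.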

If you're unsure which field to use, you can call `find_field_names(<search string>)`. This method will also return partial matches.

In [None]:
schema = Schema()
pprint(schema.find_field_names("comp"))

## More Complex Queries

You can make more complex queries by querying multiple IDs at once or by requesting more fields in the return_data_list.

In [None]:
# search multiple ids. Note the input_type changed from "entry" to "entries"
query = Query(
    input_type="entries",
    input_ids=["4HHB", "12CA", "3PQR"],
    return_data_list=["nonpolymer_bound_components"]
)
query.exec()
pprint(query.get_response())

In [None]:
# search multiple fields
query = Query(
    input_type="entry",
    input_ids={"entry_id": "4HHB"},
    return_data_list=[
        "citation.title",
        "nonpolymer_bound_components",
        "rcsb_entry_info.polymer_composition"
    ]
)
query.exec()
pprint(query.get_response())
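When several requested fields share a parent, they end up grouped under that parent in the nested query and response. The sketch below merges fully dotted paths into one nested dictionary to show the grouping. The path `rcsb_entry_info.nonpolymer_bound_components` comes from the GraphQL query at the top of this notebook. `merge_paths` is a hypothetical helper and does not reflect how the package builds queries internally.

```python
def merge_paths(paths):
    """Merge dot-separated field paths into a single nested dictionary."""
    tree = {}
    for path in paths:
        node = tree
        for field in path.split("."):
            # Reuse the branch if an earlier path already created it.
            node = node.setdefault(field, {})
    return tree


tree = merge_paths([
    "citation.title",
    "rcsb_entry_info.nonpolymer_bound_components",
    "rcsb_entry_info.polymer_composition",
])
print(tree)
```

Both `rcsb_entry_info` fields land under a single `rcsb_entry_info` key, while `citation` forms its own branch.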