# Ligand PDB Searches

## Introduction

In coordination chemistry, ligands are ions or molecules that donate electrons to a central metal atom or ion to form a coordination complex. In the PDB, the term is used more broadly for small molecules and ions bound to a macromolecule. This notebook focuses on the different kinds of PDB searches involving ligands. Although these searches are limited to ligand-related attributes, combining them with other search queries lets you retrieve highly specific results.

Within this notebook, there are a few kinds of ligand searches to look at:

+ Free Versus Polymeric Ligands
+ Ligands of Interest
+ Ligand Binding Affinity
+ Structure-Ligand Complexes
 
Each of these query types has its own parameters, which let you build the search you are interested in. Many of the parameters accept a wide range of values; the code comments indicate whether any value is allowed or which specific values a parameter accepts.

This notebook has a variety of input variables, each explained in the step where it is used. The output of each search is a list of the PDB entries that match your search parameters. These lists can be explored further with libraries such as pandas, which let you build data frames and analyze the data.

Steps:

1. Step 1 - Installing libraries and importing them into the notebook
2. Step 2 - Introducing the Searches
3. Step 3 - Free versus Polymeric Ligands
4. Step 4 - Ligands of Interest
5. Step 5 - Ligand Binding Affinity
6. Step 6 - Structure-Ligand Complexes
7. Step 7 - Next Steps and Combining the Searches


### Questions

1. How can you utilize code with Jupyter notebooks to more effectively run advanced searches in the PDB?
2. How can you focus PDB advanced searches on ligand-related queries?
3. How can you use example searches as models for other specific queries?

### Learning Objectives

This notebook aims to teach you how to use Python to create advanced PDB searches focused on ligands.

### Purpose

This notebook is designed to use Python to facilitate advanced PDB searches focusing on ligand-related interests.

## Notebook Contents

For **novice and intermediate coders**, the code is divided into sequential coding cells that each perform one step in the process.

For **experienced coders**, each search type can be lifted out and used in whichever environment you prefer.

This notebook includes the following steps:

1. Step 1 - Installing libraries and importing them into the notebook
2. Step 2 - Introducing the Searches
3. Step 3 - Free versus Polymeric Ligands
4. Step 4 - Ligands of Interest
5. Step 5 - Ligand Binding Affinity
6. Step 6 - Structure-Ligand Complexes
7. Step 7 - Next Steps and Combining the Searches

## Libraries

The following libraries need to be installed and imported to complete the tasks in this notebook.

| Library | Abbreviation | Contents | Source |
| :-----: | ------------ | :------- | :----- |
| json | N/A | library for working with JavaScript Object Notation (JSON) for data interchange; part of the Python standard library | [json — JSON encoder and decoder](https://docs.python.org/3/library/json.html) |
| rcsbsearchapi | N/A | library for automated searching of the [RCSB Protein Data Bank](https://www.rcsb.org) | [py-rcsbsearchapi on GitHub](https://github.com/rcsb/py-rcsbsearchapi) |
| rcsb_attributes | attrs | component of rcsbsearchapi that provides the searchable PDB attributes (its import is commented out in this notebook) | [py-rcsbsearchapi on GitHub](https://github.com/rcsb/py-rcsbsearchapi) |
| AttributeQuery, Attr | N/A | classes from `rcsbsearchapi.search` used to build attribute-based searches of the PDB | [py-rcsbsearchapi on GitHub](https://github.com/rcsb/py-rcsbsearchapi) |
| pandas | pd | library for manipulating and analyzing tabular data | [pandas documentation](https://pandas.pydata.org/docs/) |

## Installation

These libraries will need to be installed in your computing environment to perform the tasks in this notebook. The `json` library is part of the Python standard library, so it does not need to be installed.

To install from the command line on your computer, use this command (with the `rcsbsearchapi` library as the example):

`pip install rcsbsearchapi`

To install from within a Jupyter notebook or CoLab notebook, you need to type the same command in a coding cell, preceded by an exclamation point.

`!pip install rcsbsearchapi`

These libraries will be imported as they are needed over the course of this notebook.
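
Before running the installation cells, it can help to check which libraries are already present. The sketch below is one way to do that with the standard-library `importlib`; the helper name `missing_libraries` is made up for this illustration and is not part of any of the libraries above.

```python
import importlib.util


def missing_libraries(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three top-level libraries this notebook imports.
required = ["json", "rcsbsearchapi", "pandas"]
missing = missing_libraries(required)

if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Anything reported as missing can then be installed with the `pip` commands shown above.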


#### Novice and Intermediate Coders

In [None]:
# Stepwise code for NOVICE and INTERMEDIATE CODERS
# json ships with Python, so only the third-party libraries need installing
!pip install rcsbsearchapi
!pip install pandas
# These code lines install the corresponding libraries into your computing environment

In [None]:
# Importing the libraries with abbreviations
import json
import rcsbsearchapi
#from rcsbsearchapi import rcsb_attributes as attrs
from rcsbsearchapi.search import AttributeQuery, Attr
import pandas as pd
# These imports make the libraries available in this notebook and define the abbreviations used further down

#### Experienced Coders

In [None]:
# Full block of raw code for EXPERIENCED CODERS
import json
import rcsbsearchapi
#from rcsbsearchapi import rcsb_attributes as attrs
from rcsbsearchapi.search import AttributeQuery, Attr
import pandas as pd

## Introducing the Searches

In the introduction to this notebook, the different ligand searches were introduced. As a reminder, here are the different search types that this notebook encompasses:

+ Free Versus Polymeric Ligands
+ Ligands of Interest
+ Ligand Binding Affinity
+ Structure-Ligand Complexes
    
Each of these searches focuses on different values. Here is a brief overview of each search:

#### Free Versus Polymeric Ligands
This search looks at covalent or non-covalent linkage between chemical components and the molecules deposited in the PDB. You provide a chemical component and a linkage type, and receive a list of all the PDB entries that include that chemical component linked in the specified way.

#### Ligands of Interest
This search focuses on PDB entries in which a specific ligand is the focus of the research or deposition. With this query, you can search for any ligand and find the specific entries that include it.

#### Ligand Binding Affinity
This search allows you to find a variety of molecules that have certain values associated with different binding affinity types. The section for this search breaks this down in more detail; it offers many options, which allow for a very specific output.

#### Structure-Ligand Complexes
This search looks for structure-ligand complexes, either with a specific ligand or with any ligand. There are many potential parameters to adjust, depending on whether you are looking for nucleic acid material or want to omit it. The section for this search explains in more detail what it can look for.

## Free Versus Polymeric Ligands

This search looks at covalent or non-covalent linkage between chemical components and the molecules deposited in the PDB. You provide a chemical component and a linkage type, and receive a list of all the PDB entries that include that chemical component linked in the specified way.

In this search you have two main inputs:

*chemical_component* - which can be any chemical component you are interested in examining linkage with

*linkage_type* - which has two options, either "HAS_NO_COVALENT_LINKAGE" or "HAS_COVALENT_LINKAGE", written exactly as shown, depending on what you are looking for.

Once you determine what you want your input values to be, you can run the code block to gather your outputs.
With these outputs, you can walk through the second half of the code and determine what you are hoping to see with your output and adjust this by either adding pound symbols before a line of code or removing them.

In [None]:
chemical_component = "ATP"
# this is going to be the chemical component that you are searching for linkage with
# hypothetically, you could put any chemical component here - it's really just a question of whether there will be any results

linkage_type = "HAS_NO_COVALENT_LINKAGE"
# this is the linkage type
# the only two options here are "HAS_NO_COVALENT_LINKAGE" or "HAS_COVALENT_LINKAGE"

q1_1 = AttributeQuery(attribute = "rcsb_nonpolymer_instance_annotation.comp_id", operator = "exact_match", value = chemical_component)
# search for the chemical component

q1_2 = AttributeQuery(attribute = "rcsb_nonpolymer_instance_annotation.type", operator = "exact_match", value = linkage_type)
# search for the linkage type

linkedligandquery = q1_1 & q1_2
# this statement combines the previous queries to create the search of both the chemical component AND the linkage type
linkedligandqueryresult = list(linkedligandquery())
# this statement runs the query and collects the resulting PDB IDs into a list

##print(linkedligandqueryresult[:10])
# to use this statement, remove the two pound symbols
# the statement will take the results list and show the PDB IDs of the first 10 results
# by changing the 10 to another number, you can see as many of the first results as you would like

##print(len(linkedligandqueryresult))
# to use this statement, remove the two pound symbols
# this statement will show the number of results within the query parameters

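
One thing to do with these results is to run the search once for each linkage type and compare the two lists. The sketch below assumes you have stored the two result lists yourself; the function name `compare_result_sets` and the IDs are placeholders for illustration, not real search output.

```python
def compare_result_sets(covalent_ids, noncovalent_ids):
    """Split two lists of PDB IDs into IDs unique to each list and IDs in both."""
    covalent = set(covalent_ids)
    noncovalent = set(noncovalent_ids)
    return {
        "covalent_only": sorted(covalent - noncovalent),
        "noncovalent_only": sorted(noncovalent - covalent),
        "both": sorted(covalent & noncovalent),
    }


# Placeholder IDs for illustration only; these are NOT real search results.
covalent_example = ["ID01", "ID02", "ID03"]
noncovalent_example = ["ID03", "ID04"]

summary = compare_result_sets(covalent_example, noncovalent_example)
print(summary)
```

An entry may show up under "both" if it contains more than one copy of the chemical component, with different linkage in each copy.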

## Ligand of Interest (LOI)

This search focuses on PDB entries in which a specific ligand is the focus of the research or deposition. With this query, you can search for any ligand and find the specific entries that include it.

In this search you will have one main input:

*ligand_of_interest* - which can contain any specific ligand you are looking for

In addition, if you want to find all entries that include any ligand of interest (likely to be more useful when combining searches later), you can switch to the alternate q3_1 query by adding and removing pound signs.

Once you determine what you want your input values to be, you can run the code block to gather your outputs. With these outputs, you can walk through the second half of the code and determine what you are hoping to see with your output and adjust this by either adding pound symbols before a line of code or removing them.

In [None]:
ligand_of_interest = "BEZ"
# variable you would change if you are looking for a specific ligand
# any chemical component ID should work here, although this has not been tested exhaustively

q3_1 = AttributeQuery(attribute = "rcsb_nonpolymer_entity_annotation.comp_id", operator = "exact_match", value = ligand_of_interest)
# searches for the ligand of interest
# if you are interested in finding all ligands that are of interest you can use slightly different syntax
# here is the alternate query for that:
##q3_1 = AttributeQuery(attribute = "rcsb_nonpolymer_entity_annotation.comp_id", operator = "exists")
# if you use that line, you need to comment out the query above - this can be done by adding a pound symbol

q3_2 = AttributeQuery(attribute = "rcsb_nonpolymer_entity_annotation.type", operator = "exact_match", value = "SUBJECT_OF_INVESTIGATION")
# restricts the search to entries where the ligand was the subject of investigation, i.e. the focus of the paper or the reason for deposition

ligandofinterestquery = q3_1 & q3_2
# combines the previous query lines to ensure you can effectively search for the specific ligands

ligandofinterestresult = list(ligandofinterestquery())
# runs the query and collects the resulting PDB IDs into a list

##print(ligandofinterestresult[:10])
# to use this statement, remove the two pound symbols
# the statement will take the results list and show the PDB IDs of the first 10 results
# by changing the 10 to another number, you can see as many of the first results as you would like

##print(len(ligandofinterestresult))
# to use this statement, remove the two pound symbols
# this statement will show the number of results within the query parameters

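
Another thing to do with the results is to save them alongside the ligand that produced them, so the list can be reloaded later without repeating the search. The sketch below uses the `json` library for this; `results_to_json` is a hypothetical helper name and the IDs are placeholders, not real search output.

```python
import json


def results_to_json(ligand_id, pdb_ids):
    """Bundle a ligand ID and its matching PDB IDs into a JSON string."""
    record = {
        "ligand_of_interest": ligand_id,
        "result_count": len(pdb_ids),
        "pdb_ids": sorted(pdb_ids),
    }
    return json.dumps(record, indent=2)


# Placeholder IDs for illustration only; these are NOT real search results.
text = results_to_json("BEZ", ["ID02", "ID01"])
print(text)
```

In the notebook, the second argument would be `ligandofinterestresult`, and the returned string could be written to a file of your choice.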

## Ligand Binding Affinity

This search allows you to find a variety of molecules that have certain values associated with different binding affinity types. 

In this search you have a few input values:

*affinity_value* - the number you want to compare against measurements of the chosen affinity type. Although there are limits on which values return results, you can try any number here or use the PDB website to find out what those limits are

*affinity_type* - here you have the kind of binding affinity you are interested in

It accepts only the following inputs:

1. `"EC50"` - the concentration of the compound that generates a half-maximal response, in nM
2. `"IC50"` - the concentration of ligand that reduces enzymatic activity by 50%, in nM
3. `"Kd"` - the dissociation constant, in nM
4. `"Ki"` - the enzyme inhibition constant, in nM
5. `"Ka"` - the association constant, in M^-1
6. `"&Delta;G"` - Gibbs free energy of binding (for the association reaction), in kJ/mol
7. `"&Delta;H"` - change in enthalpy associated with a chemical reaction, in kJ/mol
8. `"-T&Delta;S"` - entropic term (temperature times the change in entropy) associated with a chemical reaction, in kJ/mol

Each of these is case sensitive, so be sure to use the exact capitalization and spelling shown inside the quotation marks.
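
Because a wrongly capitalized type simply finds nothing, it can be worth checking the value before searching. The sketch below stores the types and units listed above in a dictionary; the names `AFFINITY_UNITS` and `check_affinity_type` are made up for this illustration.

```python
# Affinity types and units as listed in this notebook.
AFFINITY_UNITS = {
    "EC50": "nM",
    "IC50": "nM",
    "Kd": "nM",
    "Ki": "nM",
    "Ka": "M^-1",
    "&Delta;G": "kJ/mol",
    "&Delta;H": "kJ/mol",
    "-T&Delta;S": "kJ/mol",
}


def check_affinity_type(affinity_type):
    """Return the unit for a valid affinity type, or raise ValueError."""
    if affinity_type in AFFINITY_UNITS:
        return AFFINITY_UNITS[affinity_type]
    # Look for an entry that differs only in capitalization.
    close = [key for key in AFFINITY_UNITS if key.lower() == affinity_type.lower()]
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Unknown affinity type {affinity_type!r}.{hint}")


print(check_affinity_type("Kd"))
```

Calling `check_affinity_type("kd")` raises an error that suggests `'Kd'`, which catches the capitalization mistake before the search runs.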

This block goes further than the previous blocks. Once you have chosen your affinity type and value, you can also choose how results are compared with that value: equal to it, greater than it, less than it, and so on. This is done by changing the operator in the code, and the comments contain the specific syntax to use.

Once you determine all of your parameters, you can run the code block to gather your outputs. With these outputs, you can walk through the second half of the code and determine what you are hoping to see with your output and adjust this by either adding pound symbols before a line of code or removing them.
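
Since the concentration-based types (EC50, IC50, Kd, Ki) are listed in nM, a value quoted in another unit needs converting before it is used as `affinity_value`. A minimal sketch of that arithmetic follows; `to_nanomolar` and `NM_PER_UNIT` are hypothetical names, and "uM" stands in for micromolar.

```python
# Number of nanomolar in one of each unit.
NM_PER_UNIT = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value, unit):
    """Convert a concentration in the given unit to nanomolar."""
    if unit not in NM_PER_UNIT:
        raise ValueError(f"Unsupported unit {unit!r}; use one of {sorted(NM_PER_UNIT)}")
    return value * NM_PER_UNIT[unit]


# 2 micromolar expressed in nM
affinity_value = to_nanomolar(2, "uM")
print(affinity_value)  # 2000.0
```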

In [None]:
affinity_value = 2
# value that you are searching for the binding affinity with
# this can be changed to any number; HOWEVER, many of the affinity types have specific ranges of likely values
# this can also be a negative value or a value in scientific notation (for example 1e-3)

affinity_type = "EC50"
# determines the kind of binding affinity you are looking for
# this data comes from the other databases that are measuring binding affinity

# The options for affinity type are listed below - to use one, replace the current value of affinity_type with it

# "EC50" - the concentration of the compound that generates a half-maximal response
# "IC50" - the concentration of ligand that reduces enzymatic activity by 50%
# "Kd" - the dissociation constant
# "Ki" - enzyme inhibition constant
# these are in nM

# "Ka" - the association constant
# this is in M^-1

# "&Delta;G" - Gibbs free energy of binding (for association reaction)
# "&Delta;H" - change in enthalpy associated with a chemical reaction
# "-T&Delta;S" - entropic term (temperature times the change in entropy) associated with a chemical reaction
# these are in kJ/mol

q4_1 = AttributeQuery(attribute = "rcsb_binding_affinity.value", operator = "equals", value = affinity_value)
# searches for the identified affinity value
# to find values greater than the listed value instead of equal to it, change the operator to "greater"
# other operator options include "less", "less_or_equal", "greater_or_equal"

q4_2 = AttributeQuery(attribute = "rcsb_binding_affinity.type", operator = "exact_match", value = affinity_type)
# searches for the kind of affinity you are looking for

bindingaffinityquery = q4_1 & q4_2
# combines the affinity value and affinity type searches

bindingaffinityresult = list(bindingaffinityquery())
# runs the query and collects the resulting PDB IDs into a list

##print(bindingaffinityresult[:10])
# to use this statement, remove the two pound symbols
# the statement will take the results list and show the PDB IDs of the first 10 results
# by changing the 10 to another number, you can see as many of the first results as you would like

##print(len(bindingaffinityresult))
# to use this statement, remove the two pound symbols
# this statement will show the number of results within the query parameters


## Structure-Ligand Complexes

This search looks for structure-ligand complexes, either with a specific ligand or with any ligand. There are many potential parameters to adjust, depending on whether you are looking for nucleic acid material or want to omit it.

The first line of this code simply ensures that every result contains at least one ligand. If you want more than one, you can raise the number in the value argument.

In addition, you need to decide whether you want nucleic acids included in your search. Depending on your choice, you can uncomment the additional lines of code.

With all of these, you can easily change the operator and value to fit your specific search needs.

Once you determine what you want your input values to be, you can run the code block to gather your outputs. To do this, you will need to adapt the line:

*structureligandcomplexquery = q2_1*

You can adapt this line by adding & symbols followed by the names of the query variables you want to include, and then you can gather the outputs.
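
When several query variables are in play, chaining them by hand gets repetitive. The sketch below shows one way to combine any number of objects that support `&`; `combine_with_and` is a hypothetical helper, and the demonstration uses plain Python sets as stand-ins because they also support `&`.

```python
from functools import reduce
import operator


def combine_with_and(queries):
    """Join every item in `queries` with the & operator, left to right."""
    queries = list(queries)
    if not queries:
        raise ValueError("at least one query is required")
    return reduce(operator.and_, queries)


# Stand-in demonstration with sets; in the notebook these would be query variables.
demo = combine_with_and([{1, 2, 3}, {2, 3}, {3, 4}])
print(demo)  # {3}
```

In the notebook, this would look like `structureligandcomplexquery = combine_with_and([q2_1, q2_2, q2_3])` once the corresponding lines are uncommented, assuming the query objects combine with `&` as they do elsewhere in this notebook.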

With these outputs, you can walk through the second half of the code and determine what you are hoping to see with your output and adjust this by either adding pound symbols before a line of code or removing them.

In [None]:
q2_1 = AttributeQuery(attribute = "rcsb_entry_info.nonpolymer_entity_count", operator = "greater", value = 0)
# this query is going to search for complexes with ANY ligand
# if you want more than one ligand, raise the value; the search returns entries with more ligands than that number
# this is a broad query, so it returns many results; it serves as the baseline that the statements below narrow down

# now that the baseline query is established, the next few query options can help narrow down what you are looking for

# to add these lines to your query, remove the two pound symbols

##q2_2 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_RNA", operator = "equals", value = 0)
# eliminates RNA from the search output
# to change this structure to show RNA-ligand complexes, change the operator to "greater"
# other operator options include "less", "less_or_equal", "greater_or_equal"

##q2_3 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_DNA", operator = "equals", value = 0)
# eliminates DNA from the search output
# to change this structure to show DNA-ligand complexes, change the operator to "greater"

##q2_4 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_nucleic_acid_hybrid", operator = "equals", value = 0)
# eliminates nucleic-acid hybrids from the search output
# to change this structure to show nucleic-acid-ligand complexes, change the operator to "greater"

# these lines help cut out nucleic acid material that might have been within a protein structure
# as a note, you could increase the number from zero, but ensure the operator matches what you are looking for

# additionally, these lines could use the operator "range", which needs a different value statement
# the value would need to give both a from and a to bound
# note: the range syntax does not currently function in this notebook
# until it does, the two lines below give the same effect with a pair of bounds (here, 1 to 4 RNA entities)
# to use them, remove the pound symbol from each line and add both names to the final query with & signs

#q2_range1 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_RNA", operator = "greater_or_equal", value = 1)
#q2_range2 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_RNA", operator = "less_or_equal", value = 4)

# furthermore, you might need a line that eliminates proteins so you can examine only the ligands
# this is where a statement such as the following line can help you out:
##q2_5 = AttributeQuery(attribute = "rcsb_entry_info.polymer_entity_count_protein", operator = "equals", value = 0)
# you can remove the two pound symbols to activate this line of code
# this line however will only work if the first line is commented out

# however! there are options to customize these lines
# many of the PDB example queries combine these searches with experimental technique queries, refinement factors, and data methods to get very specific outputs

structureligandcomplexquery = q2_1 #& 
# this statement combines the previous queries to create the search of any/all characteristics
# this is the statement that defines the final query, but it's not finished! right now it only includes the first statement, which finds ligands
# once you determine what statements you want to include, add the "q" name to the statement with & signs

structureligandcomplexresult = list(structureligandcomplexquery())
# this statement runs the query and collects the resulting PDB IDs into a list

print(structureligandcomplexresult[:10])
# this statement is already active; to turn it off, add a pound symbol in front of it
# the statement will take the results list and show the PDB IDs of the first 10 results
# by changing the 10 to another number, you can see as many of the first results as you would like

##print(len(structureligandcomplexresult))
# to use this statement, remove the two pound symbols
# this statement will show the number of results within the query parameters

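
One further thing to do with any of the result lists in this notebook is to turn the IDs into links that can be opened in a browser. The sketch below assumes the `https://www.rcsb.org/structure/<ID>` address pattern used by the RCSB website; the helper names `structure_url` and `preview` are made up for this illustration.

```python
def structure_url(pdb_id):
    """Build the RCSB structure page address for one PDB ID (assumed URL pattern)."""
    return f"https://www.rcsb.org/structure/{pdb_id.upper()}"


def preview(results, count=10):
    """Pair each of the first `count` IDs with its structure page address."""
    return [(pdb_id, structure_url(pdb_id)) for pdb_id in results[:count]]


# 4HHB is used here only as an example ID.
for pdb_id, url in preview(["4hhb"]):
    print(pdb_id, url)
```

In the notebook, passing `structureligandcomplexresult` to `preview` would list the first ten entries with their links.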

## Next Steps and Combining Searches

Now that you have a baseline for the different kinds of ligand searches you can run, it's time to apply it! First and foremost, you can stack the different ligand search types into a single query.

*** EXAMPLE HERE ***
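
To make the idea of stacking concrete, the sketch below builds by hand the kind of JSON request that a stacked search corresponds to, combining the ligand-of-interest queries with the RNA filter from the structure-ligand section. It reflects my understanding of the RCSB Search API request format and should be read as an assumption rather than a specification; `terminal` and `and_group` are hypothetical helper names, and nothing is sent over the network.

```python
import json


def terminal(attribute, operator, value=None):
    """Describe one attribute search as a Search API 'terminal' node (assumed format)."""
    parameters = {"attribute": attribute, "operator": operator}
    if value is not None:
        parameters["value"] = value
    return {"type": "terminal", "service": "text", "parameters": parameters}


def and_group(*nodes):
    """Combine nodes so that every one of them must match."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


payload = {
    "query": and_group(
        terminal("rcsb_nonpolymer_entity_annotation.comp_id", "exact_match", "BEZ"),
        terminal("rcsb_nonpolymer_entity_annotation.type", "exact_match", "SUBJECT_OF_INVESTIGATION"),
        terminal("rcsb_entry_info.polymer_entity_count_RNA", "equals", 0),
    ),
    "return_type": "entry",
}

print(json.dumps(payload, indent=2))
```

Within the notebook itself, the same stacking is written more simply by joining the existing query variables with `&`, for example `ligandofinterestquery & q2_2` once `q2_2` is uncommented.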

Furthermore, you can take these searches and add them onto other PDB advanced searches.

OUTPUTS

*** EXAMPLE HERE ***

+ information about what to do with the output stuff