# Drug Search

## Purpose

The RCSB Protein Data Bank (RCSB PDB) provides access to the publicly available, experimentally determined 3D structures of proteins and other biological macromolecules. This notebook demonstrates how to use the rcsbsearchapi Python library to recreate Advanced Searches from the RCSB PDB website, and how to download the files for the results of those searches, all in Python. Each example searches for drugs represented in the RCSB PDB and illustrates a different way of using the Advanced Search tool.

## Steps Taken

The following is a step-by-step explanation of what will be performed for each code example.

### 1) Creating the Search

An explanation of what the search describes is followed by the creation of an RCSB PDB search.

### 2a) Validation By List

The search is tested for functionality by analyzing the first 10 results of the search in a list.

### 2b) Validation By File Request

The search is tested for functionality by requesting the file of the first search result. If the search results need to be modified before their files can be requested and downloaded, that is done in this step.

### 2c) Validation By File Download

The search is tested for functionality by downloading and reading the contents of the first search result's file. This includes the generation of a folder for the files of the search result.

### 3) Complete Search Download

Following validation, each file in the search result is downloaded into the previously generated folder.

## Importing Libraries

The following libraries need to be imported to complete the tasks in this notebook. `rcsbsearchapi` and `requests` must be installed first; `os` is part of the Python standard library.

| Library | Contents | Source |
| :-----: | :------- | :----- |
| rcsbsearchapi | library for automated searching of the [RCSB Protein Data Bank](https://www.rcsb.org)| [py-rcsbsearchapi on GitHub](https://github.com/rcsb/py-rcsbsearchapi) |
| requests | library for sending HTTP requests | [requests Documentation](https://requests.readthedocs.io/en/latest/) |
| os | standard library for creating directories | [os Documentation](https://docs.python.org/3/library/os.html) |

## Installation

These libraries will need to be installed in your computing environment to perform the tasks in this notebook.

To install from the command line on your computer, use this command (with the `requests` library as the example):

`pip install requests`

To install from within a Jupyter notebook or Colab notebook, type the same command in a code cell, preceded by an exclamation point.

`!pip install requests`

These libraries will be imported as they are needed over the course of this notebook.
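
Before running the rest of the notebook, it can help to confirm that the required packages are importable. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and is not part of any of the libraries listed above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# The packages this notebook relies on.
required = ["rcsbsearchapi", "requests", "os"]
missing = missing_packages(required)

if missing:
    print("Install these packages before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any package reported as missing can then be installed with the `pip install` command shown above.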


In [1]:
# Import the components of rcsbsearchapi needed for this search
from rcsbsearchapi import rcsb_attributes as attrs
# For Operator notation

from rcsbsearchapi.const import CHEMICAL_ATTRIBUTE_SEARCH_SERVICE, STRUCTURE_ATTRIBUTE_SEARCH_SERVICE
from rcsbsearchapi.search import AttributeQuery, Attr
# For Fluent notation

import requests  # to enable us to pull files from the PDB
import os        # to enable us to create a directory to store the files

## US Market Approved Drug Search

### 1)

The following code is a recreation of the search example on the RCSB Protein Data Bank shown [here](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.approved%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Y%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.country%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22US%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text_chem%22%7D%5D%7D%2C%22return_type%22%3A%22mol_definition%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%226f325351daf2bdb517d029b864a98556%22%7D%7D) which is a search of drugs that were approved for use on the US market at any point in history. This search can be divided into 2 attributes:

Approved for use in the market   
United States

The following code creates these two attributes, combines them into one 'query', then places the result in a list.

In [8]:
market_approved = "Y" #this variable indicates certified approval at a national level
country = "US" #this variable indicates which country is being referred to

q1 = AttributeQuery(attribute="drugbank_info.drug_products.approved", operator="exact_match", value=market_approved, service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for market approval
q2 = AttributeQuery(attribute="drugbank_info.drug_products.country", operator="exact_match", value=country, service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for country
query = q1 & q2 #Combining attributes into a single query
result_drugs = list(query("mol_definition"))
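
For reference, combining the two attributes with `&` corresponds to a group node with an "and" operator in the JSON request that is encoded in the search link above. The sketch below rebuilds a simplified version of that request as a plain Python dictionary. The helper names `terminal_node` and `and_group` are hypothetical, and the request generated by the website nests the group more deeply and carries additional request options that are omitted here.

```python
import json


def terminal_node(attribute, operator, value=None, negation=False):
    """Build one search condition on a chemical ("text_chem") attribute."""
    parameters = {"attribute": attribute, "operator": operator, "negation": negation}
    if value is not None:
        parameters["value"] = value
    return {"type": "terminal", "service": "text_chem", "parameters": parameters}


def and_group(*nodes):
    """Combine conditions so that all of them must hold."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


request = {
    "query": and_group(
        terminal_node("drugbank_info.drug_products.approved", "exact_match", "Y"),
        terminal_node("drugbank_info.drug_products.country", "exact_match", "US"),
    ),
    "return_type": "mol_definition",
}

print(json.dumps(request, indent=2))
```

Seeing the request in this form makes it easier to compare a query written in Python with the one built on the website.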

### 2a)

We can check to make sure the list has been successfully created by printing the first 10 elements of the list. These should match the first ten results shown on the RCSB PDB search page.

In [9]:
print(f"The following drugs are among the {len(result_drugs)} drugs approved for the market in the {country}:", result_drugs[0:10])

The following drugs are among the 724 drugs approved for the market in the US: ['010', '017', '032', '05X', '07J', '08D', '08H', '08J', '08Y', '09L']


### 2b)

Now we can begin downloading the files for the entries in our list. First, request the file for the first element in the list and check that the request succeeded. Then, open the file to see whether its contents are in line with what is expected from the download.

In [12]:
#Downloading a file from our list:

test_validation = requests.get(f'https://files.rcsb.org/ligands/download/{result_drugs[0]}.cif')

In [13]:
# check to see that the file downloaded properly. A status code of 200 means everything is okay.

test_validation.status_code        # Status code check

200

### 2c)

To further check, we can write the downloaded content to a file and then read the contents of that file. This includes creating a directory (folder) to store the file, which will be called ligands.

In [None]:
# To really be sure, let's look at the file one line at a time. First we write the downloaded content to a file.

# make a ligands folder for our results. If this ligands folder already exists, then it doesn't create a new one
os.makedirs("ligands", exist_ok=True)

with open(f"ligands/{result_drugs[0]}.cif", "w+") as file:
    file.write(test_validation.text)

In [None]:
# Read the saved file back in as a single string and print it.
with open(f"ligands/{result_drugs[0]}.cif", "r") as file1:
    file_text = file1.read()

print(file_text)
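
Ligand files can be long, so printing only the first few lines is often enough to confirm that a download worked. A minimal sketch follows; `first_lines` is a hypothetical helper, and the example runs on a small throwaway file so that it does not depend on any particular download.

```python
import os
import tempfile
from itertools import islice


def first_lines(path, n=10):
    """Return the first `n` lines of a text file, without trailing newlines."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Demonstrate on a throwaway file; replace `demo_path` with the path of a downloaded file.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("line 1\nline 2\nline 3\n")
    for line in first_lines(demo_path, n=2):
        print(line)
```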

### 3)

Once we've confirmed that the file downloaded correctly, we can finish by downloading all of the files from the list we made previously. The following block of code does this, saving the files in a USMarketDrugSearch subfolder of the ligands folder.

In [None]:
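baseUrl = "https://files.rcsb.org/ligands/download/"

# Create the output folder first; open() fails if the folder does not exist.
os.makedirs("ligands/USMarketDrugSearch", exist_ok=True)

for ChemID in result_drugs:
    cFile = f"{ChemID}.cif"  # same file type that was validated above
    cFileUrl = baseUrl + cFile
    cFileLocal = "ligands/USMarketDrugSearch/" + cFile
    response = requests.get(cFileUrl)
    with open(cFileLocal, "w+") as file:
        file.write(response.text)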
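
The loop above writes every response to disk, even when a request fails. The sketch below is one possible way to skip failed requests and report them afterwards. `download_all` and its `fetch` parameter are hypothetical names; `fetch` is assumed to behave like `requests.get`, returning an object with `status_code` and `text` attributes.

```python
import os


def download_all(ids, url_template, out_dir, fetch):
    """Save one file per ID into `out_dir`; return the IDs whose request failed.

    `url_template` must contain "{id}", which is replaced by each ID in turn.
    `fetch` is any callable like requests.get that returns an object with
    `status_code` and `text` attributes.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for item_id in ids:
        url = url_template.format(id=item_id)
        response = fetch(url)
        if response.status_code != 200:
            # Do not write error pages to disk; remember the ID instead.
            failed.append(item_id)
            continue
        file_name = url.rsplit("/", 1)[-1]
        with open(os.path.join(out_dir, file_name), "w") as handle:
            handle.write(response.text)
    return failed
```

In this notebook it could be called as `download_all(result_drugs, "https://files.rcsb.org/ligands/download/{id}.cif", "ligands/USMarketDrugSearch", requests.get)`, after which the returned list shows which IDs need another look.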

## Current US Market

### 1)

The following code is a recreation of the search example on the RCSB Protein Data Bank shown [here](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.approved%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Y%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.country%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22US%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text_chem%22%7D%5D%7D%2C%22return_type%22%3A%22mol_definition%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%226f325351daf2bdb517d029b864a98556%22%7D%7D) which is a search of drugs that are currently available in the US market. This search can be divided into 3 attributes:

Approved for use in the market   
United States  
Currently Available

The following code creates these three attributes, combines them into one 'query', then places the result in a list.

In [None]:
market_approved = "Y" #this variable indicates certified approval at a national level
country = "US" #this variable indicates which country is being referred to

q1 = AttributeQuery(attribute="drugbank_info.drug_products.approved", operator="exact_match", value=market_approved, service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for market approval
q2 = AttributeQuery(attribute="drugbank_info.drug_products.country", operator="exact_match", value=country, service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for country
q3 = AttributeQuery(attribute="drugbank_info.drug_products.ended_marketing_on", operator="exists", service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE, negation=True)
#Attribute for currently in the market

query = q1 & q2 & q3 #Combining attributes into a single query
result_drugs = list(query("mol_definition"))

### 2a)

We can check to make sure the list has been successfully created by printing the first 10 elements of the list. These should match the first ten results shown on the RCSB PDB search page.

In [None]:
print(f"The following drugs are among the {len(result_drugs)} drugs available in the {country} market:", result_drugs[0:10])

### 2b)

Now we can begin downloading the files for the entries in our list. First, request the file for the first element in the list and check that the request succeeded. Then, open the file to see whether its contents are in line with what is expected from the download.

In [None]:
test_validation = requests.get(f'https://files.rcsb.org/ligands/download/{result_drugs[0]}_ideal.mol2')

In [None]:
test_validation.status_code

### 2c)

To further check, we can write the downloaded content to a file and then read the contents of that file. This includes creating a directory (folder) to store the file, which will be called ligands/Current_US_Market.

In [None]:
os.makedirs("ligands/Current_US_Market", exist_ok=True)  # make a subfolder for this search's results

with open(f"ligands/Current_US_Market/{result_drugs[0]}_ideal.mol2", "w+") as file:
    file.write(test_validation.text)

In [None]:
# Read the saved file back in as a single string and print it.
with open(f'ligands/Current_US_Market/{result_drugs[0]}_ideal.mol2', 'r') as file:
    file_text = file.read()

print(file_text)

### 3)

Once we've confirmed that the file downloaded correctly, we can finish by downloading all of the files from the list we made previously into the folder we generated. The following block of code does this.

In [None]:
baseUrl = "https://files.rcsb.org/ligands/download/"

for ChemID in result_drugs:
    cFile = f"{ChemID}_ideal.mol2"  # same file name pattern as the validated request above
    cFileUrl = baseUrl + cFile
    cFileLocal = "ligands/Current_US_Market/" + cFile
    response = requests.get(cFileUrl)
    with open(cFileLocal, "w+") as file:
        file.write(response.text)

## Recalled Drugs

### 1)

The following code is a recreation of the search example on the RCSB Protein Data Bank shown [here](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.approved%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Y%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.country%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22US%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text_chem%22%7D%5D%7D%2C%22return_type%22%3A%22mol_definition%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%226f325351daf2bdb517d029b864a98556%22%7D%7D) which is a search of drugs that were withdrawn from use following their approval. This search can be divided into 1 attribute:

Withdrawn

The following code creates this attribute as a 'query', then places the result in a list.

In [None]:
q1Attribute = "withdrawn"

q1 = AttributeQuery(attribute="drugbank_info.drug_groups", operator="exact_match", value=q1Attribute, service=CHEMICAL_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for withdrawn from use
result_drugs = list(q1("mol_definition"))

### 2a)

We can check to make sure the list has been successfully created by printing the first 10 elements of the list. These should match the first ten results shown on the RCSB PDB search page.

In [None]:
print(f"The following drugs are among the {len(result_drugs)} drugs that were withdrawn from use after their approval:", result_drugs[0:10])

### 2b)

Now we can begin downloading the files for the entries in our list. First, request the file for the first element in the list and check that the request succeeded. Then, open the file to see whether its contents are in line with what is expected from the download.

In [None]:
test_validation = requests.get(f'https://files.rcsb.org/ligands/download/{result_drugs[0]}_ideal.mol2')

In [None]:
test_validation.status_code

### 2c)

To further check, we can write the downloaded content to a file and then read the contents of that file. This includes creating a directory (folder) to store the file, which will be called ligands/Recalled_Drugs.

In [None]:
os.makedirs("ligands/Recalled_Drugs", exist_ok=True)

with open(f"ligands/Recalled_Drugs/{result_drugs[0]}_ideal.mol2", "w+") as file:
    file.write(test_validation.text)

In [None]:
# Read the saved file back in as a single string and print it.
with open(f'ligands/Recalled_Drugs/{result_drugs[0]}_ideal.mol2', 'r') as file1:
    file_text = file1.read()

print(file_text)

### 3)

Once we've confirmed that the file downloaded correctly, we can finish by downloading all of the files from the list we made previously into the folder we generated. The following block of code does this.

In [None]:
baseUrl = "https://files.rcsb.org/ligands/download/"

for ChemID in result_drugs:
    cFile = f"{ChemID}_ideal.mol2"  # same file name pattern as the validated request above
    cFileUrl = baseUrl + cFile
    cFileLocal = "ligands/Recalled_Drugs/" + cFile
    response = requests.get(cFileUrl)
    with open(cFileLocal, "w+") as file:
        file.write(response.text)

## STI Bound

### 1)

The following code is a recreation of the search example on the RCSB Protein Data Bank shown [here](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.approved%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Y%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.country%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22US%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text_chem%22%7D%5D%7D%2C%22return_type%22%3A%22mol_definition%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%226f325351daf2bdb517d029b864a98556%22%7D%7D) which is a search for structures that have the small molecule drug Gleevec or STI bounded to it. This search can be divided into 1 attribute:

STI Bound

The following code creates this attribute as a 'query', then places the result in a list.

In [None]:
q1Attribute = "sti"
 
#q1 = AttributeQuery(attribute="rcsb.chem_comp_container_identifiers", operator="in", value=q1Attribute)
q1 = attrs.rcsb_chem_comp_container_identifiers.comp_id == q1Attribute
#Attribute for containing STI
result_drugs = list(q1())

### 2a)

We can check to make sure the list has been successfully created by printing the first 10 elements of the list. These should match the first ten results shown on the RCSB PDB search page.

In [None]:
print(f"The following structures are among the {len(result_drugs)} structures with Gleevec (aka STI) bound to them: {result_drugs[0:10]}")

### 2b)

Now we can begin downloading the files for the entries in our list. First, request the file for the first element in the list and check that the request succeeded. Then, open the file to see whether its contents are in line with what is expected from the download.

In [None]:
test_validation = requests.get(f'https://files.rcsb.org/download/{result_drugs[0]}.cif')

In [None]:
test_validation.status_code

### 2c)

To further check, we can write the downloaded content to a file and then read the contents of that file. This includes creating a directory (folder) to store the file, which will be called ligands/Structures_STI.

In [None]:
os.makedirs("ligands/Structures_STI", exist_ok=True) 

with open(f"ligands/Structures_STI/{result_drugs[0]}.cif", 'w+') as file:
    file.write(test_validation.text)

In [None]:
# Read the saved file back in as a single string and print it.
with open(f"ligands/Structures_STI/{result_drugs[0]}.cif", 'r') as file1:
    file_text = file1.read()

print(file_text)

### 3)

Once we've confirmed that the file downloaded correctly, we can finish by downloading all of the files from the list we made previously into the folder we generated. The following block of code does this.

In [None]:
baseUrl = "https://files.rcsb.org/download/"

for ChemID in result_drugs:
    cFile = f"{ChemID}.cif"
    cFileUrl = baseUrl + cFile
    cFileLocal = "ligands/Structures_STI/" + cFile
    response = requests.get(cFileUrl)
    with open(cFileLocal, "w+") as file:
        file.write(response.text)

## Approved Ligands of Interest

### 1)

The following code is a recreation of the search example on the RCSB Protein Data Bank shown [here](https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.approved%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Y%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_products.country%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22US%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text_chem%22%7D%5D%7D%2C%22return_type%22%3A%22mol_definition%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%226f325351daf2bdb517d029b864a98556%22%7D%7D) which is a search for drugs currently used in the US market that have been declared Ligands of Interest (LoI). This search can be divided into 5 attributes:

Approved for use in the market   
United States  
Currently Available  
Has Nonpolymer Entity Annotation  
Nonpolymer Entity Annotation is Ligand of Interest

The following code creates these five attributes, combines them into one 'query', then places the result in a list.

In [None]:
market_approved = "Y" #this variable indicates certified approval at a national level
country = "US" #this variable indicates which country is being referred to

q1 = AttributeQuery(attribute="drugbank_info.drug_products.approved", operator="exact_match", value=market_approved, service="text_chem")
#Attribute for market approval
q2 = AttributeQuery(attribute="drugbank_info.drug_products.country", operator="exact_match", value=country, service="text_chem")
#Attribute for country
q3 = AttributeQuery(attribute="drugbank_info.drug_products.ended_marketing_on", operator="exists", service="text_chem", negation=True)
#Attribute for currently in the market
q4 = AttributeQuery(attribute="rcsb_nonpolymer_entity_annotation.comp_id", operator="exists", service=STRUCTURE_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for having a Nonpolymer Entity Annotation
q5 = AttributeQuery(attribute="rcsb_nonpolymer_entity_annotation.type", operator="exact_match", value="SUBJECT_OF_INVESTIGATION", service=STRUCTURE_ATTRIBUTE_SEARCH_SERVICE)
#Attribute for Nonpolymer Entity Annotation being Ligand of Interest


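chemical_query = q1 & q2 & q3 #Combining the chemical attributes into a single query

structure_query = q4 & q5 #Combining the structure attributes into a single query

combined_query = chemical_query & structure_query #Combining both into the final query
result_entities = list(combined_query("non_polymer_entity"))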

### 2a)

We can check to make sure the list has been successfully created by printing the first 10 elements of the list. These should match the first ten results shown on the RCSB PDB search page.

In [None]:
print(f"The following entities are among the {len(result_entities)} entities available in {country} that are labeled as Ligands of Interest: {result_entities[0:10]}")
print(result_entities[0])

### 2b)

Now we can begin downloading the files for the entries in our list. The results of this search first need to be changed slightly before their files can be verified and downloaded: each result is reduced to its four-character entry ID, and repeated entry IDs are dropped. Then, request the file for the first element in the list and check that the request succeeded. Finally, open the file to see whether its contents are in line with what is expected from the download.

In [None]:
# Each result is an entity ID; its first four characters are the ID of the entry (structure) it belongs to.
# Keep one copy of each entry ID, in the original order, so that each file is only requested once.
result_entities = list(dict.fromkeys(entity[0:4] for entity in result_entities))
            
test_validation = requests.get(f'https://files.rcsb.org/download/{result_entities[0]}.cif')
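
The idea of reducing entity IDs to unique entry IDs can be sketched on its own. The example below assumes, as the code above does, that the first four characters of each result are the entry ID. The sample IDs are made up for illustration, and `unique_entry_ids` is a hypothetical helper.

```python
def unique_entry_ids(entity_ids):
    """Return each four-character entry ID once, in order of first appearance."""
    # dict.fromkeys keeps the first occurrence of every key and preserves insertion order.
    return list(dict.fromkeys(entity_id[0:4] for entity_id in entity_ids))


# Made-up entity IDs, used only to illustrate the behavior.
sample = ["1ABC_2", "1ABC_3", "2XYZ_4", "1ABC_5"]
print(unique_entry_ids(sample))  # ['1ABC', '2XYZ']
```

When comparing strings by hand, use `==` rather than `is`: `is` checks whether two names refer to the same object, which is generally not the case for two separately created slices even when their text is identical.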

In [None]:
test_validation.status_code

### 2c)

To further check, we can write the downloaded content to a file and then read the contents of that file. This includes creating a directory (folder) to store the file, which will be called ligands/Current_US_Market_LOI.

In [None]:
os.makedirs("ligands/Current_US_Market_LOI", exist_ok=True)

with open(f"ligands/Current_US_Market_LOI/{result_entities[0]}.cif", 'w+') as file:
    file.write(test_validation.text)

In [None]:
# Read the saved file back in as a single string and print it.
with open(f"ligands/Current_US_Market_LOI/{result_entities[0]}.cif", 'r') as file1:
    file_text = file1.read()

print(file_text)

### 3)

Once we've confirmed that the file downloaded correctly, we can finish by downloading all of the files from the list we made previously into the folder we generated. The following block of code does this.

In [None]:
baseUrl = "https://files.rcsb.org/download/"

for ChemID in result_entities:
    cFile = f"{ChemID}.cif"
    cFileUrl = baseUrl + cFile
    cFileLocal = "ligands/Current_US_Market_LOI/" + cFile
    response = requests.get(cFileUrl)
    with open(cFileLocal, "w+") as file:
        file.write(response.text)