### Scratch code to learn how to query BLAST over HTTP with the `requests` module

#### The following links are relevant for the code:
- #### https://www.youtube.com/watch?v=0pRvEDS6Afw
- #### https://ncbi.github.io/blast-cloud/dev/api.html
- #### https://www.ncbi.nlm.nih.gov/home/develop/https-guidance/
- #### https://it.python-requests.org/it/latest/user/quickstart.html
- #### https://docs.python-requests.org/en/master/user/quickstart/#json-response-content

#### NB! Another option is to use the Biopython package instead of `requests`:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec123

https://biopython.org/docs/dev/api/Bio.Blast.Applications.html

https://www.google.com/search?q=biopython+to+blast+a+sequence

https://www.biostars.org/p/259099/
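For comparison, the Biopython route mentioned above can be sketched with `Bio.Blast.NCBIWWW.qblast`, which submits the search, polls NCBI until it finishes and returns the XML result as a file handle, so no manual RID handling is needed. This is a minimal sketch assuming Biopython is installed; the helper name `blast_with_biopython` and the choice of returning (title, e-value) pairs for the first five hits are illustrative, not part of the original notebook.

```python
def blast_with_biopython(query="P22303", program="blastp", database="pdb", max_hits=5):
    """Hypothetical helper: run a remote BLAST search through Biopython and
    return (hit title, e-value) pairs for the first `max_hits` hits."""
    # Imported lazily so the rest of the notebook does not require Biopython
    from Bio.Blast import NCBIWWW, NCBIXML

    # qblast submits the job, waits for it to finish and returns the XML output
    result_handle = NCBIWWW.qblast(program, database, query)
    record = NCBIXML.read(result_handle)  # one query sequence -> one record
    result_handle.close()

    hits = []
    for alignment in record.alignments[:max_hits]:
        first_hsp = alignment.hsps[0]  # first HSP reported for this hit
        hits.append((alignment.title, first_hsp.expect))
    return hits
```

Usage (needs network access): `for title, evalue in blast_with_biopython(): print(evalue, title)`.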


## 1. Importing modules and defining the queries (save in a file named constants.py)

In [4]:
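import re
import requests
import time

URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi?"     # URL API endpoint

# Commands (CMD=Put submits a search, CMD=Get retrieves its status or results)
PUT_Request = "CMD=Put&"
GET_Request = "CMD=Get&"

# PUT parameters
Query = "QUERY=P22303&"          # accession of the query protein
Program = "PROGRAM=blastp&"
Database = "DATABASE=pdb"

# Marker that precedes the Request ID in the PUT response (a line like "    RID = XXXXXXXX")
RID = "RID = "

# Full URL that submits the search
PUT_query = URL + PUT_Request + Query + Program + Database

# Prefixes that are completed later by appending the RID
GET_query_head = URL + GET_Request + "RID="                          # results (HTML by default)
url_request_head = URL + "CMD=Get&FORMAT_OBJECT=SearchInfo&RID="     # search status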
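Building the URLs by concatenating strings makes it easy to lose an `&` between parameters. A sketch of the same URLs built with the standard library's `urllib.parse.urlencode`, which also percent-encodes the values (useful once `QUERY` holds a raw FASTA sequence instead of an accession). The helper names `build_blast_url`, `status_url` and `results_url` are hypothetical.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

def build_blast_url(cmd, **params):
    """Hypothetical helper: return a Blast.cgi URL for the given CMD and parameters."""
    # urlencode keeps the insertion order and escapes special characters
    return f"{BLAST_URL}?{urlencode({'CMD': cmd, **params})}"

def status_url(rid):
    """URL that asks for the status of the search identified by `rid`."""
    return build_blast_url("Get", FORMAT_OBJECT="SearchInfo", RID=rid)

def results_url(rid):
    """URL that downloads the results of the search identified by `rid`."""
    return build_blast_url("Get", RID=rid)

# Same submission as PUT_query above
put_url = build_blast_url("Put", QUERY="P22303", PROGRAM="blastp", DATABASE="pdb")
print(put_url)
```

With `requests`, passing a dictionary as `params=` (for example `requests.get(BLAST_URL, params={...})`) does the same encoding for you.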

## 2. Defining the functions (save in a file named API_functions.py)

In [11]:
def extract_attribute(response_text, attr):
    """Go line by line through a BLAST response, find the first line that mentions
    `attr` (e.g. "RID = " or "Status=") and return the value that follows it,
    with all whitespace removed. Return None if `attr` does not occur."""
    for line in response_text.splitlines():
        if attr in line:
            attribute_value = line.replace(attr, "")     # plain text, not a regex
            return "".join(attribute_value.split())
    return None

def check_request_status(GET_query, url_submit, poll_interval=60):
    """Poll the search status until it is no longer WAITING; if it is READY,
    download the results and save them as an HTML file."""
    while True:
        # NCBI asks clients not to poll a single RID more than once a minute
        time.sleep(poll_interval)
        status_response = requests.get(url_submit)       # CMD=Get -> HTTP GET
        status_response.raise_for_status()
        query_status = extract_attribute(status_response.text, "Status=")
        if query_status != "WAITING":
            break

    if query_status != "READY":
        # FAILED, UNKNOWN (e.g. an expired RID) or no Status= line at all
        raise RuntimeError(f"BLAST search did not finish: Status={query_status}")

    print("Done!")

    # Save the results
    results = requests.get(GET_query)
    results.raise_for_status()
    with open("results_trial.html", "w", encoding="utf-8") as file_handle:
        file_handle.write(results.text)

    #query_hits = extract_attribute(status_response.text, "ThereAreHits=")
    #return query_status, query_hits
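`extract_attribute` looks for one key at a time. BLAST places its machine-readable fields (`RID`, `RTOE`, `Status`, `ThereAreHits`) inside comment blocks delimited by `QBlastInfoBegin` and `QBlastInfoEnd`, so a sketch that reads every such block into a dictionary could look like this. The function name `parse_qblast_info` is hypothetical, and the two sample strings only mimic the layout of the submission and status responses; the RID in them is made up.

```python
import re

def parse_qblast_info(response_text):
    """Collect every key=value pair found inside QBlastInfoBegin ... QBlastInfoEnd blocks."""
    info = {}
    blocks = re.findall(r"QBlastInfoBegin(.*?)QBlastInfoEnd", response_text, re.DOTALL)
    for block in blocks:
        for line in block.splitlines():
            key, sep, value = line.partition("=")
            if sep:                                  # skip lines without an "=" sign
                info[key.strip()] = value.strip()
    return info

# Illustrative samples (made-up RID), laid out like the BLAST responses
sample_put_response = """<!--QBlastInfoBegin
    RID = EXAMPLE0123
    RTOE = 20
QBlastInfoEnd
-->"""

sample_status_response = """<!--
QBlastInfoBegin
\tStatus=READY
QBlastInfoEnd
-->
<!--
QBlastInfoBegin
\tThereAreHits=yes
QBlastInfoEnd
-->"""

print(parse_qblast_info(sample_put_response))     # {'RID': 'EXAMPLE0123', 'RTOE': '20'}
```

`RTOE` is BLAST's estimate, in seconds, of how long the search will take, so it is a reasonable first wait before polling.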

## 3. Run the code

In [8]:
# 1. Submit the search (CMD=Put is a BLAST parameter; the HTTP request itself is a plain GET)
p = requests.get(PUT_query)
p.raise_for_status()


In [12]:
# 2. Extract the RID (Request ID), which lets you retrieve the search results and
#    format them in different ways for the next 24 hours
rid = extract_attribute(p.text, RID)
if rid is None:
    raise RuntimeError("No RID found in the submission response")

# 3. Check the status of the search using the RID and, when it is ready, save the results as HTML
GET_query = GET_query_head + rid
url_submit = url_request_head + rid

check_request_status(GET_query, url_submit)

Done!
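The waiting logic can also be pulled out of `check_request_status` into a small function that receives the status lookup as an argument, which makes it testable without contacting NCBI. A sketch, assuming the caller passes a zero-argument function that returns the current `Status=` value; the names `wait_for_blast`, `fetch_status` and `max_polls` are hypothetical. The 60-second default follows NCBI's guideline of polling a given RID no more than once a minute.

```python
import time

def wait_for_blast(fetch_status, poll_interval=60, max_polls=30, sleep=time.sleep):
    """Poll until BLAST reports something other than WAITING.

    fetch_status: zero-argument callable returning the Status= value
    (e.g. "WAITING", "READY", "FAILED", "UNKNOWN").
    Returns "READY"; raises RuntimeError for any other final status and
    TimeoutError if the search is still WAITING after `max_polls` checks.
    """
    for _ in range(max_polls):
        sleep(poll_interval)              # wait before each status request
        status = fetch_status()
        if status == "READY":
            return status
        if status != "WAITING":
            raise RuntimeError(f"BLAST search ended with status {status!r}")
    raise TimeoutError(f"search still WAITING after {max_polls} polls")

# Offline demonstration: scripted statuses and no real waiting
scripted = iter(["WAITING", "WAITING", "READY"])
print(wait_for_blast(lambda: next(scripted), sleep=lambda seconds: None))  # READY
```

In the notebook, `fetch_status` could be `lambda: extract_attribute(requests.get(url_submit).text, "Status=")`, leaving `check_request_status` with only the download step.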


# TODO: submit a query specific to our task