## Problem

Given a list of compounds and a list of assays, produce a CSV file with the IC50 values of those compounds in those assays.

In [1]:
assay_urls = [  "https://pubchem.ncbi.nlm.nih.gov/bioassay/77966" ,
                "https://pubchem.ncbi.nlm.nih.gov/bioassay/77967" ,
                "https://pubchem.ncbi.nlm.nih.gov/bioassay/158929",
                "https://pubchem.ncbi.nlm.nih.gov/bioassay/341241",
                "https://pubchem.ncbi.nlm.nih.gov/bioassay/341242" ]

compound_urls= ["https://pubchem.ncbi.nlm.nih.gov/compound/984240",
                "https://pubchem.ncbi.nlm.nih.gov/compound/10644619",
                "https://pubchem.ncbi.nlm.nih.gov/compound/24860464",
                "https://pubchem.ncbi.nlm.nih.gov/compound/10694479",
                "https://pubchem.ncbi.nlm.nih.gov/compound/10788715",
                "https://pubchem.ncbi.nlm.nih.gov/compound/10732494",
                "https://pubchem.ncbi.nlm.nih.gov/compound/10642107",
                "https://pubchem.ncbi.nlm.nih.gov/compound/121596332" ]


# Extract the numeric identifiers from the URLs
assay_ids = [int(url.split("/")[-1]) for url in assay_urls]
compound_ids = [int(url.split("/")[-1]) for url in compound_urls]

# Build the comma-separated ID strings used in the query URL
assay_ids_st = ','.join(str(aid) for aid in assay_ids)
compound_ids_st = ','.join(str(cid) for cid in compound_ids)

print("Assays: ", assay_ids_st)
print("Compounds: ", compound_ids_st)


Assays:  77966,77967,158929,341241,341242
Compounds:  984240,10644619,24860464,10694479,10788715,10732494,10642107,121596332
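
Splitting on `/` works for the URLs listed above, but it would fail if a URL ended with a slash or carried a query string. The following is a minimal sketch of a more tolerant parser; the helper name `pubchem_id_from_url` and the `?from=search` suffix are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def pubchem_id_from_url(url: str) -> int:
    """Return the numeric identifier at the end of a PubChem URL path."""
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path.rstrip("/")
    return int(path.rsplit("/", 1)[-1])


# The trailing slash and the query string are hypothetical variations
assay_id = pubchem_id_from_url("https://pubchem.ncbi.nlm.nih.gov/bioassay/77966/")
compound_id = pubchem_id_from_url("https://pubchem.ncbi.nlm.nih.gov/compound/984240?from=search")
print(assay_id, compound_id)
```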


## Solution 1

Retrieve the data for every compound in those assays using the __concise__ option, then filter for the specified compounds.

In [3]:
import requests
import pandas as pd

requests_url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{assay_ids_st}/concise/CSV"
 
response = requests.get(requests_url)

with open('output/data1.csv', 'wb') as f:
    f.write(response.content)

df = pd.read_csv('output/data1.csv')

# Keep only the columns of interest
df = df[['AID','CID','Activity Value [uM]','Activity Name']]

# Keep only the compounds of interest
df = df[df['CID'].isin(compound_ids)]

df.to_csv('output/data1.csv')

print(df)

        AID       CID  Activity Value [uM] Activity Name
1     77966  10788715                 3.86        KB app
3     77966  10732494                 1.98        KB app
4     77966  10642107                 1.13        KB app
8     77966  10644619                22.62        KB app
67   158929  10788715                 0.61          IC50
70   158929  10732494                 1.61          IC50
79   158929  10642107                 1.36          IC50
104  158929  10644619                 0.94          IC50
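
The problem statement asks for IC50 values, while the table above also contains rows whose `Activity Name` is `KB app`. If only IC50 rows are wanted, one option is an extra filter on that column. The sketch below uses a small in-memory table with rows copied from the output above, so it runs without a network request; the variable names `results` and `ic50_only` are illustrative.

```python
import pandas as pd

# Sample rows copied from the output above
results = pd.DataFrame({
    "AID": [77966, 77966, 158929, 158929],
    "CID": [10788715, 10644619, 10788715, 10644619],
    "Activity Value [uM]": [3.86, 22.62, 0.61, 0.94],
    "Activity Name": ["KB app", "KB app", "IC50", "IC50"],
})

# Keep only the rows reported as IC50
ic50_only = results[results["Activity Name"] == "IC50"]
print(ic50_only)
```

In the notebook, the same condition could be applied to `df` right after the `isin` filter.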


## Solution 2

Since an assay can include thousands of compounds and we only want to check a small, specific set of them, the previous solution is inefficient. The alternative is to iterate over the assays, retrieving information only for the compounds of interest.

In [4]:
df = pd.DataFrame(columns=['PUBCHEM_CID','PubChem Standard Value','Standard Type'])

for assay in assay_ids:
    requests_url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{str(assay)}/CSV?cid={compound_ids_st}"

    response = requests.get(requests_url)

    if response.status_code == 200:

        with open('output/data2.csv', 'wb') as f:
            f.write(response.content)

        aux_df = pd.read_csv('output/data2.csv')

        # Keep only the rows that correspond to compounds
        aux_df = aux_df[aux_df['PUBCHEM_CID'].notna()]

        # Keep only the columns of interest
        aux_df = aux_df[['PUBCHEM_CID','PubChem Standard Value','Standard Type']]

        aux_df['AID'] = assay

        df = pd.concat([df, aux_df], ignore_index=True)

df.to_csv('output/data2.csv')

print(df)


   PUBCHEM_CID PubChem Standard Value Standard Type       AID
0   10788715.0                   3.86        KB app   77966.0
1   10732494.0                   1.98        KB app   77966.0
2   10642107.0                   1.13        KB app   77966.0
3   10644619.0                  22.62        KB app   77966.0
4   10788715.0                   0.61          IC50  158929.0
5   10732494.0                   1.61          IC50  158929.0
6   10642107.0                   1.36          IC50  158929.0
7   10644619.0                   0.94          IC50  158929.0
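
The identifiers in the table above are printed as floats (for example `10788715.0`). A possible way to get integer identifiers is to collect the per-assay frames in a list, concatenate them once, and cast the ID columns explicitly. The sketch below is an assumption about how this could be done, not a change to the cell above; the `per_assay` dictionary is a hypothetical stand-in for the tables downloaded in the loop, with values copied from the output.

```python
import pandas as pd

# Hypothetical stand-ins for the per-assay tables; values copied from the output above
per_assay = {
    77966: pd.DataFrame({
        "PUBCHEM_CID": [10788715.0, 10732494.0],
        "PubChem Standard Value": [3.86, 1.98],
        "Standard Type": ["KB app", "KB app"],
    }),
    158929: pd.DataFrame({
        "PUBCHEM_CID": [10788715.0, 10732494.0],
        "PubChem Standard Value": [0.61, 1.61],
        "Standard Type": ["IC50", "IC50"],
    }),
}

frames = []
for aid, frame in per_assay.items():
    frame = frame.copy()   # avoid modifying the original table
    frame["AID"] = aid     # tag each row with its assay ID
    frames.append(frame)

# Concatenate once, then cast the identifier columns to integers
combined = pd.concat(frames, ignore_index=True)
combined = combined.astype({"PUBCHEM_CID": "int64", "AID": "int64"})
print(combined)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so a real loop would need to handle the case where no request succeeds.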
