# Wave test using the GraphQL tool from LangChain
An agent is used to integrate the GraphQL tool so that queries can be sent to the Open Targets Platform. An agent needs three basic components:

- LLM model
- Tools
- Agent

Wave coding test:
**Programming Challenge 3: Natural-Language Queries Against a Structured Database**

## Instructions


Reference repo: Onuralp Soylemez (@cx0): https://github.com/cx0/chatGPT-for-genetics

The goal of this programming challenge is to build a function that takes a natural-language instruction or question and returns an appropriate answer using the Open Targets API endpoints, specifically the **Open Targets Platform GraphQL API.**

You can use Onuralp's scripts as a starting point, or you can write it from scratch.

**Tasks:**

- Handle single-step queries, for example: "What are the targets of vorinostat?", "Find drugs that are used to treat ulcerative colitis.", etc.

- Two-step queries, for example: "Which diseases are associated with the genes targeted by fasudil?", "Show all diseases that have at least 5 pathways associated with Alzheimer's"


**Expectations:**
- You may build the solution in a Jupyter notebook, but we would prefer to see it as a command-line (CLI) tool
- The answer should list the queried entities, with no paragraphs or extra text (that is, it should give only the answer and nothing else)
- We will test the solution on a held-out set of instructions and questions (10 cases for each task).
- You may need an OpenAI account for access to the OpenAI API, or similar access to another LLM API.
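
Since the expectations favour a command-line tool, a thin CLI wrapper could look roughly like the sketch below. It is not part of the original notebook: `answer_question` is a hypothetical placeholder for whatever function ends up running the agent, and `build_parser` and `main` are illustrative names.

```python
import argparse
from typing import Callable, List, Optional


def answer_question(question: str) -> List[str]:
    """Hypothetical placeholder: replace with the call that runs the agent."""
    raise NotImplementedError("wire this to the agent")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Ask the Open Targets Platform a question in natural language."
    )
    parser.add_argument("question", help="question or instruction, in quotes")
    return parser


def main(argv: Optional[List[str]] = None,
         answer_fn: Callable[[str], List[str]] = answer_question) -> int:
    args = build_parser().parse_args(argv)
    # Print one entity per line and nothing else, as the expectations ask.
    for entity in answer_fn(args.question):
        print(entity)
    return 0
```

In a script, `main()` would typically be called from an `if __name__ == "__main__":` guard so that the question is read from the command line.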


Library import

In [50]:
from langchain import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType, Tool
from langchain.utilities import GraphQLAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

## 1) LLM integration
The LLM is wrapped in a chain object so that a prompt template can be attached to it.

In [51]:
# 1.1) Prompt template (a pass-through for now; edit it if prompt engineering is needed)
prompt = PromptTemplate(
    input_variables=["query"],
    template="{query}"
)

# 1.2) LLM, in this case a model from OpenAI
llm = OpenAI(openai_api_key="YOURAPI",
             model_name="gpt-3.5-turbo", temperature=0.85)

# 1.3) Create the chain object (combines the LLM and the prompt template)
llm_chain = LLMChain(llm=llm, prompt=prompt)
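
The cell above hardcodes the API key as a placeholder string. One possible alternative, sketched here as an assumption rather than something the notebook does, is to read the key from an environment variable and fail early if it is missing. The helper name `get_openai_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed as the variable name.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result could then be passed as `openai_api_key=get_openai_api_key()` when the LLM is created.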

## 2) Set up of tools
This section sets up the tools for the agent: an LLM for general answers and the GraphQL tool.

In [52]:
# 2.1) Set up the LLM as a tool for answering general questions
llm_tool = Tool(name='Language Model',
                func=llm_chain.run,
                description='use this tool for general purpose queries and logic')

# 2.2) Set up the GraphQL tool
graph_tool = load_tools(  # IMPORTANT: we use load_tools because graphql is already a built-in LangChain tool
    tool_names=["graphql"],
    graphql_endpoint="https://api.platform.opentargets.org/api/v4/graphql",
    llm=llm)

# 2.3) List of tools that the agent can use
tools = [llm_tool, graph_tool[0]]
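
To check a query outside the agent, it can help to call the endpoint directly. The sketch below uses only the standard library and assumes the endpoint accepts the usual GraphQL-over-HTTP convention: a POST with a JSON body holding `query` and `variables`. The helper names are illustrative, and this is not a description of how the LangChain tool is implemented.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "https://api.platform.opentargets.org/api/v4/graphql"


def build_graphql_request(query: str,
                          variables: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        ENDPOINT,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_graphql(query: str, variables: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_graphql_request(query, variables)) as resp:
        return json.load(resp)
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without contacting the API.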

## 3) Agent

In [53]:
agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # Type of agent
    tools=tools,  # Tools made available to the agent
    llm=llm,
    verbose=True,
    max_iterations=6)

# IMPORTANT: The zero-shot ReAct agent has no memory; each answer it gives covers a single question only. If you want an agent with memory, use another agent type, such as Conversational ReAct.

# This agent uses the ReAct framework to decide which tool to use based solely on each tool's description. Any number of tools can be provided, and every tool must have a description.

type(agent)

langchain.agents.agent.AgentExecutor

Inspect the agent's current prompt

In [54]:
print(agent.agent.llm_chain.prompt.template)

Answer the following questions as best you can. You have access to the following tools:

Language Model: use this tool for general purpose queries and logic
query_graphql:     Input to this tool is a detailed and correct GraphQL query, output is a result from the API.
    If the query is not correct, an error message will be returned.
    If an error is returned with 'Bad request' in it, rewrite the query and try again.
    If an error is returned with 'Unauthorized' in it, do not try again, but tell the user to change their authentication.

    Example Input: query {{ allUsers {{ id, name, email }} }}    

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Language Model, query_graphql]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Fin
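
The format in the prompt above is what lets the agent's output be turned into tool calls: each step names an action and its input. As a rough illustration only (a simplified sketch, not LangChain's actual parser), the two fields could be pulled out of a model response like this; `parse_step` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)


def parse_step(llm_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input), or None if the text names no action."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

A response that contains no `Action:` line, such as one that only states the final answer, yields `None`.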

### 3.1) Execute the agent

**PROMPT ENGINEERING**
_A prefix can be a part of the input or prompt that gives the model some kind of context or instruction, and a suffix can be an additional part of the input that may affect the model's output in some way._
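
One way to keep these parts distinct is to assemble the final input with explicit separators, so that the context, the example queries, and the question do not run together. The helper below is a minimal sketch; `build_agent_input` and its argument names are hypothetical.

```python
def build_agent_input(question: str, context: str, schema_examples: str) -> str:
    """Join the prompt parts with blank lines, skipping any that are empty."""
    parts = [
        context.strip(),
        schema_examples.strip(),
        f"Question: {question.strip()}",
    ]
    return "\n\n".join(part for part in parts if part)
```

Placing the question last is a design choice in this sketch, not something the source prescribes.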

**SECOND ATTEMPT**

In [58]:
prefix = (
    "Context: The questions are about medical data, specifically data from the Open Targets Platform. "
    "If the question is about the relation between a target and diseases, use the query TargetDiseases "
    "and put the ensemblId of the target (which you can get from Ensembl) in the target variable of the query. "
    "On the other hand, if the question is about the relation between a disease and targets, use the query DiseasesTargets "
    "and put the efoId of the disease in the disease variable of the query."
)



graphql_fields = """
query TargetDiseases {
  target(ensemblId: "target") {
    id
    approvedSymbol
    associatedDiseases {
      count
      rows {
        disease {
          id
          name
        }
        datasourceScores {
          id
          score
        }
      }
    }
  }
}




query DiseasesTargets {
  disease(efoId: "disease") {
    id
    name
    associatedTargets {
      count
      rows {
        target {
          id
          approvedSymbol
        }
        score
      }
    }
  }
}


"""

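# Example questions (only the uncommented one is used)
# suffix = "What are the targets of vorinostat?"
# suffix = "I need to find what targets are related to Hereditary breast and ovarian cancer syndrome"
suffix = "What are the targets linked with the disease EFO_0000756?"

# Separate the parts with newlines so the question, the context, and the example queries do not run together
answer = agent.run(suffix + "\n" + prefix + "\n" + graphql_fields)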

answer



> Entering new  chain...
The question is asking for targets linked with the disease EFO_0000756. We need to use the query DiseasesTargets and provide the EFO ID of the disease as input.
Action: query_graphql
Action Input: query DiseasesTargets { disease(efoId: "EFO_0000756") { id name associatedTargets { count rows { target { id approvedSymbol } score } } } }
Observation: "{\n  \"disease\": {\n    \"id\": \"EFO_0000756\",\n    \"name\": \"melanoma\",\n    \"associatedTargets\": {\n      \"count\": 8059,\n      \"rows\": [\n        {\n          \"target\": {\n            \"id\": \"ENSG00000147889\",\n            \"approvedSymbol\": \"CDKN2A\"\n          },\n          \"score\": 0.9015566572669396\n        },\n        {\n          \"target\": {\n            \"id\": \"ENSG00000157764\",\n            \"approvedSymbol\": \"BRAF\"\n          },\n          \"score\": 0.8607786372700651\n        },\n        {\n          \"target\": {\n            \"id\":

'The targets linked with the disease EFO_0000756 (melanoma) are CDKN2A, BRAF, MAP2K1, CDK4, PTEN, POT1, NRAS, MAP2K2, KIT, BAP1, GNAQ, GNA11, NF1, RAC1, MITF, SF3B1, TP53, ARID2, PPP6C, ERBB4, STK11, KDR, CTNNB1, RAF1, and FLT4.'
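
Because the expected answer is a bare list of entities, it may be more reliable to extract them from the API response than to rely on the sentence the model writes. The sketch below assumes the response has the shape shown in the observation above (`disease` → `associatedTargets` → `rows` → `target` → `approvedSymbol`); `extract_target_symbols` is a hypothetical helper, and the sample data reuses the first two rows of that observation.

```python
import json
from typing import List


def extract_target_symbols(raw_response: str) -> List[str]:
    """Return the approved symbols from a DiseasesTargets response, in order."""
    data = json.loads(raw_response)
    # Any level may be missing or null (for example, an unknown disease id).
    disease = data.get("disease") or {}
    associated = disease.get("associatedTargets") or {}
    rows = associated.get("rows") or []
    return [row["target"]["approvedSymbol"] for row in rows]


sample = json.dumps({
    "disease": {
        "id": "EFO_0000756",
        "name": "melanoma",
        "associatedTargets": {
            "rows": [
                {"target": {"id": "ENSG00000147889", "approvedSymbol": "CDKN2A"}},
                {"target": {"id": "ENSG00000157764", "approvedSymbol": "BRAF"}},
            ]
        },
    }
})

# Prints one symbol per line: CDKN2A, then BRAF.
print("\n".join(extract_target_symbols(sample)))
```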