## Generating Lucene Query Language queries from free text input

This notebook takes free-text input, uses an OpenAI model to generate a Lucene Query Language (LQL) query from it,
and sends that query to the RSpace ELN API to retrieve matching documents.

To run this you'll need:

* An OpenAI API key
* An account on https://community.researchspace.com, and an API key. (It's free to set up)
* Python RSpace client - `pip install rspace_client`

### Setup

Here we import everything we need and check that the RSpace API is reachable:

In [1]:
import os
from rspace_client.eln import eln
import json
import openai
import requests
from tenacity import retry, wait_random_exponential, stop_after_attempt
from open_ai_functions import pretty_print_conversation,chat_completion_request

eln_cli = eln.ELNClient(os.getenv("RSPACE_URL"), os.getenv("RSPACE_API_KEY"))
print(eln_cli.get_status())
GPT_MODEL = "gpt-4"

{'message': 'OK', 'rspaceVersion': '1.93.0'}
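
The client above reads its URL and API key from the `RSPACE_URL` and `RSPACE_API_KEY` environment variables, and `os.getenv` returns `None` when a variable is unset. A minimal sketch of a pre-flight check follows; `missing_settings` is a hypothetical helper name, not part of `rspace_client`.

```python
import os

def missing_settings(names, env=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [n for n in names if not env.get(n)]

# Report any variables the setup cell relies on that are not configured
missing = missing_settings(["RSPACE_URL", "RSPACE_API_KEY"])
if missing:
    print("Please set:", ", ".join(missing))
```

Running this before constructing `ELNClient` makes a configuration problem easier to tell apart from a network or authentication problem.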


### Boilerplate

- sending request to OpenAI API
- pretty-printing results

A standard helper that sends messages to OpenAI's Chat Completions API and, if the model returns a function call, runs the matching RSpace search:

In [2]:
def do_conversation(messages, functions):
    # The third argument names the function the model should call
    resp = chat_completion_request(messages, functions, {'name': 'lucene'})
    active_messages = messages.copy()
    response_message = resp.json()['choices'][0]['message']
    active_messages.append(response_message)

    # Default to None so the return value is defined even without a function call
    rspace_search_result = None
    # .get() avoids a KeyError when the reply contains no 'function_call' key
    function_call = response_message.get('function_call')
    if function_call is not None:
        f_name = function_call['name']
        f_args = json.loads(function_call['arguments'])
        rspace_search_result = available_functions[f_name](**f_args)
    return (active_messages, rspace_search_result)
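
The `chat_completion_request` helper comes from a local `open_ai_functions` module that is not shown in this notebook. As a rough sketch of what such a helper might send, assuming it follows the function-calling request format of OpenAI's Chat Completions API, the request body could be assembled as below. `build_chat_payload` is a hypothetical name used only for illustration.

```python
import json

def build_chat_payload(messages, functions=None, function_call=None, model="gpt-4"):
    """Assemble a JSON body for a Chat Completions request (hypothetical helper)."""
    payload = {"model": model, "messages": messages}
    # Only include the optional fields when the caller supplies them
    if functions is not None:
        payload["functions"] = functions
    if function_call is not None:
        payload["function_call"] = function_call
    return payload

example = build_chat_payload(
    [{"role": "user", "content": "Find documents tagged PCR"}],
    functions=[{"name": "lucene", "parameters": {"type": "object", "properties": {}}}],
    function_call={"name": "lucene"},
)
print(json.dumps(example, indent=2))
```

Passing `{"name": "lucene"}` as `function_call` asks the model to respond with a call to that function rather than with free text.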

### Function definitions

The function we want the model to call, followed by its description in JSON Schema:

In [3]:
## This is the function that will be invoked with arguments generated by AI.
## It will make calls to RSpace's search API.
def search_rspace_eln(luceneQuery, sort_order="lastModified desc"):
    q = "l: " + luceneQuery
    docs = eln_cli.get_documents(query=q, order_by=sort_order)['documents']
    wanted_keys = ['globalId', 'name', 'tags', 'created']  # The keys we want
    # Keep only the wanted keys that are present in each document
    summarised = [{k: d[k] for k in wanted_keys if k in d} for d in docs]
    return summarised
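
To see what the summarising step produces without calling RSpace, here is a small sketch run on made-up records. The sample documents and the `summarise` name are illustrative assumptions, not real API output.

```python
wanted_keys = ['globalId', 'name', 'tags', 'created']

def summarise(docs, keys=wanted_keys):
    """Keep only the requested keys from each document dict, skipping absent ones."""
    return [{k: d[k] for k in keys if k in d} for d in docs]

# Made-up records standing in for the 'documents' list of an API response
sample_docs = [
    {"globalId": "SD1", "name": "PCR run", "tags": "PCR",
     "created": "2023-01-01", "owner": "someone"},
    {"globalId": "SD2", "name": "Untitled"},
]
summary = summarise(sample_docs)
print(summary)
```

Keys outside `wanted_keys` (here `owner`) are dropped, and a document that lacks some wanted keys is returned with only the ones it has.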

In [4]:
available_functions = {
 "lucene":search_rspace_eln
}

functions = [
  {
    "name": "lucene",
    "description": """
    A valid Lucene Query Language string generated from user input.
    Document fields are name, docTag, fields.fieldData, and username.
    Don't use wildcards.
    """,
    "parameters": {
        "type":"object",
        "properties": {
            "luceneQuery": {
                "type":"string",
                "description":"Valid Lucene Query Language as plain text"
            },
            "sort_order": {
                "type":"string",
                "description":"How results should be sorted",
                "enum":["name asc", "name desc", "created asc", "created desc"]
            },
        },
        "required": ["luceneQuery"]
    }
  }
]
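
The model returns the function arguments as a JSON string, so it can help to decode and check them before passing them to the search function. The sketch below assumes the `function_call` shape used in `do_conversation`; `parse_lucene_call` is a hypothetical helper, and the mock call is hand-written for illustration, not real model output.

```python
import json

ALLOWED_SORT_ORDERS = ["name asc", "name desc", "created asc", "created desc"]

def parse_lucene_call(function_call):
    """Decode and sanity-check the arguments of a 'lucene' function call."""
    args = json.loads(function_call["arguments"])
    if "luceneQuery" not in args:
        raise ValueError("model did not supply luceneQuery")
    sort_order = args.get("sort_order")
    if sort_order is not None and sort_order not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"unexpected sort_order: {sort_order!r}")
    return args

# Hand-written stand-in for the 'function_call' part of a model reply
mock_call = {
    "name": "lucene",
    "arguments": json.dumps(
        {"luceneQuery": "docTag:PCR AND NOT docTag:ECL", "sort_order": "name desc"}
    ),
}
parsed = parse_lucene_call(mock_call)
print(parsed)
```

A check like this turns a malformed reply into a clear `ValueError` instead of a failure deeper inside the RSpace client.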

### Executing the conversation


In [5]:
messages = [
 {
     "role" : "system",
     "content": "Generate function arguments from user input. Don't show reasoning."
 },
 {
     "role" : "user",
     "content": """
         I want to search for documents that are tagged with PCR but not ECL,
         containing the phrase "DNA replication" but not "RNA".
         List results in reverse alphabetical order.
         """
 } 
]


In [6]:
(conversation, results) = do_conversation(messages,functions)
pretty_print_conversation(conversation)
print("Search results from RSpace\n--------------------------")
print(json.dumps(results, indent=2))

RetryError: RetryError[<Future at 0x1071c8130 state=finished raised TypeError>]