# Entity Extraction with LLM Function Calling

Databricks docs: https://docs.databricks.com/en/machine-learning/model-serving/function-calling.html

OpenAI docs: https://platform.openai.com/docs/guides/function-calling

In [0]:
%pip install mlflow openai --quiet

dbutils.library.restartPython()

In [0]:
%run ../rag_app_sample_code/A_POC_app/pdf_uc_volume/00_config

## Preparation

In [0]:
# get Databricks credentials
import openai
from mlflow.utils.databricks_utils import get_databricks_host_creds

creds = get_databricks_host_creds()

In [0]:
# use openai's client to make calls to Databricks Foundation Model API
client = openai.OpenAI(
    api_key=creds.token,
    base_url=creds.host + '/serving-endpoints',
)

model = "databricks-meta-llama-3-1-70b-instruct"

In [0]:
# # Alternative sample: fake insurance claim data generated with an LLM
# import os

# file_path = "/Workspace/Users/david.huang@databricks.com/client-demos/function-calling/synth-gen-sample.txt"

# with open(file_path, 'r') as file:
#     sample = file.read()

# print(sample)

In [0]:
parsed_docs = spark.table(destination_tables_config["parsed_docs_table_name"])

In [0]:
display(parsed_docs)

In [0]:
import pyspark.sql.functions as F

# path of the document to extract KPIs from; it is matched against the `path` column below
_path = "dbfs:/Volumes/felixflory/ey_dbs_workshop_2024_10/raw_data/project_churches/Project Churches - FINAL Red Flag Report 181121.pdf"

In [0]:
sample = (
  parsed_docs
  .select("doc_parsed_contents")
  .where(F.col("path") == F.lit(_path))
  .take(1)[0].doc_parsed_contents['parsed_content'])

In [0]:
print(sample)

## `function-calling` with Llama 3.1 70b

* Because the goal is to extract information from a piece of text and return it as structured output, `function-calling` is a good tool for the job.
* In effect, it constrains the LLM's output to a structured form that conforms to the schema you supply.
* At this time, of all the models we offer in our Foundation Model API, only `databricks-meta-llama-3-1-70b-instruct` has `function-calling` enabled.
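
As a minimal sketch of what "structured" means here, the helper below assembles an OpenAI-style tool definition from a plain mapping of field names to descriptions. The helper name `build_extraction_tool`, the example tool `get_invoice_fields`, and its fields are hypothetical and exist only for illustration. The optional `required` list is standard JSON Schema; whether a given serving endpoint enforces it is an assumption you should verify.

```python
def build_extraction_tool(name, description, fields, required=None):
    """Build an OpenAI-style tool definition whose parameters are string fields.

    `fields` maps each field name to the description the model will see.
    """
    properties = {
        field: {"type": "string", "description": desc}
        for field, desc in fields.items()
    }
    parameters = {"type": "object", "properties": properties}
    if required:
        # Refuse to mark a field as required if it was never declared.
        unknown = set(required) - set(fields)
        if unknown:
            raise ValueError(f"required fields not declared: {sorted(unknown)}")
        parameters["required"] = list(required)
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


# Hypothetical example tool; the names below are placeholders.
example_tool = build_extraction_tool(
    name="get_invoice_fields",
    description="Get invoice fields",
    fields={
        "invoice_number": "Invoice number as printed on the document",
        "due_date": "Payment due date",
    },
    required=["invoice_number"],
)
```

Generating the dictionary from a mapping keeps several schemas consistent when you extract different groups of entities from the same text.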

In [0]:
# create your function schema
get_company_information_1 = {
    "type": "function",
    "function": {
        "name": "get_EBITDA_kpi_1",
        "description": "Get EBITDA KPI values",
        "parameters": {
            "type": "object",
            "properties": {
                "reported_EBITDA": {
                    "type": "string",
                    "description": "Reported EBITDA for the year 2021",
                },
                "adjusted_EBITDA": {
                    "type": "string",
                    "description": "Adjusted EBITDA for the year 2021. Source: Management information & EY analysis",
                },
                "adjusted_NWC": {
                    "type": "string",
                    "description": "Adjusted NWC for the period end Jun21A",
                },
            },
        },
    },
}

Note that only the function schema is defined here, not a function itself. The tool call below produces only the function arguments; no function is actually invoked. Those arguments are the extracted entities.
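
Because the model returns the arguments as a JSON string, a small post-processing step is usually needed before the entities can be used. The sketch below is one way to do it, assuming an OpenAI-style tool definition: it parses the string and returns one entry per declared property, with `None` for anything the model left out. The helper name `extract_entities`, the demo schema, and the placeholder value are all hypothetical.

```python
import json


def extract_entities(arguments_json, tool_schema):
    """Parse a tool call's JSON argument string and align it with the schema.

    Returns a dict with one key per property declared in the schema;
    properties the model did not return are set to None, and keys the
    schema does not declare are dropped.
    """
    declared = tool_schema["function"]["parameters"]["properties"]
    parsed = json.loads(arguments_json)
    if not isinstance(parsed, dict):
        raise ValueError("tool call arguments must be a JSON object")
    return {field: parsed.get(field) for field in declared}


# Hypothetical schema and arguments string, used only to exercise the helper.
demo_schema = {
    "type": "function",
    "function": {
        "name": "demo_tool",
        "description": "Demo tool",
        "parameters": {
            "type": "object",
            "properties": {
                "reported_EBITDA": {"type": "string"},
                "adjusted_EBITDA": {"type": "string"},
            },
        },
    },
}
demo_arguments = '{"reported_EBITDA": "placeholder-value", "unexpected_key": "x"}'

entities = extract_entities(demo_arguments, demo_schema)
print(entities)
```

Filling in `None` for missing fields makes gaps in the extraction explicit instead of surfacing later as a `KeyError`.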

In [0]:
# call the LLM once per function schema
tools = [
    get_company_information_1,
    # add further function schemas here; each one is sent in its own request
]

messages = [
    {
        "role": "user",
        "content": f"Given the following information, please provide KPI information: \n{sample}\n",
    }
]

responses = []

for t in tools:
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.0,
        tools=[t],
        tool_choice="required",  # force the model to answer with a tool call
    )
    # the arguments of the first tool call are a JSON string holding the extracted entities
    output_string = response.choices[0].message.tool_calls[0].function.arguments
    responses.append(output_string)

In [0]:
# look at output as JSON object
import json

for r in responses:
    json_obj = json.loads(r)
    print(json.dumps(json_obj, indent=4))

In [0]:
# total tokens (prompt + completion) used by the last request in the loop above
response.usage.total_tokens