# Function calling using Foundation Model APIs

This notebook demonstrates how the *function calling* (or *tool use*) API can be used to extract structured information from natural language inputs, using the large language models (LLMs) served through Foundation Model APIs. It uses the OpenAI SDK to show that these endpoints interoperate with OpenAI-style clients.


LLMs generate output in natural language, the exact structure of which is hard to predict even when the LLM is given precise instructions. Function calling forces the LLM to adhere to a strict schema, making it easy to automatically parse the LLM's outputs. This unlocks advanced use cases, enabling LLMs to be components in complex data processing pipelines and Agent workflows.
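
To make the contrast concrete, here is a minimal sketch of why schema-conforming output is easier to consume than free text. Both strings are hand-written illustrations rather than real model responses, and `parse_structured` is a hypothetical helper name.

```python
import json

# Hand-written illustrations, not real model responses
free_text_output = 'Sure! The closest match is "Supply Team A". Let me know if you need more.'
tool_call_arguments = '{"unit": "Supply Team A"}'


def parse_structured(arguments: str) -> dict:
    """Parses a tool call's JSON argument string into a dictionary."""
    return json.loads(arguments)


# Schema-conforming output parses directly into a dictionary
parsed = parse_structured(tool_call_arguments)
print(parsed["unit"])

# Free text fails to parse, so a pipeline would need custom handling for it
try:
    parse_structured(free_text_output)
    free_text_is_json = True
except json.JSONDecodeError:
    free_text_is_json = False
print(free_text_is_json)
```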

### Set up environment

In [0]:
%pip install --upgrade openai tenacity tqdm
dbutils.library.restartPython()

In [0]:
%run "./_resources/00-init"

In [0]:
# The endpoint ID of the model to use. Not all endpoints support function calling.
MODEL_ENDPOINT_ID = "databricks-meta-llama-3-3-70b-instruct"

In [0]:
import concurrent.futures
import pandas as pd
from openai import OpenAI, RateLimitError
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception,
)  # for exponential backoff
from tqdm.notebook import tqdm
from typing import List, Optional


# A token and the workspace's base FMAPI URL are needed to talk to endpoints
fmapi_token = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .apiToken()
    .getOrElse(None)
)
fmapi_base_url = (
    f'https://{spark.conf.get("spark.databricks.workspaceUrl")}/serving-endpoints'
)


The following cell defines helper functions that call the chat model with retries, run many prompts in parallel, and collect the responses into a table.

In [0]:

openai_client = OpenAI(api_key=fmapi_token, base_url=fmapi_base_url)


# NOTE: We *strongly* recommend retrying rate-limit errors with backoff, so your code degrades gracefully when it hits pay-per-token rate limits.
@retry(
    wait=wait_random_exponential(min=1, max=30),
    stop=stop_after_attempt(3),
    # retry_if_exception expects a predicate, not an exception class
    retry=retry_if_exception(lambda e: isinstance(e, RateLimitError)),
    # After the final failed attempt, return None instead of raising RetryError
    retry_error_callback=lambda retry_state: None,
)
def call_chat_model(
    prompt: str, temperature: float = 0.0, max_tokens: int = 100, **kwargs
):
    """Calls the chat model and returns the response text or tool call arguments.

    Returns a JSON string for a single tool call, a list of JSON strings for
    several tool calls, the message text if no tool was called, or None on error.
    """
    chat_args = {
        "model": MODEL_ENDPOINT_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    chat_args.update(kwargs)

    try:
        chat_completion = openai_client.chat.completions.create(**chat_args)
        response = chat_completion.choices[0].message

        if response.tool_calls:
            call_args = [c.function.arguments for c in response.tool_calls]
            if len(call_args) == 1:
                return call_args[0]
            return call_args

        return response.content
    except RateLimitError:
        # Re-raise so that tenacity can back off and retry
        raise
    except Exception as e:
        print(f"Error: {e}")
        return None
    
def call_in_parallel(func, prompts: List[str]) -> List:
    """Calls func(p) for all prompts in parallel and returns responses."""
    # This uses a relatively small thread pool to avoid triggering default workspace rate limits.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        results = []
        for r in tqdm(executor.map(func, prompts), total=len(prompts)):
            results.append(r)
        return results


def results_to_dataframe(units: List[str], responses: List[str]):
    """Combines unit names and model responses into a dataframe for tabular display."""
    return pd.DataFrame({"Units": units, "Model response": responses})
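
`results_to_dataframe` pairs each unit with its response by position, which works because `executor.map` yields results in input order even when calls finish out of order. The sketch below illustrates this with a hypothetical stand-in function, `slow_upper`, in place of a real model call.

```python
import concurrent.futures
import time


def slow_upper(text: str) -> str:
    # Hypothetical stand-in for a model call; shorter inputs sleep longer,
    # so the calls complete in a different order than they were submitted.
    time.sleep(0.05 / len(text))
    return text.upper()


inputs = ["a", "bbb", "cc"]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns results in the order of `inputs`, not in completion order
    outputs = list(executor.map(slow_upper, inputs))

print(outputs)
```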


## Example: Taxonomy Consolidation
This section demonstrates two approaches for consolidating taxonomy descriptions, in order of increasing reliability:
* **Unstructured (least reliable)**: Basic prompting. Relies on the model to generate valid JSON on its own.
* **Tool schema**: Augment prompt with a tool schema, guiding the model to adhere to that schema.

In [0]:
# `catalog` and `db` are expected to be defined by the ./_resources/00-init notebook run above
schema = db
table_name = "raw_supplier_dummy_data"

# Read the Delta table from Unity Catalog into a DataFrame
table_path = f"{catalog}.{schema}.{table_name}"
clean_df = spark.table(table_path)
display(clean_df)

In [0]:
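# Collect the column into a Python list of unit names without going through the RDD API
delivery_unit_name_list = [
    row["DELIVERY_UNIT_NAME"]
    for row in clean_df.select("DELIVERY_UNIT_NAME").collect()
]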

### Unstructured generation
Given a unit name, the most obvious strategy is to instruct the model to generate a JSON object that looks like this: `{"label": "UNIT_A"}`.

This approach mostly works with models like DBRX and Llama-3-70B. However, models sometimes generate extraneous text, such as "helpful" comments about the task or input.

Prompt engineering can refine performance. For example, SHOUTING instructions at the model is a popular strategy. Even so, you must validate the output to detect and discard nonconforming responses.
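
A minimal validation sketch is shown below. It assumes the model was asked for a JSON object with a `label` key, as described above. `extract_label` is a hypothetical helper, not part of any SDK, and it deliberately rejects responses that wrap the JSON in extra text.

```python
import json
from typing import List, Optional


def extract_label(raw: Optional[str], allowed: List[str]) -> Optional[str]:
    """Returns the label if `raw` is a JSON object with an allowed "label", else None."""
    if raw is None:
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # extra text around the JSON, or not JSON at all
    if not isinstance(parsed, dict):
        return None
    label = parsed.get("label")
    # Reject labels that are missing or outside the allowed categories
    return label if label in allowed else None


# Shortened, illustrative category list
allowed_categories = ["Supply Team A", "Coastal Logistics"]
print(extract_label('{"label": "Supply Team A"}', allowed_categories))
```

Applying a check like this to each response makes it possible to count how often the unstructured prompt fails before deciding whether a stricter approach is needed.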

In [0]:
PROMPT_TEMPLATE = """
Imagine you are trying to consolidate the delivery unit names that can have many variations. Your task is to map the delivery unit {unit} to one of the following categories:
[
    "Logistics Unit 1",
    "Supply Team A",
    "Delivery Group North",
    "Central Distribution Team",
    "East Logistics Hub",
    "West End Delivery",
    "Urban Supply Group",
    "Rural Delivery Unit",
    "Coastal Logistics",
    "City Centre Distribution",
    "North Delivery Hub",
    "Midlands Logistics Team",
    "Southwest Supply Group",
    "Northwest Distribution",
    "London Logistics Unit",
    "Southern Delivery Squad",
    "East Coast Dispatch",
    "Regional Delivery Team A",
    "Western Supply Chain",
    "Inner City Logistics",
    "Central Midlands Delivery",
    "Remote Area Delivery",
    "Urban Hub Logistics",
    "Express Delivery Unit",
    "Northern Distribution Centre",
    "Route B Supply Team",
    "West Midlands Distribution",
    "East End Logistics",
    "Metro Delivery Unit",
    "Capital Logistics",
    "Suburban Supply Group",
    "Greater London Dispatch",
    "Outer Ring Logistics",
    "Highlands Delivery Team",
    "Valley Supply Unit",
    "Central Hub Dispatch",
    "Rural Network Logistics",
    "West Coast Delivery",
    "Supply Chain Express",
    "South Logistics Unit",
    "Northeast Distribution Team",
    "South Delivery Hub",
    "East Midlands Supply",
    "London Central Dispatch",
    "Island Delivery Group",
    "Regional Logistics Unit B",
    "Express Route Distribution",
    "City Zone Logistics",
    "Outskirt Delivery Unit",
    "Central District Supply"
]
* Do not answer with None.
* You must choose the label closest to the input.
Return your output in JSON format, for example {{"label": "<category>"}}. Do not add extra text.
"""

def prompt_unstructured(unit: str):
    # call_in_parallel passes one unit name at a time, so format it into the prompt directly
    prompt = PROMPT_TEMPLATE.format(unit=unit)
    return call_chat_model(prompt)

results = call_in_parallel(prompt_unstructured, delivery_unit_name_list)

results_df = results_to_dataframe(delivery_unit_name_list, results)

In [0]:
display(results_df)

### Classifying with tools
Output quality can be improved by using the `tools` API. You provide a strict JSON schema for the output, and the FMAPI inference service either constrains the model's output to that schema or returns an error when it cannot.
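
Because the allowed categories have to appear in the schema as an `enum`, it can help to generate the tool definition from a single list instead of maintaining it by hand. The sketch below is one way to do that. `build_enum_tool` and the shortened `CATEGORIES` list are hypothetical and for illustration only; the dictionary layout follows the OpenAI-style `tools` format.

```python
from typing import Any, Dict, List


def build_enum_tool(
    name: str, description: str, field: str, categories: List[str]
) -> List[Dict[str, Any]]:
    """Builds a one-function `tools` list whose single string field must be one of `categories`."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        field: {"type": "string", "enum": list(categories)}
                    },
                    "required": [field],
                },
            },
        }
    ]


# Shortened, illustrative category list
CATEGORIES = ["Logistics Unit 1", "Supply Team A", "Delivery Group North"]
example_tools = build_enum_tool(
    "_taxonomy_consolidation",
    "Consolidate the taxonomy of delivery units",
    "unit",
    CATEGORIES,
)
```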

In [0]:
PROMPT_TEMPLATE_UPDATED = """
Imagine you are trying to consolidate the delivery unit names that can have many variations. Your task is to map the delivery unit {unit} to one category
"""

In [0]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "_taxonomy_consolidation",
            "description": "Consolidate the taxonomy of delivery units",
            "parameters": {
                "type": "object",
                "properties": {
                    "unit": {
                        "type": "string",
                        "enum": ["Logistics Unit 1",
                                "Supply Team A",
                                "Delivery Group North",
                                "Central Distribution Team",
                                "East Logistics Hub",
                                "West End Delivery",
                                "Urban Supply Group",
                                "Rural Delivery Unit",
                                "Coastal Logistics",
                                "City Centre Distribution",
                                "North Delivery Hub",
                                "Midlands Logistics Team",
                                "Southwest Supply Group",
                                "Northwest Distribution",
                                "London Logistics Unit",
                                "Southern Delivery Squad",
                                "East Coast Dispatch",
                                "Regional Delivery Team A",
                                "Western Supply Chain",
                                "Inner City Logistics",
                                "Central Midlands Delivery",
                                "Remote Area Delivery",
                                "Urban Hub Logistics",
                                "Express Delivery Unit",
                                "Northern Distribution Centre",
                                "Route B Supply Team",
                                "West Midlands Distribution",
                                "East End Logistics",
                                "Metro Delivery Unit",
                                "Capital Logistics",
                                "Suburban Supply Group",
                                "Greater London Dispatch",
                                "Outer Ring Logistics",
                                "Highlands Delivery Team",
                                "Valley Supply Unit",
                                "Central Hub Dispatch",
                                "Rural Network Logistics",
                                "West Coast Delivery",
                                "Supply Chain Express",
                                "South Logistics Unit",
                                "Northeast Distribution Team",
                                "South Delivery Hub",
                                "East Midlands Supply",
                                "London Central Dispatch",
                                "Island Delivery Group",
                                "Regional Logistics Unit B",
                                "Express Route Distribution",
                                "City Zone Logistics",
                                "Outskirt Delivery Unit",
                                "Central District Supply"]               
                    }
                },
                "required": ["unit"]
            }
        }
    }
]


def prompt_with_tool(unit: str):
    # call_in_parallel passes one unit name at a time; the allowed categories live in the tool schema
    prompt = PROMPT_TEMPLATE_UPDATED.format(unit=unit)
    return call_chat_model(prompt, tools=tools)

results = call_in_parallel(prompt_with_tool, delivery_unit_name_list)

tagged_df = results_to_dataframe(delivery_unit_name_list, results)

In [0]:
display(tagged_df)
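
The `Model response` column holds whatever `call_chat_model` returned: a JSON argument string, a list of such strings, plain text, or `None`. The sketch below is one possible way to reduce these to a single label. `label_from_response` is a hypothetical helper, and the choice to keep only the first tool call is an assumption.

```python
import json
from typing import List, Optional, Union


def label_from_response(
    response: Union[str, List[str], None], field: str = "unit"
) -> Optional[str]:
    """Extracts `field` from the first tool call's JSON arguments, or returns None."""
    if not response:
        return None
    # Assumption: when several tool calls come back, only the first is used
    first = response[0] if isinstance(response, list) else response
    try:
        arguments = json.loads(first)
    except json.JSONDecodeError:
        return None  # plain-text reply rather than tool call arguments
    if not isinstance(arguments, dict):
        return None
    return arguments.get(field)


print(label_from_response('{"unit": "Coastal Logistics"}'))
```

With pandas, this could be applied as `tagged_df["Model response"].map(label_from_response)` to produce a clean label column.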