// https://huggingface.co/ibm-granite/granite-20b-functioncalling

From the paper "Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks":

https://arxiv.org/pdf/2407.00121v1

In [None]:
!wget -O granite-20b-functioncalling.i1-Q4_K_S.gguf "https://huggingface.co/mradermacher/granite-20b-functioncalling-i1-GGUF/resolve/main/granite-20b-functioncalling.i1-Q4_K_S.gguf?download=true"

In [None]:
!pip install 'huggingface_hub[cli,torch]'
!pip install transformers==4.34.0

In [None]:
import os
from huggingface_hub import login

# Authenticate with the token stored in the HF_API_KEY environment variable
login(token=os.getenv("HF_API_KEY"))

In [None]:
# Alternative to the wget cell above: fetch the same GGUF file through the Hugging Face CLI
!huggingface-cli download mradermacher/granite-20b-functioncalling-i1-GGUF granite-20b-functioncalling.i1-Q4_K_S.gguf --local-dir . --local-dir-use-symlinks False

In [None]:
import json
import torch
from transformers import AutoTokenizer

In [2]:
device = "cuda"  # or "cpu"
model_path = "ibm-granite/granite-20b-functioncalling"
# Only the tokenizer is loaded from the original repo; it supplies the chat template
tokenizer = AutoTokenizer.from_pretrained(model_path)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
!NVCC_APPEND_FLAGS='-allow-unsupported-compiler' CUDACXX=/usr/local/cuda/bin/nvcc CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]' --no-cache-dir --force-reinstall --upgrade

In [3]:
from llama_cpp import Llama
llm = Llama(model_path="granite-20b-functioncalling.i1-Q4_K_S.gguf", n_gpu_layers=-1)

llama_model_loader: loaded meta data with 32 key-value pairs and 628 tensors from granite-20b-functioncalling.i1-Q4_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = starcoder
llama_model_loader: - kv   1:                               general.name str              = StarCoder
llama_model_loader: - kv   2:                   starcoder.context_length u32              = 8192
llama_model_loader: - kv   3:                 starcoder.embedding_length u32              = 6144
llama_model_loader: - kv   4:              starcoder.feed_forward_length u32              = 24576
llama_model_loader: - kv   5:                      starcoder.block_count u32              = 52
llama_model_loader: - kv   6:             starcoder.attention.head_count u32              = 48
llama_model_loader: - kv   7:          starcoder.attention.head_

In [4]:
# define the user query and list of available functions
query = "What's the current weather in New York?"
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_stock_price",
        "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
                }
            },
            "required": ["ticker"]
        }
    }
]


# serialize functions and define a payload to generate the input template
payload = {
    "functions_str": [json.dumps(x) for x in functions],
    "query": query,
}

instruction = tokenizer.apply_chat_template(payload, tokenize=False, add_generation_prompt=True)

# llama-cpp-python tokenizes the prompt string itself, so no manual tokenization is needed here


Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.
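
The payload construction in the cell above is repeated unchanged for every later query, so it could be wrapped in a small helper. The following is a minimal sketch; `build_payload`, `example_functions` and `example_payload` are hypothetical names introduced here for illustration, and the result would still be passed to `tokenizer.apply_chat_template` exactly as above.

```python
import json


def build_payload(functions, query):
    """Build the dict that the cell above passes to tokenizer.apply_chat_template."""
    return {
        # Each function schema is serialized to its own JSON string
        "functions_str": [json.dumps(f) for f in functions],
        "query": query,
    }


# Small stand-in schema so the sketch runs on its own
example_functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

example_payload = build_payload(example_functions, "What's the current weather in New York?")
print(example_payload["query"])
```

In the notebook, `build_payload(functions, query)` would replace the inline `payload = {...}` literal.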


In [5]:
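# echo=True returns the prompt together with the completion;
# generation stops at the first newline or at the "<function_call>:" string
outputs = llm(instruction, max_tokens=500, stop=["<function_call>:", "\n"], echo=True)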


llama_print_timings:        load time =     440.15 ms
llama_print_timings:      sample time =       0.87 ms /    26 runs   (    0.03 ms per token, 29919.45 tokens per second)
llama_print_timings: prompt eval time =     439.73 ms /   324 tokens (    1.36 ms per token,   736.81 tokens per second)
llama_print_timings:        eval time =    1304.37 ms /    25 runs   (   52.17 ms per token,    19.17 tokens per second)
llama_print_timings:       total time =    1759.73 ms /   349 tokens


In [6]:
outputs

{'id': 'cmpl-598b1212-1e5a-46db-a2cd-5be25dc426a3',
 'object': 'text_completion',
 'created': 1723694903,
 'model': 'granite-20b-functioncalling.i1-Q4_K_S.gguf',
 'choices': [{'text': 'SYSTEM: You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required. \n<|function_call_library|>\n{"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}}, "required": ["location"]}}\n{"name": "get_stock_price", "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the use
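
Because the completion was requested with `echo=True`, the returned text begins with the full prompt, and the generated function call sits at the very end. A rough sketch of how the call could be pulled out is shown below. It assumes the model emits calls as `<function_call> {...json...}`; that format is an assumption not visible in the truncated output above, and `extract_function_calls` is a hypothetical helper name.

```python
import json

FUNCTION_CALL_TAG = "<function_call>"  # assumed marker that precedes each generated call


def extract_function_calls(full_text, prompt):
    """Drop the echoed prompt and parse every JSON object that follows the tag."""
    # With echo=True the returned text starts with the prompt itself
    completion = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    decoder = json.JSONDecoder()
    calls = []
    for chunk in completion.split(FUNCTION_CALL_TAG)[1:]:
        try:
            # raw_decode parses one JSON value and ignores any trailing text
            obj, _ = decoder.raw_decode(chunk.strip())
        except json.JSONDecodeError:
            continue  # skip fragments that are not valid JSON
        calls.append(obj)
    return calls


# Synthetic example; the strings below are illustrative, not real model output
demo_prompt = "SYSTEM: demo prompt\nASSISTANT:"
demo_text = demo_prompt + ' <function_call> {"name": "get_current_weather", "arguments": {"location": "New York"}}'
print(extract_function_calls(demo_text, demo_prompt))
```

In the notebook this would be called as `extract_function_calls(outputs["choices"][0]["text"], instruction)`.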

In [7]:
# define the user query and list of available functions
query = "What's the current IBM Stock Price?"
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_stock_price",
        "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
                }
            },
            "required": ["ticker"]
        }
    }
]


# serialize functions and define a payload to generate the input template
payload = {
    "functions_str": [json.dumps(x) for x in functions],
    "query": query,
}

instruction = tokenizer.apply_chat_template(payload, tokenize=False, add_generation_prompt=True)

# tokenize the text
# input_tokens = tokenizer(instruction, return_tensors="pt").to(device)


Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.


In [8]:
outputs = llm(instruction, max_tokens=500, stop=["<function_call>:", "\n"], echo=True)

Llama.generate: 318 prefix-match hit, remaining 5 prompt tokens to eval

llama_print_timings:        load time =     440.15 ms
llama_print_timings:      sample time =       0.95 ms /    26 runs   (    0.04 ms per token, 27282.27 tokens per second)
llama_print_timings: prompt eval time =      64.08 ms /     5 tokens (   12.82 ms per token,    78.02 tokens per second)
llama_print_timings:        eval time =    1300.93 ms /    25 runs   (   52.04 ms per token,    19.22 tokens per second)
llama_print_timings:       total time =    1379.41 ms /    30 tokens
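
The `prefix-match hit` line above indicates that most of the prompt was already evaluated in the previous call, so only the trailing tokens that differ had to be processed (5 tokens here, against 324 for the first prompt). The idea can be illustrated with a longest-common-prefix computation over token ids. This is only an illustration of the concept, not llama-cpp-python's actual code, and the token ids are made up.

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Count how many leading tokens the two sequences share."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


# Made-up token ids: the two prompts share the first three tokens
cached = [1, 15, 27, 42, 8]
new = [1, 15, 27, 99, 3, 7]

hit = common_prefix_len(cached, new)
remaining = len(new) - hit  # tokens that still need to be evaluated
print(hit, remaining)  # 3 3
```

Since the system prompt and function library are identical across the queries in this notebook, only the tokens of the changed user question need fresh evaluation.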


In [9]:
outputs

{'id': 'cmpl-ec69510b-0c28-4c6e-983c-28887937b23c',
 'object': 'text_completion',
 'created': 1723695158,
 'model': 'granite-20b-functioncalling.i1-Q4_K_S.gguf',
 'choices': [{'text': 'SYSTEM: You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required. \n<|function_call_library|>\n{"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}}, "required": ["location"]}}\n{"name": "get_stock_price", "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the use

In [19]:
# define the user query and list of available functions
query = "Should i buy NVIDIA stock when its raining in Sydney?"
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_stock_price",
        "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
                }
            },
            "required": ["ticker"]
        }
    }
]


# serialize functions and define a payload to generate the input template
payload = {
    "functions_str": [json.dumps(x) for x in functions],
    "query": query,
}

instruction = tokenizer.apply_chat_template(payload, tokenize=False, add_generation_prompt=True)

# tokenize the text
# input_tokens = tokenizer(instruction, return_tensors="pt").to(device)

Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.
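
Each function schema above lists its mandatory arguments under `required`, which makes it possible to check a generated call before executing anything. A minimal sketch follows, assuming a call has already been parsed into a dict with `name` and `arguments` keys (an assumed shape); `missing_required_args` and `demo_functions` are hypothetical names.

```python
def missing_required_args(call, functions):
    """Return the required parameter names that the call does not supply."""
    schema = next((f for f in functions if f["name"] == call.get("name")), None)
    if schema is None:
        raise KeyError(f"unknown function: {call.get('name')!r}")
    required = schema["parameters"].get("required", [])
    arguments = call.get("arguments", {})
    return [name for name in required if name not in arguments]


# Trimmed-down copies of the schemas above so the sketch runs on its own
demo_functions = [
    {"name": "get_current_weather",
     "parameters": {"type": "object", "properties": {"location": {"type": "string"}},
                    "required": ["location"]}},
    {"name": "get_stock_price",
     "parameters": {"type": "object", "properties": {"ticker": {"type": "string"}},
                    "required": ["ticker"]}},
]

print(missing_required_args({"name": "get_stock_price", "arguments": {}}, demo_functions))
```

An empty list means every required argument is present; an unknown function name raises `KeyError` so that a hallucinated function is not silently accepted.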


In [20]:
outputs = llm(instruction, max_tokens=500, stop=["<function_call>:", "\n"], echo=True)

Llama.generate: 320 prefix-match hit, remaining 9 prompt tokens to eval

llama_print_timings:        load time =     440.15 ms
llama_print_timings:      sample time =       2.14 ms /    53 runs   (    0.04 ms per token, 24801.12 tokens per second)
llama_print_timings: prompt eval time =      66.34 ms /     9 tokens (    7.37 ms per token,   135.66 tokens per second)
llama_print_timings:        eval time =    2718.90 ms /    52 runs   (   52.29 ms per token,    19.13 tokens per second)
llama_print_timings:       total time =    2820.24 ms /    61 tokens


In [21]:
outputs

{'id': 'cmpl-03b8f24a-558e-4358-bae3-b6095d2c17a9',
 'object': 'text_completion',
 'created': 1723695364,
 'model': 'granite-20b-functioncalling.i1-Q4_K_S.gguf',
 'choices': [{'text': 'SYSTEM: You are a helpful assistant with access to the following function calls. Your task is to produce a sequence of function calls necessary to generate response to the user utterance. Use the following function calls as required. \n<|function_call_library|>\n{"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}}, "required": ["location"]}}\n{"name": "get_stock_price", "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the use
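
The model only names a function and its arguments; running it is left to the caller. One way this could look is a lookup table from function name to a Python callable. The sketch below assumes a call dict with `name` and `arguments` keys (an assumed shape), and both implementations are placeholder stubs that fetch no real data; `REGISTRY` and `dispatch` are hypothetical names.

```python
def get_current_weather(location):
    # Placeholder stub: a real implementation would query a weather service
    return f"<weather for {location} not fetched in this sketch>"


def get_stock_price(ticker):
    # Placeholder stub: a real implementation would query a market data service
    return f"<price for {ticker} not fetched in this sketch>"


# Maps the function names from the schemas to Python callables
REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_stock_price": get_stock_price,
}


def dispatch(call):
    """Look up the named function and invoke it with the supplied arguments."""
    func = REGISTRY.get(call["name"])
    if func is None:
        raise ValueError(f"no implementation registered for {call['name']!r}")
    return func(**call.get("arguments", {}))


print(dispatch({"name": "get_stock_price", "arguments": {"ticker": "IBM"}}))
```

Keeping the registry explicit means the model can only trigger functions that were deliberately exposed.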