A notebook for testing a specific combination of a provider and a model. \
This implementation is based on section "3.4 特定のproviderでエラーが出る場合の対応" (Handling errors that occur with a specific provider) in `README_t4.md`.

Setup: \
This notebook requires `python>=3.10.0`. Install the dependencies as follows:
```
pip install "../../lighteval[math,extended_tasks,litellm,vllm]" "transformers>=4.51.0,<4.53.0" "openai>=1.40.0" "datasets<4.0.0" "ipywidgets"
```
These version pins may need to change after package updates or to suit your environment.
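Before running the cells, it can help to confirm that the environment matches these requirements. The following is a minimal sketch that uses only the standard library. The helper names `python_ok` and `installed_version` are hypothetical, and the sketch only reports what is installed; it does not enforce the version pins.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def python_ok(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= min_version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Python >= 3.10:", python_ok())
# Distribution names taken from the pip command above (litellm comes in via the extra)
for pkg in ["lighteval", "litellm", "transformers", "openai", "datasets", "ipywidgets"]:
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```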

In [1]:
from dotenv import load_dotenv
assert load_dotenv('../../.env'), "Failed to load .env file"

import litellm
import os

In [None]:
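# Helper functions
def get_base_url(provider):
    """Return the OpenAI-compatible endpoint for `provider`."""
    base_url_dict = {
        "openai": "https://api.openai.com/v1",
        "deepinfra": "https://api.deepinfra.com/v1/openai",
        "vllm": "http://localhost:8000/v1",  # local vLLM server
    }
    return base_url_dict[provider]

def get_api_key(provider):
    """Return the API key for `provider` from the environment, or "" if none is needed or set."""
    api_name_dict = {
        "openai": "OPENAI_API_KEY",
        "deepinfra": "DEEPINFRA_API_KEY",
        "vllm": None,  # a local vLLM server needs no key
    }
    env_name = api_name_dict[provider]
    # os.getenv returns None for an unset variable; default to "" so callers get a string
    return os.getenv(env_name, "") if env_name else ""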
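The two helpers above keep the URL and the key variable in separate dictionaries, and an unknown provider surfaces as a bare `KeyError`. Below is a sketch of a variation that keeps each provider in one table and fails early with a readable message when the provider is unknown or its key is missing. `ProviderConfig`, `PROVIDERS`, and `resolve_provider` are hypothetical names, and raising on a missing key (instead of continuing without one) is a design choice, not something the original notebook does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    base_url: str
    api_key_env: Optional[str]  # None means the endpoint needs no key


# Same endpoints and environment variable names as the helpers above
PROVIDERS = {
    "openai": ProviderConfig("https://api.openai.com/v1", "OPENAI_API_KEY"),
    "deepinfra": ProviderConfig("https://api.deepinfra.com/v1/openai", "DEEPINFRA_API_KEY"),
    "vllm": ProviderConfig("http://localhost:8000/v1", None),
}


def resolve_provider(provider: str, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (base_url, api_key) for `provider`; api_key is "" when no key is needed."""
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider {provider!r}; expected one of {sorted(PROVIDERS)}")
    cfg = PROVIDERS[provider]
    if cfg.api_key_env is None:
        return cfg.base_url, ""
    api_key = env.get(cfg.api_key_env, "")
    if not api_key:
        raise KeyError(f"{cfg.api_key_env} is not set; check the .env file")
    return cfg.base_url, api_key
```

Passing `env` explicitly (it defaults to `os.environ`) makes the lookup easy to check without touching real credentials.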

In [8]:
# Setup parameters
provider = "vllm"
base_url = get_base_url(provider)

model = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5"

optional_params = {
    # See the official documentation for the available parameters: https://docs.litellm.ai/docs/api-reference/litellm.completion.
    "n": 2,
    "temperature": 0.6,
    "max_tokens": 128,
}

api_key = get_api_key(provider)
if api_key:  # only pass a key when one is needed and available (skips "" and None)
    optional_params["api_key"] = api_key

In [9]:
# Set a test prompt (Japanese for "Hello. Say something.")
test_prompt = "こんにちは。なにかしゃべって"

In [10]:
# Define request payload
request_payload = {
    "model": f"{provider}/{model}",
    "messages": [
        {
            "role": "user",
            "content": test_prompt,
        }
    ],
    "logprobs": None,
    "caching": False,
    "base_url": base_url,
    **optional_params
}
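The payload-building steps above can be wrapped in a function so that several provider and model pairs can be tested in a loop. This is a sketch; `build_request_payload` is a hypothetical helper that reproduces the keys used in the cell above, and it takes the API key as a separate argument instead of reading it from `optional_params`.

```python
from typing import Any, Optional


def build_request_payload(
    provider: str,
    model: str,
    prompt: str,
    base_url: str,
    api_key: Optional[str] = None,
    **params: Any,
) -> dict:
    """Assemble keyword arguments for litellm.completion, mirroring the cell above."""
    payload = {
        # LiteLLM picks the provider from the "<provider>/" prefix of the model name
        "model": f"{provider}/{model}",
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": None,
        "caching": False,
        "base_url": base_url,
        **params,  # sampling parameters such as n, temperature, max_tokens
    }
    if api_key:  # skip empty or missing keys, e.g. for a local server
        payload["api_key"] = api_key
    return payload
```

For example, `build_request_payload(provider, model, test_prompt, base_url, api_key, n=2, temperature=0.6, max_tokens=128)` produces a dictionary with the same contents as `request_payload` above.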

In [11]:
# Check the payload
request_payload

{'model': 'vllm/tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5',
 'messages': [{'role': 'user', 'content': 'こんにちは。なにかしゃべって'}],
 'logprobs': None,
 'caching': False,
 'base_url': 'http://localhost:8000/v1',
 'n': 2,
 'temperature': 0.6,
 'max_tokens': 128}
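With `provider = "vllm"` there is no key, but for `openai` or `deepinfra` the payload contains `api_key`, so displaying `request_payload` as in the cell above prints the key in plain text and stores it in the notebook's saved output. A minimal sketch for masking it before display follows; `redact_payload` and `SECRET_KEYS` are hypothetical names, and the set of secret fields is an assumption to extend as needed.

```python
SECRET_KEYS = {"api_key"}  # assumption: add other secret fields here if the payload grows


def redact_payload(payload: dict, mask: str = "***") -> dict:
    """Return a shallow copy of `payload` with non-empty secret fields masked for display."""
    return {k: (mask if k in SECRET_KEYS and v else v) for k, v in payload.items()}
```

Evaluating `redact_payload(request_payload)` instead of `request_payload` shows the same fields while leaving the original dictionary, and therefore the actual request, unchanged.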

In [12]:
# Get responses (this takes a while)
responses = litellm.completion(**request_payload)

Processed prompts:   0%|          | 0/2 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Processed prompts:  50%|█████     | 1/2 [00:01<00:01,  1.87s/it, est. speed input: 21.92 toks/s, output: 101.59 toks/s]
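Since this notebook exists to track down provider-specific errors, it may help to capture a failing call instead of letting it stop the notebook. The sketch below is generic: `try_completion` is a hypothetical helper that accepts any callable, so it can be exercised without a live endpoint. LiteLLM maps many provider errors onto OpenAI-style exception classes, so the printed exception type is often a useful first clue.

```python
from typing import Any, Callable, Optional


def try_completion(
    completion_fn: Callable[..., Any], payload: dict
) -> tuple[Optional[Any], Optional[BaseException]]:
    """Call `completion_fn(**payload)`; return (response, None) on success or (None, error)."""
    try:
        return completion_fn(**payload), None
    except Exception as err:  # keep any provider failure for inspection instead of raising
        # The exception type and message usually point at the rejected parameter or endpoint
        print(f"{type(err).__name__}: {err}")
        return None, err
```

In the cell above, this would be used as `responses, error = try_completion(litellm.completion, request_payload)`; removing one entry from `optional_params` at a time and retrying is a simple way to find which parameter a provider rejects.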


In [15]:
print(responses)

ModelResponse(id='chatcmpl-50b6f2a1-0f24-4c2c-b715-7c1b44efa2e1', created=1752474184, model='tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='こんにちは！何かお話したいことはありますか？😊\n\n例えば、\n\n*   **雑談**：最近あった面白いこと、好きなこと、興味のあることなど\n*   **質問**：何か知りたいこと、困っていること\n*   **お願い**：何か手伝ってほしいこと、相談したいこと\n*   **ゲーム**：簡単なクイズやなぞなぞ\n\nどんなことでも構いませんので、お気軽にお話くださいね。', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=115, prompt_tokens=41, total_tokens=156, completion_tokens_details=None, prompt_tokens_details=None))


In [16]:
print(responses.choices[0].message.content)

こんにちは！何かお話したいことはありますか？😊

例えば、

*   **雑談**：最近あった面白いこと、好きなこと、興味のあることなど
*   **質問**：何か知りたいこと、困っていること
*   **お願い**：何か手伝ってほしいこと、相談したいこと
*   **ゲーム**：簡単なクイズやなぞなぞ

どんなことでも構いませんので、お気軽にお話くださいね。
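Although the payload requests `n=2`, the printed `ModelResponse` above contains a single entry in `choices`, so it is worth checking whether a provider actually honors each parameter. A minimal sketch that prints every returned choice and compares the count with the requested `n` follows; `summarize_choices` is a hypothetical helper that relies only on the `choices`, `index`, `finish_reason`, and `message.content` attributes visible in the output above.

```python
def summarize_choices(response, requested_n: int = 1) -> int:
    """Print each returned choice and warn if the count differs from `requested_n`."""
    choices = response.choices
    for choice in choices:
        print(f"--- choice {choice.index} (finish_reason={choice.finish_reason}) ---")
        print(choice.message.content)
    if len(choices) != requested_n:
        print(f"Warning: requested n={requested_n}, but received {len(choices)} choice(s)")
    return len(choices)
```

Here it would be called as `summarize_choices(responses, optional_params["n"])`.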
