[RAG](https://dspy.ai/tutorials/rag/) Using a DSPy Optimizer to improve your RAG prompt

In [1]:
import ujson
from dspy.utils import download

download("https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_corpus.jsonl")



In [None]:
import dspy
import ujson
from sentence_transformers import SentenceTransformer

max_characters = 6000  # for truncating >99th percentile of documents
topk_docs_to_retrieve = 5  # number of documents to retrieve per search query

with open("ragqa_arena_tech_corpus.jsonl") as f:
    corpus = [ujson.loads(line)['text'][:max_characters] for line in f]
    print(f"Loaded {len(corpus)} documents. Will encode them below.")

# same embedding model we use in our RAG
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2", device="cuda")
embedder = dspy.Embedder(model.encode)
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=corpus, k=topk_docs_to_retrieve, brute_force_threshold=30000)



Loaded 28436 documents. Will encode them below.



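To make the retrieval step concrete, here is a minimal, dependency-free sketch of brute-force top-k search by cosine similarity. It illustrates the general idea only and is not the implementation behind `dspy.retrievers.Embeddings`; the toy vectors and the helper names `cosine` and `top_k` are invented for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-d "embeddings" standing in for real sentence-transformer vectors.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], docs, k=2))
```

Brute force scores every document for every query, which is simple and exact but scales linearly with corpus size.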
In [4]:
import dspy
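import os

# os.getenv returns strings when a variable is set, so cast numeric settings explicitly.
LLM_URL = os.getenv('LLM_URL', 'http://localhost:8080/v1')
API_KEY = os.getenv('API_KEY', 'fake')
LLM_MODEL = os.getenv('LLM_MODEL', 'openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf')
MAX_TOKENS = int(os.getenv('MAX_TOKENS', 6000))  # parsed but not passed to dspy.LM in this notebook
TEMPERATURE = float(os.getenv('TEMPERATURE', 0.2))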
dspy.enable_logging()
lm = dspy.LM(model=LLM_MODEL,
             api_base=LLM_URL,  # ensure this points to your port
             api_key=API_KEY,
             temperature=TEMPERATURE,
             model_type='chat',
             stream=False)
dspy.configure(lm=lm)
#dspy.settings.configure(track_usage=True)
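Runs against a local server can emit several `INFO:LiteLLM:` and `INFO:httpx:` lines per request, which quickly buries the optimizer's own messages. A minimal sketch for raising the level of those two loggers with the standard `logging` module follows. The logger names are taken from the log prefixes; whether this silences every line depends on how the libraries attach their handlers, so treat it as a starting point.

```python
import logging

# Logger names inferred from the "INFO:LiteLLM:" and "INFO:httpx:" prefixes in the output.
for name in ("LiteLLM", "httpx"):
    logging.getLogger(name).setLevel(logging.WARNING)
```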

In [5]:
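class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        # Chain-of-thought predictor: reads the retrieved context and the question, writes a response.
        self.respond = dspy.ChainOfThought('context, question -> response')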

    def forward(self, question):
        context = search(question).passages
        return self.respond(context=context, question=question)

In [None]:
rag = RAG()
rag(question="what are high memory and low memory on linux?")

Prediction(
    reasoning='High Memory and Low Memory are terms used to describe the division of memory space in a Linux system. High Memory refers to the segment of memory that user-space programs can access, while Low Memory is the segment that the Linux kernel can access directly. This division is necessary to prevent user-space applications from accessing kernel-space memory, which could potentially lead to security vulnerabilities.',
    response='High Memory and Low Memory are terms used to describe the division of memory space in a Linux system. High Memory refers to the segment of memory that user-space programs can access, while Low Memory is the segment that the Linux kernel can access directly. This division is necessary to prevent user-space applications from accessing kernel-space memory, which could potentially lead to security vulnerabilities. The Linux kernel splits the available memory into two parts: High Memory (user-space) and Low Memory (kernel-space), with the lat

In [7]:
dspy.inspect_history()

[2025-06-03T12:22:46.432884]

System message:

Your input fields are:
1. `context` (str)
2. `question` (str)
Your output fields are:
1. `reasoning` (str)
2. `response` (str)
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `context`, `question`, produce the fields `response`.

User message:

[[ ## context ## ]]
[1] «As far as I remember, High Memory is used for application space and Low Memory for the kernel. Advantage is that (user-space) applications cant access kernel-space memory.»
[2] «This is relevant to the Linux kernel; Im not sure how any Unix kernel handles this. The High Memory is the segment of memory that user-space programs can address. It cannot touch Low Memory. Lo
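The prompt above delimits every field with a `[[ ## name ## ]]` header. As an illustration of how such a reply could be split back into fields, here is a small regex-based sketch. It is not DSPy's own parser; the function name `split_fields` and the sample text are invented for this example.

```python
import re

# A header line such as "[[ ## response ## ]]" on its own line.
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)

def split_fields(text):
    """Split a '[[ ## name ## ]]'-delimited message into a {name: value} dict."""
    parts = FIELD_HEADER.split(text)
    # re.split with one capture group yields [preamble, name1, body1, name2, body2, ...]
    names, bodies = parts[1::2], parts[2::2]
    return {name: body.strip() for name, body in zip(names, bodies)}

sample = """[[ ## reasoning ## ]]
Some reasoning text.

[[ ## response ## ]]
A short answer.

[[ ## completed ## ]]
"""
fields = split_fields(sample)
print(fields["response"])
```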

In [15]:
import random
import ujson

with open("ragqa_arena_tech_examples.jsonl") as f:
    data = [ujson.loads(line) for line in f]

data = [dspy.Example(**d).with_inputs('question') for d in data]

random.Random(0).shuffle(data)
# Alternative, larger split: data[:200], data[200:500], data[500:1000]. This run uses 50 / 100 / 300.
trainset, devset, testset = data[:50], data[50:150], data[150:450]

len(trainset), len(devset), len(testset)

(50, 100, 300)
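The split relies on two properties: a seeded shuffle gives the same order on every run, and contiguous slices never overlap. The sketch below checks both with plain integers standing in for `dspy.Example` objects; the dataset size of 1000 is a placeholder, not the real file's length.

```python
import random

# Stand-in for the loaded examples (hypothetical size).
data = list(range(1000))
random.Random(0).shuffle(data)  # fixed seed, so the order is reproducible
trainset, devset, testset = data[:50], data[50:150], data[150:450]

# Shuffling a fresh copy with the same seed yields the same order.
again = list(range(1000))
random.Random(0).shuffle(again)
same_order = again == data

# Contiguous slices of one list share no elements.
overlap = (set(trainset) & set(devset)) | (set(devset) & set(testset)) | (set(trainset) & set(testset))
print(len(trainset), len(devset), len(testset), same_order, len(overlap))
```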

In [16]:
from dspy.evaluate import SemanticF1

# Instantiate the metric.
metric = SemanticF1(decompositional=True)

# Define an evaluator that we can re-use.
evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=24,
                         display_progress=True, display_table=2)

evaluate(RAG())

Average Metric: 55.82 / 100 (55.8%): 100%|██████████| 100/100 [12:18<00:00,  7.39s/it]

2025/06/03 12:40:10 INFO dspy.evaluate.evaluate: Average Metric: 55.81709025389931 / 100 (55.8%)





Unnamed: 0,question,example_response,gold_doc_ids,reasoning,pred_response,SemanticF1
0,does using == in javascript ever make sense?,"Yes, using `==` in JavaScript can make sense and is convenient in ...","[5778, 5791, 5818]",The use of `==` in JavaScript can be misleading due to its behavio...,"Yes, using `==` in JavaScript can make sense in certain situations...",✔️ [0.667]
1,what is the difference between a virus and trojan?,The terms have a great deal of overlap and aren't necessarily mutu...,"[3768, 3769, 3888, 3890, 4046]",The difference between a virus and a Trojan lies in how they sprea...,"A virus and a Trojan are both types of malware, but they differ in...",✔️ [0.600]


55.82
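As a reminder of the arithmetic behind an F1-style score: F1 is the harmonic mean of precision and recall, and the evaluator reports the mean over the dev set as a percentage. In a semantic variant, precision and recall are judged by a language model rather than counted from token overlap. The sketch below covers only the final combination and averaging steps; the (precision, recall) pairs are invented.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented (precision, recall) pairs for three answers.
pairs = [(1.0, 0.5), (0.5, 0.75), (0.0, 0.0)]
scores = [f1_score(p, r) for p, r in pairs]
average_pct = 100 * sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(average_pct, 2))
```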

In [17]:
import mlflow

mlflow.set_tracking_uri("http://localhost:5500")
mlflow.set_experiment("optimize-rag")
mlflow.dspy.autolog(
    log_compiles=True,    # Track optimization process
    log_evals=True,       # Track evaluation results
    log_traces_from_compile=True  # Track program traces during optimization
)

2025/06/03 12:44:43 INFO mlflow.tracking.fluent: Experiment with name 'optimize-rag' does not exist. Creating a new experiment.


In [18]:
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(RAG(), trainset=trainset,
                           max_bootstrapped_demos=2, max_labeled_demos=2,
                           requires_permission_to_run=False)

2025/06/03 12:45:14 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'a3b3616e10834427b5ac20b2cd63de46', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current dspy workflow
2025/06/03 12:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
num_trials: 18
minibatch: False
num_fewshot_candidates: 12
num_instruct_candidates: 6
valset size: 40
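The settings report `valset size: 40` even though no `valset` was passed to `compile`. That figure is consistent with holding out 80% of the 50 training examples for validation and keeping the remaining 10 for bootstrapping. The sketch below reproduces that arithmetic; the 80% ratio and the choice of which end of the list is held out are assumptions inferred from the numbers, not taken from the optimizer's source.

```python
def split_for_validation(trainset, val_fraction=0.8):
    """Hold out a fraction of the training examples for validation (assumed rule)."""
    val_size = max(int(len(trainset) * val_fraction), 1)
    cutoff = len(trainset) - val_size
    return trainset[:cutoff], trainset[cutoff:]

train_part, val_part = split_for_validation(list(range(50)))
print(len(train_part), len(val_part))
```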

2025/06/03 12:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/06/03 12:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/06/03 12:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=12 sets of demonstrations...


Bootstrapping set 1/12
Bootstrapping set 2/12
Bootstrapping set 3/12
Bootstrapped 2 full traces after 8 examples for up to 1 rounds, amounting to 8 attempts.
Bootstrapping set 4/12
Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 5/12
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/12
Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 7/12
Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 8/12
Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 9/12
Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 10/12
Bootstrapped 1 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 11/12
Bootstrapped 2 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 12/12
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 218.42it/s]
2025/06/03 12:52:34 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/06/03 12:52:34 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
[92m12:52:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:52:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:52:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost cal

  0%|          | 0/40 [00:00<?, ?it/s]

[92m12:55:14 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m12:55:14 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m12:55:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m12:55:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m12:55:18 - LiteLLM:INFO[0m: utils.py:2991 - 
L

Average Metric: 0.53 / 1 (53.3%):   2%|▎         | 1/40 [01:09<44:59, 69.22s/it]

[92m12:56:23 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:26 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 0.53 / 2 (26.7%):   5%|▌         | 2/40 [01:11<18:57, 29.94s/it]

[92m12:56:26 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 0.87 / 3 (28.9%):   8%|▊         | 3/40 [01:16<11:21, 18.42s/it]

[92m12:56:31 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:37 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.53 / 4 (38.3%):  10%|█         | 4/40 [01:22<08:14, 13.75s/it]

[92m12:56:37 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.31 / 5 (46.2%):  12%|█▎        | 5/40 [01:25<05:40,  9.72s/it]

[92m12:56:40 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.10 / 6 (51.6%):  15%|█▌        | 6/40 [01:28<04:12,  7.43s/it]

[92m12:56:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.92 / 7 (55.9%):  18%|█▊        | 7/40 [01:34<03:47,  6.91s/it]

[92m12:56:49 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.58 / 8 (57.3%):  20%|██        | 8/40 [01:35<02:38,  4.94s/it]

[92m12:56:49 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m12:56:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m12:56:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m12:56:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.20 / 9 (57.7%):  22%|██▎       | 9/40 [01:40<02:42,  5.24s/it]

12:56:55 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:56:55 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:56:55 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.04 / 10 (60.4%):  25%|██▌       | 10/40 [01:41<01:52,  3.77s/it]

12:56:56 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:05 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:05 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.71 / 11 (61.0%):  28%|██▊       | 11/40 [01:50<02:39,  5.51s/it]

12:57:05 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:06 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:06 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.38 / 12 (61.5%):  30%|███       | 12/40 [01:51<01:53,  4.05s/it]

12:57:06 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:11 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:11 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.04 / 13 (61.9%):  32%|███▎      | 13/40 [01:56<01:59,  4.43s/it]

12:57:11 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:11 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:11 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.93 / 14 (63.8%):  35%|███▌      | 14/40 [01:57<01:23,  3.22s/it]

12:57:11 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:16 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.72 / 15 (64.8%):  38%|███▊      | 15/40 [02:02<01:34,  3.76s/it]

12:57:17 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:22 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.72 / 16 (60.8%):  40%|████      | 16/40 [02:07<01:41,  4.21s/it]

12:57:22 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:24 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.61 / 17 (62.4%):  42%|████▎     | 17/40 [02:09<01:23,  3.62s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:31 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:31 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.28 / 18 (62.7%):  45%|████▌     | 18/40 [02:17<01:45,  4.79s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:33 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:33 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.05 / 19 (63.4%):  48%|████▊     | 19/40 [02:19<01:22,  3.95s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:57:37 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:57:37 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
12:57:37 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 12.05 / 20 (60.3%):  50%|█████     | 20/40 [03:27<07:45, 23.29s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:58:46 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:58:46 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
12:58:46 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 12.05 / 21 (57.4%):  52%|█████▎    | 21/40 [03:35<05:52, 18.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:58:55 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:58:55 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.37 / 22 (56.2%):  55%|█████▌    | 22/40 [03:41<04:27, 14.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:58:57 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:58:57 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.66 / 23 (55.0%):  57%|█████▊    | 23/40 [03:43<03:05, 10.94s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:03 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.55 / 24 (56.4%):  60%|██████    | 24/40 [03:49<02:31,  9.48s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:07 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:07 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.21 / 25 (56.9%):  62%|██████▎   | 25/40 [03:53<01:58,  7.89s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:16 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.06 / 26 (57.9%):  65%|██████▌   | 26/40 [04:02<01:54,  8.17s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:38 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.73 / 27 (58.2%):  68%|██████▊   | 27/40 [04:23<02:37, 12.11s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:40 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:40 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.21 / 28 (57.9%):  70%|███████   | 28/40 [04:26<01:51,  9.31s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:45 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:45 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.21 / 29 (55.9%):  72%|███████▎  | 29/40 [04:31<01:27,  7.96s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:48 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:48 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.21 / 30 (54.0%):  75%|███████▌  | 30/40 [04:34<01:04,  6.45s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:52 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:52 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.81 / 31 (54.2%):  78%|███████▊  | 31/40 [04:37<00:50,  5.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:55 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:55 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.47 / 32 (54.6%):  80%|████████  | 32/40 [04:40<00:38,  4.87s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
12:59:58 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
12:59:58 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.14 / 33 (55.0%):  82%|████████▎ | 33/40 [04:44<00:30,  4.43s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:01 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:01 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.00 / 34 (55.9%):  85%|████████▌ | 34/40 [04:46<00:23,  3.89s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:04 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:04 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.60 / 35 (56.0%):  88%|████████▊ | 35/40 [04:50<00:18,  3.66s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:07 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:07 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.26 / 36 (56.3%):  90%|█████████ | 36/40 [04:52<00:13,  3.42s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:11 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:11 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.11 / 37 (57.1%):  92%|█████████▎| 37/40 [04:56<00:10,  3.54s/it]

13:00:11 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:11 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:11 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:17 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:17 - LiteLLM:INFO: cost

Average Metric: 21.89 / 38 (57.6%):  95%|█████████▌| 38/40 [05:02<00:08,  4.31s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:19 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:19 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:00:19 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 21.89 / 38 (57.6%):  98%|█████████▊| 39/40 [05:22<00:08,  8.94s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:41 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:41 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.22 / 39 (57.0%): 100%|██████████| 40/40 [05:27<00:00,  7.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:00:43 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:00:43 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.89 / 40 (57.2%): : 41it [05:28,  8.02s/it]                      

2025/06/03 13:00:43 INFO dspy.evaluate.evaluate: Average Metric: 22.885779768497148 / 40 (57.2%)
2025/06/03 13:00:43 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 57.21

2025/06/03 13:00:43 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 18 =====



🏃 View run eval_full_0 at: http://localhost:5500/#/experiments/344816129373506955/runs/b716a6751d4946e7aa25199dd808db2f
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

13:00:43 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:43 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:46 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:46 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:00:46 - LiteLLM:INFO: utils.py:2991 - 
L

Average Metric: 0.75 / 1 (75.0%):   2%|▎         | 1/40 [01:28<57:47, 88.91s/it]

13:02:12 - LiteLLM:INFO: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:16 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.42 / 2 (70.8%):   5%|▌         | 2/40 [01:32<24:40, 38.97s/it]

13:02:16 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:22 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.31 / 3 (76.9%):   8%|▊         | 3/40 [01:38<14:46, 23.95s/it]

13:02:22 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:22 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.16 / 4 (79.1%):  10%|█         | 4/40 [01:39<08:48, 14.69s/it]

13:02:22 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:29 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.01 / 5 (80.2%):  12%|█▎        | 5/40 [01:45<06:48, 11.66s/it]

13:02:29 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:29 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.90 / 6 (81.6%):  15%|█▌        | 6/40 [01:46<04:27,  7.88s/it]

13:02:29 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:35 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:35 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:35 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.67 / 7 (81.0%):  18%|█▊        | 7/40 [01:52<03:57,  7.20s/it]

13:02:35 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:38 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.34 / 8 (79.2%):  20%|██        | 8/40 [01:55<03:07,  5.86s/it]

13:02:38 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:42 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.91 / 9 (76.8%):  22%|██▎       | 9/40 [01:58<02:40,  5.18s/it]

13:02:42 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:44 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:44 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:44 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.73 / 10 (77.3%):  25%|██▌       | 10/40 [02:01<02:09,  4.32s/it]

13:02:44 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:48 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:48 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:48 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.73 / 11 (70.3%):  28%|██▊       | 11/40 [02:05<02:04,  4.31s/it]

13:02:48 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:49 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:49 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:49 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.40 / 12 (70.0%):  30%|███       | 12/40 [02:06<01:30,  3.23s/it]

13:02:49 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:56 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.00 / 13 (69.2%):  32%|███▎      | 13/40 [02:12<01:55,  4.28s/it]

13:02:56 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:02:58 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:02:58 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:02:58 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.84 / 14 (70.3%):  35%|███▌      | 14/40 [02:15<01:37,  3.75s/it]

13:02:58 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:02 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.51 / 15 (70.1%):  38%|███▊      | 15/40 [02:19<01:34,  3.79s/it]

13:03:02 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:10 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:10 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:10 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.84 / 16 (67.8%):  40%|████      | 16/40 [02:27<02:00,  5.02s/it]

13:03:10 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:11 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:11 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:11 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.51 / 17 (67.7%):  42%|████▎     | 17/40 [02:28<01:28,  3.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:16 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.18 / 18 (67.6%):  45%|████▌     | 18/40 [02:33<01:31,  4.17s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:18 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:18 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:18 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.84 / 19 (67.6%):  48%|████▊     | 19/40 [02:34<01:12,  3.43s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:03:26 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:03:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:03:26 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 12.84 / 20 (64.2%):  50%|█████     | 20/40 [04:02<09:34, 28.70s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:04:47 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:04:47 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:04:47 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.51 / 21 (64.3%):  52%|█████▎    | 21/40 [04:04<06:31, 20.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:04:53 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:04:53 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:04:53 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.18 / 22 (64.4%):  55%|█████▌    | 22/40 [04:10<04:53, 16.32s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:04:56 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:04:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:04:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.50 / 23 (63.0%):  57%|█████▊    | 23/40 [04:13<03:26, 12.17s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:00 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:00 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:00 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.16 / 24 (63.2%):  60%|██████    | 24/40 [04:17<02:37,  9.86s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:03 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.83 / 25 (63.3%):  62%|██████▎   | 25/40 [04:19<01:54,  7.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:08 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:08 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:08 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.68 / 26 (64.1%):  65%|██████▌   | 26/40 [04:25<01:36,  6.92s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:23 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:23 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:23 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.98 / 27 (62.9%):  68%|██████▊   | 27/40 [04:39<01:59,  9.21s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:26 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.83 / 28 (63.7%):  70%|███████   | 28/40 [04:43<01:30,  7.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:32 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:32 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:32 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.83 / 29 (61.5%):  72%|███████▎  | 29/40 [04:49<01:17,  7.05s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:36 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.40 / 30 (61.3%):  75%|███████▌  | 30/40 [04:53<01:00,  6.10s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:42 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.00 / 31 (61.3%):  78%|███████▊  | 31/40 [04:59<00:54,  6.06s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:42 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.00 / 32 (59.4%):  80%|████████  | 32/40 [04:59<00:34,  4.32s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:50 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:50 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:50 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.66 / 33 (59.6%):  82%|████████▎ | 33/40 [05:06<00:36,  5.22s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:55 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:55 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:55 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.33 / 34 (59.8%):  85%|████████▌ | 34/40 [05:12<00:31,  5.32s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:05:56 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:05:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:05:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.00 / 35 (60.0%):  88%|████████▊ | 35/40 [05:13<00:19,  3.98s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:02 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.00 / 36 (58.3%):  90%|█████████ | 36/40 [05:18<00:17,  4.47s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:03 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.79 / 37 (58.9%):  92%|█████████▎| 37/40 [05:19<00:10,  3.46s/it]

13:06:03 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:06:03 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:06:03 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:09 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:09 - LiteLLM:INFO: cost

Average Metric: 22.63 / 38 (59.6%):  95%|█████████▌| 38/40 [05:27<00:09,  4.77s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:16 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:16 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:16 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP

Average Metric: 22.63 / 38 (59.6%):  98%|█████████▊| 39/40 [05:41<00:07,  7.47s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:26 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:26 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:26 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP

Average Metric: 23.52 / 39 (60.3%): 100%|██████████| 40/40 [05:50<00:00,  7.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:39 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:39 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:06:39 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 24.21 / 40 (60.5%): : 41it [05:56,  8.69s/it]                      

2025/06/03 13:06:39 INFO dspy.evaluate.evaluate: Average Metric: 24.20897977098229 / 40 (60.5%)
2025/06/03 13:06:39 INFO dspy.teleprompt.mipro_optimizer_v2: [92mBest full score so far![0m Score: 60.52
2025/06/03 13:06:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.52 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 6'].
2025/06/03 13:06:39 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52]
2025/06/03 13:06:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 60.52


2025/06/03 13:06:39 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 18 =====



🏃 View run eval_full_1 at: http://localhost:5500/#/experiments/344816129373506955/runs/a52c98cc281641fda2b36275a0d0956b
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

13:06:39 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:06:39 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
13:06:39 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:06:43 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:06:43 - LiteLLM:INFO: cost

Average Metric: 0.62 / 1 (61.5%):   2%|▎         | 1/40 [02:36<1:41:49, 156.66s/it]

13:09:16 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:22 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.12 / 2 (55.8%):   5%|▌         | 2/40 [02:42<43:08, 68.11s/it]

13:09:22 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:24 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.78 / 3 (59.4%):   8%|▊         | 3/40 [02:44<23:20, 37.86s/it]

13:09:24 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:28 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:28 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:28 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.45 / 4 (61.2%):  10%|█         | 4/40 [02:49<14:47, 24.66s/it]

13:09:28 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:29 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.02 / 5 (60.4%):  12%|█▎        | 5/40 [02:49<09:23, 16.10s/it]

13:09:29 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:37 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:37 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:37 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.77 / 6 (62.8%):  15%|█▌        | 6/40 [02:58<07:36, 13.41s/it]

13:09:38 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:43 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:43 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:43 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.54 / 7 (64.9%):  18%|█▊        | 7/40 [03:04<06:01, 10.95s/it]

13:09:43 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:47 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:47 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:47 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.21 / 8 (65.1%):  20%|██        | 8/40 [03:07<04:37,  8.67s/it]

13:09:47 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:52 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:52 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:52 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.88 / 9 (65.3%):  22%|██▎       | 9/40 [03:12<03:49,  7.40s/it]

13:09:52 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:52 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:52 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:52 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.73 / 10 (67.3%):  25%|██▌       | 10/40 [03:12<02:37,  5.26s/it]

13:09:52 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:09:59 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:09:59 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:09:59 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.33 / 11 (66.7%):  28%|██▊       | 11/40 [03:19<02:44,  5.69s/it]

13:09:59 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:02 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.33 / 12 (61.1%):  30%|███       | 12/40 [03:23<02:21,  5.05s/it]

13:10:03 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:09 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:09 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:09 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.18 / 13 (62.9%):  32%|███▎      | 13/40 [03:29<02:30,  5.56s/it]

13:10:09 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:14 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:14 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:14 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.96 / 14 (64.0%):  35%|███▌      | 14/40 [03:35<02:21,  5.45s/it]

13:10:14 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:18 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:18 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:18 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.89 / 15 (65.9%):  38%|███▊      | 15/40 [03:38<02:03,  4.93s/it]

13:10:18 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:19 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:19 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:19 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.56 / 16 (66.0%):  40%|████      | 16/40 [03:39<01:30,  3.77s/it]

13:10:19 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:24 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.22 / 17 (66.0%):  42%|████▎     | 17/40 [03:45<01:36,  4.18s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:29 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:29 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.11 / 18 (67.3%):  45%|████▌     | 18/40 [03:49<01:33,  4.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:36 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.59 / 19 (66.3%):  48%|████▊     | 19/40 [03:56<01:46,  5.08s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:10:56 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:10:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:10:56 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP

Average Metric: 12.59 / 20 (63.0%):  50%|█████     | 20/40 [07:39<23:31, 70.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:22 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:22 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.59 / 21 (60.0%):  52%|█████▎    | 21/40 [07:42<15:54, 50.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:30 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:30 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:30 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.26 / 22 (60.3%):  55%|█████▌    | 22/40 [07:50<11:17, 37.65s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:35 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:35 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:35 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.87 / 23 (60.3%):  57%|█████▊    | 23/40 [07:55<07:52, 27.81s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:37 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:37 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:37 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.87 / 24 (57.8%):  60%|██████    | 24/40 [07:57<05:19, 19.98s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:45 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:45 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:45 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.72 / 25 (58.9%):  62%|██████▎   | 25/40 [08:05<04:07, 16.50s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:14:53 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:14:53 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:14:53 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.57 / 26 (59.9%):  65%|██████▌   | 26/40 [08:13<03:15, 13.96s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:03 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:03 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
[92m13:15:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: op

Average Metric: 16.57 / 28 (59.2%):  68%|██████▊   | 27/40 [08:23<02:44, 12.69s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.23 / 29 (59.4%):  72%|███████▎  | 29/40 [08:29<01:29,  8.10s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 17.23 / 30 (57.4%):  75%|███████▌  | 30/40 [08:31<01:05,  6.60s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.90 / 31 (57.7%):  78%|███████▊  | 31/40 [08:36<00:55,  6.17s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.76 / 32 (58.6%):  80%|████████  | 32/40 [08:38<00:42,  5.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.42 / 33 (58.9%):  82%|████████▎ | 33/40 [08:41<00:32,  4.63s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.21 / 34 (59.5%):  85%|████████▌ | 34/40 [08:47<00:29,  4.93s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.88 / 35 (59.7%):  88%|████████▊ | 35/40 [08:47<00:17,  3.60s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 21.66 / 36 (60.2%):  90%|█████████ | 36/40 [08:53<00:16,  4.12s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:35 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:15:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.32 / 37 (60.3%):  92%|█████████▎| 37/40 [08:55<00:10,  3.66s/it]

[92m13:15:35 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:15:35 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:15:35 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:15:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:15:44 - LiteLLM:INFO[0m: cost

Average Metric: 22.99 / 38 (60.5%):  95%|█████████▌| 38/40 [09:50<00:37, 18.85s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:16:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:16:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:16:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 23.78 / 39 (61.0%):  98%|█████████▊| 39/40 [09:53<00:14, 14.04s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:16:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:16:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:16:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:16:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 24.44 / 40 (61.1%): 100%|██████████| 40/40 [10:01<00:00, 15.03s/it]

2025/06/03 13:16:40 INFO dspy.evaluate.evaluate: Average Metric: 24.444713638682302 / 40 (61.1%)
2025/06/03 13:16:40 INFO dspy.teleprompt.mipro_optimizer_v2: [92mBest full score so far![0m Score: 61.11
2025/06/03 13:16:40 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.11 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 2'].
2025/06/03 13:16:40 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11]
2025/06/03 13:16:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.11


2025/06/03 13:16:40 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 18 =====



🏃 View run eval_full_2 at: http://localhost:5500/#/experiments/344816129373506955/runs/362936e7d50f4b74bf904217a91f0c56
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:16:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:16:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:16:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 24.34 / 40 (60.9%): 100%|██████████| 40/40 [00:05<00:00,  7.42it/s]

2025/06/03 13:16:46 INFO dspy.evaluate.evaluate: Average Metric: 24.341447303449822 / 40 (60.9%)
2025/06/03 13:16:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.85 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 6'].
2025/06/03 13:16:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85]
2025/06/03 13:16:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.11


2025/06/03 13:16:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 18 =====



🏃 View run eval_full_3 at: http://localhost:5500/#/experiments/344816129373506955/runs/089548dd373b4bc59e68cec9dbca9521
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

[92m13:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:46<1:09:23, 106.75s/it]

[92m13:18:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 1.33 / 2 (66.7%):   5%|▌         | 2/40 [01:50<29:11, 46.10s/it]   

[92m13:18:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.33 / 3 (44.4%):   8%|▊         | 3/40 [01:56<17:10, 27.86s/it]

[92m13:18:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.24 / 4 (56.1%):  10%|█         | 4/40 [01:57<10:13, 17.05s/it]

[92m13:18:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.91 / 5 (58.2%):  12%|█▎        | 5/40 [02:01<07:20, 12.60s/it]

[92m13:18:48 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 3.68 / 6 (61.4%):  15%|█▌        | 6/40 [02:03<05:01,  8.87s/it]

[92m13:18:49 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.35 / 7 (62.1%):  18%|█▊        | 7/40 [02:07<03:59,  7.26s/it]

[92m13:18:53 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:18:59 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:18:59 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:18:59 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.24 / 8 (65.5%):  20%|██        | 8/40 [02:13<03:42,  6.94s/it]

[92m13:19:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.13 / 9 (68.1%):  22%|██▎       | 9/40 [02:13<02:29,  4.84s/it]

[92m13:19:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 6.73 / 10 (67.3%):  25%|██▌       | 10/40 [02:22<03:01,  6.07s/it]

[92m13:19:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 7.50 / 11 (68.2%):  28%|██▊       | 11/40 [02:38<04:19,  8.94s/it]

[92m13:19:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.17 / 12 (68.1%):  30%|███       | 12/40 [02:38<02:58,  6.37s/it]

[92m13:19:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 8.77 / 13 (67.5%):  32%|███▎      | 13/40 [02:43<02:41,  6.00s/it]

[92m13:19:30 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.44 / 14 (67.4%):  35%|███▌      | 14/40 [02:45<02:05,  4.85s/it]

[92m13:19:32 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.22 / 15 (68.2%):  38%|███▊      | 15/40 [02:53<02:21,  5.68s/it]

[92m13:19:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.00 / 16 (68.7%):  40%|████      | 16/40 [02:56<01:59,  4.99s/it]

[92m13:19:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 11.67 / 17 (68.6%):  42%|████▎     | 17/40 [03:08<02:42,  7.09s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:19:57 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:19:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:19:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.44 / 18 (69.1%):  45%|████▌     | 18/40 [03:11<02:04,  5.65s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:20:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:20:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:20:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:20:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.11 / 19 (69.0%):  48%|████▊     | 19/40 [03:51<05:35, 16.00s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:20:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:20:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:20:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:20:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.79 / 20 (69.0%):  50%|█████     | 20/40 [04:21<06:46, 20.31s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:16 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:16 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.79 / 21 (65.7%):  52%|█████▎    | 21/40 [04:55<07:41, 24.30s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.79 / 22 (62.7%):  55%|█████▌    | 22/40 [04:58<05:23, 17.97s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.68 / 23 (63.8%):  57%|█████▊    | 23/40 [05:02<03:54, 13.81s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:51 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.16 / 24 (63.2%):  60%|██████    | 24/40 [05:05<02:49, 10.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.83 / 25 (63.3%):  62%|██████▎   | 25/40 [05:11<02:19,  9.28s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:21:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:21:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:21:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 16.49 / 26 (63.4%):  65%|██████▌   | 26/40 [05:12<01:32,  6.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.16 / 27 (63.6%):  68%|██████▊   | 27/40 [05:20<01:31,  7.05s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 17.73 / 28 (63.3%):  70%|███████   | 28/40 [05:27<01:25,  7.13s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:22 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:22 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:22 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:22 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.40 / 29 (63.4%):  72%|███████▎  | 29/40 [05:36<01:24,  7.67s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:26 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.89 / 30 (63.0%):  75%|███████▌  | 30/40 [05:39<01:03,  6.39s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 19.56 / 31 (63.1%):  78%|███████▊  | 31/40 [05:44<00:52,  5.83s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 19.56 / 32 (61.1%):  80%|████████  | 32/40 [05:51<00:50,  6.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.22 / 33 (61.3%):  82%|████████▎ | 33/40 [05:54<00:37,  5.31s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.82 / 34 (61.2%):  85%|████████▌ | 34/40 [06:01<00:34,  5.78s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.67 / 35 (61.9%):  88%|████████▊ | 35/40 [06:03<00:23,  4.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:22:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:22:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:22:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 22.34 / 36 (62.0%):  90%|█████████ | 36/40 [06:07<00:17,  4.42s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:23:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:23:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:23:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:23:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 23.11 / 37 (62.5%):  92%|█████████▎| 37/40 [06:15<00:16,  5.56s/it]

[92m13:23:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:23:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:23:09 - LiteLLM:INFO[0m: cost

Average Metric: 23.11 / 38 (60.8%):  95%|█████████▌| 38/40 [06:23<00:12,  6.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:23:11 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:23:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:23:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:23:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 23.78 / 39 (61.0%):  98%|█████████▊| 39/40 [06:25<00:04,  4.90s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:23:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:23:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:23:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 24.44 / 40 (61.1%): 100%|██████████| 40/40 [06:30<00:00,  9.77s/it]

2025/06/03 13:23:17 INFO dspy.evaluate.evaluate: Average Metric: 24.443529934276533 / 40 (61.1%)
2025/06/03 13:23:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.11 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4'].
2025/06/03 13:23:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11]
2025/06/03 13:23:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.11


2025/06/03 13:23:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 18 =====



🏃 View run eval_full_4 at: http://localhost:5500/#/experiments/344816129373506955/runs/b0a8cc5613b743f68ec1d7bb5dd1b77d
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

[92m13:23:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:23:19 - LiteLLM:INFO[0m: utils.py:2991 - 

Average Metric: 0.79 / 1 (78.9%):   2%|▎         | 1/40 [01:49<1:10:52, 109.04s/it]

[92m13:25:06 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:14 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 1.41 / 2 (70.7%):   5%|▌         | 2/40 [02:03<34:00, 53.70s/it]   

[92m13:25:21 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.08 / 3 (69.4%):   8%|▊         | 3/40 [02:04<18:11, 29.51s/it]

[92m13:25:22 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.08 / 4 (52.0%):  10%|█         | 4/40 [02:10<12:04, 20.13s/it]

[92m13:25:27 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.87 / 5 (57.4%):  12%|█▎        | 5/40 [02:13<08:10, 14.00s/it]

[92m13:25:31 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.20 / 6 (53.4%):  15%|█▌        | 6/40 [02:21<06:42, 11.85s/it]

[92m13:25:38 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 4.09 / 7 (58.5%):  18%|█▊        | 7/40 [02:23<04:45,  8.64s/it]

[92m13:25:40 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.76 / 8 (59.5%):  20%|██        | 8/40 [02:29<04:14,  7.94s/it]

[92m13:25:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.61 / 9 (62.3%):  22%|██▎       | 9/40 [02:30<02:59,  5.79s/it]

[92m13:25:48 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.45 / 10 (64.5%):  25%|██▌       | 10/40 [02:36<02:56,  5.87s/it]

[92m13:25:54 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 6.45 / 11 (58.7%):  28%|██▊       | 11/40 [02:37<02:04,  4.29s/it]

[92m13:25:55 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:25:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:25:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:25:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.12 / 12 (59.3%):  30%|███       | 12/40 [02:41<01:55,  4.12s/it]

[92m13:25:58 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 7.79 / 13 (59.9%):  32%|███▎      | 13/40 [02:43<01:37,  3.62s/it]

[92m13:26:01 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.45 / 14 (60.4%):  35%|███▌      | 14/40 [02:52<02:13,  5.13s/it]

[92m13:26:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 9.23 / 15 (61.5%):  38%|███▊      | 15/40 [02:54<01:48,  4.35s/it]

[92m13:26:12 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:16 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.89 / 16 (61.8%):  40%|████      | 16/40 [02:58<01:41,  4.23s/it]

[92m13:26:16 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.56 / 17 (62.1%):  42%|████▎     | 17/40 [03:00<01:20,  3.51s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.34 / 18 (63.0%):  45%|████▌     | 18/40 [03:06<01:29,  4.06s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 12.11 / 19 (63.7%):  48%|████▊     | 19/40 [03:14<01:55,  5.51s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:26:37 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:26:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:26:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 12.11 / 20 (60.5%):  50%|█████     | 20/40 [04:49<10:45, 32.27s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 12.11 / 21 (57.7%):  52%|█████▎    | 21/40 [04:58<08:00, 25.30s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:20 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 12.64 / 22 (57.5%):  55%|█████▌    | 22/40 [05:03<05:46, 19.23s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.93 / 23 (56.2%):  57%|█████▊    | 23/40 [05:07<04:07, 14.59s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 13.60 / 24 (56.6%):  60%|██████    | 24/40 [05:11<03:03, 11.48s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.60 / 25 (54.4%):  62%|██████▎   | 25/40 [05:15<02:18,  9.23s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.44 / 26 (55.5%):  65%|██████▌   | 26/40 [05:21<01:53,  8.13s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:50 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.44 / 27 (53.5%):  68%|██████▊   | 27/40 [05:33<02:02,  9.39s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:28:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:28:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:28:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.78 / 28 (52.8%):  70%|███████   | 28/40 [05:41<01:45,  8.81s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.78 / 29 (51.0%):  72%|███████▎  | 29/40 [05:49<01:34,  8.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.78 / 30 (49.3%):  75%|███████▌  | 30/40 [05:53<01:12,  7.30s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.44 / 31 (49.8%):  78%|███████▊  | 31/40 [05:55<00:50,  5.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 16.04 / 32 (50.1%):  80%|████████  | 32/40 [06:00<00:44,  5.57s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:22 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:22 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:22 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.71 / 33 (50.6%):  82%|████████▎ | 33/40 [06:05<00:37,  5.36s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.60 / 34 (51.8%):  85%|████████▌ | 34/40 [06:12<00:34,  5.73s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 18.39 / 35 (52.5%):  88%|████████▊ | 35/40 [06:25<00:39,  7.93s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 19.28 / 36 (53.5%):  90%|█████████ | 36/40 [06:27<00:24,  6.15s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:51 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:29:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.94 / 37 (53.9%):  92%|█████████▎| 37/40 [06:33<00:18,  6.33s/it]

[92m13:29:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:29:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:29:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:29:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:29:52 - LiteLLM:INFO[0m: cost

Average Metric: 20.80 / 38 (54.7%):  95%|█████████▌| 38/40 [06:35<00:09,  4.88s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:30:01 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:30:01 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.65 / 39 (55.5%):  98%|█████████▊| 39/40 [06:44<00:06,  6.16s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:30:02 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:30:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 22.22 / 40 (55.5%): 100%|██████████| 40/40 [06:45<00:00, 10.14s/it]

2025/06/03 13:30:02 INFO dspy.evaluate.evaluate: Average Metric: 22.218507747432557 / 40 (55.5%)
2025/06/03 13:30:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 55.55 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 5'].
2025/06/03 13:30:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55]
2025/06/03 13:30:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.11


2025/06/03 13:30:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 18 =====



🏃 View run eval_full_5 at: http://localhost:5500/#/experiments/344816129373506955/runs/464d4d7a7b7145d490ed17e3562ef023
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

13:30:03 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 0.89 / 1 (88.9%):   2%|▎         | 1/40 [01:35<1:01:47, 95.06s/it]

13:31:38 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:31:41 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:31:41 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 0.89 / 2 (44.4%):   5%|▌         | 2/40 [01:45<28:48, 45.49s/it]  

13:31:48 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:31:51 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:31:51 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.50 / 3 (50.1%):   8%|▊         | 3/40 [01:48<16:02, 26.02s/it]

13:31:51 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:31:56 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:31:56 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.17 / 4 (54.3%):  10%|█         | 4/40 [01:53<10:36, 17.68s/it]

13:31:56 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:31:57 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:31:57 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.84 / 5 (56.8%):  12%|█▎        | 5/40 [01:54<06:42, 11.51s/it]

13:31:57 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:01 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:01 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 3.50 / 6 (58.4%):  15%|█▌        | 6/40 [01:58<05:13,  9.23s/it]

13:32:01 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:08 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:08 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.28 / 7 (61.1%):  18%|█▊        | 7/40 [02:05<04:35,  8.34s/it]

13:32:08 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:08 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:08 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.07 / 8 (63.3%):  20%|██        | 8/40 [02:05<03:06,  5.84s/it]

13:32:08 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:13 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:13 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.84 / 9 (64.9%):  22%|██▎       | 9/40 [02:11<02:54,  5.62s/it]

13:32:14 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:15 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:15 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 6.51 / 10 (65.1%):  25%|██▌       | 10/40 [02:12<02:09,  4.33s/it]

13:32:15 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:23 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:23 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.18 / 11 (65.2%):  28%|██▊       | 11/40 [02:20<02:37,  5.44s/it]

13:32:23 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:24 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:24 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 7.95 / 12 (66.2%):  30%|███       | 12/40 [02:21<01:56,  4.17s/it]

13:32:24 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:30 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:30 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.72 / 13 (67.1%):  32%|███▎      | 13/40 [02:27<02:04,  4.59s/it]

13:32:30 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:31 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:31 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 9.57 / 14 (68.4%):  35%|███▌      | 14/40 [02:28<01:35,  3.66s/it]

13:32:31 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:36 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.24 / 15 (68.3%):  38%|███▊      | 15/40 [02:33<01:38,  3.95s/it]

13:32:36 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:38 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.57 / 16 (66.1%):  40%|████      | 16/40 [02:35<01:23,  3.47s/it]

13:32:38 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:42 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:42 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.35 / 17 (66.7%):  42%|████▎     | 17/40 [02:39<01:22,  3.58s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:44 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:44 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.92 / 18 (66.2%):  45%|████▌     | 18/40 [02:42<01:11,  3.25s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:51 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:51 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:32:51 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 12.67 / 19 (66.7%):  48%|████▊     | 19/40 [02:51<01:46,  5.09s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:32:59 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:32:59 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:32:59 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.33 / 20 (66.7%):  50%|█████     | 20/40 [03:55<07:37, 22.89s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:02 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:02 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:34:02 - LiteLLM:INFO: utils.py:2991 - LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.33 / 21 (63.5%):  52%|█████▎    | 21/40 [04:13<06:46, 21.38s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:18 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:18 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.33 / 22 (60.6%):  55%|█████▌    | 22/40 [04:15<04:37, 15.43s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:21 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:21 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:34:21 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 14.02 / 23 (61.0%):  57%|█████▊    | 23/40 [04:23<03:46, 13.33s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:30 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:30 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:34:30 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 14.30 / 24 (59.6%):  60%|██████    | 24/40 [04:27<02:49, 10.59s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:36 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:36 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.97 / 25 (59.9%):  62%|██████▎   | 25/40 [04:33<02:16,  9.11s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:38 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
13:34:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 15.64 / 26 (60.1%):  65%|██████▌   | 26/40 [04:35<01:38,  7.00s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
13:34:49 - LiteLLM:INFO: utils.py:1213 - Wrapper: Completed Call, calling success_handler
13:34:49 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.49 / 27 (61.1%):  68%|██████▊   | 27/40 [04:46<01:45,  8.15s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:34:56 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:34:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:34:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:34:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.17 / 28 (61.3%):  70%|███████   | 28/40 [04:53<01:34,  7.87s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:34:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.65 / 29 (60.9%):  72%|███████▎  | 29/40 [04:55<01:05,  5.98s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:03 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 17.65 / 30 (58.8%):  75%|███████▌  | 30/40 [05:00<00:58,  5.83s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:07 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.22 / 31 (58.8%):  78%|███████▊  | 31/40 [05:04<00:47,  5.24s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.89 / 32 (59.0%):  80%|████████  | 32/40 [05:07<00:35,  4.42s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.78 / 33 (59.9%):  82%|████████▎ | 33/40 [05:14<00:36,  5.27s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.44 / 34 (60.1%):  85%|████████▌ | 34/40 [05:15<00:23,  3.93s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:25 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 21.04 / 35 (60.1%):  88%|████████▊ | 35/40 [05:25<00:29,  5.81s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.71 / 36 (60.3%):  90%|█████████ | 36/40 [05:35<00:28,  7.01s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 22.38 / 37 (60.5%):  92%|█████████▎| 37/40 [05:38<00:17,  5.77s/it]

[92m13:35:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:35:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:35:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:45 - LiteLLM:INFO[0m: cost

Average Metric: 23.04 / 38 (60.6%):  95%|█████████▌| 38/40 [05:44<00:11,  5.85s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:53 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 23.71 / 39 (60.8%):  98%|█████████▊| 39/40 [05:53<00:06,  6.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:35:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:35:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:35:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 23.71 / 39 (60.8%): 100%|██████████| 40/40 [05:55<00:00,  5.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:36:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:36:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:36:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:36:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 23.71 / 39 (60.8%): : 41it [06:01,  5.60s/it]                      

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:36:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:36:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:36:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 24.38 / 40 (60.9%): : 42it [06:03,  8.66s/it]

2025/06/03 13:36:06 INFO dspy.evaluate.evaluate: Average Metric: 24.377706507603552 / 40 (60.9%)
2025/06/03 13:36:06 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.94 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 6'].
2025/06/03 13:36:06 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94]
2025/06/03 13:36:06 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 61.11


2025/06/03 13:36:06 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 18 =====



🏃 View run eval_full_6 at: http://localhost:5500/#/experiments/344816129373506955/runs/cbc3af74d4b84445b9d79cf5b8fc74eb
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

[92m13:36:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:36:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:36:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:36:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:36:09 - LiteLLM:INFO[0m: utils.py:2991 - 
L

Average Metric: 0.85 / 1 (84.7%):   2%|▎         | 1/40 [01:26<56:11, 86.45s/it]

[92m13:37:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 1.53 / 2 (76.6%):   5%|▌         | 2/40 [01:27<22:58, 36.28s/it]

[92m13:37:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 2.20 / 3 (73.3%):   8%|▊         | 3/40 [01:32<13:38, 22.13s/it]

[92m13:37:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.87 / 4 (71.7%):  10%|█         | 4/40 [01:35<08:33, 14.25s/it]

[92m13:37:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:46 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 3.66 / 5 (73.1%):  12%|█▎        | 5/40 [01:39<06:20, 10.86s/it]

[92m13:37:46 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.32 / 6 (72.0%):  15%|█▌        | 6/40 [01:41<04:23,  7.74s/it]

[92m13:37:48 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 5.10 / 7 (72.8%):  18%|█▊        | 7/40 [01:48<04:03,  7.38s/it]

[92m13:37:55 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:37:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:37:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:37:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.87 / 8 (73.4%):  20%|██        | 8/40 [01:51<03:18,  6.21s/it]

[92m13:37:58 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 6.54 / 9 (72.6%):  22%|██▎       | 9/40 [01:54<02:33,  4.96s/it]

[92m13:38:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.20 / 10 (72.0%):  25%|██▌       | 10/40 [01:58<02:20,  4.69s/it]

[92m13:38:05 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:07 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 8.09 / 11 (73.6%):  28%|██▊       | 11/40 [02:00<01:58,  4.08s/it]

[92m13:38:07 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:11 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.76 / 12 (73.0%):  30%|███       | 12/40 [02:04<01:50,  3.94s/it]

[92m13:38:11 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:16 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 9.43 / 13 (72.5%):  32%|███▎      | 13/40 [02:09<01:58,  4.40s/it]

[92m13:38:16 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.09 / 14 (72.1%):  35%|███▌      | 14/40 [02:12<01:36,  3.70s/it]

[92m13:38:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.87 / 15 (72.4%):  38%|███▊      | 15/40 [02:15<01:27,  3.50s/it]

[92m13:38:21 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:38:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:38:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:38:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.53 / 16 (72.1%):  40%|████      | 16/40 [02:20<01:38,  4.09s/it]

Average Metric: 12.20 / 17 (71.8%):  42%|████▎     | 17/40 [02:22<01:22,  3.58s/it]

Average Metric: 12.99 / 18 (72.2%):  45%|████▌     | 18/40 [02:25<01:12,  3.29s/it]

Average Metric: 13.59 / 19 (71.5%):  48%|████▊     | 19/40 [02:28<01:08,  3.26s/it]

Average Metric: 13.59 / 20 (67.9%):  50%|█████     | 20/40 [03:49<08:50, 26.54s/it]

Average Metric: 13.59 / 21 (64.7%):  52%|█████▎    | 21/40 [03:55<06:25, 20.31s/it]

Average Metric: 14.28 / 22 (64.9%):  55%|█████▌    | 22/40 [04:02<04:53, 16.32s/it]

Average Metric: 14.60 / 23 (63.5%):  57%|█████▊    | 23/40 [04:04<03:25, 12.06s/it]

Average Metric: 15.26 / 24 (63.6%):  60%|██████    | 24/40 [04:10<02:42, 10.14s/it]

Average Metric: 15.60 / 25 (62.4%):  62%|██████▎   | 25/40 [04:18<02:22,  9.48s/it]

Average Metric: 16.37 / 26 (63.0%):  65%|██████▌   | 26/40 [04:27<02:13,  9.54s/it]

Average Metric: 16.85 / 27 (62.4%):  68%|██████▊   | 27/40 [04:29<01:34,  7.29s/it]

Average Metric: 16.85 / 28 (60.2%):  70%|███████   | 28/40 [04:37<01:30,  7.55s/it]

Average Metric: 17.42 / 29 (60.1%):  72%|███████▎  | 29/40 [04:45<01:22,  7.48s/it]

Average Metric: 17.95 / 30 (59.8%):  75%|███████▌  | 30/40 [04:46<00:57,  5.73s/it]

Average Metric: 18.62 / 31 (60.1%):  78%|███████▊  | 31/40 [04:50<00:46,  5.19s/it]

Average Metric: 19.24 / 32 (60.1%):  80%|████████  | 32/40 [04:54<00:37,  4.72s/it]

Average Metric: 20.13 / 33 (61.0%):  82%|████████▎ | 33/40 [05:00<00:35,  5.06s/it]

Average Metric: 20.79 / 34 (61.2%):  85%|████████▌ | 34/40 [05:01<00:22,  3.81s/it]

Average Metric: 21.64 / 35 (61.8%):  88%|████████▊ | 35/40 [05:06<00:21,  4.34s/it]

Average Metric: 22.24 / 36 (61.8%):  90%|█████████ | 36/40 [05:08<00:14,  3.52s/it]

Average Metric: 22.91 / 37 (61.9%):  92%|█████████▎| 37/40 [05:14<00:13,  4.42s/it]

Average Metric: 23.66 / 38 (62.3%):  95%|█████████▌| 38/40 [05:23<00:11,  5.54s/it]

Average Metric: 24.43 / 39 (62.6%):  98%|█████████▊| 39/40 [05:32<00:06,  6.81s/it]

Average Metric: 24.43 / 39 (62.6%): 100%|██████████| 40/40 [05:38<00:00,  6.56s/it]

Average Metric: 24.43 / 39 (62.6%): : 41it [05:41,  5.32s/it]                      

Average Metric: 25.00 / 40 (62.5%): : 42it [05:51,  8.36s/it]

2025/06/03 13:41:57 INFO dspy.evaluate.evaluate: Average Metric: 25.001450947336117 / 40 (62.5%)
2025/06/03 13:41:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 62.5
2025/06/03 13:41:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.5 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 1'].
2025/06/03 13:41:57 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5]
2025/06/03 13:41:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 62.5


2025/06/03 13:41:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 18 =====



🏃 View run eval_full_7 at: http://localhost:5500/#/experiments/344816129373506955/runs/564c591f36f0441296ff2f2c0341b5c5
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:47<1:09:43, 107.26s/it]

Average Metric: 1.33 / 2 (66.7%):   5%|▌         | 2/40 [01:53<30:20, 47.92s/it]

Average Metric: 2.11 / 3 (70.3%):   8%|▊         | 3/40 [01:56<16:45, 27.17s/it]

Average Metric: 2.95 / 4 (73.9%):  10%|█         | 4/40 [02:01<11:11, 18.66s/it]

Average Metric: 3.62 / 5 (72.4%):  12%|█▎        | 5/40 [02:02<07:09, 12.27s/it]

Average Metric: 4.47 / 6 (74.5%):  15%|█▌        | 6/40 [02:09<05:51, 10.34s/it]

[92m13:44:07 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:07 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 4.47 / 7 (63.8%):  18%|█▊        | 7/40 [02:09<03:56,  7.16s/it]

[92m13:44:07 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.80 / 8 (60.0%):  20%|██        | 8/40 [02:19<04:12,  7.89s/it]

[92m13:44:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:19 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:19 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:19 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.62 / 9 (62.4%):  22%|██▎       | 9/40 [02:21<03:08,  6.10s/it]

[92m13:44:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:26 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:26 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 6.47 / 11 (58.8%):  25%|██▌       | 10/40 [02:28<03:14,  6.48s/it]

[92m13:44:26 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:44:26 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 7.31 / 12 (60.9%):  30%|███       | 12/40 [02:35<02:18,  4.95s/it]

[92m13:44:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 8.16 / 13 (62.8%):  32%|███▎      | 13/40 [02:35<01:44,  3.88s/it]

[92m13:44:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.83 / 14 (63.1%):  35%|███▌      | 14/40 [02:40<01:43,  3.98s/it]

[92m13:44:38 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 9.74 / 15 (64.9%):  38%|███▊      | 15/40 [02:41<01:21,  3.26s/it]

[92m13:44:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.40 / 16 (65.0%):  40%|████      | 16/40 [02:45<01:24,  3.52s/it]

[92m13:44:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.29 / 17 (66.4%):  42%|████▎     | 17/40 [02:49<01:22,  3.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:44:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:44:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:44:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[

Average Metric: 12.73 / 19 (67.0%):  45%|████▌     | 18/40 [02:55<01:33,  4.24s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:45:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:45:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:45:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:45:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 12.73 / 20 (63.7%):  50%|█████     | 20/40 [04:53<09:44, 29.21s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:46:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:46:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:46:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.73 / 21 (60.6%):  52%|█████▎    | 21/40 [04:57<07:14, 22.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.05 / 22 (59.3%):  55%|█████▌    | 22/40 [05:03<05:32, 18.45s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:02 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.74 / 23 (59.7%):  57%|█████▊    | 23/40 [05:04<03:56, 13.90s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.41 / 24 (60.0%):  60%|██████    | 24/40 [05:11<03:11, 11.94s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.07 / 25 (60.3%):  62%|██████▎   | 25/40 [05:11<02:10,  8.68s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.92 / 26 (61.2%):  65%|██████▌   | 26/40 [05:18<01:51,  7.95s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:26 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:26 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.60 / 27 (61.5%):  68%|██████▊   | 27/40 [05:28<01:52,  8.66s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.33 / 28 (61.9%):  70%|███████   | 28/40 [05:30<01:21,  6.80s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.33 / 29 (59.8%):  72%|███████▎  | 29/40 [05:36<01:11,  6.48s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:35 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.00 / 30 (60.0%):  75%|███████▌  | 30/40 [05:37<00:47,  4.75s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.67 / 31 (60.2%):  78%|███████▊  | 31/40 [05:42<00:43,  4.88s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.24 / 32 (60.1%):  80%|████████  | 32/40 [05:44<00:31,  3.99s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.90 / 33 (60.3%):  82%|████████▎ | 33/40 [05:51<00:34,  4.87s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.57 / 34 (60.5%):  85%|████████▌ | 34/40 [05:51<00:21,  3.63s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 21.36 / 35 (61.0%):  88%|████████▊ | 35/40 [05:56<00:19,  3.92s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:47:56 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:47:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:47:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 21.96 / 36 (61.0%):  90%|█████████ | 36/40 [05:58<00:13,  3.43s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:02 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:48:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.63 / 37 (61.2%):  92%|█████████▎| 37/40 [06:04<00:12,  4.09s/it]

[92m13:48:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:02 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler

Average Metric: 23.29 / 38 (61.3%):  95%|█████████▌| 38/40 [06:08<00:08,  4.19s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:14 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:48:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:14 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 23.96 / 39 (61.4%):  98%|█████████▊| 39/40 [06:17<00:05,  5.37s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:20 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:48:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 23.96 / 39 (61.4%): 100%|██████████| 40/40 [06:22<00:00,  5.40s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:48:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 23.96 / 39 (61.4%): : 41it [06:30,  6.15s/it]                      

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:48:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:48:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:48:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 24.85 / 40 (62.1%): : 42it [06:32,  9.34s/it]

2025/06/03 13:48:30 INFO dspy.evaluate.evaluate: Average Metric: 24.8487274306749 / 40 (62.1%)
2025/06/03 13:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.12 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 3'].
2025/06/03 13:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12]
2025/06/03 13:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 62.5


2025/06/03 13:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 18 =====



🏃 View run eval_full_8 at: http://localhost:5500/#/experiments/344816129373506955/runs/09fd8d0732314356b294adbb81b6b2e1
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

[92m13:48:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m13:48:33 - LiteLLM:INFO[0m: utils.py:2991 - 

Average Metric: 0.85 / 1 (84.7%):   2%|▎         | 1/40 [01:57<1:16:26, 117.60s/it]

[92m13:50:27 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 1.70 / 2 (85.2%):   5%|▌         | 2/40 [01:58<30:52, 48.74s/it]   

[92m13:50:28 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.30 / 3 (76.8%):   8%|▊         | 3/40 [02:04<18:04, 29.31s/it]

[92m13:50:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 3.08 / 4 (77.0%):  10%|█         | 4/40 [02:08<11:34, 19.30s/it]

[92m13:50:38 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.08 / 5 (61.6%):  12%|█▎        | 5/40 [02:10<07:37, 13.07s/it]

[92m13:50:40 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 3.08 / 6 (51.3%):  15%|█▌        | 6/40 [02:18<06:25, 11.33s/it]

[92m13:50:48 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.75 / 7 (53.5%):  18%|█▊        | 7/40 [02:19<04:23,  8.00s/it]

[92m13:50:49 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:57 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 4.41 / 8 (55.1%):  20%|██        | 8/40 [02:27<04:16,  8.03s/it]

[92m13:50:57 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:50:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:50:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:50:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.26 / 9 (58.4%):  22%|██▎       | 9/40 [02:28<02:58,  5.76s/it]

[92m13:50:58 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:05 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 6.03 / 10 (60.3%):  25%|██▌       | 10/40 [02:35<03:05,  6.18s/it]

[92m13:51:05 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.37 / 11 (57.9%):  28%|██▊       | 11/40 [02:39<02:42,  5.59s/it]

[92m13:51:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:11 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 7.26 / 12 (60.5%):  30%|███       | 12/40 [02:41<02:02,  4.37s/it]

[92m13:51:11 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 8.04 / 13 (61.9%):  32%|███▎      | 13/40 [02:47<02:17,  5.10s/it]

[92m13:51:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 8.93 / 14 (63.8%):  35%|███▌      | 14/40 [02:48<01:34,  3.62s/it]

[92m13:51:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.71 / 15 (64.7%):  38%|███▊      | 15/40 [02:57<02:15,  5.42s/it]

[92m13:51:27 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.37 / 16 (64.8%):  40%|████      | 16/40 [03:03<02:15,  5.63s/it]

[92m13:51:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.26 / 17 (66.3%):  42%|████▎     | 17/40 [03:06<01:46,  4.62s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.04 / 18 (66.9%):  45%|████▌     | 18/40 [03:15<02:12,  6.04s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:51:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:51:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:51:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 12.93 / 19 (68.0%):  48%|████▊     | 19/40 [04:32<09:32, 27.24s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:07 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:07 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:07 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.59 / 20 (68.0%):  50%|█████     | 20/40 [04:47<07:54, 23.70s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.59 / 21 (64.7%):  52%|█████▎    | 21/40 [05:00<06:28, 20.47s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 13.59 / 22 (61.8%):  55%|█████▌    | 22/40 [05:06<04:52, 16.23s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.91 / 23 (60.5%):  57%|█████▊    | 23/40 [05:11<03:36, 12.72s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.77 / 24 (61.5%):  60%|██████    | 24/40 [05:15<02:41, 10.10s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:53:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:53:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:53:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 15.44 / 25 (61.7%):  62%|██████▎   | 25/40 [05:22<02:18,  9.26s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m13:54:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m13:54:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:54:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m13:54:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 16.28 / 26 (62.6%):  65%|██████▌   | 26/40 [05:31<02:09,  9.26s/it]

Average Metric: 16.95 / 27 (62.8%):  68%|██████▊   | 27/40 [05:33<01:29,  6.86s/it]

Average Metric: 17.43 / 28 (62.3%):  70%|███████   | 28/40 [05:40<01:23,  6.93s/it]

Average Metric: 17.43 / 29 (60.1%):  72%|███████▎  | 29/40 [05:49<01:22,  7.47s/it]

Average Metric: 17.43 / 30 (58.1%):  75%|███████▌  | 30/40 [05:55<01:10,  7.03s/it]

Average Metric: 18.00 / 31 (58.1%):  78%|███████▊  | 31/40 [05:55<00:46,  5.18s/it]

Average Metric: 18.67 / 32 (58.3%):  80%|████████  | 32/40 [06:05<00:51,  6.42s/it]

Average Metric: 19.62 / 33 (59.4%):  82%|████████▎ | 33/40 [06:08<00:39,  5.59s/it]

Average Metric: 19.62 / 34 (57.7%):  85%|████████▌ | 34/40 [06:15<00:35,  5.92s/it]

Average Metric: 20.41 / 35 (58.3%):  88%|████████▊ | 35/40 [06:19<00:26,  5.34s/it]

Average Metric: 21.02 / 36 (58.4%):  90%|█████████ | 36/40 [06:28<00:25,  6.38s/it]

Average Metric: 21.69 / 37 (58.6%):  92%|█████████▎| 37/40 [06:34<00:19,  6.43s/it]

Average Metric: 22.46 / 38 (59.1%):  95%|█████████▌| 38/40 [06:40<00:12,  6.33s/it]

Average Metric: 23.13 / 39 (59.3%):  98%|█████████▊| 39/40 [06:48<00:06,  6.80s/it]

Average Metric: 23.13 / 39 (59.3%): 100%|██████████| 40/40 [06:54<00:00,  6.59s/it]

Average Metric: 23.13 / 39 (59.3%): : 41it [07:00,  6.21s/it]                      

Average Metric: 23.90 / 40 (59.8%): : 42it [07:03, 10.10s/it]

2025/06/03 13:55:34 INFO dspy.evaluate.evaluate: Average Metric: 23.902626381608616 / 40 (59.8%)
2025/06/03 13:55:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 59.76 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 10'].
2025/06/03 13:55:34 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76]
2025/06/03 13:55:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 62.5


2025/06/03 13:55:34 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 18 =====



🏃 View run eval_full_9 at: http://localhost:5500/#/experiments/344816129373506955/runs/f135822752054ef8967fd98a6b1a3fe7
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
Average Metric: 25.31 / 40 (63.3%): 100%|██████████| 40/40 [00:05<00:00,  7.67it/s]

2025/06/03 13:55:39 INFO dspy.evaluate.evaluate: Average Metric: 25.308443206605418 / 40 (63.3%)
2025/06/03 13:55:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 63.27
2025/06/03 13:55:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 63.27 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 1'].
2025/06/03 13:55:39 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27]
2025/06/03 13:55:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 13:55:39 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 12 / 18 =====



🏃 View run eval_full_10 at: http://localhost:5500/#/experiments/344816129373506955/runs/72f921a93eaf4189a5971029d5609101
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
Average Metric: 25.31 / 40 (63.3%): 100%|██████████| 40/40 [00:05<00:00,  7.10it/s]

2025/06/03 13:55:45 INFO dspy.evaluate.evaluate: Average Metric: 25.308443206605418 / 40 (63.3%)
2025/06/03 13:55:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 63.27 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 1'].
2025/06/03 13:55:45 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27]
2025/06/03 13:55:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 13:55:45 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 18 =====



🏃 View run eval_full_11 at: http://localhost:5500/#/experiments/344816129373506955/runs/1093ed0e5bb54c8eabbd5b3f7e280d3f
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:20<52:04, 80.11s/it]

Average Metric: 1.24 / 2 (61.9%):   5%|▌         | 2/40 [01:26<23:09, 36.58s/it]

Average Metric: 1.90 / 3 (63.5%):   8%|▊         | 3/40 [01:34<14:29, 23.50s/it]

Average Metric: 2.79 / 4 (69.8%):  10%|█         | 4/40 [01:35<08:45, 14.60s/it]

Average Metric: 3.46 / 5 (69.2%):  12%|█▎        | 5/40 [01:43<07:18, 12.52s/it]

Average Metric: 3.46 / 6 (57.7%):  15%|█▌        | 6/40 [01:45<04:58,  8.77s/it]

Average Metric: 4.13 / 7 (59.0%):  18%|█▊        | 7/40 [01:51<04:21,  7.92s/it]

Average Metric: 4.79 / 8 (59.9%):  20%|██        | 8/40 [01:58<04:00,  7.51s/it]

Average Metric: 5.46 / 9 (60.7%):  22%|██▎       | 9/40 [01:59<02:54,  5.64s/it]

Average Metric: 6.31 / 10 (63.1%):  25%|██▌       | 10/40 [02:04<02:44,  5.49s/it]

Average Metric: 6.97 / 11 (63.4%):  28%|██▊       | 11/40 [02:12<02:57,  6.13s/it]

Average Metric: 7.57 / 12 (63.1%):  30%|███       | 12/40 [02:15<02:28,  5.31s/it]

Average Metric: 8.35 / 13 (64.2%):  32%|███▎      | 13/40 [02:19<02:08,  4.75s/it]

Average Metric: 9.01 / 14 (64.4%):  35%|███▌      | 14/40 [02:24<02:04,  4.77s/it]

Average Metric: 9.79 / 15 (65.3%):  38%|███▊      | 15/40 [02:26<01:39,  3.99s/it]

Average Metric: 10.68 / 16 (66.7%):  40%|████      | 16/40 [02:31<01:42,  4.28s/it]

Average Metric: 10.68 / 17 (62.8%):  42%|████▎     | 17/40 [02:32<01:16,  3.33s/it]

Average Metric: 11.54 / 18 (64.1%):  45%|████▌     | 18/40 [02:40<01:47,  4.87s/it]

Average Metric: 12.38 / 19 (65.2%):  48%|████▊     | 19/40 [02:56<02:51,  8.15s/it]

Average Metric: 12.38 / 20 (61.9%):  50%|█████     | 20/40 [04:30<11:15, 33.77s/it]

Average Metric: 12.38 / 21 (59.0%):  52%|█████▎    | 21/40 [04:34<07:52, 24.85s/it]

Average Metric: 13.23 / 22 (60.1%):  55%|█████▌    | 22/40 [04:44<06:06, 20.35s/it]

Average Metric: 13.55 / 23 (58.9%):  57%|█████▊    | 23/40 [04:47<04:16, 15.11s/it]

Average Metric: 14.22 / 24 (59.2%):  60%|██████    | 24/40 [04:55<03:31, 13.25s/it]

Average Metric: 14.88 / 25 (59.5%):  62%|██████▎   | 25/40 [04:57<02:24,  9.62s/it]

Average Metric: 15.73 / 26 (60.5%):  65%|██████▌   | 26/40 [05:03<02:02,  8.74s/it]

Average Metric: 16.42 / 27 (60.8%):  68%|██████▊   | 27/40 [05:12<01:54,  8.84s/it]

Average Metric: 16.75 / 28 (59.8%):  70%|███████   | 28/40 [05:21<01:44,  8.74s/it]

Average Metric: 17.43 / 29 (60.1%):  72%|███████▎  | 29/40 [05:21<01:08,  6.22s/it]

Average Metric: 18.10 / 30 (60.3%):  75%|███████▌  | 30/40 [05:26<00:57,  5.79s/it]

Average Metric: 18.10 / 31 (58.4%):  78%|███████▊  | 31/40 [05:27<00:39,  4.41s/it]

Average Metric: 18.77 / 32 (58.6%):  80%|████████  | 32/40 [05:34<00:41,  5.16s/it]

Average Metric: 19.61 / 33 (59.4%):  82%|████████▎ | 33/40 [05:38<00:33,  4.74s/it]

Average Metric: 20.40 / 34 (60.0%):  85%|████████▌ | 34/40 [05:40<00:24,  4.03s/it]

Average Metric: 21.00 / 35 (60.0%):  88%|████████▊ | 35/40 [05:45<00:20,  4.16s/it]

Average Metric: 21.79 / 36 (60.5%):  90%|█████████ | 36/40 [05:48<00:15,  3.79s/it]

Average Metric: 22.57 / 37 (61.0%):  92%|█████████▎| 37/40 [05:55<00:14,  4.83s/it]

Average Metric: 22.85 / 38 (60.1%):  95%|█████████▌| 38/40 [06:05<00:12,  6.44s/it]

Average Metric: 22.85 / 38 (60.1%):  98%|█████████▊| 39/40 [06:18<00:08,  8.37s/it]

Average Metric: 23.60 / 39 (60.5%): 100%|██████████| 40/40 [06:21<00:00,  6.66s/it]

Average Metric: 24.45 / 40 (61.1%): : 41it [06:26,  9.42s/it]                      

2025/06/03 14:02:11 INFO dspy.evaluate.evaluate: Average Metric: 24.450691392820534 / 40 (61.1%)
2025/06/03 14:02:11 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.13 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 5'].
2025/06/03 14:02:11 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13]
2025/06/03 14:02:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 14:02:11 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 14 / 18 =====



🏃 View run eval_full_12 at: http://localhost:5500/#/experiments/344816129373506955/runs/caf97e68989e48e29ffe677180db4aa1
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:15<48:45, 75.02s/it]

Average Metric: 1.33 / 2 (66.7%):   5%|▌         | 2/40 [01:25<23:35, 37.24s/it]

Average Metric: 2.00 / 3 (66.7%):   8%|▊         | 3/40 [01:27<13:01, 21.13s/it]

[92m14:03:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:03:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:03:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:03:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.89 / 4 (72.2%):  10%|█         | 4/40 [01:40<10:35, 17.65s/it]

[92m14:03:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:03:53 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:03:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:03:53 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.78 / 5 (75.6%):  12%|█▎        | 5/40 [01:42<06:59, 11.98s/it]

[92m14:03:53 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:03:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:03:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:03:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 4.06 / 6 (67.7%):  15%|█▌        | 6/40 [01:47<05:32,  9.77s/it]

[92m14:03:59 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 4.85 / 7 (69.3%):  18%|█▊        | 7/40 [01:49<03:54,  7.11s/it]

[92m14:04:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.63 / 8 (70.3%):  20%|██        | 8/40 [01:57<03:58,  7.45s/it]

[92m14:04:08 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.40 / 9 (71.1%):  22%|██▎       | 9/40 [01:57<02:44,  5.31s/it]

[92m14:04:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 7.07 / 10 (70.7%):  25%|██▌       | 10/40 [02:04<02:48,  5.61s/it]

[92m14:04:15 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.73 / 11 (70.3%):  28%|██▊       | 11/40 [02:05<02:08,  4.42s/it]

[92m14:04:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 8.40 / 12 (70.0%):  30%|███       | 12/40 [02:12<02:22,  5.10s/it]

[92m14:04:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.07 / 13 (69.8%):  32%|███▎      | 13/40 [02:13<01:46,  3.96s/it]

[92m14:04:25 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 9.96 / 14 (71.1%):  35%|███▌      | 14/40 [02:19<01:54,  4.40s/it]

[92m14:04:30 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 10.73 / 15 (71.5%):  38%|███▊      | 15/40 [02:23<01:45,  4.21s/it]

[92m14:04:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.58 / 16 (72.4%):  40%|████      | 16/40 [02:30<02:02,  5.10s/it]

[92m14:04:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.15 / 17 (71.5%):  42%|████▎     | 17/40 [02:31<01:33,  4.06s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 12.82 / 18 (71.2%):  45%|████▌     | 18/40 [02:40<02:01,  5.54s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:04:56 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:04:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:04:56 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.48 / 19 (71.0%):  48%|████▊     | 19/40 [02:54<02:48,  8.00s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:05:13 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:05:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:05:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:05:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.48 / 20 (67.4%):  50%|█████     | 20/40 [04:05<08:55, 26.79s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.48 / 21 (64.2%):  52%|█████▎    | 21/40 [04:06<06:06, 19.28s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.34 / 22 (65.2%):  55%|█████▌    | 22/40 [04:14<04:42, 15.72s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.66 / 23 (63.7%):  57%|█████▊    | 23/40 [04:23<03:53, 13.73s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.99 / 24 (62.5%):  60%|██████    | 24/40 [04:35<03:31, 13.23s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.77 / 25 (63.1%):  62%|██████▎   | 25/40 [04:38<02:30, 10.05s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:06:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:06:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:06:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.66 / 26 (64.1%):  65%|██████▌   | 26/40 [04:42<01:57,  8.38s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.34 / 27 (64.2%):  68%|██████▊   | 27/40 [04:49<01:44,  8.04s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.01 / 28 (64.3%):  70%|███████   | 28/40 [04:56<01:31,  7.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.54 / 29 (63.9%):  72%|███████▎  | 29/40 [04:57<01:02,  5.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.54 / 30 (61.8%):  75%|███████▌  | 30/40 [05:04<00:59,  5.94s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:16 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.21 / 31 (62.0%):  78%|███████▊  | 31/40 [05:05<00:40,  4.53s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.21 / 32 (60.0%):  80%|████████  | 32/40 [05:10<00:36,  4.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.88 / 33 (60.2%):  82%|████████▎ | 33/40 [05:12<00:27,  3.95s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:07:31 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:07:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:07:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.48 / 34 (60.2%):  85%|████████▌ | 34/40 [05:19<00:29,  4.85s/it]

Average Metric: 21.14 / 35 (60.4%):  88%|████████▊ | 35/40 [05:26<00:27,  5.41s/it]

Average Metric: 22.00 / 36 (61.1%):  90%|█████████ | 36/40 [05:34<00:25,  6.31s/it]

Average Metric: 22.61 / 37 (61.1%):  92%|█████████▎| 37/40 [05:40<00:18,  6.16s/it]

Average Metric: 23.19 / 38 (61.0%):  95%|█████████▌| 38/40 [05:45<00:11,  5.89s/it]

Average Metric: 23.85 / 39 (61.2%):  98%|█████████▊| 39/40 [05:46<00:04,  4.33s/it]

Average Metric: 24.52 / 40 (61.3%): 100%|██████████| 40/40 [05:53<00:00,  8.83s/it]

2025/06/03 14:08:04 INFO dspy.evaluate.evaluate: Average Metric: 24.519484967466592 / 40 (61.3%)
2025/06/03 14:08:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.3 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/06/03 14:08:04 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3]
2025/06/03 14:08:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 14:08:04 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 15 / 18 =====



🏃 View run eval_full_13 at: http://localhost:5500/#/experiments/344816129373506955/runs/c246f1d59cf042debd80a551ccbc4909
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:36<1:02:49, 96.66s/it]

Average Metric: 1.33 / 2 (66.7%):   5%|▌         | 2/40 [01:42<27:26, 43.33s/it]

Average Metric: 2.00 / 3 (66.7%):   8%|▊         | 3/40 [01:45<15:17, 24.79s/it]

Average Metric: 2.67 / 4 (66.7%):  10%|█         | 4/40 [01:48<09:39, 16.10s/it]

Average Metric: 3.33 / 5 (66.7%):  12%|█▎        | 5/40 [01:52<06:49, 11.69s/it]

Average Metric: 3.81 / 6 (63.6%):  15%|█▌        | 6/40 [01:53<04:40,  8.24s/it]

Average Metric: 4.66 / 7 (66.6%):  18%|█▊        | 7/40 [01:58<04:00,  7.28s/it]

Average Metric: 5.33 / 8 (66.6%):  20%|██        | 8/40 [02:02<03:11,  5.98s/it]

Average Metric: 6.15 / 9 (68.3%):  22%|██▎       | 9/40 [02:07<03:00,  5.82s/it]

Average Metric: 7.03 / 10 (70.3%):  25%|██▌       | 10/40 [02:08<02:11,  4.37s/it]

Average Metric: 7.70 / 11 (70.0%):  28%|██▊       | 11/40 [02:13<02:06,  4.37s/it]

Average Metric: 8.47 / 12 (70.6%):  30%|███       | 12/40 [02:16<01:58,  4.24s/it]

Average Metric: 9.36 / 13 (72.0%):  32%|███▎      | 13/40 [02:22<02:05,  4.64s/it]

Average Metric: 10.03 / 14 (71.6%):  35%|███▌      | 14/40 [02:23<01:35,  3.66s/it]

Average Metric: 10.03 / 15 (66.9%):  38%|███▊      | 15/40 [02:31<01:59,  4.77s/it]

Average Metric: 10.89 / 16 (68.0%):  40%|████      | 16/40 [02:31<01:24,  3.51s/it]

Average Metric: 11.49 / 17 (67.6%):  42%|████▎     | 17/40 [02:37<01:33,  4.06s/it]

Average Metric: 12.06 / 18 (67.0%):  45%|████▌     | 18/40 [02:39<01:16,  3.47s/it]

Average Metric: 12.67 / 19 (66.7%):  48%|████▊     | 19/40 [02:44<01:26,  4.13s/it]

Average Metric: 12.67 / 20 (63.4%):  50%|█████     | 20/40 [04:16<10:08, 30.44s/it]

Average Metric: 12.67 / 21 (60.4%):  52%|█████▎    | 21/40 [04:21<07:09, 22.63s/it]

Average Metric: 12.96 / 22 (58.9%):  55%|█████▌    | 22/40 [04:24<05:04, 16.92s/it]

Average Metric: 13.44 / 23 (58.4%):  57%|█████▊    | 23/40 [04:31<03:53, 13.76s/it]

Average Metric: 14.13 / 24 (58.9%):  60%|██████    | 24/40 [04:32<02:40, 10.05s/it]

Average Metric: 14.97 / 25 (59.9%):  62%|██████▎   | 25/40 [04:40<02:22,  9.51s/it]

Average Metric: 15.86 / 26 (61.0%):  65%|██████▌   | 26/40 [04:43<01:44,  7.48s/it]

Average Metric: 16.40 / 27 (60.7%):  68%|██████▊   | 27/40 [04:52<01:42,  7.90s/it]

Average Metric: 16.40 / 28 (58.6%):  70%|███████   | 28/40 [04:58<01:28,  7.40s/it]

Average Metric: 17.12 / 29 (59.0%):  72%|███████▎  | 29/40 [05:00<01:02,  5.73s/it]

Average Metric: 17.79 / 30 (59.3%):  75%|███████▌  | 30/40 [05:04<00:52,  5.27s/it]

Average Metric: 18.12 / 31 (58.5%):  78%|███████▊  | 31/40 [05:06<00:37,  4.14s/it]

Average Metric: 18.79 / 32 (58.7%):  80%|████████  | 32/40 [05:12<00:37,  4.74s/it]

Average Metric: 19.29 / 33 (58.5%):  82%|████████▎ | 33/40 [05:13<00:26,  3.76s/it]

Average Metric: 19.96 / 34 (58.7%):  85%|████████▌ | 34/40 [05:19<00:25,  4.25s/it]

Average Metric: 20.56 / 35 (58.7%):  88%|████████▊ | 35/40 [05:20<00:16,  3.33s/it]

Average Metric: 21.22 / 36 (59.0%):  90%|█████████ | 36/40 [05:26<00:16,  4.20s/it]

Average Metric: 21.89 / 37 (59.2%):  92%|█████████▎| 37/40 [05:28<00:10,  3.62s/it]

Average Metric: 22.56 / 38 (59.4%):  95%|█████████▌| 38/40 [05:32<00:07,  3.67s/it]

Average Metric: 23.22 / 39 (59.5%):  98%|█████████▊| 39/40 [05:39<00:04,  4.60s/it]

Average Metric: 23.22 / 39 (59.5%): 100%|██████████| 40/40 [05:55<00:00,  8.00s/it]

Average Metric: 24.00 / 40 (60.0%): : 41it [05:57,  8.72s/it]                      

2025/06/03 14:14:02 INFO dspy.evaluate.evaluate: Average Metric: 23.99667723800551 / 40 (60.0%)
2025/06/03 14:14:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 59.99 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 9'].
2025/06/03 14:14:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3, 59.99]
2025/06/03 14:14:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 14:14:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 16 / 18 =====



🏃 View run eval_full_14 at: http://localhost:5500/#/experiments/344816129373506955/runs/b85186b8bafa441ba756b1e953301349
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:23<53:59, 83.07s/it]

Average Metric: 1.58 / 2 (78.8%):   5%|▌         | 2/40 [01:48<31:18, 49.43s/it]

Average Metric: 2.43 / 3 (81.1%):   8%|▊         | 3/40 [01:52<17:32, 28.46s/it]

Average Metric: 2.91 / 4 (72.8%):  10%|█         | 4/40 [01:57<11:33, 19.28s/it]

Average Metric: 3.53 / 5 (70.6%):  12%|█▎        | 5/40 [02:04<08:37, 14.78s/it]

Average Metric: 4.30 / 6 (71.7%):  15%|█▌        | 6/40 [02:05<05:38,  9.96s/it]

Average Metric: 4.97 / 7 (71.0%):  18%|█▊        | 7/40 [02:10<04:40,  8.51s/it]

Average Metric: 5.82 / 8 (72.7%):  20%|██        | 8/40 [02:18<04:27,  8.36s/it]

Average Metric: 6.66 / 9 (74.0%):  22%|██▎       | 9/40 [02:19<03:06,  6.03s/it]

Average Metric: 7.26 / 10 (72.6%):  25%|██▌       | 10/40 [02:25<02:59,  5.99s/it]

Average Metric: 7.26 / 11 (66.0%):  28%|██▊       | 11/40 [02:28<02:28,  5.11s/it]

Average Metric: 7.83 / 12 (65.3%):  30%|███       | 12/40 [02:31<02:09,  4.61s/it]

Average Metric: 8.50 / 13 (65.4%):  32%|███▎      | 13/40 [02:36<02:04,  4.59s/it]

[92m14:16:39 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:16:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:16:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:16:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:16:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 9.17 / 14 (65.5%):  35%|███▌      | 14/40 [02:42<02:08,  4.95s/it]

[92m14:16:44 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:16:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:16:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:16:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.83 / 15 (65.6%):  38%|███▊      | 15/40 [02:45<01:47,  4.31s/it]

[92m14:16:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:16:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:16:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:16:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.50 / 16 (65.6%):  40%|████      | 16/40 [02:55<02:25,  6.07s/it]

[92m14:16:57 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:17:03 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:17:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:03 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.17 / 17 (65.7%):  42%|████▎     | 17/40 [03:01<02:20,  6.09s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:17:05 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:17:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 12.02 / 18 (66.8%):  45%|████▌     | 18/40 [03:03<01:47,  4.91s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:17:13 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:17:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 12.90 / 19 (67.9%):  48%|████▊     | 19/40 [03:10<01:56,  5.55s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:17:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:17:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:17:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 13.57 / 20 (67.9%):  50%|█████     | 20/40 [04:24<08:41, 26.07s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:18:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:18:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:18:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:18:32 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 14.24 / 21 (67.8%):  52%|█████▎    | 21/40 [04:49<08:06, 25.60s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:18:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:18:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:18:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:18:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.13 / 22 (68.8%):  55%|█████▌    | 22/40 [04:58<06:16, 20.90s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.45 / 23 (67.2%):  57%|█████▊    | 23/40 [05:02<04:26, 15.65s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:11 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:11 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.11 / 24 (67.1%):  60%|██████    | 24/40 [05:08<03:25, 12.83s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:13 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 16.78 / 25 (67.1%):  62%|██████▎   | 25/40 [05:10<02:23,  9.59s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.64 / 26 (67.8%):  65%|██████▌   | 26/40 [05:20<02:15,  9.70s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:31 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 18.32 / 27 (67.9%):  68%|██████▊   | 27/40 [05:29<02:03,  9.47s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.86 / 28 (67.3%):  70%|███████   | 28/40 [05:29<01:21,  6.75s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.52 / 30 (65.1%):  72%|███████▎  | 29/40 [05:36<01:13,  6.71s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.19 / 31 (65.1%):  78%|███████▊  | 31/40 [05:43<00:45,  5.10s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.19 / 32 (63.1%):  80%|████████  | 32/40 [05:50<00:45,  5.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:19:55 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:19:55 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.19 / 33 (61.2%):  82%|████████▎ | 33/40 [05:52<00:33,  4.82s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:01 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 20.19 / 34 (59.4%):  85%|████████▌ | 34/40 [06:01<00:35,  5.96s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:09 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:09 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.79 / 35 (59.4%):  88%|████████▊ | 35/40 [06:06<00:28,  5.72s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 21.46 / 36 (59.6%):  90%|█████████ | 36/40 [06:08<00:18,  4.54s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.12 / 37 (59.8%):  92%|█████████▎| 37/40 [06:14<00:15,  5.06s/it]

[92m14:20:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m14:20:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m14:20:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:19 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:19 - LiteLLM:INFO[0m: cost

Average Metric: 22.79 / 38 (60.0%):  95%|█████████▌| 38/40 [06:17<00:08,  4.30s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:27 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 23.46 / 39 (60.1%):  98%|█████████▊| 39/40 [06:32<00:07,  7.37s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:34 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai

Average Metric: 23.46 / 39 (60.1%): 100%|██████████| 40/40 [06:39<00:00,  7.40s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 23.46 / 39 (60.1%): : 41it [06:40,  5.48s/it]                      

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:20:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 24.27 / 40 (60.7%): : 42it [06:45,  9.65s/it]

2025/06/03 14:20:47 INFO dspy.evaluate.evaluate: Average Metric: 24.27390037750569 / 40 (60.7%)
2025/06/03 14:20:47 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.68 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 8'].
2025/06/03 14:20:47 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3, 59.99, 60.68]
2025/06/03 14:20:47 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 63.27


2025/06/03 14:20:47 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 17 / 18 =====



🏃 View run eval_full_15 at: http://localhost:5500/#/experiments/344816129373506955/runs/7074516c57564994b8278c34360ec51a
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
  0%|          | 0/40 [00:00<?, ?it/s]

[92m14:20:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m14:20:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m14:20:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:20:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:20:49 - LiteLLM:INFO[0m: cost

Average Metric: 0.67 / 1 (66.7%):   2%|▎         | 1/40 [01:25<55:37, 85.58s/it]

[92m14:22:13 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 1.33 / 2 (66.7%):   5%|▌         | 2/40 [01:35<26:09, 41.31s/it]

[92m14:22:23 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 1.33 / 3 (44.4%):   8%|▊         | 3/40 [01:42<15:47, 25.60s/it]

[92m14:22:30 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 2.11 / 4 (52.7%):  10%|█         | 4/40 [01:45<10:02, 16.73s/it]

[92m14:22:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 2.71 / 5 (54.2%):  12%|█▎        | 5/40 [01:50<07:14, 12.42s/it]

[92m14:22:38 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 3.37 / 6 (56.2%):  15%|█▌        | 6/40 [01:56<05:42, 10.07s/it]

[92m14:22:44 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:46 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 4.26 / 7 (60.9%):  18%|█▊        | 7/40 [01:58<04:07,  7.51s/it]

[92m14:22:46 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:50 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 5.11 / 8 (63.9%):  20%|██        | 8/40 [02:02<03:29,  6.53s/it]

[92m14:22:50 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:22:51 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:22:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:22:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 5.93 / 9 (65.9%):  22%|██▎       | 9/40 [02:03<02:29,  4.82s/it]

[92m14:22:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 6.53 / 10 (65.3%):  25%|██▌       | 10/40 [02:12<03:00,  6.01s/it]

[92m14:23:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 7.19 / 11 (65.4%):  28%|██▊       | 11/40 [02:13<02:10,  4.51s/it]

[92m14:23:01 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 7.97 / 12 (66.5%):  30%|███       | 12/40 [02:22<02:42,  5.80s/it]

[92m14:23:10 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 8.86 / 13 (68.2%):  32%|███▎      | 13/40 [02:23<01:53,  4.20s/it]

[92m14:23:10 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:16 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:16 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 9.65 / 14 (68.9%):  35%|███▌      | 14/40 [02:28<02:02,  4.70s/it]

[92m14:23:16 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.34 / 15 (68.9%):  38%|███▊      | 15/40 [02:30<01:36,  3.87s/it]

[92m14:23:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 11.00 / 16 (68.8%):  40%|████      | 16/40 [02:35<01:41,  4.22s/it]

[92m14:23:23 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 11.67 / 17 (68.7%):  42%|████▎     | 17/40 [02:45<02:15,  5.90s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 12.34 / 18 (68.5%):  45%|████▌     | 18/40 [02:55<02:37,  7.16s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:23:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:23:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:23:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 13.00 / 19 (68.4%):  48%|████▊     | 19/40 [03:09<03:10,  9.05s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:24:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:24:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:24:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:24:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.67 / 20 (68.4%):  50%|█████     | 20/40 [04:25<09:44, 29.22s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 13.67 / 21 (65.1%):  52%|█████▎    | 21/40 [04:29<06:51, 21.66s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.46 / 22 (65.7%):  55%|█████▌    | 22/40 [04:36<05:09, 17.22s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 14.75 / 23 (64.1%):  57%|█████▊    | 23/40 [04:47<04:20, 15.33s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.41 / 24 (64.2%):  60%|██████    | 24/40 [04:54<03:25, 12.84s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:42 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:42 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.30 / 25 (65.2%):  62%|██████▎   | 25/40 [04:54<02:17,  9.18s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:25:49 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:25:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:25:49 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.15 / 26 (66.0%):  65%|██████▌   | 26/40 [05:01<01:58,  8.45s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 17.78 / 27 (65.9%):  68%|██████▊   | 27/40 [05:20<02:31, 11.64s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 18.63 / 28 (66.5%):  70%|███████   | 28/40 [05:24<01:52,  9.40s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:15 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:15 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.96 / 29 (65.4%):  72%|███████▎  | 29/40 [05:27<01:21,  7.37s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:20 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.63 / 30 (65.4%):  75%|███████▌  | 30/40 [05:32<01:06,  6.69s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:20 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:20 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.63 / 31 (63.3%):  78%|███████▊  | 31/40 [05:33<00:43,  4.81s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.30 / 32 (63.4%):  80%|████████  | 32/40 [05:39<00:42,  5.35s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:28 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:28 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 20.96 / 33 (63.5%):  82%|████████▎ | 33/40 [05:41<00:29,  4.20s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m14:26:35 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m14:26:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m14:26:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 21.75 / 34 (64.0%):  85%|████████▌ | 34/40 [05:48<00:30,  5.06s/it]



Average Metric: 22.64 / 35 (64.7%):  88%|████████▊ | 35/40 [05:51<00:21,  4.38s/it]


Average Metric: 23.31 / 36 (64.7%):  90%|█████████ | 36/40 [05:57<00:19,  4.89s/it]


Average Metric: 23.98 / 37 (64.8%):  92%|█████████▎| 37/40 [06:11<00:22,  7.66s/it]


Average Metric: 24.87 / 38 (65.4%):  95%|█████████▌| 38/40 [06:15<00:13,  6.53s/it]



Average Metric: 25.53 / 39 (65.5%):  98%|█████████▊| 39/40 [06:17<00:05,  5.31s/it]


Average Metric: 26.31 / 40 (65.8%): 100%|██████████| 40/40 [06:21<00:00,  9.55s/it]

2025/06/03 14:27:09 INFO dspy.evaluate.evaluate: Average Metric: 26.306180868190257 / 40 (65.8%)
2025/06/03 14:27:09 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 65.77
2025/06/03 14:27:09 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.77 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/06/03 14:27:09 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3, 59.99, 60.68, 65.77]
2025/06/03 14:27:09 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 65.77


2025/06/03 14:27:09 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 18 / 18 =====



🏃 View run eval_full_16 at: http://localhost:5500/#/experiments/344816129373506955/runs/a68f92246a624aa187e47a692ec4d37a
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
Average Metric: 2.15 / 3 (71.7%):   5%|▌         | 2/40 [00:00<00:11,  3.18it/s]



Average Metric: 26.31 / 40 (65.8%): 100%|██████████| 40/40 [00:06<00:00,  6.62it/s]

2025/06/03 14:27:15 INFO dspy.evaluate.evaluate: Average Metric: 26.306180868190257 / 40 (65.8%)
2025/06/03 14:27:15 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.77 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/06/03 14:27:15 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3, 59.99, 60.68, 65.77, 65.77]
2025/06/03 14:27:15 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 65.77


2025/06/03 14:27:15 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 18 =====



🏃 View run eval_full_17 at: http://localhost:5500/#/experiments/344816129373506955/runs/15fd7a4cd8ba48d984544c18db0a4325
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
Average Metric: 1.56 / 2 (77.8%):   2%|▎         | 1/40 [00:00<00:13,  2.84it/s]


Average Metric: 26.20 / 40 (65.5%): 100%|██████████| 40/40 [00:06<00:00,  5.83it/s]

2025/06/03 14:27:22 INFO dspy.evaluate.evaluate: Average Metric: 26.198653986469825 / 40 (65.5%)
2025/06/03 14:27:22 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.5 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/06/03 14:27:22 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76, 63.27, 63.27, 61.13, 61.3, 59.99, 60.68, 65.77, 65.77, 65.5]
2025/06/03 14:27:22 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 65.77


2025/06/03 14:27:22 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 65.77!



🏃 View run eval_full_18 at: http://localhost:5500/#/experiments/344816129373506955/runs/c6d520b1aeb2449fbc760565d9735561
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955


Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 145.21it/s]

🏃 View run resilient-shrike-234 at: http://localhost:5500/#/experiments/344816129373506955/runs/a3b3616e10834427b5ac20b2cd63de46
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955
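
The score reported for each trial appears to be the summed metric divided by the number of evaluation examples, expressed as a percentage and rounded to two decimals. The sketch below re-derives the best score from the values in the log above. It also computes the gain over the first entry of "Scores so far", under the assumption (not stated in this excerpt) that the first entry belongs to the starting program. The variable names are illustrative only.

```python
# Values copied from the optimizer log above.
metric_sum = 26.306180868190257  # numerator of "Average Metric" for the best trial
num_examples = 40                # examples evaluated per full trial

# Percentage score, rounded the way the log appears to round it.
score = round(100 * metric_sum / num_examples, 2)

# Final "Scores so far" list from the log.
scores_so_far = [
    57.21, 60.52, 61.11, 60.85, 61.11, 55.55, 60.94, 62.5, 62.12, 59.76,
    63.27, 63.27, 61.13, 61.3, 59.99, 60.68, 65.77, 65.77, 65.5,
]

best = max(scores_so_far)
# Assumption: the first entry is the score of the starting (unoptimized) program.
gain = round(best - scores_so_far[0], 2)

print(f"recomputed score: {score}, best: {best}, gain over first entry: {gain}")
```

Because each trial here is scored on only 40 examples, small differences between neighbouring scores are best treated as noise rather than as real improvements.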





In [19]:
baseline = rag(question="cmd+tab does not work on hidden or minimized windows")
print(baseline.response)


You can use the following methods to switch between hidden or minimized windows on a Mac:
- Use `Control-CMD-F` to switch to full screen mode, which will show all open windows.
- Navigate to System Preferences > Mission Control, and uncheck "When switching to an application, switch to a Space with open windows for the application."
- Alternatively, you can use Automator Services or third-party applications like iCanHazShortcut to execute an `osascript` command that can handle hidden or minimized windows.
- Another approach is to use `Cmd+` (backtick) and press `Cmd+tab` while holding `Cmd`, which will show all open applications with animated displayed windows, allowing you to navigate using arrow keys.


In [20]:
pred = optimized_rag(question="cmd+tab does not work on hidden or minimized windows")
print(pred.response)



To access minimized windows using cmd+tab, try the following workarounds:
- Use Control-CMD-F for full screen mode to access minimized windows.
- Switch to all open apps before hiding.
- Use the following AppleScript to switch to the first minimized window: `delay 0.5 set i to 0 tell application System Events set first_app to name of the first process whose frontmost is true repeat with p in every process if visible of p then set i to i + 1 end if end repeat repeat i - 1 times key down command key down shift keystroke tab delay 0.01 key up shift key up command delay 0.1 end repeat set visible of process first_app to false end tell
- Alternatively, use the following shortcut in iCanHazShortcut: `osascript <scriptname>`
- Navigate to the minimized Application by doing Command+Tab while still holding Command, then release both keys.
- Press cmd+` to access minimized windows, but first press cmd+tab to open the window, then press cmd+` to minimize it.
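
Runs like these can emit a large number of INFO-level records per model call, which makes the actual responses hard to find. A minimal sketch for quieting them with Python's standard `logging` module follows. The logger names `LiteLLM` and `httpx` are assumptions based on the names those libraries log under, and `quiet_loggers` is a hypothetical helper, not part of DSPy.

```python
import logging

# Assumed logger names; adjust if your installed versions log under different names.
NOISY_LOGGERS = ("LiteLLM", "httpx")


def quiet_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the threshold of each named logger so INFO records are dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)


# Call once, before running the RAG program or the evaluation.
quiet_loggers()
```

Warnings and errors from both libraries still get through, so failed requests remain visible. A library may reset its own logger level at import or configuration time, so it is safest to call this after all imports.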


In [None]:
evaluate(optimized_rag)

  0%|          | 0/100 [00:00<?, ?it/s]


Average Metric: 0.85 / 1 (84.7%):   1%|          | 1/100 [00:48<1:20:04, 48.53s/it]


Average Metric: 1.51 / 2 (75.7%):   2%|▏         | 2/100 [01:41<1:23:30, 51.13s/it]


Average Metric: 1.51 / 3 (50.5%):   3%|▎         | 3/100 [01:44<46:49, 28.97s/it]  


Average Metric: 1.85 / 4 (46.2%):   4%|▍         | 4/100 [01:52<33:18, 20.81s/it]



Average Metric: 1.85 / 5 (36.9%):   5%|▌         | 5/100 [01:57<23:44, 14.99s/it]


Average Metric: 2.33 / 6 (38.8%):   6%|▌         | 6/100 [02:02<18:35, 11.87s/it]


Average Metric: 2.99 / 7 (42.8%):   7%|▋         | 7/100 [02:10<16:15, 10.49s/it]



Average Metric: 3.57 / 8 (44.6%):   8%|▊         | 8/100 [02:15<13:23,  8.73s/it]


Average Metric: 4.23 / 9 (47.0%):   9%|▉         | 9/100 [02:25<13:41,  9.03s/it]


Average Metric: 4.80 / 10 (48.0%):  10%|█         | 10/100 [02:30<11:43,  7.82s/it]



Average Metric: 5.47 / 11 (49.7%):  11%|█         | 11/100 [02:35<10:20,  6.97s/it]


Average Metric: 6.24 / 12 (52.0%):  12%|█▏        | 12/100 [02:39<08:55,  6.08s/it]



Average Metric: 6.93 / 13 (53.3%):  13%|█▎        | 13/100 [02:47<09:32,  6.58s/it]


Average Metric: 6.93 / 14 (49.5%):  14%|█▍        | 14/100 [02:50<08:11,  5.72s/it]



Average Metric: 7.53 / 15 (50.2%):  15%|█▌        | 15/100 [02:53<06:48,  4.80s/it]


Average Metric: 8.30 / 16 (51.9%):  16%|█▌        | 16/100 [02:56<05:56,  4.25s/it]



Average Metric: 8.97 / 17 (52.8%):  17%|█▋        | 17/100 [03:01<06:07,  4.43s/it]

[92m15:23:14 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:23:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:23:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:23:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:23:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 9.74 / 18 (54.1%):  18%|█▊        | 18/100 [03:14<09:46,  7.15s/it]

[92m15:23:28 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:23:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:23:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:23:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.41 / 19 (54.8%):  19%|█▉        | 19/100 [03:30<13:01,  9.64s/it]

[92m15:23:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:23:46 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:23:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:23:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 10.94 / 20 (54.7%):  20%|██        | 20/100 [03:44<14:35, 10.95s/it]

[92m15:23:57 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:24:02 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:24:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:24:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:24:02 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 11.73 / 21 (55.9%):  21%|██        | 21/100 [04:36<30:46, 23.37s/it]

[92m15:24:50 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:24:54 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:24:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:24:54 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 12.51 / 22 (56.9%):  22%|██▏       | 22/100 [04:41<23:09, 17.81s/it]

[92m15:24:54 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:24:57 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:24:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:24:57 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 13.36 / 23 (58.1%):  23%|██▎       | 23/100 [04:51<19:52, 15.49s/it]

[92m15:25:04 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:05 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 14.02 / 24 (58.4%):  24%|██▍       | 24/100 [04:52<13:57, 11.02s/it]

[92m15:25:05 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 14.69 / 25 (58.8%):  25%|██▌       | 25/100 [04:58<12:08,  9.72s/it]

[92m15:25:12 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:12 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:12 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 15.36 / 26 (59.1%):  26%|██▌       | 26/100 [05:05<10:45,  8.73s/it]

[92m15:25:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 16.02 / 27 (59.3%):  27%|██▋       | 27/100 [05:08<08:36,  7.08s/it]

[92m15:25:21 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 16.81 / 28 (60.0%):  28%|██▊       | 28/100 [05:15<08:36,  7.18s/it]

[92m15:25:29 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 17.48 / 29 (60.3%):  29%|██▉       | 29/100 [05:23<08:38,  7.30s/it]

[92m15:25:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:37 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 18.30 / 30 (61.0%):  30%|███       | 30/100 [05:24<06:12,  5.32s/it]

[92m15:25:37 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.07 / 31 (61.5%):  31%|███       | 31/100 [05:28<05:45,  5.01s/it]

[92m15:25:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.07 / 32 (59.6%):  32%|███▏      | 32/100 [05:30<04:41,  4.14s/it]

[92m15:25:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:46 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:46 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.74 / 33 (59.8%):  33%|███▎      | 33/100 [05:33<04:05,  3.67s/it]

[92m15:25:46 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 19.74 / 34 (58.1%):  34%|███▍      | 34/100 [05:38<04:43,  4.29s/it]

[92m15:25:52 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:25:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:25:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:25:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 19.74 / 35 (56.4%):  35%|███▌      | 35/100 [05:39<03:26,  3.17s/it]

[92m15:25:52 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:26:01 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:26:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:01 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 20.63 / 36 (57.3%):  36%|███▌      | 36/100 [05:51<06:12,  5.82s/it]

[92m15:26:04 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:26:08 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:26:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:08 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 21.52 / 37 (58.1%):  37%|███▋      | 37/100 [06:00<07:02,  6.71s/it]

[92m15:26:13 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:26:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:26:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 22.20 / 38 (58.4%):  38%|███▊      | 38/100 [06:28<13:39, 13.22s/it]

[92m15:26:42 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:26:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:26:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 22.87 / 39 (58.6%):  39%|███▉      | 39/100 [06:34<11:21, 11.17s/it]

[92m15:26:48 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:26:51 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:26:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:26:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 23.53 / 40 (58.8%):  40%|████      | 40/100 [07:19<21:16, 21.28s/it]

[92m15:27:33 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:27:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:27:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:27:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 24.20 / 41 (59.0%):  41%|████      | 41/100 [07:32<18:18, 18.62s/it]

[92m15:27:45 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:27:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:27:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:27:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 24.53 / 42 (58.4%):  42%|████▏     | 42/100 [07:32<12:40, 13.12s/it]

[92m15:27:45 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:27:51 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:27:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:27:51 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 25.20 / 43 (58.6%):  43%|████▎     | 43/100 [07:38<10:21, 10.90s/it]

[92m15:27:51 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:27:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:27:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:27:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:27:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 25.80 / 44 (58.6%):  44%|████▍     | 44/100 [07:45<09:03,  9.71s/it]

[92m15:27:58 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:00 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:00 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 25.80 / 45 (57.3%):  45%|████▌     | 45/100 [07:47<06:50,  7.46s/it]

[92m15:28:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 26.47 / 46 (57.5%):  46%|████▌     | 46/100 [07:55<07:00,  7.80s/it]

[92m15:28:09 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:14 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:14 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 27.07 / 47 (57.6%):  47%|████▋     | 47/100 [08:05<07:27,  8.44s/it]

[92m15:28:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:21 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:21 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 27.67 / 48 (57.6%):  48%|████▊     | 48/100 [08:08<05:48,  6.70s/it]

[92m15:28:22 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 28.51 / 49 (58.2%):  49%|████▉     | 49/100 [08:11<04:48,  5.65s/it]

[92m15:28:25 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 29.11 / 50 (58.2%):  50%|█████     | 50/100 [08:16<04:27,  5.34s/it]

[92m15:28:29 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 29.11 / 51 (57.1%):  51%|█████     | 51/100 [08:23<04:41,  5.74s/it]

[92m15:28:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:37 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:37 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 29.78 / 52 (57.3%):  52%|█████▏    | 52/100 [08:24<03:29,  4.37s/it]

[92m15:28:37 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:43 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:43 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 30.63 / 53 (57.8%):  53%|█████▎    | 53/100 [08:30<03:53,  4.97s/it]

[92m15:28:44 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 31.30 / 54 (58.0%):  54%|█████▍    | 54/100 [08:33<03:26,  4.49s/it]

[92m15:28:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:50 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:50 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 31.96 / 55 (58.1%):  55%|█████▌    | 55/100 [08:42<04:12,  5.61s/it]

[92m15:28:55 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:28:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:28:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:28:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 32.63 / 56 (58.3%):  56%|█████▌    | 56/100 [08:45<03:33,  4.86s/it]

[92m15:28:58 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:29:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:29:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:29:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 33.30 / 57 (58.4%):  57%|█████▋    | 57/100 [09:16<09:07, 12.74s/it]

[92m15:29:29 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:29:35 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:29:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:29:35 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 33.91 / 58 (58.5%):  58%|█████▊    | 58/100 [09:27<08:33, 12.23s/it]

[92m15:29:40 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:29:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:29:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:29:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 34.68 / 59 (58.8%):  59%|█████▉    | 59/100 [10:16<15:49, 23.17s/it]

[92m15:30:29 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:31 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 35.35 / 60 (58.9%):  60%|██████    | 60/100 [10:22<12:01, 18.04s/it]

[92m15:30:35 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:40 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:40 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 36.20 / 61 (59.3%):  61%|██████    | 61/100 [10:26<09:08, 14.05s/it]

[92m15:30:40 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 36.20 / 62 (58.4%):  62%|██████▏   | 62/100 [10:28<06:26, 10.16s/it]

[92m15:30:41 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 36.87 / 63 (58.5%):  63%|██████▎   | 63/100 [10:34<05:32,  8.99s/it]

[92m15:30:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 37.53 / 64 (58.6%):  64%|██████▍   | 64/100 [10:39<04:38,  7.74s/it]

[92m15:30:52 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:30:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:30:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:30:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 38.31 / 65 (58.9%):  65%|██████▌   | 65/100 [10:46<04:28,  7.66s/it]

[92m15:31:00 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:04 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:04 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:05 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 38.92 / 66 (59.0%):  66%|██████▌   | 66/100 [10:55<04:30,  7.95s/it]

[92m15:31:08 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:10 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:10 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 38.92 / 67 (58.1%):  67%|██████▋   | 67/100 [10:57<03:25,  6.23s/it]

[92m15:31:10 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:17 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:17 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 39.49 / 68 (58.1%):  68%|██████▊   | 68/100 [11:04<03:27,  6.48s/it]

[92m15:31:17 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:18 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:18 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 40.09 / 69 (58.1%):  69%|██████▉   | 69/100 [11:05<02:27,  4.76s/it]

[92m15:31:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 40.76 / 70 (58.2%):  70%|███████   | 70/100 [11:10<02:30,  5.03s/it]

[92m15:31:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:24 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:24 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 41.08 / 71 (57.9%):  71%|███████   | 71/100 [11:11<01:47,  3.72s/it]

[92m15:31:24 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:30 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:30 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 41.85 / 72 (58.1%):  72%|███████▏  | 72/100 [11:16<01:58,  4.23s/it]

[92m15:31:30 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:31 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 42.52 / 73 (58.2%):  73%|███████▎  | 73/100 [11:18<01:28,  3.30s/it]

[92m15:31:31 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:39 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf

Average Metric: 43.09 / 74 (58.2%):  74%|███████▍  | 74/100 [11:29<02:32,  5.87s/it]

[92m15:31:43 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 43.76 / 75 (58.3%):  75%|███████▌  | 75/100 [11:34<02:13,  5.35s/it]

[92m15:31:47 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:31:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:31:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:31:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 44.51 / 76 (58.6%):  76%|███████▌  | 76/100 [12:05<05:12, 13.02s/it]

[92m15:32:18 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:32:25 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:32:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:32:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:32:25 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Ll

Average Metric: 45.17 / 77 (58.7%):  77%|███████▋  | 77/100 [12:12<04:22, 11.39s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:32:29 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:32:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:32:29 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:32:29 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 45.77 / 78 (58.7%):  78%|███████▊  | 78/100 [13:02<08:22, 22.85s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:19 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:19 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:19 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:19 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 46.62 / 79 (59.0%):  79%|███████▉  | 79/100 [13:12<06:41, 19.12s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:31 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:31 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 47.10 / 80 (58.9%):  80%|████████  | 80/100 [13:17<04:59, 14.99s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:33 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:33 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 47.10 / 81 (58.2%):  81%|████████  | 81/100 [13:19<03:30, 11.05s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:38 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:38 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 47.77 / 82 (58.3%):  82%|████████▏ | 82/100 [13:25<02:50,  9.49s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 48.44 / 83 (58.4%):  83%|████████▎ | 83/100 [13:31<02:21,  8.34s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:33:48 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:33:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:33:48 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m

Average Metric: 48.97 / 84 (58.3%):  84%|████████▍ | 84/100 [13:41<02:23,  8.95s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:23 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:23 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m

Average Metric: 49.58 / 85 (58.3%):  85%|████████▌ | 85/100 [14:10<03:43, 14.89s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:27 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:27 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:27 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP 

Average Metric: 50.25 / 86 (58.4%):  86%|████████▌ | 86/100 [14:17<02:56, 12.61s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:34 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:34 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 51.02 / 87 (58.6%):  87%|████████▋ | 87/100 [14:21<02:08,  9.92s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:41 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:41 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 51.56 / 88 (58.6%):  88%|████████▊ | 88/100 [14:28<01:48,  9.01s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 51.56 / 89 (57.9%):  89%|████████▉ | 89/100 [14:30<01:17,  7.08s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:47 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:47 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m

Average Metric: 52.22 / 90 (58.0%):  90%|█████████ | 90/100 [14:34<00:59,  5.95s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:52 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:52 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 53.08 / 91 (58.3%):  91%|█████████ | 91/100 [14:39<00:51,  5.72s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:58 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:58 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 53.62 / 92 (58.3%):  92%|█████████▏| 92/100 [14:45<00:45,  5.73s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:34:59 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:34:59 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:34:59 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 54.22 / 93 (58.3%):  93%|█████████▎| 93/100 [14:45<00:29,  4.21s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:06 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:06 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 54.99 / 94 (58.5%):  94%|█████████▍| 94/100 [14:55<00:34,  5.79s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:13 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:13 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m

Average Metric: 55.66 / 95 (58.6%):  95%|█████████▌| 95/100 [15:14<00:48,  9.76s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:32 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:32 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 56.23 / 96 (58.6%):  96%|█████████▌| 96/100 [15:19<00:33,  8.38s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:36 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:36 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m

Average Metric: 56.83 / 97 (58.6%):  97%|█████████▋| 97/100 [15:23<00:20,  6.96s/it]

[92m15:35:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m15:35:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
[92m15:35:36 - LiteLLM:INFO[0m: utils.py:2991 - 
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= models/Llama-3.2-3B-Instruct-Q8_0.gguf; provider = openai
INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:39 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:39 - LiteLLM:INFO[0m: cost

Average Metric: 57.60 / 98 (58.8%):  98%|█████████▊| 98/100 [15:26<00:11,  5.85s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:44 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:44 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: model

Average Metric: 58.46 / 99 (59.0%):  99%|█████████▉| 99/100 [15:30<00:05,  5.43s/it]

INFO:httpx:HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
[92m15:35:45 - LiteLLM:INFO[0m: utils.py:1213 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[92m15:35:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
[92m15:35:45 - LiteLLM:INFO[0m: cost_calculator.py:655 - selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf
INFO:LiteLLM:selected model name for cost calculation: openai/models/Llama-3.2-3B-Instruct-Q8_0.gguf


Average Metric: 59.31 / 100 (59.3%): 100%|██████████| 100/100 [15:32<00:00,  9.32s/it]

2025/06/03 15:35:45 INFO dspy.evaluate.evaluate: Average Metric: 59.30611045825858 / 100 (59.3%)
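
The summary line above appears to report the sum of per-example SemanticF1 scores, the number of examples, and the resulting percentage. A minimal sketch of that arithmetic follows; the helper name `format_summary` is hypothetical and is not part of DSPy.

```python
def format_summary(score_sum: float, n_examples: int) -> str:
    """Render a progress-style summary: score sum, example count, percentage."""
    # Mean score expressed as a percentage, rounded to one decimal place.
    percent = round(100 * score_sum / n_examples, 1)
    return f"Average Metric: {score_sum:.2f} / {n_examples} ({percent:.1f}%)"


# Values copied from the evaluation log line above.
print(format_summary(59.30611045825858, 100))
```

Because each SemanticF1 score lies between 0 and 1, the sum over 100 examples and the percentage happen to show the same digits here; with a different number of examples they would differ.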





Unnamed: 0,question,example_response,gold_doc_ids,reasoning,pred_response,SemanticF1
0,does using == in javascript ever make sense?,"Yes, using `==` in JavaScript can make sense and is convenient in ...","[5778, 5791, 5818]",The use of `==` in JavaScript can be misleading due to type coerci...,"Yes, using `==` in JavaScript can make sense in certain situations...",✔️ [0.667]
1,what is the difference between a virus and trojan?,The terms have a great deal of overlap and aren't necessarily mutu...,"[3768, 3769, 3888, 3890, 4046]","A virus and a Trojan are both types of malware, but they differ in...",A virus and a Trojan are different types of malware. A virus is a ...,✔️ [0.774]


🏃 View run eval at: http://localhost:5500/#/experiments/344816129373506955/runs/b07c1bab32b4464e83ab35e35a59514f
🧪 View experiment at: http://localhost:5500/#/experiments/344816129373506955


59.31


In [22]:
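# Save the optimized program's state to disk.
optimized_rag.save("optimized_rag.json")

# Create a fresh RAG module and load the saved state into it.
loaded_rag = RAG()
loaded_rag.load("optimized_rag.json")

# Check that the reloaded program answers a sample question.
loaded_rag(question="cmd+tab does not work on hidden or minimized windows")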

Prediction(
    reasoning='The issue with cmd+tab not working on hidden or minimized windows is due to a change in the behavior of the Mission Control system preference. The default setting now hides tabs when switching to an application, making it difficult to access minimized windows using cmd+tab. Various workarounds have been suggested, including using Control-CMD-F for full screen mode, switching to all open apps before hiding, or using third-party applications like iCanHazShortcut or Automator Services.',
    response='To access minimized windows using cmd+tab, try the following workarounds:\n- Use Control-CMD-F for full screen mode to access minimized windows.\n- Switch to all open apps before hiding.\n- Use the following AppleScript to switch to the first minimized window: `delay 0.5 set i to 0 tell application System Events set first_app to name of the first process whose frontmost is true repeat with p in every process if visible of p then set i to i + 1 end if end repeat rep
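
Evaluation runs against a local server can emit many `INFO`-level records from the `LiteLLM` and `httpx` loggers, which bury the actual cell output. A minimal sketch for quieting them with the standard `logging` module follows. It assumes the logger names match the `INFO:LiteLLM:` and `INFO:httpx:` prefixes seen in this notebook's output; the helper `quiet_loggers` is hypothetical.

```python
import logging

# Logger names are taken from the "INFO:LiteLLM:" and "INFO:httpx:" prefixes
# in the notebook output; treat them as assumptions for other setups.
NOISY_LOGGERS = ("LiteLLM", "httpx")


def quiet_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the threshold of the named loggers so INFO records are dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)


quiet_loggers()
```

Running this in a cell before the evaluation should leave warnings and errors visible and drop the per-request chatter. A library that resets its own logger level later would undo it.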