How to use a custom dataset for evaluation? #1383
Comments
Have you figured out how to evaluate on your own dataset?
One way is to evaluate the model's responses on the test set. I have an example here: https://github.com/rasbt/LLM-workshop-2024/blob/main/06_finetuning/06_part-4.ipynb

Note that the code there is not 100% complete, since it's for a workshop where part of it is an exercise, but let me post the missing parts below:

```python
from litgpt import LLM
from tqdm import tqdm

# Load the finetuned model and generate a response for each test set entry
llm = LLM.load("path/to/your/model")

for i in tqdm(range(len(test_data))):
    response = llm.generate(format_input(test_data[i]))
    test_data[i]["response"] = response
```
```python
import json

# Save the test set together with the generated responses
with open("test_with_response.json", "w") as json_file:
    json.dump(test_data, json_file, indent=4)

# Free the finetuned model and load a larger instruction-tuned model as the judge
del llm
llm = LLM.load("meta-llama/Meta-Llama-3-8B-Instruct", access_token="...")
```
```python
def generate_model_scores(json_data, json_key):
    scores = []
    for entry in tqdm(json_data, desc="Scoring entries"):
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"score the model response `{entry[json_key]}`"
            f" on a scale from 0 to 100, where 100 is the best score. "
            f"Respond with the integer number only."
        )
        score = llm.generate(prompt, max_new_tokens=50)
        try:
            scores.append(int(score))
        except ValueError:
            continue
    return scores


scores = generate_model_scores(json_data, "response")
print(f"\n{model}")
print(f"Number of scores: {len(scores)} of {len(json_data)}")
print(f"Average score: {sum(scores)/len(scores):.2f}\n")
```
Hey all, I added a more verbose explanation of this approach to the docs here: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/evaluation.md#evaluating-on-a-custom-test-set