[Evaluation] Consider "variance" over LLM output trials #204

Open
jonathanlastmileai opened this issue Nov 13, 2023 · 0 comments

@jonathanlastmileai (Contributor) commented:
@rben01 pointed out something very interesting: because LLM outputs are stochastic, any given metric computed over them is itself a random variable (RV), i.e. it has nonzero variance. This is undesirable because it implies low precision when evaluating your LLM. A lower-variance estimator can be implemented by brute force, analogous to bootstrapping: run the LLM N times on the same input and measure the sample's mean and variance of the metric, using an aggregation technique far cheaper than the LLM itself.

This applies whenever the LLM API in use does not guarantee reproducible outputs, or stochasticity is explicitly requested via a nonzero temperature or a related inference parameter.
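A minimal sketch of the brute-force estimator described above, assuming hypothetical `run_llm` (one model call) and `metric` (scores a single output) callables; neither name comes from this repo, both are stand-ins:

```python
import statistics


def estimate_metric_stats(prompt, run_llm, metric, n_trials=20):
    """Run the LLM n_trials times on the same prompt and return the sample
    mean and sample variance of a scalar metric over the outputs.

    run_llm and metric are hypothetical stand-ins:
        run_llm(prompt) -> str
        metric(output) -> float
    """
    scores = [metric(run_llm(prompt)) for _ in range(n_trials)]
    # statistics.variance computes the unbiased sample variance
    # (n - 1 divisor) and requires n_trials >= 2.
    return statistics.mean(scores), statistics.variance(scores)


if __name__ == "__main__":
    import random

    # Toy stand-ins for a real model call and metric, for illustration only.
    def run_llm(prompt):
        return prompt + " " + random.choice(["yes", "no", "maybe"])

    def exact_match(output):
        return 1.0 if output.endswith("yes") else 0.0

    mean, var = estimate_metric_stats("Answer yes or no:", run_llm, exact_match)
    print(f"mean={mean:.3f} variance={var:.3f}")
```

Reporting the mean over N i.i.d. trials reduces the estimator's variance by a factor of N relative to a single trial, and the sample variance itself quantifies how precise a single-run evaluation would have been.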
