kevinyaobytedance/llm_eval

scripts/: Scripts for testing LLM trustworthiness in Section 11 of the paper.

  • test_hallucination.py: Test LLM Hallucination (Section 11.2).
  • test_safety.py: Test the safety of LLM responses (Section 11.3).
  • test_fairness.py: Test the fairness of LLM responses (Section 11.4).
  • test_confident_eval_fair.py and test_confident_eval.py: Test miscalibration of LLMs' confidence (Section 11.5).
  • test_misuse.py: Test LLMs' resistance to misuse (Section 11.6).
  • test_copyright.py: Test copyright-protected data leakage from LLMs (Section 11.7).
  • test_causal.py: Test LLMs' causal reasoning ability (Section 11.8).
  • test_typo.py: Test LLMs' robustness against typo attacks (Section 11.9); an illustrative sketch follows this list.
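
For illustration, here is a minimal, hypothetical sketch of the kind of check a script like test_typo.py performs: perturb prompts with character-level typos and measure how often the model's answer changes. The names below (add_typos, typo_robustness, query_fn) are not from this repository; query_fn stands in for whatever client actually calls the model under test.

    # Hypothetical sketch of a typo-robustness check (Section 11.9).
    # Not repository code: add_typos, typo_robustness, and query_fn are assumed names.
    import random
    from typing import Callable, List

    def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
        """Randomly swap adjacent letters to simulate typo noise."""
        rng = random.Random(seed)
        chars = list(text)
        for i in range(len(chars) - 1):
            if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)

    def typo_robustness(prompts: List[str], query_fn: Callable[[str], str]) -> float:
        """Fraction of prompts whose answer is unchanged after typo perturbation."""
        unchanged = 0
        for prompt in prompts:
            clean = query_fn(prompt).strip()
            noisy = query_fn(add_typos(prompt)).strip()
            unchanged += int(clean == noisy)
        return unchanged / max(len(prompts), 1)

In practice one would pass the evaluation prompts and a client function for the model under test; a higher score means the model's answers are more stable under noisy input.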

gen_data/: Generated data and results from our testing.

  • Note that we omit the copyright results because they contain copyright-protected text.

intermediate_data/: Intermediate data we generate for use in the evaluations.

Citation:

@inproceedings{liu2023trustllm,
  title={Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment},
  author={Liu, Yang and Yao, Yuanshun and Ton, Jean-Francois and Zhang, Xiaoying and Guo, Ruocheng and Klochkov, Yegor and Taufiq, Muhammad Faaiz and Li, Hang},
  booktitle={preprint},
  year={2023}
}
