I am a Ph.D. student at Penn State University advised by Dr. Rui Zhang. I’m interested in building reliable and trustworthy NLP systems.
[Personal Website] [Google Scholar] [Semantic Scholar]
-
ReaLMistake [huggingface dataset] [code]
- Paper: Evaluating LLMs at Detecting Errors in LLM Responses (COLM 2024)
- Benchmark for evaluating methods that detect mistakes in LLM responses
- Expert error annotations on responses from GPT-4 and Llama 2 70B on three tasks
-
WiCE [dataset and code]
- Paper: WiCE: Real-World Entailment for Claims in Wikipedia (EMNLP 2023)
- Dataset for document-level NLI
- Fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia
- Shortcomings of Question Answering Based Factuality Frameworks for Error Localization [human annotation]