Paper: https://arxiv.org/abs/2410.03182
- Create a Python (3.12) venv, then
pip install -r requirements.txt - Get API keys in .env:
cp .env.example .envand populate your.envfile (withOPENAI_API_KEYandREPLICATE_API_KEY) - Run the notebooks
From a list of candidate words, generate example sentneces using GPT-4o and Llama-3.1-405B.
Interpret annotation ratings: inter-annotator agreement, performance per model and per language.
Calculate correlations between example GDEX ratings and pre-trained metrics (perplexity, mask probability, entropy).
Rate an example using an LLM, using 10 previous ratings for in-context learning (ICL), to align the LLM with a specific annotator.
Annotated examples are in select_examples_[gpt4,llama]_[fra,ind,tdt]_eng_rated_A[1,2].tsv