In this project we perform an unsupervised evaluation of LLM translations of adjective + noun pairs.
The model outputs come from gpt-4o under a fixed prompt. We use several NLP models for corpus access and processing, as well as Wiktionary dumps for definitions and direct translations. Detailed project information can be found in the README.

We start by processing the raw JSON dictionaries for the desired source and target languages. These are downloaded manually from https://kaikki.org/dictionary/rawdata.html

In [1]:
from pathlib import Path

from src.factors.eval_factors import dictionary_translation_match

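# __file__ is not defined inside a notebook, so resolve the root from the
# working directory instead. This assumes the notebook is run from its own
# folder (two levels below the project root); adjust the index otherwise.
project_root = Path.cwd().resolve().parents[1]
data = project_root/'data'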

The filtering script takes the following parameters: source language, source language code, target language, target language code, and part of speech (plural).

In [None]:
# %run splits its arguments on whitespace, so they are passed without commas
%run processing/processor/filter.py 'English' 'en' 'Russian' 'ru' 'adjectives'
%run processing/processor/filter.py 'Russian' 'ru' 'English' 'en' 'adjectives'
#...
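
To make the filtering step more concrete, here is a minimal sketch of what filtering a kaikki.org dump by part of speech could look like. It is not the project's `filter.py`: the function name `filter_entries` is hypothetical, and the field names (`word`, `pos`, `translations`, and `lang_code` or `code` inside each translation) are assumptions about the JSON Lines schema of the dump that should be checked against the downloaded files.

```python
import json
from typing import Iterable, Iterator


def filter_entries(lines: Iterable[str], pos: str, target_code: str) -> Iterator[dict]:
    """Yield the word and its target-language translations for entries of one POS.

    Hypothetical helper: assumes one JSON object per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("pos") != pos:
            continue
        # Assumption: the language code of a translation is stored under
        # "lang_code" or "code", depending on the version of the dump.
        translations = sorted({
            t["word"]
            for t in entry.get("translations", [])
            if t.get("word") and (t.get("lang_code") or t.get("code")) == target_code
        })
        yield {"word": entry.get("word"), "translations": translations}


# Two made-up lines standing in for a real dump
sample = [
    '{"word": "red", "pos": "adj", "translations": '
    '[{"lang_code": "ru", "word": "красный"}, {"lang_code": "de", "word": "rot"}]}',
    '{"word": "house", "pos": "noun", "translations": [{"code": "ru", "word": "дом"}]}',
]
rows = list(filter_entries(sample, pos="adj", target_code="ru"))
print(rows)
```

Reading the dump line by line keeps memory use flat, which matters because the raw files are large.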

In [None]:
import pandas as pd

for chunk in pd.read_csv(project_root/'data'/'processed'/'English_nouns.csv', chunksize=10):
    print(chunk)

Now let's import the GPT wrapper and request a set of pairs for evaluation. Each pair is then converted into an instance of the custom AdjectiveNounPair class (a subclass of Word), and all later steps operate on these instances.

In [None]:
from models.OpenAIModel import *
from objects.AdjectiveNounPair import AdjectiveNounPair

input_source, input_target = gpt.request('20 English and translate to Russian')
# The request can be customized with the target language (or its code) and the number of pairs

# Map each source pair to its translation; zip keeps the two lists aligned
source_to_target = dict()
for source_text, target_text in zip(input_source, input_target):
    source = AdjectiveNounPair(source_text, 'en')
    target = AdjectiveNounPair(target_text, 'ru')
    source_to_target[source] = target

print(source_to_target)
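
Because the pairs are used as dictionary keys, the class has to be hashable. The sketch below shows one way such a class could behave. `PairSketch` is a hypothetical stand-in, not the project's AdjectiveNounPair; only the `original` attribute is taken from this notebook, and the real class may hash by identity rather than by value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairSketch:
    """Hypothetical stand-in for an adjective + noun pair."""
    original: str  # the phrase as returned by the model
    lang: str      # language code, e.g. 'en' or 'ru'

    @property
    def words(self) -> tuple:
        # Naive whitespace split; real tokenization may differ
        return tuple(self.original.split())


a = PairSketch("red house", "en")
b = PairSketch("red house", "en")

# frozen=True makes instances hashable by value, so equal pairs share a key
mapping = {a: PairSketch("красный дом", "ru")}
print(mapping[b].original)
print(a.words)
```

With value-based hashing, duplicate source pairs in the model output collapse into one key, so the dictionary may end up with fewer entries than were requested.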

Each pair is now processed with the functions from eval_factors.py to obtain its scores.

In [None]:
from src.factors.eval_factors import *

evals = pd.DataFrame(columns=[
    'source',
    'target',
    'cosine_similarity',
    'natural_fluency',
    'dictionary_match',
    'commonness_match'])
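
To illustrate what the `dictionary_match` column is meant to capture, here is a rough sketch of a dictionary-based score. It is an assumption about how such a check could work, not the implementation in eval_factors.py. The function name `dictionary_match_sketch` and its inputs (a list of target words and, for each, the set of dictionary translations of the corresponding source word) are hypothetical.

```python
def dictionary_match_sketch(target_words, known_translations):
    """Fraction of target words found among the dictionary translations
    of the source word in the same position (hypothetical scoring rule)."""
    if not target_words:
        return 0.0
    hits = 0
    for word, candidates in zip(target_words, known_translations):
        # Case-insensitive comparison against the listed translations
        normalized = {c.casefold() for c in candidates}
        if word.casefold() in normalized:
            hits += 1
    return hits / len(target_words)


# "красный" is listed for the adjective; "дом" is not listed for the noun
score = dictionary_match_sketch(
    ["красный", "дом"],
    [{"красный", "алый"}, {"здание"}],
)
print(score)
```

An exact string comparison like this misses inflected forms (for example a feminine adjective against a masculine dictionary lemma), so in practice the target words would probably need to be lemmatized first.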

We compute every score for every input pair and collect the results in a DataFrame.

In [None]:
for source, target in source_to_target.items():

    # Cosine similarity of sentence embeddings from 'paraphrase-multilingual-MiniLM-L12-v2'
    cosine_score = cosine_similarity(source, target)
    # A set of checks such as 'case' match, correct word order, etc., combined into a single score
    natflu_score = natural_fluency(target)
    # The dictionary often lists translations of the source words into the target language; we assess how well they match
    dictionary_score = dictionary_translation_match(source, target)
    # Pseudo-perplexity check of how well the commonness of the source words
    # relates to the commonness of the target words
    commonness_score = commonness_match(source, target)

    # Append one row; concatenating a bare Series would add it as a new column instead
    evals.loc[len(evals)] = [
        source.original,
        target.original,
        cosine_score,
        natflu_score,
        dictionary_score,
        commonness_score]
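
For reference, the cosine score boils down to the formula sketched below. This is a simplified illustration, not the code in eval_factors.py. The helper names `cosine` and `embed_and_score` are hypothetical, and `embed_and_score` assumes the sentence-transformers package is installed and that the model can be downloaded on first use.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return float(dot / (norm_u * norm_v))


def embed_and_score(source_text, target_text,
                    model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Embed both phrases with a multilingual model and compare them."""
    # Imported lazily so the rest of the sketch runs without the package
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    source_vec, target_vec = model.encode([source_text, target_text])
    return cosine(source_vec, target_vec)


print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Loading the model once and reusing it across all pairs would be considerably faster than constructing it on every call as this sketch does.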

In [None]:
print(evals)