### Import everything

In [1]:
import os
import json
import pandas as pd

### Read in JSONs

In [2]:
def read_json_files(file_names):
    data = []
    for file_name in file_names:
        with open(file_name, 'r') as file:
            data.append(json.load(file))
    return data

# read everyone's Elo comparison files (order is kept as-is, since Elo updates are order-dependent)
file_names = [
    "comparison_jsons/elo/h1_elo.json",
    "comparison_jsons/elo/h6_elo.json",
    "comparison_jsons/elo/h7_elo.json",
    "comparison_jsons/elo/h5_elo.json",
    "comparison_jsons/elo/h4_elo.json",
    "comparison_jsons/elo/h3_elo.json",
    "comparison_jsons/elo/h2_elo.json",
]
data = read_json_files(file_names)
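Each file is assumed to map string keys to four-field records `[sentence_text, [labeler_a, labeler_b], sentence_id, winner]`, where `winner` is one of the two labelers or the string `'Tie'`. A malformed record would silently skew the Elo results, so a quick schema check before any rating math may be worthwhile. This is a minimal sketch under that assumed layout; `validate_records` is a hypothetical helper, not part of the original notebook.

```python
def validate_records(file_data, labelers):
    """Return a list of (key, problem) tuples for malformed comparison records.

    Assumes each value looks like
    [sentence_text, [labeler_a, labeler_b], sentence_id, winner],
    where winner is one of the two labelers or the string 'Tie'.
    """
    problems = []
    for key, value in file_data.items():
        if not isinstance(value, list) or len(value) != 4:
            problems.append((key, "expected a 4-element list"))
            continue
        _, pair, sentence_id, winner = value
        # both labelers must be known names
        if not isinstance(pair, list) or len(pair) != 2 or any(n not in labelers for n in pair):
            problems.append((key, f"bad labeler pair {pair!r}"))
            continue
        if not isinstance(sentence_id, int):
            problems.append((key, f"non-integer sentence id {sentence_id!r}"))
        # the winner must be 'Tie' or one of the two compared labelers
        if winner != "Tie" and winner not in pair:
            problems.append((key, f"winner {winner!r} not in pair {pair}"))
    return problems


# toy records in the assumed layout; record "1" names a winner outside the pair
sample = {
    "0": ["A sentence.", ["Human6", "mistral"], 4, "Human6"],
    "1": ["A sentence.", ["Human7", "Human2"], 2, "Human9"],
}
print(validate_records(sample, ["Human2", "Human6", "Human7", "mistral"]))
```

On the real data, the check could be run as `validate_records(file_data, labelers)` for each entry of `data`, once `labelers` has been defined.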

### Group the comparisons by sample sentence (twenty sentences)

In [3]:
# labelers
labelers = ['llama2', 'llama3', 'mistral', 'Human1', 'Human2', 'Human3', 'Human4', 'Human5', 'Human6', 'Human7']

# sample sentences keyed by sentence ID (IDs 1-20 match the sentence IDs stored in the comparison JSONs)
sample_sentences = {
    1: "It is expected that both ecological and social compensation measures improve the overall acceptance of a local infrastructure, thus anchoring the latter within the surrounding community.",
    2: "The prompt acceleration of food aid pledges and deliveries once the media story on the famine broke, even though the worst of the crisis was over by then, and the poorer response to non-food emergency needs, which received less press attention, lends weight to another familiar hypothesis: that one of these non-humanitarian criteria is media pressure.",
    3: "Some men were hesitant to assist with household chores because they believed it would create an expectation of ongoing involvement with chores after pregnancy.",
    4: "Pastoralists in the arid and semi-arid regions of Mali continue to face increasing risk due to low levels of rainfall.",
    5: "The recent investment in improving existing roads that connect administrative centres to regional capitals, combined with the advance in use of mobile phone communications, have facilitated the development of new and efficient cereal trade routes and reduced marketing margins.",
    6: "Wind turbine foundations may act as artificial reefs, which could increase both the number of shellfish and the animals that feed on them, including other fish and marine mammals.",
    7: "However, wars have been waged to reduce demographic pressures arising from the scarcity of arable land, the clearest example being the move to acquire Lebensraum (living space) that motivated Nazi Germany aggression toward Poland and Eastern Europe.",
    8: "The direct and indirect effects from offshore wind farm development have been less well documented, but they include similar risks to bird and bat collisions; disruption of marine mammal corridors as well as harm to marine mammals, fish, and sea turtles from construction of bottom-mounted turbine towers; the potential for scour and sediment resuspension around the foundations of bottom-mounted wind turbines; and some evidence of displacement or barrier effects because of the presence of large offshore wind farms.",
    9: "The study shows that the cultivation of crops, a critical aspect of food security, is gravely under threat given low cultivation of crops as a result of BH atrocities.",
    10: "Various sources attest that elevated mortality stemming from conflict is due more to population displacement, deficiencies of clean water and sanitation, exposure to disease, and public-health failures rather than to direct blows from violence.",
    11: "Civilians may also expand agricultural production due to cuts in a stable food supply.",
    12: "The failure of programmes to reintegrate people into production implemented in the 1990s within the framework of the implementation of the 1992 Peace Accords forced many former combatants to resort to criminal and antisocial behaviour. As we have pointed out elsewhere (1996), the failure of these programmes has to do with the fact that in spite of the majority of them targeting farming, the serious structural crisis the agricultural sector was undergoing was underestimated in their design and implementation.",
    13: "On the supply side, both Huthi/Saleh forces and the Hadi government and its Saudi-led coalition allies repeatedly have hindered the movement of aid and commercial goods to the population. Huthi/Saleh violations are most egregious in the city of Taiz, where their fighters have enforced a full or partial blockade since 2015, with devastating humanitarian consequences. They routinely interfere with the work of humanitarians, at times demanding the diversion of aid to themselves or denying aid workers access to populations in need, revoking visas or even detaining them. They heavily tax all imports into their areas in part to finance the war effort and also run a black market in fuel, enriching military elites while driving prices up for transport of vital commodities.",
    14: "Increased grievances against the state, when agricultural deficits at the state level result in losses of tax revenues and higher food prices.",
    15: "Borrowing food was observed to be the first step to mitigate the adverse effect of food insecurity at the household level in Yemen.",
    16: "ASHA must tell the woman during pregnancy so that the woman must have that information with her. So, as soon as the child takes birth he/she will be fed with colostrum which will prevent the child from diseases.",
    17: "For me, when I got married and my wife was pregnant, I registered her in general hospital, and also in a traditional Centre. Because my understanding is that, there are medications in the hospital and also another type of medications from the traditional.",
    18: "Even, we do not want to be uncovered so that our private parts do not be seen.",
    19: "The use of contraception was linked vicariously to promiscuity: Promiscuous women were linked to the use of contraception, multiple abortions and disease or infections.",
    20: "I had vaginal discharge four months ago and I used a traditional medicine it stopped."
}

# group comparison records by sentence ID; each record is
# [sentence_text, [labeler_a, labeler_b], sentence_id, winner]
sentence_data = {sentence_id: [] for sentence_id in sample_sentences}
for file_data in data:
    for record in file_data.values():
        sentence_id = record[2]
        if sentence_id in sample_sentences:
            sentence_data[sentence_id].append(record)
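The grouping cell keeps the raw record lists. An alternative, sketched here under the same assumed four-field record layout, is to flatten every record into one long pandas DataFrame; that makes it easy to check how many comparisons each sentence received and how many were ties before computing Elo. `records_to_frame` is a hypothetical helper name.

```python
import pandas as pd


def records_to_frame(all_files):
    """Flatten a list of {key: [sentence, [a, b], sentence_id, winner]} dicts into one DataFrame."""
    rows = []
    for file_index, file_data in enumerate(all_files):
        for key, (_, (labeler_a, labeler_b), sentence_id, winner) in file_data.items():
            rows.append({
                "file_index": file_index,  # position of the source file in the input list
                "key": key,
                "sentence_id": sentence_id,
                "labeler_a": labeler_a,
                "labeler_b": labeler_b,
                "winner": winner,
            })
    return pd.DataFrame(rows)


# toy input in the assumed layout: two files, two sentences
toy = [
    {"0": ["s1", ["llama2", "Human4"], 1, "Human4"],
     "1": ["s1", ["llama3", "Human6"], 1, "Tie"]},
    {"0": ["s2", ["Human7", "Human2"], 2, "Human2"]},
]
comparisons_df = records_to_frame(toy)

# comparisons and ties per sentence
summary = comparisons_df.groupby("sentence_id").agg(
    n_comparisons=("winner", "size"),
    n_ties=("winner", lambda w: (w == "Tie").sum()),
)
print(summary)
```

On the notebook's data this would be `records_to_frame(data)`.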

In [4]:
# check that sentence_data[1] holds every comparison of sentence 1 across all files
sentence_data[1]

[['It is expected that both ecological and social compensation measures improve the overall acceptance of a local infrastructure, thus anchoring the latter within the surrounding community.',
  ['llama2', 'Human4'],
  1,
  'Human4'],
 ['It is expected that both ecological and social compensation measures improve the overall acceptance of a local infrastructure, thus anchoring the latter within the surrounding community.',
  ['Human7', 'Human3'],
  1,
  'Human3'],
 ['It is expected that both ecological and social compensation measures improve the overall acceptance of a local infrastructure, thus anchoring the latter within the surrounding community.',
  ['llama3', 'Human6'],
  1,
  'Tie'],
 ['It is expected that both ecological and social compensation measures improve the overall acceptance of a local infrastructure, thus anchoring the latter within the surrounding community.',
  ['mistral', 'Human4'],
  1,
  'mistral'],
 ['It is expected that both ecological and social compensation me

### Compute Elo ratings for each of the twenty sentences to determine ground truth

In [5]:
# Elo rating update functions (drafted with help from ChatGPT)
def update_elo(winner_elo, loser_elo, k=32):
    expected_winner = 1 / (1 + 10 ** ((loser_elo - winner_elo) / 400))
    expected_loser = 1 - expected_winner
    winner_elo += k * (1 - expected_winner)
    loser_elo += k * (0 - expected_loser)
    return winner_elo, loser_elo
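For reference, `update_elo` is the standard Elo update with $K = 32$ and a 400-point scale. Writing the winner's current rating as $R_w$ and the loser's as $R_l$:

```latex
\begin{aligned}
E_w &= \frac{1}{1 + 10^{(R_l - R_w)/400}}, & E_l &= 1 - E_w, \\
R_w' &= R_w + K\,(1 - E_w), & R_l' &= R_l + K\,(0 - E_l).
\end{aligned}
```

For example, two labelers both at 1000 give $E_w = E_l = 0.5$, so the winner moves to $1000 + 32 \cdot 0.5 = 1016$ and the loser to $984$. Since $K(1 - E_w) = K E_l$, the winner gains exactly what the loser loses, so the total rating across labelers is conserved.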

def calculate_elo_for_sentence(sentence_id, comparisons):
    elo_ratings = {labeler: 1000 for labeler in labelers}
    
    # process comparisons for this sentence ID; ties are skipped, so they leave both ratings unchanged
    for _, labelers_pair, _, winner in comparisons:
        if winner != "Tie":
            loser = labelers_pair[1] if winner == labelers_pair[0] else labelers_pair[0]
            elo_ratings[winner], elo_ratings[loser] = update_elo(elo_ratings[winner], elo_ratings[loser])
    
    # prepare data for DataFrame
    elo_df_data = {'sentence_start': [sample_sentences[sentence_id]]}
    for labeler in labelers:
        elo_df_data[labeler] = [elo_ratings[labeler]]
    elo_df_data['winner'] = [max(elo_ratings, key=elo_ratings.get)]
    
    return pd.DataFrame(elo_df_data)  # return DF of the scores/winners for each sentence
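`calculate_elo_for_sentence` skips comparisons marked `'Tie'`, so a tie never moves any rating. In conventional Elo, a draw instead counts as a score of 0.5 for each side, which pulls the two ratings toward each other. A minimal sketch of that variant, in case ties should carry information; `update_elo_scored` and `score_for_first` are hypothetical names, not part of the original notebook.

```python
def update_elo_scored(rating_a, rating_b, score_a, k=32):
    """Elo update for one comparison between a and b.

    score_a is 1.0 if a won, 0.0 if b won, 0.5 for a tie.
    Returns the new (rating_a, rating_b).
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    expected_b = 1 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b


def score_for_first(labelers_pair, winner):
    """Map a record's winner field to the score of labelers_pair[0]."""
    if winner == "Tie":
        return 0.5
    return 1.0 if winner == labelers_pair[0] else 0.0


# a tie between a 1100-rated and a 900-rated labeler moves them toward each other
a, b = update_elo_scored(1100, 900, score_for_first(["Human1", "llama2"], "Tie"))
print(a, b)
```

Inside `calculate_elo_for_sentence`, the `if winner != "Tie"` branch would then become a single call on `labelers_pair[0]` and `labelers_pair[1]` with `score_for_first(labelers_pair, winner)`. With `score_a` of 1.0 or 0.0 this reproduces `update_elo` exactly.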

In [6]:
# elo calculations for all 20 sentences
sentence_dfs = []
for sentence_id, comparisons in sentence_data.items():
    sentence_dfs.append(calculate_elo_for_sentence(sentence_id, comparisons))

# concatenate DFs
final_df = pd.concat(sentence_dfs, ignore_index=True)
final_df.index = range(1, len(final_df) + 1)
final_df.index.name = 'sentence ID'
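Because `update_elo` adds to the winner exactly what it subtracts from the loser, and ties are skipped, every row of `final_df` should sum to 1000 times the number of labelers (10,000 with the ten labelers used here), regardless of how many comparisons a sentence had. A small sanity check along those lines; `check_rating_totals` is a hypothetical helper name.

```python
import pandas as pd


def check_rating_totals(df, labeler_columns, start=1000, tol=1e-6):
    """Return True if every row's ratings sum to start * number of labelers."""
    expected_total = start * len(labeler_columns)
    row_totals = df[labeler_columns].sum(axis=1)
    # allow for small floating-point drift from repeated updates
    return bool(((row_totals - expected_total).abs() < tol).all())


# toy ratings for three labelers: row 0 after one decisive comparison, row 1 untouched
toy = pd.DataFrame({"A": [1016.0, 1000.0], "B": [984.0, 1000.0], "C": [1000.0, 1000.0]})
print(check_rating_totals(toy, ["A", "B", "C"]))
```

On the notebook's result this would be `check_rating_totals(final_df, labelers)`.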

### Display results

In [7]:
# Display the final DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
print(final_df.iloc[:,1:])  # ignore sentence column
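The printed table is wide and hard to scan. One way to condense it, sketched under the assumption that `final_df` has one rating column per labeler plus the `winner` column built by `calculate_elo_for_sentence`: count how many sentences each labeler is top-rated on, and average each labeler's rank across sentences (rank 1 = highest Elo). `summarize_rankings` is a hypothetical helper name.

```python
import pandas as pd


def summarize_rankings(df, labeler_columns):
    """Per labeler: number of sentences won and mean rank (1 = highest rating)."""
    # rank labelers within each sentence (row); tied ratings share the lowest rank
    ranks = df[labeler_columns].rank(axis=1, ascending=False, method="min")
    wins = df["winner"].value_counts().reindex(labeler_columns, fill_value=0)
    summary = pd.DataFrame({"sentences_won": wins, "mean_rank": ranks.mean()})
    return summary.sort_values("mean_rank")


# toy ratings for two sentences and three labelers
toy = pd.DataFrame({
    "A": [1016.0, 1005.0],
    "B": [984.0, 1010.0],
    "C": [1000.0, 985.0],
    "winner": ["A", "B"],
})
print(summarize_rankings(toy, ["A", "B", "C"]))
```

On the notebook's result this would be `summarize_rankings(final_df, labelers)`.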

                  llama2       llama3      mistral       Human1       Human2  \
sentence ID                                                                    
1             911.752855  1069.140509  1037.010242  1034.104691  1034.671978   
2            1029.529978   970.329772  1031.515572   892.285353  1116.109299   
3            1024.852354   891.709846   952.246685   915.193939  1013.844112   
4            1015.636074  1047.552381  1043.942990   930.772495  1045.782044   
5             900.194717   918.843927   922.947856  1080.443476   956.008924   
6             878.537629   956.446692   975.499404  1029.875274  1016.208313   
7             945.260318   892.730461  1001.992705   986.977424  1073.525225   
8             958.532522   890.639909   892.192046  1100.599298  1016.607988   
9            1059.591856   996.927805   969.139095  1023.341983  1057.836703   
10            910.291110   914.146934   918.780711  1064.497773  1053.878049   
11            984.813125  1032.068295   

### Save as a CSV

In [8]:
# Save the final DataFrame to a CSV file, keeping the 'sentence ID' index as a column
final_df.to_csv('final_elo_rankings.csv')