### Import data

In [1]:
import pandas as pd
df = pd.read_csv('../KPM/data/KPM_test_data.csv')
print(df.columns)

Index(['topic', 'stance', 'arg_id', 'argument', 'key_point_id', 'key_point',
       '4_7_8_15', '0_1_2_3', '5_6_9_10', '11_12_13_14', '16_17_18_19',
       '20_21_22_23', '24_25_26_27', 'avg', 'gpt4', 'llama2', 'qwen'],
      dtype='object')
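The scoring loop later in this notebook reads `topic`, `argument` and `key_point`, and writes into `llama2`. A quick schema check can catch a wrong file path or a renamed column before any API calls are made. The sketch below assumes `missing_columns` as a hypothetical helper that accepts any iterable of column names, such as `list(df.columns)`:

```python
# Columns the Llama 2 scoring loop reads from and writes to.
REQUIRED_COLUMNS = ("topic", "argument", "key_point", "llama2")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `columns`, in order (hypothetical helper)."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a column list that lacks the output column.
print(missing_columns(["topic", "stance", "argument", "key_point", "gpt4", "qwen"]))  # ['llama2']
```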


### Calculate confidence scores with Llama 2

In [None]:
import yaml
with open("../conf/index.yaml") as f:
    credentials = yaml.safe_load(f)
Llama2_api_token = credentials['environment_variables']['LLAMA2_API_TOKEN']
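The cell above expects `index.yaml` to contain an `environment_variables` mapping with a `LLAMA2_API_TOKEN` key. The following is a minimal sketch of a lookup that falls back to an environment variable when the file lacks that key. `get_api_token` is a hypothetical helper. `REPLICATE_API_TOKEN` is the variable the Replicate Python client reads by default when no token is passed explicitly.

```python
import os


def get_api_token(credentials, key="LLAMA2_API_TOKEN", env_var="REPLICATE_API_TOKEN"):
    """Return the token from the parsed YAML dict, else from an environment variable.

    Hypothetical helper: `credentials` is the dict produced by yaml.safe_load.
    """
    # `or {}` also covers a YAML key that is present but empty (parsed as None).
    token = ((credentials or {}).get("environment_variables") or {}).get(key)
    if not token:
        token = os.environ.get(env_var)
    if not token:
        raise KeyError(f"No API token under environment_variables.{key} or in ${env_var}")
    return token
```

In the notebook it would be called as `Llama2_api_token = get_api_token(credentials)`.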

In [2]:
import re

import replicate

# Matches decimal scores ("Confidence Score: 0.8") and integer scores ("Confidence Score: 1").
SCORE_PATTERN = re.compile(r'Confidence Score:\s*(\d+(?:\.\d+)?)')


def generating(topic, argument, kp, max_retries=5):
    """Ask Llama 2 70B chat (via Replicate) how well `argument` matches the key point `kp`.

    Returns a confidence score between 0 and 1, or None if no score could be
    parsed after `max_retries` attempts.
    """
    client = replicate.Client(api_token=Llama2_api_token)
    for attempt in range(1, max_retries + 1):
        output = client.run(
            "meta/llama-2-70b-chat",
            input={
                "debug": False,
                "top_p": 1,
                "prompt": "argument:" + argument + ", kp:" + kp,
                "temperature": 0.5,
                "system_prompt": f"""
            You need to evaluate to what extent the 2 sentences match each other based on the topic "{topic}" and return a confidence score between 0 and 1. 0 represents totally not match, 1 represents very match. Your return should be in this format:Confidence Score: number
            """,
                "max_new_tokens": 500,
                "min_new_tokens": -1,
                "prompt_template": "[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]",
                "repetition_penalty": 1.15,
            },
        )
        # meta/llama-2-70b-chat streams its output, so run() returns an iterator
        # of text chunks; join them into a single string.
        output_sentence = ''.join(output)
        print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
        print(output_sentence)

        match = SCORE_PATTERN.search(output_sentence)
        if match:
            score = float(match.group(1))
            print("KP score: " + str(score))
            print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
            return score
        # Retry a bounded number of times instead of recursing without limit.
        print(f"No score found. Regenerating (attempt {attempt}/{max_retries})...")
    return None
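The model's reply is free text, so the score has to be extracted with a regular expression. A pattern limited to `\d+\.\d+` would miss integer answers such as `Confidence Score: 1`. The sketch below shows a standalone parser that can be tested without any API calls. `parse_confidence` is a hypothetical name. Rejecting out-of-range values, rather than clamping them, is a design choice made here, not part of the original notebook.

```python
import re

# Accepts "Confidence Score: 0.8" as well as integer answers like "Confidence Score: 1".
_SCORE_RE = re.compile(r"Confidence Score:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_confidence(text):
    """Return the first confidence score in `text` as a float in [0, 1], or None."""
    match = _SCORE_RE.search(text)
    if match is None:
        return None
    score = float(match.group(1))
    # Treat out-of-range values as unparsable instead of silently clamping them.
    return score if 0.0 <= score <= 1.0 else None


print(parse_confidence("  Confidence Score: 0.8\n\nThe two sentences match..."))  # 0.8
print(parse_confidence("Confidence Score: 1"))  # 1.0
print(parse_confidence("I cannot rate this."))  # None
```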


In [26]:
# Score rows from position 3000 onward (iloc slices by position, not by index label).
for index, row in df.iloc[3000:].iterrows():
    confidence_score = generating(row['topic'], row['argument'], row['key_point'])
    print(index)
    print("Confidence Score: " + str(confidence_score))
    df.at[index, 'llama2'] = confidence_score

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Confidence Score: 0.8

The two sentences match to a significant extent, as they both relate to the idea of the United States being a good country to live in due to its democratic rights and freedoms. The first sentence specifically mentions the freedom of religion, speech, press, peaceful assembly, and petition, which are all fundamental rights that are protected in the United States. The second sentence, on the other hand, highlights the US's freedoms and democratic rights. Therefore, there is a strong connection between the two sentences, and they both contribute to the topic of the USA being a good country to live in. However, there is still a small degree of uncertainty, as the second sentence does not explicitly mention the same specific freedoms as the first sentence.
KP score: 0.8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3000
Confidence Score: 0.8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Confidence Score: 0.5

The two sentences do 
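Each row is a network request to a 70B model, so a long run can stop partway, as the truncated output above shows. The following is a minimal sketch of a resumable loop that skips rows that already have a score, so a restarted run picks up where it stopped. It makes several assumptions: rows are plain dicts rather than a DataFrame, `score_rows` is a hypothetical helper, and a stub scorer stands in for `generating` so the example runs offline.

```python
import math


def score_rows(rows, scorer, column="llama2", start=0, on_progress=None):
    """Fill rows[i][column] for i >= start, skipping rows that already hold a score.

    rows: list of dicts with 'topic', 'argument' and 'key_point' keys.
    scorer: callable (topic, argument, kp) -> float or None.
    on_progress: optional callback(i) after each newly scored row, e.g. to save a checkpoint.
    Returns the number of rows scored in this call.
    """
    scored = 0
    for i in range(start, len(rows)):
        row = rows[i]
        current = row.get(column)
        already_scored = current is not None and not (
            isinstance(current, float) and math.isnan(current)
        )
        if already_scored:
            continue  # scored in an earlier run
        row[column] = scorer(row["topic"], row["argument"], row["key_point"])
        scored += 1
        if on_progress is not None:
            on_progress(i)
    return scored


rows = [
    {"topic": "t", "argument": "a1", "key_point": "k1", "llama2": 0.8},
    {"topic": "t", "argument": "a2", "key_point": "k2", "llama2": float("nan")},
    {"topic": "t", "argument": "a3", "key_point": "k3"},
]
n = score_rows(rows, lambda topic, arg, kp: 0.5)  # stub scorer instead of generating
print(n, [r["llama2"] for r in rows])  # 2 [0.8, 0.5, 0.5]
```

With pandas, the equivalent "already scored" test would be `pd.isna(df.at[index, 'llama2'])`.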

In [28]:
df.to_csv('../KPM/data/KPM_test_data_1.csv', index=False)
print("Add new data successfully!!!!!")

Add new data successfully!!!!!
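`df.to_csv` writes directly to the target path, so a save interrupted mid-write can leave a truncated CSV. The sketch below uses only the standard library to show the write-to-a-temporary-file-then-rename pattern. `write_csv_atomically` is a hypothetical helper, and rows are assumed to be dicts. With pandas, the same idea is `df.to_csv(tmp_path, index=False)` followed by `os.replace(tmp_path, path)`.

```python
import csv
import os
import tempfile


def write_csv_atomically(path, rows, fieldnames):
    """Write `rows` (dicts) to `path` through a temporary file in the same directory.

    The target is only replaced once the temporary file is complete, so an
    interrupted or failed write leaves any previous version of `path` intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```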


### Calculate the average Llama 2 score

In [1]:
import pandas as pd

df = pd.read_csv('../data/ground_truth_test.csv')
print(df.columns)

Index(['topic', 'argument', 'key_point', 'stance', 'label',
       'cos_similarity_bert_embedding(gpt4)', 'gpt4', 'llama2', 'qwen'],
      dtype='object')


In [2]:
score = df['llama2'].values.tolist()
print(sum(score)/len(score))

0.7213153153153465
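`sum(score)/len(score)` returns `nan` as soon as a single row has no score, for example a row the scoring loop never reached. The sketch below averages only the valid scores. `mean_score` is a hypothetical helper. In pandas, `df['llama2'].mean()` skips NaN by default, and `df['llama2'].isna().sum()` counts how many rows were excluded.

```python
import math
from statistics import fmean


def mean_score(scores):
    """Average the scores, skipping missing values (None or NaN)."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    if not valid:
        raise ValueError("no valid scores to average")
    return fmean(valid)


scores = [0.8, 0.5, float("nan")]
print(sum(scores) / len(scores))  # nan: one missing score poisons a plain sum
print(mean_score(scores + [None]))
```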
