## Script for creating JSON for manual annotation

The goal is to create a single, organized JSON file covering the 13 sample cases from 2024 that we are working with. The file is a list with one entry per transcript, each in the following format:

```
[
    {
        'transcript_id': 'year.docket_num-t01',
        'petitioner': {
            'opening_statement': 'text_of_opening_statement',
            'full_text': 'full_text_of_statement',
            'sotomayor': {
                'true_questions': [{
                    'model_name': 'llama-3.1-70B',
                    'coherent': [<list_of_actual_coherent_questions_by_sotomayor>],
                    'incoherent': [<list_of_actual_incoherent_questions_by_sotomayor>],
                }],
                'predicted_questions': [
                    {
                        'model_name': 'llama-3.1-70B',
                        'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    },
                    {
                        'model_name': 'gpt4o',
                        'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    },
                ]
            },
            'alito': {
                'true_questions': [{
                    'model_name': 'llama-3.1-70B',
                    'coherent': [<list_of_actual_coherent_questions_by_alito>],
                    'incoherent': [<list_of_actual_incoherent_questions_by_alito>],
                }],
                'predicted_questions': [
                    {
                        'model_name': 'llama-3.1-70B',
                        'model_questions': [<list_of_questions_generated_by_model_alito>]
                    },
                    {
                        'model_name': 'gpt4o',
                        'model_questions': [<list_of_questions_generated_by_model_alito>]
                    },
                ]
            },
        },
        'respondent': {
            'opening_statement': 'text_of_opening_statement',
            'full_text': 'full_text_of_statement',
            'sotomayor': {
                'true_questions': [{
                    'model_name': 'llama-3.1-70B',
                    'coherent': [<list_of_actual_coherent_questions_by_sotomayor_to_respondent>],
                    'incoherent': [<list_of_actual_incoherent_questions_by_sotomayor_to_respondent>],
                }],
                'predicted_questions': [
                    {
                        'model_name': 'llama-3.1-70B',
                        'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    },
                    {
                        'model_name': 'gpt4o',
                        'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    },
                ]
            },
            'alito': {
                'true_questions': [{
                    'model_name': 'llama-3.1-70B',
                    'coherent': [<list_of_actual_coherent_questions_by_alito>],
                    'incoherent': [<list_of_actual_incoherent_questions_by_alito>],
                }],
                'predicted_questions': [
                    {
                        'model_name': 'llama-3.1-70B',
                        'model_questions': [<list_of_questions_generated_by_model_alito>]
                    },
                    {
                        'model_name': 'gpt4o',
                        'model_questions': [<list_of_questions_generated_by_model_alito>]
                    },
                ]
            },        
        }
    },
]

```

In [16]:
import pandas as pd
import json

In [17]:
df_llm_questions = pd.read_csv('../datasets/2024_full_text_sotomayor_alito_questions.csv')
df_llm_questions.head()

Unnamed: 0.1,Unnamed: 0,transcript_id,petitioner_opening_text,petitioner_full_text,respondent_opening_statement,respondent_full_text,questions_sotomayor_petitioner,questions_sotomayor_respondent,questions_alito_respondent,questions_alito_petitioner
0,0,2024.23-621-t01,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",<speaker>Anthony A. Yang</speaker><text> Mr. C...,<speaker>Anthony A. Yang</speaker><text> Mr. C...,"[""Can you explain why you believe the Court sh...","[""Counselor, how do you respond to the argumen...","[""Can you explain how your proposed bright-lin...","[""How do you respond to the argument that the ..."
1,1,2024.23-365 -t01,<speaker>Lisa S. Blatt</speaker><text> Thank y...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",<speaker>Easha Anand</speaker><text> Mr. Chief...,<speaker>Easha Anand</speaker><text> Mr. Chief...,"[""Isn't it true that a preliminary injunction ...","[""Counselor, how do you respond to the concern...","[""Counselor, how do you respond to the argumen...","[""Isn't it true that some circuits have adopte..."
2,2,2024.23-852-t01,<speaker>Elizabeth B. Prelogar</speaker><text>...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",<speaker>Peter A. Patterson</speaker><text> Mr...,<speaker>Peter A. Patterson</speaker><text> Mr...,"[""Counselor, can you explain how the bright-li...","[""How do you respond to the argument that a pr...","[""Isn't your proposed bright-line rule overly ...","[""Can you clarify why you believe that a preli..."
3,3,2024.23-980-t01,<speaker>Kannon K. Shanmugam</speaker><text> T...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",<speaker>Kevin K. Russell</speaker><text> Mr. ...,<speaker>Kevin K. Russell</speaker><text> Mr. ...,"[""Counselor, doesn't your argument rely too he...","[""How do you respond to the argument that the ...","[""Can you explain why the legislative history ...","[""How do you respond to the argument that a pr..."
4,4,2024.23-191 -t01,<speaker>Adam G. Unikowsky</speaker><text> Mr....,"<speaker>John G. Roberts, Jr.</speaker><text> ...","<speaker>Edmund G. LaCour, Jr.</speaker><text>...","<speaker>Edmund G. LaCour, Jr.</speaker><text>...","[""Counselor, doesn't the fact that a prelimina...","[""Counselor, isn't it true that your proposed ...","[""Counselor, how do you respond to the argumen...","[""Doesn't the fact that a preliminary injuncti..."


In [18]:
df_actual_questions = pd.read_csv('../datasets/2024_all_questions_coherence_labeled_Meta-Llama-3.1-70B-Instruct.csv')
df_actual_questions.head()

Unnamed: 0,transcript_id,question_addressee,justice,question_text,opening_statement,full_text,label
0,2024.23-621-t01,petitioner,Clarence Thomas,You --can a consent decree or a default judgm...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",incoherent
1,2024.23-621-t01,petitioner,Clarence Thomas,But I thought your argument hinged on a court...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent
2,2024.23-621-t01,petitioner,"John G. Roberts, Jr.",What do you do with the formulation by your f...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",incoherent
3,2024.23-621-t01,petitioner,Elena Kagan,"Well, it's -- it's true that it's only a lik...",<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent
4,2024.23-621-t01,petitioner,Ketanji Brown Jackson,But it's not that determination that's making...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent


#### Create JSON

In [19]:
transcript_ids = df_llm_questions['transcript_id'].unique()
len(transcript_ids)

13
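The preview of `df_llm_questions` above shows some IDs with a stray space before the suffix, for example `2024.23-365 -t01`. Whether `df_actual_questions` spells those IDs the same way is not visible in the previews; if it does not, exact-match filtering on `transcript_id` would silently return empty lists for those transcripts. The sketch below shows one possible guard. The helper name `normalize_transcript_id` is hypothetical, the two small frames are toy stand-ins for the real ones, and it assumes that whitespace is never meaningful inside an ID.

```python
import pandas as pd


def normalize_transcript_id(ids: pd.Series) -> pd.Series:
    """Strip all whitespace from transcript IDs (assumes spaces are never meaningful)."""
    return ids.astype(str).str.replace(r'\s+', '', regex=True)


# Toy frames standing in for df_llm_questions and df_actual_questions.
llm = pd.DataFrame({'transcript_id': ['2024.23-621-t01', '2024.23-365 -t01']})
actual = pd.DataFrame({'transcript_id': ['2024.23-621-t01', '2024.23-365-t01']})

llm['transcript_id'] = normalize_transcript_id(llm['transcript_id'])
actual['transcript_id'] = normalize_transcript_id(actual['transcript_id'])

# IDs present in one frame but not in the other; ideally both sets are empty.
missing_in_actual = set(llm['transcript_id']) - set(actual['transcript_id'])
missing_in_llm = set(actual['transcript_id']) - set(llm['transcript_id'])
print(missing_in_actual, missing_in_llm)
```

If adopted, the normalization would need to run on both DataFrames before `transcript_ids` is computed, so that the loop and the filters see the same spelling.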

In [20]:
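# Index by transcript_id so each transcript's row can be looked up with .loc[t_id].
# Note: this cell is not idempotent; re-running it raises a KeyError because the column is gone.
df_llm_questions.set_index('transcript_id', inplace=True)
# df_llm_questions.head()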

In [21]:
def get_actual_justice_questions(df, t_id, addressee, justice, coherence):
    """Return the questions `justice` asked `addressee` in transcript `t_id` that carry the label `coherence`."""
    mask = (
        (df['transcript_id'] == t_id)
        & (df['question_addressee'] == addressee)
        & (df['justice'] == justice)
        & (df['label'] == coherence)
    )
    return df.loc[mask, 'question_text'].to_list()

In [22]:
results = []

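for t_id in transcript_ids:
    # Assumes each transcript_id appears exactly once, so .loc returns a single row (a Series).
    # The question columns hold JSON-encoded lists, hence the json.loads calls below.
    row = df_llm_questions.loc[t_id]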
    result = {
        'transcript_id': t_id,
        'petitioner': {
            'opening_statement': row['petitioner_opening_text'],
            'full_text': row['petitioner_full_text'],
            'sotomayor': {
                'true_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'coherent': get_actual_justice_questions(df_actual_questions, t_id, 'petitioner', 'Sonia Sotomayor', 'coherent'),
                        'incoherent': get_actual_justice_questions(df_actual_questions, t_id, 'petitioner', 'Sonia Sotomayor', 'incoherent'),
                    }
                ],
                'predicted_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'model_questions': json.loads(row['questions_sotomayor_petitioner'])
                    },
                    # {
                    #     'model_name': 'gpt4o',
                    #     'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    # },
                ]
            },
            'alito': {
                'true_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'coherent': get_actual_justice_questions(df_actual_questions, t_id, 'petitioner', 'Samuel A. Alito, Jr.', 'coherent'),
                        'incoherent': get_actual_justice_questions(df_actual_questions, t_id, 'petitioner', 'Samuel A. Alito, Jr.', 'incoherent'),
                    },
                ],
                'predicted_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'model_questions': json.loads(row['questions_alito_petitioner'])
                    },
                    # {
                    #     'model_name': 'gpt4o',
                    #     'model_questions': [<list_of_questions_generated_by_model_alito>]
                    # },
                ]
            },
        },
        'respondent': {
            'opening_statement': row['respondent_opening_statement'],
            'full_text': row['respondent_full_text'],
            'sotomayor': {
                'true_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'coherent': get_actual_justice_questions(df_actual_questions, t_id, 'respondent', 'Sonia Sotomayor', 'coherent'),
                        'incoherent': get_actual_justice_questions(df_actual_questions, t_id, 'respondent', 'Sonia Sotomayor', 'incoherent'),
                    }
                ],
                'predicted_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'model_questions': json.loads(row['questions_sotomayor_respondent'])
                    },
                    # {
                    #     'model_name': 'gpt4o',
                    #     'model_questions': [<list_of_questions_generated_by_model_sotomayor>]
                    # },
                ]
            },
            'alito': {
                'true_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'coherent': get_actual_justice_questions(df_actual_questions, t_id, 'respondent', 'Samuel A. Alito, Jr.', 'coherent'),
                        'incoherent': get_actual_justice_questions(df_actual_questions, t_id, 'respondent', 'Samuel A. Alito, Jr.', 'incoherent'),
                    },
                ],
                'predicted_questions': [
                    {
                        'model_name': 'Meta-Llama-3.1-70B-Instruct',
                        'model_questions': json.loads(row['questions_alito_respondent'])
                    },
                    # {
                    #     'model_name': 'gpt4o',
                    #     'model_questions': [<list_of_questions_generated_by_model_alito>]
                    # },
                ]
            },
        }
    }
    results.append(result)
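The four justice blocks in the loop above differ only in the addressee, the justice's name, and the CSV column being read. One way to cut the repetition is sketched below; it is an alternative layout, not the code that produced the file. The helper `build_justice_block` and the constants `JUSTICE_NAMES` and `MODEL_NAME` are hypothetical, the toy frames are invented for illustration, and the column pattern `questions_<justice>_<addressee>` is taken from the CSV header shown earlier.

```python
import json

import pandas as pd

# Short key used in the JSON -> full name used in the 'justice' column.
JUSTICE_NAMES = {'sotomayor': 'Sonia Sotomayor', 'alito': 'Samuel A. Alito, Jr.'}
MODEL_NAME = 'Meta-Llama-3.1-70B-Instruct'


def build_justice_block(row, df_actual, t_id, addressee, justice_key):
    """Hypothetical helper: true and predicted questions for one justice and one side."""
    asked = df_actual[
        (df_actual['transcript_id'] == t_id)
        & (df_actual['question_addressee'] == addressee)
        & (df_actual['justice'] == JUSTICE_NAMES[justice_key])
    ]
    return {
        'true_questions': [{
            'model_name': MODEL_NAME,
            'coherent': asked.loc[asked['label'] == 'coherent', 'question_text'].to_list(),
            'incoherent': asked.loc[asked['label'] == 'incoherent', 'question_text'].to_list(),
        }],
        'predicted_questions': [{
            'model_name': MODEL_NAME,
            'model_questions': json.loads(row[f'questions_{justice_key}_{addressee}']),
        }],
    }


# Toy data (invented for illustration, not taken from the real dataset).
toy_actual = pd.DataFrame({
    'transcript_id': ['t1', 't1'],
    'question_addressee': ['petitioner', 'petitioner'],
    'justice': ['Sonia Sotomayor', 'Sonia Sotomayor'],
    'question_text': ['Q1?', 'Q2?'],
    'label': ['coherent', 'incoherent'],
})
toy_row = pd.Series({'questions_sotomayor_petitioner': '["Generated Q?"]'})

block = build_justice_block(toy_row, toy_actual, 't1', 'petitioner', 'sotomayor')
print(block['true_questions'][0]['coherent'])
```

With such a helper, each side of an entry could be assembled with a dict comprehension over `JUSTICE_NAMES`, and adding a further justice or model would mean touching one mapping rather than four literal blocks.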

Save file:

In [24]:
with open('../datasets/2024_questions_for_eval.json', 'w') as f:
    json.dump(results, f, indent=4)
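By default `json.dump` escapes non-ASCII characters (`ensure_ascii=True`), so characters such as `§` or typographic quotes in the transcripts would appear as `\uXXXX` sequences in the saved file. For a file meant to be read by human annotators, writing UTF-8 directly may be preferable. The sketch below shows the difference on a tiny invented record written to a temporary path; neither the record nor the path comes from the notebook.

```python
import json
import os
import tempfile

# Invented sample record containing a non-ASCII character.
sample = [{'transcript_id': 'example-t01', 'note': 'Section sign: §'}]

path = os.path.join(tempfile.mkdtemp(), 'questions_for_eval_sample.json')

# Explicit UTF-8 plus ensure_ascii=False keeps the character readable on disk.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(sample, f, indent=4, ensure_ascii=False)

with open(path, encoding='utf-8') as f:
    raw = f.read()

# The round trip yields the same data either way; only the on-disk text differs.
reloaded = json.loads(raw)
print(reloaded == sample, '§' in raw)
```

Either setting loads back to identical Python objects, so the choice only affects how the file looks in a text editor.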