## Generate Metacog Classification tags using the della inference API

Metacog categories obtained from semantic clustering with Llama 70B:
```
[{'category_name': 'statutory_interpretation',
  'description': 'Properties related to the interpretation and application of statutes.'},
 {'category_name': 'precedent_and_doctrine',
  'description': 'Properties related to the examination and application of precedents and doctrines.'},
 {'category_name': 'case_facts_and_context',
  'description': 'Properties related to the examination of case facts and context.'},
 {'category_name': 'judicial_role_and_review',
  'description': 'Properties related to the examination of the judicial role and review.'},
 {'category_name': 'argumentation_and_clarification',
  'description': 'Properties related to the examination of argumentation and clarification.'},
 {'category_name': 'constitutional_issues',
  'description': 'Properties related to the examination of constitutional issues.'},
 {'category_name': 'procedural_matters',
  'description': 'Properties related to the examination of procedural matters.'}]
```

This notebook uses Llama-3.1-70B to classify each question into one of these categories. The model is exposed through an API on della, following the instructions in the [`della-inference` repository](https://github.com/benediktstroebl/della-inference).

To run this notebook on della, first set up SSH port forwarding from the della node you are on by running the following command, replacing `nnadeem` with your username and `della-l03g13` with the name of the GPU node where you set up the inference API model:

```
ssh -N -L localhost:12257:della-l03g13:8000 nnadeem@della-l03g13
```

To test that the endpoint is running and SSH port forwarding is set up correctly, run the following curl command in a terminal on the della node where you are running the notebook:

```
curl http://localhost:12257/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "messages": [
      {"role": "system", "content": "Respond friendly to the user."},
      {"role": "user", "content": "Hello World!"}
    ]
  }'
```
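The same check can be started from Python before any requests are sent. The following is a minimal sketch, not part of the original notebook: `is_port_open` is a hypothetical helper, and port `12257` is simply the local port from the SSH command above. Note that a successful connection only shows that something is listening on the forwarded port; it does not confirm that the model server behind the tunnel is healthy, which the curl command does.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 12257 is the local port used in the SSH command above; change it to match your tunnel.
print(is_port_open("localhost", 12257))
```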

In [24]:
import requests
import json
import pandas as pd

In [25]:
# port = '12257'  # CHANGE THIS AS NEEDED TO WHERE SSH FORWARDING IS SET UP
port = '37041'  # CHANGE THIS AS NEEDED TO WHERE SSH FORWARDING IS SET UP
url = f"http://localhost:{port}/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
    # Bearer scheme, matching the curl test above
    "Authorization": "Bearer token-abc123"
}

model_name = 'meta-llama/Meta-Llama-3.1-70B-Instruct'

In [26]:
def get_metacog_classification_prompt(opening_statement, question):
    system_prompt = """You are an expert assistant trained to classify the purpose of questions asked by judges during oral arguments. Your task is to identify the primary purpose of a given question based on the advocate's opening statement and the text of the question itself.

        ### Instructions:
        Judges' questions at oral arguments typically fall into one of the following categories:

        - 'statutory_interpretation': Related to the interpretation and application of statutes.
        - 'precedent_and_doctrine': Related to the examination and application of precedents and doctrines.
        - 'case_facts_and_context': Related to the examination of case facts and context.
        - 'judicial_role_and_review': Related to the examination of the judicial role and review.
        - 'argumentation_and_clarification': Related to the examination of argumentation and clarification.
        - 'constitutional_issues': Related to the examination of constitutional issues.
        - 'procedural_matters': Related to the examination of procedural matters.

        Your output should classify the judge's question into one of these categories and explain your reasoning.


        ### Output format:
        Your response must follow this JSON format:
        {
        "classification": "<Category Name>",
        "reasoning": "<A brief explanation for the classification>"
        }

    """
    
    user_prompt = f"""### Your Task:
        Opening Statement: {opening_statement}
        Question: {question}

        ### Response:
    """

    messages = [
            {
                "role": "system",
                "content": system_prompt,
            },
            {"role": "user", "content": user_prompt}
        ]
    return messages
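The category bullets in the system prompt repeat the category list given at the top of the notebook. One way to keep the two in sync is to generate the bullets from the list. The sketch below is an illustration only, not the notebook's method: `CATEGORIES` is a two-item excerpt of the list above, and `format_category_bullets` is a hypothetical helper name.

```python
# Excerpt of the category list from the top of the notebook (two of the seven entries).
CATEGORIES = [
    {"category_name": "statutory_interpretation",
     "description": "Properties related to the interpretation and application of statutes."},
    {"category_name": "procedural_matters",
     "description": "Properties related to the examination of procedural matters."},
]


def format_category_bullets(categories):
    """Render one prompt bullet per category, in the style used by the system prompt."""
    lines = []
    for cat in categories:
        # Turn "Properties related to ..." into "Related to ..." as in the prompt text.
        description = cat["description"].replace("Properties related", "Related", 1)
        lines.append(f"- '{cat['category_name']}': {description}")
    return "\n".join(lines)


print(format_category_bullets(CATEGORIES))
```

The resulting string could then be interpolated into the system prompt in place of the hand-written bullets.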

In [27]:
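def get_model_response(messages):
    """POST a chat-completion request to the inference API and return the raw response."""
    payload = {
        "model": model_name,
        "messages": messages
    }

    # `json=` serializes the payload; the timeout (an arbitrary 300 s) keeps a stalled request from hanging.
    response = requests.post(url, json=payload, headers=headers, timeout=300)
    response.raise_for_status()  # fail on HTTP errors here rather than on a later KeyError
    return response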

def parse_response(response):
    """Return the assistant message text from a chat-completion response."""
    response_data = response.json()
    # The content is kept as a raw string: the model sometimes wraps its JSON
    # in a Markdown code fence, so it is not parsed here.
    content = response_data['choices'][0]['message']['content']
    return content
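Because `parse_response` returns the raw model text, a later step has to turn it into a structured label. Some responses wrap the JSON in a Markdown code fence, so a plain `json.loads` on the raw string would fail for those rows. The following is a minimal sketch of one way to handle this, under stated assumptions: `extract_classification` and `VALID_CATEGORIES` are hypothetical names, not part of the notebook, and the function assumes the response contains a single JSON object.

```python
import json
import re

# The seven category names listed at the top of the notebook.
VALID_CATEGORIES = (
    "statutory_interpretation",
    "precedent_and_doctrine",
    "case_facts_and_context",
    "judicial_role_and_review",
    "argumentation_and_clarification",
    "constitutional_issues",
    "procedural_matters",
)


def extract_classification(raw: str):
    """Return the parsed response dict, or None if no valid classification is found."""
    # Take the span from the first "{" to the last "}"; this skips code fences
    # and any text the model adds around the JSON object.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject labels outside the known category list.
    if parsed.get("classification") not in VALID_CATEGORIES:
        return None
    return parsed
```

Returning `None` rather than raising lets unparseable rows be inspected or re-run separately.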

In [28]:
def classify_questions_metacog(opening_statement, question):
    messages = get_metacog_classification_prompt(opening_statement, question)
    response = get_model_response(messages)
    tags = parse_response(response)
    return tags
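`classify_questions_metacog` issues one HTTP request per question, so a single transient failure (a dropped tunnel, a timeout) would abort a long `DataFrame.apply` run. A small retry wrapper is one possible safeguard. This is a sketch only: `call_with_retries` is a hypothetical helper, the default attempt count and delay are arbitrary, and catching the broad `Exception` is a simplification that could be narrowed to `requests.RequestException` in practice.

```python
import time


def call_with_retries(fn, *args, attempts=3, delay_seconds=1.0, **kwargs):
    """Call fn(*args, **kwargs), retrying up to `attempts` times before re-raising."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as err:  # broad on purpose in this sketch
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)  # fixed pause between attempts
    raise last_error


# Hypothetical usage with the function defined above:
# tags = call_with_retries(classify_questions_metacog, opening_statement, question)
```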

In [29]:
input_dir = '../datasets/2024_questions/with_qid'
out_dir = '../datasets/llm_outputs/metacog/classification'

### Run on LLM generated questions

In [16]:
# model_suffix = model_name.split('/')[1] # UNCOMMENT FOR LLAMA
model_suffix = 'gpt-4o-2024-08-06' # UNCOMMENT FOR GPT
input_fp = f'{input_dir}/2024_llm_questions_{model_suffix}_qid.csv'
df_llm = pd.read_csv(input_fp)
df_llm.head()

Unnamed: 0.1,Unnamed: 0,question_id,transcript_id,question_addressee,justice,question_text,opening_statement,full_text
0,0,q_f4341b19,2024.23-621-t01,petitioner,sotomayor,Can you elaborate on why a preliminary injunct...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ..."
1,1,q_bb41909e,2024.23-621-t01,petitioner,sotomayor,In cases where a preliminary injunction result...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ..."
2,2,q_61c81902,2024.23-621-t01,petitioner,sotomayor,How do you address situations where the only r...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ..."
3,3,q_f1f17a25,2024.23-621-t01,petitioner,sotomayor,What impact would your proposed bright-line ru...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ..."
4,4,q_40a17162,2024.23-621-t01,petitioner,sotomayor,Are there specific precedents from this Court ...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ..."


In [17]:
# GENERATE FOR ALL
df_llm['metacog_classification_raw'] = df_llm.apply(
    lambda row: classify_questions_metacog(row['opening_statement'], row['question_text']), axis=1
)
df_llm.head()

Unnamed: 0.1,Unnamed: 0,question_id,transcript_id,question_addressee,justice,question_text,opening_statement,full_text,metacog_classification_raw
0,0,q_f4341b19,2024.23-621-t01,petitioner,sotomayor,Can you elaborate on why a preliminary injunct...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...","{\n""classification"": ""argumentation_and_clarif..."
1,1,q_bb41909e,2024.23-621-t01,petitioner,sotomayor,In cases where a preliminary injunction result...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...","{\n""classification"": ""argumentation_and_clarif..."
2,2,q_61c81902,2024.23-621-t01,petitioner,sotomayor,How do you address situations where the only r...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...","{\n""classification"": ""argumentation_and_clarif..."
3,3,q_f1f17a25,2024.23-621-t01,petitioner,sotomayor,What impact would your proposed bright-line ru...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...","{\n ""classification"": ""argumentation_and_clar..."
4,4,q_40a17162,2024.23-621-t01,petitioner,sotomayor,Are there specific precedents from this Court ...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...","```json\n{\n ""classification"": ""precedent_and..."


Save file:

In [18]:
out_fp = f'{out_dir}/2024_llm_questions_{model_suffix}_metacog_classification.csv'

In [19]:
df_llm.to_csv(out_fp, index=False)

### Run on actual questions

#### Get labels for 2024 'coherent' questions

In [30]:
input_fp = '../datasets/2024_questions/with_qid/2024_all_questions_coherence_labeled_Meta-Llama-3.1-70B-Instruct_qid.csv'
df = pd.read_csv(input_fp)
df.head()

Unnamed: 0.1,Unnamed: 0,question_id,transcript_id,question_addressee,justice,question_text,opening_statement,full_text,label
0,0,q_fb9deabd,2024.23-621-t01,petitioner,Clarence Thomas,You --can a consent decree or a default judgm...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",incoherent
1,1,q_dd1235f1,2024.23-621-t01,petitioner,Clarence Thomas,But I thought your argument hinged on a court...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent
2,2,q_efc9f721,2024.23-621-t01,petitioner,"John G. Roberts, Jr.",What do you do with the formulation by your f...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",incoherent
3,3,q_e843e146,2024.23-621-t01,petitioner,Elena Kagan,"Well, it's -- it's true that it's only a lik...",<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent
4,4,q_e052c4b2,2024.23-621-t01,petitioner,Ketanji Brown Jackson,But it's not that determination that's making...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent


In [31]:
# Keep only questions labeled coherent; .copy() avoids SettingWithCopyWarning on later assignment
df_coherent = df[df['label'] == 'coherent'].copy()

In [32]:
# GENERATE FOR ALL
df_coherent['metacog_classification_raw'] = df_coherent.apply(
    lambda row: classify_questions_metacog(row['opening_statement'], row['question_text']), axis=1
)
df_coherent.head()

Unnamed: 0.1,Unnamed: 0,question_id,transcript_id,question_addressee,justice,question_text,opening_statement,full_text,label,metacog_classification_raw
1,1,q_dd1235f1,2024.23-621-t01,petitioner,Clarence Thomas,But I thought your argument hinged on a court...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent,"{\n""classification"": ""argumentation_and_clarif..."
3,3,q_e843e146,2024.23-621-t01,petitioner,Elena Kagan,"Well, it's -- it's true that it's only a lik...",<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent,"{\n""classification"": ""argumentation_and_clarif..."
4,4,q_e052c4b2,2024.23-621-t01,petitioner,Ketanji Brown Jackson,But it's not that determination that's making...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent,"{\n""classification"": ""argumentation_and_clarif..."
5,5,q_6a41e1e0,2024.23-621-t01,petitioner,Ketanji Brown Jackson,When you think about the difference between m...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent,"{\n ""classification"": ""statutory_interpretati..."
7,7,q_8ddfdd01,2024.23-621-t01,petitioner,Ketanji Brown Jackson,But didn't Sole open -- leave open that --th...,<speaker>Erika L. Maley</speaker><text> Mr. Ch...,"<speaker>John G. Roberts, Jr.</speaker><text> ...",coherent,"{\n ""classification"": ""precedent_and_doctrine..."


Save file:

In [33]:
out_fp = f'{out_dir}/2024_coherent_metacog_classification.csv'
df_coherent.to_csv(out_fp, index=False)