## Generate a CSV for question coherence judgments

Merge the following two CSVs of 2024 transcripts into a single CSV, so that coherence judgments can be generated from one file:
* `2024_all_questions.csv` (referred to below as questions)
* `2024_full_text_transcripts.csv` (referred to below as full_text)

In the merged CSV, each row corresponds to a single question and has the following columns:

* `transcript_id` - the join key, present in both questions and full_text
* `question_addressee` - same as in questions
* `justice` - same as in questions
* `question_text` - same as in questions
* `opening_statement` - taken from either `petitioner_opening_text` or `respondent_opening_statement` in full_text, depending on `question_addressee`
* `full_text` - taken from either `petitioner_full_text` or `respondent_full_text` in full_text, depending on `question_addressee`
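
As a rough illustration of the column layout described above, here is a minimal sketch on toy data. The column names come from the two CSVs, but every value is made up, and the `how="left"` / `validate="many_to_one"` arguments and the vectorized `Series.where` selection are assumptions of this sketch rather than what the cell below does (it uses a default merge and a row-wise `apply`).

```python
import pandas as pd

# Toy stand-ins for the two CSVs; all values are invented for illustration.
questions = pd.DataFrame({
    "transcript_id": ["t1", "t1", "t2"],
    "question_addressee": ["petitioner", "respondent", "petitioner"],
    "justice": ["A", "B", "A"],
    "question_text": ["q1", "q2", "q3"],
})
full_text = pd.DataFrame({
    "transcript_id": ["t1", "t2"],
    "petitioner_opening_text": ["P1 open", "P2 open"],
    "respondent_opening_statement": ["R1 open", "R2 open"],
    "petitioner_full_text": ["P1 full", "P2 full"],
    "respondent_full_text": ["R1 full", "R2 full"],
})

# Many questions map to one transcript; validate raises if that assumption fails.
merged = questions.merge(
    full_text, on="transcript_id", how="left", validate="many_to_one"
)

is_petitioner = merged["question_addressee"] == "petitioner"
is_respondent = merged["question_addressee"] == "respondent"

# Take the petitioner column where the addressee is the petitioner, otherwise
# the respondent column; any other addressee value ends up as NaN.
merged["opening_statement"] = merged["petitioner_opening_text"].where(
    is_petitioner, merged["respondent_opening_statement"].where(is_respondent)
)
merged["full_text"] = merged["petitioner_full_text"].where(
    is_petitioner, merged["respondent_full_text"].where(is_respondent)
)

columns = ["transcript_id", "question_addressee", "justice",
           "question_text", "opening_statement", "full_text"]
result = merged[columns]
print(result)
```

Each question row keeps its own metadata and picks up only the text belonging to the side it was addressed to.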

In [None]:
import pandas as pd

def get_full_text(row):
    """
    Return the full text for the side (petitioner or respondent) that this row's
    question is addressed to, or None if the addressee is neither.
    """
    if row['question_addressee'] == 'petitioner':
        return row['petitioner_full_text']
    elif row['question_addressee'] == 'respondent':
        return row['respondent_full_text']
    return None

def get_opening_text(row):
    """
    Return the opening statement for the side (petitioner or respondent) that this
    row's question is addressed to, or None if the addressee is neither.
    """
    if row['question_addressee'] == 'petitioner':
        return row['petitioner_opening_text']
    elif row['question_addressee'] == 'respondent':
        return row['respondent_opening_statement']
    return None

# load both CSVs and join them on transcript_id (pandas defaults to an inner join)
all_qs_24_df = pd.read_csv('../datasets/2024_all_questions.csv')
full_text_df = pd.read_csv('../datasets/2024_full_text_transcripts.csv')
merged_df = all_qs_24_df.merge(full_text_df, on="transcript_id")

# choose the opening statement and full text for whichever side (petitioner or respondent) the question is addressed to
merged_df['opening_statement'] = merged_df.apply(get_opening_text, axis=1)
merged_df['full_text'] = merged_df.apply(get_full_text, axis=1)

# keep only the columns needed for coherence judgments, dropping the rest of the merged columns
columns = ['transcript_id', 'question_addressee', 'justice', 'question_text', 'opening_statement', 'full_text']
merged_df_questions = merged_df[columns]

# output merged csv
out_fp = '2024_all_questions_full_text_merged.csv'
merged_df_questions.to_csv(out_fp, index=False)