# "Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks"

Zackary Okun Dunivin, 24/01/2024

This notebook contains most of the workflow; adapt it as needed.
It assumes use of the OpenAI GPT API.

#### Imports and GPT key

In [None]:
import pandas as pd
import json, os, re, glob


# The API calls below use the legacy interface (openai.ChatCompletion), which requires openai<1.0
import openai
api_key = 'yourkeyhere'
openai.api_key = api_key
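The calls in this notebook use the legacy `openai.ChatCompletion.create` interface, which was removed in version 1.0 of the `openai` Python package, so running them as written requires `openai<1.0`. A minimal sketch of the same single-passage call with the v1 client follows, assuming `openai>=1.0` and an `OPENAI_API_KEY` environment variable. `classify_passage` is a hypothetical helper; the client is passed in as an argument so the function can be exercised without network access.

```python
from typing import Any


def classify_passage(client: Any, prompt: str, passage: str,
                     model: str = "gpt-4-1106-preview",
                     temperature: float = 0.0, top_p: float = 1.0) -> str:
    """Send one system prompt plus one passage and return the reply text.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.0).
    Hypothetical helper, not part of the original workflow.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": passage},
        ],
        temperature=temperature,
        top_p=top_p,
        n=1,
    )
    # v1 responses are objects, so fields are read as attributes, not dict keys
    return response.choices[0].message.content


# Usage (requires network access and an API key):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# reply = classify_passage(client, prompt_zeroshot_without_justification, "Some passage ...")
```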

#### Prompt components for "Full Codebook"

In [None]:
task_description_preface = """You are tasked with applying qualitative codes to articles, book reviews, and opinion pieces referencing W.E.B. Du Bois. The purpose of this task is to track how Du Bois is represented in news media over time. There are 3 categories of code and 9 codes in total. You should apply every code you identify within a passage. Most passages will only relate to a few codes and it is unlikely that you will encounter more than 5 in a single passage. The categories and codes are as follows:

Characterization of Du Bois (2 codes)
1. Scholar
2. Activist

General Themes (3 codes)
1. Monumental Memorialization
2. Mention of Scholarly Work
3. Social/Political Advocacy

Canonization Processes (4 codes)
1. Coalition Building
2. Out of the Mouth of Academics
3. Out of the Mouth of Activists
4. Collective Synecdoche


Below I will explain how to apply each code:
"""

codes_full = [
    {
        'title': 'Scholar',
        'category': 'Characterization',
        'definition': 'Applies when Du Bois is described as a scholar or intellectual, especially in connection to Black politics, racial identity, or social theory. When Du Bois is invoked through his ideas on social theory, he should be classified as a scholar, not an activist, unless it is a call to action, related to his organizing, or other non-scholarly political activity. Do not apply when Du Bois is merely the focus in the context of historical and academic study or his scholarship is only implied by loose connections to other scholars.',
        #'concepts':'W.E.B. Du Bois Institute for African American Research at Harvard, mentioning substantive research contributions to academia or relating to his work at universities/colleges or collecting data labeled as a sociologist or discussed alongside other sociologists. References towards his scholarly work like “double consciousness” or race relations in the US',
        'examples':'“900-page anthology of black history and culture and a call to "condemn racial discrimination and appreciate the ... accomplishments of a long-suffering people." Its 150 contributors included Theodore Dreiser, Zora Neale Hurston, W.E.B. Du Bois and Langston Hughes.”',
    },
    {
        "title": "Activist",
        "category": "Characterization",
        "definition": 'Apply this code when Du Bois is explicitly called an "activist" or "leader", or when his political or social activism is either explicitly noted or clearly implied through context. Examples include being mentioned in the context of leadership, activism, developing activist organizations, giving public speeches, participating in meetings with politicians and organizers, running for office, or promoting a candidate, organization, or initiative.',
        "examples": "“Liberia in the antebellum era to W.E.B. Du Bois and the radical political refugees who gathered in Ghana in the 1950s and 1960s - sought freedom and identity in trans-Atlantic...",
    },
    {
        "title": "Monumental Memorialization",
        "category": "General Themes",
        "definition": "When an enduring cultural object is named after Du Bois. Such objects include prizes/awards, named professorships, buildings or rooms, geographical features, institutes, schools, or activist organizations. Do not apply when Du Bois is mentioned in the title of a book or theater production.",
        "examples": '“The Du Bois Center for African American Studies.”\n“W. E. B. Du Bois High School.”',
    },
    {
        "title": "Mention of Scholarly Work",
        "category": "General Themes",
        'definition': "Apply this code when academic works or major theoretical concepts by W.E.B. Du Bois are mentioned or quoted. This includes explicit naming or direct quotations of his writings and references to his key academic ideas, even if unnamed, provided they are clearly attributed to him. Only use when a quote comes from a scholarly work; use context to determine whether a quote comes from a scholarly work, such as a history or social theory, or some other piece, such as a letter or speech. Avoid using this code for general references to Du Bois’s influence, body of writings, or activities outside of his scholarly work. Do not apply it when discussing others' work, unless Du Bois's scholarly concepts or writings are explicitly and centrally mentioned.",
        "examples": "“This goes beyond W.E.B. Du Bois’s notion of “double consciousness.”",
    },
    {
        "title": "Living Achievement",
        "category": "General Themes",
        "definition": "Apply this code when Du Bois is lauded for a specific academic or professional achievement. Such achievements include a prize, job, degree, or book. Do not apply when referencing a book written by Du Bois unless the passage is announcing a new book. Do not apply in reference to the achievements of others even if they involve or reference Du Bois.",
        "examples": '“Du Bois won a Harvard Dissertation award.”',
    },
    {
        "title": "Social/Political Advocacy",
        "category": "General Themes",
        "definition": "This code applies when a passage mentions or implies any form of social or political activism, advocacy, critique, or discourse, including discussions about current or historical social problems. This includes not only direct activism of Du Bois and others, but also the framing and challenging of social norms, historical narratives, and racial or cultural identities. Apply this code when Du Bois's work, persona, or ideas are invoked in discussions that critically engage with Black identity, positionality, or broader systemic circumstances of Black people. Adjacency to other activists such as inclusion in a list, is insufficient; advocacy must be explicitly mentioned in the passage.",
        "examples": "“Between me and the other world, there is ever an unasked question, W.E.B. Du Bois famously said back in 1897: \"How does it feel to be a problem?\" White people are generally allowed to have problems, and they’ve historically been granted the power to define and respond to them. But people of color — in this \"land of the free\"”",
    },
    {
        "title": "Coalition Building",
        "category": "Canonization",
        "definition": "Du Bois is described as an agent establishing his reputation through organizational and institutional sponsorship.",
        "examples": "“W.E.B. Du Bois, professor of Sociology and Economics at the University of Georgia”\n“Du Bois organized five meetings of the Pan African National Congress.”\n“W.E.B. Du Bois was chairman of the Peace Information Center.”",
    },
    {
        "title": "Out of the Mouth of Academics",
        "category": "Canonization",
        "definition": "Apply this code when a specific member of an academic organization is engaging with Du Bois’s work or Du Bois as representing a concept. Apply also when an activist organization has named itself or a subdivision of itself after Du Bois, such as an institute or named professorship. It is not sufficient to use this code when Du Bois is described as a member of an academic organization. This code represents when a member of an academic organization discusses Du Bois. It is not sufficient to presume that the entity discussing or promoting Du Bois is academic.",
        "examples": "“Black leaders such as Charles V. Hamilton, professor of political science at Columbia University, have been expressing concern about placing Negro children in “educationally racist” white classrooms, an apprehension expressed by W. E. B. Du Bois in the 1930's.”",
    },
    {
        "title": "Out of the Mouth of Activists",
        "category": "Canonization",
        "definition": "Apply this code when an individual described as a leader, activist, or politician, or as a member of a specific political or activist organization (e.g., political parties, the NAACP, the Black church, Black Lives Matter) references or draws upon W.E.B. Du Bois’s work or legacy. This includes instances where the person's role as a leader, activist, or politician is implied through their actions or affiliations, even if not explicitly stated. The code also applies when governments, political parties, or activist organizations connect their agenda to Du Bois, such as commemorating Du Bois by naming initiatives (like foundations or prizes) after him. Do not apply when Du Bois is mentioned as a member of an organization, unless a representative organization is referencing Du Bois' membership to connect their agenda to Du Bois.",
        "examples": "“Du Bois later sold the house because he could not afford to keep it up, and the property was eventually turned over to the Du Bois Foundation, which dedicated it as a memorial park in 1969.”\n“Chairman of the state NAACP Howard Roberts reflected on Du Bois legacy.”",
    },
    {
        "title": "Collective Synecdoche",
        "category": "Canonization",
        "definition": "Mentioned with other famous intellectuals, activists, or public figures in order to represent some facet of a culture, era, or ideology.\nExamples include representing race scholarship, civil rights activism, left political leaders, Black excellence, and 20th century political commentators.",
        "examples": "“…and the former home of luminaries like Jackie Robinson, W.E.B. Du Bois and Ella Fitzgerald, is now a historic district, New York City’s 102nd.”\n“He recalls that he read James Baldwin, Ralph Ellison, Langston Hughes, Richard Wright and W.E.B. Du Bois when he was an adolescent in an effort to come to terms with his racial identity.”",
    },
]

# Note: codes_full also defines "Living Achievement", which the preface above does not list.
codes = list(codes_full)

def get_code_definitions(codes,fields=["title", "category", "definition"]):
    """Format the selected fields of each code as 'Label: value' lines."""
    string = ''
    field_label = {
        "title": "Title",
        "category": "Category",
        "definition": "Definition",
        "examples": "Example(s)",
        }
    for code in codes:
        for field in fields:
            # skip fields a code does not define (e.g. "negation" or "notes")
            if code.get(field):
                string += '%s: %s\n' % (field_label.get(field, field.title()), code[field])
        string += '\n'

    return string

code_definitions = get_code_definitions(codes)
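
# Note: the "GPT:" lines in these few-shot examples use labels (e.g. "Prestige", "Public Media",
# "Broadly Listed") that are not among the code titles defined above.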
examples_few_shot = """USER: "In a recent magazine interview, Ms. Portman explained what it feels like to be a celebrity, by employing W.E.B. Du Bois's famous statement that blacks always suffer from a certain "double consciousness," from a constant awareness of how others are looking at them. She feels the remark was misunderstood: "It came out a really stupid thing -- as: 'I'm not black, but I know how it feels,"' she says. Of course she understands that if you're a celebrity "people think really good things about you -- heightened things," but she also thinks the quality of self-consciousness is similar to the one Du Bois captured. (The realization that unfortunate remarks are extra-unfortunate when you're a celebrity naturally contributes to the self-consciousness she was describing.)"
GPT: Scholar       Mention of Scholarly Work          Social/Political Activism           Public Media

USER: "On Oct. 24 he will be inducted into the New Jersey Literary Hall of Fame along with the Pulitzer Prize-winning historian David Levering Lewis; Mr. Lewis is the biographer of W.E.B. Du Bois and taught at Rutgers from 1985 to 2003."
GPT: Scholar                Prestige            Organizational

USER: "Henry Louis Gates Jr., director of the W.E.B. Du Bois Institute for African and African American Research at Harvard, said in an interview that one of the paradoxes of the success of the civil rights movement is that 'the contributions of thousands of brave and courage pioneers -- like people who risked their lives to integrate a lunch counter, of all things -- have been blended into symbols like a Dr. Martin Luther King."
GPT: Scholar                Prestige        Organizational

USER: "HENRY LOUIS GATES JR., the noted scholar of African-American studies, has always thought that standard television biographies of the usual suspects (Martin Luther King Jr., W.E.B. Du Bois, Michael Jordan) don't really do justice to Black History Month. "I haven't felt that the forms of presentation have been as sophisticated as the experience represented," Professor Gates said."
GPT: Activist       Broadly Listed      Social/Political Activism       Collective Achievement

USER: "But as he makes clear, most of the Africans and African-Americans who populate his text were all too familiar with racist orthodoxy and its violent, dehumanizing consequences. Each generation of visitors and migrants - from the black colonists who established Liberia in the antebellum era to W.E.B. Du Bois and the radical political refugees who gathered in Ghana in the 1950s and 1960s - sought freedom and identity in a trans-Atlantic world that was a complex mix of hope and despair."
GPT: Activist       Broadly Listed      Social/Political Activism           Public Media

"""


formatting_with_justification = """When you evaluate the passage, provide a one-sentence justification of why you did or did not apply each code. You can format it like this:


After you list all codes and justifications, list all applied codes in the following fashion:

1. **Characterization of Du Bois**:
   - Scholar: Applicable [justification here].
   - Activist: Not directly applicable [justification here].

2. **General Themes**:
    - Monumental Memorialization: Not applicable [justification here].
    - Social/Political Advocacy: Applicable [justification here].

**Codes Applied**:
- Scholar
- Social/Political Advocacy

Do not write anything in your reply after listing the "Codes Applied:\""""

formatting_without_justification = """

After analyzing the passage, list all applied codes in the following fashion:

**Codes Applied**:
- Activist
- Monumental Memorialization
- Social/Political Advocacy
- Out of the Mouth of Academics

Do not write anything in your reply before listing the "Codes Applied"

Do not write anything in your reply after listing the "Codes Applied:\""""


instructions_zero_shot = """I will give you a passage and ask you to return the correct codes to me. """

instructions_few_shot = "Here are several example passages and their corresponding codes.\n\n" + examples_few_shot + instructions_zero_shot + formatting_without_justification

instructions_zero_shot_with_justification = instructions_zero_shot + formatting_with_justification

instructions_zero_shot_without_justification = instructions_zero_shot + formatting_without_justification



prompt_zeroshot_with_justification = '\n'.join([task_description_preface,code_definitions,instructions_zero_shot_with_justification])
prompt_zeroshot_without_justification = '\n'.join([task_description_preface,code_definitions,instructions_zero_shot_without_justification])

prompt_few_shot = '\n'.join([task_description_preface,code_definitions,instructions_few_shot])


#### Additional components for "Per Code"

In [None]:
task_description_preface_per_code = """You are tasked with applying qualitative codes to articles, book reviews, and opinion pieces referencing W.E.B. Du Bois. The purpose of this task is to track how Du Bois is represented in news media over time.

Below I will explain how to apply the code:

"""


def get_single_code_definition(code, fields=["title", "category", "definition"]):
    """Format a single code's definition as a string."""
    return get_code_definitions([code], fields=fields)



instructions_per_code = """I will give you a passage and ask you to return the correct codes to me. """ 

def generate_prompt_per_code(code, with_justification=True):
    
    # example_justification is not currently included in the returned prompt
    example_justification = """
Here is an example passage and an example answer in the exact form I want:

User: But she's got relatives and friends broke as the 10 Commandments,'' he said. ''Just focusing on Oprah won't show us how the race problem can be solved. It's more complex than that. Du Bois said begin with art, because art tries to take us outside of ourselves. It's a matter of trying to create an atmosphere and a context so conversation can flow back and forth, and we can be influenced by each other.

Assistant: In the provided passage, Du Bois' ideas are mentioned in the context of solving the race problem, emphasizing the role of art in creating an atmosphere for conversation and mutual influence. This aligns with the code 'Social/Political Activism', as it reflects on Du Bois' scholarly work being invoked in the context of addressing social issues, specifically racial issues. The reference to Du Bois' suggestion to "begin with art" as a means to facilitate dialogue and understanding is an example of using his ideas to frame discussions around Black political struggle and societal change.

**Codes Applied:**
    - Social/Political Activism
    
"""

    formatting_with_justification = f"""When you evaluate the passage, provide a justification of why you did or did not apply the code.

Then list the code in the following fashion if you applied the code:

**Justification:** [insert 2-3 sentence reasoning for applying the code here]

**Codes Applied:**
- {code['title']}

Otherwise you can format it like this:

**Justification:** [insert 2-3 sentence reasoning for not applying the code here]

**Codes Applied:**
    - None

Do not write anything in your reply after listing the \"Codes Applied:\"

"""

    formatting_without_justification = f"""After analyzing the passage, list the code in the following fashion if you applied the code:

**Codes Applied:**
- {code['title']}

Otherwise you can format it like this:

**Codes Applied:**
    - None

Do not write anything in your reply before listing the "Codes Applied:"

Do not write anything in your reply after listing the "Codes Applied:\"

"""

    if with_justification:
        formatting_string = formatting_with_justification
    
    else:
        formatting_string = formatting_without_justification

    prompt = task_description_preface_per_code + get_code_definitions([code],fields=["title", "definition", "negation", "notes"]) + '\n' + instructions_per_code + formatting_string

    return prompt


#### GPT API caller

In [None]:
def make_and_save_gpt_calls(prompt,messages,path,temperature=0,top_p=1,model="gpt-3.5-turbo-1106"):
    """Send each passage in `messages` (a pandas Series) with `prompt` as the system message.

    Each exchange is saved to {path}/message_{index}.txt; passages whose file
    already exists are skipped, so an interrupted run can be resumed.
    """
    # Store all the output
    all_messages_and_replies = []

    # One call per passage; i is the passage's index in the source CSV
    for i, msg in zip(messages.index, messages):

        # check if we have already retrieved this call
        if os.path.exists(f"{path}/message_{i}.txt"):
            continue

        # Each call is independent: the system prompt plus a single passage
        conversation_history = [{"role": "system", "content": prompt}, {"role": "user", "content": msg}]

        # Make the API call with the updated conversation history
        response = openai.ChatCompletion.create(
            model=model,
            messages=conversation_history,
            # functions   = funct,
            # function_call = "auto",
            temperature = temperature,
            #max_tokens  = max_tokens,  # maximum response length
            stop        = "",
            top_p       = top_p,
            presence_penalty = 0.0,  # penalties -2.0 - 2.0
            frequency_penalty = 0.0,  # frequency = cumulative score
            n           = 1,
            stream      = False,
            logit_bias  = {"100066": -1},  # example, '～\n\n' token
            #user        = "site_user-id"
        )

        reply = response['choices'][0]['message']['content']

        # Add the input and output to the conversation history
        all_messages_and_replies.append({"role": "user", "content": msg})
        all_messages_and_replies.append({"role": "assistant", "content": reply})

        # Save each message and reply as its own text file
        with open(f"{path}/message_{i}.txt", "w", encoding="utf-8") as message_file:
            message_file.write(f"User: {msg}\n")
            message_file.write(f"Assistant: {reply}\n")

    # write all_messages_and_replies (only calls made in this run; skipped passages are not included)
    with open(f"{path}/all_messages.json", "w", encoding="utf-8") as json_file:
        json.dump(all_messages_and_replies, json_file, indent=4)

def code_to_path(code):
    """Turn a code title into a lowercase, hyphenated directory name."""
    code = code.lower()
    code = re.sub(r'[^a-zA-Z0-9]+', '-', code)

    return code
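`make_and_save_gpt_calls` skips passages whose `message_{i}.txt` already exists, but it builds `all_messages.json` only from the calls made in the current run, so a resumed run overwrites that file with a partial history. One possible remedy, sketched below under the assumption that every file follows the `User: ...` / `Assistant: ...` layout written above, is to rebuild the history from all saved files. `load_saved_replies` is a hypothetical helper.

```python
import glob
import os
import re


def load_saved_replies(path: str) -> list:
    """Rebuild the user/assistant history from every message_{i}.txt in `path`.

    Assumes the layout "User: <passage>\nAssistant: <reply>\n"; files are
    ordered by their integer index i, not lexicographically.
    """
    layout = re.compile(r"User: (.*?)\nAssistant: (.*?)\n?\Z", re.DOTALL)

    # collect (index, file) pairs, ignoring files that do not match message_<int>.txt
    indexed = []
    for file_path in glob.glob(os.path.join(path, "message_*.txt")):
        m = re.search(r"message_(\d+)\.txt$", file_path)
        if m:
            indexed.append((int(m.group(1)), file_path))

    history = []
    for _, file_path in sorted(indexed):
        with open(file_path, encoding="utf-8") as f:
            match = layout.match(f.read())
        if match is None:
            continue  # skip files that do not follow the expected layout
        history.append({"role": "user", "content": match.group(1)})
        history.append({"role": "assistant", "content": match.group(2)})
    return history
```

For example, dumping `load_saved_replies(path)` instead of `all_messages_and_replies` in the final write would keep `all_messages.json` complete across resumed runs.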

#### GPT Apply Codes for "Full Codebook"

In [None]:
passages_path = "passages.csv"
coded_articles = pd.read_csv(passages_path)

# specify indices of passages in the test set
passages_include = range(9,120)
messages = coded_articles.loc[coded_articles.index.isin(passages_include), 'passage']

# parameters
model = 'gpt-4-1106-preview'
#model="gpt-3.5-turbo-1106"
temperature = 0.5
top_p = 0.5

if 'gpt-4' in model:
    model_name = 'gpt-4'
elif 'gpt-3' in model:
    model_name = 'gpt-3.5'
else:
    model_name = model

# choose the prompt condition
with_justification = False
if with_justification:
    prompt = prompt_zeroshot_with_justification
    condition = 'zeroshot_with_justification'
else:
    prompt = prompt_zeroshot_without_justification
    condition = 'zeroshot_without_justification'
    

path = f'output/{condition}_t={temperature}_top_p={top_p}_model={model_name}'
os.makedirs(path, exist_ok=True)

# record the run parameters alongside the output
params = {}
params['prompt'] = prompt
params['condition'] = condition
params['temperature'] = temperature
params['top_p'] = top_p
params['model'] = model
params['codes'] = [code['title'] for code in codes]
with open(f'{path}/params.json', 'w') as params_file:
    json.dump(params, params_file)

print(prompt)

make_and_save_gpt_calls(prompt,messages,path,temperature=temperature,top_p=top_p,model=model)


#### GPT Apply Codes for "Per Code"

In [None]:
passages_path = "passages.csv"
coded_articles = pd.read_csv(passages_path)

# specify indices of passages in the test set
passages_include = range(9,120)
messages = coded_articles.loc[coded_articles.index.isin(passages_include), 'passage']

# parameters
model = 'gpt-4-1106-preview'
#model="gpt-3.5-turbo-1106"
temperature = 0
top_p = 1

if 'gpt-4' in model:
    model_name = 'gpt-4'
elif 'gpt-3' in model:
    model_name = 'gpt-3.5'
else:
    model_name = model

with_justification = True
with_justification_str = 'with-justification' if with_justification else 'without-justification'

# titles of the codes to run (commas matter: adjacent string literals would be concatenated)
code_titles = ['Scholar', 'Activist',
               'Monumental Memorialization', 'Mention of Scholarly Work',
               'Social/Political Advocacy',
               'Coalition Building', 'Out of the Mouth of Academics', 'Out of the Mouth of Activists',
               'Collective Synecdoche']

# keep the full code dictionaries for the selected titles
codes_names = [code for code in codes if code['title'] in code_titles]

# Individual calls per code
def make_and_save_gpt_calls_per_code(messages,path,temperature=temperature,top_p=top_p,model=model):
    """Run one full pass over `messages` for each selected code, saving to {path}/{code-slug}/."""

    for code in codes_names:
        # make the code title suitable for a directory name
        code_title = code_to_path(code['title'])

        code_path = f'{path}/{code_title}'
        os.makedirs(code_path, exist_ok=True)

        prompt = generate_prompt_per_code(code, with_justification=with_justification)
        print(prompt)
        make_and_save_gpt_calls(prompt,messages,code_path,temperature=temperature,top_p=top_p,model=model)

    # record the run parameters (the stored prompt is the last code's prompt)
    params = {}
    params['prompt'] = prompt
    params['temperature'] = temperature
    params['top_p'] = top_p
    params['model'] = model
    params['with_justification'] = with_justification
    params['codes'] = [code['title'] for code in codes_names]
    with open(f'{path}/params.json', 'w') as params_file:
        json.dump(params, params_file)


base_path = f'output/per-code-{with_justification_str}_t={temperature}_top_p={top_p}_model={model_name}'
make_and_save_gpt_calls_per_code(messages,base_path,temperature=temperature,top_p=top_p,model=model)

#### Process GPT output into tabular format

In [None]:

def extract_doc_id(filename):
    """Extracts the integer between '_' and '.txt' in the filename."""
    match = re.search(r'_(\d+)\.txt', filename)
    return int(match.group(1)) if match else None

def process_file_content(file_content):
    """Finds the last instance of 'codes applied' and stores all text after it."""
    file_content_lower = file_content.lower()
    codes_start = file_content_lower.rfind('codes applied')
    return file_content_lower[codes_start + len('codes applied'):].strip() if codes_start != -1 else ""

def match_pattern(pattern, string):
    """Return True if the code title `pattern` appears as whole words in `string`.

    Word boundaries keep "scholar" from matching "scholarly work" and
    "activist" from matching "out of the mouth of activists".
    """
    return bool(re.search(r'\b' + re.escape(pattern) + r'\b', string))

def extract_codes_from_gpt_responses(path,codes):
    file_pattern = os.path.join(path, '*.txt')
    data = []

    # Read each .txt file and process the content
    codes_lower = [code.lower() for code in codes]
    for file_path in glob.glob(file_pattern):
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            doc_id = extract_doc_id(file_path)
            codes_applied_string = process_file_content(content)
            # Check for the presence of each code
            code_presence = [1 if match_pattern(code,codes_applied_string) else 0 for code in codes_lower]

            # Append the result to the data list
            data.append([doc_id] + code_presence + [content])

    # Create a DataFrame
    columns = ['id'] + codes + ['file_contents']
    df = pd.DataFrame(data, columns=columns)

    print(df.head())  # Display the first few rows
    df.to_csv(f'{path}/processed_responses.csv', index=False)

def extract_codes_from_gpt_responses_per_code(path,codes):
    """Build one row per passage and one 0/1 column per code from the per-code output folders."""
    data = {}

    # Read each .txt file and process the content
    for code in codes:
        file_pattern= os.path.join(path, code_to_path(code), '*.txt')
        code_lower = code.lower()
        code_vector = []
        ids = []
        passages = []
        for file_path in glob.glob(file_pattern):
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
                doc_id = extract_doc_id(file_path)
                codes_applied_string = process_file_content(content)
                # Check for the presence of each code
                code_presence = 0
                if match_pattern(code_lower,codes_applied_string):
                    code_presence = 1
                code_vector.append(code_presence)
                ids.append(int(doc_id))
                match = re.search(r'User:\s*(.*?)[\s\n]*Assistant:', content,re.DOTALL)
                passage = match.group(1) if match else ""
                passages.append(passage)

        data[code] = {'code':code_vector,'id':ids,'passage':passages}

    def make_and_merge_dfs(data):
        # Create a DataFrame for each code_name
        dfs = []
        for code_name, values in data.items():
            df = pd.DataFrame(values)
            df = df.rename(columns={"code": code_name})
            dfs.append(df)

        # Merge all DataFrames on 'id'; only the first keeps its 'passage' column
        df_final = pd.DataFrame()
        for df in dfs:
            if df_final.empty:
                df_final = df
            else:
                df_final = pd.merge(df_final, df.drop('passage',axis=1), on=["id"], how="outer")
        
        columns = list(df_final.columns)
        columns.remove('id')
        columns.remove('passage')
        rearranged_columns = ['id'] + columns + ['passage']
        df_final = df_final[rearranged_columns]

        return df_final

    df = make_and_merge_dfs(data)

    df.to_csv(f'{path}/processed_responses.csv', index=False)


code_names = [code['title'] for code in codes]
print(code_names)

path='output/per-code-with-justification_t=0_top_p=1_model=gpt-4'
extract_codes_from_gpt_responses_per_code(path,code_names)
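Searching for code titles anywhere in the text after "Codes Applied" can over-count when one title contains another: with plain substring search, `"activist"` is found inside `"out of the mouth of activists"`. An alternative, sketched below, compares each bullet line after the last "Codes Applied" marker to the titles exactly (case-insensitively). It assumes the model follows the bullet format requested in the prompts; `parse_applied_codes` is a hypothetical helper, not the matching used elsewhere in this notebook.

```python
import re


def parse_applied_codes(reply: str, titles: list) -> set:
    """Return the code titles listed as bullets after the last 'Codes Applied' marker.

    Each line after the marker is stripped of bullet and bold markers and
    compared to the titles case-insensitively, so 'Activist' cannot match
    'Out of the Mouth of Activists'.
    """
    markers = list(re.finditer(r"codes applied", reply, flags=re.IGNORECASE))
    if not markers:
        return set()
    tail = reply[markers[-1].end():]
    lookup = {title.lower(): title for title in titles}
    applied = set()
    for line in tail.splitlines():
        # drop leading bullets/spaces, bold markers, and surrounding colons or spaces
        item = re.sub(r"^[\s\-*]+", "", line).replace("**", "").strip(" :")
        if item.lower() in lookup:
            applied.add(lookup[item.lower()])
    return applied
```

Applied to the text of one saved `message_{i}.txt` file with `code_names` as the titles, it returns the set of codes the model listed; a reply ending in `- None` yields an empty set.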

#### Measure intercoder reliability with Gold Standard
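`load_and_align_data` (in the next cell) concatenates one frame per file with `keys=range(len(files))`, so the aligned table has two-level columns: level 0 is the coder's position in `files` and level 1 is the code. This is why the reliability functions index `df[i][col]`. A toy illustration with in-memory frames standing in for the CSV files; the data are invented:

```python
import pandas as pd

# Invented 0/1 codes from two coders for the same three passages, in different row orders
human = pd.DataFrame({"id": ["1", "2", "3"], "Scholar": [1, 0, 1], "Activist": [0, 0, 1]})
gpt = pd.DataFrame({"id": ["3", "1", "2"], "Scholar": [1, 1, 1], "Activist": [1, 0, 1]})

frames = [df.set_index("id") for df in (human, gpt)]
# axis=1 concatenation aligns rows on the index (the passage id), not on row position
aligned = pd.concat(frames, axis=1, keys=range(len(frames)))

print(aligned.columns.tolist())
# [(0, 'Scholar'), (0, 'Activist'), (1, 'Scholar'), (1, 'Activist')]
print(aligned[1]["Activist"].tolist())  # [0, 1, 1]: coder 1's values in id order 1, 2, 3
```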

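The reliability functions below rely on `sklearn`, `krippendorff`, and `pycm`. For a single binary code, the three chance-corrected statistics can also be written out directly, which helps when cross-checking those libraries and when reading results for rarely applied codes. Cohen's kappa is `(p_o - p_e) / (1 - p_e)`, with `p_e` computed from each coder's own application rate; Gwet's AC1 has the same form but uses `p_e = 2 * pi * (1 - pi)`, where `pi` is the two coders' average rate; Krippendorff's alpha is `1 - D_o / D_e`, computed from a coincidence matrix. The sketch covers complete data and nominal (0/1) values only; unlike the `krippendorff` package it does not handle missing ratings. The function names are hypothetical and the toy data are invented.

```python
import numpy as np


def binary_cohens_kappa(a, b) -> float:
    """Cohen's kappa for two coders' 0/1 ratings of the same passages."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p_o = np.mean(a == b)  # observed agreement
    # chance agreement if each coder applies the code independently at their own rate
    p_e = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    if p_e == 1:
        return float("nan")  # undefined: both coders are constant and identical
    return float((p_o - p_e) / (1 - p_e))


def binary_gwets_ac1(a, b) -> float:
    """Gwet's AC1 for two coders' 0/1 ratings of the same passages."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p_a = np.mean(a == b)  # observed agreement
    pi = (a.mean() + b.mean()) / 2  # average prevalence of the code
    p_e = 2 * pi * (1 - pi)  # chance agreement under Gwet's model (at most 0.5)
    return float((p_a - p_e) / (1 - p_e))


def nominal_alpha(ratings) -> float:
    """Krippendorff's alpha (nominal) for a coders x units array with no missing values."""
    data = np.asarray(ratings)
    m = data.shape[0]  # number of coders; every coder rates every unit
    values = np.unique(data)
    index = {v: k for k, v in enumerate(values)}
    # coincidence matrix: each unit adds its ordered pairs of values from
    # different coders, weighted by 1 / (m - 1)
    o = np.zeros((len(values), len(values)))
    for unit in data.T:
        for i in range(m):
            for j in range(m):
                if i != j:
                    o[index[unit[i]], index[unit[j]]] += 1 / (m - 1)
    n_c = o.sum(axis=1)  # how often each value occurs among pairable values
    n = n_c.sum()
    d_o = (o.sum() - np.trace(o)) / n  # observed disagreement
    d_e = (n**2 - np.sum(n_c**2)) / (n * (n - 1))  # expected disagreement
    if d_e == 0:
        return float("nan")  # only one value occurs, so alpha is undefined
    return float(1 - d_o / d_e)


# Invented toy data: a rarely applied code that coder A uses once and coder B never uses
coder_a = [0] * 9 + [1]
coder_b = [0] * 10
print(np.mean(np.array(coder_a) == np.array(coder_b)))  # 0.9 raw agreement
print(binary_cohens_kappa(coder_a, coder_b))
print(round(binary_gwets_ac1(coder_a, coder_b), 2))  # 0.89
print(round(nominal_alpha([coder_a, coder_b]), 2))
```

On the toy vectors, raw agreement is 0.9 but kappa and alpha are both 0, because one coder never applies the code, while AC1 stays near 0.89. For 0/1 codes the interval and nominal distance functions coincide, so `krippendorff.alpha` gives the same value with `level_of_measurement="interval"` (its default) or `"nominal"`.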
In [None]:
from sklearn.metrics import cohen_kappa_score
import krippendorff
import numpy as np
from pycm import ConfusionMatrix


def load_and_align_data(files, ids, id_column=None):
    """
    Load data from multiple files and align them based on the specified id column or index.
    Only include rows where the id is in the provided ids list.
    If the id column does not exist, create it using the DataFrame index.
    """
    data_frames = []
    for file in files:
        df = pd.read_csv(file)

        # If id_column is not specified or doesn't exist, create it from the index
        if not id_column or id_column not in df.columns:
            df['id'] = df.index
            id_column = 'id'

        # Filter the DataFrame based on the id_column
        df = df[df[id_column].isin(ids)]

        # Replace empty cells with 0 (reassign rather than modify the filtered slice in place)
        df = df.fillna(0)
        df.set_index(id_column, inplace=True)
        df.index = df.index.astype(str)
        data_frames.append(df)

    # Align dataframes
    aligned_df = pd.concat(data_frames, axis=1, keys=range(len(data_frames)))

    return aligned_df


def calculate_cohens_kappa(df, columns):
    """Calculate Cohen's Kappa for each pair of coders for each column."""
    n = len(df.columns.levels[0])
    kappa_scores = {}

    for col in columns:
        scores = []
        for i in range(n):
            for j in range(i+1, n):
                kappa = cohen_kappa_score(df[i][col], df[j][col])
                scores.append((f'Coder {i+1} vs Coder {j+1}', kappa))
        kappa_scores[col] = scores

    return kappa_scores

def calculate_krippendorffs_alpha(df, columns):
    """Calculate Krippendorff's Alpha for each column."""

    alpha_scores = {}
    for col in columns:
        reliability_data = np.array([df[coder][col] for coder in df.columns.levels[0]])
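        # codes are 0/1 categories, so use the nominal distance (identical to interval for binary data)
        alpha = krippendorff.alpha(reliability_data=reliability_data, level_of_measurement="nominal")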
        alpha_scores[col] = alpha
    return alpha_scores

def calculate_gwets_ac1(df, columns):
    """Calculate Gwet's AC1 for each pair of coders for each column (via pycm)."""
    n = len(df.columns.levels[0])
    gwets_ac1_scores = {}

    for col in columns:
        scores = []
        for i in range(n):
            for j in range(i+1, n):
                coder_i_series = df[i][col].reset_index(drop=True)
                coder_j_series = df[j][col].reset_index(drop=True)

                # Convert series to numpy arrays
                coder_i_ratings = coder_i_series.to_numpy().astype(np.int32)
                coder_j_ratings = coder_j_series.to_numpy().astype(np.int32)

                # Create the confusion matrix
                cm = ConfusionMatrix(coder_i_ratings, coder_j_ratings)
                
                ac1 = cm.AC1
                scores.append((f'Coder {i+1} vs Coder {j+1}', ac1))
        gwets_ac1_scores[col] = scores

    return gwets_ac1_scores


def calculate_percent_agreement(df, columns):
    """Calculate percent agreement for each pair of coders for each column."""
    n = len(df.columns.levels[0])
    percent_agreement_scores = {}

    for col in columns:
        scores = []
        for i in range(n):
            for j in range(i+1, n):
                agreement = np.mean(df[i][col] == df[j][col])
                scores.append((f'Coder {i+1} vs Coder {j+1}', agreement))
        percent_agreement_scores[col] = scores

    return percent_agreement_scores


def calculate_intercoder_reliability(aligned_df, coded_columns):
    """Compute Cohen's Kappa, Krippendorff's Alpha, percent agreement, and Gwet's AC1 for each code."""
    kappa_scores = calculate_cohens_kappa(aligned_df, coded_columns)
    alpha_scores = calculate_krippendorffs_alpha(aligned_df, coded_columns)
    percent_agreement_scores = calculate_percent_agreement(aligned_df, coded_columns)
    gwets_ac1_scores = calculate_gwets_ac1(aligned_df, coded_columns)

    return kappa_scores, alpha_scores, percent_agreement_scores, gwets_ac1_scores


def calculate_and_print_intercoder_reliability(aligned_df, coded_columns):
    """Print every reliability statistic for each code."""
    kappa_scores, alpha_scores, percent_agreement_scores, gwets_ac1_scores = calculate_intercoder_reliability(aligned_df, coded_columns)
    print("Cohen's Kappa Scores:")
    for col, kappas in kappa_scores.items():
        print(f"{col}: {kappas}")

    print("\nKrippendorff's Alpha Scores:")
    for col, alpha in alpha_scores.items():
        print(f"{col}: {alpha}")

    print("\nPercent Agreement Scores:")
    for col, agreements in percent_agreement_scores.items():
        print(f"{col}: {agreements}")

    print("\nGwet's AC1 Scores:")
    for col, ac1s in gwets_ac1_scores.items():
        print(f"{col}: {ac1s}")

def calculate_differential_count(df, coder1, coder2, codes):
    """
    Calculate the sum of applications for each code by two coders and the difference between these sums.
    """
    differential_data = []

    for code in codes:
        coder1_sum = df[coder1][code].sum()
        coder2_sum = df[coder2][code].sum()
        diff = abs(coder1_sum - coder2_sum)

        differential_data.append({
            'Code': code,
            'Coder 1': coder1_sum,
            'Coder 2': coder2_sum,
            'Difference': diff
        })

    return pd.DataFrame(differential_data)

def rank_by_disagreement(df, coder1, coder2, codes):
    """
    Calculate the total number of disagreements between two coders and return a DataFrame
    sorted by this measure in descending order.
    """
    total_disagreements = df[coder1][codes].ne(df[coder2][codes]).sum(axis=1)
    df_disagreements = df[coder1].copy()
    df_disagreements['Total Disagreements'] = total_disagreements
    df_disagreements.sort_values(by='Total Disagreements', ascending=False, inplace=True)

    return df_disagreements
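`calculate_differential_count` and `rank_by_disagreement` answer different questions. The first compares how often each coder applies a code overall, and the second finds the passages on which the coders diverge most. The toy example below uses made-up codings in the (coder, code) column layout that these functions index into. It reproduces both computations inline and shows that matching totals can hide disagreement.

```python
import pandas as pd

# Made-up codings for three passages, aligned as (coder, code) columns
gold = pd.DataFrame({'id': [9, 10, 11], 'Scholar': [1, 0, 1], 'Activist': [1, 0, 0]})
gpt = pd.DataFrame({'id': [9, 10, 11], 'Scholar': [0, 0, 1], 'Activist': [0, 1, 0]})
aligned = pd.concat([gold, gpt], axis=1, keys=[0, 1])
toy_codes = ['Scholar', 'Activist']

# Per-code application totals and their absolute difference
totals = pd.DataFrame({
    'Coder 1': aligned[0][toy_codes].sum(),
    'Coder 2': aligned[1][toy_codes].sum(),
})
totals['Difference'] = (totals['Coder 1'] - totals['Coder 2']).abs()

# Per-passage count of codes the coders disagree on, most disputed first
ranked = aligned[0].copy()
ranked['Total Disagreements'] = aligned[0][toy_codes].ne(aligned[1][toy_codes]).sum(axis=1)
ranked = ranked.sort_values('Total Disagreements', ascending=False)

print(totals)
print(ranked[['id', 'Total Disagreements']])
```

Both coders apply Activist once, so its Difference is 0. They still disagree on Activist for passages 9 and 10, which the per-passage ranking brings to the top.

In [None]: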

def get_ir_report_df(coding_pair, coded_columns, ids=None, id_column=None):
    """Build a per-code table of application counts and agreement statistics for two codings."""
    aligned_df = load_and_align_data(coding_pair, ids=ids, id_column=id_column)
    # Uncomment to inspect the aligned data or to print every pairwise score:
    # return aligned_df
    # calculate_and_print_intercoder_reliability(aligned_df, coded_columns)

    results_df = calculate_differential_count(aligned_df, 0, 1, codes=coded_columns)

    kappa_scores, alpha_scores, percent_agreement_scores, gwets_ac1_scores = calculate_intercoder_reliability(aligned_df, coded_columns)

    # With two coders, each pairwise list holds a single (label, score) tuple, so v[0][1] is that score
    results_df['% Agreement'] = [v[0][1] for v in percent_agreement_scores.values()]
    results_df['Kappa'] = [v[0][1] for v in kappa_scores.values()]
    results_df['Alpha'] = list(alpha_scores.values())
    results_df['Gwets AC1'] = [v[0][1] for v in gwets_ac1_scores.values()]

    return results_df


code_names = [code['title'] for code in codes]

coded_columns = ['Scholar', 'Activist',
         'Monumental Memorialization','Mention of Scholarly Work',
         'Social/Political Advocacy', 'Coalition Building',
         'Out of the Mouth of Academics','Out of the Mouth of Activists',
         'Collective Synecdoche']

gpt_coding = 'output/per-code-with-justification_t=0_top_p=1_model=gpt-4/processed_responses.csv'
gpt_coding_df = pd.read_csv(gpt_coding)

gold_standard_coding = 'gold_standard_coding.csv'
gs_gpt = [gold_standard_coding,gpt_coding]


row_ids = range(9,120)

print('GS vs. GPT')
report_df = get_ir_report_df(gs_gpt,coded_columns,ids=row_ids,id_column='id')
report_df

In [None]:
def gold_standard_vs_gpt_df(gold_standard_path='gold_standard.csv'):
    """Compare each GPT prompting condition against the gold standard, reporting Cohen's Kappa per code."""
    # Output directory (under output/) or CSV path for each condition
    dirs = {
            'Per Code w/ Justification': 'per-code-with-justification_t=0_top_p=1_model=gpt-4',
            'Per Code w/out Justification': 'per-code-without-justification_t=0_top_p=1_model=gpt-4',
            'Full w/ Justification': 'zeroshot_with_justification_without_compliance_t=0_top_p=1_model=gpt-4',
            'Full w/out Justification': 'zeroshot_without_justification_without_compliance_t=0_top_p=1_model=gpt-4',
            'GPT 3.5 w/ Justification': 'per-code-with-justification_t=0_top_p=1_model=gpt-3.5',
            'GPT 3.5 w/out Justification': 'per-code-without-justification_t=0_top_p=1_model=gpt-3.5',
            }

    coded_columns = ['Scholar', 'Activist',
            'Monumental Memorialization', 'Mention of Scholarly Work',
            'Social/Political Advocacy', 'Coalition Building',
            'Out of the Mouth of Academics', 'Out of the Mouth of Activists',
            'Collective Synecdoche']

    row_ids = range(9, 120)
    all_code_comparison = {}
    for label, dir_name in dirs.items():
        if dir_name.endswith('.csv'):
            file_path = dir_name
        else:
            file_path = f'output/{dir_name}/processed_responses.csv'

        pair = [gold_standard_path, file_path]

        report_df = get_ir_report_df(pair, coded_columns, ids=row_ids, id_column='id')
        all_code_comparison['Code'] = report_df['Code']
        all_code_comparison['Gold Standard Count'] = report_df['Coder 1']
        all_code_comparison[label] = report_df['Kappa']
        # To compare Gwet's AC1 instead of Kappa, use:
        # all_code_comparison[label] = report_df['Gwets AC1']

    df = pd.DataFrame(all_code_comparison)
    # Append a row holding the mean of each numeric column
    numeric_columns = df.select_dtypes(include=['number']).columns
    average_row = df[numeric_columns].mean().to_frame().T
    average_row.index = ['Average']
    df = pd.concat([df, average_row])

    # Print in LaTeX format
    # print(df.to_latex(index=False))

    return df

gold_standard_vs_gpt_df()
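The `Average` row relies on `DataFrame.mean()`, which skips NaN by default. κ is undefined when chance agreement equals 1 (for example, when both codings use only one and the same label for a code), and the score may then be NaN. In that case, a condition with a NaN entry would be averaged over fewer codes than the others. The sketch below uses made-up values and hypothetical condition names to show the difference and how `count()` reveals it.

```python
import numpy as np
import pandas as pd

# Made-up kappa values for two hypothetical conditions; NaN marks an undefined kappa
toy_df = pd.DataFrame({
    'Code': ['Scholar', 'Activist', 'Collective Synecdoche'],
    'Condition A': [0.80, 0.60, np.nan],
    'Condition B': [0.70, 0.50, 0.40],
})
numeric = toy_df.select_dtypes(include=['number'])

print(numeric.mean())              # NaN skipped: Condition A is averaged over 2 codes
print(numeric.mean(skipna=False))  # NaN propagates to the Condition A average instead
print(numeric.count())             # number of codes behind each average
```

Reporting the count next to each average, or passing `skipna=False`, makes it clear when two conditions are not averaged over the same set of codes.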