In [55]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [62]:
# Demo data file from Eric (example of output from IAA)
iaa = pd.read_csv("IAA-file-demo.csv").head(3)
iaa

Unnamed: 0,article_num,article_sha256,article_id,article_filename,source_task_uuid,tua_uuid,namespace,schema_sha256,question_Number,answer_uuid,...,agreed_Answer,coding_perc_agreement,highlighted_indices,agreement_score,num_users,num_answer_choices,target_text,question_text,answer_text,answer_content
0,100059,4b537e0ed21179a29ed28da28057d338e67330ae12123c...,raw_304b537e0ed21179a29ed28da28057d338e67330ae...,Covid_article_for_PE_S&S&S.txt,270ea60c-f961-402b-96d5-c015fac1960d,a23b77bd-9eae-4050-9b5d-e38ecb126226,Covid_Evidence2020_03_21,45dce5251bd3ea6e908fa33ac9e6a8e17e6830215912ce...,1,73d7a14a-9ec6-404c-b2b7-a55508af3b76,...,1,1.0,"[3764, 3765, 3766, 3767, 3768, 3769, 3770, 377...",1.0,3,3,"Especially at risk are the elderly, who both ...",Is a general or singular causal claim made? Hi...,"General Causation (In general, X causes Y.)",
1,100059,4b537e0ed21179a29ed28da28057d338e67330ae12123c...,raw_304b537e0ed21179a29ed28da28057d338e67330ae...,Covid_article_for_PE_S&S&S.txt,270ea60c-f961-402b-96d5-c015fac1960d,a23b77bd-9eae-4050-9b5d-e38ecb126226,Covid_Evidence2020_03_21,45dce5251bd3ea6e908fa33ac9e6a8e17e6830215912ce...,1,0,...,L,0.0,[],0.0,3,3,L,Is a general or singular causal claim made? Hi...,L,
2,100059,4b537e0ed21179a29ed28da28057d338e67330ae12123c...,raw_304b537e0ed21179a29ed28da28057d338e67330ae...,Covid_article_for_PE_S&S&S.txt,270ea60c-f961-402b-96d5-c015fac1960d,a23b77bd-9eae-4050-9b5d-e38ecb126226,Covid_Evidence2020_03_21,45dce5251bd3ea6e908fa33ac9e6a8e17e6830215912ce...,1,0,...,L,0.0,[],0.0,3,3,L,Is a general or singular causal claim made? Hi...,L,


Important columns:
- article_num (maybe not needed)
- article_sha256 (part of unique answer identifier)
- tua_uuid (part of unique answer identifier)
- namespace (human-readable schema type)
- schema_sha256 (part of unique answer identifier)
- num_answer_choices (helpful for scoring ordinal questions)
- question_Number (part of unique answer identifier)
- answer_uuid (what we want to find)
- agreed_Answer (a number when there is a consensus answer; otherwise a non-numeric placeholder such as "L")
- question_type (checklist, nominal, ordinal)

Function that takes in **article_sha256** (string), **tua_uuid** (string), **schema_sha256** (string), and **question_Number** (int), and returns the **corresponding consensus answer** from IAA.

The return value is a single int rather than the "T1.Q1.A1" format because I believe IAA processes and outputs just the number. The function returns -1 when the matched row has no numeric agreed answer.

(also let me know if there's a more efficient way to index the df)
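On the indexing question: one option is to build a MultiIndex on the four key columns once and look rows up by tuple, instead of re-evaluating four boolean masks on every call. Below is a minimal sketch of that idea. The `demo` frame, its short placeholder hashes, and the names `KEY_COLUMNS` and `get_consensus_indexed` are made up for illustration and are not part of the IAA output.

```python
import pandas as pd

# Made-up stand-in for the IAA output; only the columns used here are included.
demo = pd.DataFrame({
    "article_sha256": ["aaa", "aaa", "bbb"],
    "tua_uuid": ["t1", "t1", "t2"],
    "schema_sha256": ["s1", "s1", "s1"],
    "question_Number": [1, 2, 1],
    "agreed_Answer": ["1", "L", "3"],
})

KEY_COLUMNS = ["article_sha256", "tua_uuid", "schema_sha256", "question_Number"]

# Build the index once; sorting the MultiIndex helps lookup performance.
indexed = demo.set_index(KEY_COLUMNS).sort_index()


def get_consensus_indexed(article_sha256, tua_uuid, schema_sha256, question_Number):
    """Return the agreed answer as an int, or -1 if there is none."""
    key = (article_sha256, tua_uuid, schema_sha256, question_Number)
    if key not in indexed.index:
        return -1
    # Selecting with a list of keys always yields a DataFrame, even for one match.
    rows = indexed.loc[[key]]
    agreed = str(rows["agreed_Answer"].iloc[0])
    return int(agreed) if agreed.isdigit() else -1


print(get_consensus_indexed("aaa", "t1", "s1", 1))  # 1
print(get_consensus_indexed("aaa", "t1", "s1", 2))  # -1 ("L" is not a number)
print(get_consensus_indexed("zzz", "t1", "s1", 1))  # -1 (key not present)
```

Whether this is actually faster depends on how many lookups we do. For a handful of calls the boolean-mask version is probably fine.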

In [63]:
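def get_consensus(article_sha256, tua_uuid, schema_sha256, question_Number):
    # Select the rows matching the four-part answer identifier.
    answer_row = iaa.loc[(iaa["article_sha256"] == article_sha256)
                         & (iaa["tua_uuid"] == tua_uuid)
                         & (iaa["schema_sha256"] == schema_sha256)
                         & (iaa["question_Number"] == question_Number)]

    # No matching row means there is no consensus to report.
    if answer_row.empty:
        return -1

    # agreed_Answer is numeric on consensus and a placeholder such as "L" otherwise.
    # str() guards against pandas having parsed the column as integers.
    agreed = str(answer_row["agreed_Answer"].iloc[0])
    return int(agreed) if agreed.isdigit() else -1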

In [64]:
get_consensus('4b537e0ed21179a29ed28da28057d338e67330ae12123ccceba6724f35bd68a4',
              'a23b77bd-9eae-4050-9b5d-e38ecb126226',
              '45dce5251bd3ea6e908fa33ac9e6a8e17e6830215912ce1626cf4206e159819c',
              1)

1
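One thing to watch: the three demo rows above appear to share the same four key values, and `get_consensus` only inspects the first match. If a placeholder row happened to come first, the function would return -1 even though a later row holds a numeric answer. A hedged sketch of a variant that scans every matching row follows. The `demo` frame and the name `get_all_consensus` are made up for illustration, and returning a list is my assumption about what would be useful, not something IAA specifies.

```python
import pandas as pd

# Made-up rows: several rows share one key, and the numeric answer is not first.
demo = pd.DataFrame({
    "article_sha256": ["aaa", "aaa", "aaa"],
    "tua_uuid": ["t1", "t1", "t1"],
    "schema_sha256": ["s1", "s1", "s1"],
    "question_Number": [1, 1, 1],
    "agreed_Answer": ["L", "2", "L"],
})


def get_all_consensus(df, article_sha256, tua_uuid, schema_sha256, question_Number):
    """Return every numeric agreed_Answer for the key as a list of ints."""
    matches = df.loc[(df["article_sha256"] == article_sha256)
                     & (df["tua_uuid"] == tua_uuid)
                     & (df["schema_sha256"] == schema_sha256)
                     & (df["question_Number"] == question_Number)]
    # Cast to str so the digit check works whether pandas parsed ints or strings.
    answers = matches["agreed_Answer"].astype(str)
    return [int(a) for a in answers if a.isdigit()]


print(get_all_consensus(demo, "aaa", "t1", "s1", 1))  # [2]
print(get_all_consensus(demo, "zzz", "t1", "s1", 1))  # []
```

An empty list plays the role of the -1 sentinel here, and a list would also cover the case where more than one row carries an agreed answer.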

Function that takes in **article_sha256** (string), **tua_uuid** (string), **schema_sha256** (string), and **question_Number** (int), and returns a **tuple containing the question type and number of answer choices**, which will help with scoring questions.

If question_type and num_answer_choices work as expected, we shouldn't need to hardcode the question schema data anymore.

In [65]:
def get_question_meta(article_sha256, tua_uuid, schema_sha256, question_Number):
    answer_row = iaa.loc[(iaa["article_sha256"] == article_sha256)
                         & (iaa["tua_uuid"] == tua_uuid)
                         & (iaa["schema_sha256"] == schema_sha256)
                         & (iaa["question_Number"] == question_Number)]
    
    return (answer_row["question_type"].iloc[0], answer_row["num_answer_choices"].iloc[0])

In [66]:
get_question_meta('4b537e0ed21179a29ed28da28057d338e67330ae12123ccceba6724f35bd68a4',
              'a23b77bd-9eae-4050-9b5d-e38ecb126226',
              '45dce5251bd3ea6e908fa33ac9e6a8e17e6830215912ce1626cf4206e159819c',
              1)

('checklist', 3)
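If we end up needing the metadata for many questions at once, it may be simpler to derive a lookup table from the frame in one pass rather than calling `get_question_meta` per question. Here is a rough sketch, assuming the question type and number of choices depend only on the schema and question number. The `demo` frame and the name `build_question_meta` are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the IAA output; only the columns used here are included.
demo = pd.DataFrame({
    "schema_sha256": ["s1", "s1", "s1", "s1"],
    "question_Number": [1, 1, 2, 2],
    "question_type": ["checklist", "checklist", "ordinal", "ordinal"],
    "num_answer_choices": [3, 3, 5, 5],
})


def build_question_meta(df):
    """Map (schema_sha256, question_Number) to (question_type, num_answer_choices)."""
    # Assumption: metadata is identical across rows with the same schema and question.
    unique_rows = df.drop_duplicates(subset=["schema_sha256", "question_Number"])
    return {
        (row.schema_sha256, int(row.question_Number)):
            (row.question_type, int(row.num_answer_choices))
        for row in unique_rows.itertuples(index=False)
    }


meta = build_question_meta(demo)
print(meta[("s1", 1)])  # ('checklist', 3)
print(meta[("s1", 2)])  # ('ordinal', 5)
```

The resulting dict could stand in for the hardcoded schema data, provided the assumption about per-schema metadata holds for the real IAA files.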