- /annotation: contains all MTurk annotation templates
- /data: contains all data folders for train, dev, test sets
- /models: contains all lightning modules and our pretrained BART model
  - EmpathicSimilarityModel: takes in a story pair (2 stories) and fine tunes on empathic similarity score
  - EmpathicSummaryModel: takes in a single story and fine tunes on empathy reasons (main event + emotion description + moral)
- /config: contains yaml config files for different model training settings
- /user_study: contains the frontend and server side code for our user study interface
- dataset.py: contains the dataloaders
- special_tokens.py: definitions of special tokens
- trainer.py: contains training code and input of config files for different model training settings
- utils.py: contains extra model utilities
- evaluator.py: contains an evaluation class to compute all evaluation metrics

- Data Source: which data source the story came from
- story: raw text of the story
- story_formatted: the story formatted with breaks
- story_summary: ChatGPT summarized story
- comments: (if pulled from social media), top level comments to the story
- url: (if pulled from social media), the original url of the story
- post_id: (if pulled from social media), the original id of the story
- post_time: (if pulled from social media), the time the story was posted
- post_score: (if pulled from Reddit), the score of the post
- toxicity_score: toxicity score rated by Detoxify
- WorkerId: worker ID of annotator
- LifetimeApprovalRate: annotator's lifetime approval rate
- AcceptTime: when the annotator accepted the HIT
- SubmitTime: when the annotator submitted the HIT
- WorkTimeInSeconds: how long the annotator took for the HIT
- Age: annotator age
- Gender: annotator gender
- Race: annotator race
- Arousal: annotator's arousal before the task (1-10)
- Valence: annotator's valence before the task (1-10)
- Main Event: main event of the story as rated by human annotator
- Emotion Description: emotion of the story as rated by human annotator
- Moral: moral of the story as rated by human annotator
- Empathy Reasons: reasons why people may empathize with the story as rated by human annotator
- Main Event (gpt3.5): main event of the story as rated by ChatGPT
- Emotion Description (gpt3.5): emotion of the story as rated by ChatGPT
- Moral (gpt3.5): moral of the story as rated by ChatGPT
- Empathy Reasons (gpt3.5): reasons why people may empathize with the story as rated by ChatGPT
- Empathizable: how generally "empathizable" the story is
- Well-Written: how well-written the story is
- fake_score: how likely the post is written by AI tools, as predicted by the Writer AI Content Detector
- num_sentences: number of sentences in the story
- num_words: number of words in the story
- num_sentences_event: number of sentences in the event
- num_words_event: number of words in the event
- num_sentences_emotion: number of sentences in the emotion
- num_words_emotion: number of words in the emotion
- num_sentences_moral: number of sentences in the moral
- num_words_moral: number of words in the moral
- num_sentences_empathy_reasons: number of sentences in the empathy reasons
- num_words_empathy_reasons: number of words in the empathy reasons
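
As an illustration of how the story columns above might be used, here is a minimal sketch that keeps only stories with a low `toxicity_score` and a low `fake_score`. It assumes both scores lie between 0 and 1; the cutoff values, the `filter_stories` helper, and the sample rows are hypothetical and are not part of this repository.

```python
import csv
import io


def filter_stories(rows, max_toxicity=0.5, max_fake=0.5):
    """Keep rows whose toxicity_score and fake_score are at or below the cutoffs.

    The cutoffs are illustrative assumptions, not values used by the authors.
    Rows with a missing or non-numeric score are skipped.
    """
    kept = []
    for row in rows:
        try:
            toxicity = float(row["toxicity_score"])
            fake = float(row["fake_score"])
        except (KeyError, ValueError):
            continue
        if toxicity <= max_toxicity and fake <= max_fake:
            kept.append(row)
    return kept


# Made-up rows for illustration only; real story files have many more columns.
SAMPLE = """story,toxicity_score,fake_score,num_words
Made-up story one,0.02,0.10,120
Made-up story two,0.91,0.05,80
Made-up story three,0.03,0.97,200
Made-up story four,0.04,,95
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = filter_stories(rows)
print([row["story"] for row in kept])
```

To run this on a real story file, replace the `io.StringIO(SAMPLE)` argument with an opened CSV file from the data folder.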

- pairs: pair ID (matches with story file index)
- binned: which sampled bin the pair belongs to (based on SBERT sampling)
- story_A: first story in story pair
- story_B: second story in story pair
- story_A_summary: summary of first story in story pair
- story_B_summary: summary of second story in story pair
- Empathic Similarity (gpt3.5): empathic similarity score as rated by ChatGPT
- Empathic Similarity Binned (gpt3.5): binned empathic similarity score as rated by ChatGPT
- Empathic Similarity Reasons (gpt3.5): reasons why two stories are empathically similar as rated by ChatGPT
- similarity_empathy_human_AGG: empathic similarity score as rated by human annotators
- similarity_event_human_AGG: event similarity score as rated by human annotators
- similarity_emotion_human_AGG: emotion similarity score as rated by human annotators
- similarity_moral_human_AGG: moral similarity score as rated by human annotators
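
The pair columns above make it possible to compare the ChatGPT ratings against the human ratings. The sketch below reads the `pairs`, `Empathic Similarity (gpt3.5)`, and `similarity_empathy_human_AGG` columns and reports the mean absolute gap between the two scores. It assumes both scores are numeric and on the same scale; the helper names and the sample rows are hypothetical and only serve the example.

```python
import csv
import io

GPT_COL = "Empathic Similarity (gpt3.5)"
HUMAN_COL = "similarity_empathy_human_AGG"


def load_scores(fileobj):
    """Return a list of (pair_id, gpt_score, human_score) tuples from a pairs CSV."""
    scores = []
    for row in csv.DictReader(fileobj):
        scores.append((row["pairs"], float(row[GPT_COL]), float(row[HUMAN_COL])))
    return scores


def mean_abs_gap(scores):
    """Average absolute difference between the ChatGPT and human scores."""
    if not scores:
        raise ValueError("no score pairs to compare")
    return sum(abs(gpt - human) for _, gpt, human in scores) / len(scores)


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """pairs,Empathic Similarity (gpt3.5),similarity_empathy_human_AGG
0,3.0,2.5
1,1.0,2.0
2,4.0,4.0
"""

scores = load_scores(io.StringIO(SAMPLE))
gap = mean_abs_gap(scores)
print(f"mean absolute gap over {len(scores)} pairs: {gap:.2f}")
```

A plain gap is used here only because it is easy to check by hand; a rank correlation would be a reasonable alternative for the same comparison.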