Survey Question Generator
---
Author: Peter Zhang

This notebook generates a `.json` file of questions that is added to the Qualtrics survey. Make sure to run `create_dataset.py` first to create a dataset of MMLU questions.

In [12]:
import json
from os.path import join
import random

import pandas as pd
import yaml

import config

# Data

Load the results of the 02/18/2023 scratchpad evaluation.

In [13]:
results_folder = "../data/model_output"
results_file = "results_scratchpad_0218.csv"
results_df = pd.read_csv(join(results_folder, results_file))
# subset to only target topics; copy so the column assignments below act on an independent frame
results_df = results_df[results_df["topic"].isin(config.topics)].copy()

## Questions

We replace every single quote (`'`) with a backtick (`` ` ``) to avoid a JavaScript parse error when uploading.

In [14]:
columns = [
    "question",
    "choice_A",
    "choice_B",
    "choice_C",
    "choice_D",
    "answer",
    "justification",
    "correct_answer",
    "topic"
]
for col in columns:
    # regex=False makes the literal (non-pattern) replacement explicit
    results_df[col] = results_df[col].str.replace("'", "`", regex=False)
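
As a quick sanity check on the replacement above, the sketch below applies the same substitution to one made-up row using plain Python strings instead of a dataframe. The row contents and the `clean_row` helper are hypothetical and only illustrate the effect on the serialized JSON.

```python
import json


def clean_row(row):
    """Replace single quotes with backticks in every field of a row."""
    return [field.replace("'", "`") for field in row]


# Hypothetical row (invented text, not taken from the MMLU dataset).
row = ["What does Newton's first law describe?", "An object's inertia", "Gravity"]
cleaned = clean_row(row)

# Serialize the way the notebook does, keyed by a question index.
payload = json.dumps({"0": cleaned})
print(payload)
```

One side effect to keep in mind: any text that legitimately contains an apostrophe will show a backtick in the survey.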

We remap the dataframe indices to survey question IDs using `survey_remap/mapping.json`, because the original version of the survey was ordered by hand.

In [15]:
with open("survey_remap/mapping.json") as f:
    mapping = json.load(f)
qualtrics_json = {}
for i, row in results_df.iterrows():
    qid = mapping[str(i)]
    qualtrics_json[qid] = list(row[columns])
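
Note that `i` from `iterrows()` is the dataframe's index label, so the keys of `mapping.json` need to match those labels as strings. The following is a minimal sketch of the remapping on toy data. The mapping values, the row contents, and the `remap_rows` helper are assumptions for illustration and do not reflect the real `mapping.json`.

```python
def remap_rows(rows, mapping):
    """Re-key rows by survey question id, failing loudly on bad mappings."""
    missing = [i for i in rows if str(i) not in mapping]
    if missing:
        raise KeyError(f"no mapping for index labels: {missing}")
    remapped = {mapping[str(i)]: row for i, row in rows.items()}
    if len(remapped) != len(rows):
        # Two index labels pointing at the same id would silently overwrite a question.
        raise ValueError("mapping assigns the same id to more than one row")
    return remapped


# Hypothetical index labels (with gaps, as left by filtering) and target ids.
rows = {0: ["Q-a"], 5: ["Q-b"], 7: ["Q-c"]}
mapping = {"0": "2", "5": "0", "7": "1"}
qualtrics_json = remap_rows(rows, mapping)
print(qualtrics_json)
```

The two checks are optional, but they surface a missing or duplicated entry before the file is uploaded.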

Qualtrics shows questions in index order, so we produce three orderings of the questions, one per survey phase. This version is the **Phase 1** file.

In [16]:
phase1_qualtrics_file = "../data/survey_questions/qualtrics_0413.json"
with open(phase1_qualtrics_file, "w") as f:
    json.dump(qualtrics_json, f)

For the **Phase 2** version, we swap the first and second halves of the dataframe and re-index the rows from 0.

In [17]:
# switch the first and second half of results dataframe
half = results_df.shape[0] // 2
results_df = pd.concat([results_df.iloc[half:], results_df.iloc[:half]])
results_df.reset_index(drop=True, inplace=True)
qualtrics_json = {}
for i, row in results_df.iterrows():
    qualtrics_json[str(i)] = list(row[columns])
phase2_qualtrics_file = "../data/survey_questions/qualtrics_0417.json"
with open(phase2_qualtrics_file, "w") as f:
    json.dump(qualtrics_json, f)
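
The half swap can be illustrated on a plain list. This sketch assumes nothing about the real data; `swap_halves` is a hypothetical helper that mirrors the `iloc` slicing above, including its behavior when the number of rows is odd.

```python
def swap_halves(items):
    """Move the second half of a sequence in front of the first half."""
    half = len(items) // 2  # floor division, as in the dataframe version
    return items[half:] + items[:half]


even = swap_halves([0, 1, 2, 3])
odd = swap_halves([0, 1, 2, 3, 4])
print(even)  # [2, 3, 0, 1]
print(odd)   # [2, 3, 4, 0, 1]
```

With an odd number of rows, the extra row travels with the block that moves to the front.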

Finally, for the **Phase 3** version, we randomly shuffle the Phase 2 ordering, using a fixed seed so the result is reproducible.

In [18]:
random.seed(0)
values = list(qualtrics_json.values())
random.shuffle(values)
questions_json = dict(zip(qualtrics_json.keys(), values))
phase3_qualtrics_file = "../data/survey_questions/qualtrics_0421.json"
with open(phase3_qualtrics_file, "w") as f:
    json.dump(questions_json, f)
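
Because only the values are shuffled while the keys stay in place, every index still maps to exactly one question. A minimal sketch of that invariant follows. `shuffle_values` is a hypothetical helper, and it uses a local `random.Random` instance rather than the global `random.seed` call above so that the global random state is left untouched; the toy questions are invented.

```python
import random


def shuffle_values(questions, seed=0):
    """Return a new dict with the same keys and the values in shuffled order."""
    rng = random.Random(seed)  # local generator, seeded for reproducibility
    values = list(questions.values())
    rng.shuffle(values)
    return dict(zip(questions.keys(), values))


# Toy stand-in for qualtrics_json.
questions = {str(i): [f"question {i}"] for i in range(6)}
shuffled = shuffle_values(questions)

# Same keys, same set of questions, and the same result on every run.
print(list(shuffled.keys()) == list(questions.keys()))
print(sorted(shuffled.values()) == sorted(questions.values()))
print(shuffled == shuffle_values(questions))
```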