# Generate Qualtrics input file

This notebook shows how we generated the text files containing all the questions to import into Qualtrics.

**Input**
The input to the script below consists of two files, both provided by the original authors:

- `./Resources/definitions.json` contains all automatically generated definitions.
- `./Resources/terms.json` contains all information about the terms.
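The notebook only relies on a few fields in these files. Each term has an `id`, a `term_text` and a `category`. Each definition has an `id`, a `term_id` pointing to its term, and the generated `def_text`. The sketch below joins toy records the same way Step 2 does. The records and their values are invented, and whether the real IDs are integers or strings is an assumption. The orphan check is a hypothetical addition, not part of the notebook.

```python
# Toy records using the field names the notebook reads.
# All values below are invented for illustration.
terms = [
    {"id": 1, "term_text": "Acanthoma", "category": "medicine"},
    {"id": 2, "term_text": "Transformer", "category": "computer science"},
]
definitions = [
    {"id": "d1", "term_id": 2, "def_text": "A neural network architecture."},
    {"id": "d2", "term_id": 1, "def_text": "A skin neoplasm."},
]

# Index terms by their ID so each lookup is a single dict access.
term_index = {term["id"]: term for term in terms}

# Hypothetical sanity check: every definition must point to a known term,
# otherwise the join below would raise a KeyError.
orphans = [d["id"] for d in definitions if d["term_id"] not in term_index]
assert not orphans, f"definitions without a term: {orphans}"

# Copy the term text and category onto each definition.
for entry in definitions:
    term = term_index[entry["term_id"]]
    entry["term_text"] = term["term_text"]
    entry["category"] = term["category"]

print(definitions[0]["term_text"], "|", definitions[0]["def_text"])
# Transformer | A neural network architecture.
```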

**Output**
Our script produces text files in the Qualtrics Advanced Format, which can be imported directly into Qualtrics. Generating the questions programmatically avoids the transcription errors that entering them by hand would introduce. Some manual work remains, though, because this format does not let us specify the flow of the questionnaire; that still has to be set up by hand in Qualtrics.

## Step 1: setting the stage

We import two modules from the standard library and define a function that splits the data into smaller lists, so that the annotation work can be divided over multiple sessions. We also randomise the order of the items, so that annotators do not label the data on a system-by-system basis.

In [1]:
import json
import random


def chunks(lst, n):
    """
    Yield successive n-sized chunks from lst.
    
    Source: https://stackoverflow.com/a/312464
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
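A quick check of how `chunks` behaves: the instructions mention 300 terms, and Step 3 splits the data into ten chunks of 30 items, which is exactly what a chunk size of 30 gives. The list of integers below is only a stand-in for the real definitions.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst (same as the notebook)."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

items = list(range(300))           # stand-in for the 300 items
parts = list(chunks(items, 30))

print(len(parts))                  # 10
print({len(p) for p in parts})     # {30}

# Chunking neither drops nor duplicates items: concatenating restores the list.
print([x for p in parts for x in p] == items)  # True

# When the length is not a multiple of n, the last chunk is simply shorter.
print([len(p) for p in chunks(list(range(65)), 30)])  # [30, 30, 5]
```

Because `chunks` slices rather than copies element by element, each chunk is a new list, and the original list is left unchanged.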

## Step 2: loading the data and randomisation

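We load both files, copy each term's text and category onto its definitions, and randomise the order of the items. We fix the random seed so that the ordering is reproducible, and then split the shuffled list into chunks of 30 items.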

In [2]:
####################################################
# Load the original data.

with open("Resources/definitions.json", encoding="utf-8") as f:
    definitions = json.load(f)

with open("Resources/terms.json", encoding="utf-8") as f:
    terms = json.load(f)

####################################################
# Organise data:

term_index = {term['id']: term for term in terms}

# Enrich definitions:

for entry in definitions:
    term = term_index[entry['term_id']]
    entry['term_text'] = term['term_text']
    entry['category']  = term['category']

####################################################
# Prepare for experiment

random.seed(1)
random.shuffle(definitions)
all_chunks = list(chunks(definitions, 30))
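`random.seed(1)` reseeds the module-level generator, so rerunning the cell reproduces the same order, as long as nothing else draws random numbers between seeding and shuffling. A variant that avoids global state uses a dedicated `random.Random` instance; the helper `shuffled` below is hypothetical and not part of the notebook. Because the module-level functions are methods of one hidden `Random` instance, both approaches give the same order for the same seed. The Python documentation only guarantees that `random()` itself repeats across versions, so the exact shuffled order could in principle differ between Python versions.

```python
import random

def shuffled(items, seed):
    """Return a shuffled copy of items, leaving the input untouched."""
    rng = random.Random(seed)  # private generator: no global state involved
    result = list(items)
    rng.shuffle(result)
    return result

ids = [f"def{i}" for i in range(10)]  # stand-in for the definition records

first = shuffled(ids, seed=1)
second = shuffled(ids, seed=1)
print(first == second)               # True: same seed, same order
print(sorted(first) == sorted(ids))  # True: a permutation, nothing lost

# Same order as the notebook's approach with the global generator.
random.seed(1)
global_order = list(ids)
random.shuffle(global_order)
print(global_order == first)         # True
```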

## Step 3: preparing and outputting the questionnaire

Here we specify the format of the questionnaire and write it out in the Qualtrics Advanced Format, a plain-text format that Qualtrics can import.

* We add two questions for task management purposes:
    1. **Participant ID.** We assign IDs to the annotators, which they should enter each time they carry out their work. This enables us to study rater effects.
    2. **Task selection.** We ask annotators which set of items they want to work on, since the data has been split up into ten chunks of 30 items.
    
* We use one trick to get the rating scale we want. Qualtrics does not accept empty answer labels in its import format, and a label consisting only of whitespace is skipped. To leave the intermediate points of the scale unlabelled, we therefore use the invisible Unicode character U+2062 (INVISIBLE TIMES) as their label.
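Python's notion of whitespace illustrates the distinction: a space disappears under `str.strip()`, while U+2062 is a format character rather than whitespace and survives. The helper `choices_block` is hypothetical; it rebuilds the `[[AdvancedChoices]]` part of the question template from a list of labels. That Qualtrics judges whitespace the same way Python does is an assumption; the notebook only reports that a space is skipped and that U+2062 works.

```python
import unicodedata

INVISIBLE = "\u2062"  # the character used for the unlabelled scale points

print(unicodedata.name(INVISIBLE))  # INVISIBLE TIMES
print(repr(" ".strip()))            # '' : a space-only label becomes empty
print(INVISIBLE.isspace())          # False: not whitespace, so strip() keeps it


def choices_block(labels):
    """Hypothetical helper: render [[AdvancedChoices]] lines, substituting
    the invisible character for any empty or whitespace-only label."""
    lines = ["[[AdvancedChoices]]"]
    for k, label in enumerate(labels, start=1):
        lines.append(f"[[Choice:{k}]]")
        lines.append(label if label.strip() else INVISIBLE)
    return "\n".join(lines)


print(choices_block(["Not at all", "", " ", "Very"]))
```

Generating the choices this way keeps the invisible character in one named constant, instead of pasting a character into the template that no editor displays.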

In [3]:
####################################################
# Prepare answer template.

preface = "[[AdvancedFormat]]"

# Instructions based on the screenshot provided by the authors.
intro = """[[Question:Text]]
<h1>Instructions</h1><br />

<p>You will be given 300 terms with their definitions and asked to rate how fluent the definitions are.</p>

<p>You will be asked to rate how fluent the definition is on a scale from <b>Not at all</b> to <b>Very</b>.</p>

<br />
<p>Examples of very fluent definitions:</p>
<p><b>Term</b>:  Acanthoma</p>
<p><b>Definition</b>:  An acanthoma is a skin neoplasm composed of squamous or epidermal cells.  It is located in the prickle cell layer.</p>

<br />
<p><b>Term</b>:  Transformer</p>
<p><b>Definition</b>:  The Transformer is a deep learning model architecture relying entirely on an attention mechanism to draw global dependencies between input and output.</p>

<br />
<p>Examples of not at all fluent definitions:</p>
<p><b>Term</b>:  Acanthoma</p>
<p><b>Definition</b>:  Broad Line Region.</p>

<br />
<p><b>Term</b>:  Transformer</p>
<p><b>Definition</b>:  Transformer attention rely.</p>
"""

single_questions = """[[Question:TE:SingleLine]]
[[ID:participant_id]]
What is your participant ID?

[[Question:MC:Dropdown]]
[[ID:list_choice]]
What set of items would you like to work on?
[[Choices]]
1
2
3
4
5
6
7
8
9
10
"""


# The original survey has two labeled end points, but no intermediate labels.
# Qualtrics does not offer an easy way to have empty labels for answers.
# If you enter a space, it just skips the answer.
# But... if you add an invisible unicode character it does work!
# So we use the invisible character \u2062 (INVISIBLE TIMES) as the label of the two middle choices.
# Qualtrics processes this character, but the browser does not render it.

question_template = """[[Question:MC:SingleAnswer:Horizontal]]
[[ID:{qid}]]
<p>Please rate the fluency of the definition on a scale from <b>Not at all</b> to <b>Very</b>. 
If a definition's text only says 'nan', please rate it as Not fluent at all.</p>
<hr>
<p><b>Term:</b> {term}</p>
<p><b>Definition:</b> {definition}</p>
<hr>
<p>How fluent is this definition?</p>
[[AdvancedChoices]]
[[Choice:1]]
Not at all
[[Choice:2]]
⁢
[[Choice:3]]
⁢
[[Choice:4]]
Very
"""

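####################################################
# Write question files.

import os

# open() does not create missing directories, so create the output folders first.
os.makedirs("Questions/Reference", exist_ok=True)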
block = "[[Block]]\n\n"
single_questionnaire = [preface, single_questions, intro]
for i, chunk in enumerate(all_chunks, start=1):
    questionnaire = [preface]
    single_questionnaire.append(block)
    for item in chunk:
        text = question_template.format_map({"qid": item['id'], 
                                             "term": item['term_text'],
                                             "definition": item['def_text']})
        questionnaire.append(text)
        single_questionnaire.append("[[PageBreak]]")
        single_questionnaire.append(text)

    full_text = "\n\n".join(questionnaire)

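    # Write UTF-8 explicitly: the labels contain U+2062, which platform
    # default encodings such as cp1252 cannot represent.
    with open(f'Questions/Reference/questions_chunk{i}.txt', 'w', encoding='utf-8') as f:
        f.write(full_text)

with open('Questions/all_questions.txt', 'w', encoding='utf-8') as f:
    full_text = "\n\n".join(single_questionnaire)
    f.write(full_text)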
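Because the survey flow has to be set up by hand after import, it can help to check the structure of the written file before uploading it. The hypothetical helper `questions_per_block` below counts the `[[Question:...]]` headers that precede the first `[[Block]]` line and those inside each block after it; here it runs on a small in-memory sample.

```python
import re

# A question header starts a line; a block header is [[Block]] or [[Block:name]].
QUESTION_HEADER = re.compile(r"^\[\[Question:", re.MULTILINE)
BLOCK_HEADER = re.compile(r"^\[\[Block(?::[^\]]*)?\]\][ \t]*$", re.MULTILINE)


def questions_per_block(text):
    """Return the number of question headers before the first [[Block]]
    line, followed by the number inside each subsequent block."""
    sections = BLOCK_HEADER.split(text)
    return [len(QUESTION_HEADER.findall(s)) for s in sections]


# Toy input in the same shape as all_questions.txt: one question before the
# first block, then two blocks of two questions each.
sample = "\n\n".join([
    "[[AdvancedFormat]]",
    "[[Question:TE:SingleLine]]\n[[ID:participant_id]]\nWhat is your participant ID?",
    "[[Block]]\n\n",
    "[[PageBreak]]",
    "[[Question:MC:SingleAnswer:Horizontal]]\n[[ID:a]]\nQ1",
    "[[PageBreak]]",
    "[[Question:MC:SingleAnswer:Horizontal]]\n[[ID:b]]\nQ2",
    "[[Block]]\n\n",
    "[[Question:MC:SingleAnswer:Horizontal]]\n[[ID:c]]\nQ3",
    "[[Question:MC:SingleAnswer:Horizontal]]\n[[ID:d]]\nQ4",
])

print(questions_per_block(sample))  # [1, 2, 2]
```

Applied to `Questions/all_questions.txt` (read with `encoding="utf-8"`), the code above should yield `[3] + [30] * 10`: the participant-ID, list-choice and instruction questions before the first block, then one block of 30 questions per chunk.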