# **Harmony**: Questionnaire Parsing Algorithm Improvement Challenge

**NLP challenge** | [Visit the challenge page](https://doxaai.com/competition/harmony-parsing)

Your challenge is to develop an improved algorithm for identifying mental health survey questions and selectable answers in plain text that can be integrated into the [Harmony tool](https://harmonydata.ac.uk/developer-guide/).

This Jupyter notebook will introduce you to the challenge and guide you through the process of making your first submission to the [DOXA AI platform](https://doxaai.com/competition/harmony-parsing).

**Before you get started, make sure to [sign up for an account](https://doxaai.com/sign-up) if you do not already have one and [enrol to take part](https://doxaai.com/competition/harmony-parsing) in the challenge.**

**If you have any questions, feel free to ask them in the [Harmony community Discord server](https://discord.com/invite/harmonydata).**


## Installing and importing useful packages

Before you get started, please make sure you have [PyTorch](https://pytorch.org/get-started/locally/) installed in your Python environment. If you do not have `pandas`, `transformers`, `intervaltree` or `datasets`, uncomment the code in the following cell to install them.


In [None]:
# %pip install "pandas>=2.2.2" "transformers>=4.43.1" "intervaltree>=3.1.0" "datasets>=3.0.2"

In [None]:
# Install the latest version of the DOXA CLI
%pip install -U doxa-cli


In [1]:
import os
import json

import pandas as pd

pd.set_option("display.max_colwidth", None)

## Loading the data

In [3]:
with open("train_raw.txt", encoding="utf8") as f:
    raw_train = f.read()

with open("train_clean.txt", encoding="utf8") as g:
    clean_train = g.read()

with open("train_labels.json", encoding="utf8") as h:
    labels_train = json.load(h)

## Exploring the data

Let's get started by taking a look at what the data looks like. The data comes in two forms:
- **The raw plain text** where questions and answers have been manually tagged with `<q>`/`</q>` and `<a>`/`</a>` by the Harmony team
- **A clean version** where the tags have been removed (with the question and answer ranges provided separately)

In [4]:
print(raw_train[:515])

Tampa Scale for Kinesiophobia
(Miller , Kori and Todd 1991)
1 = <a>strongly disagree</a>
2 = <a>disagree</a>
3 = <a>agree</a>
4 = <a>strongly agree</a>
1. <q>I'm afraid that I might injury myself if I exercise</q> 1 2 3 4
2. <q>If I were to try to overcome it, my pain would
increase</q>
1 2 3 4
3. <q>My body is telling me I have something
dangerously wrong</q>
1 2 3 4
4. <q>My pain would probably be relieved if I were to
exercise</q>
1 2 3 4
5. <q>People aren't taking my medical condition
seriously enough</q>



In [5]:
print(clean_train[:451])

Tampa Scale for Kinesiophobia
(Miller , Kori and Todd 1991)
1 = strongly disagree
2 = disagree
3 = agree
4 = strongly agree
1. I'm afraid that I might injury myself if I exercise 1 2 3 4
2. If I were to try to overcome it, my pain would
increase
1 2 3 4
3. My body is telling me I have something
dangerously wrong
1 2 3 4
4. My pain would probably be relieved if I were to
exercise
1 2 3 4
5. People aren't taking my medical condition
seriously enough


The `labels_train` dictionary has starting indexes (inclusive) and ending indexes (exclusive) for the clean text that correspond to the tagged questions and answers in the raw text. For example, to pick out the first question in the raw text `"I'm afraid that I might injury myself if I exercise"`, you can do the following:

In [6]:
start, end = labels_train["q"][0]

clean_train[start:end]

"I'm afraid that I might injury myself if I exercise"
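To make the relationship between the raw and clean text concrete, here is a minimal sketch of how half-open `[start, end)` ranges over the clean text could be derived from tagged text. The `strip_tags` helper and the sample string are hypothetical illustrations; this is not necessarily how `train_clean.txt` and `train_labels.json` were produced, and it assumes the tags are well formed.

```python
import re

# Matches the opening and closing question/answer tags.
TAG_PATTERN = re.compile(r"</?([qa])>")


def strip_tags(raw):
    """Remove <q>/<a> tags and return (clean_text, labels).

    labels maps "q" and "a" to lists of [start, end) index pairs
    that refer to positions in the clean text.
    """
    clean_parts = []
    labels = {"q": [], "a": []}
    open_starts = {}  # tag kind -> clean-text index where the tag opened
    clean_len = 0
    last = 0

    for match in TAG_PATTERN.finditer(raw):
        chunk = raw[last:match.start()]
        clean_parts.append(chunk)
        clean_len += len(chunk)

        kind = match.group(1)
        if match.group(0).startswith("</"):
            labels[kind].append([open_starts.pop(kind), clean_len])
        else:
            open_starts[kind] = clean_len
        last = match.end()

    clean_parts.append(raw[last:])
    return "".join(clean_parts), labels


# Hypothetical sample text, tagged in the same style as the raw data.
sample_raw = "1 = <a>agree</a>\n1. <q>I feel calm</q> 1 2"
sample_clean, sample_labels = strip_tags(sample_raw)

q_start, q_end = sample_labels["q"][0]
print(sample_clean[q_start:q_end])
```

Slicing the clean text with a returned range recovers the tagged span, in the same way as the `labels_train` lookup above.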

To make it significantly faster to query whether a word in a certain range is a question or an answer, we will build up two interval trees:

In [7]:
%pip install intervaltree



In [8]:
from intervaltree import Interval, IntervalTree

tree_q = IntervalTree(
    Interval(start, end) for start, end in labels_train["q"] if start != end
)

tree_a = IntervalTree(
    Interval(start, end) for start, end in labels_train["a"] if start != end
)

In [9]:
labels_train['q'][0]

[127, 178]

In [10]:
tree_q.overlap(156, 200)

{Interval(127, 178), Interval(190, 245)}
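For intuition, the query above behaves like the linear scan sketched below, which the interval tree replaces with a faster lookup. The `naive_overlap` helper is a hypothetical name used only for illustration; it applies the usual half-open overlap test to the two intervals shown in the output above.

```python
def naive_overlap(intervals, begin, end):
    """Return every half-open interval [s, e) that intersects [begin, end)."""
    return {(s, e) for s, e in intervals if s < end and begin < e}


# The two question intervals returned by the tree query above.
intervals = [(127, 178), (190, 245)]

hits = naive_overlap(intervals, 156, 200)  # touches both questions
gap = naive_overlap(intervals, 178, 190)   # falls between the two questions

print(hits)
print(gap)
```

Because the ranges are half-open, a query that starts exactly where one interval ends (index 178 here) does not match that interval.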

## Tokenising the text

Now, we'll tokenise the clean text and match up the question and answer ranges so that we can fine-tune a pre-trained DistilBERT model for our task. DistilBERT accepts at most 512 tokens per input, so we also have to split the training text into smaller, overlapping chunks at the same time.
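As a rough illustration of that chunking, the sketch below computes which token positions each chunk would cover. It assumes that the two special tokens (`[CLS]` and `[SEP]`) count towards the maximum length and that the stride is the number of tokens shared by consecutive chunks. The `chunk_windows` helper is hypothetical and only approximates what the tokeniser does internally.

```python
def chunk_windows(n_tokens, max_length=512, stride=32, n_special=2):
    """Return (start, end) token ranges for overlapping chunks.

    Assumption: each chunk holds max_length - n_special content tokens,
    and consecutive chunks share `stride` tokens.
    """
    capacity = max_length - n_special  # content tokens per chunk
    step = capacity - stride           # how far each new chunk advances

    windows = []
    start = 0
    while True:
        end = min(start + capacity, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break
        start += step
    return windows


windows = chunk_windows(1000)
for start, end in windows:
    print(start, end)
```

Under these assumptions, a larger stride gives each chunk more surrounding context at the cost of producing more chunks for the same text.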

In [11]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

tokenizer


DistilBertTokenizerFast(name_or_path='distilbert/distilbert-base-uncased', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
)

In [12]:
label_list = ["other", "question", "answer"]

id2label = {k: v for k, v in enumerate(label_list)}
label2id = {v: k for k, v in enumerate(label_list)}

In [13]:
MAX_LENGTH = 512
STRIDE = 32


def tokenize(text, tokenizer, tree_q, tree_a):
    encodings = tokenizer(
        text,
        return_offsets_mapping=True,
        return_overflowing_tokens=True,
        truncation=True,
        max_length=MAX_LENGTH,
        stride=STRIDE,
        add_special_tokens=True,  # Includes the [CLS] and [SEP] tokens
    )

    all_token_labels = []
    for batch_index, (input_ids, offsets) in enumerate(
        zip(encodings["input_ids"], encodings["offset_mapping"])
    ):
        word_ids = encodings.word_ids(batch_index=batch_index)

        token_labels = []
        current_word_idx = None

        for word_id, (start, end) in zip(word_ids, offsets):
            if word_id is None:  # Special tokens like [CLS] or [SEP]
                token_labels.append(-100)
            elif word_id != current_word_idx:  # New word
                if len(tree_q.overlap(start, end)) > 0:
                    label = "question"
                elif len(tree_a.overlap(start, end)) > 0:
                    label = "answer"
                else:
                    label = "other"

                token_labels.append(label2id[label])
                current_word_idx = word_id
            else:  # Subword token
                token_labels.append(-100)

        all_token_labels.append(token_labels)

    encodings["labels"] = all_token_labels

    return encodings
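The labelling rule inside `tokenize` can be seen in isolation with the sketch below, which uses hand-written word IDs and offsets instead of a real tokeniser. The `align_labels` helper, the sample text and its token split are all hypothetical; the numeric labels follow the `label_list` order above (0 = other, 1 = question, 2 = answer).

```python
IGNORE_INDEX = -100  # label value used above for positions to be skipped


def overlaps(spans, start, end):
    """True if the half-open range [start, end) intersects any span."""
    return any(s < end and start < e for s, e in spans)


def align_labels(word_ids, offsets, question_spans, answer_spans):
    """Label the first token of each word; mask special and subword tokens."""
    labels = []
    previous_word = None
    for word_id, (start, end) in zip(word_ids, offsets):
        if word_id is None:               # special token such as [CLS]
            labels.append(IGNORE_INDEX)
        elif word_id != previous_word:    # first token of a new word
            if overlaps(question_spans, start, end):
                labels.append(1)
            elif overlaps(answer_spans, start, end):
                labels.append(2)
            else:
                labels.append(0)
            previous_word = word_id
        else:                             # continuation of the same word
            labels.append(IGNORE_INDEX)
    return labels


# Hypothetical split of "I fear kines" into: [CLS] i fear kin ##es [SEP]
word_ids = [None, 0, 1, 2, 2, None]
offsets = [(0, 0), (0, 1), (2, 6), (7, 10), (10, 12), (0, 0)]

example_labels = align_labels(word_ids, offsets, [(0, 12)], [])
print(example_labels)
```

Only the first token of each word carries a label, so the second piece of the split word is masked along with the special tokens.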

In [15]:
%pip install datasets


In [16]:
from datasets import Dataset

tokenized_dataset = tokenize(clean_train, tokenizer, tree_q, tree_a)

dataset = Dataset.from_dict(
    {
        "input_ids": tokenized_dataset["input_ids"],
        "attention_mask": tokenized_dataset["attention_mask"],
        "labels": tokenized_dataset["labels"],
    }
)

dataset = dataset.train_test_split(test_size=0.2)
training_dataset = dataset['train']
test_dataset = dataset['test']

In [17]:
training_dataset

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 760
})

In [18]:
test_dataset

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 190
})

Great! Now that our data has been prepared, we can inspect the tokens that have been produced and labelled:

In [19]:
for i, (input_ids, labels) in enumerate(  # type: ignore
    zip(tokenized_dataset["input_ids"], tokenized_dataset["labels"])  # type: ignore
):
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    for token, label in zip(tokens, labels):
        print(f"Token: {token:<20} Label: {id2label.get(label)}")

    if i > 32:
        break

Streaming output truncated to the last 5000 lines.
Token: d                    Label: other
Token: .                    Label: other
Token: your                 Label: other
Token: child                Label: other
Token: is                   Label: other
Token: a                    Label: other
Token: person               Label: other
Token: :                    Label: other
Token: a                    Label: other
Token: psychological        Label: other
Token: approach             Label: other
Token: to                   Label: other
Token: childhood            Label: other
Token: without              Label: other
Token: guilt                Label: other
Token: .                    Label: other
Token: the                  Label: other
Token: viking               Label: other
Token: press                Label: other
Token: ,                    Label: other
Token: new                  Label: other
Token: york                 Label: other
Token: :                    Label

## Fine-tuning a token classification model

We are now ready to fine-tune a pre-trained DistilBERT model to perform this token classification task!

First, we need to load the model:

In [20]:
import torch
torch.cuda.is_available()


True

In [21]:
%pip install evaluate



In [23]:
%pip install seqeval



In [33]:
%pip install wandb



In [34]:
import wandb

In [39]:
%env WANDB_LOG_MODEL=checkpoint

env: WANDB_LOG_MODEL=checkpoint


In [37]:
%env WANDB_PROJECT=harmony-questionnaire-parsingharmony

env: WANDB_PROJECT=harmony-questionnaire-parsingharmony


In [35]:
wandb.login()




True

In [24]:
import evaluate
import numpy as np

seqeval = evaluate.load("seqeval")

In [25]:
def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

In [26]:
from transformers import (
    DataCollatorForTokenClassification,
    AutoModelForTokenClassification,
)

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

model = AutoModelForTokenClassification.from_pretrained(
    "distilbert/distilbert-base-uncased",
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id,
)

model


Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForTokenClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
   

Now that the model has been loaded, we are ready to start fine-tuning it! You may want to experiment with the training arguments (just remember not to accidentally save models you do not want to submit in the `submission/` directory).

In [40]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="checkpoints",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_steps=10,
    eval_strategy="epoch",
    do_eval=True,
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="wandb",
    run_name="initial_run_16_01_2024"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=training_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mnickhobbs09[0m ([33mnickhobbs09-nickhobbs[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2039 | 0.207586 | 0.602179 | 0.704432 | 0.649304 | 0.923095 |
| 2 | 0.1188 | 0.194801 | 0.62488 | 0.721468 | 0.669709 | 0.93047 |
| 3 | 0.1123 | 0.193281 | 0.638378 | 0.730332 | 0.681266 | 0.932559 |
| 4 | 0.1002 | 0.194304 | 0.652578 | 0.739889 | 0.693496 | 0.933643 |
| 5 | 0.0803 | 0.198242 | 0.667285 | 0.747784 | 0.705245 | 0.935798 |
| 6 | 0.0709 | 0.203852 | 0.659016 | 0.751662 | 0.702297 | 0.93663 |
| 7 | 0.0649 | 0.210174 | 0.669825 | 0.753878 | 0.709371 | 0.937538 |
| 8 | 0.0569 | 0.206919 | 0.676126 | 0.759003 | 0.715171 | 0.937352 |
| 9 | 0.0584 | 0.214918 | 0.672349 | 0.76482 | 0.715609 | 0.937341 |
| 10 | 0.0539 | 0.21169 | 0.679985 | 0.761357 | 0.718374 | 0.938325 |


wandb: Adding directory to artifact (./checkpoints/checkpoint-48)... Done. 6.0s
wandb: Adding directory to artifact (./checkpoints/checkpoint-96)... Done. 4.8s
wandb: Adding directory to artifact (./checkpoints/checkpoint-144)... Done. 6.2s
wandb: Adding directory to artifact (./checkpoints/checkpoint-192)... Done. 5.4s
wandb: Adding directory to artifact (./checkpoints/checkpoint-240)... Done. 9.5s
wandb: Adding directory to artifact (./checkpoints/checkpoint-288)... Done. 11.5s
wandb: Adding directory to artifact (./checkpoints/checkpoint-336)... Done. 9.3s
wandb: Adding directory to artifact (./checkpoints/checkpoint-384)... Done. 6.4s
wandb: Adding directory to artifact (./checkpoints/checkpoint-432)... Done. 8.4s
wandb: Adding directory to artifact (./checkpoints/checkpoint-480)... Done. 6.0s
wandb: Adding directory to artifact (./checkpoin

TrainOutput(global_step=480, training_loss=0.0905678960805138, metrics={'train_runtime': 577.2257, 'train_samples_per_second': 13.166, 'train_steps_per_second': 0.832, 'total_flos': 992981514240000.0, 'train_loss': 0.0905678960805138, 'epoch': 10.0})

Now that our model has finished training, we can evaluate it on the held-out split and then use it to make some predictions for the text we have:

In [42]:
trainer.evaluate()



Error: You must call wandb.init() before wandb.log()

In [2]:
import wandb
wandb.finish()


In [43]:
import torch

# Change this to try out your model with your own text!
text_example = clean_train

inputs = tokenizer(
    text_example,
    return_offsets_mapping=True,
    return_overflowing_tokens=True,
    truncation=True,
    padding=True,
    max_length=MAX_LENGTH,
    stride=STRIDE,
    add_special_tokens=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    predictions = torch.argmax(
        model(
            input_ids=inputs["input_ids"][:1],  # type: ignore
            attention_mask=inputs["attention_mask"][:1],  # type: ignore
        ).logits,
        dim=-1,
    )
    predicted_token_class = [model.config.id2label[t.item()] for t in predictions[0]]

    for cls, (start, end) in zip(predicted_token_class, inputs["offset_mapping"][0]):  # type: ignore
        print(f"{text_example[start:end]:<20}", cls)

                     other
Tampa                other
Scale                other
for                  other
Kin                  other
es                   other
io                   other
phobia               other
(                    other
Miller               other
,                    other
Ko                   other
ri                   other
and                  other
Todd                 other
1991                 other
)                    other
1                    other
=                    other
strongly             answer
disagree             answer
2                    other
=                    other
disagree             answer
3                    other
=                    other
agree                answer
4                    other
=                    other
strongly             answer
agree                answer
1                    other
.                    other
I                    question
'                    question
m                    question
afraid       

## Producing a submission package

**Now, we will move on to creating your first submission!**

When you upload your work to the DOXA AI platform, your code will be run in an environment with no internet access. As such, your submission needs to contain any models you want to use as part of the submission, as well as any code necessary to use those models (including tokenisers).

Currently, the `submission/` folder contains three files:

- `submission/competition.py`: this contains competition-specific code used to interface with the platform
- `submission/doxa.yaml`: this is a configuration file used by the DOXA CLI when you make a submission
- `submission/run.py`: this is the Python script that gets run when your work gets evaluated (**you will need to edit this to implement your solution!**)

First, we will save the model and tokeniser into our `submission/` directory:

In [None]:
tokenizer.save_pretrained("submission/tokenizer")
trainer.save_model("submission/model")

When you upload your submission to the platform, based on the current configuration in `doxa.yaml`, the `run.py` entrypoint file will be run. If you take a look at `run.py`, you will see the following:

```py
class Evaluator(BaseEvaluator):
    def predict(
        self, text: str
    ) -> Generator[Tuple[int, int, Literal["Q", "A"]], Any, None]:
        # Load the saved tokeniser and model
        tokenizer = AutoTokenizer.from_pretrained(directory / "tokenizer")
        model = AutoModelForTokenClassification.from_pretrained(directory / "model")

        # Tokenise the input text
        inputs = tokenizer(
            text,
            return_offsets_mapping=True,
            return_overflowing_tokens=True,
            truncation=True,
            padding=True,
            max_length=512,
            stride=16,
            add_special_tokens=True,
            return_tensors="pt",
        ).to(model.device)

        # Chunks overlap, so we want to keep track of predictions we have already made
        done = set()

        # Produce predictions for each example (in inference mode)
        with torch.inference_mode():
            for input_ids, attention_mask, offsets in zip(inputs["input_ids"], inputs["attention_mask"], inputs["offset_mapping"]):  # type: ignore
                predictions = torch.argmax(
                    model(input_ids=input_ids, attention_mask=attention_mask).logits,
                    dim=2,
                )

                for t, (start, end) in zip(predictions[0], offsets):
                    if (start, end) in done or (start == 0 and end == 0):
                        continue

                    done.add((start, end))

                    predicted_token_class = model.config.id2label[t.item()]
                    if predicted_token_class == "question":
                        yield (start, end, "Q")
                    elif predicted_token_class == "answer":
                        yield (start, end, "A")
```

In the `predict()` method, we load the tokeniser and the model we have just fine-tuned and then use them to produce predictions for the test set. You only need to output where you believe the questions and answers are. Each predicted range can span more than a single token: you could produce a single prediction for a whole question or separate predictions for each individual word, and the platform will match them up.
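If you would rather emit one range per question or answer than one per token, a post-processing step along the following lines is one option. This is a minimal sketch: the `merge_spans` helper and its `max_gap` parameter are hypothetical, and it assumes that neighbouring tokens with the same label separated by at most one character (typically a space) belong to the same span. Two items separated by a single newline would also be merged, so you may want a stricter rule for your own solution.

```python
def merge_spans(predictions, max_gap=1):
    """Merge neighbouring (start, end, label) predictions that share a label.

    Two predictions are joined when they carry the same label and the gap
    between them is at most `max_gap` characters.
    """
    merged = []
    for start, end, label in sorted(predictions):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


# Hypothetical token-level predictions in the format yielded by predict().
token_predictions = [(0, 1, "Q"), (2, 6, "Q"), (7, 12, "Q"), (20, 25, "A")]
span_predictions = merge_spans(token_predictions)
print(span_predictions)
```

The merged tuples keep the same `(start, end, "Q" or "A")` shape, so they could be yielded from `predict()` in place of the per-token ones.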

**When you come to implement your own solution, you will likely need to edit `predict()` in `run.py` to work with your model. Also, make sure you include the right model in your submission!**

You can edit `predict()` however you wish, as long as it produces question and answer range predictions that are contained within the document! If your submission is slow to evaluate on the platform, you may wish to edit `predict()` to perform inference in batches rather than chunk by chunk, although this will use more RAM. Note that in addition to the RAM limit, there is a submission size limit, so make sure you are only uploading models that are relevant to your current submission.

## Uploading your submission to the platform

You are now ready to make your first submission to the platform! 👀

**Make sure to [enrol to take part](https://doxaai.com/competition/harmony-parsing) in the challenge if you have not already done so.**

First, we need to make sure we are logged in:


In [None]:
!doxa login

And then, we can submit our work for evaluation:


In [None]:
!doxa upload submission

**Congratulations!** 🥳

By this point, you will have made your first submission for this challenge on the DOXA AI platform!

If everything went well, your submission will now be queued up for evaluation. It will first be run on a small validation set to make sure that your submission does not crash on the full test set. If your submission runs into an issue at this point, you will be able to see the error logs from this phase. Otherwise, if your submission passes this stage, it will be evaluated on the full test set, and you will soon appear on the [competition scoreboard](https://doxaai.com/competition/harmony-parsing/scoreboard)!


## Next steps

**Now, it is up to you as to where you go from here to solve this challenge!**

Here are some ideas you might want to test out:

- How could you improve the training process to boost performance?
- What other [pre-trained models](https://huggingface.co/models?pipeline_tag=token-classification&sort=trending) in HuggingFace transformers could you use?
- How could you extend the `compute_metrics` function passed to the `Trainer` to produce additional metrics (e.g. per-class precision and recall)?
- How could you make better use of the training data provided?
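
As one possible starting point for the metrics idea above, the sketch below computes token-level precision, recall and F1 separately for the question and answer classes. The `per_class_scores` helper is hypothetical and uses only the standard library. It counts individual tokens rather than whole spans, so its numbers are not directly comparable with the `seqeval` results or with the platform's scoring.

```python
from collections import Counter


def per_class_scores(true_labels, pred_labels, classes=("question", "answer")):
    """Token-level precision, recall and F1 for each class of interest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_seq, pred_seq in zip(true_labels, pred_labels):
        for t, p in zip(true_seq, pred_seq):
            if t == p:
                tp[t] += 1
            else:
                fp[p] += 1  # predicted class p where it was not correct
                fn[t] += 1  # missed an instance of the true class t

    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical label sequences shaped like true_labels / true_predictions.
example_true = [["other", "question", "question", "answer"]]
example_pred = [["other", "question", "other", "answer"]]
example_scores = per_class_scores(example_true, example_pred)
print(example_scores)
```

A function like this could be called from `compute_metrics` with the `true_labels` and `true_predictions` lists built there, with its results added to the returned dictionary.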

If you are new to fine-tuning language models, take a look at the excellent [HuggingFace `transformers` documentation](https://huggingface.co/docs/transformers/en/training)!

**We look forward to seeing what you build!** We would love to hear about what you are working on for this challenge, so do let us know how you are finding the challenge on the [Harmony community Discord server](https://discord.com/invite/harmonydata) or the [DOXA AI community Discord server](https://discord.gg/MUvbQ3UYcf). 😎
