# OpenBook DeBERTaV3-Large with an updated model

This work is based on the great [work](https://www.kaggle.com/code/nlztrk/openbook-debertav3-large-baseline-single-model) of [nlztrk](https://www.kaggle.com/nlztrk).

I trained a model offline using the dataset I shared [here](https://www.kaggle.com/datasets/mgoksu/llm-science-exam-dataset-w-context). I just added my model to the original notebook. The model is available [here](https://www.kaggle.com/datasets/mgoksu/llm-science-run-context-2).

I also addressed the problem of [CSV Not Found at submission](https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/434228) with this notebook by clipping the context like so:

`test_df["prompt"] = test_df["context"].apply(lambda x: x[:1500]) + " #### " +  test_df["prompt"]`

You can probably use more than 1500 characters without running out of memory (OOM); the inference cell in this notebook clips the context at 2500 characters.
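
The same clipping can be written as a small helper for a single row. This is only a sketch: `build_prompt`, `max_chars`, and `sep` are illustrative names that do not appear in the original notebook. The guard for non-string input is an added assumption, motivated by the fact that pandas reads an empty CSV field back as NaN (a float) rather than as an empty string.

```python
def build_prompt(context, question, max_chars=1500, sep=" #### "):
    """Prepend a character-clipped context to the question.

    Hypothetical helper: mirrors the one-liner above for a single row.
    """
    # Assumption: treat missing contexts (e.g. NaN read back from CSV) as empty.
    if not isinstance(context, str):
        context = ""
    return context[:max_chars] + sep + question


long_context = "x" * 4000
prompt = build_prompt(long_context, "What is MOND?")
print(len(prompt))  # 1500 + len(" #### ") + len("What is MOND?")
```

Clipping by characters is a coarse proxy for the token count the model actually sees, so the safe limit depends on the tokenizer and the available GPU memory.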

In [1]:
# installing offline dependencies
!pip install -U /kaggle/input/faiss-gpu-173-python310/faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
!cp -rf /kaggle/input/sentence-transformers-222/sentence-transformers /kaggle/working/sentence-transformers
!pip install -U /kaggle/working/sentence-transformers
!pip install -U /kaggle/input/blingfire-018/blingfire-0.1.8-py3-none-any.whl

!pip install --no-index --no-deps /kaggle/input/llm-whls/transformers-4.31.0-py3-none-any.whl
!pip install --no-index --no-deps /kaggle/input/llm-whls/peft-0.4.0-py3-none-any.whl
!pip install --no-index --no-deps /kaggle/input/llm-whls/datasets-2.14.3-py3-none-any.whl
!pip install --no-index --no-deps /kaggle/input/llm-whls/trl-0.5.0-py3-none-any.whl

Processing /kaggle/input/faiss-gpu-173-python310/faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2
Processing ./sentence-transformers
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sentence-transformers
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=126134 sha256=7e7af9bc944f1a02a3ad0b7df94ff7cd14139382b31fe88c90db271778cbd4a2
  Stored in directory: /root/.cache/pip/wheels/6c/ea/76/d9a930b223b1d3d5d6aff69458725316b0fe205b854faf1812
Successfully built sentence-transformers
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.2.2
Processing /kaggle/input/blingfire-018/blingfire-0.1.8-py3-none-any.whl
Installing collected packages: blingfire
Successfully installed blingfir

In [2]:
from __future__ import annotations

import os
import gc
import re
import ctypes
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Optional, Union

import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import blingfire as bf

import faiss
from faiss import write_index, read_index

from sentence_transformers import SentenceTransformer

import torch
from torch.utils.data import DataLoader
from datasets import Dataset
from transformers import AutoTokenizer
from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy

# Handle to glibc, used later to call malloc_trim and return freed memory to the OS
libc = ctypes.CDLL("libc.so.6")

caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']


In [3]:
def process_documents(documents: Iterable[str],
                      document_ids: Iterable,
                      split_sentences: bool = True,
                      filter_len: int = 7,
                      disable_progress_bar: bool = False) -> pd.DataFrame:
    """
    Main helper function to process documents into sections and, optionally, sentences.

    :param documents: Iterable containing documents which are strings
    :param document_ids: Iterable containing document unique identifiers
    :param split_sentences: Flag to determine whether to further split sections into sentences
    :param filter_len: Minimum character length of a sentence (otherwise filter out)
    :param disable_progress_bar: Flag to disable tqdm progress bar
    :return: Pandas DataFrame containing the columns `document_id`, `text`, `offset`
    """
    
    df = sectionize_documents(documents, document_ids, disable_progress_bar)

    if split_sentences:
        df = sentencize(df.text.values, 
                        df.document_id.values,
                        df.offset.values, 
                        filter_len, 
                        disable_progress_bar)
    return df


def sectionize_documents(documents: Iterable[str],
                         document_ids: Iterable,
                         disable_progress_bar: bool = False) -> pd.DataFrame:
    """
    Treats each document as a single section covering its full text,
    so `offset` is always `(0, len(document))`.

    :param documents: Iterable containing documents which are strings
    :param document_ids: Iterable containing document unique identifiers
    :param disable_progress_bar: Flag to disable tqdm progress bar
    :return: Pandas DataFrame containing the columns `document_id`, `text`, `offset`
    """
    processed_documents = []
    for document_id, document in tqdm(zip(document_ids, documents), total=len(documents), disable=disable_progress_bar):
        row = {}
        text, start, end = (document, 0, len(document))
        row['document_id'] = document_id
        row['text'] = text
        row['offset'] = (start, end)

        processed_documents.append(row)

    _df = pd.DataFrame(processed_documents)
    if _df.shape[0] > 0:
        return _df.sort_values(['document_id', 'offset']).reset_index(drop=True)
    else:
        return _df


def sentencize(documents: Iterable[str],
               document_ids: Iterable,
               offsets: Iterable[tuple[int, int]],
               filter_len: int = 7,
               disable_progress_bar: bool = False) -> pd.DataFrame:
    """
    Split a document into sentences. Can be used with `sectionize_documents`
    to further split documents into more manageable pieces. Takes in offsets
    to ensure that after splitting, the sentences can be matched to the
    location in the original documents.

    :param documents: Iterable containing documents which are strings
    :param document_ids: Iterable containing document unique identifiers
    :param offsets: Iterable tuple of the start and end indices
    :param filter_len: Minimum character length of a sentence (otherwise filter out)
    :param disable_progress_bar: Flag to disable tqdm progress bar
    :return: Pandas DataFrame containing the columns `document_id`, `text`, `offset`
    """

    document_sentences = []
    for document, document_id, offset in tqdm(zip(documents, document_ids, offsets), total=len(documents), disable=disable_progress_bar):
        try:
            _, sentence_offsets = bf.text_to_sentences_and_offsets(document)
            for o in sentence_offsets:
                if o[1]-o[0] > filter_len:
                    sentence = document[o[0]:o[1]]
                    abs_offsets = (o[0]+offset[0], o[1]+offset[0])
                    row = {}
                    row['document_id'] = document_id
                    row['text'] = sentence
                    row['offset'] = abs_offsets
                    document_sentences.append(row)
        except Exception:
            # Skip documents the sentence splitter fails on
            continue
    return pd.DataFrame(document_sentences)
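
To illustrate what the `offset` column holds, here is a small sketch of the same bookkeeping. It uses a naive regular-expression splitter as a stand-in for `bf.text_to_sentences_and_offsets`, purely so the example runs without blingfire; `naive_sentence_offsets` is a hypothetical helper and the real splitter handles far more cases.

```python
import re


def naive_sentence_offsets(text):
    """Return (start, end) character spans of sentences.

    Stand-in for blingfire, for illustration only: a sentence is a run of
    characters ending in '.', '!' or '?' (or the end of the text).
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


document = "Absolute magnitude is a measure of luminosity. It uses a logarithmic scale. Short."
filter_len = 7  # same default as in sentencize

rows = []
for start, end in naive_sentence_offsets(document):
    if end - start > filter_len:  # drop very short fragments
        rows.append({"text": document[start:end], "offset": (start, end)})

for row in rows:
    start, end = row["offset"]
    # The offset always points back at the exact span in the source document.
    assert document[start:end] == row["text"]
    print(row["offset"], row["text"])
```

Because each document is treated as one section starting at position 0, the absolute offsets computed in `sentencize` coincide with the positions inside the document itself.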

In [4]:
SIM_MODEL = '/kaggle/input/sentencetransformers-allminilml6v2/sentence-transformers_all-MiniLM-L6-v2'
DEVICE = 0
MAX_LENGTH = 512
BATCH_SIZE = 8

WIKI_PATH = "/kaggle/input/wikipedia-20230701"
wiki_files = os.listdir(WIKI_PATH)

# Relevant Title Retrieval

In [5]:
trn = pd.read_csv("/kaggle/input/kaggle-llm-science-exam/test.csv").drop(columns="id")
trn.head()


| | prompt | A | B | C | D | E |
|---|---|---|---|---|---|---|
| 0 | Which of the following statements accurately d... | MOND is a theory that reduces the observed mis... | MOND is a theory that increases the discrepanc... | MOND is a theory that explains the missing bar... | MOND is a theory that reduces the discrepancy ... | MOND is a theory that eliminates the observed ... |
| 1 | Which of the following is an accurate definiti... | Dynamic scaling refers to the evolution of sel... | Dynamic scaling refers to the non-evolution of... | Dynamic scaling refers to the evolution of sel... | Dynamic scaling refers to the non-evolution of... | Dynamic scaling refers to the evolution of sel... |
| 2 | Which of the following statements accurately d... | The triskeles symbol was reconstructed as a fe... | The triskeles symbol is a representation of th... | The triskeles symbol is a representation of a ... | The triskeles symbol represents three interloc... | The triskeles symbol is a representation of th... |
| 3 | What is the significance of regularization in ... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... |
| 4 | Which of the following statements accurately d... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... |


In [6]:
model = SentenceTransformer(SIM_MODEL, device='cuda')
model.max_seq_length = MAX_LENGTH
model = model.half()

In [7]:
sentence_index = read_index("/kaggle/input/wikipedia-2023-07-faiss-index/wikipedia_202307.index")

In [8]:
## Build the embeddings from the prompt (the question text) only

prompt_embeddings = model.encode(trn.prompt.values, batch_size=BATCH_SIZE, device=DEVICE, show_progress_bar=True, convert_to_tensor=True, normalize_embeddings=True)
prompt_embeddings = prompt_embeddings.detach().cpu().numpy()
_ = gc.collect()


In [9]:
## Get the top 6 pages that are likely to contain the topic of interest
## i.e. retrieve the index entries most similar to each question embedding.

search_score, search_index = sentence_index.search(prompt_embeddings, 6)

In [10]:
## Save memory - delete sentence_index since it is no longer necessary
del sentence_index
del prompt_embeddings
_ = gc.collect()
libc.malloc_trim(0)

1
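
The `gc.collect()` plus `libc.malloc_trim(0)` pattern recurs throughout this notebook, so it can be wrapped in one function. The sketch below is an assumption-laden convenience (`release_memory` is a hypothetical name): `malloc_trim` is a glibc function, so the helper falls back to returning `None` on platforms where `libc.so.6` cannot be loaded. The `1` printed above is the return value of `malloc_trim`, which indicates that some memory was handed back to the operating system.

```python
import ctypes
import gc


def release_memory():
    """Run the garbage collector, then ask glibc to return free heap pages to the OS.

    Returns the value of malloc_trim (1 if memory was released, 0 otherwise),
    or None when glibc is not available (e.g. on macOS or Windows).
    """
    gc.collect()
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        return None


result = release_memory()
print(result)
```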

In [11]:
search_score

array([[0.8242638 , 0.94124174, 0.98174715, 0.9998764 , 1.0061644 ,
        1.0151274 ],
       [0.38855505, 0.79623413, 0.81992173, 0.8434491 , 0.8605889 ,
        0.8640441 ],
       [0.76912475, 0.9601331 , 0.9661969 , 0.97057855, 1.0014999 ,
        1.0034419 ],
       ...,
       [0.98541725, 1.0154207 , 1.0192184 , 1.035223  , 1.0528091 ,
        1.0543661 ],
       [0.8719076 , 0.87460625, 0.874792  , 0.8862244 , 0.88910675,
        0.9024073 ],
       [0.6469635 , 0.69145715, 0.7300302 , 0.78801167, 0.8041347 ,
        0.84459054]], dtype=float32)
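
The scores above are sorted in ascending order, which is consistent with an index that returns squared L2 distances, although the index type is not stated in this notebook. Under that assumption, and because the embeddings were normalized to unit length, a squared distance `d` corresponds to a cosine similarity of `1 - d / 2`. The sketch below checks that identity on toy vectors; all function names are illustrative.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # valid because a and b are unit vectors


a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

d = squared_l2(a, b)
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert math.isclose(1.0 - d / 2.0, cosine(a, b))


def distance_to_cosine(d):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - d / 2.0


print(distance_to_cosine(0.8242638))
```

Read this way, smaller scores mean closer matches, and a score near 1.0 corresponds to a cosine similarity of roughly 0.5.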

In [12]:
search_index

array([[3573843, 4906500, 1830796, 3408267, 3260726, 1429137],
       [1431454, 5135549, 5135229, 5135548, 1431498, 5094879],
       [5819511, 5806421, 5810490, 5815906,  885478, 5845219],
       ...,
       [4799422, 1114209, 1059625, 3487902, 1106851, 1059279],
       [4557948, 2958565, 3795798,   51689, 3795692, 4795172],
       [1464446,  363737, 1464453, 1464449, 4831426,  956354]])

# Getting Sentences from the Relevant Titles

In [13]:
df = pd.read_parquet("/kaggle/input/wikipedia-20230701/wiki_2023_index.parquet",
                     columns=['id', 'file'])

In [14]:
## Get the article and associated file location using the index
wikipedia_file_data = []

for i, (scr, idx) in tqdm(enumerate(zip(search_score, search_index)), total=len(search_score)):
    scr_idx = idx
    _df = df.loc[scr_idx].copy()
    _df['prompt_id'] = i
    wikipedia_file_data.append(_df)
wikipedia_file_data = pd.concat(wikipedia_file_data).reset_index(drop=True)
wikipedia_file_data = wikipedia_file_data[['id', 'prompt_id', 'file']].drop_duplicates().sort_values(['file', 'id']).reset_index(drop=True)

## Save memory - delete df since it is no longer necessary
del df
_ = gc.collect()
libc.malloc_trim(0)


1

In [15]:
## Holds the files that contain the top-6 articles for each prompt.

wikipedia_file_data

| | id | prompt_id | file |
|---|---|---|---|
| 0 | 1141 | 36 | a.parquet |
| 1 | 1141 | 151 | a.parquet |
| 2 | 11963992 | 185 | a.parquet |
| 3 | 11963992 | 191 | a.parquet |
| 4 | 1200 | 63 | a.parquet |
| ... | ... | ... | ... |
| 1195 | 31557501 | 49 | y.parquet |
| 1196 | 34341 | 179 | y.parquet |
| 1197 | 47610211 | 49 | y.parquet |
| 1198 | 5187243 | 103 | y.parquet |


In [16]:
## Get the full text data
wiki_text_data = []

for file in tqdm(wikipedia_file_data.file.unique(), total=len(wikipedia_file_data.file.unique())):
    _id = [str(i) for i in wikipedia_file_data[wikipedia_file_data['file']==file]['id'].tolist()]
    _df = pd.read_parquet(f"{WIKI_PATH}/{file}", columns=['id', 'text'])

    _df_temp = _df[_df['id'].isin(_id)].copy()
    del _df
    _ = gc.collect()
    libc.malloc_trim(0)
    wiki_text_data.append(_df_temp)
wiki_text_data = pd.concat(wiki_text_data).drop_duplicates().reset_index(drop=True)
_ = gc.collect()


In [22]:
wiki_text_data

| | id | text |
|---|---|---|
| 0 | 5259071 | A Briefer History of Time is a 2006 popular-sc... |
| 1 | 65293114 | A History of the Theories of Aether and Electr... |
| 2 | 1550261 | The American Petroleum Institute gravity, or A... |
| 3 | 4389619 | In superconductivity, fluxon (also called a Ab... |
| 4 | 1963 | Absolute magnitude () is a measure of the lumi... |
| ... | ... | ... |
| 1138 | 47610211 | |
| 1139 | 5187243 | A yellow hypergiant (YHG) is a massive star wi... |
| 1140 | 1217512 | Yellow sun or Yellow Sun may refer to: *Yellow... |
| 1141 | 1063160 | was a Japanese-American physicist and professo... |


In [17]:
## Parse documents into sentences
processed_wiki_text_data = process_documents(wiki_text_data.text.values, wiki_text_data.id.values)


In [25]:
processed_wiki_text_data ## What does offset mean? It is the (start, end) character span of the sentence within its source document.

| | document_id | text | offset |
|---|---|---|---|
| 0 | 10087606 | In group theory, geometry, representation theo... | (0, 207) |
| 1 | 10087606 | For example, as transformations of an object i... | (208, 329) |
| 2 | 10087606 | Such symmetry operations are performed with re... | (330, 441) |
| 3 | 10087606 | In the context of molecular symmetry, a symmet... | (442, 631) |
| 4 | 10087606 | Two basic facts follow from this definition, w... | (632, 709) |
| ... | ... | ... | ... |
| 58845 | 9962772 | Mit Beitr. | (6031, 6041) |
| 58846 | 9962772 | Barth, 1957) == Selected publications == *Carl... | (6049, 6223) |
| 58847 | 9962772 | (Received 7 September 1920, published in issue... | (6224, 7033) |
| 58848 | 9962772 | Volume 1 Part 2 The Quantum Theory of Planck, ... | (7034, 7169) |


In [18]:
## Get embeddings of the wiki text data
wiki_data_embeddings = model.encode(processed_wiki_text_data.text,
                                    batch_size=BATCH_SIZE,
                                    device=DEVICE,
                                    show_progress_bar=True,
                                    convert_to_tensor=True,
                                    normalize_embeddings=True)#.half()
wiki_data_embeddings = wiki_data_embeddings.detach().cpu().numpy()


In [19]:
_ = gc.collect()

In [20]:
## Combine all answers
trn['answer_all'] = trn.apply(lambda x: " ".join([x['A'], x['B'], x['C'], x['D'], x['E']]), axis=1)


## Search using the prompt and answers to guide the search
trn['prompt_answer_stem'] = trn['prompt'] + " " + trn['answer_all']

In [21]:
question_embeddings = model.encode(trn.prompt_answer_stem.values, batch_size=BATCH_SIZE, device=DEVICE, show_progress_bar=True, convert_to_tensor=True, normalize_embeddings=True)
question_embeddings = question_embeddings.detach().cpu().numpy()


# Extracting Matching Prompt-Sentence Pairs

In [24]:
## Parameter to determine how many relevant sentences to include
NUM_SENTENCES_INCLUDE = 20

## List containing just Context
contexts = []

for r in tqdm(trn.itertuples(), total=len(trn)):

    prompt_id = r.Index

    prompt_indices = processed_wiki_text_data[processed_wiki_text_data['document_id'].isin(wikipedia_file_data[wikipedia_file_data['prompt_id']==prompt_id]['id'].values)].index.values

    ## Start from an empty context so a prompt without candidate sentences
    ## does not reuse the previous prompt's context (or raise a NameError)
    context = ""

    if prompt_indices.shape[0] > 0:
        prompt_index = faiss.index_factory(wiki_data_embeddings.shape[1], "Flat")
        prompt_index.add(wiki_data_embeddings[prompt_indices])

        ## Get the top matches for this prompt only
        ss, ii = prompt_index.search(question_embeddings[prompt_id:prompt_id + 1], NUM_SENTENCES_INCLUDE)
        candidate_text = processed_wiki_text_data.loc[prompt_indices, 'text']
        for _s, _i in zip(ss[0], ii[0]):
            ## faiss returns -1 when fewer than NUM_SENTENCES_INCLUDE sentences exist
            if _i >= 0:
                context += candidate_text.iloc[_i] + " "

    contexts.append(context)


In [26]:
trn['context'] = contexts
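
Conceptually, the per-prompt `"Flat"` index performs an exhaustive nearest-neighbour search over that prompt's candidate sentences. The pure-Python sketch below shows the idea on toy 2-D vectors; `top_k_flat` is a hypothetical stand-in rather than the faiss implementation, and it assumes squared L2 distance. It pads with index -1 when fewer than `k` vectors exist, mirroring how faiss reports missing neighbours.

```python
import heapq


def top_k_flat(query, vectors, k):
    """Exhaustive nearest-neighbour search by squared L2 distance.

    Returns (distances, indices) sorted from nearest to farthest, padded with
    -1 indices when fewer than k vectors are available.
    """
    scored = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    best = heapq.nsmallest(k, scored)
    distances = [d for d, _ in best] + [float("inf")] * (k - len(best))
    indices = [i for _, i in best] + [-1] * (k - len(best))
    return distances, indices


sentences = ["sentence about stars", "sentence about cells", "sentence about waves"]
sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
question_vector = [0.8, 0.6]

_, idx = top_k_flat(question_vector, sentence_vectors, k=5)
context = " ".join(sentences[i] for i in idx if i >= 0)  # skip the -1 padding
print(context)
```

The context is built in order of increasing distance, so the most relevant sentences come first and are the ones that survive the later character clipping.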

In [27]:
trn[["prompt", "context", "A", "B", "C", "D", "E"]].to_csv("./test_context.csv", index=False)

# Inference

In [28]:
test_df = pd.read_csv("test_context.csv")
test_df.index = list(range(len(test_df)))
test_df['id'] = list(range(len(test_df)))
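# An empty context is read back from CSV as NaN, so replace it before slicing
test_df["context"] = test_df["context"].fillna("")
test_df["prompt"] = test_df["context"].apply(lambda x: x[:2500]) + " #### " +  test_df["prompt"]
# Placeholder label: preprocess and the collator expect one, but it does not affect the predictions
test_df['answer'] = 'B'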

In [29]:
model_dir = "/kaggle/input/llm-science-run-context-2"
tokenizer = AutoTokenizer.from_pretrained(model_dir)

In [30]:
# We'll create a dictionary to convert option names (A, B, C, D, E) into indices and back again
options = 'ABCDE'
indices = list(range(5))

option_to_index = {option: index for option, index in zip(options, indices)}
index_to_option = {index: option for option, index in zip(options, indices)}

def preprocess(example):
    # The AutoModelForMultipleChoice class expects a set of question/answer pairs
    # so we'll copy our question 5 times before tokenizing
    first_sentence = [example['prompt']] * 5
    second_sentence = []
    for option in options:
        second_sentence.append(example[option])
    # Our tokenizer will turn our text into token IDs the DeBERTa model can understand
    tokenized_example = tokenizer(first_sentence, second_sentence, truncation=True)
    tokenized_example['label'] = option_to_index[example['answer']]
    return tokenized_example

In [31]:
@dataclass
class DataCollatorForMultipleChoice:
    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    
    def __call__(self, features):
        label_name = "label" if 'label' in features[0].keys() else 'labels'
        labels = [feature.pop(label_name) for feature in features]
        batch_size = len(features)
        num_choices = len(features[0]['input_ids'])
        flattened_features = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
        ]
        flattened_features = sum(flattened_features, [])
        
        batch = self.tokenizer.pad(
            flattened_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors='pt',
        )
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        batch['labels'] = torch.tensor(labels, dtype=torch.int64)
        return batch
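
The collator's flatten, pad, and regroup steps can be sketched without tensors. In this illustration (`collate_multiple_choice` and `pad_id` are hypothetical names, and plain lists stand in for the tokenizer's padding), every choice is padded to the longest sequence in the batch and then regrouped so each example owns its choices again.

```python
def collate_multiple_choice(features, pad_id=0):
    """Pad every choice to the longest sequence and regroup by example.

    `features` is a list of examples; each example is a list of token-id lists,
    one per answer choice. Returns a nested list shaped
    (batch_size, num_choices, max_len).
    """
    batch_size = len(features)
    num_choices = len(features[0])
    # 1) Flatten: (batch_size, num_choices) -> batch_size * num_choices sequences
    flat = [seq for example in features for seq in example]
    # 2) Pad all sequences to a common length
    max_len = max(len(seq) for seq in flat)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in flat]
    # 3) Regroup so each example owns its num_choices rows again
    return [padded[i * num_choices:(i + 1) * num_choices] for i in range(batch_size)]


batch = collate_multiple_choice([
    [[101, 7, 102], [101, 7, 8, 9, 102]],   # example 0, two choices
    [[101, 5, 102], [101, 102]],            # example 1, two choices
])
print(len(batch), len(batch[0]), len(batch[0][0]))  # batch_size, num_choices, max_len
```

In the real collator the same regrouping is done by `v.view(batch_size, num_choices, -1)` on the padded tensors.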

In [32]:
tokenized_test_dataset = Dataset.from_pandas(test_df[['id', 'prompt', 'A', 'B', 'C', 'D', 'E', 'answer']].drop(columns=['id'])).map(preprocess, remove_columns=['prompt', 'A', 'B', 'C', 'D', 'E', 'answer'])
tokenized_test_dataset = tokenized_test_dataset.remove_columns(["__index_level_0__"])
data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
test_dataloader = DataLoader(tokenized_test_dataset, batch_size=1, shuffle=False, collate_fn=data_collator)


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In [33]:
model_ckpts = {
    "/kaggle/input/llm-science-run-context-2": 0.25,
    "/kaggle/input/how-to-train-open-book-model-part-1/model_v2": 0.35,
    "/kaggle/input/2023kagglellm-deberta-v3-large-model1": 0.1,
    "/kaggle/input/my-1-epoch": 0.1,
    "/kaggle/input/llm-se-debertav3-large": 0.08,
    "/kaggle/input/science-exam-trained-model-weights/run_0": 0.06,
    "/kaggle/input/science-exam-trained-model-weights/run_2": 0.06
}
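
The weights in `model_ckpts` sum to 1.0, so the ensemble in the next cells is a convex combination of each model's logits. A minimal sketch of that blending on toy numbers follows; the model names, weights, and logits here are made up for illustration only.

```python
import math

# Hypothetical checkpoint names and weights, for illustration only.
weights = {"model_a": 0.6, "model_b": 0.4}

# Toy logits: one list of 5 scores (options A-E) per question, per model.
logits = {
    "model_a": [[2.0, 0.5, 0.1, -1.0, 0.0]],
    "model_b": [[0.0, 3.0, 0.2, -0.5, 0.1]],
}

assert math.isclose(sum(weights.values()), 1.0)  # weights form a convex combination

num_questions = len(logits["model_a"])
blended = [[0.0] * 5 for _ in range(num_questions)]
for name, weight in weights.items():
    for q, row in enumerate(logits[name]):
        for c, score in enumerate(row):
            blended[q][c] += weight * score

print(blended[0])
```

Raw logits from different checkpoints are not guaranteed to share a scale, so blending softmax probabilities instead is a possible variation; this notebook blends logits directly.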

In [34]:
model_ckpts.keys()

dict_keys(['/kaggle/input/llm-science-run-context-2', '/kaggle/input/how-to-train-open-book-model-part-1/model_v2', '/kaggle/input/2023kagglellm-deberta-v3-large-model1', '/kaggle/input/my-1-epoch', '/kaggle/input/llm-se-debertav3-large', '/kaggle/input/science-exam-trained-model-weights/run_0', '/kaggle/input/science-exam-trained-model-weights/run_2'])

In [35]:
preds = None
for ckpt in tqdm(model_ckpts.keys()):
    print(ckpt + ':' + str(model_ckpts[ckpt]))
    model = AutoModelForMultipleChoice.from_pretrained(ckpt).cuda()
    model.eval()
    
    test_predictions = []
    for batch in tqdm(test_dataloader):
        for k in batch.keys():
            batch[k] = batch[k].cuda()
        with torch.no_grad():
            outputs = model(**batch)
        test_predictions.append(outputs.logits.cpu().detach())
    predictions = torch.cat(test_predictions)
    print(predictions.shape)
    if preds is not None:
        preds += predictions * model_ckpts[ckpt]
    else:
        preds = predictions * model_ckpts[ckpt]
    del model

predictions_as_ids = np.argsort(-preds, 1)

predictions_as_answer_letters = np.array(list('ABCDE'))[predictions_as_ids]
# predictions_as_answer_letters[:3]

predictions_as_string = test_df['prediction'] = [
    ' '.join(row) for row in predictions_as_answer_letters[:, :3]
]

/kaggle/input/llm-science-run-context-2:0.25
You're using a DebertaV2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
torch.Size([200, 5])
/kaggle/input/how-to-train-open-book-model-part-1/model_v2:0.35
torch.Size([200, 5])
/kaggle/input/2023kagglellm-deberta-v3-large-model1:0.1
torch.Size([200, 5])
/kaggle/input/my-1-epoch:0.1
torch.Size([200, 5])
/kaggle/input/llm-se-debertav3-large:0.08
torch.Size([200, 5])
/kaggle/input/science-exam-trained-model-weights/run_0:0.06
torch.Size([200, 5])
/kaggle/input/science-exam-trained-model-weights/run_2:0.06
torch.Size([200, 5])

In [36]:
submission = test_df[['id', 'prediction']]
submission.to_csv('submission.csv', index=False)
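
Each `prediction` holds the three highest-scoring options, best first, separated by spaces. The sketch below reproduces that formatting in plain Python and scores it with MAP@3, on the assumption that this is the competition metric (it is not stated in this notebook). `top3_letters` and `map_at_3` are illustrative names, and the scores and answers are toy values.

```python
def top3_letters(scores, options="ABCDE"):
    """Return the three highest-scoring option letters, best first."""
    ranked = sorted(range(len(options)), key=lambda i: scores[i], reverse=True)
    return " ".join(options[i] for i in ranked[:3])


def map_at_3(predictions, answers):
    """Mean average precision at 3 for single-answer questions."""
    total = 0.0
    for pred, answer in zip(predictions, answers):
        letters = pred.split()[:3]
        if answer in letters:
            total += 1.0 / (letters.index(answer) + 1)
    return total / len(answers)


preds = [
    top3_letters([0.1, 0.3, -0.2, 2.0, 1.1]),
    top3_letters([1.5, 0.9, -0.3, 0.2, 0.4]),
]
print(preds)
print(map_at_3(preds, ["E", "A"]))
```

A correct answer in first place contributes 1, in second place 1/2, and in third place 1/3, which is why the order of the three letters matters.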

In [37]:
submission.head(20)

| | id | prediction |
|---|---|---|
| 0 | 0 | D E B |
| 1 | 1 | A B E |
| 2 | 2 | A D B |
| 3 | 3 | C D A |
| 4 | 4 | D A B |
| 5 | 5 | B C E |
| 6 | 6 | A C D |
| 7 | 7 | D E B |
| 8 | 8 | C B A |
| 9 | 9 | A E B |
