<a href="https://colab.research.google.com/github/jensjoris/Multi-stage-AzureML-Pipeline-Demo/blob/main/talentclef2025/TalentCLEF_submission_creation_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Preparing submission file and run evaluation

This notebook walks step by step through preparing a submission file for the shared task. We download the Task A data hosted on [Zenodo](https://doi.org/10.5281/zenodo.14002665), build a file in the required [submission format](https://talentclef.github.io/talentclef/docs/talentclef-2025/evaluation/), and evaluate it with the [task's evaluation script](https://github.com/TalentCLEF/talentclef25_evaluation_script). The same format is also compatible with the benchmark on Codabench, where the test set data will be uploaded.



-----------------------------
TalentCLEF is an initiative to advance Natural Language Processing (NLP) in Human Capital Management (HCM). It aims to create a public benchmark for model evaluation and promote collaboration to develop fair, multilingual, and flexible systems that improve Human Resources (HR) practices across different industries.

The inaugural edition of this shared task is part of the [Conference and Labs of the Evaluation Forum (CLEF)](https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html), to be held in Madrid in 2025. If you are interested in registering, you can find the registration form [here](https://clef2025-labs-registration.dei.unipd.it/).

<img src="https://github.com/TalentCLEF/talentclef/blob/main/logo_talentclef.png?raw=true" alt="TalentCLEF logo" width="200"/>
<img src="https://talentclef.github.io/talentclef/docs/talentclef-2025/workshop/logo_clef_madrid.png" alt="CLEF 2025 Madrid logo" width="150"/>


## Imports

In [1]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util
import subprocess

## Download Task A files

First, let's download the Task A zip file directly from Zenodo and extract it into the `taskA` folder.



In [2]:
# Download
!wget https://zenodo.org/records/14879510/files/TaskA.zip
!unzip TaskA.zip -d taskA

--2025-02-28 17:40:35--  https://zenodo.org/records/14879510/files/TaskA.zip
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.45.92, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1059184 (1.0M) [application/octet-stream]
Saving to: ‘TaskA.zip’


2025-02-28 17:40:36 (1.56 MB/s) - ‘TaskA.zip’ saved [1059184/1059184]

Archive:  TaskA.zip
   creating: taskA/test/
   creating: taskA/training/
   creating: taskA/training/english/
  inflating: taskA/training/english/taskA_training_en.tsv  
   creating: taskA/training/spanish/
  inflating: taskA/training/spanish/taskA_training_es.tsv  
   creating: taskA/training/german/
  inflating: taskA/training/german/taskA_training_de.tsv  
   creating: taskA/validation/
   creating: taskA/validation/chinese/
  inflating: taskA/validation/chinese/corpus_elements  
  inflating: taskA/validation/chinese/queries  
  inflating: taskA/validation/chin
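
If `wget` and `unzip` are not available (for example, outside Colab), the same download-and-extract step can be done with the Python standard library. This is a minimal sketch, not part of the official tutorial; the helper name `download_and_extract` is hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download_and_extract(url: str, dest_dir: str, archive_name: str = "TaskA.zip") -> List[str]:
    """Download a zip archive to archive_name, extract it into dest_dir,
    and return the sorted list of member names in the archive."""
    archive_path = Path(archive_name)
    # urlretrieve handles http(s):// URLs as well as local file:// URLs
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())
```

For example, `download_and_extract("https://zenodo.org/records/14879510/files/TaskA.zip", "taskA")` mirrors the two shell commands above. Note that the paths used later in this notebook assume Colab's `/content` working directory.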

## Generate relevance rankings using a simple model

Load queries and corpus elements in English from the Validation folder:

In [3]:
queries = "/content/taskA/validation/english/queries"
corpus_elements = "/content/taskA/validation/english/corpus_elements"

In [4]:
queries = pd.read_csv(queries, sep="\t")
corpus_elements = pd.read_csv(corpus_elements, sep="\t")

Build dictionaries that map each query ID and each corpus element ID to its job title:

In [5]:
queries_ids = queries.q_id.to_list()
queries_texts = queries.jobtitle.to_list()
map_queries = dict(zip(queries_ids, queries_texts))  # query ID -> query job title

corpus_ids = corpus_elements.c_id.to_list()
corpus_texts = corpus_elements.jobtitle.to_list()
map_corpus = dict(zip(corpus_ids, corpus_texts))  # corpus ID -> corpus job title
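
The same read-then-zip pattern is applied to both files, so a small helper can reduce copy-paste slips such as zipping the wrong pair of lists. This is a sketch: `load_id_text_map` is a hypothetical helper, and it assumes the tab-separated layout shown above, with a `jobtitle` text column and an ID column (`q_id` or `c_id`).

```python
import pandas as pd


def load_id_text_map(path, id_col: str, text_col: str = "jobtitle"):
    """Read a tab-separated file and return (ids, texts, id -> text dict).

    Raises ValueError if an ID occurs more than once, because a duplicate
    would silently overwrite an earlier entry in the dictionary.
    """
    df = pd.read_csv(path, sep="\t")
    if df[id_col].duplicated().any():
        raise ValueError(f"duplicate values in column {id_col!r}")
    ids = df[id_col].to_list()
    texts = df[text_col].to_list()
    return ids, texts, dict(zip(ids, texts))
```

For example: `corpus_ids, corpus_texts, map_corpus = load_id_text_map("/content/taskA/validation/english/corpus_elements", "c_id")`.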

Load a simple embedding model (`all-MiniLM-L6-v2`) to use as a baseline:

In [6]:
model = SentenceTransformer("all-MiniLM-L6-v2")


In [16]:
queries_texts[:5]

['nanny',
 'food technologist',
 'broadcast engineer',
 'automation engineer',
 'veterinarian']

Encode queries and corpus elements:

In [7]:
query_embeddings = model.encode(queries_texts, convert_to_tensor=True)
corpus_embeddings = model.encode(corpus_texts, convert_to_tensor=True)

Compute the cosine similarity between every query and every corpus element:

In [8]:
similarities = util.cos_sim(query_embeddings, corpus_embeddings).cpu().numpy()
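
To make concrete what `util.cos_sim` returns, here is a NumPy sketch of the same quantity: normalize each row to unit length, then take a matrix product, so entry `[i, j]` is the cosine similarity between query `i` and corpus element `j`. This is an illustration, not the library's implementation, and `cosine_similarity_matrix` is a hypothetical name.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between
    the rows of a and the rows of b. Rows must be non-zero vectors."""
    # Scale every row to unit L2 norm; the dot product of unit vectors is their cosine
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T
```

Applied to `query_embeddings.cpu().numpy()` and `corpus_embeddings.cpu().numpy()`, it should agree with `similarities` up to floating-point error.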

## Prepare submission file

The submission must follow the TREC run file format, including headers in the output file. This means that the file has six space-separated columns per line, containing the following information:

- q_id: Query ID.
- Q0: A constant identifier, usually "Q0".
- doc_id: ID of the retrieved document.
- rank: Position of the document in the ranking.
- score: Relevance score assigned by the model.
- tag: Experiment name.
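
One run-file line can be modelled as a small typed record that knows how to format and parse itself. This is a sketch: `TrecRunLine` is a hypothetical name, the four-decimal score mirrors the formatting in the loop that follows, and parsing only checks the column count rather than every rule of the official format.

```python
from typing import NamedTuple


class TrecRunLine(NamedTuple):
    """One line of a TREC run file: q_id Q0 doc_id rank score tag."""
    q_id: str
    doc_id: str
    rank: int
    score: float
    tag: str

    def format(self) -> str:
        # Six space-separated columns; the second is the constant "Q0"
        return f"{self.q_id} Q0 {self.doc_id} {self.rank} {self.score:.4f} {self.tag}"

    @classmethod
    def parse(cls, line: str) -> "TrecRunLine":
        parts = line.split()
        if len(parts) != 6:
            raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
        q_id, _, doc_id, rank, score, tag = parts  # second column (Q0) is ignored
        return cls(q_id, doc_id, int(rank), float(score), tag)
```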

In [9]:
results = []
for q_idx, q_id in enumerate(queries_ids):
    # Corpus indices sorted by decreasing similarity to this query
    sorted_indices = np.argsort(-similarities[q_idx])
    # For this tutorial, keep only the top 10 corpus elements per query
    for rank, c_idx in enumerate(sorted_indices[:10]):
        doc_id = corpus_ids[c_idx]
        score = similarities[q_idx, c_idx]
        # TREC run line: q_id Q0 doc_id rank score tag (ranks start at 1)
        results.append(f"{q_id} Q0 {doc_id} {rank + 1} {score:.4f} baseline_model")
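
Fully sorting every row with `np.argsort` is fine for a validation set of this size, but when only the top 10 are kept, `np.argpartition` avoids the full sort. The sketch below (hypothetical helper `top_k_indices`) returns each row's top-k column indices in descending score order; ties may be ordered differently than with a full sort.

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """For each row of a 2-D score matrix, return the column indices of the
    k highest scores, in descending order of score (k >= 1)."""
    k = min(k, scores.shape[1])
    # Move the k largest entries of each row into the last k positions (unordered)
    candidates = np.argpartition(scores, -k, axis=1)[:, -k:]
    # Sort only those k candidates, highest score first
    candidate_scores = np.take_along_axis(scores, candidates, axis=1)
    order = np.argsort(-candidate_scores, axis=1)
    return np.take_along_axis(candidates, order, axis=1)
```

In the loop above, `top_k_indices(similarities, 10)[q_idx]` could stand in for `sorted_indices[:10]`.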



Each element of the list is one line of the run file:

In [10]:
results[0:2]

['1 Q0 870 1 0.6761 baseline_model', '1 Q0 2016 2 0.6558 baseline_model']

Let's save the list as a file:

In [11]:
with open("evaluation_baseline.trec", "w", encoding="utf-8") as f:
    f.write("\n".join(results) + "\n")  # one run line per row, ending with a newline
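
Before evaluating or uploading, a quick structural check of the run file can catch formatting slips. This is a hedged sketch with a hypothetical `validate_run_file` helper: it checks only the properties listed in its docstring, assumes there is no header line (the file written above has none), and is no substitute for the official evaluation script.

```python
from collections import defaultdict
from typing import Dict, Set


def validate_run_file(path: str) -> Dict[str, int]:
    """Check the basic structure of a TREC run file and return the number
    of lines per query.

    Per line: six whitespace-separated columns. Per query: ranks run
    1, 2, 3, ... in file order, scores never increase, and no doc_id repeats.
    """
    counts: Dict[str, int] = defaultdict(int)
    last_score: Dict[str, float] = {}
    seen_docs: Dict[str, Set[str]] = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            parts = line.split()
            if len(parts) != 6:
                raise ValueError(f"line {line_no}: expected 6 columns, got {len(parts)}")
            q_id, _, doc_id, rank, score, _ = parts
            counts[q_id] += 1
            if int(rank) != counts[q_id]:
                raise ValueError(f"line {line_no}: rank {rank} out of sequence for query {q_id}")
            if q_id in last_score and float(score) > last_score[q_id]:
                raise ValueError(f"line {line_no}: score increases within query {q_id}")
            if doc_id in seen_docs[q_id]:
                raise ValueError(f"line {line_no}: duplicate doc_id {doc_id} for query {q_id}")
            seen_docs[q_id].add(doc_id)
            last_score[q_id] = float(score)
    return dict(counts)
```

For the baseline run, `validate_run_file("evaluation_baseline.trec")` should report 10 lines for every query.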

## Evaluation

For the evaluation, we will use the official [TalentCLEF evaluation script](https://github.com/TalentCLEF/talentclef25_evaluation_script), which uses the Ranx library under the hood.

First, clone the repository and install its requirements:

In [12]:
!git clone https://github.com/TalentCLEF/talentclef25_evaluation_script.git
!pip install -r /content/talentclef25_evaluation_script/requirements.txt


Cloning into 'talentclef25_evaluation_script'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 24 (delta 8), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 9.21 KiB | 9.21 MiB/s, done.
Resolving deltas: 100% (8/8), done.
Collecting ranx (from -r /content/talentclef25_evaluation_script/requirements.txt (line 2))
  Downloading ranx-0.3.20-py3-none-any.whl.metadata (17 kB)
Collecting ir-datasets (from ranx->-r /content/talentclef25_evaluation_script/requirements.txt (line 2))
  Downloading ir_datasets-0.5.9-py3-none-any.whl.metadata (12 kB)
Collecting lz4 (from ranx->-r /content/talentclef25_evaluation_script/requirements.txt (line 2))
  Downloading lz4-4.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting cbor2 (from ranx->-r /content/talentclef25_evaluation_script/requirements.txt (line 2))
  Downloading cb

Then, set the paths of the qrels file (the relevance judgments) and of the run file to evaluate:


In [13]:
qrels_file = "/content/taskA/validation/english/qrels.tsv"
run_file = "/content/evaluation_baseline.trec"

In [14]:
command = ["python", "/content/talentclef25_evaluation_script/talentclef_evaluate.py", "--qrels", qrels_file, "--run", run_file]
result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:  # show the script's error output if it failed
    print(result.stderr)

Received parameters:
  qrels: /content/taskA/validation/english/qrels.tsv
  run: /content/evaluation_baseline.trec
Loading qrels...
Loading run...
Running evaluation...

=== Evaluation Results ===
map: 0.2923
mrr: 0.7609
ndcg: 0.4296
precision@5: 0.6762
precision@10: 0.5914
precision@100: 0.0591
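
Because the run keeps only 10 documents per query, precision@100 cannot exceed 0.1: the number of relevant documents in the top 100 equals the number in the top 10, but it is divided by 100. That is why precision@100 (0.0591) equals precision@10 (0.5914) divided by 10, up to rounding. The plain-Python sketch below makes precision@k and MRR concrete. It is not the ranx implementation; it treats every document listed for a query in the qrels as relevant and averages over the queries in the qrels, which may differ from how the official script handles graded or missing judgments. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k positions holding a relevant document (always divides by k)."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_over_queries(run: Dict[str, List[str]], qrels: Dict[str, Set[str]],
                      metric: Callable[[List[str], Set[str]], float]) -> float:
    """Average a per-query metric over every query in qrels (a missing ranking counts as empty)."""
    scores = [metric(run.get(q_id, []), relevant) for q_id, relevant in qrels.items()]
    return sum(scores) / len(scores)
```

For example, with `run = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}` and `qrels = {"q1": {"d2", "d3"}, "q2": {"d9"}}`, `mean_over_queries(run, qrels, reciprocal_rank)` gives (1/2 + 0) / 2 = 0.25.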

