### BiLSTM-CRF Baseline Tagger

This notebook trains and runs the NER Tagger (https://github.com/glample/tagger) on the GutBrainIE2025 dataset (https://hereditary.dei.unipd.it/challenges/gutbrainie/2025/) and evaluates its predictions with the official GutBrainIE2025 evaluation script (cf. https://github.com/MMartinelli-hub/GutBrainIE_2025_Baseline/).

In [1]:
from google.colab import files
import shutil
import os
import json
import random
import re
from pathlib import Path

#### Install Python 2.7, Theano and Numpy (as required by https://github.com/glample/tagger)

In [2]:
!sudo apt-get update
!sudo apt-get install python2.7 python2.7-dev
!wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
!python2.7 get-pip.py
!python2.7 -m pip install numpy Theano # Theano's GPU backend needs pygpu, but compatible versions are no longer available, so we run on CPU

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:9 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:11 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,718 kB]
Get:12 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,659 kB]
Get:13 https://r2u.stat.illinois.edu/ubuntu jamm

#### Upload the Baseline model folder and the training and test data from GutBrainie2025

In [3]:
# upload the Baseline folder as zip
uploaded = files.upload()
!unzip BiLSTM-CRF-NER-Baseline.zip -d BiLSTM-CRF-NER-Baseline

Saving BiLSTM-CRF-NER-Baseline.zip to BiLSTM-CRF-NER-Baseline.zip
Archive:  BiLSTM-CRF-NER-Baseline.zip
   creating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/
  inflating: BiLSTM-CRF-NER-Baseline/__MACOSX/._BiLSTM-CRF-NER-Baseline  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/LICENSE.md  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/nn.pyc  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/optimization.pyc  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/.DS_Store  
  inflating: BiLSTM-CRF-NER-Baseline/__MACOSX/BiLSTM-CRF-NER-Baseline/._.DS_Store  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/get-pip.py  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/loader.pyc  
  inflating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/nn.py  
   creating: BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/dataset/
  inflating: BiLSTM-CRF-NER-Baseline/__MACOSX/BiLSTM-CRF-NER-Baseline/._dataset  
  inflatin

In [4]:
# upload the GutBrainIE2025 dataset
uploaded = files.upload()
!unzip gutbrainie2025.zip -d GutBrainIE2025

Saving gutbrainie2025.zip to gutbrainie2025.zip
Archive:  gutbrainie2025.zip
   creating: GutBrainIE2025/gutbrainie2025/
  inflating: GutBrainIE2025/__MACOSX/._gutbrainie2025  
  inflating: GutBrainIE2025/gutbrainie2025/.DS_Store  
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/._.DS_Store  
   creating: GutBrainIE2025/gutbrainie2025/Articles/
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/._Articles  
   creating: GutBrainIE2025/gutbrainie2025/Annotations/
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/._Annotations  
  inflating: GutBrainIE2025/gutbrainie2025/Articles/.DS_Store  
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/Articles/._.DS_Store  
   creating: GutBrainIE2025/gutbrainie2025/Articles/csv_format/
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/Articles/._csv_format  
   creating: GutBrainIE2025/gutbrainie2025/Articles/json_format/
  inflating: GutBrainIE2025/__MACOSX/gutbrainie2025/Articles/._json_format  
   creating: GutBrainIE2025/gutbrainie2

In [5]:
%cd /content
!ls -l

/content
total 195364
drwxr-xr-x 4 root root      4096 May  3 08:25 BiLSTM-CRF-NER-Baseline
-rw-r--r-- 1 root root 189783562 May  3 08:25 BiLSTM-CRF-NER-Baseline.zip
-rw-r--r-- 1 root root   1908226 May  2 15:47 get-pip.py
drwxr-xr-x 4 root root      4096 May  3 08:30 GutBrainIE2025
-rw-r--r-- 1 root root   8339467 May  3 08:30 gutbrainie2025.zip
drwxr-xr-x 1 root root      4096 Apr 30 13:37 sample_data


#### Preprocess Training, Validation, and Test data for the model

In [6]:
import os
import json
import random
import re
from pathlib import Path
import nltk
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

nltk.download('punkt')

def read_json_files_from_dir(train_dir):
    combined_data = {}
    # get all json files excluding bronze quality since they are autogenerated
    for json_file in train_dir.glob("*/json_format/*.json"):
        if "bronze" in str(json_file):
            continue
        with open(json_file, "r", encoding="utf-8") as f:
            data = json.load(f)
            # merge the data
            combined_data.update(data)
    return combined_data

def read_json_file(json_path):
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def split_train_dev(train_data, dev_size):
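    # sort first so that the seeded shuffle yields the same split on every run
    # (previously the keys were re-sorted after shuffling, which undid the shuffle)
    keys = sorted(train_data.keys())
    random.shuffle(keys)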
    dev_keys = keys[:dev_size]
    train_keys = keys[dev_size:]
    train_split = {k: train_data[k] for k in train_keys}
    dev_split = {k: train_data[k] for k in dev_keys}
    return train_split, dev_split


def get_bio_labels(tokens, entities):
    """
    Assigns BIO labels to a list of (token, start, end) tuples based on the entity spans.
    Each entity needs a start_idx, an inclusive end_idx, and a label.
    """
    labels = ["O"] * len(tokens)
    for entity in entities:
        start_idx = entity["start_idx"]
        end_idx = entity["end_idx"] + 1  # exclusive
        # the label becomes part of a tag, so spaces are replaced with underscores
        ent_label = entity["label"].replace(" ", "_")

        # assign labels to all tokens that fall inside the entity span.
        for i, (token, t_start, t_end) in enumerate(tokens):
            if t_start >= start_idx and t_end <= end_idx:
                if t_start == start_idx:
                    labels[i] = f"B-{ent_label}"
                else:
                    labels[i] = f"I-{ent_label}"
    return labels

def process_text_field(article, location):
    """
    Processes one text field (e.g. "title" or "abstract") of an article. The text is first split into sentences and then into words. The token spans are converted to absolute character offsets so they can be used for BIO tagging.
    Returns a list of sentences, each a list of (token, label) tuples.
    """
    if "metadata" not in article or not article["metadata"].get(location):
        return []

    text = article["metadata"][location]

    sentence_tokenizer = PunktSentenceTokenizer()
    sentence_spans = list(sentence_tokenizer.span_tokenize(text)) # get sentence spans

    treebank_tokenizer = TreebankWordTokenizer()

    sentences_token_labels = []

    for sent_start, sent_end in sentence_spans: # iterate through the sentences
        sentence_text = text[sent_start:sent_end]
        tokens = []
        # tokenize the sentence and get spans
        for token_start_rel, token_end_rel in treebank_tokenizer.span_tokenize(sentence_text):
            token = sentence_text[token_start_rel:token_end_rel]
            # adjust the relative offsets to the absolute offsets by adding sent_start
            abs_token_start = sent_start + token_start_rel
            abs_token_end = sent_start + token_end_rel
            tokens.append((token, abs_token_start, abs_token_end))

        entities = [ent for ent in article.get("entities", []) if ent.get("location") == location]
        bio_labels = get_bio_labels(tokens, entities)
        token_label_pairs = [(token, label) for (token, _, _), label in zip(tokens, bio_labels)]
        sentences_token_labels.append(token_label_pairs)

    return sentences_token_labels


def write_output(data, output_path):
    """
    Writes the tokenized output to a file, one token and label per line, with an empty line after each sentence to match the input format expected by the Lample tagger.
    """
    with open(output_path, "w", encoding="utf-8") as f:
        for article_id, article in data.items():
            for location in ["title", "abstract"]:
                sentences = process_text_field(article, location)
                for sentence in sentences:
                    for token, label in sentence:
                        f.write(f"{token}\t{label}\n")
                    # add empty line after each sentence
                    f.write("\n")



random.seed(42)

TRAIN_DIR = Path("/content/GutBrainIE2025/gutbrainie2025/Annotations/Train")
PATH_TEST = "GutBrainIE2025/gutbrainie2025/Annotations/Dev/json_format/dev.json"

train_data = read_json_files_from_dir(TRAIN_DIR)
test_data = read_json_file(PATH_TEST)

dev_size = len(test_data) # dev size should be the same as test size
new_train, new_dev = split_train_dev(train_data, dev_size)

write_output(new_train, "train.txt")
write_output(new_dev, "dev.txt")
write_output(test_data, "test.txt")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
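
To make the labeling rule in `get_bio_labels` concrete, here is a minimal, self-contained sketch on a made-up sentence. The whitespace tokenizer, the example text, and the entity offsets are assumptions for illustration only; the notebook itself uses NLTK's tokenizers.

```python
def bio_labels(tokens, entities):
    """tokens: (text, start, end_exclusive) tuples; entities use an inclusive end_idx."""
    labels = ["O"] * len(tokens)
    for ent in entities:
        start, end = ent["start_idx"], ent["end_idx"] + 1  # make the end exclusive
        label = ent["label"].replace(" ", "_")
        for i, (_, t_start, t_end) in enumerate(tokens):
            # a token is labeled only if it lies fully inside the entity span
            if t_start >= start and t_end <= end:
                labels[i] = ("B-" if t_start == start else "I-") + label
    return labels


text = "The small intestine absorbs nutrients"  # made-up example sentence

# hypothetical whitespace tokenization that keeps character offsets
tokens = []
pos = 0
for word in text.split():
    start = text.index(word, pos)
    tokens.append((word, start, start + len(word)))
    pos = start + len(word)

# "small intestine" covers characters 4..18 (inclusive)
entities = [{"start_idx": 4, "end_idx": 18, "label": "anatomical location"}]
labels = bio_labels(tokens, entities)

for (word, _, _), label in zip(tokens, labels):
    print(f"{word}\t{label}")
```

Only the token that starts exactly at the entity's `start_idx` receives a `B-` tag; every other token inside the span receives `I-`.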


#### Train the model with the IOB tag scheme

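Before training, it can help to check that the generated files have the expected shape: one token and label per line, with a blank line after each sentence. The following is a small sketch of such a check; the helper name `count_sentences` and the sample content are hypothetical and not part of the tagger.

```python
import os
import tempfile


def count_sentences(path):
    """Count blank-line-separated sentences in a token<TAB>label file."""
    count = 0
    in_sentence = False
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                in_sentence = True
            elif in_sentence:
                count += 1
                in_sentence = False
    if in_sentence:  # file did not end with a blank line
        count += 1
    return count


# made-up sample in the same layout that write_output produces
sample = "Gut\tB-microbiome\nmicrobiota\tI-microbiome\n\nMice\tB-animal\nslept\tO\n\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

n_sentences = count_sentences(sample_path)
os.remove(sample_path)
print(n_sentences)
```

Applied to `/content/train.txt`, `/content/dev.txt`, and `/content/test.txt`, the counts can be compared with the sentence counts that `train.py` prints at startup.
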
In [7]:
#files.download("train.txt")
#files.download("dev.txt")
#files.download("test.txt") # sanity check
%cd /content/BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/
!python2.7 train.py --train /content/train.txt --dev /content/dev.txt --test /content/test.txt --tag_scheme iob # train the model with IOB tagging scheme

/content/BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline
Model location: ./models/tag_scheme=iob,lower=False,zeros=False,char_dim=25,char_lstm_dim=25,char_bidirect=True,word_dim=100,word_lstm_dim=100,word_bidirect=True,pre_emb=,all_emb=False,cap_dim=0,crf=True,dropout=0.5,lr_method=sgd-lr_.005
Found 14597 unique words (223177 in total)
Found 125 unique characters
Found 27 unique named entity tags
8478 / 408 / 432 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
Score on dev: 48.92000
Score on test: 42.62000
New best score on dev.
Saving model to disk...
New best score on test.
Epoch 0 done. Average cost: 15.376952
Starting epoch 1...
Score on dev: 65.28000
Score on test: 60.09000
New best score on dev.
Saving model to disk...
New best score on test.
Epoch 1 done. Average cost: 8.356228
Starting epoch 2...
Score on dev: 65.64000
Score on test: 65.08000
New best score on dev.
Saving model to disk...
New best score on test.
Epoch 2 done. Ave

In [16]:
# rename the default model directory so it is easier to reference
%cd /content
!mv 'BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models/tag_scheme=iob,lower=False,zeros=False,char_dim=25,char_lstm_dim=25,char_bidirect=True,word_dim=100,word_lstm_dim=100,word_bidirect=True,pre_emb=,all_emb=False,cap_dim=0,crf=True,dropout=0.5,lr_method=sgd-lr_.005' 'BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models/GutBrainiemodelIOB'

!ls {'BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models/'}

/content
english  GutBrainiemodelIOB



### Tagging entities with the trained model
#### Step 1: Prepare input files for the tagger and subsequently tag them

In [18]:
import os
import json
import subprocess
import nltk
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

nltk.download('punkt')


TEST_JSON_PATH = "GutBrainIE2025/gutbrainie2025/Annotations/Dev/json_format/dev.json"
TAGGER_INPUT_DIR = "tagger_input"
TAGGER_OUTPUT_DIR = "tagger_output"
FINAL_OUTPUT_FILE = "tagged_test.json"

TAGGER_SCRIPT = "BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/tagger.py"
TAGGER_MODEL = "BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models/GutBrainiemodelIOB/"


def load_test_data(json_path):
    with open(json_path, "r") as f:
        return json.load(f)

def prepare_tagger_input(test_data, output_dir):
    """
    For each article in test_data and each location (title and abstract),
    creates an input file for the tagger. Each file has one sentence per line,
    with tokens produced by the TreebankWordTokenizer and separated by spaces. The files are saved in output_dir.
    """
    sentence_tokenizer = PunktSentenceTokenizer()
    treebank_tokenizer = TreebankWordTokenizer()

    for article_id, article in test_data.items():
        for location in ["title", "abstract"]:
            if "metadata" not in article or not article["metadata"].get(location):
                continue
            text = article["metadata"][location] # get the text

            sentence_spans = list(sentence_tokenizer.span_tokenize(text)) # get the sentence spans
            sentences = []
            for start, end in sentence_spans:
                sentence_text = text[start:end]
                tokens = treebank_tokenizer.tokenize(sentence_text) # tokenize sentences
                sentence_line = " ".join(tokens)
                sentences.append(sentence_line)

            # write the sentences to a file
            out_file = os.path.join(output_dir, "{}_{}.txt".format(article_id, location))
            with open(out_file, "w") as f:
                for sentence in sentences:
                    f.write(sentence + "\n")
        #print("Done with preparing the input data for the tagger!")


def run_tagger_on_all_inputs():
    """
    Runs the tagger on all input files in TAGGER_INPUT_DIR and saves outputs in TAGGER_OUTPUT_DIR.
    """
    if not os.path.exists(TAGGER_OUTPUT_DIR):
        os.makedirs(TAGGER_OUTPUT_DIR)

    input_files = [f for f in os.listdir(TAGGER_INPUT_DIR) if f.endswith(".txt")]

    for input_file in input_files:
        input_path = os.path.join(TAGGER_INPUT_DIR, input_file)
        output_path = os.path.join(TAGGER_OUTPUT_DIR, input_file)

        print(f"Processing: {input_file} → {output_path}")

        # run the tagger on the test set to get our predictions
        subprocess.run([
            "python2.7", TAGGER_SCRIPT,
            "--model", TAGGER_MODEL,
            "--input", input_path,
            "--output", output_path
        ], check=True)

    print(f"Done with tagging.")


os.chdir("/content")

test_data = load_test_data(TEST_JSON_PATH)

if not os.path.exists(TAGGER_INPUT_DIR):
    os.makedirs(TAGGER_INPUT_DIR)

prepare_tagger_input(test_data, TAGGER_INPUT_DIR) # prepare the input for the tagger

run_tagger_on_all_inputs() # run tagger on all input files and get output files

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Processing: 37676767_title.txt → tagger_output/37676767_title.txt
Processing: 36978911_title.txt → tagger_output/36978911_title.txt
Processing: 30405455_title.txt → tagger_output/30405455_title.txt
Processing: 37212075_title.txt → tagger_output/37212075_title.txt
Processing: 33968794_abstract.txt → tagger_output/33968794_abstract.txt
Processing: 33963281_abstract.txt → tagger_output/33963281_abstract.txt
Processing: 32999308_title.txt → tagger_output/32999308_title.txt
Processing: 37071196_title.txt → tagger_output/37071196_title.txt
Processing: 36532064_title.txt → tagger_output/36532064_title.txt
Processing: 33777957_title.txt → tagger_output/33777957_title.txt
Processing: 33327540_title.txt → tagger_output/33327540_title.txt
Processing: 32438623_title.txt → tagger_output/32438623_title.txt
Processing: 30309367_abstract.txt → tagger_output/30309367_abstract.txt
Processing: 34290352_abstract.txt → tagger_output/34290352_abstract.txt
Processing: 38613087_title.txt → tagger_output/38613
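
The post-processing in this notebook assumes that the tagger writes one sentence per line, with each token followed by `__` and its tag. A minimal sketch of parsing such a line is shown below; the sample line is made up and the helper name `parse_tagged_line` is hypothetical.

```python
def parse_tagged_line(line, delimiter="__"):
    """Split a tagged line into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        if delimiter not in item:
            continue  # skip anything that carries no tag
        # rsplit keeps delimiters that are part of the token itself
        token, tag = item.rsplit(delimiter, 1)
        pairs.append((token, tag))
    return pairs


line = "Gut__B-microbiome microbiota__I-microbiome affects__O mood__O"
pairs = parse_tagged_line(line)
print(pairs)

# a token that itself contains the delimiter is still split at the last occurrence
tricky = parse_tagged_line("a__b__O")
print(tricky)
```

Splitting from the right matters because a token may itself contain the delimiter, while a tag never does.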

#### Step 2: Transform the tagger output into a JSON file containing the entities

This JSON file is used to evaluate the baseline and to compare it with the BERT-based models. It follows the submission format required by the GutBrainIE2025 challenge (https://hereditary.dei.unipd.it/challenges/gutbrainie/2025/).
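
As a rough illustration of the structure produced in this step, the sketch below builds one prediction entry with the fields used throughout this notebook (`start_idx`, `end_idx`, `location`, `text_span`, `label`). The article ID, the title text, and the helper `make_entity` are made up for the example.

```python
import json

title = "Gut microbiota and depression"  # made-up title text


def make_entity(text, start, end_exclusive, location, label):
    """Build one entity dict; end_idx is stored as an inclusive offset."""
    return {
        "start_idx": start,
        "end_idx": end_exclusive - 1,
        "location": location,
        "text_span": text[start:end_exclusive],
        "label": label,
    }


# hypothetical article ID mapped to its list of predicted entities
prediction = {"00000000": {"entities": [make_entity(title, 0, 14, "title", "microbiome")]}}

entity = prediction["00000000"]["entities"][0]
# with an inclusive end_idx, the span is recovered by slicing up to end_idx + 1
recovered = title[entity["start_idx"]:entity["end_idx"] + 1]
print(json.dumps(prediction, indent=2))
```

Since the evaluation code compares `text_span` together with the offsets, checking that the slice `text[start_idx:end_idx + 1]` equals `text_span` is a cheap sanity test before evaluating.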

In [19]:
import os
import json
import nltk
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

nltk.download('punkt')

sentence_tokenizer = PunktSentenceTokenizer()
word_tokenizer = TreebankWordTokenizer()

def get_original_token_spans(text):
    """
    Tokenizes text in the same way as the tagger input is created to ensure proper matching with original text to get the entity spans.
    """
    spans = []

    sentence_spans = list(sentence_tokenizer.span_tokenize(text)) # sentence tokenization

    for sent_start, sent_end in sentence_spans:
        sentence = text[sent_start:sent_end]
        word_spans = list(word_tokenizer.span_tokenize(sentence))  # word tokenization

        for word_start, word_end in word_spans:
            actual_start = sent_start + word_start
            actual_end = sent_start + word_end
            spans.append((text[actual_start:actual_end], actual_start, actual_end))

    return spans

def process_tagger_output_file(article_id, location, original_text, tagger_output_file):
    """
    Process the tagger output, aligning tokens to original text and extracting entities based on BIO tagging.
    """
    original_tokens = get_original_token_spans(original_text) # tokenize and get spans from the original or ground truth text

    # read the tagger outputs and store list of (token, tag)
    tagged_tokens = []
    with open(tagger_output_file, "r") as f:
        lines = f.readlines()


    tagged_text = " ".join(line.strip() for line in lines)  # join all lines into one string so it can be matched against the original text
    for pair in tagged_text.split():
        if "__" in pair:
            token, label = pair.rsplit("__", 1)  # split the token from its tag (e.g. gut__B-microbiome)
            tagged_tokens.append((token, label))
        else:
            print(f"Skipping token without a valid tag: {pair}")


    entities = []
    current_entity = None
    original_idx = 0  # pointer to track original text tokens

    for tagged_token, tag in tagged_tokens:
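        # find the corresponding token in the original tokenized list, scanning ahead from the current position
        match_idx = original_idx
        while match_idx < len(original_tokens) and original_tokens[match_idx][0] != tagged_token:
            match_idx += 1

        if match_idx >= len(original_tokens):
            # no match for this token (e.g. the tokenizer rewrote a quotation mark): skip it without losing our position
            continue
        original_idx = match_idx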

        orig_token, orig_start, orig_end = original_tokens[original_idx]

        if tag.startswith("B-"):
            # start a new entity (B is the prefix for the beginning of an entity)
            if current_entity:
                entities.append(current_entity)  # save the previous entity

            current_entity = {
                "start_idx": orig_start,
                "end_idx": orig_end - 1,  # end offset as inclusive index (in the challenge the index is inclusive)
                "location": location,
                "text_span": orig_token,
                "label": tag[2:]  # remove "B-"
            }

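        elif tag.startswith("I-") and current_entity:
            # continue the current entity if the tag starts with I (inside)
            current_entity["end_idx"] = orig_end - 1
            # take the span from the original text so that its spacing matches the source exactly
            current_entity["text_span"] = original_text[current_entity["start_idx"]:orig_end]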

        elif tag == "O" and current_entity:
            # end the current entity
            entities.append(current_entity)
            current_entity = None

        original_idx += 1

    if current_entity:
        entities.append(current_entity)

    return entities

def process_tagger_outputs(test_data, tagger_output_dir, final_output_file):
    """
    Iterates over each article in test_data, reads the corresponding tagger output files
    (for title and abstract each), processes them to extract entity offsets, and writes the
    final results into a JSON file.
    """
    results = {}
    for article_id, article in test_data.items():
        article_entities = []
        for location in ["title", "abstract"]:
            if "metadata" not in article or not article["metadata"].get(location):
                continue
            original_text = article["metadata"][location]
            tagger_file = os.path.join(tagger_output_dir, "{}_{}.txt".format(article_id, location))
            if not os.path.exists(tagger_file):
                print("Tagger output file {} does not exist; skipping.".format(tagger_file))
                continue
            entities = process_tagger_output_file(article_id, location, original_text, tagger_file)
            article_entities.extend(entities)
        results[article_id] = {"entities": article_entities}

    with open(final_output_file, "w") as f:
        json.dump(results, f, indent=2)


process_tagger_outputs(test_data, TAGGER_OUTPUT_DIR, FINAL_OUTPUT_FILE)
files.download("tagged_test.json") # download the final entities in the test set

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!



#### Run the evaluation script from the challenge (cf. https://github.com/MMartinelli-hub/GutBrainIE_2025_Baseline)

In [20]:
def replace_underscores_in_labels(data):
    """
    Replaces "_" with " " in the label section of an entity in a json file.
    """
    for article_id, article_data in data.items():
        for entity in article_data.get("entities", []):
            if "label" in entity:
                entity["label"] = entity["label"].replace("_", " ")

    return data

def modify_json_file(input_file, output_file):
    """
    Replaces "_" with " " in a json file and saves the modified json in a specified path.
    """
    with open(input_file, "r") as infile:
        data = json.load(infile)

    updated_data = replace_underscores_in_labels(data)

    with open(output_file, "w") as outfile:
        json.dump(updated_data, outfile, indent=2)
    print(f"Updated JSON saved to {output_file}")


input_file = "tagged_test.json"
output_file = "tagged_test_valid_labels.json"


modify_json_file(input_file, output_file) # replace _ with " " in the entity labels of the json file

# From here the code is taken from the evaluation of the challenge (see https://github.com/MMartinelli-hub/GutBrainIE_2025_Baseline)

PREDICTIONS_PATH_6_1 = "tagged_test_valid_labels.json"

# DEFINE HERE FOR WHICH SUBTASK(S) YOU WANT TO EVAL YOUR PREDICTIONS
eval_6_1_NER = True


GROUND_TRUTH_PATH =  "GutBrainIE2025/gutbrainie2025/Annotations/Dev/json_format/dev.json"

try:
    with open(GROUND_TRUTH_PATH, 'r', encoding='utf-8') as file:
        ground_truth = json.load(file)
except OSError:
    raise OSError(f'Error in opening the specified json file: {GROUND_TRUTH_PATH}')

LEGAL_ENTITY_LABELS = [
    "anatomical location",
    "animal",
    "bacteria",
    "biomedical technique",
    "chemical",
    "DDF",
    "dietary supplement",
    "drug",
    "food",
    "gene",
    "human",
    "microbiome",
    "statistical technique"
]


def eval_submission_6_1_NER(path):
    try:
        with open(path, 'r', encoding='utf-8') as file:
            predictions = json.load(file)
    except OSError:
        raise OSError(f'Error in opening the specified json file: {path}')

    ground_truth_NER = dict()
    count_annotated_entities_per_label = {}

    for pmid, article in ground_truth.items():
        if pmid not in ground_truth_NER:
            ground_truth_NER[pmid] = []
        for entity in article['entities']:
            start_idx = int(entity["start_idx"])
            end_idx = int(entity["end_idx"])
            location = str(entity["location"])
            text_span = str(entity["text_span"])
            label = str(entity["label"])

            entry = (start_idx, end_idx, location, text_span, label)
            ground_truth_NER[pmid].append(entry)

            if label not in count_annotated_entities_per_label:
                count_annotated_entities_per_label[label] = 0
            count_annotated_entities_per_label[label] += 1

    count_predicted_entities_per_label = {label: 0 for label in list(count_annotated_entities_per_label.keys())}
    count_true_positives_per_label = {label: 0 for label in list(count_annotated_entities_per_label.keys())}

    for pmid in predictions.keys():
        try:
            entities = predictions[pmid]['entities']
        except KeyError:
            raise KeyError(f'{pmid} - Not able to find field \"entities\" within article')

        for entity in entities:
            try:
                start_idx = int(entity["start_idx"])
                end_idx = int(entity["end_idx"])
                location = str(entity["location"])
                text_span = str(entity["text_span"])
                label = str(entity["label"])
            except KeyError:
                raise KeyError(f'{pmid} - Not able to find one or more of the expected fields for entity: {entity}')

            if label not in LEGAL_ENTITY_LABELS:
                raise NameError(f'{pmid} - Illegal label {label} for entity: {entity}')

            if label in count_predicted_entities_per_label:
                count_predicted_entities_per_label[label] += 1

            entry = (start_idx, end_idx, location, text_span, label)
            if entry in ground_truth_NER[pmid]:
                count_true_positives_per_label[label] += 1

    count_annotated_entities = sum(count_annotated_entities_per_label[label] for label in list(count_annotated_entities_per_label.keys()))
    count_predicted_entities = sum(count_predicted_entities_per_label[label] for label in list(count_annotated_entities_per_label.keys()))
    count_true_positives = sum(count_true_positives_per_label[label] for label in list(count_annotated_entities_per_label.keys()))

    micro_precision = count_true_positives / (count_predicted_entities + 1e-10)
    micro_recall = count_true_positives / (count_annotated_entities + 1e-10)
    micro_f1 = 2 * ((micro_precision * micro_recall) / (micro_precision + micro_recall + 1e-10))

    precision, recall, f1 = 0, 0, 0
    n = 0
    for label in list(count_annotated_entities_per_label.keys()):
        n += 1
        current_precision = count_true_positives_per_label[label] / (count_predicted_entities_per_label[label] + 1e-10)
        current_recall = count_true_positives_per_label[label] / (count_annotated_entities_per_label[label] + 1e-10)

        precision += current_precision
        recall += current_recall
        f1 += 2 * ((current_precision * current_recall) / (current_precision + current_recall + 1e-10))

    precision = precision / n
    recall = recall / n
    f1 = f1 / n

    return precision, recall, f1, micro_precision, micro_recall, micro_f1



round_to_decimal_position = 4

if eval_6_1_NER:
    precision, recall, f1, micro_precision, micro_recall, micro_f1 = eval_submission_6_1_NER(PREDICTIONS_PATH_6_1)
    print("\n\n=== 6_1_NER ===")
    print(f"Macro-precision: {round(precision, round_to_decimal_position)}")
    print(f"Macro-recall: {round(recall, round_to_decimal_position)}")
    print(f"Macro-F1: {round(f1, round_to_decimal_position)}")
    print(f"Micro-precision: {round(micro_precision, round_to_decimal_position)}")
    print(f"Micro-recall: {round(micro_recall, round_to_decimal_position)}")
    print(f"Micro-F1: {round(micro_f1, round_to_decimal_position)}")

Updated JSON saved to tagged_test_valid_labels.json


=== 6_1_NER ===
Macro-precision: 0.5476
Macro-recall: 0.4655
Macro-F1: 0.4921
Micro-precision: 0.7122
Micro-recall: 0.607
Micro-F1: 0.6554
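
The gap between the macro and micro scores comes from how the two averages weight labels. The sketch below mirrors the formulas used in the evaluation code on made-up counts for two labels; the numbers are purely illustrative and unrelated to the results above.

```python
def prf(tp, predicted, annotated, eps=1e-10):
    """Precision, recall and F1 with the same epsilon smoothing as the evaluation code."""
    precision = tp / (predicted + eps)
    recall = tp / (annotated + eps)
    f1 = 2 * (precision * recall) / (precision + recall + eps)
    return precision, recall, f1


# hypothetical counts per label: (true positives, predicted, annotated)
counts = {
    "bacteria": (90, 100, 100),  # frequent label, tagged well
    "gene": (1, 10, 10),         # rare label, tagged poorly
}

# micro: pool the counts over all labels, then compute the metrics once
total_tp = sum(c[0] for c in counts.values())
total_pred = sum(c[1] for c in counts.values())
total_ann = sum(c[2] for c in counts.values())
_, _, micro_f1 = prf(total_tp, total_pred, total_ann)

# macro: compute F1 per label, then take the unweighted mean
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

print(f"Micro-F1: {micro_f1:.4f}")
print(f"Macro-F1: {macro_f1:.4f}")
```

In this toy setting the frequent label dominates the micro average, while the macro average gives the rare label equal weight, so a macro score well below the micro score points to weaker performance on some labels.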


#### Some sanity checks and downloading the trained model

In [None]:
# have a look at one example tagged file
files.download("tagger_output/28716445_abstract.txt")

In [None]:
ls -l # folder structure. The cwd should be set to "/content".

In [None]:
!ls /content/BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models/GutBrainiemodelIOB

In [None]:
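# zip the trained models (the path is relative to /content) and download the archive
folder_name = 'BiLSTM-CRF-NER-Baseline/BiLSTM-CRF-NER-Baseline/models'
shutil.make_archive('models', 'zip', folder_name)  # creates models.zip in the current directory
files.download('models.zip') # download models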