In [None]:
!pip install reconner

# Introduction

This notebook walks through some of the basic use cases of Recon.

In [1]:
from pprint import pprint
from recon import Dataset, get_ner_stats

## Loading Data

In [2]:
ds = Dataset("train", verbose=True).from_disk('./data/skills')
print(ds)

<recon.dataset.Dataset object at 0x7faa42ca0670>


In [3]:
ds.summary()

Dataset
Name: train
Stats: {
    "n_examples": 106,
    "n_examples_no_entities": 29,
    "n_annotations": 243,
    "n_annotations_per_type": {
        "SKILL": 197,
        "PRODUCT": 33,
        "JOB_ROLE": 10,
        "skill": 2,
        "product": 1
    },
    "examples_with_type": null
}


## Applying Dataset Operations using `Dataset.apply`

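If we run `get_ner_stats` on the data in our Dataset, we see the same stats that `ds.summary()` printed above, which suggests that `summary()` runs the `get_ner_stats` function internally. Plain `print(ds)`, by contrast, shows only the default object representation, as seen in the Loading Data output.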

In [4]:
print(ds.apply(get_ner_stats))

{
    "n_examples": 106,
    "n_examples_no_entities": 29,
    "n_annotations": 243,
    "n_annotations_per_type": {
        "SKILL": 197,
        "PRODUCT": 33,
        "JOB_ROLE": 10,
        "skill": 2,
        "product": 1
    },
    "examples_with_type": null
}
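To make the fields in this output concrete, here is a minimal sketch of how such counts could be derived from a list of annotated examples. It is not Recon's implementation: the `sketch_ner_stats` helper, the toy `examples` list, and the Prodigy-style `text`/`spans` layout are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical examples in a Prodigy-style layout (an assumption for this
# sketch, not data loaded from ./data/skills).
examples = [
    {"text": "Python developer", "spans": [{"start": 0, "end": 6, "label": "SKILL"}]},
    {"text": "Uses excel daily", "spans": [{"start": 5, "end": 10, "label": "product"}]},
    {"text": "No entities here", "spans": []},
]


def sketch_ner_stats(examples):
    """Count examples, entity-free examples, and annotations per label."""
    label_counts = Counter(
        span["label"] for example in examples for span in example["spans"]
    )
    return {
        "n_examples": len(examples),
        "n_examples_no_entities": sum(1 for ex in examples if not ex["spans"]),
        "n_annotations": sum(label_counts.values()),
        # Labels are case-sensitive, so "product" and "PRODUCT" count separately.
        "n_annotations_per_type": dict(label_counts.most_common()),
    }


stats = sketch_ner_stats(examples)
print(stats)
```

Because labels are compared as plain strings, differently cased variants of the same label end up as separate entries, which is what the `"skill"` and `"product"` rows in the output above reflect.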


## Make in-place Dataset Corrections using `Dataset.apply_`

It looks like we have a few instances of lowercase labels (maybe from some old annotations). Let's apply an operation to the dataset and modify it in-place, converting "skill" -> "SKILL" and "product" -> "PRODUCT" in our examples.

In [5]:
ds.apply_("recon.v1.upcase_labels")

=> Applying operation 'recon.v1.upcase_labels' to dataset 'train'


100%|███████████████████████████████████████| 106/106 [00:00<00:00, 1619.08it/s]

✔ Completed operation 'recon.v1.upcase_labels'

In [6]:
ds.summary()

Dataset
Name: train
Stats: {
    "n_examples": 106,
    "n_examples_no_entities": 29,
    "n_annotations": 243,
    "n_annotations_per_type": {
        "SKILL": 199,
        "PRODUCT": 34,
        "JOB_ROLE": 10
    },
    "examples_with_type": null
}
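As a quick sanity check on these numbers, the following sketch merges the per-label counts reported before the correction by upper-casing each label. It only reproduces the arithmetic on the counts shown in this notebook; it does not call Recon, and the variable names are illustrative.

```python
from collections import Counter

# Per-label counts copied from the summary printed before the correction.
before = {"SKILL": 197, "PRODUCT": 33, "JOB_ROLE": 10, "skill": 2, "product": 1}

# Merge counts whose labels differ only by case.
after = Counter()
for label, count in before.items():
    after[label.upper()] += count

print(dict(after))
# {'SKILL': 199, 'PRODUCT': 34, 'JOB_ROLE': 10}
```

The total number of annotations stays at 243: the operation relabels existing annotations and does not add or remove any.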


## Conclusion

If we look again at our Dataset summary, we can see that all NER annotations with lowercase labels have been corrected: the 2 "skill" and 1 "product" annotations are now counted under "SKILL" (199) and "PRODUCT" (34). We can now save our data and use the corrected data to train a new model. This correction is really simple, and if you have a consistent annotation process, it is one you might not need to use very often. Later notebooks show examples of more advanced corrections.