In [11]:
import spacy

## Prodigy

In our patents, many named entities are abbreviated.
For example, `Wi-Fi Direct` is later referred to as `WFD`, and `P2P Group Owner` as `GO`.

Our original model does not recognize these abbreviations, so a large share of the entity mentions are missed.
We will therefore fine-tune the model with Prodigy.
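Both spaCy and Prodigy describe entities as character offsets into the raw text. The sketch below builds the annotation we would want for the example sentence used in the next cell, with the full name and its abbreviation both labeled `TECH`. `tech_spans` is a hypothetical helper that only locates the first occurrence of each surface string.

```python
# Hypothetical helper: build character-offset entity spans for given surface strings.
def tech_spans(text, terms, label="TECH"):
    spans = []
    for term in terms:
        start = text.find(term)  # first occurrence only
        if start == -1:
            continue  # term does not appear in this text
        spans.append({"start": start, "end": start + len(term), "label": label})
    return spans


text = "Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD)."
spans = tech_spans(text, ["Wi-Fi Direct", "WFD"])
for span in spans:
    # Slicing with the offsets recovers the labeled surface string.
    print(text[span["start"]:span["end"]], span)
```

A model that only tags `Wi-Fi Direct` gets half of these spans right, which is the gap the fine-tuning targets.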

In [14]:
nlp = spacy.load("spacy_output/model-best")

doc = nlp("Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD).")

colors = {"TECH": "#F67DE3"}
options = {"colors": colors} 

spacy.displacy.render(doc, style="ent", options=options, jupyter=True)

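We create a training dataset for fine-tuning our NER model. `ner.correct` pre-highlights the model's predictions, so we only have to fix missed or wrong entities.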
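With `--loader txt`, Prodigy reads the source file as plain text with one example per line. A minimal sketch of preparing such a file, assuming the patent texts are already in a Python list; `write_txt_source`, the second sample text and the output name `G06K_sample.txt` are illustrative (a separate name avoids overwriting the real `G06K.txt`).

```python
from pathlib import Path


def write_txt_source(texts, path):
    """Write one text per line, the layout Prodigy's txt loader expects (hypothetical helper)."""
    lines = []
    for text in texts:
        # Collapse internal newlines and repeated whitespace so each example stays on one line.
        cleaned = " ".join(text.split())
        if cleaned:  # drop empty or whitespace-only texts
            lines.append(cleaned)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


texts = [
    "Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD).",
    "An illustrative second sentence\nthat was split across lines.",  # hypothetical input
    "   ",  # whitespace-only entry, dropped
]
n = write_txt_source(texts, "G06K_sample.txt")
print(f"Wrote {n} examples")
```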

In [1]:
!prodigy ner.correct fine_tune_g06k spacy_output/model-best G06K.txt --loader txt --label TECH

Using 1 label(s): TECH

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

^C

✔ Saved 35 annotations to database SQLite
Dataset: fine_tune_g06k
Session ID: 2022-04-20_12-40-25



We export the annotations from the Prodigy database to a JSONL file.

In [6]:
!prodigy db-out fine_tune_g06k terms_g06k

✔ Exported 35 annotations from 'fine_tune_g06k' in database SQLite
/Users/gaetanserre/Documents/Projects/patent_ner_linking/terms_g06k/fine_tune_g06k.jsonl
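Before training, it can help to check what the export actually contains. This sketch assumes Prodigy's usual NER record layout (`text`, `spans` with `start`/`end`/`label`, and an `answer` of `accept`, `reject` or `ignore`); `summarize_ner_jsonl` is a hypothetical helper and the two records are illustrative, not taken from the real export.

```python
import json
from collections import Counter


def summarize_ner_jsonl(lines):
    """Count answers and accepted (span text, label) pairs in NER JSONL lines."""
    answers = Counter()
    span_texts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        answer = record.get("answer")
        answers[answer] += 1
        if answer != "accept":
            continue  # rejected or ignored examples contribute no gold spans
        for span in record.get("spans", []):
            surface = record["text"][span["start"]:span["end"]]
            span_texts[(surface, span["label"])] += 1
    return answers, span_texts


# Illustrative records only.
text = "Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD)."
sample_lines = [
    json.dumps({"text": text, "answer": "accept",
                "spans": [{"start": 0, "end": 12, "label": "TECH"},
                          {"start": 77, "end": 80, "label": "TECH"}]}),
    json.dumps({"text": "A rejected example.", "answer": "reject", "spans": []}),
]
answers, span_texts = summarize_ner_jsonl(sample_lines)
print(answers)
print(span_texts.most_common(10))
```

On the real data, pass an open file handle for `terms_g06k/fine_tune_g06k.jsonl` instead of `sample_lines`; a file object iterates line by line.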


Now let's fine-tune our model!

In [10]:
!prodigy train ./prodigy_output/ --ner fine_tune_g06k --base-model spacy_output/model-best --gpu-id 0

ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0
Traceback (most recent call last):
  File "/home/gaetan/miniconda3/envs/ML/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/gaetan/miniconda3/envs/ML/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/gaetan/miniconda3/envs/ML/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/gaetan/miniconda3/envs/ML/lib/python3.9/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/gaetan/miniconda3/envs/ML/lib/python3.9/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/gaetan/miniconda3/envs/ML/l
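The training run above ended in a traceback (truncated in this export). One possible fallback, sketched here as an assumption rather than the method used in this notebook, is to convert the exported annotations into spaCy's binary `DocBin` format and train with spaCy directly. `jsonl_to_docbin` is a hypothetical helper; it expects the same record layout as the export and skips spans whose offsets do not fall on token boundaries.

```python
import json

import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans


def jsonl_to_docbin(nlp, lines):
    """Convert accepted NER records into a spaCy DocBin (sketch)."""
    db = DocBin()
    skipped = 0  # spans whose character offsets do not match token boundaries
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # keep only accepted annotations
        doc = nlp.make_doc(record["text"])  # tokenize only, no predictions
        ents = []
        for span in record.get("spans", []):
            ent = doc.char_span(span["start"], span["end"], label=span["label"])
            if ent is None:
                skipped += 1
                continue
            ents.append(ent)
        doc.ents = filter_spans(ents)  # drop overlapping spans, which doc.ents rejects
        db.add(doc)
    return db, skipped


# Illustrative record; in practice pass the lines of terms_g06k/fine_tune_g06k.jsonl
# and tokenize with the base model, e.g. spacy.load("spacy_output/model-best").
text = "Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD)."
sample_lines = [
    json.dumps({"text": text, "answer": "accept",
                "spans": [{"start": 0, "end": 12, "label": "TECH"},
                          {"start": 77, "end": 80, "label": "TECH"}]}),
]
nlp = spacy.blank("en")
db, skipped = jsonl_to_docbin(nlp, sample_lines)
```

Calling `db.to_disk("./train.spacy")` would then write a file that `spacy train` accepts through `--paths.train`; the training config and a dev split are not covered here.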

Our fine-tuned model should now recognize the abbreviations better.

In [None]:
nlp = spacy.load("prodigy_output/model-best")

doc = nlp("Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD).")

colors = {"TECH": "#F67DE3"}
options = {"colors": colors} 

spacy.displacy.render(doc, style="ent", options=options, jupyter=True)
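displaCy gives a visual check. To compare the two models programmatically, a small sketch can diff the `(text, label)` pairs each pipeline predicts on the same sentence. `entity_diff` is a hypothetical helper, and the demo uses blank pipelines (with an `entity_ruler` standing in for the fine-tuned model) so it runs without the trained models.

```python
import spacy


def entity_diff(nlp_before, nlp_after, text):
    """Return entities found only by the new pipeline, and only by the old one."""
    before = {(ent.text, ent.label_) for ent in nlp_before(text).ents}
    after = {(ent.text, ent.label_) for ent in nlp_after(text).ents}
    return sorted(after - before), sorted(before - after)


# Stand-ins for the real models (illustrative only).
before_nlp = spacy.blank("en")
after_nlp = spacy.blank("en")
ruler = after_nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "TECH", "pattern": "WFD"}])

text = "Wi-Fi Direct (registered trademark, which will be hereinafter referred to as WFD)."
gained, lost = entity_diff(before_nlp, after_nlp, text)
print("gained:", gained)
print("lost:", lost)
```

With the real pipelines, the call would be `entity_diff(spacy.load("spacy_output/model-best"), spacy.load("prodigy_output/model-best"), text)`, ideally over a held-out set of sentences rather than a single example.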