# Named Entity Recognition Model

In [1]:
import spacy

#### Convert the txt files to spaCy format for training

In [2]:
!python -m spacy convert "train.txt" spacyNER_data -c ner

ℹ Auto-detected token-per-line NER format
ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences
into documents with `-n 10`.
✔ Generated output file (100 documents): spacyNER_data/train.spacy


In [3]:
!python -m spacy convert "validation.txt" spacyNER_data -c ner

ℹ Auto-detected token-per-line NER format
ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences
into documents with `-n 10`.
✔ Generated output file (31 documents): spacyNER_data/validation.spacy


In [4]:
!python -m spacy convert "test.txt" spacyNER_data -c ner

ℹ Auto-detected token-per-line NER format
ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences
into documents with `-n 10`.
✔ Generated output file (50 documents): spacyNER_data/test.spacy
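
With one sentence per document, the document counts reported by the converter (100, 31, 50) should equal the number of sentences in each input file. Here is a minimal sketch for checking that before converting. It assumes blank lines separate sentences and each non-blank line holds whitespace-separated columns with the token first and the NER tag last. The layout of `train.txt` is not shown in this notebook, so that is an assumption, and `read_sentences` is a hypothetical helper, not part of spaCy.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Split token-per-line text into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line (or a document marker) closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the NER tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Toy input in the assumed layout; swap in open("train.txt", encoding="utf-8").
sample = ["Acme B-CRP", "sells O", "Widget B-PRD", "", "Hello O", "world O", ""]
print(len(read_sentences(sample)))  # 2
```

If the count for a real file differs from the converter's figure, the file probably does not follow the layout assumed here.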


#### Load the pretrained model (download it first: python -m spacy download en_core_web_lg)

In [5]:
nlp = spacy.load('en_core_web_lg')

#### Create config file
#### Reference: https://spacy.io/usage/training#quickstart

In [6]:
!python -m spacy init fill-config base_config.cfg config.cfg

✔ Auto-filled config with all values
✔ Saved config
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


### Train our model

In [7]:
!python -m spacy train config.cfg --output output --paths.train spacyNER_data/train.spacy --paths.dev spacyNER_data/validation.spacy

ℹ Saving to output directory: output
ℹ Using CPU
[2022-02-25 10:30:47,005] [INFO] Set up nlp object from config
[2022-02-25 10:30:47,016] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-02-25 10:30:47,019] [INFO] Created vocabulary
[2022-02-25 10:30:48,166] [INFO] Added vectors: en_core_web_lg
[2022-02-25 10:30:49,923] [INFO] Finished initializing nlp object
[2022-02-25 10:30:50,164] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     58.90    0.00    0.00    0.00    0.00
 28     200         10.80   1131.35   46.99   60.00   38.61    0.47
 63     400         25.66     87.45   45.45   66.04   34.65    0.45
106     600         20.25     42.59   50.32   72.22   38.61    

### Evaluate model 

In [9]:
!python -m spacy evaluate output/model-best spacyNER_data/validation.spacy -dp result_val

ℹ Using CPU

TOK     -    
NER P   69.49
NER R   40.59
NER F   51.25
SPEED   3960 
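
NER F is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it from the rounded NER P and NER R values printed above. `f_score` is an illustrative helper, not a spaCy function. Because it works from rounded inputs, it can differ from a reported figure in the last digit.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(69.49, 40.59), 2))  # 51.25, matching NER F above
```

The gap between precision (69.49) and recall (40.59) pulls F towards the lower value: the model misses more entities than it mislabels.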

           P       R       F
CRP    75.00   36.73   49.32
PRD    65.71   46.94   54.76
y       0.00    0.00    0.00
CRPO    0.00    0.00    0.00
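
The `y` and `CRPO` rows score 0.00 on every metric. One possible explanation, which this output alone cannot confirm, is that they are mistyped tags in the annotation files (for example `CRPO` in place of `CRP`). A minimal sketch follows for listing every entity label with its frequency, so rare or unexpected labels stand out. It assumes the tag is the last whitespace-separated column and may carry an IOB or BILUO prefix such as `B-` or `I-`. `count_labels` is a hypothetical helper, not part of spaCy.

```python
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count entity labels in token-per-line data, ignoring the O tag."""
    counts: Counter = Counter()
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # blank line between sentences
        tag = parts[-1]
        if tag == "O":
            continue
        # Strip an IOB/BILUO prefix ("B-CRP" -> "CRP"); keep bare tags as-is.
        prefix, sep, label = tag.partition("-")
        counts[label if sep and prefix in {"B", "I", "L", "U"} else tag] += 1
    return counts


# Toy input in the assumed layout; swap in open("train.txt", encoding="utf-8").
sample = ["Acme B-CRP", "Corp I-CRP", "sells O", "", "Widget B-PRD", "x B-CRPO"]
for label, n in count_labels(sample).most_common():
    print(label, n)
```

Running this over the training, validation, and test files and comparing the label sets would show whether `y` and `CRPO` appear only a handful of times, which would support the typo hypothesis.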

✔ Generated 25 parses as HTML
result_val
