# Training Module

This module contains everything you need to train a spaCy NER model.

## Functions

### train_new_model

Builds a new blank spaCy model and trains it on the provided entities. <br>

Parameters: <br>

- __train_data:__ `list` <br>
The data required to train the model <br>

- __language:__ `str, optional` <br>
The language of the model you want to train, by default 'es' <br>

- __epochs:__ `int, optional` <br>
The number of passes over the training data. If set to None, training runs for up to 300 iterations and stops earlier if it finds the best possible model for the given hyper-parameters. By default None. <br>

- __target_gradient:__ `int, optional` <br>
The gradient level at which training is considered finished, by default None. <br>

- __dropout_rate:__ `float, optional` <br>
The fraction of what the model has learned that is dropped on each iteration to reduce overfitting, by default 0.1 <br>

- __success_threshold:__ `float, optional` <br>
The fraction by which the gradient is expected to be reduced, by default 0.9 <br>

- __loss_tolerance:__ `[type], optional` <br>
A threshold to avoid catastrophic forgetting, by default None. <br>

- __target_device:__ `str, optional` <br>
Whether to train on the CPU or, if available, the GPU, by default `cpu` <br>
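
The interaction between `epochs` and `success_threshold` described above could be sketched as follows. This is only one possible reading of the parameter descriptions, not the module's actual implementation: the `run_training` function, the `step` callback, and the use of the loss (rather than the gradient) as the monitored quantity are all assumptions made for illustration.

```python
from typing import Callable, Optional

MAX_EPOCHS = 300  # cap described above for epochs=None


def run_training(step: Callable[[int], float],
                 epochs: Optional[int] = None,
                 success_threshold: float = 0.9) -> int:
    """Call `step` once per epoch and return how many epochs were run.

    `step` is a hypothetical callback that performs one pass over the
    training data and returns the loss for that pass.
    """
    limit = MAX_EPOCHS if epochs is None else epochs
    first_loss = None
    for epoch in range(limit):
        loss = step(epoch)
        if first_loss is None:
            first_loss = loss
            continue
        # Assumed rule: early stopping only applies when epochs is None.
        if epochs is None and first_loss > 0:
            reduction = (first_loss - loss) / first_loss
            if reduction >= success_threshold:
                return epoch + 1
    return limit


# A toy loss curve that shrinks every epoch.
print(run_training(lambda epoch: 100.0 / (epoch + 1)))
```

With an explicit `epochs` value the sketch always runs the requested number of passes; with `epochs=None` it stops as soon as the loss has fallen by the `success_threshold` fraction, or after 300 passes.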


Returns: <br>

- __`spacy.lang.es.Spanish`__ <br>
A trained model capable of recognizing the target entities to a certain extent





#### Examples

```python
# Imports
from nlptools import training as train
from nlptools.data_augmentation import TaggedDoc
```

Load the training data from a pickle file:

```python
import pickle

with open('../src/nlptools/data/estatutos/tagged/spacy_dataset_2020-5-6.pkl', 'rb') as f:
    data = pickle.load(f)
```

```python
docs = [TaggedDoc(doc)._get_spacy_entities() for doc in data.values()]
```
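
The call `model(docs[0][0])` further down suggests that each element of `docs` starts with the raw text. A minimal sketch of what such an item might look like, assuming the common spaCy convention of `(text, {"entities": [(start, end, label), ...]})` pairs. The sample sentence and the `entity_spans` helper are hypothetical and not part of `nlptools`:

```python
# Assumed shape of one training example (not confirmed by nlptools):
# (text, {"entities": [(start_char, end_char, label), ...]})
example = (
    "Acme S.A. fue fundada en Madrid.",
    {"entities": [(0, 9, "ORG"), (25, 31, "LOC")]},
)


def entity_spans(item):
    """Return (substring, label) pairs, validating the character offsets."""
    text, annotations = item
    spans = []
    for start, end, label in annotations["entities"]:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad offsets {start}:{end} for label {label}")
        spans.append((text[start:end], label))
    return spans


print(entity_spans(example))
# [('Acme S.A.', 'ORG'), ('Madrid', 'LOC')]
```

Checking offsets this way before training can catch misaligned annotations early.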

```python
model = train.train_new_model(train_data=docs, language='es', epochs=300, target_gradient=None,
                              dropout_rate=0.1, success_threshold=0.9, loss_tolerance=None,
                              target_device='gpu')
```


Total time 0:10:50.768780


Now that the model is trained, it's time to make some predictions. <br>

First we need data, so we pick the first document from our training dataset. <br>

__Disclaimer__:  _We are aware that testing on training data can hide __overfitting__, but this is only for educational purposes._

```python
doc_0 = model(docs[0][0])
```

Do you believe in magic? Prepare for some real magic tricks...

```python
from spacy import displacy

displacy.render(doc_0, style='ent')
```

### create_blank_ner

Creates a new `nlp` model whose pipeline contains a single component, called `ner`.

Parameters: <br>

- __training_data:__ `[type]` <br>
The data required to train the model. <br>

- __language:__ `str, optional` <br>
The language of the model you want to train, by default `es`. <br>

Returns: <br>

- __`spacy.lang.es.Spanish`__ <br>
A spaCy object that processes text to extract entities.
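
A blank `ner` component generally needs to know which entity labels it will predict (in spaCy this is done with the component's `add_label` method). The label-gathering step can be sketched in plain Python, assuming the training data follows the `(text, {"entities": [(start, end, label), ...]})` convention. `collect_labels` and the sample data are hypothetical and are not taken from `nlptools`:

```python
def collect_labels(train_data):
    """Return the sorted, de-duplicated entity labels found in train_data."""
    labels = set()
    for _text, annotations in train_data:
        for _start, _end, label in annotations.get("entities", []):
            labels.add(label)
    return sorted(labels)


# Hypothetical sample in the assumed format.
sample = [
    ("Acme S.A. fue fundada en Madrid.", {"entities": [(0, 9, "ORG"), (25, 31, "LOC")]}),
    ("Madrid es la capital.", {"entities": [(0, 6, "LOC")]}),
]

print(collect_labels(sample))
# ['LOC', 'ORG']
```

Each label returned here would then be registered with the `ner` component before training starts.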