Multilayer ner #1289

Merged: 12 commits from multilayer_ner into dev on Oct 3, 2023
Conversation

AngledLuffa (Collaborator) commented Sep 30, 2023

Add the ability to label text with multiple types of NER tags using one classifier. The idea is that a model can be trained on both at the same time, and the information from both datasets should work together to make the overall model better. Learning from the second dataset can help the model generalize, even if its tagset is not the same as the first dataset's.

This will let us do something like cross-train the same model on different datasets, such as OntoNotes and CoNLL at the same time, or OntoNotes and the 8 class WorldWide dataset.

In the training data for a mixed dataset, each word now has an entry for "multi_ner" which can support more than one NER tag. Tags which aren't present for a sentence can be blank. In the case of the OntoNotes & WorldWide mixed dataset, for example, text from the WorldWide dataset has its 8 class tag and a blank tag, and text from the OntoNotes dataset has the original 18 class tag and a downscaled version of the 8 class tag.
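
For illustration, here is a minimal sketch of what one sentence in a mixed training file could look like. The multi_ner key is from this PR; the surrounding layout and the specific tags are assumptions for illustration only.

```python
# Hypothetical sentence from a mixed OntoNotes + WorldWide training file.
# Each word carries one tag per tagset in "multi_ner"; a blank string
# marks a tagset with no annotation for that word's sentence.
sentence = [
    {"text": "Stanford",  "multi_ner": ("B-ORG", "B-ORG")},  # 18 class tag, 8 class tag
    {"text": "announced", "multi_ner": ("O", "O")},
    {"text": "Tuesday",   "multi_ner": ("B-DATE", "")},      # no 8 class equivalent (assumed)
]
```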

There are two options for implementing this: one in which the original LSTM encoder is followed by a separate Linear and a corresponding CRFLoss for each tagset, and one in which the output of one of the Linears is fed back into the input of the next output layer.
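
A minimal PyTorch sketch of the two arrangements, assuming an LSTM encoder with hidden size H and two tagsets; the module names here are illustrative, not Stanza's actual ones.

```python
import torch
import torch.nn as nn

H, N1, N2 = 256, 19, 9            # hidden size, tagset 1 size, tagset 2 size
hidden = torch.randn(4, 12, H)    # (batch, seq, hidden) from the LSTM encoder

# Option 1: independent output layers, one Linear (+ CRFLoss) per tagset
head1 = nn.Linear(H, N1)
head2 = nn.Linear(H, N2)
logits1 = head1(hidden)           # scored by the first CRFLoss
logits2 = head2(hidden)           # scored by the second CRFLoss

# Option 2: the first head's output is concatenated into the input
# of the next output layer (the --connect_output_layers arrangement)
head2_connected = nn.Linear(H + N1, N2)
logits2_connected = head2_connected(torch.cat([hidden, logits1], dim=-1))
```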

Old models are maintained by converting the original version of the tensors to the new format. Old datasets with one NER tagset are converted when loaded at training time. Therefore, nothing needs to be done to existing models or datasets.
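
A hedged sketch of what that load-time conversion might look like: a checkpoint saved with a single tagger head is rewrapped into the multi-head layout. The state dict keys here are illustrative, not Stanza's actual ones.

```python
# Upgrade an old single-tagset checkpoint to the new multi-head format.
def upgrade_checkpoint(state):
    if "tag_clf.weight" in state:  # old format: exactly one output head
        state["tag_clfs.0.weight"] = state.pop("tag_clf.weight")
        state["tag_clfs.0.bias"] = state.pop("tag_clf.bias")
    return state
```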

Results of running the OntoNotes model, with charlm but not transformer, on the OntoNotes and WorldWide test sets:

model                 OntoNotes  WorldWide
original ontonotes        88.71      69.29
simplify-separate         88.24      75.75
simplify-connected        88.32      75.47

Here, "simplify" means the 18 class OntoNotes model is converted to 8 classes, then that data is combined with the WorldWide data as the training data for the second output layer

This will allow representing multiple layers of tags in the same model
in the vocab.  Currently, only one layer is supported, though.

Existing models may have TagVocab, including models created by users,
so the model loading function converts them.

The model could potentially get multiple layers of tags if the data
returns multiple layers.  For now we only handle the top layer (using
indexing instead of squeeze so that it functions by ignoring later
layers).  Ultimately we will need to iterate
…o update several utility methods to make it work
Includes testing of a two column version
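
A tiny illustration of the "indexing instead of squeeze" point above; the shapes are hypothetical.

```python
import torch

tags = torch.zeros(4, 12, 2, dtype=torch.long)  # (batch, seq, n_tag_layers)
top_layer = tags[:, :, 0]                       # always selects the first layer
# tags.squeeze(-1) would only work when there is exactly one layer;
# indexing keeps working by simply ignoring the later layers.
```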

If tuples are passed in, tuples are returned

If process_tags receives a single column of tags as non-tuples, it returns
a single column of tags instead of returning tuples.  The primary use case
is the scripts which score flair or spacy on the WW dataset.
Includes error checking for the conversion from string to tuple of string
…n the NER data.py
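
A minimal sketch of that tuple-preserving behavior; this process_tags is illustrative, not the actual Stanza implementation.

```python
def process_tags(tags):
    # Strings in -> strings out; tuples in -> tuples out.
    single_column = bool(tags) and not isinstance(tags[0], tuple)
    rows = [(t,) if single_column else t for t in tags]
    processed = [tuple(t.strip() for t in row) for row in rows]  # stand-in per-column processing
    if single_column:
        return [row[0] for row in processed]
    return processed
```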

Later we will have the model use EMPTY to signify that this particular
tag should be masked out, rather than having it learn to predict EMPTY.
The model cannot yet handle multiple tags, as the scorer doesn't support
multiples and the model itself is only one layer.  However, this is an
important intermediate step

Temporarily (?) use the output for just the first tag
…hose datasets don't align with the new training data
…ocab and multiple layers of tags from the dataset
single layer.  This is the big money change - the model can now
train with two output heads after adding these new layers.

Old models would be incompatible with this format, but the loading
code updates the tensors to the new format.

Iterating over the lists in predict, unmapping all the tags, and then
discarding tags after the first column allows it to successfully
process a multi-entry data file.

Use EMPTY tags to mask out words where we don't want the NER model to learn anything about the tags

Includes a basic test that training with two types of tags works

Includes a check that two tag_clfs are both changing when backpropping
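
A hedged sketch of the masking, assuming one loss per tagset and an EMPTY entry in each tag vocab; cross-entropy stands in for the CRFLoss for brevity, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def masked_tagset_loss(logits, gold, empty_idx):
    # Only train on positions whose gold tag in this tagset is not EMPTY.
    mask = gold != empty_idx
    if not mask.any():
        return logits.new_zeros(())   # sentence has no annotations in this tagset
    return F.cross_entropy(logits[mask], gold[mask])
```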

Verify that the masking of empty tags means those tags aren't being
trained.  We do this with a unittest that turns off all of the tags in
one tagset and then finetunes the model on that tagset.
Add a test that the --connect_output_layers feature doesn't crash and actually connects the output layers
…agset to an NER model.

Also, add ner_predict_tagset as an option to the Pipeline.  This will allow the Pipeline to choose a different tagset for a multi-headed model
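
A hedged usage sketch: ner_predict_tagset is the option named above, but the value shown (an index selecting one of the model's tagsets) is an assumption about its type.

```python
import stanza

# Ask a multi-headed NER model to predict with its second tagset (assumed index).
nlp = stanza.Pipeline("en", processors="tokenize,ner", ner_predict_tagset=1)
doc = nlp("Sundar Pichai visited London in 2023.")
print([(ent.text, ent.type) for ent in doc.ents])
```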
…ly OOV - EMPTY in particular is treated by the CompositeVocab as 'leave this blank'. The better fix would be to remove those states from the output layer entirely
AngledLuffa merged commit 04bfef9 into dev on Oct 3, 2023
1 check passed
AngledLuffa deleted the multilayer_ner branch on October 3, 2023 04:56