[Question]: Pre-Tagging information in Sequence-Tagging? #3416

Open
raykyn opened this issue Mar 4, 2024 · 4 comments
Labels
question Further information is requested

Comments


raykyn commented Mar 4, 2024

Question

My specific use case:
I'm trying to solve an event-extraction task that I model as a sequence-tagging problem. The event tagger should identify the event trigger, actors, and objects in a given span. I already have fairly reliable NER tags, which I would like to give the event tagger as additional information (for example, only PER and ORG entities may be actors).

Is there a best practice for using this information? I'm thinking the easy way would be to put the annotations inside the train/dev/test data so they become part of the encoding. Is there a known best way to write these annotations?

Barack Obama went to New York

[PER] Barack Obama [/PER] went to [LOC] New York [/LOC]

Barack [B-PER] Obama [I-PER] went to New [B-LOC] York [I-LOC]
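
For reference, both inline formats above can be generated mechanically from tokens and BIO tags. The following is only a minimal sketch in plain Python: the helper names `bracket_spans` and `suffix_tags` are made up for illustration and are not part of Flair, and the sketch says nothing about which format works better as model input.

```python
from typing import List, Optional


def bracket_spans(tokens: List[str], bio_tags: List[str]) -> str:
    """Wrap each entity span in [TYPE] ... [/TYPE] markers (hypothetical helper)."""
    out: List[str] = []
    open_type: Optional[str] = None  # entity type of the span currently open
    for token, tag in zip(tokens, bio_tags):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == open_type
        # Close the open span unless this token continues it.
        if open_type is not None and not continues:
            out.append(f"[/{open_type}]")
            open_type = None
        # Open a new span on B-, or on a stray I- with no span open.
        if prefix == "B" or (prefix == "I" and open_type is None):
            out.append(f"[{etype}]")
            open_type = etype
        out.append(token)
    if open_type is not None:
        out.append(f"[/{open_type}]")
    return " ".join(out)


def suffix_tags(tokens: List[str], bio_tags: List[str]) -> str:
    """Append the BIO tag after every tagged token (hypothetical helper)."""
    return " ".join(
        tok if tag == "O" else f"{tok} [{tag}]"
        for tok, tag in zip(tokens, bio_tags)
    )


tokens = ["Barack", "Obama", "went", "to", "New", "York"]
ner = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(bracket_spans(tokens, ner))
print(suffix_tags(tokens, ner))
```

Note that the marker tokens become part of the input sequence, so the event labels in the training data would have to be aligned to the longer token list (for example, by labelling every marker as O).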

I guess I'm asking less for a technical solution and more if there is an established way to do this or at least some experience?

(also to the devs: thank you for this awesome framework and the char-based embeddings, pretty much none of my research would be possible without Flair)

raykyn added the question label on Mar 4, 2024
@helpmefindaname
Collaborator

Hi @raykyn

I am sorry for the late response.
If your question is still open, I would suggest looking at the tutorial on how to load a ColumnCorpus.
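
One possible reading of this suggestion is to keep the NER tags as an extra column next to the event tags in a CoNLL-style column file. The sketch below assumes that reading. The event labels (`B-ACTOR`, `B-TRIGGER`, `B-OBJECT`), the file names, and the helper names are made up for illustration; only `ColumnCorpus` and its column-format mapping come from Flair. As far as I know, loading a column this way only attaches it as a label layer and does not by itself feed it to the tagger as an input feature.

```python
from pathlib import Path
from typing import List, Tuple

Row = Tuple[str, str, str]  # (token, NER tag, event tag)

# Event tags here are invented for illustration only.
EXAMPLE: List[Row] = [
    ("Barack", "B-PER", "B-ACTOR"),
    ("Obama", "I-PER", "I-ACTOR"),
    ("went", "O", "B-TRIGGER"),
    ("to", "O", "O"),
    ("New", "B-LOC", "B-OBJECT"),
    ("York", "I-LOC", "I-OBJECT"),
]


def format_columns(sentences: List[List[Row]]) -> str:
    """One token per line, space-separated columns, blank line after each sentence."""
    lines: List[str] = []
    for sentence in sentences:
        lines.extend(" ".join(row) for row in sentence)
        lines.append("")
    return "\n".join(lines)


def load_corpus(data_folder: Path):
    """Load the files with Flair (requires flair; file names are assumptions)."""
    from flair.datasets import ColumnCorpus

    # Column index -> label name; "event" is a name chosen for this sketch.
    columns = {0: "text", 1: "ner", 2: "event"}
    return ColumnCorpus(
        data_folder,
        columns,
        train_file="train.txt",
        dev_file="dev.txt",
        test_file="test.txt",
    )


print(format_columns([EXAMPLE]))
```

Writing `format_columns(...)` output to `train.txt`, `dev.txt`, and `test.txt` in one folder and passing that folder to `load_corpus` would give a corpus in which each token carries both label layers.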

Author

raykyn commented Apr 2, 2024

Hi, thank you for the response! I'm not sure what you are referring to in that tutorial, unless you mean the format in which annotations are printed by the to_tagged_string function. I guess that would be the best practice for presenting annotations, but I'm not sure that makes it the best way to add annotations to the input.


zrjohnnyl commented Apr 4, 2024

Hi @raykyn, another solution you can consider is training two separate models with multitask learning and letting the shared embeddings do the feature engineering automatically, since the embeddings will learn features for tagging event triggers, actors, and objects as well as PER and ORG entities. The model will learn that something tagged as PER or ORG is more likely to be an actor as well. This will probably improve performance on your main task without having to add the tags. Another benefit of this approach is that at inference time you don't need to run a second model.

Author

raykyn commented Apr 4, 2024

Thank you very much for this input, I actually completely missed that there is support for multitask learning!
Due to the nature of my task, I'll probably still need to run two models in sequence (basically, my documents do not have multiple sentences; instead, I need to detect certain events inside entity mentions and do a kind of syntactic analysis with Flair), so I guess my question about the best practice for using pre-tagging information is still open. But multitask learning is great to know about and something I'll test for sure!
