GH-1445: Targeted Sentiment Analysis #1758

whoisjones · 2020-07-15T08:16:20Z

Closes gh-1445: we've added a pretrained tagger model for negation and speculation based on the BioScope data.

…-speculation-model

alanakbik · 2020-07-15T11:51:49Z

👍

alanakbik · 2020-07-15T11:52:02Z

@whoisjones thanks for adding this!

nipunsadvilkar · 2020-07-21T10:04:25Z

@whoisjones: Interesting PR! I was curious to know why you opted for a sequence tagger approach rather than just a text classification approach to do aspect-based sentiment analysis?

It will help me understand how you have framed the problem and understand the pros & cons of it. Thanks!

whoisjones · 2020-07-21T13:24:25Z

@nipunsadvilkar a sequence tagger gives way more information than text classification. Classification by definition aggregates information. Consider a document with nested negation and/or speculation: instead of assigning the whole document to a class, we can say which part of the sentence or document is negated or speculative.

nipunsadvilkar · 2020-07-21T14:04:00Z

@whoisjones Thanks for the prompt reply! Appreciate it 👍
So let's say we have a sentence, sent:

The patient states of no fever or chills but has asthma and a sore throat that has been going on for 3 months.

Entities/Aspects would be Disease mentions - fever, chills, asthma, sore throat

Negated - fever, chills
Assertive - asthma, sore throat

Approach A) In the text classifier approach, a plausible setting would be to label the above sentence once for each of its 4 mentions and train an aspect-based text classifier.

Text	Entity	Label
sent	fever	Negative
sent	chills	Negative
sent	asthma	Assertive
sent	sore throat	Assertive
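
A minimal sketch of how the rows for Approach A could be built in Python. The `[E] ... [/E]` entity markers and the `mark_entity` helper are assumptions made for illustration; nothing in this thread says the classifier input would be prepared this way.

```python
from typing import Dict, List, Tuple

sent = ("The patient states of no fever or chills but has asthma "
        "and a sore throat that has been going on for 3 months.")

# Labels per entity mention, copied from the table above.
entity_labels: Dict[str, str] = {
    "fever": "Negative",
    "chills": "Negative",
    "asthma": "Assertive",
    "sore throat": "Assertive",
}


def mark_entity(text: str, entity: str) -> str:
    """Wrap the first occurrence of `entity` in [E] ... [/E] (hypothetical convention)."""
    start = text.index(entity)
    end = start + len(entity)
    return f"{text[:start]}[E] {entity} [/E]{text[end:]}"


# One classifier instance per mention: the same sentence appears four times,
# and only the marked entity tells the instances apart.
rows: List[Tuple[str, str, str]] = [
    (mark_entity(sent, entity), entity, label)
    for entity, label in entity_labels.items()
]

for marked, entity, label in rows:
    print(f"{entity:12s} {label:10s} {marked[:60]}...")
```

Without some marker like this, all four rows would share identical text and differ only in the label, which is part of why the classifier needs to be told which mention it is judging.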

Approach B) Sequence Tagger
How would you frame it as a sequence tagging problem? What would be the scope for those 4 entities? Does the scope also need to be annotated?

From what I see in your repo, whoisjones/BioScopeSequenceLabelingData, you would tag it like the following?

Click to see IOB tagging scheme!

The O
patient O
states O
of O
no B-NEGATION
fever I-NEGATION
or I-NEGATION
chills I-NEGATION
but O
has O
asthma O
and O
a O
sore O
throat O
that O
has O
been O
going O
on O
for O
3 O
months O
. O
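
For reference, here is a small Python sketch that reads tags like the ones above back into spans. The `iob_to_spans` helper is hypothetical and not part of flair or the BioScope repo; it only assumes the usual IOB convention that `B-` opens a span and `I-` continues it.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int, str]]:
    """Collect (label, start, end_exclusive, text) spans from IOB tags."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # A B- tag always opens a span; a stray I- tag is treated as an opening too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = ("The patient states of no fever or chills but has asthma and a "
          "sore throat that has been going on for 3 months .").split()
tags = ["O"] * 4 + ["B-NEGATION"] + ["I-NEGATION"] * 3 + ["O"] * 16

print(iob_to_spans(tokens, tags))
# [('NEGATION', 4, 8, 'no fever or chills')]
```

Under this reading, two scopes that are not adjacent would simply come back as two separate spans, each opened by its own `B-` tag.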

I agree that the scope of negation or speculation becomes tricky to handle. In the above example, the negation for the 2 entities is consecutive. How do you handle disjoint spans?

Also, I would like to hear your thoughts on Approach A.

whoisjones · 2020-08-13T14:42:16Z

@nipunsadvilkar sorry for the late reply. Here's what I think:
Approach A as you described it also needs at least something pretty close to a sequence tagger, since you have already identified the entities. To my knowledge, most NER downstream tasks are sequence labeling problems. However, if you have identified the entities (in our case diseases) and do it like you described, you still need information about the dependencies inside the phrase. Check out this example here: Aspect Based Analysis.

“I hated their fajitas, but their salads were great” --> {fajitas: negative, salads: positive}

If we only gave labels to this phrase like this, a text classifier approach would learn that fajitas in general are bad and salads in general are delicious. However, we want to know which item is rated good or bad, and this depends on where in the phrase fajitas or salads is positioned. So we would like to frame it as a sequence labeling problem.
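
As a rough illustration in Python: if the polarity is attached to the target tokens themselves, the per-item result falls out of the tags directly. The `B-NEGATIVE` / `B-POSITIVE` tag names are made up for this sketch and are not the label set of any model in this PR.

```python
# Hypothetical token-level annotation of the example phrase.
tokens = ["I", "hated", "their", "fajitas", ",", "but", "their", "salads", "were", "great"]
tags = ["O", "O", "O", "B-NEGATIVE", "O", "O", "O", "B-POSITIVE", "O", "O"]

# Each tagged token carries its own polarity, so one sentence can hold both.
targets = {tok: tag[2:].lower() for tok, tag in zip(tokens, tags) if tag != "O"}

print(targets)
# {'fajitas': 'negative', 'salads': 'positive'}
```

A single sentence-level label has no place to put both polarities at once, which is the aggregation problem mentioned earlier in the thread.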

Approach B
The corpus we used was annotated by linguistic professionals and contains speculations and negations in the medical domain. Our pretrained tagger can't identify what kind of disease is mentioned; for that we would need another tagger, exactly as you mentioned. But it can be a lot easier. Think about this:

Let our tagger annotate which parts of your text are speculation or negation. Then search for all the diseases you would like to know about. This obviously only covers the diseases you specify beforehand.
To answer your question: if you want a general sequence tagger that can identify all kinds of diseases, you would need a training set in a format similar to the BioScope dataset to train one.
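
The "tag first, then look up your diseases" idea could look roughly like this in Python. The tags are hard-coded here to stand in for the tagger's output, and `label_mentions`, the lexicon, and the "Assertive" default are assumptions for illustration, not part of flair.

```python
from typing import Dict, List


def label_mentions(tokens: List[str], tags: List[str], lexicon: List[str],
                   default: str = "Assertive") -> Dict[str, str]:
    """Map each lexicon term found in `tokens` to the scope label covering it."""
    lowered = [t.lower() for t in tokens]
    result: Dict[str, str] = {}
    for term in lexicon:
        words = term.lower().split()
        n = len(words)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == words:  # first exact token match only
                covering = {tags[j][2:] for j in range(i, i + n) if tags[j] != "O"}
                result[term] = "/".join(sorted(covering)) if covering else default
                break
    return result


tokens = ("The patient states of no fever or chills but has asthma and a "
          "sore throat that has been going on for 3 months .").split()
# Stand-in for the tagger's prediction on this sentence.
tags = ["O"] * 4 + ["B-NEGATION"] + ["I-NEGATION"] * 3 + ["O"] * 16

diseases = ["fever", "chills", "asthma", "sore throat", "migraine"]
print(label_mentions(tokens, tags, diseases))
# {'fever': 'NEGATION', 'chills': 'NEGATION', 'asthma': 'Assertive', 'sore throat': 'Assertive'}
```

Terms that are not in the lexicon, or not in the text (like "migraine" here), are simply never reported, which is the limitation noted above.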

whoisjones added 11 commits July 8, 2020 19:56

start training for tagger

a5a5c79

start training for tagger

d48d8b1

hyperparameter changes

23a9ddc

test

1ba75e8

hyperparameter run

67cd57d

hyperparameter run for 32 batch_size

5e18922

hyperparameter run for 64 batch_size

5434936

hyperparameter run for 64 batch_size

966fb0d

Merge branch 'master' of https://github.com/flairNLP/flair into gh-1445…

5e406ce

…-speculation-model

model and corpus added. model needs to be put on server.

eb4738c

changed path to server. added model in docs.

d1ace57

alanakbik merged commit 17fa344 into master Jul 15, 2020

whoisjones deleted the gh-1445-speculation-model branch February 4, 2021 19:12
