GH-1445: Targeted Sentiment Analysis #1758

Merged
merged 11 commits into from
Jul 15, 2020

whoisjones
Member

Closes GH-1445: we've added a pretrained tagger model for negation and speculation, based on the BioScope data.

@alanakbik
Collaborator

👍

@alanakbik
Collaborator

@whoisjones thanks for adding this!

@alanakbik alanakbik merged commit 17fa344 into master Jul 15, 2020
@nipunsadvilkar

@whoisjones: Interesting PR! I was curious to know why you opted for a sequence tagger approach rather than a plain text classification approach to do aspect-based sentiment analysis.

It will help me understand how you have framed the problem and understand the pros & cons of it. Thanks!

@whoisjones
Member Author

@nipunsadvilkar a sequence tagger gives way more information than text classification. Classification by definition aggregates information. Consider a document with nested negation and/or speculation: instead of assigning the whole document to a class, we can say which part of the sentence/document is negated or speculative.

@nipunsadvilkar

@whoisjones Thanks for the prompt reply! Appreciate it 👍
So let's say we have a sentence:
sent

The patient states of no fever or chills but has asthma and a sore throat that has been going on for 3 months.

Entities/Aspects would be Disease mentions - fever, chills, asthma, sore throat

Negated - fever, chills
Assertive - asthma, sore throat

Approach A) In the text classifier approach, a plausible setting would be to label the above sentence once per mention (4 mentions in total) and train an aspect-based text classifier.

| Text | Entity | Label |
| --- | --- | --- |
| sent | fever | Negative |
| sent | chills | Negative |
| sent | asthma | Assertive |
| sent | sore throat | Assertive |
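
As a rough illustration of Approach A, the rows above can be thought of as one training instance per (sentence, entity) pair. The sketch below only builds those instances; the `AspectExample` type and `build_examples` helper are hypothetical names used for illustration, and no classifier is trained here.

```python
from typing import Dict, List, NamedTuple


class AspectExample(NamedTuple):
    """One training instance for an aspect-based classifier (hypothetical type)."""
    text: str
    entity: str
    label: str


SENT = (
    "The patient states of no fever or chills but has asthma "
    "and a sore throat that has been going on for 3 months."
)

# Entity -> label, copied from the table in the comment above.
ENTITY_LABELS = {
    "fever": "Negative",
    "chills": "Negative",
    "asthma": "Assertive",
    "sore throat": "Assertive",
}


def build_examples(text: str, entity_labels: Dict[str, str]) -> List[AspectExample]:
    """Repeat the same sentence once per entity, each time with that entity's label."""
    examples = []
    for entity, label in entity_labels.items():
        if entity not in text:
            raise ValueError(f"entity {entity!r} does not occur in the text")
        examples.append(AspectExample(text, entity, label))
    return examples


examples = build_examples(SENT, ENTITY_LABELS)
for ex in examples:
    print(f"{ex.entity} -> {ex.label}")
```

Note that the sentence text is identical in all four instances; only the entity tells the classifier which label is expected.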

Approach B) Sequence Tagger
How would you frame it as a sequence tagging problem? What would be the scope for those 4 entities? Does the scope also need to be annotated?

From what I see in your repo, whoisjones/BioScopeSequenceLabelingData, you would tag it like the following?

IOB tagging scheme:
The O
patient O
states O
of O
no B-NEGATION
fever I-NEGATION
or I-NEGATION
chills I-NEGATION
but O
has O
asthma O
and O
a O
sore O
throat O
that O
has O
been O
going O
on O
for O
3 O
months O
. O
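
For readers less familiar with the scheme, here is a minimal sketch of how a token/tag listing like the one above can be collapsed into labeled spans. The tags are hard-coded from the listing rather than produced by the pretrained model, and `bio_to_spans` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse B-/I-/O tags into (label, text) spans.

    An I- tag that does not continue a span of the same label is treated
    like O and dropped, which is a simplifying assumption of this sketch.
    """
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if words:  # close the span that was open
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:  # span running to the end of the sentence
        spans.append((label, " ".join(words)))
    return spans


tokens = (
    "The patient states of no fever or chills but has asthma and a sore "
    "throat that has been going on for 3 months ."
).split()
tags = ["O"] * 4 + ["B-NEGATION"] + ["I-NEGATION"] * 3 + ["O"] * 16

spans = bio_to_spans(tokens, tags)
print(spans)  # [('NEGATION', 'no fever or chills')]
```

The whole negation scope comes out as a single span, which is why "fever" and "chills" are not labeled individually in this framing.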

I agree that the scope of negation or speculation becomes tricky to handle. In the above example, the negation for the 2 entities is consecutive. How do you handle disjoint spans?

Also, I would like to hear your thoughts on Approach A).

@whoisjones
Member Author

@nipunsadvilkar sorry for the late reply. Here's what I think:
Approach A as you described it also needs at least something pretty close to a sequence tagger, since you have already identified the entities. To my knowledge, most NER downstream tasks are sequence labeling problems. However, if you have identified the entities (in our case diseases) and do it like you described, then you still need information about the dependencies inside the phrase. Check out this example here: Aspect Based Analysis.

“I hated their fajitas, but their salads were great” --> {fajitas: negative, salads: positive}

If we only gave labels to this phrase like this, a text classifier would learn that fajitas in general are bad and salads in general are delicious. However, we want to know which item is rated good or bad, and that depends on where in the phrase "fajitas" or "salads" appears. So we would want to frame it as a sequence labeling problem.

Approach B
The corpus we used was annotated by linguistic professionals and contains speculations and negations in the medical domain. Our pretrained tagger can't identify what kind of disease is mentioned; for that we would need another tagger, exactly as you mentioned. But it can be a lot easier. Think about this:

Let our tagger annotate which parts of your text are speculation or negation. Then search for all the diseases you would like to know about. This obviously only covers the diseases you specify beforehand.
To answer your question: if you want a general sequence tagger that can identify all kinds of diseases, you would need a training set in a format similar to the BioScope dataset to train one.
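
The two-step idea (tag negation/speculation first, then look up a predefined list of diseases) can be sketched roughly as follows. This is only an illustration under assumptions: the tags are hard-coded stand-ins for what the tagger would predict, the helper names are hypothetical, matching is exact and case-sensitive, and the `ASSERTIVE` fallback label is borrowed from the wording earlier in this thread rather than from the model.

```python
from typing import Dict, List


def find_phrase(tokens: List[str], phrase: str) -> List[int]:
    """Return every start index at which the (possibly multi-word) phrase occurs."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def disease_status(tokens: List[str], tags: List[str], diseases: List[str]) -> Dict[str, List[str]]:
    """Map each disease found in the text to the tag labels covering its tokens."""
    status = {}
    for disease in diseases:
        length = len(disease.split())
        for start in find_phrase(tokens, disease):
            covering = tags[start:start + length]
            # Strip the B-/I- prefix; tokens tagged O contribute nothing.
            labels = {tag.split("-", 1)[1] for tag in covering if tag != "O"}
            # If a disease occurs more than once, the last occurrence wins.
            status[disease] = sorted(labels) or ["ASSERTIVE"]
    return status


tokens = (
    "The patient states of no fever or chills but has asthma and a sore "
    "throat that has been going on for 3 months ."
).split()
# Stand-in for the tagger output on this sentence (assumed, not a real prediction).
tags = ["O"] * 4 + ["B-NEGATION"] + ["I-NEGATION"] * 3 + ["O"] * 16

result = disease_status(tokens, tags, ["fever", "chills", "asthma", "sore throat", "diabetes"])
for disease, labels in result.items():
    print(disease, labels)
```

Diseases that are not on the list, or that do not occur in the text (like "diabetes" here), simply do not show up in the result, which is the limitation mentioned above.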

@whoisjones whoisjones deleted the gh-1445-speculation-model branch February 4, 2021 19:12
Development

Successfully merging this pull request may close these issues.

Targeted Sentiment Analysis? (or Aspect-Based Sentiment Analysis)
3 participants