A sequence-to-sequence NLP model built by custom-training spaCy's NER pipeline

heisenberg1804/Sentiment-Extraction-from-tweets

Sentiment-Extraction-from-tweets

Dataset (from a Kaggle competition): The Twitter dataset has 4 columns: textID, text, sentiment, and selected_text.

Objective: Extract the support phrase from each tweet's text that reflects the tweet's given sentiment, and score the extracted phrase against the target variable, selected_text.

My work: Performed text preprocessing and EDA on the Twitter dataset, which included:

  1. Cleaning the tweets by removing punctuation, numbers, stopwords, and http links, then tokenizing the text.
  2. Counting the common words and unique words in the raw text for each sentiment separately.
  3. Creating word clouds, treemaps, donut plots, funnel plots, and KDE plots for visualization.
  4. Using Jaccard similarity to measure the overlap between text and selected_text.
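
Steps 1 and 4 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's actual code: the function names `clean_tweet` and `jaccard`, the tiny `STOPWORDS` set, and the regular expressions are all assumptions made for the example. The Jaccard function is word-level: the size of the intersection of the two word sets divided by the size of their union.

```python
import re
import string

# Placeholder stop-word list (an assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}


def clean_tweet(text: str) -> list[str]:
    """Lowercase, drop links, digits, punctuation and stop words, then tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove http links
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


def jaccard(str1: str, str2: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    common = a & b
    return len(common) / (len(a) + len(b) - len(common))
```

For example, `jaccard("I love this movie", "love this")` gives 0.5, because two of the four distinct words are shared.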

Creating the model: I treated the problem as a Named Entity Recognition task, using the spaCy NER pipeline. The selected_text was treated as an entity, and the model was custom trained for 30 iterations.
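
Treating selected_text as an entity means each training row has to be converted into character offsets within the tweet. The sketch below shows one way this could be done, assuming the `(text, {"entities": [(start, end, label)]})` tuple layout commonly used in spaCy v2-style training loops (newer spaCy versions wrap such annotations in `Example` objects). The function name `to_ner_example` and the label string `"selected_text"` are hypothetical choices for illustration.

```python
def to_ner_example(text: str, selected_text: str, label: str = "selected_text"):
    """Turn one (text, selected_text) row into an NER-style training tuple.

    Returns None when the phrase is empty or cannot be located in the tweet,
    so such rows can be skipped rather than trained on with wrong offsets.
    """
    if not selected_text:
        return None
    start = text.find(selected_text)  # first occurrence, character offset
    if start == -1:
        return None
    end = start + len(selected_text)
    return (text, {"entities": [(start, end, label)]})


row = to_ner_example("I really love this song", "love this")
```

Here `row[1]["entities"]` holds a single span whose offsets slice back to `"love this"` in the original tweet.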

Created 2 models, one each for positive and negative sentiment. No model was built for neutral sentiment, because EDA showed that for neutral tweets the text and selected_text had a Jaccard similarity of about 1.
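
At prediction time this design amounts to a simple dispatch on sentiment. The sketch below is an illustration under stated assumptions: `predict_selected_text` and the `extractors` mapping are hypothetical names, each extractor stands in for a trained per-sentiment model that returns a phrase (or an empty string), and falling back to the full tweet when nothing is extracted is an assumed choice, not something the project states.

```python
from typing import Callable, Dict


def predict_selected_text(
    text: str,
    sentiment: str,
    extractors: Dict[str, Callable[[str], str]],
) -> str:
    """Route a tweet to the extractor for its sentiment.

    Neutral tweets are returned unchanged, since their selected_text is
    almost always the whole tweet.
    """
    if sentiment == "neutral":
        return text
    extractor = extractors.get(sentiment)
    phrase = extractor(text) if extractor else ""
    # Assumed fallback: use the whole tweet when no phrase is found.
    return phrase or text


# Stand-in extractors for demonstration only; real ones would wrap trained models.
demo_extractors = {
    "positive": lambda t: "love" if "love" in t else "",
    "negative": lambda t: "hate" if "hate" in t else "",
}
```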

Evaluation: I used the Jaccard score as the metric. The model achieved 0.62 and 0.60 on the training and test sets respectively.

This text sequence-to-sequence problem can also be treated as a question-answering (Q-A) problem or a regression problem; that is the future scope of this project.
