
Author:

Lang Wu

Instructions for use:

Run the accompanying Python file. The task is described by Saif et al. (2008).

# Sentiment-Analysis-of-Tweet-Data-Using-BERT

This project performs sentiment analysis of tweet data using Bidirectional Encoder Representations from Transformers (BERT). BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain-text corpus. Context-free models such as word2vec or GloVe generate a single embedding for each word in the vocabulary, whereas BERT takes the context of each occurrence of a word into account. Tweet sentiment analysis comprises an array of subtasks for inferring a person's affectual state from their tweets. Only English tweets are modeled in this project, although the approach can easily be extended to the other languages described by Saif et al. (2008). This method outperforms the other state-of-the-art techniques reported for this dataset. Stacking a multi-layer head on top of BERT could further improve performance; this is left for future work.
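As a quick illustration of the context-sensitivity point above, the minimal sketch below embeds the same word in two different sentences. It uses the Hugging Face `transformers` library rather than the TF-Hub module this project itself uses, and the model name and sentences are illustrative assumptions:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embedding_of(sentence, word):
    # Return the BERT hidden state for the first occurrence of `word`.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

# Unlike word2vec/GloVe, BERT assigns "bank" a different vector per context.
v1 = embedding_of("I deposited money at the bank.", "bank")
v2 = embedding_of("We sat on the river bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # noticeably below 1.0
```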

## Steps of the modeling

  • The code can be run on Google Colab, where free memory and a TPU are available. The datasets include train, validation, and test splits. The BERT-Large uncased model is run in Google Colab's TPU mode.
  • The datasets are tokenized and padded with the BERT TF-Hub module and converted into the input features the BERT model expects (see the first sketch after this list).
  • The number of BERT layers to fine-tune is treated as a tuning choice; the BERT output is followed by a sigmoid output layer (possibly with some ReLU layers in between), as in the second sketch below. Overfitting is reduced by early stopping on the validation dataset.
  • The best model is saved and later used to predict on the test dataset.
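A minimal sketch of the tokenize-pad-convert step from the second bullet, assuming a WordPiece tokenizer with the standard `BertTokenizer` interface and a hypothetical `max_seq_len` of 128; each tweet becomes fixed-length token ids, an input mask, and segment ids:

```python
def tweet_to_features(tweet, tokenizer, max_seq_len=128):
    # Truncate, then add the special tokens BERT expects.
    tokens = ["[CLS]"] + tokenizer.tokenize(tweet)[: max_seq_len - 2] + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
    padding = [0] * (max_seq_len - len(input_ids))
    # Single-sentence input, so every segment (type) id is 0.
    return input_ids + padding, input_mask + padding, [0] * max_seq_len
```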
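And a hedged Keras sketch of the fine-tuning head, early stopping, and save-then-predict flow from the last two bullets; the TF-Hub handle, hidden-layer size, learning rate, and file name here are assumptions, not the repository's exact configuration:

```python
import tensorflow as tf
import tensorflow_hub as hub

max_seq_len = 128
# Assumed TF-Hub handle for BERT-Large uncased; weights are fine-tuned.
bert_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/4",
    trainable=True)

ids = tf.keras.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_word_ids")
mask = tf.keras.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_mask")
seg = tf.keras.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_type_ids")
outputs = bert_layer({"input_word_ids": ids, "input_mask": mask,
                      "input_type_ids": seg})

# Optional ReLU layer(s) between BERT and the sigmoid output layer.
x = tf.keras.layers.Dense(64, activation="relu")(outputs["pooled_output"])
pred = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model([ids, mask, seg], pred)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    # Stop when validation loss stops improving; keep the best weights.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
]
# model.fit(train_features, train_labels,
#           validation_data=(val_features, val_labels),
#           epochs=10, callbacks=callbacks)
# test_probs = model.predict(test_features)
```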
