Sentiment Analysis

This project trains a sentiment classifier that takes a text input and predicts whether it is positive or negative.
Two datasets obtained from Kaggle were used:

Sentiment140: https://www.kaggle.com/kazanova/sentiment140
Contains 1.6M tweets labeled 0 or 4.
The following fields will be used:
- target: the polarity of the tweet (0 = negative, 4 = positive)
- text: the text of the tweet

IMDB movie reviews: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Contains 50K IMDB movie reviews labeled positive or negative.
The following fields will be used:
- review: the text of the review
- sentiment: the polarity of the review (positive or negative)
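The two datasets encode polarity differently, so the IMDB labels have to be converted to the 0/4 scheme before the model can be scored on them (the execution results show the target column changing from positive/negative to 4/0). A minimal sketch of that conversion follows; the names `LABEL_TO_TARGET` and `to_target` are hypothetical and not taken from the repository.

```python
# Hypothetical mapping from IMDB sentiment strings to Sentiment140-style targets.
LABEL_TO_TARGET = {"negative": 0, "positive": 4}


def to_target(sentiment: str) -> int:
    """Convert an IMDB label to the 0/4 scheme used by Sentiment140."""
    return LABEL_TO_TARGET[sentiment.strip().lower()]


# Made-up sample labels, for illustration only.
imdb_labels = ["positive", "positive", "negative"]
targets = [to_target(label) for label in imdb_labels]
print(targets)
```

An unknown label raises a `KeyError`, which is usually preferable to silently assigning a default polarity.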

The Sentiment140 dataset was used to train the model. The tweets were first vectorized into arrays of word counts with the scikit-learn CountVectorizer, and these vectors were used to fit a logistic regression. Despite being a simple model, it gave a satisfactory result, which can be verified with the 2x2 confusion matrix.
The IMDB movie reviews dataset was used to test whether the model could also classify the reviews satisfactorily.
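The flow described above can be sketched as follows. This is a minimal illustration on a tiny made-up corpus, not the repository's actual script: the sample sentences, the variable names and `max_iter=1000` are all assumptions. Raising `max_iter` is one of the remedies scikit-learn itself suggests when the lbfgs solver stops before converging.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy stand-in for Sentiment140 (made-up sentences; 0 = negative, 4 = positive).
train_texts = [
    "I love this, it is great",
    "what a wonderful happy day",
    "great fun, love it",
    "I hate this, it is awful",
    "what a terrible sad day",
    "awful service, hate it",
]
train_labels = [4, 4, 4, 0, 0, 0]

# Toy stand-in for the IMDB reviews, with labels already mapped to 0/4.
test_texts = ["a wonderful film, I love it", "a terrible film, I hate it"]
test_labels = [4, 0]

# Turn each text into a sparse vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
# Reuse the vocabulary learned on the training set; unseen words are ignored.
X_test = vectorizer.transform(test_texts)

# max_iter is raised above the default as an assumption, to give lbfgs more room.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)

predictions = model.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
matrix = confusion_matrix(test_labels, predictions, labels=[0, 4])
print("Accuracy:", accuracy)
print(matrix)
```

The vectorizer is fitted only on the training texts and then reused for the test texts, so both sets share the same feature columns.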

Running the application:
Docker

docker build -t sentiment-analysis .

docker run --rm -ti sentiment-analysis

Execution results

    Loading dataset...
       target          id                          date      flag             user                                               text
    0       0  1467810369  Mon Apr 06 22:19:45 PDT 2009  NO_QUERY  _TheSpecialOne_  @switchfoot http://twitpic.com/2y1zl - Awww, t...
    1       0  1467810672  Mon Apr 06 22:19:49 PDT 2009  NO_QUERY    scotthamilton  is upset that he can't update his Facebook by ...
    2       0  1467810917  Mon Apr 06 22:19:53 PDT 2009  NO_QUERY         mattycus  @Kenichan I dived many times for the ball. Man...
    3       0  1467811184  Mon Apr 06 22:19:57 PDT 2009  NO_QUERY          ElleCTF     my whole body feels itchy and like its on fire
    4       0  1467811193  Mon Apr 06 22:19:57 PDT 2009  NO_QUERY           Karoli  @nationwideclass no, it's not behaving at all....
    Vectorizing...
    tweet: @mandelak I guess my local printer is lousy.
    sentiment: 0
    Freq. in dataset|Freq. in current tweet
      (0, 234226)   1
      (0, 267751)   1
      (0, 336267)   1
      (0, 339499)   1
      (0, 350289)   1
      (0, 388197)   1
      (0, 441594)   1
    Training...
    /usr/local/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
    Increase the number of iterations (max_iter) or scale the data as shown in:
        https://scikit-learn.org/stable/modules/preprocessing.html
    Please also refer to the documentation for alternative solver options:
        https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
      n_iter_i = _check_optimize_result(
    Accuracy: 0.8005916666666667
    Generating confusion matrix...
              0         4
    0  0.808434  0.206907
    4  0.191566  0.793093
                                                    text    target
    1  One of the other reviewers has mentioned that ...  positive
    2  A wonderful little production. <br /><br />The...  positive
    3  I thought this was a wonderful way to spend ti...  positive
    4  Basically there's a family where a little boy ...  negative
    5  Petter Mattei's "Love in the Time of Money" is...  positive
                                                    text  target
    1  One of the other reviewers has mentioned that ...       4
    2  A wonderful little production. <br /><br />The...       4
    3  I thought this was a wonderful way to spend ti...       4
    4  Basically there's a family where a little boy ...       0
    5  Petter Mattei's "Love in the Time of Money" is...       4
    Accuracy: 0.846171875
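In the confusion matrix printed during the run, each column sums to 1 (0.808434 + 0.191566 and 0.206907 + 0.793093), which suggests the counts were normalized per column. The output does not say which axis holds the true labels and which the predictions, so the following is only a sketch under an assumption: it uses scikit-learn's `confusion_matrix` with `normalize="pred"`, which places true labels on the rows and normalizes each predicted-label column. The label lists are made up.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (0 = negative, 4 = positive).
y_true = [0, 0, 0, 4, 4, 4, 4, 0]
y_pred = [0, 0, 4, 4, 4, 0, 4, 0]

# Rows are true labels, columns are predicted labels;
# normalize="pred" divides every column by its own total.
matrix = confusion_matrix(y_true, y_pred, labels=[0, 4], normalize="pred")
print(matrix)

# Each column of the normalized matrix sums to 1.
column_sums = matrix.sum(axis=0)
print(column_sums)
```

Read this way, the diagonal entries give, for each predicted class, the fraction of those predictions that were correct.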

erickfunier/Sentiment-Analysis
