Twitter Experiment

What about classifying sentiments on Twitter sample data?

ATTENTION

As the name says, it's experimental. Please don't be a fool: this is not meant to be used in a production environment.

Corpus Generation

The Twitter API Terms do not allow sharing or resyndicating Twitter content, so I will not do it.

However, it's possible to write a script that builds a corpus, and I did that. The corpus generator uses the Twitter stream. It is composed of two parts, but before running it you need to configure your environment (see Configuration):

Use the Twitter Streaming API to download tweets.

foreman run forest_consume

That will consume the Twitter Sample Stream and save the tweets to a MongoDB database. Trainable tweets will be flagged. It never finishes on its own: you need to decide how big you want your corpus to be, and when you decide it is big enough, simply stop it.

To detect trainable tweets I simply look for emoticons. If a tweet has a happy or a sad emoticon, it's a trainable tweet. This idea was not mine; I found it in 'Twitter as a Corpus for Sentiment Analysis and Opinion Mining' (A Pak, P Paroubek - LREC, 2010).
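The emoticon rule can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the emoticon lists, the method name `emoticon_label`, and the choice to skip tweets that contain both kinds of emoticon are all assumptions.

```ruby
# Illustrative emoticon lists (assumed, not taken from the project).
HAPPY_EMOTICONS = [":)", ":-)", "=)", ":D"].freeze
SAD_EMOTICONS = [":(", ":-(", "=("].freeze

# Returns :positive, :negative, or nil when the tweet is not trainable.
# Tweets with both happy and sad emoticons are treated as ambiguous (nil).
def emoticon_label(text)
  happy = HAPPY_EMOTICONS.any? { |emoticon| text.include?(emoticon) }
  sad = SAD_EMOTICONS.any? { |emoticon| text.include?(emoticon) }
  return nil if happy == sad

  happy ? :positive : :negative
end
```

A tweet would then be flagged as trainable whenever `emoticon_label` returns a non-nil label.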

After that you need to train the classifier.

foreman run forest_train

That will generate a folder bayes_data with your training output.

Configuration

Twitter Experiment needs to authenticate with the Twitter developer API, so you need to export some variables. To handle that we use dotenv. So all you need to do is:

Copy env.sample to .env.

cp config/env.sample .env

Edit .env with your own keys.
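Before starting the stream consumer it can help to check that the keys were actually loaded. The sketch below is only an illustration: the variable names are hypothetical placeholders (use the ones listed in config/env.sample), and `missing_keys` is not part of the project.

```ruby
# Hypothetical variable names; replace with the ones from config/env.sample.
REQUIRED_KEYS = %w[
  TWITTER_CONSUMER_KEY
  TWITTER_CONSUMER_SECRET
  TWITTER_ACCESS_TOKEN
  TWITTER_ACCESS_TOKEN_SECRET
].freeze

# Returns the required keys that are missing or blank in the given environment.
# `env` can be ENV itself or any Hash-like object.
def missing_keys(env = ENV, required = REQUIRED_KEYS)
  required.select { |key| env[key].to_s.strip.empty? }
end

# Example use at startup:
#   missing = missing_keys
#   abort "Missing keys: #{missing.join(', ')}" unless missing.empty?
```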

The script saves Twitter data to MongoDB, so you need to configure it.

Copy mongoid.sample to config/mongoid.yml.

cp config/mongoid.sample config/mongoid.yml

Edit your config/mongoid.yml with your MongoDB settings.

Results

To validate the experiment, I computed some statistics. For that, I:

Found a set of 4662 tweets.
Split them into 90% + 10%.
Trained a Naive Bayes classifier on the 90%.
Classified the other 10% using the trained classifier.
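The steps above can be sketched in plain Ruby. This is a minimal multinomial Naive Bayes with add-one smoothing, written only to illustrate the procedure; it is not the classifier the project uses, and `TinyBayes`, `split_corpus`, the tokenizer, and the fixed seed are all assumptions.

```ruby
# Minimal multinomial Naive Bayes with add-one smoothing (illustrative only).
class TinyBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
    @doc_counts = Hash.new(0) # label => number of training documents
    @vocabulary = {}          # every word seen during training
  end

  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the label with the highest log-probability for the text.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |label|
      total_words = @word_counts[label].values.sum
      log_prior = Math.log(@doc_counts[label] / total_docs)
      words.sum(log_prior) do |word|
        Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

# Shuffles the samples deterministically and splits them into train/test sets.
def split_corpus(samples, train_ratio: 0.9, seed: 42)
  shuffled = samples.shuffle(random: Random.new(seed))
  cut = (shuffled.size * train_ratio).floor
  [shuffled.first(cut), shuffled.drop(cut)]
end
```

With labeled samples as `[label, text]` pairs, one would call `split_corpus`, feed each training pair to `train`, and compare `classify(text)` against the known label for each held-out pair.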

After that I got these results:

Metric	Value
F1-Score	0.387479175558645
Accuracy	0.775160599571734
Recall	0.774870646948735
Precision	0.775224132863021
Matthews correlation	0.550321199143469
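For reference, the usual binary definitions of these metrics can be computed from a confusion matrix as sketched below. This is an assumption about the definitions, not the project's statistics code; `binary_metrics` and its keyword arguments are hypothetical, and the sketch does not guard against empty classes (zero denominators).

```ruby
# Standard binary-classification metrics from confusion-matrix counts.
# tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
def binary_metrics(tp:, fp:, tn:, fn:)
  total = (tp + fp + tn + fn).to_f
  precision = tp / (tp + fp).to_f
  recall = tp / (tp + fn).to_f
  mcc_denominator = Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  {
    accuracy: (tp + tn) / total,
    precision: precision,
    recall: recall,
    f1: 2 * precision * recall / (precision + recall), # harmonic mean
    mcc: (tp * tn - fp * fn) / mcc_denominator         # Matthews correlation
  }
end
```

Under this harmonic-mean definition, precision and recall that are both close to 0.775 would give an F1 close to 0.775 as well, so the F1-Score listed in the table was presumably computed with a different formula or averaging.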

To re-run the statistics, use:

foreman run forest_statistics

TODO

Handle negations by attaching the negation particle to the adjacent words.

E.g.: "I do not like fish" yields the bigrams: I do+not, do+not like, not+like fish
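One possible way to produce these bigrams is sketched below. It is only an illustration of the idea, not an implementation from the project: the negation list, the whitespace tokenizer, and the method name `negation_bigrams` are assumptions.

```ruby
# Illustrative list of negation particles (assumed).
NEGATIONS = %w[not no never].freeze

# Builds word bigrams, gluing each negation particle to the word before it
# ("do+not") and, in the next bigram, to the word after it ("not+like").
def negation_bigrams(sentence)
  words = []
  suffix = {} # word index => particle attached after that word
  prefix = {} # word index => particle attached before that word

  sentence.split.each do |token|
    if NEGATIONS.include?(token.downcase)
      suffix[words.size - 1] = token unless words.empty?
      prefix[words.size] = token
    else
      words << token
    end
  end

  words.each_cons(2).each_with_index.map do |(left, right), i|
    if suffix[i]
      "#{left}+#{suffix[i]} #{right}"
    elsif suffix[i + 1]
      "#{left} #{right}+#{suffix[i + 1]}"
    elsif prefix[i]
      "#{prefix[i]}+#{left} #{right}"
    else
      "#{left} #{right}"
    end
  end
end

negation_bigrams("I do not like fish")
# => ["I do+not", "do+not like", "not+like fish"]
```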

stupied4ever/twitter-experiment