Tweet Emotion Recognition

This repository contains the files needed to build a model that recognizes emotions in tweets.

Setup

Ensure you have the right versions of the dependencies. Run:
```
pip install -r requirements.txt
```
Create a .env file using the .env.sample:
```
cp .env.sample .env
```
Set the variables in the .env file.
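
As an illustration of how the values in .env might be read, here is a minimal sketch that parses KEY=VALUE lines with the standard library only. The helper name `load_env` is hypothetical and not part of this repository, and the project itself may load the file differently (for example through a dedicated package).

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper for illustration; blank lines, comments and
    lines without "=" are skipped.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Calling `load_env()` from the repository root would return the variables as a dictionary, which could then be merged into `os.environ` if needed.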

Data Fetch

This module is responsible for gathering tweets.

get_tweets

python -m data_fetch.get_tweets [QUERY] -s <SAVE_DIR> -m <MAX_REQUESTS> -c <COUNT> -n

Call the Twitter API and search for tweets with the given QUERY.

Arguments

query: String. Search parameter.
-s | --save_dir: String. Path to the directory where the data will be saved.
-m | --max_requests: Int. Max number of requests to the API. Default: 10.
-c | --count: Int. Max number of tweets for each request. Default: 100.
-n | --newer: Boolean. Collect the newer tweets. Default: True.

Returns

A DataFrame with the collected data when a SAVE_DIR is provided. Otherwise, None.
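
To show how MAX_REQUESTS and COUNT bound the amount of data collected, here is a rough sketch of a paging loop. It does not call the Twitter API: `fetch_page` and `fake_fetch_page` are hypothetical stand-ins, and the real module may page through results differently.

```python
def collect_tweets(fetch_page, query, max_requests=10, count=100):
    """Call fetch_page at most max_requests times and gather the results.

    fetch_page is a hypothetical callable taking (query, count, cursor)
    and returning (tweets, next_cursor).
    """
    collected = []
    cursor = None
    for _ in range(max_requests):
        tweets, cursor = fetch_page(query, count, cursor)
        if not tweets:
            break  # the search returned nothing more
        collected.extend(tweets)
        if cursor is None:
            break  # no further page available
    return collected


def fake_fetch_page(query, count, cursor):
    """Stand-in for the real API: serves 250 numbered fake tweets."""
    start = cursor or 0
    end = min(start + count, 250)
    tweets = [f"{query} tweet {i}" for i in range(start, end)]
    next_cursor = end if end < 250 else None
    return tweets, next_cursor


tweets = collect_tweets(fake_fetch_page, "happy", max_requests=10, count=100)
print(len(tweets))
```

With the documented defaults, a single run can gather at most 10 x 100 = 1000 tweets.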

concat_datasets

python -m data_fetch.concat_datasets [QUERY] [DATASET_DIR]

Create a new file with the datasets containing the given QUERY.

Arguments

query: String. Search parameter.
dataset_dir: String. Path to the directory where the files are saved.
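
The merge step can be sketched as follows, assuming the datasets are CSV files with identical columns and that "containing the given QUERY" refers to the file name. Both points are assumptions, as is the output file name; the function below is an illustration, not the module's code.

```python
import csv
from pathlib import Path


def concat_datasets(query, dataset_dir, output_name=None):
    """Merge every CSV in dataset_dir whose file name contains query.

    Assumes all matching files share the same header. The default
    output name is hypothetical.
    """
    dataset_dir = Path(dataset_dir)
    output_path = dataset_dir / (output_name or f"{query}_merged.csv")
    rows, fieldnames = [], None
    for path in sorted(dataset_dir.glob("*.csv")):
        # Skip unrelated files and a merged file left by an earlier run.
        if query not in path.name or path == output_path:
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            fieldnames = fieldnames or reader.fieldnames
            rows.extend(reader)
    if fieldnames is None:
        return None  # no matching files were found
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return output_path
```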

kmeans_data_filter

python -m data_fetch.kmeans_data_filter [FILE] -o <OUTPUT_NAME> -od <OUTPUT_DIR_NAME>
                                        -mf <MAX_FEATURES> -n <N_CLUSTERS>
                                        -ni <N_INIT> -mi <MAX_ITER>
                                        -r <RANDOM_STATE>

Filter the data from a file, based on the result of a clustering process using the K-means algorithm.

Arguments

file: String. Path to the file that will be processed.
-o | --output_name: String. Name of the output file, including the extension.
-od | --output_dir_name: String. Name of the directory where the output file will be placed. Default: datasets/kmeans.
-mf | --max_features: Int. Max number of features to use for the TF-IDF scoring. Default: 8000.
-n | --n_clusters: Int. Number of clusters to create. Default: 4.
-ni | --n_init: Int. Number of times the k-means algorithm will be run with different centroid seeds. Default: 10.
-mi | --max_iter: Int. Maximum number of iterations of the k-means algorithm for a single run. Default: 300.
-r | --random_state: Int. Determines random number generation for centroid initialization. Default: 20.
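
The arguments above map directly onto scikit-learn's `TfidfVectorizer` and `KMeans`. The sketch below shows that mapping, assuming scikit-learn is installed. How the module decides which clusters to keep is not documented here, so keeping only the largest cluster is purely an assumption for illustration, and both function names are hypothetical.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_texts(texts, max_features=8000, n_clusters=4, n_init=10,
                  max_iter=300, random_state=20):
    """Vectorize texts with TF-IDF and assign each one to a k-means cluster."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    features = vectorizer.fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=n_init,
                    max_iter=max_iter, random_state=random_state)
    # Convert the NumPy label array to a plain list of ints.
    return kmeans.fit_predict(features).tolist()


def keep_largest_cluster(texts, labels):
    """Assumed filtering rule: keep only the texts in the most common cluster."""
    largest = Counter(labels).most_common(1)[0][0]
    return [text for text, label in zip(texts, labels) if label == largest]
```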

Sentiment Analysis

This module is responsible for filtering the tweets, based on the emotion associated with the query and the predicted sentiment.

train

python -m sentiment_analysis.train [MODEL_TYPE] [DATASET_PATH] [TOKENIZER_PATH]
                                   [SAVE_DIR] -l <LABEL_COL> -t <TEXT_COL>
                                   -v <VALIDATION_SPLIT> -ed <EMBEDDING_DIM>
                                   -lr <LEARNING_RATE> -e <EPOCHS> -b <BATCH_SIZE>

Train a sentiment analysis model using the given MODEL_TYPE, and save the weights.

Arguments

model_type: String. Model to use. Choices: [lstm, lstm_conv]
dataset_path: String. Path to the dataset.
tokenizer_path: String. Path to the tokenizer.
save_dir: String. Path to the directory where the weights will be saved.
-l | --label_col: String. Name of the label column. Default: label.
-t | --text_col: String. Name of the text column. Default: text.
-v | --validation_split: Float. Fraction of the dataset to use as validation data. Default: 0.3.
-ed | --embedding_dim: Int. Output dimension of the embedding layer. Default: 100.
-lr | --learning_rate: Float. Initial learning rate. Default: 1e-3.
-e | --epochs: Int. Total number of epochs. Default: 10.
-b | --batch_size: Int. Number of samples per batch. Default: 32.
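
For reference, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch built from the documented flags and defaults, not the module's actual parser; the function name and the file names in the example call are hypothetical.

```python
import argparse


def build_train_parser():
    """Build a parser mirroring the documented train arguments."""
    parser = argparse.ArgumentParser(prog="sentiment_analysis.train")
    parser.add_argument("model_type", choices=["lstm", "lstm_conv"])
    parser.add_argument("dataset_path")
    parser.add_argument("tokenizer_path")
    parser.add_argument("save_dir")
    parser.add_argument("-l", "--label_col", default="label")
    parser.add_argument("-t", "--text_col", default="text")
    parser.add_argument("-v", "--validation_split", type=float, default=0.3)
    parser.add_argument("-ed", "--embedding_dim", type=int, default=100)
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-3)
    parser.add_argument("-e", "--epochs", type=int, default=10)
    parser.add_argument("-b", "--batch_size", type=int, default=32)
    return parser


# Example call with placeholder paths; omitted options fall back to defaults.
args = build_train_parser().parse_args(
    ["lstm", "dataset.csv", "tokenizer.pickle", "weights", "-e", "5", "-lr", "0.01"]
)
print(args.epochs, args.learning_rate, args.batch_size)
```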

train_nb

python -m sentiment_analysis.train_nb [FILENAME] -l <LABEL_COL> -t <TEXT_COL> -v <VALIDATION_SPLIT>

Train a Naive Bayes model and save it.

Arguments

filename: String. Path to the dataset.
-l | --label_col: String. Name of the label column. Default: label.
-t | --text_col: String. Name of the text column. Default: text.
-v | --validation_split: Float. Fraction of the dataset to use as validation data. Default: 0.3.

predict_from_directory

python -m sentiment_analysis.predict_from_directory [FILES_DIR] [MODEL_FILE] [SAVE_PATH] -t <TEXT_COL>

Create an annotated dataset, based on the search query and the Naive Bayes sentiment analysis.

Arguments

files_dir: String. Path to the directory containing the datasets.
model_file: String. Path to the Naive Bayes .pickle file.
save_path: String. Path where the resulting dataset will be saved.
-t | --text_col: String. Name of the text column. Default: text.

predict_nb

python -m sentiment_analysis.predict_nb [DATA_FILE] [MODEL_FILE] [SAVE_DIR] -t <TEXT_COL> -p

Predict the tweet polarity, purge the oddly fetched examples, and save the result.

Arguments

data_file: String. Path to the data file.
model_file: String. Path to the model file.
save_dir: String. Path to the directory where the result will be saved.
-t | --text_col: String. Name of the text column. Default: text.
-p | --positive: Boolean. If the emotion can be classified as positive. Default: False.
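
One way to read the purge step: a tweet fetched for a positive emotion but predicted as negative (or the reverse) counts as oddly fetched and is dropped. The sketch below follows that reading, which is an assumption. `toy_polarity` is a hypothetical stand-in for the trained Naive Bayes model, and rows are plain dictionaries rather than a DataFrame.

```python
def purge_mismatched(rows, predict_polarity, positive=False, text_col="text"):
    """Keep rows whose predicted polarity agrees with the emotion's polarity.

    predict_polarity is any callable returning True for positive text.
    """
    return [row for row in rows if predict_polarity(row[text_col]) == positive]


def toy_polarity(text):
    """Hypothetical stand-in for the trained model: True means positive."""
    return any(word in text.lower() for word in ("love", "great", "happy"))


rows = [
    {"text": "I love this so much"},
    {"text": "This is awful"},
    {"text": "What a great day"},
]
kept = purge_mismatched(rows, toy_polarity, positive=True)
print([row["text"] for row in kept])
```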
