The task of this project is to predict whether a tweet message originally contained a positive :) or negative :( smiley, by considering only the remaining text!
Fork or download a zip of the repository, and run `run.py` from the root folder with `python run.py`. This produces the file `powerpuffz_kagglescore.csv` in the root folder.
In order to run `run.py`, some packages with their dependencies are required. We recommend creating a clean Anaconda environment and installing the following packages via Anaconda Navigator:
- Python V: 3.6.3
- NumPy V: 1.12.1
- Keras V: 2.0.8
- TensorFlow V: 1.2.1
- scikit-learn V: 0.19.1
- Gensim V: 3.1.0
In order to run the whole preprocessing pipeline and the other project components, the following packages with their dependencies are also required:
- NLTK V: 3.2.5
- Ekphrasis V: 0.3.6
  - Run: `pip install ekphrasis`
  - NB: The first import will take some time, as it downloads the "Word statistics files".
- Enchant V: 2.0.0 (mac/linux only)
  - Run: `pip install pyenchant`
  - NB: No version is available for Python on 64-bit Windows. Run on an OSX/Linux machine.
  - NB: Enchant is only used for its word dictionary in `tokenizing.py`, a step of the pre-processing, and can be exchanged with another dictionary if you want to run on Windows. Without substituting another dictionary, you are only hindered from running the pre-processing; in that case we refer the reader to the pickled pre-processed corpus described below. See the usage sketch after this list.
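For orientation, here is a minimal sketch of how these two packages are typically used on tweet text. The `TextPreProcessor` settings shown are illustrative assumptions, not the exact configuration used in `tokenizing.py`:

```python
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
import enchant

# Illustrative ekphrasis setup; the project's actual settings live in tokenizing.py.
text_processor = TextPreProcessor(
    normalize=['url', 'email', 'user', 'number'],   # replace these tokens with tags
    annotate={"hashtag", "elongated", "repeated"},  # mark, rather than remove, these phenomena
    segmenter="twitter",                            # word segmentation statistics from Twitter
    corrector="twitter",                            # spell-correction statistics from Twitter
    unpack_hashtags=True,                           # split #MachineLearning -> machine learning
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
)
tokens = text_processor.pre_process_doc("Sooo happy about #machinelearning")

# Enchant is used purely as a word dictionary: check whether a token is an English word.
dictionary = enchant.Dict("en_US")
print(dictionary.check("happy"))  # True
```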
All downloaded files should be extracted and placed in the root project folder.
- Download the GloVe word vectors from Stanford Twitter GloVe.
  - In order to create `gensim_global_vectors_Xdim.txt`, the method `create_gensim_word2vec_file` in `glove_module.py` must be run (see the sketch below).
- Download and unzip the training and test datasets from Kaggle into the root folder.
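Presumably `create_gensim_word2vec_file` wraps gensim's GloVe-to-word2vec converter; under that assumption, a minimal sketch (with example filenames for the 200-dimensional vectors) looks like this:

```python
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

# Convert the raw Stanford GloVe file into word2vec text format.
# The filenames below are examples for the 200-dimensional vectors.
glove2word2vec("glove.twitter.27B.200d.txt", "gensim_global_vectors_200dim.txt")

# The converted file can then be loaded as ordinary gensim word vectors.
vectors = KeyedVectors.load_word2vec_format("gensim_global_vectors_200dim.txt")
print(vectors["happy"].shape)  # (200,)
```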
- Preprocessing_on_test.ipynb: This notebook is used to determine the best preprocessing.
- Data_analysis.ipynb: This notebook contains preliminary data analysis.
- Kaggle_submissions.ipynb: This notebook contains the code needed to make predictions for the unseen test set, either from scratch or from pickles.
- run.py: Run this file to reproduce the predictions, as explained above.
- tokenizing.py: Contains methods for processing data
- helpers.py: Contains general helpers
- dataanalysis.py: Contains methods for comparing positive and negative datasets
- neural_nets.py: Contains definitions of the neural nets used
- glove_module.py: Contains word embedding methods
- validation_and_prediction.py: Contains methods for running cross validation and prediction
- stopword_100_corpus_N2_SHN_E_SN_H_HK.pkl: Stored pickle of corpus after optimal pre-processing
- final_document_vectors.pkl: Contains the final document vectors for the corpus after optimal pre-processing
- final_model_for_kaggle.hdf5: Contains the pre-trained complex neural net model.
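To see how these stored artifacts fit together, here is a minimal sketch that reproduces predictions from the pickles; it assumes the `.pkl` files were written with the standard `pickle` module and that the document vectors feed directly into the Keras model, as `Kaggle_submissions.ipynb` does when run from pickles:

```python
import pickle
from keras.models import load_model

# Load the pickled artifacts; this assumes they were written with the standard
# pickle module and that the document vectors are a NumPy feature array.
with open("stopword_100_corpus_N2_SHN_E_SN_H_HK.pkl", "rb") as f:
    corpus = pickle.load(f)
with open("final_document_vectors.pkl", "rb") as f:
    document_vectors = pickle.load(f)

# Load the pre-trained model and predict sentiment for each document vector.
model = load_model("final_model_for_kaggle.hdf5")
predictions = model.predict(document_vectors)
```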