NOTE

This repo is tuned for Python 3, whereas the original repo targeted Python 2.

Deep Learning-Based Document Modeling for Personality Detection from Text

This code implements the model discussed in Deep Learning-Based Document Modeling for Personality Detection from Text for detection of Big-Five personality traits, namely:

Extroversion
Neuroticism
Agreeableness
Conscientiousness
Openness

Requirements

Ubuntu 16.04 64-bit (Tested)
Python 3 (Tested)
Theano 1.0.4 (Tested)
Pandas 0.24.2 (Tested)
Pre-trained GoogleNews word2vec vector (If you are using ssh try this)

Preprocessing

process_data.py prepares the data for training. It requires three command-line arguments:

Path to google word2vec file (GoogleNews-vectors-negative300.bin)
Path to essays.csv file containing the annotated dataset
Path to mairesse.csv containing Mairesse features for each sample/essay

The script generates a pickle file, essays_mairesse.p.

Example:

python process_data.py ./GoogleNews-vectors-negative300.bin ./essays.csv ./mairesse.csv
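Before launching the preprocessing step, it can help to confirm that the three input files are where the command expects them. The following is a minimal sketch and not part of the repo; `missing_inputs` is a hypothetical helper name, and the paths are simply the ones from the example above.

```python
import os


def missing_inputs(paths):
    """Return the entries of `paths` that are not existing files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # The three arguments expected by process_data.py, in order.
    inputs = [
        "./GoogleNews-vectors-negative300.bin",
        "./essays.csv",
        "./mairesse.csv",
    ]
    missing = missing_inputs(inputs)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All inputs found; ready to run process_data.py.")
```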

Configuration for training the model

A. Running using CPU

Configure ~/.theanorc:

[global]
floatX=float64
OMP_NUM_THREADS=20
openmp=True

B. Running using GPU

Install libgpuarray
Install cuDNN for faster training
Add CUDA path to .bashrc:

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}

Configure ~/.theanorc:

[cuda]
root=/usr/local/cuda
[global]
device=cuda
floatX = float32
OMP_NUM_THREADS=20
openmp=True

[nvcc]
fastmath=True

Training

Note: Before the configuration changes above, each epoch took about 5 hours to complete. Afterward, an epoch took less than an hour on CPU and about 45 s on GPU (improvements depend on your system specifications).

A. Running on GPU

conv_net_train_gpu.py trains and tests the model using the GPU. Alternatively, you can run run.sh to train all traits using word2vec at once.

B. Running on CPU

conv_net_train.py trains and tests the model using CPU.

Both scripts require three command-line arguments:

Mode:
- -static: word embeddings will remain fixed
- -nonstatic: word embeddings will be trained
Word Embedding Type:
- -rand: randomized word embeddings (the dimension is hardcoded to 300; it can be changed by modifying the default value of k on line 111 of process_data.py)
- -word2vec: 300-dimensional pre-trained Google word embeddings
Personality Trait:
- 0: Extroversion
- 1: Neuroticism
- 2: Agreeableness
- 3: Conscientiousness
- 4: Openness
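The three arguments listed above can be validated before a long training run starts. The sketch below is an illustration only and is not how the training scripts parse their arguments; `parse_training_args` and the keys of the returned dictionary are hypothetical names.

```python
TRAITS = [
    "Extroversion",
    "Neuroticism",
    "Agreeableness",
    "Conscientiousness",
    "Openness",
]


def parse_training_args(args):
    """Validate [mode, embedding, trait] and return them as a dict."""
    if len(args) != 3:
        raise ValueError("expected: <mode> <embedding> <trait index>")
    mode, embedding, trait = args
    if mode not in ("-static", "-nonstatic"):
        raise ValueError("mode must be -static or -nonstatic")
    if embedding not in ("-rand", "-word2vec"):
        raise ValueError("embedding must be -rand or -word2vec")
    if trait not in ("0", "1", "2", "3", "4"):
        raise ValueError("trait must be an integer from 0 to 4")
    return {
        "train_embeddings": mode == "-nonstatic",  # -static keeps them fixed
        "use_word2vec": embedding == "-word2vec",
        "trait_index": int(trait),
        "trait_name": TRAITS[int(trait)],
    }


print(parse_training_args(["-static", "-word2vec", "2"]))
```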

Example:

python conv_net_train.py -static -word2vec 2
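To train all five traits one after another from Python, one option is to build the command line for each trait index and hand it to `subprocess.run`. The sketch below is an assumption about how this could be done and does not reproduce the contents of run.sh; `build_commands` is a hypothetical helper, and the launch line is left commented out so the sketch only prints the commands.

```python
import sys


def build_commands(script="conv_net_train.py", mode="-static",
                   embedding="-word2vec", python=sys.executable):
    """Return one argument list per personality trait index (0 to 4)."""
    return [[python, script, mode, embedding, str(trait)]
            for trait in range(5)]


for command in build_commands():
    print(" ".join(command))
    # To launch the run, import subprocess and uncomment:
    # subprocess.run(command, check=True)
```

Passing `script="conv_net_train_gpu.py"` builds the equivalent GPU commands.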

Citation

If you use this code in your work, please cite the paper Deep Learning-Based Document Modeling for Personality Detection from Text using the following entry:

@ARTICLE{7887639, 
 author={N. Majumder and S. Poria and A. Gelbukh and E. Cambria}, 
 journal={IEEE Intelligent Systems}, 
 title={{Deep} Learning-Based Document Modeling for Personality Detection from Text}, 
 year={2017}, 
 volume={32}, 
 number={2}, 
 pages={74-79}, 
 keywords={feedforward neural nets;information filtering;learning (artificial intelligence);pattern classification;text analysis;Big Five traits;author personality type;author psychological profile;binary classifier training;deep convolutional neural network;deep learning based method;deep learning-based document modeling;document vector;document-level Mairesse features;emotionally neutral input sentence filtering;identical architecture;personality detection;text;Artificial intelligence;Computational modeling;Emotion recognition;Feature extraction;Neural networks;Pragmatics;Semantics;artificial intelligence;convolutional neural network;distributional semantics;intelligent systems;natural language processing;neural-based document modeling;personality}, 
 doi={10.1109/MIS.2017.23}, 
 ISSN={1541-1672}, 
 month={Mar},}
