Deprecated

This codebase is deprecated. See this repo for an updated implementation of User2Vec

Usr2Vec

Implementation of the usr2vec model to induce neural user embeddings, as described in the paper Quantifying Mental Health from Social Media with Neural User Embeddings. The resulting embeddings capture latent user aspects, e.g. political leanings and mental-health status (Figure 1).

A previous version of this model, described in the paper Modelling Context with User Embeddings for Sarcasm Detection in Social Media, can be found here.

Figure 1 - User embeddings projected into two dimensions and colored according to mental health status.

If you use this code please cite our paper as:

Amir, S., Coppersmith, G., Carvalho, P., Silva, M.J. and Wallace, B.C., 2017. Quantifying Mental Health from Social Media with Neural User Embeddings. In Journal of Machine Learning Research, W&C Track, Volume 68.

Requirements:

The software is implemented in Python 2.7 and requires the following packages:

sma_toolkit
numpy
gensim
joblib
theano

Inputs/Outputs:

There are two inputs to this model:

a text file with the training data --- the system assumes that the documents can be tokenized using whitespace (we recommend pre-tokenizing with an appropriate tokenizer) and that all messages from a user appear sequentially (see raw_data/sample.txt for an example)
a text file with pre-trained word embeddings (e.g. word2vec, glove)
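
The requirement that all messages from a user appear sequentially can be checked before building the training data. The following is a minimal sketch, assuming each line has the form user_id, a tab, then the whitespace-tokenized message. That layout is an assumption (see raw_data/sample.txt for the actual format), and find_scattered_users is a hypothetical helper that is not part of this codebase.

```python
def find_scattered_users(lines, sep="\t"):
    """Return the user ids whose messages are not in one contiguous run.

    Assumes each line is '<user_id><sep><message>' (an assumed layout,
    not confirmed by the original code).
    """
    seen = set()      # users whose block of messages has already ended
    scattered = []    # users that reappear after their block ended
    current = None    # user id of the block currently being read
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        user = line.split(sep, 1)[0]
        if user != current:
            # a new block starts; flag the user if it was seen before
            if user in seen and user not in scattered:
                scattered.append(user)
            if current is not None:
                seen.add(current)
            current = user
    return scattered
```

An empty result means the file satisfies the ordering assumption; any returned ids point to users whose messages would need to be regrouped first.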

The output is a text file with a format similar to word2vec's, i.e. each line consists of user_id \t embedding.
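
To use the learned embeddings downstream, the output file can be read back line by line. This is a minimal sketch, assuming the embedding after the tab is a sequence of whitespace-separated floats and that the file has no header line. Both are assumptions about the format, and load_user_embeddings is a hypothetical helper rather than a function from this codebase.

```python
def load_user_embeddings(lines):
    """Parse lines of the form 'user_id<TAB>v1 v2 ... vn' into a dict.

    Assumes whitespace-separated float values and no header line
    (assumptions, not confirmed by the original code).
    """
    embeddings = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        user_id, values = line.split("\t", 1)
        embeddings[user_id] = [float(v) for v in values.split()]
    return embeddings


# Usage with in-memory lines; an open file object works the same way.
sample = ["alice\t0.1 -0.2 0.3\n", "bob\t1 2 3\n"]
user_vectors = load_user_embeddings(sample)
```

A plain dict of lists keeps the sketch free of dependencies; the lists can be converted to numpy arrays where vector operations are needed.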

Instructions

The software works in two main steps: (1) building the training data; and (2) learning the user embeddings. The code can be executed as follows:

Setup
1. get the sma_toolkit
2. edit scripts/setup.sh and set the path to the sma_toolkit
3. run ./scripts/setup.sh

Build training data
1. get some pretrained word embeddings
2. edit the paths in the file scripts/build_data.sh (i.e. the variables DATA_PATH and WORD_EMBEDDINGS)
3. run ./scripts/build_data.sh

Train model
1. run ./scripts/build_data.sh [DATA_PATH] [OUTPUT_PATH]
