SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

Authors: Jisun An, Haewoon Kwak, and Yong-Yeol Ahn

Abstract

Because word semantics can substantially change across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SemAxis can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SemAxis outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

Highlights

Building a lexicon for various semantic axes (including and beyond sentiment)

Content analysis with SemAxis

The /r/The_Donald community perceives "guns" as safer than the /r/SandersForPresident community does.

Citing SemAxis

If you use this work in your research, please cite the following paper:

Jisun An, Haewoon Kwak, and Yong-Yeol Ahn. 2018. SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL'18).

Bibtex

@InProceedings{P18-1228,
author = "An, Jisun
and Kwak, Haewoon
and Ahn, Yong-Yeol",
title = "SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "2450--2461",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1228"
}

Using the code

To use SemAxis, you will need to download pre-trained word embeddings. Once this is done, specify the path to these embeddings (variable: EMBEDDING_PATH) in semaxis.py. The file semaxis.py contains implementations for computing a semantic axis from two pole words and for projecting target words onto that axis, along with comments and documentation on how to use them.
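The two operations described above can be illustrated with a minimal sketch. It is not the code in semaxis.py: the function names `semantic_axis` and `project` and the toy vectors are assumptions made up for illustration. It takes the axis to be the difference between the two pole word vectors and the projection to be the cosine similarity between a word vector and that axis. In practice the vectors would come from the pre-trained embeddings rather than a hand-written dictionary.

```python
import math

def semantic_axis(embeddings, positive_pole, negative_pole):
    """Return the axis vector: vector(positive_pole) - vector(negative_pole)."""
    pos = embeddings[positive_pole]
    neg = embeddings[negative_pole]
    return [p - n for p, n in zip(pos, neg)]

def project(embeddings, word, axis):
    """Return the cosine similarity between a word vector and the axis."""
    vec = embeddings[word]
    dot = sum(v * a for v, a in zip(vec, axis))
    norm = math.sqrt(sum(v * v for v in vec)) * math.sqrt(sum(a * a for a in axis))
    return dot / norm

# Toy 3-dimensional vectors, invented purely for illustration.
toy_embeddings = {
    "safe": [1.0, 0.0, 0.2],
    "dangerous": [-1.0, 0.0, 0.2],
    "helmet": [0.8, 0.3, 0.1],
    "explosion": [-0.7, 0.4, 0.1],
}

axis = semantic_axis(toy_embeddings, "safe", "dangerous")
for word in ("helmet", "explosion"):
    # Positive scores lean toward "safe", negative scores toward "dangerous".
    print(word, round(project(toy_embeddings, word, axis), 3))
```

A score close to 1 means the word sits near the first pole, and a score close to -1 means it sits near the second pole.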

Pre-trained word embeddings used in the study

We make the pre-trained word embeddings used in this study available for download.

732 Pre-defined Semantic Axes for download

We systematically induce 732 semantic axes based on the antonym pairs from ConceptNet. You can download them here: 732 Pre-defined Semantic Axes for download. The file is tab-separated and contains the 732 antonym word pairs.
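As a rough sketch of how such a tab-separated file could be read, the snippet below parses antonym pairs with Python's standard `csv` module. The exact column layout is an assumption (one pair per row, with the two pole words in the first two columns), and `read_axes` is a hypothetical helper, not part of the repository. An in-memory sample stands in for the downloaded file.

```python
import csv
import io

def read_axes(handle):
    """Parse tab-separated rows into (pole_a, pole_b) tuples.

    Assumes the two pole words are in the first two columns;
    blank rows and rows with fewer than two columns are skipped.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs

# In-memory stand-in for the downloaded file (contents invented for illustration).
sample = io.StringIO("good\tbad\nsafe\tdangerous\n")
axes = read_axes(sample)
print(axes)

# For the real file, open it first, e.g.:
#   with open(path_to_axes_file, encoding="utf-8", newline="") as f:
#       axes = read_axes(f)
# where path_to_axes_file is wherever you saved the download.
```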

Dependencies

A Python 3.5 distribution with the standard packages provided by the Anaconda distribution is required.

In particular, the code was tested with:

numpy (1.14.0)
gensim (3.4.0)
scipy (1.0.0)