HashtagMaster: Segmentation tool for hashtags

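This repository contains the code and resources from the following paper: "Multi-task Pairwise Neural Ranking for Hashtag Segmentation" (Maddela, Xu, and Preoţiuc-Pietro, ACL 2019).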

Repo Structure:

  1. word_breaker: Code for the word-breaker beam search.

  2. neural_ranker: Code for our neural pairwise ranker models (4 variants).

  3. data: Task datasets and other feature files. All feature files used in the experiments are included except the full language models; only a small sample of the language models is provided. Please email us for the full language models.
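
The idea behind a word-breaker beam search can be illustrated with a minimal Python sketch. It is not the code in word_breaker/: it assumes a toy unigram language model given as a dictionary of hypothetical log-probabilities (`TOY_LM`) and an assumed per-character penalty for unknown words, whereas the real tool loads a trained language model such as data/small_gt.bin. For every end position it keeps at most `k` partial segmentations (the beam) and finally returns the top `k` full segmentations, analogous to the top-k candidates requested with `--k`.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy unigram log-probabilities standing in for a real language model.
TOY_LM: Dict[str, float] = {
    "new": math.log(0.05),
    "york": math.log(0.02),
    "newyork": math.log(0.0005),
    "times": math.log(0.03),
}
# Assumed penalty per character for words missing from the language model.
UNK_LOGPROB_PER_CHAR = math.log(1e-4)


def word_logprob(word: str, lm: Dict[str, float]) -> float:
    """Unigram log-probability of one word, with a length-scaled unknown-word penalty."""
    return lm.get(word, UNK_LOGPROB_PER_CHAR * len(word))


def beam_segment(hashtag: str, lm: Dict[str, float], k: int = 10,
                 max_word_len: int = 20) -> List[Tuple[List[str], float]]:
    """Return up to k (segmentation, log-probability) pairs for a lowercase hashtag, best first."""
    n = len(hashtag)
    # beams[i] holds the best partial segmentations covering hashtag[:i]
    beams: List[List[Tuple[List[str], float]]] = [[] for _ in range(n + 1)]
    beams[0] = [([], 0.0)]
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = hashtag[start:end]
            for words, score in beams[start]:
                candidates.append((words + [word], score + word_logprob(word, lm)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams[end] = candidates[:k]  # prune to the beam width k
    return beams[n]
```

With these toy scores, `beam_segment("newyorktimes", TOY_LM, k=2)` ranks `['new', 'york', 'times']` first and `['newyork', 'times']` second, each paired with its log-probability.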

Instructions:

  1. First, run the "Word Breaker" to get the top-k candidates:

    python3 word_breaker/main.py --k 10 --lm data/small_gt.bin --out train_topk.tsv --input data/our_dataset/train_corrected.tsv

    python3 word_breaker/main.py --k 10 --lm data/small_gt.bin --out test_topk.tsv --input data/our_dataset/test_corrected.tsv

  2. Rerank the top-k candidates:

    python3 neural_ranker/main.py --train data/our_dataset/train_corrected.tsv --train_topk train_topk.tsv --test data/our_dataset/test_corrected.tsv --test_topk test_topk.tsv --out output.tsv
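
The reranking step compares candidate segmentations in pairs. The sketch below shows one common way to turn pairwise preferences into a ranking: each candidate's score is the sum of the probabilities that it beats every other candidate. This is an illustration under assumptions, not a description of neural_ranker/main.py: `toy_prefer` is a hypothetical hand-written comparator standing in for a trained neural model, and the summing strategy is an assumed aggregation scheme.

```python
import math
from typing import Callable, List, Sequence

# A comparator returns P(first candidate is a better segmentation than the second).
Comparator = Callable[[str, str], float]

# Hypothetical vocabulary used only by the toy comparator below.
VOCAB = frozenset({"new", "york", "times"})


def rerank_pairwise(candidates: Sequence[str], prefer: Comparator) -> List[str]:
    """Order distinct candidates by their total probability of beating every other candidate."""
    totals = {c: 0.0 for c in candidates}
    for a in candidates:
        for b in candidates:
            if a != b:
                totals[a] += prefer(a, b)
    # sorted() is stable (also with reverse=True), so ties keep the input order
    return sorted(candidates, key=lambda c: totals[c], reverse=True)


def toy_prefer(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained pairwise model."""
    def known_fraction(segmentation: str) -> float:
        words = segmentation.split()
        if not words:
            return 0.0
        return sum(w in VOCAB for w in words) / len(words)

    # Logistic of the feature difference, so prefer(a, b) + prefer(b, a) == 1
    diff = known_fraction(a) - known_fraction(b)
    return 1.0 / (1.0 + math.exp(-5.0 * diff))
```

For example, reranking `["newyork times", "new york times", "newyo rktimes"]` with `toy_prefer` puts `"new york times"` first, because all of its words are in the toy vocabulary. A pairwise design only needs the model to judge two candidates at a time, which is why the comparator can be swapped for any learned scorer with the same signature.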

Citation

Please cite the following paper if you use these resources in your research:

@InProceedings{ACL-2019-Maddela,
  author = 	"Maddela, Mounica and Xu, Wei and Preoţiuc-Pietro, Daniel",
  title = 	"Multi-task Pairwise Neural Ranking for Hashtag Segmentation",
  booktitle = 	"Proceedings of the Association for Computational Linguistics (ACL)",
  year = 	"2019",
}

Please use Python 3 to run the code.
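
For convenience, both steps from the Instructions can be chained from a single Python 3 script. This is a minimal sketch and not part of the repository: the helper names (`word_breaker_cmd`, `ranker_cmd`, `run_pipeline`) and the `--run` switch are hypothetical, while the paths and flags are copied from the commands above. Launching each step with `sys.executable` keeps it on the same Python 3 interpreter that runs the script.

```python
import subprocess
import sys
from typing import List

# Paths copied from the README commands; adjust them to your checkout.
LM = "data/small_gt.bin"
DATA = "data/our_dataset"


def word_breaker_cmd(split: str, k: int = 10) -> List[str]:
    """argv for step 1 (top-k candidates) on the 'train' or 'test' split."""
    return [sys.executable, "word_breaker/main.py", "--k", str(k), "--lm", LM,
            "--out", f"{split}_topk.tsv", "--input", f"{DATA}/{split}_corrected.tsv"]


def ranker_cmd() -> List[str]:
    """argv for step 2 (rerank the top-k candidates)."""
    return [sys.executable, "neural_ranker/main.py",
            "--train", f"{DATA}/train_corrected.tsv", "--train_topk", "train_topk.tsv",
            "--test", f"{DATA}/test_corrected.tsv", "--test_topk", "test_topk.tsv",
            "--out", "output.tsv"]


def run_pipeline(dry_run: bool = True) -> List[List[str]]:
    """Print (and unless dry_run, execute) the three commands in order."""
    cmds = [word_breaker_cmd("train"), word_breaker_cmd("test"), ranker_cmd()]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds


if __name__ == "__main__":
    # Only print the commands unless the hypothetical --run switch is given.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Running the script without arguments only prints the commands, which makes it easy to check the paths before starting the full pipeline; pass `--run` to execute them from the repository root.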
