HashtagMaster: Segmentation tool for hashtags

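This repository contains the code and resources from the following paper: "Multi-task Pairwise Neural Ranking for Hashtag Segmentation" by Mounica Maddela, Wei Xu, and Daniel Preoţiuc-Pietro (ACL 2019).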

Repo Structure:

  1. word_breaker: Code for word-breaker beam search.

  2. neural_ranker: Code for our neural pairwise ranker models (4 variants).

  3. data: Task datasets and other feature files. All the feature files used in the experiments are included except the language models, for which we provide only a small sample. Please email us for the full language models.
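
The word breaker's beam search can be illustrated with a minimal sketch. It is not the code in word_breaker/: the toy table UNIGRAM_LOGPROB, the out-of-vocabulary penalty, and the names word_logprob and top_k_segmentations are hypothetical, whereas the real tool scores candidates with the language model passed via --lm. For every prefix of the hashtag, only the `beam` best partial segmentations are kept, and the k best complete segmentations are returned as candidates.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical unigram log-probabilities for illustration only; the real
# word breaker uses the language model given by --lm instead.
UNIGRAM_LOGPROB: Dict[str, float] = {
    "world": math.log(0.02),
    "cup": math.log(0.01),
    "worldcup": math.log(0.00005),
}

Hypothesis = Tuple[float, List[str]]  # (log-probability, words so far)

def word_logprob(word: str) -> float:
    """Score a word; unseen words get an assumed penalty that grows with length."""
    if word in UNIGRAM_LOGPROB:
        return UNIGRAM_LOGPROB[word]
    return -10.0 * len(word)

def top_k_segmentations(text: str, k: int = 10, beam: int = 20,
                        max_word_len: int = 20) -> List[Hypothesis]:
    """Beam-pruned search over segmentations of text; best first, at most k results."""
    n = len(text)
    # beams[i] keeps the best partial segmentations covering text[:i].
    beams: List[List[Hypothesis]] = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, [])]
    for end in range(1, n + 1):
        candidates: List[Hypothesis] = []
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            lp = word_logprob(word)
            for score, words in beams[start]:
                candidates.append((score + lp, words + [word]))
        # Prune: only the `beam` highest-scoring hypotheses ending here survive.
        beams[end] = heapq.nlargest(beam, candidates, key=lambda h: h[0])
    return beams[n][:k]

for score, words in top_k_segmentations("worldcup", k=3):
    print(f"{score:.2f}\t{' '.join(words)}")
```

With this toy table, "world cup" should be ranked first for "worldcup". A wider beam keeps more hypotheses per prefix, trading speed for fewer pruning errors.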


Usage:

  1. First, run the word breaker to get the top-k segmentation candidates for the training and test sets:

    python3 word_breaker/ --k 10 --lm data/small_gt.bin --out train_topk.tsv --input data/our_dataset/train_corrected.tsv

    python3 word_breaker/ --k 10 --lm data/small_gt.bin --out test_topk.tsv --input data/our_dataset/test_corrected.tsv

  2. Then rerank the top-k candidates with the neural pairwise ranker:

    python3 neural_ranker/ --train data/our_dataset/train_corrected.tsv --train_topk train_topk.tsv --test data/our_dataset/test_corrected.tsv --test_topk test_topk.tsv --out output.tsv
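
The reranking step compares candidates in pairs. As a hedged illustration, not the neural_ranker implementation, the sketch below assumes a pairwise scorer that returns a positive value when its first argument is the better segmentation, and ranks each candidate by the sum of its scores against all the others. toy_pair_score and VOCAB are hypothetical stand-ins for a trained model.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# A pairwise scorer returns > 0 when its first argument is the better candidate.
PairScorer = Callable[[str, str], float]

# Hypothetical vocabulary backing the toy scorer below.
VOCAB = {"world", "cup"}

def toy_pair_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained model: prefer segmentations with more known words."""
    def known_fraction(segmentation: str) -> float:
        words = segmentation.split()
        return sum(w in VOCAB for w in words) / len(words)
    return known_fraction(a) - known_fraction(b)

def rerank(candidates: List[str], pair_score: PairScorer) -> List[Tuple[str, float]]:
    """Order candidates by the sum of their pairwise scores against every other candidate."""
    totals = {c: 0.0 for c in candidates}
    for a, b in permutations(candidates, 2):
        totals[a] += pair_score(a, b)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

for candidate, total in rerank(["worldcup", "world cu p", "world cup"], toy_pair_score):
    print(f"{total:+.2f}\t{candidate}")
```

In practice the candidate list would be the top-k output of the word breaker. Summing pairwise scores is one common way to turn pairwise judgments into a single ordering; it is assumed here rather than taken from the repository.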


Please cite the following paper if you use these resources in your research:

  author = 	"Maddela, Mounica and Xu, Wei and Preoţiuc-Pietro, Daniel",
  title = 	"Multi-task Pairwise Neural Ranking for Hashtag Segmentation",
  booktitle = 	"Proceedings of the Association for Computational Linguistics (ACL)",
  year = 	"2019",

Please use Python 3 to run the code.
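
To run both steps end to end under the same Python 3 interpreter, a small driver can replay the usage commands through subprocess. This is a sketch that assumes it is launched from the repository root and that the paths work exactly as written in the usage section; build_commands and run_pipeline are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from typing import List

DATA_DIR = "data/our_dataset"

def build_commands(k: int = 10, lm: str = "data/small_gt.bin") -> List[List[str]]:
    """Rebuild the usage commands, using the current interpreter instead of python/python3."""
    py = sys.executable
    commands = []
    # Step 1: the word breaker produces top-k candidates for train and test.
    for split in ("train", "test"):
        commands.append([py, "word_breaker/", "--k", str(k), "--lm", lm,
                         "--out", f"{split}_topk.tsv",
                         "--input", f"{DATA_DIR}/{split}_corrected.tsv"])
    # Step 2: the neural ranker reranks the candidates and writes output.tsv.
    commands.append([py, "neural_ranker/",
                     "--train", f"{DATA_DIR}/train_corrected.tsv",
                     "--train_topk", "train_topk.tsv",
                     "--test", f"{DATA_DIR}/test_corrected.tsv",
                     "--test_topk", "test_topk.tsv",
                     "--out", "output.tsv"])
    return commands

def run_pipeline() -> None:
    """Run each step in order; check=True stops at the first failing command."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

Call run_pipeline() from the repository root; build_commands() on its own is handy for inspecting the exact arguments before running anything.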
