Contents of this directory

This directory contains tweets labeled by crowdsourcing workers. Each tweet is accompanied by a label determined by majority vote among at least three crowdsourcing workers.
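As an illustration of how a majority-vote label can be derived from individual worker judgments, here is a minimal Python sketch. The function name `majority_label`, the example label strings, and the choice to return `None` when no label has a strict majority are assumptions made for this example; the per-worker votes themselves are not part of the released files, and the exact aggregation procedure used for this dataset is not described here.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of workers, else None.

    `votes` is a list of label strings, one per worker (hypothetical input;
    the released files contain only the final label).
    """
    if len(votes) < 3:
        raise ValueError("expected at least 3 worker votes")
    # most_common(1) gives the single most frequent label and its count
    top_label, top_count = Counter(votes).most_common(1)[0]
    # Assumption: require more than half of the votes; ties yield None
    if top_count * 2 <= len(votes):
        return None
    return top_label


result = majority_label(["on-topic", "on-topic", "off-topic"])
print(result)
```

With two possible labels and an odd number of workers, a strict majority always exists; the `None` case only arises with an even number of votes.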

There is one sub-directory per crisis, for each of the following disasters:

On-topic/Off-topic files: *-ontopic_offtopic.csv

Contents: Each file contains approximately 10,000 tweets. Half of these tweets were drawn from the geo-based sample and half from the keyword-based sample. Both samples are described in [Olteanu et al. 2014].

Labels: These files contain labels indicating whether a tweet is on-topic (related to the crisis at hand) or off-topic (not related to it).

File format: One tweet per line with the following comma-separated fields: tweet id, tweet text, tweet label
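The three-field format above can be read with Python's standard `csv` module, which handles tweet text that itself contains commas as long as such fields are quoted. The sketch below is only an example under stated assumptions: the function names, the `has_header` flag, the quoting, and the sample rows and label strings (`on-topic`, `off-topic`) are hypothetical and should be checked against the actual files.

```python
import csv
import io
from collections import Counter


def read_labeled_tweets(handle, has_header=False):
    """Yield (tweet_id, text, label) tuples from an open CSV file handle."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip a header row, if the file has one
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        tweet_id, text, label = (field.strip() for field in row)
        # Tweet ids are kept as strings so large ids are never altered
        yield tweet_id, text, label


def count_labels(handle, has_header=False):
    """Count how many tweets carry each label."""
    return Counter(label for _, _, label in read_labeled_tweets(handle, has_header))


# Hypothetical sample rows, not taken from the dataset
SAMPLE = (
    '"1001","Flood waters rising, stay safe","on-topic"\n'
    '"1002","Great coffee this morning","off-topic"\n'
    '"1003","Roads closed near the river","on-topic"\n'
)

counts = count_labels(io.StringIO(SAMPLE))
print(counts)
```

To read a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_labeled_tweets`; the encoding is likewise an assumption.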


[Olteanu et al. 2014] Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Sarah Vieweg: "CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises". ICWSM 2014.

For inquiries, please contact Alexandra Olteanu, Carlos Castillo, Fernando Diaz, or Sarah Vieweg.

Version history

  • 2014-10-26: v1.0, initial release containing labeled tweets only.