Collection of scripts to acquire the dataset of emerging entities presented in: Graus, Odijk, de Rijke, The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

graus/emerging-entities-timeseries

This is a collection of scripts that accompany our paper: "The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams". These scripts enable you to recreate the "FAKBAT" dataset from the paper (i.e., FAKBA1 with "entity age timestamps").

My apologies for the current state and lack of documentation of these scripts; they are currently very 'academic.' However, I did run through them to clean them up a bit, and the process shouldn't be too complex to follow. I'll likely do a clean-up soon.

If you use the dataset, please cite:

@article {ASI:ASI24004,
author = {Graus, David and Odijk, Daan and de Rijke, Maarten},
title = {The birth of collective memories: Analyzing emerging entities in text streams},
journal = {Journal of the Association for Information Science and Technology},
issn = {2330-1643},
url = {http://dx.doi.org/10.1002/asi.24004},
doi = {10.1002/asi.24004},
year={2018}
}

Requirements

  • Python 2.7

Libraries/packages

  • dateutil

Data

Pipeline

  1. ./0_parse_FAKBA.py <location of FAKBA1 files>: Do a single pass through FAKBA1 and collect (and count) all Freebase MIDs it contains;
  2. ./1_Wiki2Freebase.py: Generate mappings from Wikipedia IDs to Freebase MIDs (filtered with the MIDs yielded by the previous step);
  3. ./2_extract_WikiTimestamps.py: Pass through Wikipedia Meta Stub file, output Freebase MID to timestamp-mappings;
  4. ./3_create_fakbat.py: Pass through FAKBA1 file-by-file, add entity 'age' column.
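
To illustrate what the final step amounts to, here is a minimal sketch of adding an 'age' column. It is not the code from 3_create_fakbat.py: the tab-separated row layout, the column positions, the ISO-style timestamp format, and the names parse_ts, entity_age_seconds and add_age_column are all assumptions made for illustration. It uses only the standard library, so it does not depend on dateutil.

```python
from datetime import datetime

# Assumed timestamp format; the real files may use a different one.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(value):
    """Parse a timestamp string in the assumed format."""
    return datetime.strptime(value, TS_FORMAT)


def entity_age_seconds(mention_ts, creation_ts):
    """Seconds between an entity's creation timestamp and a mention of it.

    A negative value means the mention precedes the creation timestamp.
    """
    return (parse_ts(mention_ts) - parse_ts(creation_ts)).total_seconds()


def add_age_column(rows, mid_to_created, mid_col=0, ts_col=1):
    """Yield each tab-separated row with an extra 'age' field appended.

    rows: iterable of lines (hypothetical layout: MID, mention timestamp, ...).
    mid_to_created: dict mapping Freebase MID to creation timestamp string.
    Rows whose MID has no known timestamp get an empty age field.
    """
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        created = mid_to_created.get(fields[mid_col])
        if created is None:
            age = ""
        else:
            age = str(int(entity_age_seconds(fields[ts_col], created)))
        yield "\t".join(fields + [age])
```

In this sketch the mapping passed as mid_to_created plays the role of the MID-to-timestamp output of step 3, and unknown MIDs are kept rather than dropped so that no rows are lost.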

TODOs

A separate first pass through FAKBA1 just to get counts is not strictly necessary. The processing is also easy to parallelize.
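
One way the counting could be parallelized, sketched under assumptions: if FAKBA1 is split over several files, MID counts can be computed per file and merged afterwards, since the counts simply add up. The functions count_mids and merge_counts, and the tab-separated layout with the MID in the first column, are hypothetical and not taken from the scripts.

```python
from collections import Counter


def count_mids(lines, mid_col=0):
    """Count Freebase MIDs in one file's lines (assumed tab-separated)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or short lines that have no MID field.
        if len(fields) > mid_col and fields[mid_col]:
            counts[fields[mid_col]] += 1
    return counts


def merge_counts(partials):
    """Sum per-file counters into one overall counter."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each call to count_mids only looks at its own file, the per-file calls could be handed to a worker pool (for example multiprocessing.Pool.map) and the results passed to merge_counts.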
