External Data for Signal Media One-Million News Articles Dataset used in NewsIR 16 ECIR Workshop


NewsIR 16 Data:

Additional external data from September 2015, supplementing the Signal Media One-Million News Articles Dataset used in the NewsIR 16 ECIR Workshop.

Original Dataset: http://research.signalmedia.co/newsir16/signal-dataset.html

Work in progress.

Downloading:

To download only specific sources, run the corresponding cells in download.ipynb.

To download everything, run download.py. This takes a while: the full set is roughly 200 GB.
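Because the full download is large, it may be worth checking free disk space before running download.py. The snippet below is a minimal sketch and not part of this repository: the helper `has_room_for_full_download` is hypothetical, and it assumes the ~200 GB figure means decimal gigabytes. It uses only the standard library's `shutil.disk_usage`.

```python
import shutil

# Approximate size of the full download as stated in this README (~200 GB).
# Assumption: decimal gigabytes (10**9 bytes each).
FULL_DOWNLOAD_BYTES = 200 * 10**9


def has_room_for_full_download(path: str = ".", required_bytes: int = FULL_DOWNLOAD_BYTES) -> bool:
    """Hypothetical helper: True if the filesystem holding `path` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    if has_room_for_full_download("."):
        print("Enough free space for the full download.")
    else:
        print("Less than ~200 GB free; consider downloading individual sources via the notebook.")
```

If the check fails, running only the notebook cells for the sources you need is the lighter alternative.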

External Data:

  • Tweets: Public Stream (1% Sample)
  • Tweets: Curated Stream (Tweets from ~30,000 newsworthy accounts)
  • Wikipedia Current Events Portal
  • DBpedia Events
  • WikiLiveMon & MediaGalleries
  • ICEWS
  • GDELT
  • OEDA: Phoenix Data Project
  • Google Trends
  • Reddit
  • Reportedly