NewsIR 16 Data:
Additional external data from September 2015, supplementing the Signal Media One-Million News Articles Dataset used in the NewsIR 16 workshop at ECIR.
Original Dataset: http://research.signalmedia.co/newsir16/signal-dataset.html
To fetch only part of the data, run the notebook cells for the sources you're interested in.
To fetch everything, run download.py (this will take a while; the full set is ~200 GB).
Available data sources:
- Tweets: Public Stream (1% Sample)
- Tweets: Curated Stream (Tweets from ~30,000 newsworthy accounts)
- Wikipedia Current Events Portal
- DBPedia Events
- WikiLiveMon & MediaGalleries
- OEDA: Phoenix Data Project
- Google Trends
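
Since the full download is around 200 GB, it may be worth checking free disk space before starting. The following is a minimal sketch, not part of this repository: the helper names `has_enough_space` and `run_full_download` are hypothetical, and it assumes download.py sits in the current directory and takes no arguments.

```python
import shutil
import subprocess
import sys

REQUIRED_GB = 200  # approximate total size stated in this README


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


def run_full_download():
    """Run download.py only if there appears to be enough free space."""
    if not has_enough_space():
        print("Less than ~200 GB free; consider using the notebook cells instead.")
        return False
    # Assumption: download.py needs no command-line arguments.
    subprocess.run([sys.executable, "download.py"], check=True)
    return True
```

Calling `run_full_download()` from the repository root starts the download only when the space check passes. If it does not, the notebook cells remain the way to fetch individual sources.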