Data
fcrimins edited this page May 18, 2017 · 12 revisions
Google's One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling (3/24/17)
- per here
NLTK has its own datasets (3/20/17)
- downloaded to ~/nltk_data
- But what are the bestsellers from 1916 with sales normalized by year after publication?
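To see which NLTK datasets are already present in the ~/nltk_data directory noted above, something like the following could work. It is a minimal sketch using only the standard library; the helper name `list_nltk_packages` is hypothetical, and it assumes the usual layout in which each category folder (for example `corpora` or `tokenizers`) holds one entry per package, possibly alongside the `.zip` it was unpacked from.

```python
from pathlib import Path


def list_nltk_packages(root=None):
    """Return {category: [package names]} for an NLTK data directory.

    `root` defaults to ~/nltk_data (assumed location, as in the note above).
    A missing directory yields an empty dict rather than an error.
    """
    root = Path(root) if root is not None else Path.home() / "nltk_data"
    if not root.is_dir():
        return {}

    packages = {}
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        names = set()
        for entry in category.iterdir():
            # A package may appear as both "brown" and "brown.zip";
            # strip the archive suffix so it is only counted once.
            names.add(entry.stem if entry.suffix == ".zip" else entry.name)
        packages[category.name] = sorted(names)
    return packages


if __name__ == "__main__":
    for category, names in list_nltk_packages().items():
        print(f"{category}: {', '.join(names)}")
```

Run as a script, it prints one line per category; passing a different `root` lets it inspect a data directory stored elsewhere.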
Ratings datasets are figuratively just lying around the web these days, begging for someone to take notice and analyze them.
- Movie reviews from the Netflix Prize dataset
- Business reviews from the Yelp Academic Dataset, as summarized here
- Amazon book reviews from the Multi-domain Sentiment Dataset
- News ratings dataset from Reddit
41 Machine Learning Interview Questions (1/30/17)
- 19 Free Public Data Sets For Your First Data Science Project
- "check out Quandl for economic and financial data, and Kaggle’s Datasets collection for another great list."