mattmurray committed Aug 29, 2017
1 parent 14b4f21 commit 498248c3f87f55172ad310f194b17d354d1c488c
Showing with 2 additions and 0 deletions.
@@ -4,6 +4,8 @@
This repo contains code for pre-processing and vectorizing raw text collected from 85,000 news articles downloaded from a variety of online broadsheet newspapers and newswires covering finance, business and the economy.
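The README does not say which vectorization scheme the repo uses. As a minimal sketch, assuming a plain bag-of-words count representation (an assumption, not necessarily the author's method), documents that have already been tokenized and pre-processed can be turned into count vectors as follows. `build_vocabulary` and `count_vectors` are hypothetical helper names.

```python
from __future__ import annotations

from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token a column index, in alphabetical order."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}


def count_vectors(docs: list[list[str]], vocab: dict[str, int]) -> list[list[int]]:
    """Turn each token list into a dense bag-of-words count vector."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            if tok in vocab:  # tokens not seen when building the vocabulary are ignored
                row[vocab[tok]] = n
        rows.append(row)
    return rows


docs = [["bank", "rate"], ["rate", "rate", "inflat"]]
vocab = build_vocabulary(docs)      # {"bank": 0, "inflat": 1, "rate": 2}
vectors = count_vectors(docs, vocab)  # [[1, 0, 1], [0, 1, 2]]
```

Dense Python lists keep the idea visible, but at the scale of 85,000 articles a sparse representation would be more practical; scikit-learn's `CountVectorizer`, for example, returns a SciPy sparse matrix.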
A detailed blog post can be found at
![Article counts by year](articles_by_year.png "Article counts by year")
The text was pre-processed by removing stop words, punctuation and numbers, and the remaining words were stemmed with the Snowball stemmer.
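The pre-processing steps above can be sketched as a single function. This is a minimal illustration, not the repo's actual code: the tiny `STOP_WORDS` set is a hypothetical stand-in for a full English list, `preprocess` and `default_stemmer` are hypothetical names, and stemming uses NLTK's `SnowballStemmer("english")` when NLTK is installed (falling back to no stemming otherwise).

```python
from __future__ import annotations

import string
from functools import lru_cache
from typing import Callable, Iterable, Optional

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real run would use a full English list (e.g. NLTK's stopwords corpus).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "that", "the", "to", "was", "were", "with",
})

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


@lru_cache(maxsize=None)
def default_stemmer() -> Callable[[str], str]:
    """Return NLTK's English Snowball stemmer, or a no-op if NLTK is missing."""
    try:
        from nltk.stem.snowball import SnowballStemmer
    except ImportError:
        return lambda word: word  # fallback: leave words unstemmed
    return SnowballStemmer("english").stem


def preprocess(text: str,
               stem: Optional[Callable[[str], str]] = None,
               stop_words: Iterable[str] = STOP_WORDS) -> list[str]:
    """Lower-case, drop punctuation, numbers and stop words, then stem each token."""
    stem = stem or default_stemmer()
    stop = frozenset(stop_words)
    tokens = []
    for raw in text.lower().split():
        word = raw.translate(_STRIP_PUNCT)  # remove punctuation
        if not word or any(ch.isdigit() for ch in word):
            continue  # skip empty tokens and anything containing a digit
        if word in stop:
            continue  # skip stop words (checked before stemming)
        tokens.append(stem(word))
    return tokens
```

Stop words are filtered before stemming so that the stop-word list can be written as ordinary surface forms. Passing `stem=lambda w: w` disables stemming, which is handy for inspecting what the filtering steps alone keep.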
