Text-analysis files on Ferguson grand jury documents
The documents were compiled by @MitchFraas.
Download the source docs.
Mitch also loaded them into Voyant Tools.
I haven't interpreted the results yet, nor tried to visualize them; I'm just providing my R script and my initial output files for DH folks, data journalists, and others to explore for themselves. The R script does include a piece for making word clouds for the various topics, where the size of each word corresponds roughly to the importance of that word in the topic. For more on MALLET and visualizing the results, see the Journal of Digital Humanities and the work of Ted Underwood, Matt Jockers, Andrew Goldstone, etc.
As you look at the topic labels file, you'll see 'october', 'november', 'september', and similar terms (e.g. roman numerals, or words that appear in the header/footer of every page) that should really be added to the stoplist file, after which the analysis should be re-run. The stoplist currently in use is the default MALLET stoplist.
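The stoplist extension described above can be sketched as follows. This is a minimal Python illustration, not the R script in this repo: the names `EXTRA_STOPWORDS`, `extend_stoplist`, `is_roman_numeral`, and `remove_stopwords` are hypothetical, and the extra words are just the examples mentioned above. Note that the roman-numeral pattern also matches real words such as "mix" or "I", so it may be too aggressive for some uses.

```python
import re

# Hypothetical additions, taken from the examples mentioned in the text.
EXTRA_STOPWORDS = {"september", "october", "november"}

# Matches well-formed Roman numerals (i, iv, xii, ...), case-insensitively.
# The lookahead rejects the empty string. Caveat: also matches words like "mix".
ROMAN_RE = re.compile(
    r"^(?=[ivxlcdm])m*(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$",
    re.IGNORECASE,
)


def is_roman_numeral(token):
    """Return True if the token looks like a Roman numeral."""
    return bool(ROMAN_RE.match(token))


def extend_stoplist(base_words, extra_words=EXTRA_STOPWORDS):
    """Return a sorted, de-duplicated, lowercased stoplist with extras added."""
    base = {w.strip().lower() for w in base_words if w.strip()}
    extra = {w.strip().lower() for w in extra_words if w.strip()}
    return sorted(base | extra)


def remove_stopwords(tokens, stoplist):
    """Drop tokens that are in the stoplist or that look like Roman numerals."""
    stop = set(stoplist)
    return [
        t for t in tokens
        if t.lower() not in stop and not is_roman_numeral(t)
    ]
```

The extended list could then be written back out (one word per line is assumed here to be the format the stoplist file uses) before re-running the topic model.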
Further analysis might look at which documents or which topics correlate with one another (the easiest way to do this is with Excel's CORREL function). Or one could look at the correlations of words within the documents. Is Brown always described the same way by witnesses? Do certain topics/discourses associate with Brown more than with Wilson (and vice versa)? One could also do sentiment analysis to see how Brown and Wilson are portrayed by the witnesses, the prosecutor, etc.
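For anyone who would rather not use a spreadsheet, the topic-to-topic correlation idea can be sketched in plain Python. This is only an illustration under assumptions: it presumes the document-topic output has already been loaded as a list of rows (one per document, one proportion per topic), and the function names `pearson` and `topic_correlations` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def topic_correlations(doc_topics):
    """Correlate every pair of topics across documents.

    doc_topics: list of rows, one per document, each a list of
    topic proportions. Returns a k x k matrix (list of lists).
    """
    columns = list(zip(*doc_topics))  # one tuple of values per topic
    k = len(columns)
    return [
        [pearson(columns[i], columns[j]) for j in range(k)]
        for i in range(k)
    ]
```

To correlate documents rather than topics, the same function can be applied to the transposed matrix, e.g. `topic_correlations(list(zip(*doc_topics)))`.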
- Update 8.30 Nov 26: I compiled all of the text into a single file, then broke it apart into 1000-line chunks. I tried to clean up the text a bit to remove some of the extraneous info (title pages, etc.), but it was very rough-and-ready. Some 'goreperry.com' etc. will have crept in. At any rate, the output files have now been updated, and I've also uploaded the source files as well as my trimmed files, so other folks can run their own analyses.
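The chunking step in the update above could look something like the following. This is a rough Python sketch rather than the procedure actually used: the function names, the `chunk` file prefix, and the zero-padded file naming are all assumptions made for illustration.

```python
from pathlib import Path


def chunk_lines(lines, chunk_size=1000):
    """Split a list of lines into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def write_chunks(source_path, out_dir, chunk_size=1000, prefix="chunk"):
    """Write each chunk of the combined text file to its own numbered file.

    File names (e.g. chunk_0001.txt) are a hypothetical convention.
    """
    text = Path(source_path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines(keepends=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunk_lines(lines, chunk_size), start=1):
        path = out / f"{prefix}_{n:04d}.txt"
        path.write_text("".join(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

The last chunk will simply be shorter than the others whenever the total line count is not a multiple of the chunk size.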