cf38845 Oct 25, 2013
@ptwobrussell @jwsy

Mining the Social Web (2nd Edition)


Mining the Social Web, 2nd Edition is available through O'Reilly Media, Amazon, and other fine book retailers. Purchasing the ebook directly from O'Reilly offers a number of great benefits, including a variety of digital formats and continual updates to the text of the book for life! Better yet, if you choose to use O'Reilly's Dropbox or Google Drive synchronization, your ebooks will update automatically whenever a new version is released. In other words, you'll always have the latest version of the book if you purchase the ebook through O'Reilly, which is why it's the recommended option over a paper copy or another electronic version. (If you prefer a paperback or Kindle version from Amazon, that's a fine option as well.)

This second edition of the book comes with a turn-key virtual machine that provides you with a powerful social web mining toolbox. The toolbox lets you explore and run all of the source code in a hassle-free manner. All you have to do is follow a few simple steps to install the virtual machine, and you'll be running the example code in as little as 20-30 minutes. (And by the way, most of that time is spent waiting for files to download.)

This short screencast demonstrates the steps involved in installing the virtual machine, which installs every single dependency for you automatically and saves you a lot of time. Even sophisticated power users tend to prefer it to their own environments.

If you experience any problems at all with installation of the virtual machine, file an issue here on GitHub. Be sure to also follow @SocialWebMining on Twitter and like the book's page on Facebook.

Be sure to also visit the companion blog for additional content, news, and updates about the book and the code in this GitHub repository.

Preview the Full-Text of Chapter 1 (Mining Twitter)

Chapter 1 of the book provides a gentle introduction to hacking on Twitter data. It's available in a variety of convenient formats.

Choose one, or choose them all. There's no better way to get started than following along with the opening chapter.

Preview the IPython Notebooks

This edition of Mining the Social Web extensively uses IPython Notebook to facilitate the learning and development process. If you're interested in what the example code for any particular chapter does, the best way to preview it is with the links below. When you're ready to develop, pull the source for this GitHub repository and follow the instructions for installing the virtual machine to get started.

A bundle of all of these links is also available:

Blog & Screencasts

Be sure to bookmark the Mining the Social Web Vimeo Channel to stay up to date with short instructional videos that demonstrate how to use the tools in this repository. More screencasts are being added all the time, so check back often -- or better yet, subscribe to the channel.

Installing the Virtual Machine
A ~3 minute screencast on installing a powerful toolbox for social web mining.
View a collection of all available screencasts on the Mining the Social Web Vimeo Channel.

You might also benefit from the content that is regularly added to the companion blog.

The Mining the Social Web Virtual Machine

You may enjoy this short screencast that demonstrates the step-by-step instructions involved in installing the book's virtual machine.

The code for Mining the Social Web is organized by chapter in IPython Notebook format so that you can follow along with the examples interactively. Unfortunately, some of the Python dependencies for the example code can be a little tricky to install and configure, so the book provides a completely turn-key virtual machine to make your reading experience as simple and enjoyable as possible. Even if you are a seasoned developer, you may still find value in using this virtual machine to get started and save yourself some time. The virtual machine is powered by Vagrant, an amazing development tool that you'll probably want to know about and that arguably makes working with virtualization even easier than using a native VirtualBox or VMware image.

Quick Start Guide

The recommended way of getting started with the example code is to take advantage of the Vagrant-powered virtual machine as illustrated in this short screencast. After all, you're more interested in following along and learning from the examples than in installing and managing all of the system dependencies just to get to that point, right?

Appendix A - Virtual Machine Experience provides clear step-by-step instructions for installing the virtual machine and is intended to serve as a quick start guide.
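
Before working through Appendix A, it can help to confirm that the tools the virtual machine relies on are reachable from your shell. The following is a minimal sketch and is not part of the book's code: the helper name `missing_tools` is hypothetical, and the executable names `vagrant` and `VBoxManage` assume a standard Vagrant install with VirtualBox as the provider, which may not match your setup.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them for your provider and platform.
    required = ["vagrant", "VBoxManage"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed prerequisites were found.")
```

If anything is reported as missing, install it first and then return to the step-by-step instructions in Appendix A.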

The Mining the Social Web Wiki

This project takes advantage of its GitHub repository's wiki to act as a point of collaboration for consumers of the source code. Feel free to use the wiki however you'd like to share your experiences, and create additional pages as needed to curate additional information.

One of the more important wiki pages that you may want to bookmark is the Advisories page, which is an archive of notes about particularly disruptive commits or other changes that may affect you.

Another page of interest is a listing of all 100+ numbered examples from the book, each conveniently hyperlinked to a read-only version of the corresponding IPython Notebook.

"Premium Support"

The source code in this repository is free for you to use however you'd like. If you'd like to undertake a more rigorous study of social web mining, much like you would by following along with a textbook in a classroom, consider picking up a copy of Mining the Social Web and following along. Think of the book as offering a form of "premium support" for this open source project.

The publisher's description of the book follows for your convenience:

How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.

  • Employ IPython Notebook, the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites
  • Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data
  • Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects
  • Build interactive visualizations with D3.js, a state-of-the-art HTML5 and JavaScript toolkit
  • Take advantage of more than two-dozen Twitter recipes presented in O’Reilly’s popular and well-known cookbook format

The example code for this data science book is maintained in a public GitHub repository and is designed to be especially accessible through a turn-key virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
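
As a small taste of the TF-IDF technique named in the description above, here is a minimal pure-Python sketch. It is not taken from the book's notebooks: the function name `tf_idf`, the toy corpus, and the particular weighting (relative term frequency multiplied by the natural log of total documents over documents containing the term) are assumptions chosen for illustration, and other TF-IDF variants use different smoothing or normalization.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Score `term` in `doc` (a list of tokens) against `corpus` (a list of token lists)."""
    if not doc:
        return 0.0
    # Term frequency: how often the term occurs, relative to document length.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: how many documents in the corpus contain the term.
    docs_with_term = sum(1 for d in corpus if term in d)
    if docs_with_term == 0:
        return 0.0
    # Inverse document frequency: terms found in fewer documents score higher.
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "mining the social web".split(),
    "the social network".split(),
    "the web of data".split(),
]

for term in ["mining", "social", "the"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

In this toy corpus, "the" appears in every document, so its inverse document frequency is zero and it contributes nothing to the score, while "mining" appears in only one document and scores highest of the three terms.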