The official online compendium for Mining the Social Web (O'Reilly, 2011)


This repository contains the latest, definitive, bug-fixed source code for the examples from Mining the Social Web. This code is available for you to hack on, fork, and discuss regardless of whether or not you have any interest in the book. Feel free to do whatever you'd like with it. However, if you find this code even mildly interesting, you'll almost certainly benefit from picking up a non-pirated copy of the book. As of late January 2011, the first edition of the book is complete. The eBook is already available for purchase, and Amazon has it in stock; new copies are reasonably priced at just under $25 USD.

In addition to Mining the Social Web, you might also enjoy two companion products: Matthew Russell on Mining the Social Web, a spin-off video series that talks through much of the Twitter-related content in the book, and 21 Recipes for Mining Twitter, a collection of useful starting points that you can piece together and adapt to solve lots of Twitter-related data mining problems.

Don't Steal This Book

Let me be the first to tell you that you could find pirated copies of this book online before it was even in stock on Amazon, but I implore you not to go to those places and get it. In the end, everybody really ends up losing if you steal eBooks, because it kills the supply chain at its roots -- stripping the motivation right out of authors and other original/creative content producers who work hard to bring you interesting content that takes a lot of energy to produce. Here are lots of reasons not to steal this book:

  • All of the source code is already being given away right here as a convenient GitHub project. Do whatever you want with it.
  • You can already "Click to Look Inside" at Amazon and explore the content, and surely, lots of reviews are forthcoming on Amazon.
  • You can get a free trial of Safari that allows you to explore the entire book's content without paying a cent.
  • You can download a full quality "free sampler" of the entire first chapter as a DRM-free PDF.
  • There's a Facebook Page where you can connect with friends who "like" the book -- ask them for an honest opinion.
  • You can ask all the questions you want on Twitter at @SocialWebMining, and you will get a helpful response.
  • You can preview approximately 20% of the content with Google Books.

If these reasons don't give you enough information to decide whether spending ~$25 is a good investment, or don't convince you that there are ample alternatives to "casual copying", then please send me a message and tell me what it would take to convince you. I'd love to chat about it and hear your thoughts.

It would be great if there were one more reason: the book being available for Kindle. If you feel the same way, please consider clicking the "I’d like to read this book on Kindle" link on the Amazon product page.

Getting Started

These simple instructions walk you through an exercise in visualizing retweets and make a great starting point. All that's needed is a basic familiarity with Python and easy_install, and you'll be up and running in no time at all.
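The retweet exercise boils down to pulling "who retweeted whom" out of tweet text. The following is a minimal sketch of that step, not the code from the book or this repository. It assumes each tweet is a dict with hypothetical `from_user` and `text` keys (loosely modeled on older Twitter Search API results), and it relies on the informal "RT @user" and "via @user" conventions, so it only recognizes retweets that are credited in the text itself.

```python
import re
from collections import Counter

# Matches the informal "RT @user" / "via @user" conventions, including chains
# such as "RT @alice RT @bob". The leading \b keeps words like "art @bob"
# from being mistaken for a retweet.
RT_PATTERN = re.compile(r"\b(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)


def get_rt_sources(text):
    """Return the (lowercased) screen names credited as retweet sources."""
    sources = []
    for _, mentions in RT_PATTERN.findall(text):
        sources.extend(name.lower() for name in re.findall(r"@(\w+)", mentions))
    return sources


def retweet_edges(tweets):
    """Count (source, retweeter) edges.

    Each tweet is assumed to be a dict with the hypothetical keys
    'from_user' (who posted it) and 'text' (the tweet body).
    """
    edges = Counter()
    for tweet in tweets:
        retweeter = tweet["from_user"].lower()
        for source in get_rt_sources(tweet["text"]):
            edges[(source, retweeter)] += 1
    return edges


if __name__ == "__main__":
    sample = [
        {"from_user": "carol", "text": "RT @Alice: great post"},
        {"from_user": "dave", "text": "RT @alice RT @bob worth a read"},
    ]
    for (source, retweeter), count in retweet_edges(sample).items():
        print(source, "->", retweeter, count)
```

The resulting (source, retweeter) counts form a weighted edge list, which you could load into a graph library such as NetworkX and hand off to a visualization tool.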

Getting Help

Feel free to submit pull requests if you uncover any bugs or come up with improvements you'd like to share with everyone, or file an issue here on GitHub if you need help with something. You can also send questions to @SocialWebMining on Twitter.


Follow the book's official Twitter account, @SocialWebMining, or check out its Facebook Page to keep up with news, speaking events, webinar announcements, additional content as it is published, occasional book giveaways, and more.

Marketing Description


Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites

Popular social networks such as Facebook, Twitter, and LinkedIn generate a tremendous amount of valuable social data. Who's talking to whom? What are they talking about? How often are they talking? Where are they? This concise and practical book shows you how to answer these types of questions and more. Each chapter presents a soup-to-nuts approach that combines popular social web data, analysis techniques, and visualization so that you can find the needles you've been looking for as well as some of the ones you didn't even know to look for in the first place.

With Mining the Social Web, intermediate to advanced Python programmers will learn how to collect and analyze social data in a way that lends itself to hacking as well as more industrial-strength analysis. The book is highly readable from cover to cover and tells a coherent story, but you can just as easily cherry-pick chapters of interest if you need to narrow in on a specific topic in a hurry.

  • Get a concise and straightforward synopsis of the social web landscape so you know which 20% of the space to spend 80% of your time on
  • Use easily adaptable scripts hosted on GitHub to harvest data from popular social network APIs including Twitter, Facebook, and LinkedIn
  • Learn how to slice and dice social web data with easy-to-use Python tools, and apply more advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
  • Build interactive visualizations with easily adaptable web technologies built upon HTML5 and JavaScript toolkits
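To give a flavor of two of the techniques named above, here is a minimal, dependency-free sketch of TF-IDF weighting and cosine similarity. It is an illustration under simplifying assumptions, not the book's implementation: tokenization is a naive lowercase whitespace split, and it uses one common TF-IDF variant (term frequency normalized by document length, multiplied by log(N / df)). Libraries, and the book itself, may weight terms differently.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency normalized by document length, times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = ["mining social data", "mining social graphs", "baking bread at home"]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]), cosine_similarity(vecs[0], vecs[2]))
```

Documents that share distinctive terms score closer to 1.0, while documents with no terms in common score 0.0. A term that appears in every document gets an IDF of log(1) = 0 and so contributes nothing to the comparison.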


"A rich, compact, useful practical introduction to a galaxy of tools, techniques and theories for exploring structured and unstructured data" -- Alex Martelli, Senior Staff Engineer, Google; author of Python in a Nutshell

"Data from the social Web is different: networks and text, not tables and numbers, are the rule, and familiar query languages are replaced with rapidly evolving web service APIs. Let Matthew Russell serve as your guide to working with social data sets old (email, blogs) and new (Twitter, LinkedIn, Facebook). Mining the Social Web is a natural successor to Programming Collective Intelligence: a practical, hands-on approach to hacking on data from the social Web with Python." -- Jeff Hammerbacher, Chief Scientist, Cloudera

"Few things will impact us the way automated understanding of human communication by software will in the coming years. This subject is broad and deep. It has been the subject of thousands of papers and hundreds of dissertations. What Matthew has pulled together is something that has really been missing: an applied introduction to a diverse and deep set of technologies and topics that make the knowledge buried in human communication inside the social web accessible. It is the work of a powerful technologist--someone who can equip capable programmers with new tools that are truly valuable.

Read this book. It will open up doors to where software is going in the next decade." -- Tim Estes, Founder and CEO, Digital Reasoning

"Mining the Social Web is a must-read as data is distributed at a dizzying pace. A great primer for API jockeys, social media junkies, and data scientists alike, [Matthew] Russell deftly distills the prodigious opportunity in mining social media data." -- Nick Ducoff, CEO of Infochimps, Inc.

"This is an essential guide to tapping the new generation of online data sources. Russell has done a great job creating an accessible manual for anyone working with social information on the web, covering both how to access it and simple methods for extracting surprising insights from all that raw data." -- Pete Warden, Founder of

"Mining the Social Web is now my go-to book for any project that involves analyzing social data. It contains a multitude of useful examples and is highly recommended for any data mining project you’re considering. Great for beginners and advanced readers alike." -- Abe Music, Principal, Zaffra

"This book is clearly a labor of love for the author. He has deftly woven together the use of classic text and graph mining libraries with current social media applications. Examples are concrete and concise while providing useful insights that facilitate future development and exploration by the reader. This text is a great primer for those just beginning their forays into extracting understanding from social networks, and also for advanced researchers needing access to the latest social media APIs." -- Chris Augeri, Senior Research Fellow, University of Nebraska

"This is a phenomenal book for anyone wanting to get started mining social data. It is well-researched and provides plenty of examples to get one going from the very first chapter. It is also very easy to follow and a real pleasure to read. This book is my first recommendation for anyone interested in the mining, analysis, and visualization of data from the social web." -- Jeffrey Humphries, PhD; Computer Scientist

"Mining the Social Web is a great resource on how to get the most out of the Twitter API." -- Raffi Krikorian, Platform Services group, Twitter

"Matthew covers an interesting and eclectic group of data sources, analysis techniques, data management tools, and visualizations that provide a thorough survey of the latest thinking on how to gain insight from the social web. His examples are vivid and serve as great starting points for further exploration. Matthew clearly cares that the reader understands the material; the book is chock full of timely, knowing, and truly helpful hints and advice. Mining the Social Web has me excited to dive further into this rich area of analysis." -- Roger Magoulas, Director of Market Research, O’Reilly Media