
Don't Use This Repository - October 2013

Mining the Social Web, 2nd Edition is now available for purchase, and its (much improved) source code is available in a separate repository here on GitHub. All readers and consumers of this source code are encouraged to use the 2nd Edition of the book and its source code.

If you're interested in a copy of the book, the recommended option is to purchase an ebook directly from O'Reilly, because it entitles you to free updates for the (hopefully, very long) life of the book. (If you prefer a paperback or Kindle version from Amazon, that's a fine option as well.)

The original README, including some updates leading up to the release of the 2nd Edition, is retained below. The source code in this repository will be maintained with bug fixes at least through the end of 2013. After that time, however, support will be on a best-effort basis and eventually phased out in favor of the 2nd Edition. Thanks for your interest in this project!


This repository contains the latest, definitive, bug-fixed source code for the examples from Mining the Social Web (1st Edition). This code is available for you to hack on, fork, and discuss regardless of whether you have any interest in the book. Feel free to do whatever you'd like with it. However, if you find this code even mildly interesting, you'll almost certainly benefit from picking up a non-pirated copy of the book. The eBook is available for purchase from its publisher, and Amazon has it in stock; new copies are reasonably priced at around $25 USD.

Important Update - May 2013

Mining the Social Web, 2nd Edition is well underway and currently available for purchase through O'Reilly's Early Access program. Although the source code in this repository and readers of the 1st Edition will continue to receive great support from both me as the author and O'Reilly as the publisher, the future of Mining the Social Web lies with the 2nd Edition, and you are highly encouraged to migrate in that direction or purchase the 2nd Edition as an ebook directly from O'Reilly if you are interested in sharpening your social web mining skills.

The source code for the 2nd Edition is available in a separate GitHub repository. It is significantly simpler, leans heavily on IPython Notebook for a much better user experience, and comes with a turn-key virtual machine that trivializes the installation and configuration issues that so many readers encountered with this 1st Edition. Feel free to ask any questions you might have by reaching out on Facebook or Twitter. Enjoy!

Update - March 2013

Twitter is retiring v1 of its API, on which the examples in this book were based. The print and ebook copies in circulation have not been updated to reflect the new v1.1 Twitter API, but as a stop-gap measure, IPython Notebooks and standard Python exports of those notebooks are now checked in for chapters 1, 4, and 5, which feature Twitter data.

You can view read-only versions of these notebooks if you're interested:

Reach out on the Facebook page or Twitter with any questions or concerns that you may have.

Don't Steal This Book

Let me be the first to tell you that you could find pirated copies of this book online before it was even in stock on Amazon, but I implore you not to go to those places and get it. In the end, everybody really ends up losing if you steal eBooks, because it kills the supply chain at its roots -- stripping the motivation right out of authors and other original/creative content producers who work hard to bring you interesting content that takes a lot of energy to produce. Here are lots of reasons not to steal this book:

  • All of the source code is already being given away right here as a convenient GitHub project. Do whatever you want with it.
  • You can already "Click to Look Inside" at Amazon and explore the content, and surely, lots of reviews are forthcoming on Amazon.
  • You can get a free trial to Safari that allows you to explore the entire book's content without paying a cent.
  • You can download a full quality "free sampler" of the entire first chapter as a DRM-free PDF.
  • There's a Facebook Page where you can connect with friends who "like" the book -- ask them for an honest opinion.
  • You can ask all the questions you want on Twitter at @SocialWebMining, and you will get a helpful response.
  • You can preview approximately 20% of the content with Google Books.
  • The book is available for Kindle on the Amazon product page, so it's available for a deeper discount than the in-print version if you already own a Kindle.

If these reasons don't give you enough information to decide whether spending ~$25 is a good investment, or don't convince you that there are ample alternatives to "casual copying", then please send me a message and tell me what it would take to convince you. I'd love to chat about it and hear your thoughts.

Getting Started

These simple instructions walk you through an exercise in visualizing retweets and make a great starting point. All that's needed is a basic familiarity with Python and easy_install, and you'll be up and running in no time at all.
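The core idea behind a retweet visualization is to turn tweet text into "who retweeted whom" relationships. The sketch below is a minimal illustration of that step, not the book's own script: it assumes old-style retweets marked in the tweet text as "RT @user" or "via @user", and the function names `get_rt_sources` and `retweet_edges` and the `(screen_name, text)` input format are hypothetical choices made for this example.

```python
import re

# Hypothetical pattern: matches "RT" or "via" as a whole word, followed by one
# or more @mentions (e.g. "RT @a @b: ..."). Assumes retweets are marked in the
# tweet text itself rather than through API metadata.
RT_PATTERN = re.compile(r"\b(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)


def get_rt_sources(tweet_text):
    """Return the screen names credited as retweet sources in tweet_text."""
    sources = []
    for _, mentions in RT_PATTERN.findall(tweet_text):
        # Pull each @mention out of the matched run and drop the leading "@".
        sources.extend(m.lstrip("@") for m in re.findall(r"@\w+", mentions))
    return sources


def retweet_edges(tweets):
    """Build (source, retweeter) edges from (screen_name, text) pairs."""
    edges = []
    for screen_name, text in tweets:
        for source in get_rt_sources(text):
            edges.append((source, screen_name))
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", "RT @bob: mining the social web is fun"),
        ("carol", "just a regular tweet"),
    ]
    print(retweet_edges(sample))
```

The resulting edge list could then be loaded into a graph library such as NetworkX and exported to a format that a visualization tool can render.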

Getting Help

Feel free to submit pull requests if you uncover any bugs or come up with improvements you'd like to share with everyone, or file an issue here on GitHub if you need help with something. Feel free to also submit questions to @SocialWebMining on Twitter.


Follow the book's official Twitter account, @SocialWebMining, or check out its Facebook Page to keep up with news, speaking events, webinar announcements, additional content that will be published, occasional book giveaways, and more.

Marketing Description


Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites

Popular social networks such as Facebook, Twitter, and LinkedIn generate a tremendous amount of valuable social data. Who's talking to whom? What are they talking about? How often are they talking? Where are they at? This concise and practical book shows you how to answer these types of questions and more. Each chapter presents a soup-to-nuts approach that combines popular social web data, analysis techniques, and visualization so that you can find the needles you've been looking for as well as some of the ones you didn't even know to look for in the first place.

With Mining the Social Web, intermediate to advanced Python programmers will learn how to collect and analyze social data in a way that lends itself to hacking as well as more industrial-strength analysis. The book is highly readable from cover to cover and tells a coherent story, but chapters of interest could just as easily be cherry-picked if you need to narrow in on a specific topic in a hurry.

  • Get a concise and straightforward synopsis of the social web landscape so you know which 20% of the space to spend 80% of your time on
  • Use easily adaptable scripts hosted on GitHub to harvest data from popular social network APIs including Twitter, Facebook, and LinkedIn
  • Learn how to slice and dice social web data with easy-to-use Python tools as well as apply more advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
  • Build interactive visualizations with easily adaptable web technologies built upon HTML5 and JavaScript toolkits


"A rich, compact, useful practical introduction to a galaxy of tools, techniques and theories for exploring structured and unstructured data" -- Alex Martelli, Senior Staff Engineer, Google; author of Python in a Nutshell

"Data from the social Web is different: networks and text, not tables and numbers, are the rule, and familiar query languages are replaced with rapidly evolving web service APIs. Let Matthew Russell serve as your guide to working with social data sets old (email, blogs) and new (Twitter, LinkedIn, Facebook). Mining the Social Web is a natural successor to Programming Collective Intelligence: a practical, hands-on approach to hacking on data from the social Web with Python." -- Jeff Hammerbacher, Chief Scientist, Cloudera

"Few things will impact us the way automated understanding of human communication by software will in the coming years. This subject is broad and deep. It has been the subject of thousands of papers and hundreds of dissertations. What Matthew has pulled together is something that has really been missing: an applied introduction to a diverse and deep set of technologies and topics that make the knowledge buried in human communication inside the social web accessible. It is the work of a powerful technologist--someone who can equip capable programmers with new tools that are truly valuable.

Read this book. It will open up doors to where software is going in the next decade." -- Tim Estes, Founder and CEO, Digital Reasoning

"Mining the Social Web is a must-read as data is distributed at a dizzying pace. A great primer for API jockeys, social media junkies, and data scientists alike, [Matthew] Russell deftly distills the prodigious opportunity in mining social media data." -- Nick Ducoff, CEO of Infochimps, Inc.

"This is an essential guide to tapping the new generation of online data sources. Russell has done a great job creating an accessible manual for anyone working with social information on the web, covering both how to access it and simple methods for extracting surprising insights from all that raw data." -- Pete Warden, Founder of

"Mining the Social Web is now my go-to book for any project that involves analyzing social data. It contains a multitude of useful examples and is highly recommended for any data mining project you’re considering. Great for beginners and advanced readers alike." -- Abe Music, Principal, Zaffra

"This book is clearly a labor of love for the author. He has deftly woven together the use of classic text and graph mining libraries with current social media applications. Examples are concrete and concise while providing useful insights that facilitate future development and exploration by the reader. This text is a great primer for those just beginning their forays into extracting understanding from social networks, and also for advanced researchers needing access to the latest social media APIs." -- Chris Augeri, Senior Research Fellow, University of Nebraska

"This is a phenomenal book for anyone wanting to get started mining social data. It is well-researched and provides plenty of examples to get one going from the very first chapter. It is also very easy to follow and a real pleasure to read. This book is my first recommendation for anyone interested in the mining, analysis, and visualization of data from the social web." -- Jeffrey Humphries, PhD; Computer Scientist

"Mining the Social Web is a great resource on how to get the most out of the Twitter API." -- Raffi Krikorian, Platform Services group, Twitter

"Matthew covers an interesting and eclectic group of data sources, analysis techniques, data management tools, and visualizations that provide a thorough survey of the latest thinking on how to gain insight from the social web. His examples are vivid and serve as great starting points for further exploration. Matthew clearly cares that the reader understands the material; the book is chock full of timely, knowing, and truly helpful hints and advice. Mining the Social Web has me excited to dive further into this rich area of analysis." -- Roger Magoulas, Director of Market Research, O’Reilly Media


The official online compendium for Mining the Social Web (O'Reilly, 2011)