Python 3.x code to create networks from Twitter data collected via Martin Hawksey's TAGS (Twitter Archiving Google Sheet)

This code supplements what was originally a manual workflow for collecting Twitter data and transforming it into a format suitable for import into network analysis software (specifically, Gephi). TwitterAnalyticsWorkflow.doc is a Word document explaining the original manual workflow, including setting up TAGS for data collection.

The majority of this code is written for TAGS 6.0, including:

  • A script that automates the transition from a TAGS archive sheet to a csv of edges in source/target format.
  • A variant of that script that also includes the hashtag as a separate column. This is useful if you want to combine multiple Twitter networks but keep track of which hashtag brought each tweet/node into the network. N.B. If the same tweet contains multiple hashtags, you may accidentally duplicate that tweet in your final network.
  • A script that combines two of the scripts in this repository. It produces a csv of edges in source/target format as an intermediate step.
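
The transition from archive sheet to edge list can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: it assumes the archive has been exported as a csv, that the tweet author is in column index 1, that the field at index 16 holds a JSON string with a "user_mentions" list, and that an edge runs from the author to each mentioned account. The function names are hypothetical.

```python
import csv
import io
import json


def tags_rows_to_edges(rows, user_col=1, entities_col=16):
    """Yield (source, target) pairs: tweet author -> each mentioned account.

    Assumptions (not confirmed by the README): user_col holds the author's
    screen name and entities_col holds a JSON string with "user_mentions".
    """
    for row in rows:
        if len(row) <= entities_col:
            continue  # short or blank row
        try:
            entities = json.loads(row[entities_col])
        except (ValueError, TypeError):
            continue  # header row or a cell that is not JSON
        if not isinstance(entities, dict):
            continue
        for mention in entities.get("user_mentions", []):
            target = mention.get("screen_name")
            if target:
                yield (row[user_col], target)


def write_edge_csv(edges, out_file):
    """Write edges with a Source/Target header row, as Gephi expects."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


if __name__ == "__main__":
    # One synthetic row with 17 columns; only indexes 1 and 16 matter here.
    row = ["1", "alice", "hi @bob"] + [""] * 13 + [
        '{"user_mentions": [{"screen_name": "bob"}]}'
    ]
    buffer = io.StringIO()
    write_edge_csv(tags_rows_to_edges([row]), buffer)
    print(buffer.getvalue())
```

If the wrong column index is used, every row fails to parse and only the header row is written, which matches the blank edge list symptom described for TAGS 6.1.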

Additional code that is not specific to TAGS 6.0 includes:

  • A script that deduplicates a spreadsheet based on the first column (Twitter's unique ID for each tweet). Use it if you ended up with duplicate tweets in your dataset for any reason.
  • A script that calculates some basic network metrics on a source/target list.
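
As a rough sketch of these two steps, the snippet below deduplicates rows on their first column and computes a few simple metrics from a source/target list. It uses only the standard library and is not the repository's implementation. The function names, the choice of metrics (node count, edge count, in/out degree, directed density) and the decision to ignore self-loops in the density are assumptions made for illustration.

```python
from collections import Counter


def dedupe_by_first_column(rows):
    """Keep the first row seen for each value in column 0 (the tweet ID)."""
    seen = set()
    unique = []
    for row in rows:
        if not row:
            continue  # skip empty rows
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique


def basic_metrics(edges):
    """Compute simple metrics for a directed (source, target) edge list."""
    edges = list(edges)
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    nodes = set(out_degree) | set(in_degree)
    n = len(nodes)
    # Density counts each distinct directed pair once and ignores self-loops.
    distinct = {(s, t) for s, t in edges if s != t}
    density = len(distinct) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": len(edges),
        "out_degree": dict(out_degree),
        "in_degree": dict(in_degree),
        "density": density,
    }


if __name__ == "__main__":
    rows = [["1", "alice"], ["2", "bob"], ["1", "alice"]]
    print(dedupe_by_first_column(rows))
    print(basic_metrics([("alice", "bob"), ("alice", "carol"), ("bob", "carol")]))
```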

For those using TAGS 6.1, the field with the full tweet string sits one column further to the right than in TAGS 6.0. If running the TAGS 6.0 edge-list script produces a blank edge list containing only a header row, try the TAGS 6.1 version of that script instead. If that works, you can run the rest of the TAGS 6.0 code by finding the for-loop lines that set entities_str = line[16] and changing them to line[17].
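
One way to avoid editing the index by hand is to look the column up in the header row. The sketch below is an illustration only: it assumes the header cell for that field is literally named "entities_str" (the name is borrowed from the variable in the code, not confirmed for the sheet), and the helper name is hypothetical.

```python
def find_entities_column(header, name="entities_str", default=16):
    """Return the index of the named header cell, or `default` if absent.

    Assumption: the sheet's header row labels the column "entities_str".
    The default of 16 mirrors the TAGS 6.0 index used in the existing code.
    """
    try:
        return header.index(name)
    except ValueError:
        return default


if __name__ == "__main__":
    # Synthetic headers: only the position of "entities_str" matters.
    header_60 = ["col"] * 16 + ["entities_str"]
    header_61 = ["col"] * 17 + ["entities_str"]
    print(find_entities_column(header_60))
    print(find_entities_column(header_61))
```

Inside the for loop, the hard-coded line could then read entities_str = line[entities_col], with entities_col computed once from the first row of the file.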

