Python/IPython code to analyze one's text messages. Intended to work out of the box, see README for details.
.LICENSE Clarifying name in LICENSE May 15, 2016
.gitignore Adding requirements.txt for pip and documenting .gitignore May 25, 2016
CabinSketch-Bold.ttf Adding a word cloud visualization Sep 5, 2016
README.md Update README Mar 18, 2018
colorbrewer.min.js Adding a steamgraph visualization in D3 and minor cleanup Aug 29, 2016
d3.min.js Adding a steamgraph visualization in D3 and minor cleanup Aug 29, 2016
example word tree.png Initial check in of sms analysis code. May 14, 2016
facebook_connector.py Fix Facebook Connector by adapting it to the new message format (#20) Feb 17, 2018
iPhone Address Book DB Structure.png Initial check in of sms analysis code. May 14, 2016
iPhone Message DB Structure.png Initial check in of sms analysis code. May 14, 2016
iphone_connector.py Update SQL date handling to handle Apple's new format Mar 11, 2018
requirements.txt Implemented Clustering idea Feb 22, 2018
sms_analysis.ipynb Cleaning up the clustering code a bit (#23) Apr 7, 2018
steamgraph.js Changing the stacking algorithim for the steamgraph Sep 5, 2016
steamgraph_screenshot.png Adding a steamgraph visualization in D3 and minor cleanup Aug 29, 2016
tfidf_diff_screenshot.png Adding a screenshot of the TFIDF contact comparison Jan 8, 2017
wordcloud_screenshot.png Adding a screenshot of the word cloud visualization Sep 5, 2016
wordtree.py Breaking out the wordtree HTML into its own file. May 30, 2016
wordtree_template.html Breaking out the wordtree HTML into its own file. May 30, 2016

README.md

sms-analysis

Python/IPython code to analyze one's text messages. Intended to work out of the box.

Author: Michael Dezube <michael dezube at gmail dot com>

For further discussions: Join the chat at https://gitter.im/mdezube/sms-analysis

Overview of code

This code will:

  1. Find your latest iPhone sync. This is automatic only on Macs; on a PC, edit table_connector.py so it can find the file
  2. Load up the messages database and address book database locally
  3. Merge the databases together into fully_merged_messages_df which you can freely play with
  4. Visualize a word tree of your text messages with a specific contact (see word tree screenshot)
  5. Show you who you text the most
  6. Create an interactive streamgraph to visualize how your texting with people has trended over time (see streamgraph screenshot)
  7. Create a word cloud of the words you use and those used by your contacts (see word cloud screenshot)
  8. Use TFIDF to understand which words identify your contacts' verbiage
  9. Use TFIDF to understand which words identify the difference between contacts' verbiage. For example: how do high school friends talk differently from college friends? (see tfidf contact comparison)
  10. Use TFIDF to show you which topics were popular in texts you sent, or texts sent to you, and how this progressed over the years
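
To make steps 3, 5, and 8 concrete, here is a minimal sketch in plain Python (standard library only). It is not the notebook's implementation: the sample rows, the column names (phone, is_from_me, text), and the helper functions are all hypothetical, and the TFIDF weighting shown is one common variant that treats each contact's received messages as a single document.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sample data; the real databases use their own column names.
messages = [
    {"phone": "+15550001", "is_from_me": 0, "text": "pizza tonight"},
    {"phone": "+15550001", "is_from_me": 1, "text": "yes pizza"},
    {"phone": "+15550002", "is_from_me": 0, "text": "meeting moved to friday"},
    {"phone": "+15550001", "is_from_me": 0, "text": "pizza again tonight"},
]
address_book = {"+15550001": "Alice", "+15550002": "Bob"}


def merge_messages(messages, address_book):
    """Attach a contact name to each message (a left join on phone number).

    Unknown numbers fall back to the raw phone number.
    """
    return [dict(m, contact=address_book.get(m["phone"], m["phone"])) for m in messages]


def top_contacts(merged):
    """Count messages per contact, most frequent first."""
    return Counter(m["contact"] for m in merged).most_common()


def tfidf_by_contact(merged):
    """Score words per contact as term frequency times inverse document frequency.

    Each contact's received messages form one document.
    """
    docs = defaultdict(Counter)
    for m in merged:
        if not m["is_from_me"]:
            docs[m["contact"]].update(m["text"].lower().split())
    n_docs = len(docs)
    # Number of contacts (documents) in which each word appears at least once.
    doc_freq = Counter(word for counts in docs.values() for word in counts)
    scores = {}
    for contact, counts in docs.items():
        total = sum(counts.values())
        scores[contact] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


merged = merge_messages(messages, address_book)
scores = tfidf_by_contact(merged)
```

Words that a contact uses often but that few other contacts use get the highest scores, which is what makes them identifying. A word used by every contact scores zero under this weighting.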

Note: none of your data is modified or sent anywhere during execution.
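
One way to get a "never modified" guarantee when reading a SQLite file is to open it in read-only mode. The sketch below illustrates that technique and is not necessarily what this repository does; open_read_only, demo, and the message table are hypothetical names used only for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file so that any write attempt raises an error."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo():
    """Create a throwaway database, then show that reads work and writes fail."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "demo.db"
        writer = sqlite3.connect(str(db_path))
        writer.execute("CREATE TABLE message (text TEXT)")
        writer.execute("INSERT INTO message VALUES ('hello')")
        writer.commit()
        writer.close()

        reader = open_read_only(db_path)
        rows = reader.execute("SELECT text FROM message").fetchall()
        try:
            reader.execute("INSERT INTO message VALUES ('nope')")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite refuses writes on a connection opened with mode=ro.
            write_blocked = True
        reader.close()
    return rows, write_blocked
```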

Dependencies easy install

If you don't have pip, see https://pip.pypa.io/en/stable/installing/ or, on a Mac, run sudo easy_install pip

Then run pip install -r requirements.txt and pip install "matplotlib>=1.4"

If the second command fails, follow these detailed Matplotlib install instructions

Dependencies with details

  1. Pandas
  2. IPython
  3. Matplotlib
    • The majority of the code will work without this, but certain graphs will fail
  4. An iPhone, having synced with this computer
  5. If running on a Mac, code will work out of the box. If running on a PC, change the variable BASE_DIR in table_connector.py to the directory of your backups
    • This post seems to specify the location of backups on Windows.
  6. An internet connection to load the Google Visualization API (a very small file)
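
As a rough illustration of what "find your latest sync" under BASE_DIR could mean, the sketch below picks the most recently modified subdirectory of a backup directory. The function latest_backup and the constant MAC_BACKUP_DIR are hypothetical, and the macOS path is an assumption about the usual backup location; the real logic lives in table_connector.py and may differ.

```python
import os
import tempfile
from pathlib import Path

# Assumed usual location of iPhone backups on macOS; on a PC, pass your own directory.
MAC_BACKUP_DIR = Path.home() / "Library" / "Application Support" / "MobileSync" / "Backup"


def latest_backup(base_dir):
    """Return the most recently modified subdirectory of base_dir, or None."""
    base = Path(base_dir)
    if not base.is_dir():
        return None
    backups = [p for p in base.iterdir() if p.is_dir()]
    return max(backups, key=lambda p: p.stat().st_mtime, default=None)


def demo():
    """Build two fake backup folders with known timestamps and pick the newest."""
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp) / "old", Path(tmp) / "new"
        old.mkdir()
        new.mkdir()
        os.utime(old, (1000, 1000))  # set access and modification times explicitly
        os.utime(new, (2000, 2000))
        return latest_backup(tmp).name
```

On a PC, the same function could be called with whatever directory your backups live in.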

Quick Start - Jupyter Notebook

  1. Start the IPython notebook like so: jupyter notebook sms_analysis.ipynb
  2. Under the menu choose Cell --> Run All
  3. Edit CONTACT_NAME and ROOT_WORD in the last cell to alter the visualization, then re-run that cell via the menu: Cell --> Run Cell

Quick Start - Command Line

  • Run python table_connector.py to see a sample of the messages and address book data
  • Run python table_connector.py --full to see a sample of the messages and address book data with all of their columns
  • Run python table_connector.py <output directory> to output the messages and address book data into CSV files
  • Run python table_connector.py --full <output directory> to output the messages and address book data into CSV files with all of their columns
  • Run python table_connector.py --help to see the documentation for all arguments and their options
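
The four invocations above boil down to one optional flag and one optional positional argument. A hypothetical reconstruction with argparse might look like the sketch below; it is inferred only from the bullet list, so the names build_parser, describe, and output_directory are assumptions, and the --help output of the real script remains the authoritative reference.

```python
import argparse


def build_parser():
    """Build a parser with an optional --full flag and an optional output directory."""
    parser = argparse.ArgumentParser(
        description="Show or export messages and address book data."
    )
    parser.add_argument(
        "--full", action="store_true", help="include all columns instead of a subset"
    )
    parser.add_argument(
        "output_directory",
        nargs="?",
        default=None,
        help="if given, write CSV files here instead of printing a sample",
    )
    return parser


def describe(args):
    """Summarize what a given combination of arguments would do."""
    columns = "all columns" if args.full else "selected columns"
    if args.output_directory is None:
        return "print a sample with " + columns
    return "write CSV files with " + columns + " to " + args.output_directory
```

For example, describe(build_parser().parse_args(["--full", "out"])) summarizes the fourth invocation in the list.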

Screenshots from running the code

Example word tree

Example streamgraph

Example word cloud

Example TFIDF contact comparison

Example of Clustering