🐘 A personal document with reports, analysis, and plotting of personal analytics data using R.


What is persanalytics?

persanalytics is a repo in which I collect and visualize personal analytics.

persanalytics contains (for now):

This is all so I can play around with data, practice plotting and analyzing it, and get some insight into changes over time in the process. My goal is to collect and visualize data that goes back years: keystrokes, emails, messages/SMS, and any physical activities I can record.

See also:



Polar plot

Keystrokes time series

Keystrokes time series for last 7 days


These are completed todos per day. The data look too erratic on their own, so I added a loess smoothing function.
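As a rough illustration of loess smoothing on this kind of series, here is a minimal base R sketch. The data are simulated, and the column names (`day`, `completed`) and the `span` value are assumptions made for the example; the actual script may well do this through ggplot2 instead.

```r
# Simulated daily counts of completed todos (made-up data, illustration only)
set.seed(1)
todos <- data.frame(
  day       = 1:120,
  completed = rpois(120, lambda = 4)
)

# Fit a loess curve; span controls how much smoothing is applied (0.5 is a guess)
fit <- loess(completed ~ day, data = todos, span = 0.5)
todos$smoothed <- predict(fit)

# Raw counts in grey, smoothed trend drawn on top in red
plot(todos$day, todos$completed, type = "l", col = "grey60",
     xlab = "Day", ylab = "Completed todos")
lines(todos$day, todos$smoothed, col = "red", lwd = 2)
```

A larger `span` gives a flatter curve; a smaller one follows the raw counts more closely.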

After a while, the above plot loses its usefulness. The lines are pushed against each other and, aside from the loess smoothing overlay, the viewer doesn't get any information from the line plot itself.

This is a good time to use a simple rolling mean.

That's better.
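For reference, a simple trailing rolling mean can be written in base R with `stats::filter`. This is only a sketch: the helper name `rolling_mean` and the window sizes are assumptions, not necessarily what the script uses.

```r
# Trailing rolling mean over a window of k observations.
# The first k - 1 values are NA because the window is not yet full.
rolling_mean <- function(x, k = 7) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Small made-up series to show the behaviour
completed <- c(2, 4, 6, 8, 10, 12, 14, 16)
smoothed <- rolling_mean(completed, k = 3)
smoothed
# The first two values are NA, followed by 4, 6, 8, 10, 12, 14
```

A wider window smooths more aggressively but also leaves more missing values at the start of the series.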


My sensors don't measure which gear I'm in, so I created a pseudo "average gear" score:

gear score = total strokes / distance

The higher the value, the smaller the gear.

  • The larger the point, the higher the gear.
  • The redder the point, the higher the average heart rate.

Average heart rate is a good metric for how intense a training session was.
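The gear score defined above is a single division per session. A minimal sketch with invented numbers follows; the column names and the distance unit are assumptions for the example.

```r
# Made-up sessions; column names and units are assumptions for illustration
rides <- data.frame(
  strokes  = c(5400, 7200, 3000),  # total strokes recorded in the session
  distance = c(30, 30, 20),        # distance covered (unit assumed, e.g. km)
  avg_hr   = c(142, 155, 128)      # average heart rate
)

# gear score = total strokes / distance
rides$gear_score <- rides$strokes / rides$distance
rides$gear_score
# 180 240 150
```

The second session covers the same distance as the first with more strokes, so it gets the higher score, which corresponds to a smaller gear.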


I am especially happy that I can finally get my music data. The plots show a clear pattern that I had long suspected but am still impressed to see: I love music, but I've been listening to less of it lately. The main reason is that I'm listening to more and more podcasts.

Even though the following plot implies that I am "loving" fewer and fewer tracks as time goes on, I think that's misleading. I am always finding tracks that I can't stop listening to; I just don't use the "Love this track" feature of last.fm as much as I used to.

instant messaging

I love regular expressions.

I got around to merging and parsing the logs from my most recent IM account. According to the final tally, there are 79715 messages that I've sent and received using that account since September 2012.

Technical notes

persanalytics is a collection of R (.R) scripts written in RStudio. Each script, when run, performs all the merging, crunching, nip 'n tucking, and plotting needed to arrive at the final plots, and saves them to persanalytics/plots/.

All plots are made using the ggplot2 library/package because it's awesome.


As mentioned above, keystroke frequencies per minute are collected using minute-agent. keystrokes.R reads the .log file and handles the rest.


I use t to manage my todos, and I love it. It is the only task-management system that has ever worked for me.

t saves current todos in a file called tasks.txt and all completed todos in .tasks.txt.done. todos.R reads both text files, counts the number of current and completed tasks, and appends it to data/todos.csv. It also produces the above plot.
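The counting-and-appending step could look roughly like the sketch below. It assumes one task per line in both files; the helper names (`count_tasks`, `log_todos`) and the CSV columns are hypothetical and not taken from todos.R. The demo runs against a temporary directory rather than real task files.

```r
# Count non-empty lines in a file; a missing file counts as zero tasks
count_tasks <- function(path) {
  if (!file.exists(path)) return(0L)
  lines <- readLines(path, warn = FALSE)
  sum(nzchar(trimws(lines)))
}

# Append one timestamped row of counts to a CSV, writing the header only once
log_todos <- function(task_dir, csv_path) {
  row <- data.frame(
    timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    current   = count_tasks(file.path(task_dir, "tasks.txt")),
    completed = count_tasks(file.path(task_dir, ".tasks.txt.done"))
  )
  new_file <- !file.exists(csv_path)
  write.table(row, csv_path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
  invisible(row)
}

# Demo with invented tasks in a temporary directory
demo_dir <- tempfile("tasks")
dir.create(demo_dir)
writeLines(c("buy milk", "write report"), file.path(demo_dir, "tasks.txt"))
writeLines("call bank", file.path(demo_dir, ".tasks.txt.done"))
demo_csv <- file.path(demo_dir, "todos.csv")
log_todos(demo_dir, demo_csv)
log_todos(demo_dir, demo_csv)  # a second run appends a second row
todo_log <- read.csv(demo_csv)
todo_log
```

Writing the header only when the file is first created keeps the CSV readable with `read.csv` however many times the job has run.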

I am using LaunchControl (a GUI for managing launchd jobs on OS X) to run the following command every 3600 seconds:

/usr/bin/Rscript --vanilla /Users/sherif/persanalytics/todos.R


I scrobble all my music. last.fm allows you to request your listening archive. You receive .tsv and .json files of your total, loved, banned, bootstrapped, and skipped tracks. music.R reads in the .tsv file and does what it does.

instant messaging

I've been using Adium since I first started using a Mac. This was a great decision because it means that even back when I was using Google Chat, I had all my chat logs stored locally. Adium saves chat logs in

~/Library/Application Support/Adium 2.0/Users/Default/Logs

However, I keep my Adium 2.0 folder symlinked on Dropbox, which means that all my settings and chat logs are kept in sync between my two computers. It works surprisingly well.

Within Logs, there is a folder per account. Within those there is a folder per contact. Within those a folder (with an extension .chatlog) per conversation, and within those .xml files with the content of the conversations.

IM.R executes a bash command to cat all of those scattered .xml files into one mergedIM.xml file, which it then reads in and does some regex stuff to split it into the interesting components.
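A rough sketch of the same merge-then-regex idea done entirely in R is shown below. The fake log file, its `sender` and `time` attributes, and the regular expression are simplified assumptions for illustration; real Adium logs carry more markup, and this is not the code in IM.R.

```r
# Build a tiny fake log tree in a temp directory (contents are invented)
logs_dir <- tempfile("Logs")
chat_dir <- file.path(logs_dir, "account", "contact", "2012-09-01.chatlog")
dir.create(chat_dir, recursive = TRUE)
writeLines(c(
  '<chat>',
  '<message sender="alice" time="2012-09-01T10:00:00">hi there</message>',
  '<message sender="bob" time="2012-09-01T10:00:05">hello</message>',
  '</chat>'
), file.path(chat_dir, "2012-09-01.xml"))

# Collect every .xml file under Logs, however deeply nested
xml_files <- list.files(logs_dir, pattern = "\\.xml$",
                        recursive = TRUE, full.names = TRUE)

# Read them all into one character vector (the in-R equivalent of cat-ing them)
merged <- unlist(lapply(xml_files, readLines, warn = FALSE))

# Keep only message lines and pull out the pieces with a regular expression
msg_lines <- grep("<message ", merged, value = TRUE, fixed = TRUE)
pattern <- '^.*<message sender="([^"]*)" time="([^"]*)"[^>]*>(.*)</message>.*$'
messages <- data.frame(
  sender = sub(pattern, "\\1", msg_lines),
  time   = sub(pattern, "\\2", msg_lines),
  text   = sub(pattern, "\\3", msg_lines)
)
nrow(messages)
```

This assumes each message sits on a single line; messages that span several lines would need the file contents collapsed into one string before matching.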

If there is an easier way to parse xml using R, I don't know it.

Data on the todo list

The following data are being collected; I just need to figure out how to obtain and parse them.