# RescueTime Downloader

Code to collect and export RescueTime activity logs, with options to collect data in hourly or minute bins. The default is hourly.

**NOTE:** Collecting your full history takes some time, depending on how many years of data you have. I recommend configuring the script below to pull data in yearly chunks, although exporting the full history in one run should also work.
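
One way to get yearly chunks is to split a long date range into per-year start and end dates before running the export. The helper below is a minimal sketch, not part of the original notebook; the name `yearly_chunks` is hypothetical and it assumes dates in the same `YYYY-MM-DD` format used for `start_date` and `end_date`.

```python
from datetime import date, timedelta

def yearly_chunks(start_date, end_date):
    """Split an inclusive ISO date range into per-calendar-year (start, end) pairs."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    chunks = []
    while start <= end:
        # Each chunk ends on Dec 31 of its year, or on the overall end date.
        chunk_end = min(date(start.year, 12, 31), end)
        chunks.append((start.isoformat(), chunk_end.isoformat()))
        start = chunk_end + timedelta(days=1)
    return chunks

chunks = yearly_chunks('2016-06-15', '2018-02-01')
for chunk_start, chunk_end in chunks:
    print(chunk_start, 'to', chunk_end)
```

Each pair can then be used as the `start_date` and `end_date` for one export run, which produces one CSV file per year.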

------

## Setup and Installation

* Go to [RescueTime API](https://www.rescuetime.com/anapi/manage) and copy an API key.
* Copy credentials-sample.json to credentials.json and add your RescueTime key.
* This project depends only on the standard Python libraries plus Requests and Pandas.
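
For reference, the credentials cell in this notebook reads `credentials['rescuetime']['KEY']`, so credentials.json presumably has the nested shape sketched below. This is inferred from that loading code rather than from credentials-sample.json itself; the key value is a placeholder and `read_rescuetime_key` is a hypothetical helper.

```python
import json

# Assumed shape of credentials.json; the key value is a placeholder.
sample = {"rescuetime": {"KEY": "YOUR_API_KEY_HERE"}}

def read_rescuetime_key(credentials):
    """Return the API key from a parsed credentials dict, failing loudly if absent."""
    try:
        key = credentials["rescuetime"]["KEY"]
    except KeyError as err:
        raise KeyError("credentials.json needs a 'rescuetime' object with a 'KEY' entry") from err
    if not key:
        raise ValueError("RescueTime KEY is empty")
    return key

print(json.dumps(sample, indent=2))
print(read_rescuetime_key(sample))
```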

-----

## Dependencies

In [291]:
import requests
import os
from datetime import date, datetime, timedelta as td
import pandas as pd

----

## Credentials

In [292]:
import json

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    rescuetime_cr = credentials['rescuetime']
    KEY = rescuetime_cr['KEY']

In [293]:
baseurl = 'https://www.rescuetime.com/anapi/data?key='

In [294]:
url =  baseurl + KEY

----

## Export Dates Configuration

In [295]:
# Configure These to Your Preferred Dates
start_date = '2017-01-01'  # Start date for data
end_date   = '2017-12-31'  # End date for data

------

## Function to Get RescueTime Activities

In [296]:
# Adjustable by Time Period
def rescuetime_get_activities(start_date, end_date, resolution='hour'):
    # Configuration for Query
    # SEE: https://www.rescuetime.com/apidoc
    payload = {
        'perspective':'interval',
        'resolution_time': resolution, #1 of "month", "week", "day", "hour", "minute"
        'restrict_kind':'document',
        'restrict_begin': start_date,
        'restrict_end': end_date,
        'format':'json' #csv
    }
    
    # Setup Iteration - by Day
    d1 = datetime.strptime(payload['restrict_begin'], "%Y-%m-%d").date()
    d2 = datetime.strptime(payload['restrict_end'], "%Y-%m-%d").date()
    delta = d2 - d1
    
    activities_list = []
    
    # Iterate through the days, making a request per day
    for i in range(delta.days + 1):
        # Find iter date and set begin and end values to this to extract at once.
        d3 = d1 + td(days=i) # Add a day
        if d3.day == 1: print('Pulling Monthly Data for ', d3)

        # Update the Payload
        payload['restrict_begin'] = str(d3) # Set payload days to current
        payload['restrict_end'] = str(d3)   # Set payload days to current

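        # Request
        try:
            r = requests.get(url, payload) # Make Request
            iter_result = r.json() # Parse result
            # print("Collecting Activities for " + str(d3))
        except Exception:
            # Skip this day so a failed request never reuses the previous day's rows
            print("Error collecting data for " + str(d3))
            continue

        # Collect the rows for this day (empty if the response has no 'rows' key)
        for row in iter_result.get('rows', []):
            activities_list.append(row)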
            
    return activities_list
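
The day-by-day iteration in the function above can also be separated from the HTTP call, which makes it easy to check which requests would be sent without touching the network. The generator below is a sketch of that idea using the same payload fields; the name `daily_payloads` is hypothetical and not part of the original notebook.

```python
from datetime import datetime, timedelta

def daily_payloads(start_date, end_date, resolution='hour'):
    """Yield one query payload per day in the inclusive date range."""
    first = datetime.strptime(start_date, "%Y-%m-%d").date()
    last = datetime.strptime(end_date, "%Y-%m-%d").date()
    for offset in range((last - first).days + 1):
        day = str(first + timedelta(days=offset))
        yield {
            'perspective': 'interval',
            'resolution_time': resolution,
            'restrict_kind': 'document',
            'restrict_begin': day,  # begin and end are the same day,
            'restrict_end': day,    # so each request covers a single day
            'format': 'json',
        }

payloads = list(daily_payloads('2017-01-30', '2017-02-02'))
print(len(payloads))
print(payloads[0]['restrict_begin'], payloads[-1]['restrict_end'])
```

Each yielded dictionary could be passed to `requests.get(url, payload)` in the same way the function above does.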

---

## Collect Report of Activities By Day

In [297]:
# activities_day_log = rescuetime_get_activities(start_date, end_date, 'day')

In [298]:
# activities_daily = pd.DataFrame.from_dict(activities_day_log)

In [299]:
# activities_daily.info()

In [300]:
# activities_daily.describe()

In [301]:
# activities_daily.tail()

----

## Collect Report of Activities By Hour

In [302]:
activities_hour_log = rescuetime_get_activities(start_date, end_date, 'hour')

Pulling Monthly Data for  2017-01-01
Pulling Monthly Data for  2017-02-01
Pulling Monthly Data for  2017-03-01
Pulling Monthly Data for  2017-04-01
Pulling Monthly Data for  2017-05-01
Pulling Monthly Data for  2017-06-01
Pulling Monthly Data for  2017-07-01
Pulling Monthly Data for  2017-08-01
Pulling Monthly Data for  2017-09-01
Pulling Monthly Data for  2017-10-01
Pulling Monthly Data for  2017-11-01
Pulling Monthly Data for  2017-12-01


In [303]:
activities_hourly = pd.DataFrame.from_dict(activities_hour_log)

In [304]:
activities_hourly.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [305]:
activities_hourly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125535 entries, 0 to 125534
Data columns (total 7 columns):
Date            125535 non-null object
Seconds         125535 non-null int64
NumberPeople    125535 non-null int64
Actitivity      125535 non-null object
Document        125535 non-null object
Category        125535 non-null object
Productivity    125535 non-null int64
dtypes: int64(3), object(4)
memory usage: 6.7+ MB


In [306]:
activities_hourly.describe()

| | Seconds | NumberPeople | Productivity |
|---|---|---|---|
| count | 125535.0 | 125535.0 | 125535.0 |
| mean | 62.490453 | 1.0 | 0.856813 |
| std | 189.186666 | 0.0 | 1.350972 |
| min | 1.0 | 1.0 | -2.0 |
| 25% | 2.0 | 1.0 | 0.0 |
| 50% | 9.0 | 1.0 | 1.0 |
| 75% | 41.0 | 1.0 | 2.0 |
| max | 3886.0 | 1.0 | 2.0 |


In [307]:
activities_hourly.tail()

| | Date | Seconds | NumberPeople | Actitivity | Document | Category | Productivity |
|---|---|---|---|---|---|---|---|
| 125530 | 2017-12-31T23:00:00 | 1 | 1 | medium.com | Untitled | Writing | 2 |
| 125531 | 2017-12-31T23:00:00 | 1 | 1 | en.wikipedia.org | https://en.wikipedia.org/wiki/Earl_Park,_Arncl... | General Reference & Learning | 1 |
| 125532 | 2017-12-31T23:00:00 | 1 | 1 | en.wikipedia.org | Untitled | General Reference & Learning | 1 |
| 125533 | 2017-12-31T23:00:00 | 1 | 1 | trainasone.com | No Details | Health & Medicine | 1 |
| 125534 | 2017-12-31T23:00:00 | 1 | 1 | medium.com | https://medium.com/40-weeks/37-772d7f519f9 | Writing | 2 |


In [308]:
activities_hourly.to_csv('data/rescuetime-hourly-' + start_date + '-to-' + end_date + '.csv')
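
Note that `to_csv` writes the row index by default, which is why the re-imported data in the analysis section carries an extra `Unnamed: 0` column. The sketch below, using a couple of made-up rows rather than real RescueTime data, shows the difference that `index=False` makes. Whether to drop the index is a choice, not something the original export does.

```python
import io
import pandas as pd

# Illustrative rows only, not real RescueTime data.
df = pd.DataFrame({'Date': ['2017-01-01T00:00:00', '2017-01-01T01:00:00'],
                   'Seconds': [120, 45]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)

# With index=False only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)
```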

## Collect Report of Activities By Minute

In [309]:
# activities_minute_log = rescuetime_get_activities(start_date, end_date, 'minute')

In [310]:
# activities_per_minute = pd.DataFrame.from_dict(activities_minute_log)

In [311]:
# Original API headers: 'Date', 'Time Spent (seconds)', 'Number of People', 'Activity', 'Document', 'Category', 'Productivity'
# activities_per_minute.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [312]:
# activities_per_minute.head()

In [313]:
# activities_per_minute.info()

In [314]:
# activities_per_minute.describe()

In [315]:
# activities_per_minute.to_csv('data/rescuetime-by-minute-' + start_date + '-to-' + end_date + '.csv')

-----

## Simple Analysis (Using Exported Logs)

In [316]:
import glob
import os

In [317]:
# Import hourly data exports and combine them into a single data frame
path = 'data/'
all_files = glob.glob(path + "rescuetime-hourly*.csv")
frames = []
for file_ in all_files:
    df = pd.read_csv(file_, index_col=None, header=0)
    frames.append(df)
activities = pd.concat(frames)

In [318]:
len(activities)  # total rows across all exported files

689193

In [319]:
# total hours
activities.Seconds.sum() / 60 / 60

12194.12361111111

In [325]:
# total days
activities.Seconds.sum() / 60 / 60 / 24

508.08848379629626

In [320]:
activities.head()

| | Unnamed: 0 | Date | Seconds | NumberPeople | Actitivity | Document | Category | Productivity |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2012-08-15T00:00:00 | 1033 | 1 | Skype | Skype | Instant Message | 0 |
| 1 | 1 | 2012-08-15T00:00:00 | 524 | 1 | iTerm | 1. ssh | Systems Operations | 2 |
| 2 | 2 | 2012-08-15T00:00:00 | 372 | 1 | projects.int3c.com | No Details | Project Management | 2 |
| 3 | 3 | 2012-08-15T00:00:00 | 319 | 1 | course-notes.org | No Details | General Software Development | 2 |
| 4 | 4 | 2012-08-15T00:00:00 | 215 | 1 | TextMate | untitled 84 | Editing & IDEs | 2 |


In [321]:
activities.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 689193 entries, 0 to 131879
Data columns (total 8 columns):
Unnamed: 0      689193 non-null int64
Date            689193 non-null object
Seconds         689193 non-null int64
NumberPeople    689193 non-null int64
Actitivity      689193 non-null object
Document        689183 non-null object
Category        689193 non-null object
Productivity    689193 non-null int64
dtypes: int64(4), object(4)
memory usage: 47.3+ MB


In [322]:
activities.describe()

| | Unnamed: 0 | Seconds | NumberPeople | Productivity |
|---|---|---|---|---|
| count | 689193.0 | 689193.0 | 689193.0 | 689193.0 |
| mean | 61140.872689 | 63.696011 | 1.0 | 0.818431 |
| std | 37237.128353 | 195.538838 | 0.0 | 1.3032 |
| min | 0.0 | 1.0 | 1.0 | -2.0 |
| 25% | 28716.0 | 2.0 | 1.0 | 0.0 |
| 50% | 58908.0 | 10.0 | 1.0 | 1.0 |
| 75% | 93368.0 | 40.0 | 1.0 | 2.0 |
| max | 131879.0 | 7428.0 | 1.0 | 2.0 |


In [323]:
# create columns for year, month, day, and dow
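
The cell above is left as a placeholder. One possible way to derive these columns is sketched here with pandas datetime accessors, assuming the `Date` column holds ISO timestamps like those in the export; the rows and the new column names (`Year`, `Month`, `Day`, `DOW`) are illustrative choices, not taken from the original notebook.

```python
import pandas as pd

# Illustrative rows only, not real RescueTime data.
sample = pd.DataFrame({
    'Date': ['2017-12-31T23:00:00', '2018-01-01T08:00:00'],
    'Seconds': [60, 300],
})

# Parse the ISO strings once, then pull out the calendar parts.
stamps = pd.to_datetime(sample['Date'])
sample['Year'] = stamps.dt.year
sample['Month'] = stamps.dt.month
sample['Day'] = stamps.dt.day
sample['DOW'] = stamps.dt.dayofweek  # Monday=0 ... Sunday=6

print(sample)
```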

In [324]:
# pivot table 
# activities.pivot(index='Date', columns='Category', values='Seconds')
# temp.pivot(columns='Category', values='Seconds')
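
The pivot idea above is only outlined in comments. A hedged sketch of one way to do it follows: since the export has many rows per timestamp and category, `DataFrame.pivot` would raise an error on the duplicate index/column pairs, so this uses `pivot_table` with `aggfunc='sum'` instead. The sample rows and the `Day` column are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows only, not real RescueTime data.
sample = pd.DataFrame({
    'Date': ['2017-01-01T09:00:00', '2017-01-01T10:00:00',
             '2017-01-01T10:00:00', '2017-01-02T09:00:00'],
    'Category': ['Writing', 'Writing', 'Email', 'Writing'],
    'Seconds': [600, 300, 120, 900],
})

# Keep only the YYYY-MM-DD part so rows group by calendar day.
sample['Day'] = sample['Date'].str[:10]

# Sum seconds per day and category; missing combinations become 0.
daily = sample.pivot_table(index='Day', columns='Category',
                           values='Seconds', aggfunc='sum', fill_value=0)
print(daily)
```

Dividing the result by 3600 would give hours per category per day, in line with the totals computed earlier in this section.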