# RescueTime Downloader

Code to collect and export RescueTime activity logs, with options to collect data in hourly or minute bins. The default is hourly.

**NOTE:** Collecting your full history takes some time, depending on how many years of data you have. I recommend configuring the script below to pull data in yearly chunks, though exporting the full history in one run should also work.

------

## Setup and Installation

* Go to [RescueTime API](https://www.rescuetime.com/anapi/manage) and copy an API key.
* Copy credentials-sample.json to credentials.json and add your RescueTime key.
* This project depends only on the standard Python library plus Requests and Pandas.

-----

## Dependencies

In [1]:
import requests
import os
from datetime import date, datetime, timedelta as td
import pandas as pd

----

## Credentials

In [2]:
import json

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    rescuetime_cr = credentials['rescuetime']
    KEY = rescuetime_cr['KEY']
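
The loading cell above implies that credentials.json holds a top-level `rescuetime` object with a `KEY` field. The sketch below wraps that lookup in a hypothetical helper, `load_rescuetime_key`, that fails with a clearer message when the file has a different shape. The layout is inferred from the loading code, not from credentials-sample.json, so treat it as an assumption.

```python
import json
import os
import tempfile


def load_rescuetime_key(path="credentials.json"):
    """Return the API key stored under credentials['rescuetime']['KEY'].

    Hypothetical helper; the file layout is inferred from the loading cell above.
    """
    with open(path, "r") as fh:
        credentials = json.load(fh)
    try:
        key = credentials["rescuetime"]["KEY"]
    except (KeyError, TypeError) as exc:
        raise ValueError(
            f"{path} must look like {{'rescuetime': {{'KEY': '...'}}}}"
        ) from exc
    if not key:
        raise ValueError(f"{path} contains an empty RescueTime key")
    return key


# Demonstration against a throwaway file, so no real key is needed.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "credentials.json")
    with open(demo_path, "w") as fh:
        json.dump({"rescuetime": {"KEY": "example-key"}}, fh)
    demo_key = load_rescuetime_key(demo_path)
```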

In [3]:
baseurl = 'https://www.rescuetime.com/anapi/data?key='

In [4]:
url =  baseurl + KEY

----

## Export Dates Configuration

In [5]:
# Configure These to Your Preferred Dates
start_date = '2019-06-10'  # Start date for additional data export
end_date   = '2019-06-12'  # End date for data
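
The note at the top recommends pulling a long history in yearly chunks. One way to do that, sketched here with a hypothetical helper named `yearly_chunks`, is to split the configured range into per-calendar-year pairs and run the export once per pair. The helper is not part of the original notebook.

```python
from datetime import date, datetime, timedelta


def yearly_chunks(start_date, end_date):
    """Split an inclusive 'YYYY-MM-DD' range into per-calendar-year (start, end) pairs."""
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    chunks = []
    while start <= end:
        # A chunk ends on 31 December or on the overall end date, whichever is first.
        chunk_end = min(date(start.year, 12, 31), end)
        chunks.append((str(start), str(chunk_end)))
        start = chunk_end + timedelta(days=1)
    return chunks


chunks = yearly_chunks("2017-11-15", "2019-06-12")
```

Each pair can then be passed as `start_date` and `end_date` to the export cells below, producing one CSV per year. The analysis section reads every `rescuetime-hourly*.csv` file in `data/`, so the separate files are combined there.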

------

## Function to Get RescueTime Activities

In [6]:
# Adjustable by Time Period
def rescuetime_get_activities(start_date, end_date, resolution='hour'):
    # Configuration for Query
    # SEE: https://www.rescuetime.com/apidoc
    payload = {
        'perspective': 'interval',
        'resolution_time': resolution,  # one of "month", "week", "day", "hour", "minute"
        'restrict_kind': 'document',
        'restrict_begin': start_date,
        'restrict_end': end_date,
        'format': 'json'  # or 'csv'
    }

    # Setup Iteration - by Day
    d1 = datetime.strptime(start_date, "%Y-%m-%d").date()
    d2 = datetime.strptime(end_date, "%Y-%m-%d").date()
    delta = d2 - d1

    activities_list = []

    # Iterate through the days, making one request per day
    for i in range(delta.days + 1):
        d3 = d1 + td(days=i)  # Current day of the iteration
        if d3.day == 1: print('Pulling Monthly Data for ', d3)

        # Restrict the query to the current day only
        payload['restrict_begin'] = str(d3)
        payload['restrict_end'] = str(d3)

        # Request
        try:
            r = requests.get(url, params=payload)  # Make Request
            iter_result = r.json()  # Parse result
        except (requests.RequestException, ValueError):
            # Skip this day so a stale result from the previous day is not appended again
            print("Error collecting data for " + str(d3))
            continue

        rows = iter_result.get('rows', [])
        if rows:
            activities_list.extend(rows)
        else:
            print("Appears there is no RescueTime data for " + str(d3))

    return activities_list

---

## Collect Report of Activities By Day

In [7]:
# activities_day_log = rescuetime_get_activities(start_date, end_date, 'day')

In [8]:
# activities_daily = pd.DataFrame.from_dict(activities_day_log)

In [9]:
# activities_daily.info()

In [10]:
# activities_daily.describe()

In [11]:
# activities_daily.tail()

----

## Collect Report of Activities By Hour

In [12]:
activities_hour_log = rescuetime_get_activities(start_date, end_date, 'hour')

In [13]:
activities_hourly = pd.DataFrame.from_dict(activities_hour_log)

In [14]:
activities_hourly.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']
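
Assigning the column list by position works as long as the API keeps returning the fields in the same order. A slightly more defensive sketch is shown below: it builds the frame from the header names and renames them by mapping. It assumes the JSON response carries a `row_headers` list alongside `rows`, with the header names listed in the comment in the by-minute section. The sample response is made up for illustration, so check a real response before relying on this.

```python
import pandas as pd

# Hypothetical response, shaped like the JSON the function above reads 'rows' from.
# The 'row_headers' key is an assumption about the API response.
sample_response = {
    "row_headers": ["Date", "Time Spent (seconds)", "Number of People",
                    "Activity", "Document", "Category", "Productivity"],
    "rows": [
        ["2019-06-12T23:00:00", 1, 1, "google.com",
         "Untitled - Google Chrome", "Search", 0],
        ["2019-06-12T23:00:00", 1, 1, "en.wikipedia.org",
         "Tachisme - Wikipedia - Google Chrome", "General Reference & Learning", 1],
    ],
}

# Map the API's header names to the short names used in this notebook.
RENAME = {
    "Time Spent (seconds)": "Seconds",
    "Number of People": "NumberPeople",
    "Activity": "Actitivity",  # spelling kept to match the notebook's existing exports
}

sample_df = pd.DataFrame(sample_response["rows"],
                         columns=sample_response["row_headers"])
sample_df = sample_df.rename(columns=RENAME)
```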

In [15]:
activities_hourly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 635 entries, 0 to 634
Data columns (total 7 columns):
Date            635 non-null object
Seconds         635 non-null int64
NumberPeople    635 non-null int64
Actitivity      635 non-null object
Document        635 non-null object
Category        635 non-null object
Productivity    635 non-null int64
dtypes: int64(3), object(4)
memory usage: 34.8+ KB


In [16]:
activities_hourly.describe()

|  | Seconds | NumberPeople | Productivity |
|---|---|---|---|
| count | 635.0 | 635.0 | 635.0 |
| mean | 48.579528 | 1.0 | 1.029921 |
| std | 117.375901 | 0.0 | 0.940192 |
| min | 1.0 | 1.0 | -2.0 |
| 25% | 3.0 | 1.0 | 0.0 |
| 50% | 11.0 | 1.0 | 1.0 |
| 75% | 36.5 | 1.0 | 2.0 |
| max | 1271.0 | 1.0 | 2.0 |


In [17]:
activities_hourly.tail()

|  | Date | Seconds | NumberPeople | Actitivity | Document | Category | Productivity |
|---|---|---|---|---|---|---|---|
| 630 | 2019-06-12T23:00:00 | 1 | 1 | google.com | Untitled - Google Chrome | Search | 0 |
| 631 | 2019-06-12T23:00:00 | 1 | 1 | stackoverflow.com | git add - Git add only all new files, not modi... | General Software Development | 2 |
| 632 | 2019-06-12T23:00:00 | 1 | 1 | en.wikipedia.org | Tachisme - Wikipedia - Google Chrome | General Reference & Learning | 1 |
| 633 | 2019-06-12T23:00:00 | 1 | 1 | en.wikipedia.org | Find in page\n Pop art - Wikipedia | General Reference & Learning | 1 |
| 634 | 2019-06-12T23:00:00 | 1 | 1 | google.com | most famous abstract artist - Google Search - ... | Search | 0 |


In [18]:
os.makedirs('data', exist_ok=True)  # make sure the export folder exists
activities_hourly.to_csv('data/rescuetime-hourly-' + start_date + '-to-' + end_date + '.csv')

## Collect Report of Activities By Minute

In [19]:
# activities_minute_log = rescuetime_get_activities(start_date, end_date, 'minute')

In [20]:
# activities_per_minute = pd.DataFrame.from_dict(activities_minute_log)

In [21]:
# API column names: 'Date', 'Time Spent (seconds)', 'Number of People', 'Activity', 'Document', 'Category', 'Productivity'
# activities_per_minute.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [22]:
# activities_per_minute.head()

In [23]:
# activities_per_minute.info()

In [24]:
# activities_per_minute.describe()

In [25]:
# activities_per_minute.to_csv('data/rescuetime-by-minute-' + start_date + '-to-' + end_date + '.csv')

-----

## Simple Analysis (Using Exported Logs)

In [26]:
import glob
import os

In [27]:
# Import the hourly data exports and combine them into a single data frame
path = 'data'
all_files = sorted(glob.glob(os.path.join(path, 'rescuetime-hourly*.csv')))
activities = pd.concat(
    pd.read_csv(file_, index_col=None, header=0) for file_ in all_files
)

In [28]:
activities = activities.reset_index(drop=True)

In [29]:
# drop old index column
activities = activities.drop(columns="Unnamed: 0")
activities.columns

Index(['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category',
       'Productivity'],
      dtype='object')

In [30]:
# drop any duplicates
activities = activities.drop_duplicates()

In [31]:
len(activities)  # row count after dropping duplicates

935822

In [32]:
# total hours
activities.Seconds.sum() / 60 / 60

16133.543611111112

In [33]:
# total days
activities.Seconds.sum() / 60 / 60 / 24

672.2309837962963
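
The totals above cover the whole export. To see how the time spreads across days, the `Date` strings can be parsed and the seconds summed per calendar day. The sketch below uses a small made-up frame with the same `Date` and `Seconds` columns in place of the real `activities` data.

```python
import pandas as pd

# Made-up rows standing in for the exported activities.
sample = pd.DataFrame({
    "Date": ["2019-06-10T09:00:00", "2019-06-10T10:00:00", "2019-06-11T09:00:00"],
    "Seconds": [1800, 3600, 900],
})

# Parse the ISO timestamps, then total the seconds per calendar day in hours.
sample["Date"] = pd.to_datetime(sample["Date"])
daily_hours = sample.groupby(sample["Date"].dt.date)["Seconds"].sum() / 3600
```

Applying the same two lines to `activities` should give hours tracked per day.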

In [34]:
activities.tail()

|  | Date | Seconds | NumberPeople | Actitivity | Document | Category | Productivity |
|---|---|---|---|---|---|---|---|
| 940474 | 2019-10-13T23:00:00 | 2 | 1 | sublime text | index.js — Header — increment | Editing & IDEs | 2 |
| 940475 | 2019-10-13T23:00:00 | 2 | 1 | Gmail | FW: I tried to hack my insomnia with technolog... | Email | 1 |
| 940476 | 2019-10-13T23:00:00 | 2 | 1 | Gmail | Useful Article: Can I Safely Exercise with Hyp... | Email | 1 |
| 940477 | 2019-10-13T23:00:00 | 1 | 1 | Google Forms | Untitled - Google Chrome | General Business | 2 |
| 940478 | 2019-10-13T23:00:00 | 1 | 1 | idealistla2019.eventbrite.com | No Details | Calendars | 0 |


In [35]:
activities.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 935822 entries, 0 to 940478
Data columns (total 7 columns):
Date            935822 non-null object
Seconds         935822 non-null int64
NumberPeople    935822 non-null int64
Actitivity      935822 non-null object
Document        935812 non-null object
Category        935822 non-null object
Productivity    935822 non-null int64
dtypes: int64(3), object(4)
memory usage: 57.1+ MB


In [36]:
activities.describe()

|  | Seconds | NumberPeople | Productivity |
|---|---|---|---|
| count | 935822.0 | 935822.0 | 935822.0 |
| mean | 62.063894 | 1.0 | 0.905027 |
| std | 196.56013 | 0.0 | 1.243194 |
| min | 1.0 | 1.0 | -2.0 |
| 25% | 2.0 | 1.0 | 0.0 |
| 50% | 9.0 | 1.0 | 1.0 |
| 75% | 38.0 | 1.0 | 2.0 |
| max | 7428.0 | 1.0 | 2.0 |


In [37]:
# pivot table 
# activities.pivot(index='Date', columns='Category', values='Seconds')
# temp.pivot(columns='Category', values='Seconds')
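
The commented-out `pivot` call would likely fail on this data, because `DataFrame.pivot` raises an error when an index/column pair repeats, and each hour contains many rows per category. A minimal sketch of one alternative, `pivot_table` with a sum aggregation, is shown below on a small made-up frame. This is not the author's analysis.

```python
import pandas as pd

# Made-up rows with a repeated (Date, Category) pair, as in the real export.
sample = pd.DataFrame({
    "Date": ["2019-06-12T22:00:00", "2019-06-12T22:00:00",
             "2019-06-12T23:00:00", "2019-06-12T23:00:00"],
    "Category": ["Search", "Search", "Search", "Email"],
    "Seconds": [10, 5, 1, 20],
})

# pivot_table() aggregates the repeated pairs instead of raising an error.
by_category = sample.pivot_table(index="Date", columns="Category",
                                 values="Seconds", aggfunc="sum", fill_value=0)
```

The result has one row per hour and one column per category, with seconds summed and missing combinations filled with 0.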

In [38]:
activities.to_csv("data/rescuetime-full-data-export.csv")