# Word Counter Data Collector and Parser

**Extract all of your raw writing stats from WordCounter, broken down by date, app, hour and word count**. 

For some basic data analysis and data visualization, see: [wordcounter_data_analysis.ipynb](https://github.com/markwk/qs_ledger/tree/master/wordcounter/WordCounter_data_analysis.ipynb)

NOTE: To use this code, you need to have installed the [WordCounter App](https://wordcounterapp.com/) for Mac and used it to track your typing.

For additional Mac integration and reports, check out the [Alfred Integration for Word Counter App](https://github.com/markwk/alfred-workflow-wordcounterapp).

------

## Collect, Extract and Process Stats

### Parse and Extract to Hourly Stats

In [4]:
import os
import plistlib
from datetime import datetime as dt
import numpy as np
import pandas as pd

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [9]:
# read PLIST for WordCounter Tracking Logs
records = os.path.expanduser('~/Library/Application Support/WordCounter/app_records.plist')
# plistlib.load() expects a binary file object rather than a path,
# so open the file first (Python 3):
with open(records, 'rb') as f:
    data = plistlib.load(f)

In [11]:
# Extract Single Row for Each Date, App and Hourly Count
print("Extracting and Parsing WordCounter History")
print("NOTE: This Process May Take Several Minutes, especially if you have a lot of past data")

# collect rows in a list and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0 and was slow when called per row)
rows = []

for date, apps in data.items():
    for app in apps:
        app_name = app['id']
        # the position in the counts list is used as the hour of the day
        for hour, count in enumerate(app['counts']):
            rows.append({
                'date': date,
                'app': app_name,
                'hour': hour,
                'count': count
            })

df_wordcounter = pd.DataFrame(rows, columns=['date','app','hour','count'])

Extracting and Parsing WordCounter History
NOTE: This Process May Take Several Minutes, especially if you have a lot of past data


In [12]:
len(df_wordcounter)

66000
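
The parsing loop above implies a particular layout for the plist: a dictionary keyed by date, where each value is a list of per-app entries holding an `id` and a list of `counts` indexed by hour. As a minimal sketch under that assumption, the following builds a small synthetic plist in memory and flattens it. The date format, app ids and counts are invented for illustration, and `flatten_records` is a hypothetical helper, not part of WordCounter or this notebook.

```python
import plistlib

# Synthetic sample mirroring the layout the parsing loop assumes.
# Date keys, app ids and counts are invented; only three hourly
# slots are used per app to keep the example short.
sample = {
    "2019-01-07": [
        {"id": "com.example.editor", "counts": [0, 12, 30]},
        {"id": "com.example.notes", "counts": [5, 0, 0]},
    ],
    "2019-01-08": [
        {"id": "com.example.editor", "counts": [7, 0, 0]},
    ],
}

# Round-trip through plistlib to mimic reading the file from disk.
data = plistlib.loads(plistlib.dumps(sample))


def flatten_records(data):
    """Hypothetical helper: one dict per (date, app, hour) combination."""
    rows = []
    for date, apps in data.items():
        for app in apps:
            for hour, count in enumerate(app["counts"]):
                rows.append(
                    {"date": date, "app": app["id"], "hour": hour, "count": count}
                )
    return rows


rows = flatten_records(data)
print(len(rows))
print(rows[1])
```

Each app contributes one row per hourly slot, including hours with a zero count, which is why the resulting table grows quickly with the number of tracked days and apps.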

### Data Processing

In [13]:
df_wordcounter['date'] = pd.to_datetime(df_wordcounter['date'])

# extract date dimensions
df_wordcounter['year'] = df_wordcounter['date'].dt.year
get_month = lambda x: '{}-{:02}'.format(x.year, x.month)
df_wordcounter['month'] = df_wordcounter['date'].map(get_month)
df_wordcounter['dow'] = df_wordcounter['date'].dt.weekday
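
To make the derived columns concrete, here is a small sketch on two made-up dates. It assumes the plist date keys are strings that `pd.to_datetime` can parse. Note that `dt.weekday` numbers the days from Monday (0) to Sunday (6). The `strftime` call is shown as an alternative to the `get_month` lambda, not as the notebook's own method.

```python
import pandas as pd

# Two invented dates: a Monday and a Sunday.
df = pd.DataFrame({"date": ["2019-01-07", "2019-02-03"], "count": [120, 45]})
df["date"] = pd.to_datetime(df["date"])

df["year"] = df["date"].dt.year
# Same result as the '{}-{:02}' lambda: a zero-padded year-month label.
df["month"] = df["date"].dt.strftime("%Y-%m")
# Monday is 0 and Sunday is 6.
df["dow"] = df["date"].dt.weekday

print(df[["year", "month", "dow"]])
```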

### Export to csv

In [14]:
os.makedirs("data", exist_ok=True)  # create the output folder if it is missing
df_wordcounter.to_csv("data/wordcounter_hourly.csv")
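
Because `to_csv` writes the DataFrame index as an unnamed first column by default, reading the file back needs `index_col=0` to avoid a stray `Unnamed: 0` column. The following is a minimal sketch of loading the hourly data and rolling it up into daily totals. The sample rows are invented, and an in-memory buffer stands in for `data/wordcounter_hourly.csv` so the example runs on its own.

```python
import io
import pandas as pd

# Invented hourly rows standing in for the exported file.
hourly = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-07", "2019-01-07", "2019-01-08"]),
    "app": ["com.example.editor", "com.example.notes", "com.example.editor"],
    "hour": [9, 10, 9],
    "count": [120, 30, 45],
})

# An in-memory buffer replaces the CSV path so the sketch is self-contained.
buffer = io.StringIO()
hourly.to_csv(buffer)
buffer.seek(0)

# index_col=0 consumes the unnamed index column written by to_csv.
restored = pd.read_csv(buffer, index_col=0, parse_dates=["date"])

# Total words per day across all apps and hours.
daily = restored.groupby("date")["count"].sum()
print(daily)
```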