<a href="https://colab.research.google.com/github/invisilico/DigitalRhythmsProject/blob/master/AndroidTimestampParser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Timestamp extractor for the HTML version of Google "My Activity"


---


Parses the timestamp of every activity entry into a `Timestamp` column and stores the timezone label separately in a `TimeZone` column (see the DataFrame structure below). Works only with the HTML version of the Android "My Activity" file exported from takeout.google.com.


```
# DataFrame Structure

App       Timestamp             TimeZone

Appname   YYYY-MM-DD HH:MM:SS   UTC
Appname   YYYY-MM-DD HH:MM:SS   UTC
Appname   YYYY-MM-DD HH:MM:SS   UTC
Appname   YYYY-MM-DD HH:MM:SS   UTC
```
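Each raw timestamp in the export is assumed here to look like `Jan 5, 2020, 3:04:05 PM UTC` (English month abbreviation, 12-hour clock, trailing timezone label). The following is a minimal sketch, under that assumption, of how one such string maps onto the `Timestamp` and `TimeZone` columns. `split_stamp` is a hypothetical helper, and it uses the standard library's `strptime` instead of the `dateutil` parser the notebook uses.

```python
from datetime import datetime

# Assumed layout of a Takeout timestamp (English locale, 12-hour clock).
STAMP_FORMAT = "%b %d, %Y, %I:%M:%S %p"


def split_stamp(stamp: str) -> tuple[datetime, str]:
    """Split 'Jan 5, 2020, 3:04:05 PM UTC' into a naive datetime and a timezone label.

    Splitting on the last space keeps labels of any length ('UTC', 'CEST') whole.
    """
    date_text, tz = stamp.strip().rsplit(" ", 1)
    return datetime.strptime(date_text, STAMP_FORMAT), tz


print(split_stamp("Jan 5, 2020, 3:04:05 PM UTC"))
# (datetime.datetime(2020, 1, 5, 15, 4, 5), 'UTC')
```

Splitting on the last space rather than slicing off a fixed number of characters avoids truncating four-letter labels such as `CEST`.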



In [3]:
#@title Set-Up 
#@markdown This cell **must** be run first; the other cells depend on its imports.

from google.colab import files
import os
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
from datetime import datetime
from dateutil.parser import parse

In [None]:
#@title Upload data file
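#@markdown Upload the "My Activity.html" file from your Takeout export, or use the example dataset (next cell) for testing.
uploaded = files.upload()  # dict mapping each uploaded file name to its contents

# Rename the uploaded file, whatever Colab called it, to a fixed working name
os.rename(next(iter(uploaded)), 'actdata.txt')
with open('actdata.txt', 'r', encoding='utf-8') as fh:
  data = fh.readlines()
print("Data loaded from file.")
!rm *.* # bash command: deletes every file with a dot in its name from the working directory, including actdata.txt
print("File deleted from Colab, verify in files on left panel.")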

In [None]:
#@title Run to load example data file

!curl https://raw.githubusercontent.com/invisilico/DigitalRhythmsProject/master/Sample%20Datasets/My%20Activity.tar.gz --output MyActivity.tar.gz
!tar -xvf 'MyActivity.tar.gz'

os.rename(r'My Activity.html', r'actdata.txt')
with open('actdata.txt', 'r', encoding='utf-8') as fh:
  data = fh.readlines()
print("Data loaded from file.")
!rm *.* # bash command: deletes every file with a dot in its name (the archive and actdata.txt)
print("Files deleted from Colab, verify in files on left panel.")
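The Build Dataframe cell reads the entries from `data[32]`, i.e. it assumes the whole activity list sits on the 33rd line of the HTML file. If an export places the entries on a different line, one hedged alternative is to pick the line that contains the most entry headers. `find_activity_line` is a hypothetical helper; the marker string is copied from the notebook's own regex.

```python
TITLE_MARKER = '<p class="mdl-typography--title">'


def find_activity_line(lines: list[str]) -> str:
    """Return the line containing the most activity-entry headers.

    Assumes, as the notebook does, that all entries share one long line.
    Raises ValueError if no line contains the marker.
    """
    best = max(lines, key=lambda line: line.count(TITLE_MARKER), default="")
    if TITLE_MARKER not in best:
        raise ValueError("no My Activity entries found")
    return best


sample = ["<html>\n", "<head></head>\n", TITLE_MARKER + "A" + TITLE_MARKER + "B\n"]
print(find_activity_line(sample) == sample[2])  # True
```

With this helper, `actdat = find_activity_line(data)` could stand in for the hard-coded index.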

In [None]:
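#@title Build Dataframe

# This notebook assumes all activity entries sit on one long line
# of the HTML file, at index 32.
actdat = data[32]

# Character positions that bracket each entry's app name and timestamp
preapp = [m.end(0) for m in re.finditer('<p class="mdl-typography--title">', actdat)]
postapp = [m.start(0) for m in re.finditer('<br></p></div><div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">', actdat)]
posttime = [m.start(0) for m in re.finditer('</div><div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1 mdl-typography--text-right">', actdat)]

appname = []
timestamps = []  # not called `datetime`, which would shadow the module imported in Set-Up
timezone = []

for start, end, stop in zip(preapp, postapp, posttime):
  appname.append(actdat[start:end])

  # The timestamp is the text after the last '>' before the closing </div>;
  # the 40-character window leaves room for timezone labels longer than three letters
  stamp = actdat[max(stop - 40, 0):stop]
  stamp = stamp[stamp.rfind('>') + 1:]

  # Split on the last space so labels such as "UTC" or "CEST" stay whole
  date_text, tz = stamp.strip().rsplit(' ', 1)
  timestamps.append(parse(date_text))
  timezone.append(tz)

dataframe = pd.DataFrame(list(zip(appname, timestamps, timezone)), columns=['App', 'Timestamp', 'TimeZone'])
print(dataframe.head())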
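The Build Dataframe cell pairs three lists of marker positions by their order in the file. The same extraction can be sketched with one regular expression that captures each entry's app name and timestamp together, so a missing marker in one entry cannot shift the pairing for every later entry. The marker strings are copied from the cell above. The HTML below is a synthetic stand-in (the `Used ...<br>` body is an assumption, not the verbatim Takeout layout), `parse_entries` is a hypothetical helper, and `strptime` with an English month format replaces `dateutil.parser.parse`.

```python
import re
from datetime import datetime

TITLE = '<p class="mdl-typography--title">'
POST_APP = '<br></p></div><div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">'
POST_TIME = ('</div><div class="content-cell mdl-cell mdl-cell--6-col '
             'mdl-typography--body-1 mdl-typography--text-right">')

# app: text between TITLE and POST_APP.
# stamp: text after the last '>' that precedes POST_TIME.
ENTRY = re.compile(
    re.escape(TITLE) + r"(?P<app>.*?)" + re.escape(POST_APP)
    + r".*?(?P<stamp>[^>]*)" + re.escape(POST_TIME)
)


def parse_entries(html: str) -> list[tuple[str, datetime, str]]:
    """Return (app, timestamp, timezone) tuples for every entry in the HTML."""
    rows = []
    for m in ENTRY.finditer(html):
        date_text, tz = m["stamp"].strip().rsplit(" ", 1)
        when = datetime.strptime(date_text, "%b %d, %Y, %I:%M:%S %p")
        rows.append((m["app"], when, tz))
    return rows


def fake_entry(app: str, stamp: str) -> str:
    """Build one synthetic entry using the notebook's marker strings."""
    return TITLE + app + POST_APP + "Used " + app + "<br>" + stamp + POST_TIME + "</div></div>"


html = (fake_entry("Clock", "Jan 5, 2020, 3:04:05 PM UTC")
        + fake_entry("Chrome", "Sep 10, 2020, 11:30:00 AM CEST"))
for row in parse_entries(html):
    print(row)
```

The returned tuples can be passed directly to `pd.DataFrame(rows, columns=['App', 'Timestamp', 'TimeZone'])`.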

In [None]:
#@title Privacy Filter
#@markdown Replaces app names for privacy when sharing data: clock/alarm apps are renamed to "clock" and every other app to "app".

dataframe.loc[dataframe['App'].str.contains('clock|alarm', case=False, na=False), 'App'] = 'clock'
dataframe.loc[dataframe['App'] != "clock", "App"] = "app"

print("App names have been removed and replaced with " + str(dataframe.App.unique()) + "\n\n")
print(dataframe.head())
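The Privacy Filter reduces every app name to one of two labels. Writing the rule as a plain function makes it easy to check on a few names before applying it. `anonymize_app` is a hypothetical helper, and matching on "alarm" as well as "clock" follows the cell's description rather than any fixed list of Android app names.

```python
import re

# Case-insensitive match for clock/alarm apps, per the cell's description.
CLOCK_PATTERN = re.compile(r"clock|alarm", re.IGNORECASE)


def anonymize_app(name: str) -> str:
    """Map clock/alarm apps to 'clock' and every other app to 'app'."""
    return "clock" if CLOCK_PATTERN.search(name) else "app"


names = ["Clock", "Google Chrome", "Alarmy", "WhatsApp"]
print([anonymize_app(n) for n in names])  # ['clock', 'app', 'clock', 'app']
```

Applied to the notebook's data, this would read `dataframe['App'] = dataframe['App'].map(anonymize_app)`.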

In [None]:
#@title Download dataframe as csv
dataframe.to_csv("AllData.csv")
files.download("AllData.csv")
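`to_csv` writes the DataFrame index as an unnamed first column and the timestamps as text such as `2020-01-05 15:04:05`. The following is a minimal standard-library sketch of reading the downloaded file back, assuming that layout. `load_activity_csv` is a hypothetical helper; with pandas, `pd.read_csv(path, index_col=0, parse_dates=['Timestamp'])` does the same job.

```python
import csv
import io
from datetime import datetime
from typing import TextIO


def load_activity_csv(handle: TextIO) -> list[tuple[str, datetime, str]]:
    """Read rows written by dataframe.to_csv('AllData.csv') back into tuples."""
    reader = csv.DictReader(handle)  # the unnamed index column gets the key ''
    return [
        (row["App"], datetime.fromisoformat(row["Timestamp"]), row["TimeZone"])
        for row in reader
    ]


sample = io.StringIO(
    ",App,Timestamp,TimeZone\n"
    "0,clock,2020-01-05 15:04:05,UTC\n"
    "1,app,2020-09-10 11:30:00,CEST\n"
)
for row in load_activity_csv(sample):
    print(row)
```

For the real file, open it with `open("AllData.csv", newline="")` and pass the handle to the same function.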

In [None]:
#@title Clean Slate
#@markdown Deletes every file with a dot in its name from the Colab working directory when run.
#@markdown
#@markdown (run AFTER downloading CSVs)
!rm *.*
print("Done and Dusted!")
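`rm *.*` only matches names that contain a dot, so files without an extension survive the clean-up. The following is a hedged pure-Python sketch of the same idea. It is restricted to regular files, so directories whose names happen to contain a dot are skipped instead of producing an error. `clean_slate` is a hypothetical helper, and the demo runs in a temporary directory rather than the Colab working directory.

```python
import tempfile
from pathlib import Path


def clean_slate(folder: Path) -> list[str]:
    """Delete regular files whose names contain a dot (like `rm *.*`); return their names."""
    removed = []
    for path in folder.glob("*.*"):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "AllData.csv").write_text("x")
    (folder / "actdata.txt").write_text("x")
    (folder / "notes").write_text("x")  # no dot: kept, as with rm *.*
    (folder / "data.d").mkdir()         # directory: skipped
    print(clean_slate(folder))                        # ['AllData.csv', 'actdata.txt']
    print(sorted(p.name for p in folder.iterdir()))   # ['data.d', 'notes']
```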