# Full version history
Natalia Vélez, June 2020


So far, we've tried different data sources to generate the version history:
 
 * OHOL Wiki (December 2019) - Out of date
 * Scraped from [changeLog](https://github.com/jasonrohrer/OneLife/blob/master/documentation/changeLog.txt) (January 2020) - Skips over certain versions
 
Now, we're going to read the release history directly from the [OneLifeData7 repository](https://github.com/jasonrohrer/OneLifeData7/releases). I navigated to an up-to-date local copy of the repository and saved the tag history using the following bash command:

```bash
git for-each-ref --format="%(refname:short)$(echo -e '\t')%(creatordate)" refs/tags/* > onelife_tag_timestamps_20200617.txt
```

I then saved the resulting .txt file to `inputs/`. In this notebook, we clean it up and parse the timestamps.
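As a minimal illustration of the timestamp-parsing step, here is how a single line of the tag file could be parsed with only the standard library. It assumes git's default `creatordate` format (as in the sample output below) and an English/C locale, since `%a` and `%b` are locale-dependent; the line itself is copied from that sample output.

```python
import datetime

# One line of the tag file, copied from the sample output: "<tag>\t<creatordate>"
line = "OneLife_v103\tFri Jun 1 01:44:06 2018 +0000"
tag, date_str = line.split("\t")

# git's default date format; %z consumes the "+0000" UTC offset.
# strptime matches each space in the format against one or more whitespace characters,
# and %d accepts an unpadded "1", so single-digit days parse as-is.
parsed = datetime.datetime.strptime(date_str, "%a %b %d %H:%M:%S %Y %z")

print(tag, parsed.isoformat(), parsed.timestamp())
# OneLife_v103 2018-06-01T01:44:06+00:00 1527817446.0
```

Because `%z` makes the result timezone-aware, `.timestamp()` returns POSIX seconds without depending on the machine's local timezone.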

```python
import pandas as pd
import datetime, re
```

Input file:

```python
log_original = pd.read_csv('inputs/onelife_tag_timestamps_20200617.txt', sep='\t', names=['release','timestamp'])
log_original.head()
```

|   | release | timestamp |
|---|---|---|
| 0 | OneLife_v101 | Fri May 25 23:39:23 2018 +0000 |
| 1 | OneLife_v103 | Fri Jun 1 01:44:06 2018 +0000 |
| 2 | OneLife_v104 | Fri Jun 1 17:06:36 2018 +0000 |
| 3 | OneLife_v106 | Mon Jun 4 20:00:16 2018 +0000 |
| 4 | OneLife_v108 | Sat Jun 9 00:57:56 2018 +0000 |


Clean up the release names and timestamps:


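```python
# Helper: parse a git creatordate string, e.g. 'Fri May 25 23:39:23 2018 +0000'
def parse_tstamp(s):
    return datetime.datetime.strptime(s, '%a %b %d %H:%M:%S %Y %z')

log_df = log_original.copy()

# Release names -> version numbers ('OneLife_v101' -> 101); the 'vStart' tag becomes 0
log_df['release'] = log_df.release.str.replace('vStart', 'v0', regex=False)
log_df['release'] = log_df.release.str.extract(r'([0-9]+)', expand=False)  # Series, not a 1-column DataFrame
log_df['release'] = pd.to_numeric(log_df.release)

# Timestamps -> timezone-aware datetimes -> POSIX seconds (float)
log_df['timestamp'] = log_df.timestamp.apply(parse_tstamp)
log_df['timestamp'] = log_df.timestamp.apply(lambda t: t.timestamp())

# Sort chronologically and renumber the index
log_df = log_df.sort_values('timestamp', ignore_index=True)
log_df.head()
```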


|   | release | timestamp |
|---|---|---|
| 0 | 0 | 1490908000.0 |
| 1 | 16 | 1492207000.0 |
| 2 | 17 | 1492226000.0 |
| 3 | 19 | 1492472000.0 |
| 4 | 20 | 1495232000.0 |
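The cleaning steps above can be checked on a few toy rows. This is a self-contained sketch: the first two tags and dates are copied from the sample output, while `OneLife_vStart` and its date are hypothetical stand-ins for the tag that the `vStart` replacement targets.

```python
import datetime

import pandas as pd


def parse_tstamp(s):
    """Parse a git creatordate string such as 'Fri May 25 23:39:23 2018 +0000'."""
    return datetime.datetime.strptime(s, "%a %b %d %H:%M:%S %Y %z")


# Toy input: first two rows from the sample output; the third row is hypothetical
toy = pd.DataFrame({
    "release": ["OneLife_v101", "OneLife_v103", "OneLife_vStart"],
    "timestamp": [
        "Fri May 25 23:39:23 2018 +0000",
        "Fri Jun 1 01:44:06 2018 +0000",
        "Thu Mar 30 20:00:00 2017 +0000",  # hypothetical date
    ],
})

clean = toy.copy()
# "vStart" carries no version number, so map it to 0 before extracting digits
clean["release"] = clean["release"].str.replace("vStart", "v0", regex=False)
# expand=False returns a Series for a single capture group
clean["release"] = pd.to_numeric(clean["release"].str.extract(r"([0-9]+)", expand=False))
# datetime -> POSIX seconds (float), which sorts chronologically
clean["timestamp"] = clean["timestamp"].apply(lambda s: parse_tstamp(s).timestamp())
clean = clean.sort_values("timestamp", ignore_index=True)

print(clean)
```

Sorting by timestamp rather than by release number keeps the table in the order the tags were actually created, and `ignore_index=True` renumbers the rows 0, 1, 2, ... after sorting.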


Save to file:

```python
log_df.to_csv('outputs/version_history.tsv', sep='\t', index=False)
```
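Because the saved file stores timestamps as POSIX seconds, a downstream script would need to convert them back to dates. A minimal sketch, using an in-memory stand-in for `outputs/version_history.tsv` whose two rows are derived from the sample tags `OneLife_v101` and `OneLife_v103` (the `date` column name is an illustrative choice, not part of the original output):

```python
import io

import pandas as pd

# Stand-in for outputs/version_history.tsv: POSIX seconds (UTC) for releases 101 and 103
tsv = "release\ttimestamp\n101\t1527291563.0\n103\t1527817446.0\n"

history = pd.read_csv(io.StringIO(tsv), sep="\t")
# unit="s" reads the numbers as seconds since the epoch; utc=True keeps them timezone-aware
history["date"] = pd.to_datetime(history["timestamp"], unit="s", utc=True)

print(history)
```

Storing plain seconds keeps the TSV easy to sort and compare numerically; the UTC offset is not stored, which loses nothing here only because every sample date is `+0000`.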