# Full version history
Natalia Vélez, June 2020


So far, we've tried three data sources for generating the version history:
 
 * The OHOL Wiki (as of December 2019), which is out of date
 * The release history of the [OneLife repository](https://github.com/jasonrohrer/OneLife/releases)
 * The release history of the [OneLifeData7 repository](https://github.com/jasonrohrer/OneLifeData7/releases)
 
 
The last two sources overlap heavily, but some releases appear only in the game updates and others only in the data updates. (The releases missing from one source or the other include some very important ones, such as the boundless world update.) In this script, we read the tag history of both repositories and take the union of the two to get a more complete version history.
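
As a minimal sketch of what "taking the union" means here, the snippet below uses two small made-up sets of release numbers (not the real data) and shows which releases each source is missing. The variable names are hypothetical and chosen for illustration only.

```python
# Made-up release numbers for illustration; the real lists come from the tag files.
data_only_example = {1, 2, 4}   # stands in for the OneLifeData7 tags
game_only_example = {1, 3, 4}   # stands in for the OneLife tags

# Releases present in at least one source
all_releases = sorted(data_only_example | game_only_example)

# Releases that one source has and the other lacks
missing_from_game = sorted(data_only_example - game_only_example)
missing_from_data = sorted(game_only_example - data_only_example)

print(all_releases)
print(missing_from_game, missing_from_data)
```

Neither source alone covers every release in this toy example, which is the situation the union is meant to handle.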

In [1]:
import pandas as pd
import numpy as np
import datetime, re, requests

# Building release history

I navigated to up-to-date clones of the OneLifeData7 and OneLife repositories and, in each one, saved the tag history to a file using the following bash command (changing the output filename to match the repository and the export date):

```
git for-each-ref --format="%(refname:short)$(echo -e '\t')%(creatordate)" refs/tags/* > onelife_tag_timestamps_20190709.txt
```
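
Each line of the resulting file holds a tag name and a creation date separated by a tab. The sketch below parses one such line with the standard library, assuming git's default date format (for example `Thu Dec 29 14:58:31 2016 -0800`) and an English/C locale for the day and month names. The helper `parse_tag_line` is hypothetical, and the sample line is reconstructed from the first row of the table shown further down rather than copied from the actual file.

```python
from datetime import datetime

def parse_tag_line(line):
    """Split one 'tag<TAB>date' line into (tag, timezone-aware datetime)."""
    tag, date_str = line.rstrip('\n').split('\t')
    # Assumed format: git's default date style, e.g. 'Thu Dec 29 14:58:31 2016 -0800'
    created = datetime.strptime(date_str, '%a %b %d %H:%M:%S %Y %z')
    return tag, created

# Sample line reconstructed for illustration
tag, created = parse_tag_line('OneLife_v1\tThu Dec 29 14:58:31 2016 -0800\n')
print(tag, created.isoformat(), created.timestamp())
```

In the notebook itself, `pd.read_csv` and `pd.to_datetime` do this work for the whole file at once; the sketch only shows what a single row contains.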


Read tag history:

In [2]:
# Source: jasonrohrer/OneLifeData7
tag_onelifedata = pd.read_csv('inputs/onelifedata_tag_timestamps_2021-04-07.txt', sep='\t', names=['release','timestamp'])

# Source: jasonrohrer/OneLife
tag_onelife = pd.read_csv('inputs/onelife_tag_timestamps_2021-04-07.txt', sep='\t', names=['release','timestamp'])

# Merge
tag_orig = pd.concat([tag_onelifedata, tag_onelife])
tag_orig['timestamp'] = pd.to_datetime(tag_orig.timestamp)
tag_orig = tag_orig.sort_values(by = 'timestamp')

tag_orig.head()

Unnamed: 0,release,timestamp
1,OneLife_v1,2016-12-29 14:58:31-08:00
138,OneLife_v5,2017-01-03 11:38:36-08:00
154,OneLife_v8,2017-01-10 08:19:18-08:00
20,OneLife_v14,2017-01-20 17:12:38-08:00
196,OneLife_vStart,2017-03-30 14:10:24-07:00


Clean up:

In [3]:
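tag_df = tag_orig.copy()

# Treat the 'vStart' tag as release 0, then keep only the first run of digits in each tag
tag_df['release'] = tag_df.release.str.replace('vStart', 'v0')
tag_df['release'] = tag_df.release.str.extract(r'([0-9]+)')
tag_df['release'] = pd.to_numeric(tag_df.release)

# Convert timestamps to seconds since the Unix epoch
tag_df['timestamp'] = tag_df.timestamp.apply(lambda t: t.timestamp())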

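# Sort chronologically, then drop tags without a release number as well as release 0
tag_df = tag_df.sort_values('timestamp', ignore_index = True)
tag_df = tag_df[np.isfinite(tag_df['release']) & (tag_df['release'] > 0)]
# np.int was removed in NumPy 1.24; the built-in int is equivalent here
tag_df['release'] = tag_df['release'].astype(int)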
tag_df

Unnamed: 0,release,timestamp
0,1,1.483052e+09
1,5,1.483472e+09
2,8,1.484065e+09
3,14,1.484961e+09
5,16,1.492207e+09
...,...,...
350,359,1.600734e+09
351,360,1.601062e+09
352,361,1.601071e+09
353,362,1.603995e+09
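
To make the clean-up step concrete, here is a small sketch that applies the same sequence of string operations to a handful of tag names. `OneLife_v1`, `OneLife_vStart` and `OneLife_v16` appear in the data above; `not_a_release` is a made-up tag added to show what happens when no digits are found.

```python
import pandas as pd

tags = pd.Series(['OneLife_v1', 'OneLife_vStart', 'OneLife_v16', 'not_a_release'])

# 'vStart' becomes 'v0' so that it yields a number like every other tag
numbers = tags.str.replace('vStart', 'v0', regex=False)
# Pull out the first run of digits; tags without digits become missing values
numbers = numbers.str.extract(r'([0-9]+)', expand=False)
numbers = pd.to_numeric(numbers)

# Keep only tags with a release number greater than 0
kept = numbers[numbers.notna() & (numbers > 0)].astype(int)
print(kept.tolist())
```

Note that the pattern picks up the first run of digits in the tag, so a hypothetical tag such as `OneLifeData7_v3` would yield 7 rather than 3. This is not a problem for tags of the form `OneLife_vN`.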


Check for duplicate tags:

In [4]:
data_releases = tag_onelifedata.release.values
game_releases = tag_onelife.release.values

print('OneLifeData7 releases: %i entries' % len(data_releases))
print('OneLife releases: %i entries' % len(game_releases))
print('Latest update: %i' % np.max(tag_df.release))

OneLifeData7 releases: 197 entries
OneLife releases: 159 entries
Latest update: 363


Are there overlaps between the two?

In [5]:
duplicate_releases = np.intersect1d(data_releases, game_releases)
print('Overlaps: %s' % len(duplicate_releases))
print(duplicate_releases)

Overlaps: 2
['OneLife_v16' 'OneLife_v20']


In [6]:
# Keep the earliest timestamp for each release, then sort chronologically
tag_df_nodupe = tag_df.groupby('release')['timestamp'].min().reset_index()
tag_df_nodupe = tag_df_nodupe.sort_values(by='timestamp')
tag_df_nodupe

Unnamed: 0,release,timestamp
0,1,1.483052e+09
1,5,1.483472e+09
2,8,1.484065e+09
3,14,1.484961e+09
4,16,1.492207e+09
...,...,...
347,359,1.600734e+09
348,360,1.601062e+09
349,361,1.601071e+09
350,362,1.603995e+09
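
The de-duplication above keeps the earliest timestamp whenever a release number is tagged in both repositories. A minimal sketch on made-up numbers (the timestamps are not real) shows the effect:

```python
import pandas as pd

# Made-up example: releases 16 and 20 are tagged twice, with different timestamps
toy = pd.DataFrame({
    'release': [16, 16, 20, 20, 21],
    'timestamp': [1000.0, 1005.0, 2010.0, 2000.0, 3000.0],
})

# One row per release, keeping the smallest (earliest) timestamp
earliest = toy.groupby('release')['timestamp'].min().reset_index()
print(earliest)
```

Taking the minimum is one reasonable choice, on the assumption that a release counts as published as soon as either repository is tagged.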


Save to file:

In [7]:
tag_df_nodupe.to_csv('outputs/version_history.tsv', sep='\t', index=None)
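
As a rough sketch of how the saved file might be consumed later, the snippet below writes a two-row table in the same tab-separated layout to an in-memory buffer (standing in for `outputs/version_history.tsv`), reads it back, and converts the epoch seconds to UTC datetimes. The two rows correspond to the first two releases in the tables above, with timestamps written out to the second.

```python
import io
import pandas as pd

# Two rows in the same layout as the saved file (release, epoch seconds)
history = pd.DataFrame({'release': [1, 5],
                        'timestamp': [1483052311.0, 1483472316.0]})

# An in-memory buffer stands in for the file on disk
buf = io.StringIO()
history.to_csv(buf, sep='\t', index=False)
buf.seek(0)

loaded = pd.read_csv(buf, sep='\t')
# Convert seconds since the Unix epoch back to timezone-aware datetimes
loaded['date'] = pd.to_datetime(loaded['timestamp'], unit='s', utc=True)
print(loaded)
```

Because the timestamps are stored as epoch seconds, the original UTC offsets (-08:00 or -07:00) are not kept in the file; the dates come back in UTC.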