# Full version history
Natalia Vélez, June 2020


So far, we've tried different data sources to generate the version history:
 
 * OHOL Wiki (December 2019) - Out of date
 * Scraped from [changeLog](https://github.com/jasonrohrer/OneLife/blob/master/documentation/changeLog.txt) (January 2020)
 * Read the release history directly from the [OneLifeData7 repository](https://github.com/jasonrohrer/OneLifeData7/releases)
 
 
The last two sources have a lot of overlap, but some releases appear in the release history and not in the change log, and vice versa. (These include some very important releases, such as the boundless world update.) In this script, we scrape both sources and take their union to get a more complete version history.

In [1]:
import pandas as pd
import numpy as np
import datetime, re, requests

## Source 1: Release history

I navigated to an up-to-date clone of the OneLifeData7 repository and saved the tag history to a file using the following bash command:

```
git for-each-ref --format="%(refname:short)$(echo -e '\t')%(creatordate)" refs/tags/* > onelife_tag_timestamps_20200617.txt
```


Read tag history:

In [2]:
tag_orig = pd.read_csv('inputs/onelife_tag_timestamps_20200617.txt', sep='\t', names=['release','timestamp'])
tag_orig.head()

| | release | timestamp |
|---|---|---|
| 0 | OneLife_v101 | Fri May 25 23:39:23 2018 +0000 |
| 1 | OneLife_v103 | Fri Jun 1 01:44:06 2018 +0000 |
| 2 | OneLife_v104 | Fri Jun 1 17:06:36 2018 +0000 |
| 3 | OneLife_v106 | Mon Jun 4 20:00:16 2018 +0000 |
| 4 | OneLife_v108 | Sat Jun 9 00:57:56 2018 +0000 |


Clean up:

In [3]:
# Helper: Parse Git timestamp (e.g. 'Fri May 25 23:39:23 2018 +0000')
def parse_tstamp(s):
    return datetime.datetime.strptime(s, '%a %b %d %H:%M:%S %Y %z')

tag_df = tag_orig.copy()
tag_df['release'] = tag_df.release.str.replace('vStart', 'v0')
tag_df['release'] = tag_df.release.str.extract(r'([0-9]+)')
tag_df['release'] = pd.to_numeric(tag_df.release)

tag_df['timestamp'] = tag_df.timestamp.apply(parse_tstamp)
tag_df['timestamp'] = tag_df.timestamp.apply(lambda t: t.timestamp())

tag_df = tag_df.sort_values('timestamp', ignore_index = True)
tag_df.head()

| | release | timestamp |
|---|---|---|
| 0 | 0 | 1490908000.0 |
| 1 | 16 | 1492207000.0 |
| 2 | 17 | 1492226000.0 |
| 3 | 19 | 1492472000.0 |
| 4 | 20 | 1495232000.0 |
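
As a quick check of the parsing step, the sketch below runs the same format string on the first tag timestamp from the raw file and compares the result with a UTC datetime built by hand. It assumes an English locale, since `%a` and `%b` match locale-dependent day and month names. `GIT_DATE_FORMAT` and `to_epoch` are illustrative names, not part of the original notebook.

```python
import datetime

# Format of git's default %(creatordate) output, e.g. 'Fri May 25 23:39:23 2018 +0000'
GIT_DATE_FORMAT = '%a %b %d %H:%M:%S %Y %z'

def to_epoch(s):
    """Parse a git date string and return seconds since the Unix epoch."""
    parsed = datetime.datetime.strptime(s, GIT_DATE_FORMAT)
    # %z yields a timezone-aware datetime, so timestamp() does not depend on local time
    return parsed.timestamp()

sample = 'Fri May 25 23:39:23 2018 +0000'
expected = datetime.datetime(2018, 5, 25, 23, 39, 23, tzinfo=datetime.timezone.utc)
print(to_epoch(sample) == expected.timestamp())
```

Because the offset is parsed into the datetime, the conversion gives the same epoch value on any machine, whatever its local time zone.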


## Source 2: Change log

Download text file:

In [4]:
log_url = 'https://raw.githubusercontent.com/jasonrohrer/OneLife/master/documentation/changeLog.txt'
log_request = requests.get(log_url)
log_txt = log_request.text

print(log_txt)

This file only list changes to game code.  Changes to content can be found here:

http://onehouronelife.com/updateLog.php


Server Fixes

--Fixed family tree server to accept longer last words.

--Fixed bug that caused Eve spawn to jump to the west of a donkeytown homeland.
  Fixes #633.





Version 342    2020-May-30

NOTE:  must update server first this week.
NOTE:  must disable clearCurseCountsOnStartup after servers restart this week.

--Player character now flips facing direction as mouse moves to left and right.
  Fixes #623

--Object editor page shows current object ID.

--Fixed crash in Animation editor when right-click dragging with empty object.

--I FORGIVE YOU (or I FORGIVE JOHN SMITH) to forgive a personal curse.



Server Fixes

--Server no longer disconnects client when it sends an unknown message type.
  For future protocol updates, this allows client to be updated before server.

--Picking up a 3+ child to nurse them clears their starving emote.  Fixes #626

--CurseDB

Extract version numbers and dates from log:

In [5]:
# Raw string, so that \s is passed to the regex engine rather than treated as a string escape
search_txt = r'Version ([0-9]+)\s+([0-9]+-[A-Za-z]+-[0-9]+)'
log_search = re.findall(search_txt, log_txt)
print(*log_search[:10], sep='\n')

('342', '2020-May-30')
('340', '2020-May-22')
('337', '2020-May-15')
('336', '2020-May-8')
('334', '2020-May-7')
('332', '2020-May-1')
('330', '2020-April-23')
('328', '2020-April-17')
('326', '2020-April-9')
('324', '2020-April-2')
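
To see what the pattern captures, the sketch below applies it to a short excerpt laid out like the change log's version header lines. The excerpt is a hand-written sample rather than the real file, and `VERSION_PATTERN` is an illustrative name.

```python
import re

# Same pattern as above, written as a raw string so \s reaches the regex engine unchanged
VERSION_PATTERN = r'Version ([0-9]+)\s+([0-9]+-[A-Za-z]+-[0-9]+)'

# Hand-written excerpt mimicking the change log layout (not the real file)
sample_log = """
Server Fixes

--Fixed family tree server to accept longer last words.

Version 342    2020-May-30

--Object editor page shows current object ID.

Version 340    2020-May-22
"""

# Each match is a (version, date) tuple of strings
matches = re.findall(VERSION_PATTERN, sample_log)
print(matches)
```

Blocks without a version header, such as the leading `Server Fixes` entries, produce no match, so changes listed there are not attributed to any release.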


Save results to dataframe:

In [6]:
?datetime.datetime.strptime

In [7]:
log_df = pd.DataFrame(log_search, columns=['release', 'timestamp'])
log_df.release = pd.to_numeric(log_df.release)
log_df.timestamp = pd.to_datetime(log_df.timestamp)
log_df.timestamp = log_df.timestamp.apply(lambda t: t.timestamp())

log_df = log_df.sort_values('timestamp', ignore_index = True)
log_df.head()

| | release | timestamp |
|---|---|---|
| 0 | 26 | 1498867000.0 |
| 1 | 27 | 1498867000.0 |
| 2 | 28 | 1500595000.0 |
| 3 | 30 | 1501200000.0 |
| 4 | 32 | 1502755000.0 |
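
Because the change log records only a date, each log timestamp corresponds to midnight on that day. Here is a minimal sketch of the same conversion using only the standard library, with the UTC assumption made explicit. It assumes month names are spelled out in English, as in the entries printed above, and `log_date_to_epoch` is a hypothetical helper.

```python
import datetime

def log_date_to_epoch(date_str):
    """Convert a change log date such as '2020-May-30' to epoch seconds at 00:00 UTC."""
    # %B matches a full English month name ('May', 'April', ...)
    day = datetime.datetime.strptime(date_str, '%Y-%B-%d')
    # Attach UTC explicitly; a naive datetime would be interpreted in local time
    return day.replace(tzinfo=datetime.timezone.utc).timestamp()

print(log_date_to_epoch('2020-May-30'))
print(log_date_to_epoch('2020-April-9'))
```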


## Merge and save

Original dataframe dimensions:

In [8]:
tag_releases = tag_df.release.values
log_releases = log_df.release.values

print('Release history: %i entries' % len(tag_releases))
print('Change logs: %i entries' % len(log_releases))

Release history: 185 entries
Change logs: 140 entries


Overlaps between the two?

In [9]:
duplicate_releases = np.intersect1d(tag_releases, log_releases)
print('Overlaps: %s' % len(duplicate_releases))
print(duplicate_releases)

Overlaps: 1
[27]
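
Beyond the overlap, it can help to list which releases are unique to each source. The sketch below does this with NumPy's set routines on small made-up arrays; the values are placeholders, not the real release numbers.

```python
import numpy as np

# Placeholder release numbers for illustration only
tag_releases = np.array([0, 16, 17, 27, 101])
log_releases = np.array([26, 27, 28, 30])

in_both = np.intersect1d(tag_releases, log_releases)   # present in both sources
only_tags = np.setdiff1d(tag_releases, log_releases)   # in the release history only
only_log = np.setdiff1d(log_releases, tag_releases)    # in the change log only
all_releases = np.union1d(tag_releases, log_releases)  # seen in either source

print(in_both, only_tags, only_log, all_releases)
```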


Where a release appears in both sources, use the timestamp from the release history, which is more accurate (it records a time of day, while the change log gives only a date):

In [10]:
log_df_nodupe = log_df[~log_df.release.isin(duplicate_releases)].reset_index(drop=True)
log_df_nodupe

| | release | timestamp |
|---|---|---|
| 0 | 26 | 1.498867e+09 |
| 1 | 28 | 1.500595e+09 |
| 2 | 30 | 1.501200e+09 |
| 3 | 32 | 1.502755e+09 |
| 4 | 35 | 1.510704e+09 |
| ... | ... | ... |
| 134 | 334 | 1.588810e+09 |
| 135 | 336 | 1.588896e+09 |
| 136 | 337 | 1.589501e+09 |
| 137 | 340 | 1.590106e+09 |


Merge dataframes:

In [11]:
ver_df = pd.concat([tag_df,log_df_nodupe]).drop_duplicates().reset_index(drop=True)
ver_df

| | release | timestamp |
|---|---|---|
| 0 | 0 | 1.490908e+09 |
| 1 | 16 | 1.492207e+09 |
| 2 | 17 | 1.492226e+09 |
| 3 | 19 | 1.492472e+09 |
| 4 | 20 | 1.495232e+09 |
| ... | ... | ... |
| 319 | 334 | 1.588810e+09 |
| 320 | 336 | 1.588896e+09 |
| 321 | 337 | 1.589501e+09 |
| 322 | 340 | 1.590106e+09 |
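
The de-duplication and merge steps can be tried end to end on toy data. The sketch below uses made-up release numbers and timestamps, and gives the release history precedence when a release appears in both frames. Sorting the result by timestamp is an extra step added here for readability, not something the cells above do.

```python
import pandas as pd

# Toy stand-ins for tag_df and log_df (values are made up)
tag_df = pd.DataFrame({'release': [0, 16, 27], 'timestamp': [100.0, 200.0, 310.0]})
log_df = pd.DataFrame({'release': [26, 27, 28], 'timestamp': [290.0, 300.0, 400.0]})

# Drop change log rows whose release already appears in the release history
log_only = log_df[~log_df.release.isin(tag_df.release)]

# Stack the two sources and order the result chronologically
ver_df = (pd.concat([tag_df, log_only])
            .sort_values('timestamp', ignore_index=True))

print(ver_df)
```

Release 27 appears once in the result, with the timestamp from the toy release history (310.0) rather than the one from the toy change log (300.0).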


Save to file:

In [12]:
ver_df.to_csv('outputs/version_history.tsv', sep='\t', index=False)