Have something you would like to do but can't get started with Python? Write a comment in this Kernel and I will try to get you some starter code for working with the desired data. 

Current sections 

 - how to add region information to match.csv
 - how to add patch version to match.csv
 - join purchase_log.csv and match.csv

**NOTE** This is a working document so things will be a little messy at times. I will try to regularly tidy things and leave notes as to the status of various sections. 

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

### Add region information to match table
*This section is usable* 

Left join the tables match.csv and cluster_regions.csv on cluster using pandas.merge().  
Take a look at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge.html for information on merge. The documentation on this page was also helpful: http://pandas.pydata.org/pandas-docs/stable/merging.html 

In [None]:
match = pd.read_csv('../input/match.csv')
cluster_regions = pd.read_csv('../input/cluster_regions.csv')

In [None]:
match.iloc[:5,8:]

In [None]:
cluster_regions.head()

In [None]:
match = pd.merge(match, cluster_regions, how='left', on='cluster')

In [None]:
match.iloc[:5,8:]
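A left join keeps every match even when its cluster has no entry in the region table, so it can be worth checking for unmatched rows. The following is a minimal sketch on made-up toy tables (the cluster ids and region labels are hypothetical, not taken from the data); `indicator=True` adds a `_merge` column and `validate='many_to_one'` raises if a cluster appears more than once in the region table.

```python
import pandas as pd

# Toy stand-ins for match.csv and cluster_regions.csv; all values are made up.
match = pd.DataFrame({'match_id': [0, 1, 2], 'cluster': [155, 151, 999]})
cluster_regions = pd.DataFrame({'cluster': [151, 155],
                                'region': ['region_a', 'region_b']})

merged = pd.merge(match, cluster_regions, how='left', on='cluster',
                  validate='many_to_one',  # each cluster maps to at most one region
                  indicator=True)          # adds a '_merge' column

# Rows that found no region have '_merge' == 'left_only' and a NaN region.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched)
```

If `unmatched` is empty on the real data, every match received a region.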

### Patch Version
*requires cleaning up*

Shows how to get patch version from start time. 

In [None]:
patch_dates = pd.read_csv('../input/patch_dates.csv')
patch_dates['patch_date'] = pd.to_datetime(patch_dates['patch_date'])
patch_dates.iloc[-5:]

In [None]:
# start_time is stored as seconds since the Unix epoch
match['start_time'] = pd.to_datetime(match['start_time'], unit='s')

In [None]:
match.iloc[:5,:5]
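As a quick check of what `unit='s'` does, here is a minimal sketch on made-up values (not taken from match.csv): the integers are read as seconds since 1970-01-01.

```python
import pandas as pd

# Made-up epoch values for illustration only.
start_time = pd.Series([0, 86400, 90061])

# 86400 seconds is one day; 90061 is one day, one hour, one minute and one second.
converted = pd.to_datetime(start_time, unit='s')
print(converted)
```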

In [None]:
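def get_patch_version(start_time, patch_date):
    """Determine patch version based on date

       Assumes patch_date is sorted by 'patch_date' in ascending order and
       that the patch name is in the second column.
       There are faster ways to do this if processing more data.
       Returns None if start_time is not after the first patch date.
    """
    version = None
    for e, date in enumerate(patch_date['patch_date']):
        if start_time <= date:
            break
        # this patch was released before the match started
        version = patch_date.iloc[e, 1]
    return version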

In [None]:
match.loc[:,'patch_version'] = match.loc[:,'start_time'].apply(get_patch_version, patch_date=patch_dates)

In [None]:
match.loc[:,['start_time','patch_version']].head()

In [None]:
match.loc[:,['start_time','patch_version']].tail()
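The row-by-row lookup above is fine for a first pass. For more data, one possible faster route is a sorted search with `np.searchsorted`. This is only a sketch on made-up toy tables: the dates and the patch names (`patch_a` and so on) are hypothetical, and the column name `name` is an assumption about patch_dates.csv. It uses the same convention as the function above, where a match starting exactly on a patch date is assigned the previous patch.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for patch_dates.csv and match.csv; all values are made up.
patch_dates = pd.DataFrame({
    'patch_date': pd.to_datetime(['2015-01-01', '2015-06-01', '2015-12-01']),
    'name': ['patch_a', 'patch_b', 'patch_c'],
})
match = pd.DataFrame({
    'match_id': [0, 1, 2, 3],
    'start_time': pd.to_datetime(['2014-12-25', '2015-03-10',
                                  '2015-06-01', '2016-02-02']),
})

dates = patch_dates['patch_date'].to_numpy()  # must be sorted ascending
starts = match['start_time'].to_numpy()
names = patch_dates['name'].to_numpy()

# side='left' counts the patch dates strictly before each start time;
# subtracting 1 gives the index of the most recent such patch (-1 if none).
idx = np.searchsorted(dates, starts, side='left') - 1

# Matches that start before the first patch date get None.
patch_version = np.where(idx >= 0, names[np.clip(idx, 0, None)], None)
match['patch_version'] = patch_version
print(match)
```

Using `side='right'` instead would assign a match that starts exactly on a patch date to the new patch.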

In [None]:
test_matches = pd.read_csv('../input/test_labels.csv')
test_players = pd.read_csv('../input/test_player.csv')

In [None]:
test_matches.head()

In [None]:
test_players.head()

From this it becomes apparent that patch version is not available in the test set. I will fix this for the next update of the data. 

In [None]:
%xdel test_matches
%xdel test_players

### Join purchase log with match
*Unpolished, could use editing* 

This shows one possible way to join purchase_log with match. Aggregation of some sort is required. This example finds the time of the first tpscroll purchase.

In [None]:
purchase_log = pd.read_csv('../input/purchase_log.csv')

In [None]:
purchase_log.head()

In [None]:
purchase_log['player_slot'].value_counts()

In [None]:
players_info = pd.read_csv('../input/players.csv',usecols=['match_id','account_id','player_slot'])

In [None]:
players_info.head(10)

In [None]:
match.head()

purchase_log cannot be merged with *players* or *match* directly, because it holds one row per purchase rather than one row per player or match.

In [None]:
example_log = purchase_log.query('match_id == 0 and player_slot == 0').copy()
example_log

Let's first replace the item_ids with item names to make this a little more intelligible. 

In [None]:
item_id_names = pd.read_csv('../input/item_ids.csv')
item_id_names.head()

In [None]:
# for the above example we can use item_id_names to replace the item ids. For the full data this would probably 
# cause a memory error as pandas is memory inefficient when replacing. 
# Assign the result back instead of using inplace=True on a .loc selection, which may not modify example_log.
example_log['item_id'] = example_log['item_id'].replace(item_id_names['item_id'].values.tolist(),
                                                        item_id_names['item_name'].values.tolist())

In [None]:
example_log.head()
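For the full purchase log, a lookup with `Series.map` may be lighter than `replace` with two long lists. This is a sketch under assumptions: the toy ids and names below are made up, and only the column names `item_id` and `item_name` come from item_ids.csv as used above.

```python
import pandas as pd

# Toy stand-ins for item_ids.csv and purchase_log.csv; all values are made up.
item_id_names = pd.DataFrame({'item_id': [29, 44, 46],
                              'item_name': ['boots', 'tango', 'tpscroll']})
log = pd.DataFrame({'item_id': [46, 44, 46, 7]})

# Build a Series indexed by item_id and use it as a lookup table.
id_to_name = item_id_names.set_index('item_id')['item_name']
log['item_name'] = log['item_id'].map(id_to_name)
print(log)
```

Unlike `replace`, `map` yields NaN for ids that are missing from the lookup table (the id 7 here), which also makes unknown items easy to spot.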

In [None]:
# now for instance say we were only interested in the time of first tpscroll purchase
first_tp_purch = example_log[example_log['item_id'] == 'tpscroll'].iloc[0,1]
first_tp_purch

In [None]:
# let's turn this into a function that returns NaN instead of failing when there are no tpscroll purchases
# note I didn't ac
def first_tpscroll_purch(log,item_id_names):
    """
    :param log: all item purchases for a single player during a single match
    :param item_id_names: table mapping item_id to item_name
    :return: time of first tpscroll purchase, or np.nan if there is none
    """
    names = log['item_id'].replace(item_id_names['item_id'].values.tolist(),
                                   item_id_names['item_name'].values.tolist())
    tp_purchases = log[names == 'tpscroll']
    if tp_purchases.empty:
        return np.nan
    return tp_purchases.iloc[0,1]

In [None]:
# now use groupby on purchase log, 
purch_group = purchase_log.groupby(['match_id','player_slot'])

In [None]:
# take a look at a group
for group in purch_group:
    print(group)
    break

In [None]:
# note this is unfinished

first_purch_arr = np.zeros((100, 3)) # going to start with 100 to see if it breaks anything
for e,(key,group_log) in enumerate(purch_group):
    if e >= 100:
        break
    first_purch_arr[e,0] = key[0] # match_id
    first_purch_arr[e,1] = key[1] # player_slot
    first_purch_arr[e,2] = first_tpscroll_purch(group_log,item_id_names)

In [None]:
# keep the purchase time as float so that missing values (NaN) survive
first_purch_df = pd.DataFrame(first_purch_arr, columns=['match_id','player_slot','tpscroll_first_purch'])
first_purch_df = first_purch_df.astype({'match_id': int, 'player_slot': int})

In [None]:
first_purch_df

This can now be merged with *players*.

In [None]:
tmp = pd.merge(first_purch_df, players_info, how='left',
               left_on=['match_id','player_slot'],
               right_on=['match_id','player_slot'])

In [None]:
# now merge with match if you want, but this creates redundant info
# (the match columns are repeated for every player in the match)
tmp2 = pd.merge(tmp, match, how='left', on='match_id')

In [None]:
tmp

In [None]:
tmp2.head()

The above can easily cause a memory error if done on a larger amount of data. Some more aggregation is probably a better idea than merging with *match*.
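One way to aggregate before merging is to filter the log down to the item of interest and let `groupby` do the work, which avoids the Python loop over groups. This is a sketch on made-up toy tables: the ids, times and account ids are hypothetical, and the column name `time` is an assumption about purchase_log.csv. It takes the minimum time per player, which matches "first purchase" only in the sense of the earliest time.

```python
import pandas as pd

# Toy stand-ins for purchase_log.csv, item_ids.csv and players.csv; values are made up.
purchase_log = pd.DataFrame({
    'item_id':     [46, 44, 46, 44, 29],
    'time':        [-80, -75, 300, 10, 120],
    'player_slot': [0, 0, 0, 1, 1],
    'match_id':    [0, 0, 0, 0, 0],
})
item_id_names = pd.DataFrame({'item_id': [29, 44, 46],
                              'item_name': ['boots', 'tango', 'tpscroll']})
players_info = pd.DataFrame({'match_id': [0, 0],
                             'account_id': [10, 11],
                             'player_slot': [0, 1]})

# Look up the numeric id once instead of renaming every row of the log.
tp_id = item_id_names.loc[item_id_names['item_name'] == 'tpscroll', 'item_id'].iloc[0]

# Earliest tpscroll purchase per (match_id, player_slot).
first_tp = (purchase_log[purchase_log['item_id'] == tp_id]
            .groupby(['match_id', 'player_slot'])['time']
            .min()
            .rename('tpscroll_first_purch')
            .reset_index())

# Players without a tpscroll purchase get NaN after the left merge.
merged = players_info.merge(first_tp, how='left', on=['match_id', 'player_slot'])
print(merged)
```

The result has one row per player, so merging it onward does not multiply rows the way the raw purchase log would.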