## Problem Statement
The input data records games played by a set of players.

Each row contains a transaction id, the player id, and the start time of one game.

For each row, we want to count how many games the same player started within 5 minutes before or after that row's start time (inclusive), counting the row itself.
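In symbols, write $p_i$ and $t_i$ for the player id and start time of row $i$ (notation introduced here for clarity). The window is inclusive on both ends, which the expected output requires: the games at 09:16 and 09:21 each count the other.

$$
\text{count}_i = \left|\left\{\, j \;:\; p_j = p_i \;\text{ and }\; |t_j - t_i| \le 5\ \text{min} \,\right\}\right|
$$

Because $j = i$ always qualifies, every count is at least 1.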

**For example, consider the following sample:**
```
trans_id, player_id, start_time
0, A, 09:15
1, B, 09:18
2, A, 09:16
3, A, 09:21
4, B, 09:25
5, B, 09:35
6, B, 09:38
7, A, 09:40
```

**The expected output is:**
```
trans_id, player_id, start_time, count
0, A, 09:15, 2
1, B, 09:18, 1
2, A, 09:16, 3
3, A, 09:21, 2
4, B, 09:25, 1
5, B, 09:35, 2
6, B, 09:38, 2
7, A, 09:40, 1
```

**Explanation:**
Take the third transaction (trans_id: 2): player A started a game at 09:16.
Player A's other games started at 09:15, 09:21, and 09:40.
The games within 5 minutes before or after the current record (trans_id: 2, start_time: 09:16), i.e. between 09:11 and 09:21 inclusive, are 09:15 (trans_id: 0) and 09:21 (trans_id: 3). The last transaction (trans_id: 7) started at 09:40, which falls outside the 5-minute window, so it is not counted.

Hence the count for transaction 2 is 2 (09:15, 09:21) + 1 (the record itself) = 3.
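The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library; `a_starts` and `parse` are illustrative names introduced here, and the dates default to 1900-01-01 because only the time of day is given.

```python
from datetime import datetime, timedelta

# Player A's start times from the sample (trans_id -> "HH:MM")
a_starts = {0: "09:15", 2: "09:16", 3: "09:21", 7: "09:40"}

def parse(hhmm):
    # Only the time of day matters here; strptime fills in the date 1900-01-01
    return datetime.strptime(hhmm, "%H:%M")

current = parse(a_starts[2])  # trans_id 2 starts at 09:16
window = timedelta(minutes=5)

# Keep every start within [09:11, 09:21], inclusive; trans_id 2 matches itself
in_window = [tid for tid, t in a_starts.items() if abs(parse(t) - current) <= window]
print(in_window)       # [0, 2, 3]
print(len(in_window))  # 3
```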

In [1]:
import numpy as np
import pandas as pd

# Five player ids and the 31 one-minute start times from 09:00 to 09:30
cust_id = list("ABCDE")
st_time = pd.Series(pd.date_range("2025-01-01 09:00:00", "2025-01-01 09:30:00", freq="min"))

def gen_random_data(N):
    # N rows, each with a random player and a random start time (sampled with replacement)
    df = pd.DataFrame()
    df["trans_id"] = range(N)
    df["player_id"] = np.random.choice(cust_id, N)
    df["start_time"] = np.random.choice(st_time, N)
    return df

In [50]:


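def get_count(data_all, PLAYER_ID, START_TIME):
    # Work on a copy so the caller's DataFrame is not modified in place
    data_all = data_all.copy()
    # Window bounds: 5 minutes before and after each record's start time
    data_all["left"] = data_all[START_TIME] - pd.Timedelta(minutes=5)
    data_all["right"] = data_all[START_TIME] + pd.Timedelta(minutes=5)

    def helper(x):
        # Rows of the same player whose start time lies in [left, right] (inclusive);
        # the current row always matches itself
        tmp = data_all[(data_all[PLAYER_ID] == x[PLAYER_ID])
                       & (x["left"] <= data_all[START_TIME])
                       & (data_all[START_TIME] <= x["right"])]
        return len(tmp)

    # Row-wise apply: each row scans the whole frame, so the cost is O(N^2)
    data_all["count"] = data_all.apply(helper, axis=1)
    return data_all[["trans_id", PLAYER_ID, START_TIME, "count"]]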

In [51]:
sample = pd.DataFrame()
sample["trans_id"] = range(8)
sample["player_id"] = ["A", "B", "A", "A", "B", "B", "B", "A"]
sample["start_time"] = pd.Series(pd.to_datetime(["09:15", "09:18", "09:16", "09:21", "09:25", "09:35", "09:38", "09:40"], format="%H:%M"))
sample

|   | trans_id | player_id | start_time |
|---|---|---|---|
| 0 | 0 | A | 1900-01-01 09:15:00 |
| 1 | 1 | B | 1900-01-01 09:18:00 |
| 2 | 2 | A | 1900-01-01 09:16:00 |
| 3 | 3 | A | 1900-01-01 09:21:00 |
| 4 | 4 | B | 1900-01-01 09:25:00 |
| 5 | 5 | B | 1900-01-01 09:35:00 |
| 6 | 6 | B | 1900-01-01 09:38:00 |
| 7 | 7 | A | 1900-01-01 09:40:00 |


In [52]:
%%time
get_count(sample, "player_id", "start_time")

CPU times: user 5.69 ms, sys: 10.4 ms, total: 16.1 ms
Wall time: 15.1 ms


|   | trans_id | player_id | start_time | count |
|---|---|---|---|---|
| 0 | 0 | A | 1900-01-01 09:15:00 | 2 |
| 1 | 1 | B | 1900-01-01 09:18:00 | 1 |
| 2 | 2 | A | 1900-01-01 09:16:00 | 3 |
| 3 | 3 | A | 1900-01-01 09:21:00 | 2 |
| 4 | 4 | B | 1900-01-01 09:25:00 | 1 |
| 5 | 5 | B | 1900-01-01 09:35:00 | 2 |
| 6 | 6 | B | 1900-01-01 09:38:00 | 2 |
| 7 | 7 | A | 1900-01-01 09:40:00 | 1 |


In [55]:
df = gen_random_data(10**4)
df

|   | trans_id | player_id | start_time |
|---|---|---|---|
| 0 | 0 | B | 2025-01-01 09:04:00 |
| 1 | 1 | C | 2025-01-01 09:28:00 |
| 2 | 2 | A | 2025-01-01 09:24:00 |
| 3 | 3 | E | 2025-01-01 09:24:00 |
| 4 | 4 | A | 2025-01-01 09:23:00 |
| ... | ... | ... | ... |
| 9995 | 9995 | B | 2025-01-01 09:14:00 |
| 9996 | 9996 | A | 2025-01-01 09:15:00 |
| 9997 | 9997 | B | 2025-01-01 09:24:00 |
| 9998 | 9998 | A | 2025-01-01 09:11:00 |


In [56]:
%%time
get_count(df, "player_id", "start_time")

CPU times: user 10 s, sys: 32.3 ms, total: 10.1 s
Wall time: 10.1 s


|   | trans_id | player_id | start_time | count |
|---|---|---|---|---|
| 0 | 0 | B | 2025-01-01 09:04:00 | 698 |
| 1 | 1 | C | 2025-01-01 09:28:00 | 493 |
| 2 | 2 | A | 2025-01-01 09:24:00 | 744 |
| 3 | 3 | E | 2025-01-01 09:24:00 | 720 |
| 4 | 4 | A | 2025-01-01 09:23:00 | 746 |
| ... | ... | ... | ... | ... |
| 9995 | 9995 | B | 2025-01-01 09:14:00 | 703 |
| 9996 | 9996 | A | 2025-01-01 09:15:00 | 709 |
| 9997 | 9997 | B | 2025-01-01 09:24:00 | 679 |
| 9998 | 9998 | A | 2025-01-01 09:11:00 | 706 |
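The row-wise `apply` took about 10 s for 10,000 rows, because every row builds a boolean mask over the whole frame, so the work grows roughly quadratically with N. Below is a minimal sketch of one faster alternative (a swapped-in technique, not the author's method). It sorts each player's start times once and uses `np.searchsorted` to find both window edges, which should scale roughly as O(N log N); it has not been timed here. `get_count_sorted` and its `minutes` parameter are hypothetical names. The sketch assumes `start_time` is a `datetime64` column and, like `get_count`, that a `trans_id` column exists.

```python
import numpy as np
import pandas as pd

def get_count_sorted(df, player_col, time_col, minutes=5):
    # Hypothetical helper (not from the original notebook): same counts as get_count,
    # but each player's times are sorted once and window edges are found by binary search.
    window = np.timedelta64(minutes, "m")
    times = df[time_col].to_numpy()            # datetime64 values, in row order
    counts = np.empty(len(df), dtype=np.int64)

    # .indices maps each player id to the integer positions of that player's rows
    for pos in df.groupby(player_col).indices.values():
        t = times[pos]
        t_sorted = np.sort(t)
        # First position with time >= t - window, and one past the last with time <= t + window,
        # so both window edges are inclusive (matching the <= comparisons in get_count)
        lo = np.searchsorted(t_sorted, t - window, side="left")
        hi = np.searchsorted(t_sorted, t + window, side="right")
        counts[pos] = hi - lo                  # includes the row itself

    out = df[["trans_id", player_col, time_col]].copy()
    out["count"] = counts
    return out

# Usage on the sample from the problem statement
sample = pd.DataFrame({
    "trans_id": range(8),
    "player_id": ["A", "B", "A", "A", "B", "B", "B", "A"],
    "start_time": pd.to_datetime(
        ["09:15", "09:18", "09:16", "09:21", "09:25", "09:35", "09:38", "09:40"],
        format="%H:%M"),
})
print(get_count_sorted(sample, "player_id", "start_time")["count"].tolist())
# [2, 1, 3, 2, 1, 2, 2, 1]
```

The design choice is to pair `side="left"` for the lower edge with `side="right"` for the upper edge. Together they keep both ends of the ±5-minute window inclusive, so ties at exactly 5 minutes (such as 09:16 and 09:21) are counted the same way as in `get_count`.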
