You are given a [dataset](https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/sample_message_dataset.csv) with information around messages sent between users in a P2P messaging application. Below is the dataset's schema:

In [64]:
#@title
import numpy as np
import pandas as pd

In [3]:
#@title
pd.DataFrame(
    data=[
          ["date", "string", "date of the message sent/received, format is 'YYYY-mm-dd'"],
          ["timestamp", "integer", "timestamp of the message sent/received, epoch seconds"],
          ["sender_id", "integer", "id of the message sender"],
          ["receiver_id", "integer", "id of the message receiver"]
          ], 
    columns=["Column Name", "Data Type", "Description"]
)

Unnamed: 0,Column Name,Data Type,Description
0,date,string,"date of the message sent/received, format is '..."
1,timestamp,integer,"timestamp of the message sent/received, epoch ..."
2,sender_id,integer,id of the message sender
3,receiver_id,integer,id of the message receiver


Given this, write code to find the fraction of messages that are sent between the same sender and receiver within five minutes (e.g. the fraction of messages that receive a response within 5 minutes). 

In [5]:
df = pd.read_csv("https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/sample_message_dataset.csv")
df

Unnamed: 0,date,timestamp,sender_id,receiver_id
0,2018-03-01,1519923378,1,5
1,2018-03-01,1519942810,1,4
2,2018-03-01,1519918950,1,5
3,2018-03-01,1519930114,1,2
4,2018-03-01,1519920410,1,2
...,...,...,...,...
81,2018-03-01,1519866004,4,7
82,2018-03-01,1519912343,3,8
83,2018-03-01,1519896177,4,8
84,2018-03-01,1519878235,3,8


In [63]:
# Create auxiliary column to identify messages between the same two people
df["user_ids"] = df.apply(
    lambda x: "".join([str(c) for c in [x.sender_id, x.receiver_id]]),
    axis=1
)
df["next_user_ids"] = df.user_ids.shift(-1)

In [53]:
# Sort by people and ts
df = df.sort_values(["user_ids", "timestamp"])

In [82]:
# Compute de difference between message ts
df["ts_difference"] = np.nan
df.ts_difference.where(df.user_ids != df.next_user_ids, df.timestamp.shift(-1) - df.timestamp, inplace=True)

In [72]:
# Flag messages that receive a response within 5 minutes
df["next_message_within_5"] = 1
df.next_message_within_5 = df.next_message_within_5.where(df.ts_difference/60 < 5, 0)

In [81]:
print(f"The percentage of messages that receive a response within 5 minutes is {(df.next_message_within_5 == 1).sum() / df.next_message_within_5.count():.0%}")

The percentage of messages that receive a response within 5 minutes is 3%
