# **Finding User Purchases**

Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase.

Ignore same-day purchases. Output a list of the matching `user_id` values.

### **Understanding The Data**
We have a pandas DataFrame, `amazon_transactions`, with the following columns (the code further down also references `id`, `item`, and `revenue`):

- `user_id`

- `created_at` (timestamp)

We want to detect users whose second purchase was within 1-7 days of the first.
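
To try the code in this notebook without the original dataset, a small stand-in DataFrame can be built by hand. Every value below is made up for illustration; the `id`, `item`, and `revenue` columns are included only because the code further down references them.

```python
import pandas as pd

# Hypothetical sample data; none of these rows come from the real dataset.
amazon_transactions = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "user_id": [100, 100, 101, 101, 102, 102, 103],
    "item": ["milk", "bread", "milk", "eggs", "tea", "jam", "rice"],
    "created_at": [
        "2020-03-01", "2020-03-04",  # user 100: second purchase 3 days later, qualifies
        "2020-03-01", "2020-03-01",  # user 101: same-day purchases only, ignored
        "2020-03-01", "2020-03-20",  # user 102: second purchase 19 days later, too late
        "2020-03-05",                # user 103: a single purchase, no second purchase
    ],
    "revenue": [5.0, 3.0, 5.0, 4.0, 6.0, 7.0, 2.0],
})

# Parse the strings into real timestamps so date arithmetic works later.
amazon_transactions["created_at"] = pd.to_datetime(amazon_transactions["created_at"])
print(amazon_transactions)
```

With this data, only user 100 should count as a returning active user.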

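### **The Approach**
A rank-based solution follows the steps below. The notebook cell at the end takes a different route: it compares every purchase with the user's first purchase date instead of ranking and pivoting.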

1. Convert timestamps to dates.

2. Remove duplicates.

3. Rank purchases.

4. Pivot to first and second dates.

5. Filter by day difference.
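
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the `tx` DataFrame and its values are invented, the helper column names (`purchase_date`, `purchase_rank`, `first_date`, `second_date`) are assumptions, and ranking is done with a sort plus `cumcount`, which is valid here because duplicates have already been removed.

```python
import pandas as pd

# Hypothetical data for illustration only.
tx = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3],
    "created_at": pd.to_datetime([
        "2020-01-30 09:00", "2020-01-30 18:00", "2020-02-02 10:00",  # user 1
        "2020-01-10 12:00", "2020-01-25 12:00",                      # user 2
        "2020-01-05 08:00", "2020-01-05 20:00",                      # user 3
    ]),
})

# 1. Convert timestamps to calendar dates (datetimes normalized to midnight).
tx["purchase_date"] = tx["created_at"].dt.normalize()

# 2. Remove duplicates so same-day purchases count once per user.
daily = tx[["user_id", "purchase_date"]].drop_duplicates()

# 3. Rank each user's purchase dates from earliest to latest.
daily = daily.sort_values(["user_id", "purchase_date"])
daily["purchase_rank"] = daily.groupby("user_id").cumcount() + 1

# 4. Pivot so each user has one row holding the first and second purchase dates.
first_two = (
    daily[daily["purchase_rank"] <= 2]
    .pivot(index="user_id", columns="purchase_rank", values="purchase_date")
    .rename(columns={1: "first_date", 2: "second_date"})
)

# 5. Keep users whose second purchase date is 1 to 7 days after the first.
#    Users without a second date get a missing gap and are filtered out.
gap_days = (first_two["second_date"] - first_two["first_date"]).dt.days
returning_users = gap_days[gap_days.between(1, 7)].index.tolist()
print(returning_users)  # [1]
```

User 1 qualifies with a 3-day gap that crosses a month boundary, user 2 returns after 15 days, and user 3 only has same-day purchases.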

In [None]:
# Import pandas
import pandas as pd

# Parse the timestamps so date arithmetic works
amazon_transactions['created_at'] = pd.to_datetime(amazon_transactions['created_at'])

# Determine the first purchase timestamp for each user
first_transaction = amazon_transactions.groupby('user_id', as_index=False)['created_at'].min()

# Attach each user's first purchase to every one of their transactions
merged_transactions = pd.merge(amazon_transactions, first_transaction, on="user_id", suffixes=("", "_first")).rename(columns={
    "created_at": "current_trans",
    "created_at_first": "first_trans"
}).drop(columns=['id'])

# Compute the calendar-day difference between the current and first transaction.
# Subtracting normalized dates (rather than .dt.day values) stays correct across month boundaries.
merged_transactions['difference'] = (
    merged_transactions['current_trans'].dt.normalize()
    - merged_transactions['first_trans'].dt.normalize()
).dt.days
columns_of_interest = ['user_id', 'item', 'current_trans', 'revenue', 'first_trans', 'difference']

# Select purchases made 1-7 days after the first purchase (same-day purchases have a difference of 0)
active_users = merged_transactions[merged_transactions['difference'].between(1, 7)][columns_of_interest].sort_values(by="user_id")

# Get the unique user IDs
active_users['user_id'].unique()
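
When measuring the gap between two purchases, the choice of date arithmetic matters. The sketch below uses two made-up timestamps to compare three ways of computing a day difference: subtracting day-of-month values, subtracting raw timestamps, and subtracting dates normalized to midnight. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical timestamps that straddle a month boundary.
first = pd.Series(pd.to_datetime(["2020-01-30 23:00"]))
second = pd.Series(pd.to_datetime(["2020-02-02 01:00"]))

# Day-of-month subtraction: 2 - 30, which ignores the month entirely.
day_of_month_gap = int((second.dt.day - first.dt.day).iloc[0])

# Raw timestamp subtraction: 2 days and 2 hours, truncated to whole days.
raw_gap = int((second - first).dt.days.iloc[0])

# Calendar-day subtraction: Jan 30 to Feb 2 after dropping the time of day.
calendar_gap = int((second.dt.normalize() - first.dt.normalize()).dt.days.iloc[0])

print(day_of_month_gap, raw_gap, calendar_gap)  # -28 2 3
```

Only the normalized subtraction counts calendar days, which is what "1 to 7 days after the first purchase" calls for when same-day purchases are ignored.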