# Day 3: Disney Parks Guest Spending Behavior

You are a data analyst working with the Disney Parks revenue team to understand how guest spending varies across park experiences. The team wants a comprehensive view of visitor purchasing behavior, and your goal is to uncover insights that can drive personalized marketing strategies.

In [1]:
import pandas as pd
import numpy as np

fct_guest_spending_data = [
  {
    "guest_id": 1,
    "visit_date": "2024-07-05",
    "amount_spent": 50,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-07-06",
    "amount_spent": 30,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-07-10",
    "amount_spent": 20.5,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 4,
    "visit_date": "2024-07-12",
    "amount_spent": 40,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-07-15",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 5,
    "visit_date": "2024-07-20",
    "amount_spent": 60,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 6,
    "visit_date": "2024-07-25",
    "amount_spent": 25,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-03",
    "amount_spent": 55,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-15",
    "amount_spent": 45,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-05",
    "amount_spent": 22,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-20",
    "amount_spent": 38,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 7,
    "visit_date": "2024-08-10",
    "amount_spent": 15,
    "park_experience_type": "Character Meet"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-25",
    "amount_spent": 28,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-27",
    "amount_spent": 32,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-02",
    "amount_spent": 65,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-05",
    "amount_spent": 50,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 9,
    "visit_date": "2024-09-15",
    "amount_spent": 40,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 10,
    "visit_date": "2024-09-20",
    "amount_spent": 70,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-25",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-28",
    "amount_spent": 10,
    "park_experience_type": "Character Meet"
  }
]
fct_guest_spending = pd.DataFrame(fct_guest_spending_data)


## Question 1

What is the average spending per guest per visit for each park experience type during July 2024? Park experience types with no recorded transactions in July should appear with an average spending of 0.0. This analysis establishes the baseline spending differences that later segmentation builds on.

In [3]:
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

all_experience_types = pd.DataFrame({
    'park_experience_type': fct_guest_spending['park_experience_type'].unique()
})

# Filter July data
july_data = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.month == 7) &
    (fct_guest_spending['visit_date'].dt.year == 2024)
]

# Total spending per guest, per visit, per park experience type
guest_visit_spending = (
    july_data
    .groupby(['guest_id', 'visit_date', 'park_experience_type'], as_index=False)
    .agg(total_spent=('amount_spent', 'sum'))
)

# Average spending per guest per visit for each park experience type
avg_spending = (
    guest_visit_spending
    .groupby('park_experience_type', as_index=False)
    .agg(avg_spending_per_guest_per_visit=('total_spent', 'mean'))
)

# Final result
final_result = (
    all_experience_types
    .merge(avg_spending, on='park_experience_type', how='left')
    .fillna({'avg_spending_per_guest_per_visit': 0.0})
    .sort_values('park_experience_type')
)

print(final_result)

  park_experience_type  avg_spending_per_guest_per_visit
0           Attraction                             55.00
4       Character Meet                              0.00
1               Dining                             32.50
3        Entertainment                             40.00
2               Retail                             22.75
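A shorter variant can reach the same table without a merge: aggregate to per-visit totals, average them by experience type, then `reindex` against the full list of types with `fill_value=0.0`. The sketch below is one alternative, not the notebook's method. To keep it self-contained, it copies only the July rows from `fct_guest_spending` and hardcodes the five experience types (the notebook derives them with `unique()`); the names `july`, `all_types`, and `avg_by_type` are illustrative.

```python
import pandas as pd

# July 2024 rows copied from fct_guest_spending so the sketch runs on its own
july = pd.DataFrame({
    "guest_id": [1, 2, 3, 4, 1, 5, 6],
    "visit_date": pd.to_datetime([
        "2024-07-05", "2024-07-06", "2024-07-10", "2024-07-12",
        "2024-07-15", "2024-07-20", "2024-07-25",
    ]),
    "amount_spent": [50, 30, 20.5, 40, 35, 60, 25],
    "park_experience_type": [
        "Attraction", "Dining", "Retail", "Entertainment",
        "Dining", "Attraction", "Retail",
    ],
})

# Every experience type in the full dataset (hardcoded here for illustration)
all_types = ["Attraction", "Character Meet", "Dining", "Entertainment", "Retail"]

avg_by_type = (
    july
    # Total spent by each guest on each visit date, per experience type
    .groupby(["guest_id", "visit_date", "park_experience_type"])["amount_spent"]
    .sum()
    # Average those per-visit totals within each experience type
    .groupby(level="park_experience_type")
    .mean()
    # Add types with no July rows (filled with 0.0) in a fixed order
    .reindex(all_types, fill_value=0.0)
    .rename_axis("park_experience_type")
    .rename("avg_spending_per_guest_per_visit")
    .reset_index()
)
print(avg_by_type)
```

Here `reindex` inserts the missing categories and fixes their order in one step, so separate `merge`, `fillna`, and `sort_values` calls are not needed. The merge-based cell is equally valid and may be easier to extend when more columns must be carried along.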


## Question 2

For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and last visits (last visit minus first visit)? Comparing each guest's visits in chronological order shows whether their spending rose or fell across repeat visits.

In [4]:
# Filter August data
aug_data = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-08-01') &
    (fct_guest_spending['visit_date'] < '2024-09-01')
]

# Total spending per guest per visit date
guest_daily_spend = (
    aug_data
    .groupby(['guest_id', 'visit_date'], as_index=False)
    .agg(total_spent=('amount_spent', 'sum'))
)

# Count how many visits each guest made
visit_counts = guest_daily_spend['guest_id'].value_counts()
repeat_visitors = visit_counts[visit_counts > 1].index

# Filter only those guests who visited more than once
repeat_visits = guest_daily_spend[guest_daily_spend['guest_id'].isin(repeat_visitors)]

# Get first and last visit spending per guest
first_last_spend = (
    repeat_visits
    .sort_values(['guest_id', 'visit_date'])
    .groupby('guest_id')
    .agg(
        first_visit_date=('visit_date', 'first'),
        first_spend=('total_spent', 'first'),
        last_visit_date=('visit_date', 'last'),
        last_spend=('total_spent', 'last')
    )
)

# Difference between first and last visit
first_last_spend['spending_difference'] = first_last_spend['last_spend'] - first_last_spend['first_spend']

print(first_last_spend.reset_index())

   guest_id first_visit_date  first_spend last_visit_date  last_spend  \
0         1       2024-08-03         55.0      2024-08-15        45.0   
1         2       2024-08-05         22.0      2024-08-20        38.0   
2         3       2024-08-25         28.0      2024-08-27        32.0   

   spending_difference  
0                -10.0  
1                 16.0  
2                  4.0  
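The repeat-visitor step can also be written with `groupby(...).filter`, which keeps only the groups that pass a predicate and avoids building a separate `value_counts` index. This is a sketch of an alternative, assuming (as the cell above does) that one visit means one guest on one date. It copies the August rows from `fct_guest_spending`; `aug`, `per_visit`, `repeat`, and `summary` are illustrative names.

```python
import pandas as pd

# August 2024 rows copied from fct_guest_spending
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime([
        "2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
        "2024-08-10", "2024-08-25", "2024-08-27",
    ]),
    "amount_spent": [55.0, 45.0, 22.0, 38.0, 15.0, 28.0, 32.0],
})

# One row per guest and visit date, in chronological order within each guest
per_visit = (
    aug.groupby(["guest_id", "visit_date"], as_index=False)["amount_spent"]
    .sum()
    .sort_values(["guest_id", "visit_date"])
)

# Keep only guests with more than one visit date in the month
repeat = per_visit.groupby("guest_id").filter(lambda g: len(g) > 1)

# First and last visit spend per guest; rows are already sorted by date
summary = repeat.groupby("guest_id")["amount_spent"].agg(
    first_spend="first", last_spend="last"
)
summary["spending_difference"] = summary["last_spend"] - summary["first_spend"]
print(summary)
```

Guest 7 has a single August visit, so `filter` drops that guest before the first/last comparison.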


## Question 3

In September 2024, how can guests be categorized into Low, Medium, and High spending segments based on their total spending? Use the following thresholds:

- Low: from $0 up to, but not including, $50.
- Medium: from $50 up to, but not including, $100.
- High: $100 and above.

Exclude guests who did not make any purchases in the period.

In [5]:
# Filter September data
september_data = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-09-01') &
    (fct_guest_spending['visit_date'] < '2024-10-01')
]

# Total spending per guest
guest_total_spending = (
    september_data
    .groupby('guest_id', as_index=False)
    .agg(total_spent=('amount_spent', 'sum'))
)

# Exclude guests with zero total spending; .copy() avoids a possible
# SettingWithCopyWarning when the segment column is added below
guest_total_spending = guest_total_spending[guest_total_spending['total_spent'] > 0].copy()

# Assign a segment using half-open thresholds: [0, 50) Low, [50, 100) Medium, 100+ High
def categorize_spending(amount):
    if amount < 50:
        return 'Low'
    elif amount < 100:
        return 'Medium'
    else:
        return 'High'

guest_total_spending['spending_segment'] = guest_total_spending['total_spent'].apply(categorize_spending)

# Final output
print(guest_total_spending)

   guest_id  total_spent spending_segment
0         1        100.0             High
1         8         60.0           Medium
2         9         40.0              Low
3        10         70.0           Medium
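The thresholds in the question map directly onto half-open bins, so `pd.cut` with `right=False` can replace the row-by-row `apply`. The sketch below is one vectorized alternative, not the notebook's method. It uses the September totals printed above as input, and `segment_spending`, `SEGMENT_BINS`, and `SEGMENT_LABELS` are hypothetical names. One behavioral difference: a value below 0 falls outside the first bin and comes back as `NaN`, whereas `categorize_spending` would label it Low.

```python
import numpy as np
import pandas as pd

# Bin edges and labels for the thresholds in the question.
# With right=False each bin is [left, right): [0, 50), [50, 100), [100, inf)
SEGMENT_BINS = [0, 50, 100, np.inf]
SEGMENT_LABELS = ["Low", "Medium", "High"]


def segment_spending(totals: pd.Series) -> pd.Series:
    """Map total spend to Low/Medium/High using half-open bins."""
    return pd.cut(totals, bins=SEGMENT_BINS, labels=SEGMENT_LABELS, right=False)


# September totals per guest, taken from the output above
guest_totals = pd.DataFrame({
    "guest_id": [1, 8, 9, 10],
    "total_spent": [100.0, 60.0, 40.0, 70.0],
})
guest_totals["spending_segment"] = segment_spending(guest_totals["total_spent"])
print(guest_totals)

# Boundary values: exactly 50 and exactly 100 land in the higher segment
print(segment_spending(pd.Series([0.0, 49.99, 50.0, 99.99, 100.0])).tolist())
```

Because `pd.cut` returns an ordered `Categorical` by default, sorting by `spending_segment` follows Low < Medium < High rather than alphabetical order.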


Made with ❤️ by [Interview Master](https://www.interviewmaster.ai)