# Day 3: Disney Parks Guest Spending Behavior

You are a data analyst working with the Disney Parks revenue team to understand nuanced guest spending patterns across different park experiences. The team wants to develop a comprehensive view of visitor purchasing behaviors. Your goal is to uncover meaningful insights that can drive personalized marketing strategies.

In [None]:
import pandas as pd
import numpy as np

fct_guest_spending_data = [
  {
    "guest_id": 1,
    "visit_date": "2024-07-05",
    "amount_spent": 50,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-07-06",
    "amount_spent": 30,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-07-10",
    "amount_spent": 20.5,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 4,
    "visit_date": "2024-07-12",
    "amount_spent": 40,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-07-15",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 5,
    "visit_date": "2024-07-20",
    "amount_spent": 60,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 6,
    "visit_date": "2024-07-25",
    "amount_spent": 25,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-03",
    "amount_spent": 55,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-15",
    "amount_spent": 45,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-05",
    "amount_spent": 22,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-20",
    "amount_spent": 38,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 7,
    "visit_date": "2024-08-10",
    "amount_spent": 15,
    "park_experience_type": "Character Meet"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-25",
    "amount_spent": 28,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-27",
    "amount_spent": 32,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-02",
    "amount_spent": 65,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-05",
    "amount_spent": 50,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 9,
    "visit_date": "2024-09-15",
    "amount_spent": 40,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 10,
    "visit_date": "2024-09-20",
    "amount_spent": 70,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-25",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-28",
    "amount_spent": 10,
    "park_experience_type": "Character Meet"
  }
]
fct_guest_spending = pd.DataFrame(fct_guest_spending_data)


## Question 1

What is the average spending per guest per visit for each park experience type during July 2024? Ensure that park experience types with no recorded transactions are shown with an average spending of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

In [None]:
# Parse visit_date as datetime so visits can be filtered by month
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])
# Filter for July 2024
july_2024_df = fct_guest_spending[fct_guest_spending['visit_date'].dt.strftime('%Y-%m') == '2024-07'].copy()
# Collect every park experience type in the full dataset, not just July
all_park_types = fct_guest_spending['park_experience_type'].unique()
# Average amount spent per visit for each park experience type in July 2024
july_averages = july_2024_df.groupby('park_experience_type')['amount_spent'].mean()
# Reindex to include all types; types with no July transactions get 0.0
result = july_averages.reindex(all_park_types, fill_value=0.0).reset_index()
result
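
As a cross-check on the zero-fill step, here is a minimal sketch of an alternative approach: treat the experience type as a categorical column so that empty types survive the groupby. The July rows are copied from the dataset above; the names `july`, `all_types`, and `avg_amount_spent` are illustrative and not part of the original solution.

```python
import pandas as pd

# July 2024 rows copied from the dataset above (experience type, amount spent).
july = pd.DataFrame({
    "park_experience_type": ["Attraction", "Dining", "Retail", "Entertainment",
                             "Dining", "Attraction", "Retail"],
    "amount_spent": [50, 30, 20.5, 40, 35, 60, 25],
})

# Every experience type in the full dataset, including "Character Meet",
# which has no July rows.
all_types = ["Attraction", "Dining", "Retail", "Entertainment", "Character Meet"]
july["park_experience_type"] = pd.Categorical(
    july["park_experience_type"], categories=all_types
)

# observed=False keeps empty categories; their mean is NaN, so fill with 0.0.
july_avg = (
    july.groupby("park_experience_type", observed=False)["amount_spent"]
    .mean()
    .fillna(0.0)
    .rename("avg_amount_spent")
    .reset_index()
)
print(july_avg)
```

Both approaches should agree; the categorical version simply makes the full list of types part of the column's definition instead of a separate reindex step.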

## Question 2

For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts in guest spending behavior over multiple visits.

In [None]:
# Filter for visits made in August 2024
august_2024_df = fct_guest_spending[fct_guest_spending['visit_date'].dt.strftime('%Y-%m') == '2024-08'].copy()

# Sort by guest_id and visit_date to ensure chronological order
august_2024_df.sort_values(by=['guest_id', 'visit_date'], inplace=True)

# Filter for guests who visited more than once
august_2024_df = august_2024_df.groupby('guest_id').filter(lambda x: len(x) > 1)
# Take each guest's first and last August spend (rows are already in date order),
# then compute last minus first
first_last_spending = august_2024_df.groupby('guest_id').agg(
    first_spending=('amount_spent', 'first'),
    last_spending=('amount_spent', 'last')
).reset_index()
first_last_spending['spending_difference'] = first_last_spending['last_spending'] - first_last_spending['first_spending']

first_last_spending
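
The same result can also be reached in a single aggregation pass by counting visits alongside the first and last amounts. This is a sketch under the assumption that each row is one visit; the August rows are copied from the dataset above, and the names `aug`, `summary`, and `repeat` are illustrative.

```python
import pandas as pd

# August 2024 rows copied from the dataset above.
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime([
        "2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
        "2024-08-10", "2024-08-25", "2024-08-27",
    ]),
    "amount_spent": [55, 45, 22, 38, 15, 28, 32],
})

# Sort chronologically within each guest, then aggregate once.
summary = (
    aug.sort_values(["guest_id", "visit_date"])
    .groupby("guest_id")
    .agg(
        visits=("visit_date", "count"),
        first_spending=("amount_spent", "first"),
        last_spending=("amount_spent", "last"),
    )
)

# Keep only guests with more than one August visit.
repeat = summary[summary["visits"] > 1].copy()
repeat["spending_difference"] = repeat["last_spending"] - repeat["first_spending"]
print(repeat)
```

A positive `spending_difference` means the guest spent more on the last visit than on the first. Guest 7 has a single August visit and drops out.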

## Question 3

In September 2024, how can guests be categorized into distinct spending segments such as Low, Medium, and High based on their total spending? Use the following thresholds for categorization:

- Low: from $0 up to, but not including, $50.
- Medium: from $50 up to, but not including, $100.
- High: $100 and above.

Exclude guests who did not make any purchases in the period.

In [None]:
# Step 1: Filter for September 2024
september_2024_df = fct_guest_spending[fct_guest_spending['visit_date'].dt.strftime('%Y-%m') == '2024-09'].copy()
# Step 2: Calculate total spending per guest (sum all transactions)
guest_totals = september_2024_df.groupby('guest_id')['amount_spent'].sum().reset_index()
guest_totals.rename(columns={'amount_spent': 'total_spending'}, inplace=True)

# Step 3: Exclude guests with no purchases ($0 spending)
guests_with_purchases = guest_totals[guest_totals['total_spending'] > 0].copy()

# Step 4: Categorize into spending segments
guests_with_purchases['spending_segment'] = pd.cut(
    guests_with_purchases['total_spending'],
    bins=[0, 50, 100, np.inf],           # [0,50), [50,100), [100,∞)
    labels=['Low', 'Medium', 'High'],    # Category names
    right=False,                         # Left-closed, right-open intervals
    include_lowest=True                  # No effect with right=False: the lowest edge (0) is already included
)

# Step 5: Count guests in each segment
spending_segments = guests_with_purchases['spending_segment'].value_counts().reset_index()
spending_segments.columns = ['spending_segment', 'guest_count']
spending_segments = spending_segments.sort_values('spending_segment')

print("Guest Spending Segmentation Results:")
print(spending_segments)
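
To see how the threshold edges behave, the sketch below applies the same bins to the September totals implied by the dataset above (guest 1: 65 + 35 = 100, guest 8: 50 + 10 = 60, guest 9: 40, guest 10: 70). The variable names are illustrative. With `right=False`, each bin includes its left edge and excludes its right edge, so a total of exactly 100 falls into High.

```python
import numpy as np
import pandas as pd

# September 2024 totals per guest, summed by hand from the dataset above.
totals = pd.Series({1: 100, 8: 60, 9: 40, 10: 70}, name="total_spending")
totals.index.name = "guest_id"

# Left-closed, right-open bins: [0, 50), [50, 100), [100, inf).
segments = pd.cut(
    totals,
    bins=[0, 50, 100, np.inf],
    labels=["Low", "Medium", "High"],
    right=False,
)
print(pd.DataFrame({"total_spending": totals, "spending_segment": segments}))
```

Under these bins, guest 1 lands in High, guests 8 and 10 in Medium, and guest 9 in Low.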

Made with ❤️ by [Interview Master](https://www.interviewmaster.ai)