In [11]:
import pandas as pd


You are a data analyst working with the **Disney** Parks revenue team to understand nuanced guest spending patterns across different park experiences. The team wants to develop a comprehensive view of visitor purchasing behaviors. Your goal is to uncover meaningful insights that can drive personalized marketing strategies.

In [12]:
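# Load the CSV file into a DataFrame, parsing visit_date as a datetime
# so the date-range filters below compare dates rather than strings
fct_guest_spending = pd.read_csv('fct_guest_spending.csv', parse_dates=['visit_date'])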

# Display the DataFrame
print("DataFrame loaded from fct_guest_spending.csv:")
print(fct_guest_spending)


DataFrame loaded from fct_guest_spending.csv:
    guest_id  visit_date  amount_spent park_experience_type
0          1  2024-07-05          50.0           Attraction
1          2  2024-07-06          30.0               Dining
2          3  2024-07-10          20.5               Retail
3          4  2024-07-12          40.0        Entertainment
4          1  2024-07-15          35.0               Dining
5          5  2024-07-20          60.0           Attraction
6          6  2024-07-25          25.0               Retail
7          1  2024-08-03          55.0           Attraction
8          1  2024-08-15          45.0               Dining
9          2  2024-08-05          22.0               Retail
10         2  2024-08-20          38.0        Entertainment
11         7  2024-08-10          15.0       Character Meet
12         3  2024-08-25          28.0               Retail
13         3  2024-08-27          32.0               Dining
14         1  2024-09-02          65.0           Attra

### Question 1 of 3

What is the average spending per guest per visit for each park experience type during July 2024? Ensure that park experience types with no recorded transactions are shown with an average spending of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

In [13]:
# Filter for July 2024
df_july = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-07-01') &
    (fct_guest_spending['visit_date'] <= '2024-07-31')
]

# Get all unique park experience types from the full dataset
all_types = fct_guest_spending['park_experience_type'].unique()

# Average the July transaction amounts for each park experience type
# (each row is treated as one guest visit); types with no July rows get 0.0
avg_spending = (
    df_july.groupby('park_experience_type')['amount_spent']
    .mean()
    .reindex(all_types, fill_value=0.0)
    .reset_index()
)

# Rename columns for clarity
avg_spending.columns = ['park_experience_type', 'avg_spending_per_guest_per_visit']

print("Average spending per guest per visit for each park experience type during July 2024:")
print(avg_spending)


Average spending per guest per visit for each park experience type during July 2024:
  park_experience_type  avg_spending_per_guest_per_visit
0           Attraction                             55.00
1               Dining                             32.50
2               Retail                             22.75
3        Entertainment                             40.00
4       Character Meet                              0.00
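
The average above treats every row as one guest visit. If a guest could log several transactions of the same type on the same day, one hedged alternative is to sum to one row per guest, visit date, and experience type before averaging. The sketch below assumes that reading of "per guest per visit" and uses a small hand-built frame (`sample`, a hypothetical name) holding the July rows printed earlier plus the one August row that introduces Character Meet.

```python
import pandas as pd

# July 2024 rows copied from the printed DataFrame, plus the single
# August row that introduces the "Character Meet" type.
rows = [
    (1, "2024-07-05", 50.0, "Attraction"),
    (2, "2024-07-06", 30.0, "Dining"),
    (3, "2024-07-10", 20.5, "Retail"),
    (4, "2024-07-12", 40.0, "Entertainment"),
    (1, "2024-07-15", 35.0, "Dining"),
    (5, "2024-07-20", 60.0, "Attraction"),
    (6, "2024-07-25", 25.0, "Retail"),
    (7, "2024-08-10", 15.0, "Character Meet"),
]
sample = pd.DataFrame(
    rows, columns=["guest_id", "visit_date", "amount_spent", "park_experience_type"]
)
sample["visit_date"] = pd.to_datetime(sample["visit_date"])

# Every experience type seen anywhere in the data, not just in July
all_types = sample["park_experience_type"].unique()

# Half-open date range: on or after July 1, strictly before August 1
july = sample[(sample["visit_date"] >= "2024-07-01") & (sample["visit_date"] < "2024-08-01")]

# Collapse to one row per guest, visit date, and experience type
per_visit = july.groupby(
    ["guest_id", "visit_date", "park_experience_type"], as_index=False
)["amount_spent"].sum()

# Average the per-visit totals; types with no July visits get 0.0
avg_per_visit = (
    per_visit.groupby("park_experience_type")["amount_spent"]
    .mean()
    .reindex(all_types, fill_value=0.0)
)
print(avg_per_visit)
```

On these rows the result matches the table above, because no guest has two transactions of the same type on the same day.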


### Question 2 of 3

For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts in guest spending behavior over multiple visits.

In [14]:
# Filter for August 2024
df_aug = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-08-01') &
    (fct_guest_spending['visit_date'] <= '2024-08-31')
]

# Count visits per guest
visit_counts = df_aug.groupby('guest_id')['visit_date'].nunique()
multi_visitors = visit_counts[visit_counts > 1].index

# Filter guests with more than one visit
df_multi = df_aug[df_aug['guest_id'].isin(multi_visitors)]

# Aggregate spending per guest per visit date
guest_visits = df_multi.groupby(['guest_id', 'visit_date'])['amount_spent'].sum().reset_index()

# Sort once by date, then take each guest's first and last visit
visits_by_guest = guest_visits.sort_values('visit_date').groupby('guest_id')
first_visits = visits_by_guest.first().reset_index()
last_visits = visits_by_guest.last().reset_index()

# Merge to compute difference
spending_diff = pd.merge(
    first_visits[['guest_id', 'amount_spent']],
    last_visits[['guest_id', 'amount_spent']],
    on='guest_id',
    suffixes=('_first', '_last')
)
spending_diff['spending_difference'] = spending_diff['amount_spent_last'] - spending_diff['amount_spent_first']

print("Difference in spending between first and last visit for guests with multiple visits in August 2024:")
print(spending_diff[['guest_id', 'spending_difference']])


Difference in spending between first and last visit for guests with multiple visits in August 2024:
   guest_id  spending_difference
0         1                -10.0
1         2                 16.0
2         3                  4.0
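
The same first-versus-last comparison can also be written without a merge, using named aggregation on a single `groupby`. This is a minimal sketch, not a replacement for the cell above. The frame `aug` and its column subset are assumptions built from the August rows visible in the printed DataFrame.

```python
import pandas as pd

# August 2024 rows visible in the printed DataFrame
rows = [
    (1, "2024-08-03", 55.0),
    (1, "2024-08-15", 45.0),
    (2, "2024-08-05", 22.0),
    (2, "2024-08-20", 38.0),
    (7, "2024-08-10", 15.0),
    (3, "2024-08-25", 28.0),
    (3, "2024-08-27", 32.0),
]
aug = pd.DataFrame(rows, columns=["guest_id", "visit_date", "amount_spent"])
aug["visit_date"] = pd.to_datetime(aug["visit_date"])

# One row per guest per visit date, in chronological order within each guest
per_visit = (
    aug.groupby(["guest_id", "visit_date"], as_index=False)["amount_spent"]
    .sum()
    .sort_values(["guest_id", "visit_date"])
)

# Count visits and pick the first and last spend in one pass
summary = per_visit.groupby("guest_id")["amount_spent"].agg(
    visits="count", first_spend="first", last_spend="last"
)

# Keep only guests with more than one visit, then take the difference
summary = summary[summary["visits"] > 1]
summary["spending_difference"] = summary["last_spend"] - summary["first_spend"]
print(summary)
```

Guest 7 drops out because they have a single August visit, leaving the same three guests as in the output above.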


### Question 3 of 3

In September 2024, how can guests be categorized into distinct spending segments such as Low, Medium, and High based on their total spending? Use the following thresholds for categorization:

1. Low: from $0 up to, but not including, $50.
2. Medium: from $50 up to, but not including, $100.
3. High: $100 and above.

Exclude guests who did not make any purchases in the period.

In [15]:
# Filter for September 2024
df_sep = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-09-01') &
    (fct_guest_spending['visit_date'] <= '2024-09-30')
]

# Calculate total spending per guest
guest_spending = df_sep.groupby('guest_id')['amount_spent'].sum().reset_index()

# Guests with no September rows are already absent after the groupby;
# also drop any guest whose September total is zero
guest_spending = guest_spending[guest_spending['amount_spent'] > 0]

# Categorize spending segments
def categorize(amount):
    if amount < 50:
        return 'Low'
    elif amount < 100:
        return 'Medium'
    else:
        return 'High'

guest_spending['spending_segment'] = guest_spending['amount_spent'].apply(categorize)

print("Guest spending segments for September 2024:")
print(guest_spending[['guest_id', 'amount_spent', 'spending_segment']])


Guest spending segments for September 2024:
   guest_id  amount_spent spending_segment
0         1         100.0             High
1         8          60.0           Medium
2         9          40.0              Low
3        10          70.0           Medium
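
The thresholds can also be expressed as bins with `pd.cut`, which keeps the boundaries in one place. This is a minimal sketch that assumes the per-guest September totals printed above. `totals` is a hypothetical frame built from that output, and `right=False` makes each bin include its lower edge and exclude its upper edge, as the question specifies.

```python
import pandas as pd

# Per-guest September totals copied from the printed result
totals = pd.DataFrame(
    {"guest_id": [1, 8, 9, 10], "amount_spent": [100.0, 60.0, 40.0, 70.0]}
)

# Bins are [0, 50), [50, 100), [100, inf) because right=False
totals["spending_segment"] = pd.cut(
    totals["amount_spent"],
    bins=[0, 50, 100, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)
print(totals)
```

A total of exactly 100.0 lands in High and exactly 50.0 would land in Medium, matching the `categorize` function above.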
