# Dynamic Pricing Model for Local Events & Experiences

This notebook demonstrates the workflow for the dynamic pricing model, including:

1. Data loading and preprocessing
2. Demand estimation
3. Price optimization
4. Pricing strategy simulation
5. Visualization and analysis

The model helps event organizers set optimal prices based on various factors such as event characteristics, demand patterns, and market conditions.

## Setup and Imports

In [None]:
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Add parent directory to path to import from src
sys.path.append(os.path.abspath('../'))

# Import custom modules
try:
    from src.data.preprocessor import EventDataPreprocessor
    from src.models.demand import DemandEstimator, GoogleTrendsAnalyzer
    from src.models.pricing import DynamicPricingModel
    from src.simulation.simulation import PricingSimulation
except ImportError as e:
    print(f"Error importing modules: {e}")
    print("Make sure you're running this notebook from a subdirectory of the project root, so that '../' points to the directory containing src/.")
    raise  # later cells depend on these classes, so stop here

# Set up plotting
sns.set(style='whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

## 1. Data Loading and Preprocessing

First, we load event data scraped from Eventbrite and Meetup, expected at `../data/raw/eventbrite_events.csv` and `../data/raw/meetup_events.csv`. If neither file exists, we create a synthetic dataset for demonstration purposes.

In [None]:
# Check if real data exists
raw_data_path = '../data/raw/'
eventbrite_path = os.path.join(raw_data_path, 'eventbrite_events.csv')
meetup_path = os.path.join(raw_data_path, 'meetup_events.csv')

if os.path.exists(eventbrite_path) or os.path.exists(meetup_path):
    print("Loading real event data...")
    
    # Initialize preprocessor
    preprocessor = EventDataPreprocessor()
    
    # Load and preprocess data
    if os.path.exists(eventbrite_path):
        eventbrite_df = preprocessor.load_data(eventbrite_path)
        eventbrite_df = preprocessor.preprocess_eventbrite_data(eventbrite_df)
        print(f"Loaded {len(eventbrite_df)} events from Eventbrite")
    else:
        eventbrite_df = pd.DataFrame()
    
    if os.path.exists(meetup_path):
        meetup_df = preprocessor.load_data(meetup_path)
        meetup_df = preprocessor.preprocess_meetup_data(meetup_df)
        print(f"Loaded {len(meetup_df)} events from Meetup")
    else:
        meetup_df = pd.DataFrame()
    
    # Merge data if both sources are available
    if not eventbrite_df.empty and not meetup_df.empty:
        events_df = preprocessor.merge_data([eventbrite_df, meetup_df])
    elif not eventbrite_df.empty:
        events_df = eventbrite_df
    elif not meetup_df.empty:
        events_df = meetup_df
    else:
        events_df = pd.DataFrame()
    
    # Add features
    if not events_df.empty:
        events_df = preprocessor.add_features(events_df)
        print(f"Final dataset contains {len(events_df)} events with {len(events_df.columns)} features")
else:
    print("No real data found. Creating synthetic dataset for demonstration...")
    
    # Create synthetic data
    np.random.seed(42)  # For reproducibility
    
    # Define event categories
    categories = ['Concert', 'Workshop', 'Conference', 'Festival', 'Exhibition', 'Networking', 'Sports']
    locations = ['Downtown', 'Convention Center', 'Park', 'University', 'Theater', 'Stadium', 'Gallery']
    
    # Generate random events
    n_events = 100
    
    # Generate dates between now and 3 months from now
    start_date = datetime.now()
    dates = [start_date + timedelta(days=np.random.randint(1, 91)) for _ in range(n_events)]
    
    # Draw categories once so that each title matches its 'category' value
    event_categories = np.random.choice(categories, n_events)
    
    # Generate synthetic data
    events_data = {
        'title': [f"{cat} Event {i}" for i, cat in enumerate(event_categories, start=1)],
        'category': event_categories,
        'location': np.random.choice(locations, n_events),
        'event_date': dates,
        'price_value': np.random.uniform(5, 100, n_events),  # Random prices between $5 and $100
        'attendance': np.random.randint(20, 500, n_events),  # Random attendance between 20 and 500
        'organizer': [f"Organizer {i}" for i in range(1, n_events+1)],
        'description_length': np.random.randint(100, 1000, n_events),  # Length of description in characters
        'source': np.random.choice(['Eventbrite', 'Meetup'], n_events)
    }
    
    # Create DataFrame
    events_df = pd.DataFrame(events_data)
    
    # Add derived features
    # Measure from start_date (not a fresh datetime.now()) so an event 1 day out isn't floored to 0 days
    events_df['days_until_event'] = (events_df['event_date'] - start_date).dt.days
    events_df['day_of_week'] = events_df['event_date'].dt.dayofweek
    events_df['is_weekend'] = events_df['day_of_week'].isin([5, 6]).astype(int)
    events_df['month'] = events_df['event_date'].dt.month
    
    # Add price category
    events_df['price_category'] = pd.cut(
        events_df['price_value'], 
        bins=[0, 15, 30, 50, 100, float('inf')],
        labels=['Very Low', 'Low', 'Medium', 'High', 'Very High']
    )
    
    # Add trend score (simulated)
    events_df['trend_score'] = np.random.uniform(0, 100, n_events)
    
    print(f"Created synthetic dataset with {len(events_df)} events")

# Display the first few rows of the dataset
events_df.head()
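The `price_category` column relies on `pd.cut`, which by default uses right-closed intervals `(a, b]`. The short check below, using the same bin edges and labels as the synthetic-data cell, shows where boundary prices land. One consequence for the synthetic data: `np.random.uniform(5, 100, ...)` never returns a value above 100, so the 'Very High' category stays empty.

```python
import pandas as pd

# Same bin edges and labels as the synthetic-data cell above.
bins = [0, 15, 30, 50, 100, float('inf')]
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Boundary prices: pd.cut uses right-closed intervals (a, b] by default,
# so a price of exactly 15 lands in 'Very Low', not 'Low'.
prices = pd.Series([5.0, 15.0, 15.01, 50.0, 100.0, 120.0])
categories = pd.cut(prices, bins=bins, labels=labels)

for price, cat in zip(prices, categories):
    print(f"${price:>7.2f} -> {cat}")
```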

Let's explore the dataset to understand the distribution of prices, attendance, and other key variables.

In [None]:
# Basic statistics
print("Dataset summary statistics:\n")
events_df.describe()


In [None]:
# Visualize price distribution
plt.figure(figsize=(10, 6))
sns.histplot(events_df['price_value'], bins=20, kde=True)
plt.title('Distribution of Event Prices')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()

# Visualize attendance distribution
plt.figure(figsize=(10, 6))
sns.histplot(events_df['attendance'], bins=20, kde=True)
plt.title('Distribution of Event Attendance')
plt.xlabel('Attendance')
plt.ylabel('Frequency')
plt.show()

# Price vs. Attendance scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='price_value', y='attendance', hue='category', data=events_df)
plt.title('Price vs. Attendance by Category')
plt.xlabel('Price ($)')
plt.ylabel('Attendance')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Day of week analysis
plt.figure(figsize=(10, 6))
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_counts = events_df['day_of_week'].value_counts().sort_index()
sns.barplot(x=[day_names[i] for i in day_counts.index], y=day_counts.values)
plt.title('Events by Day of Week')
plt.xlabel('Day of Week')
plt.ylabel('Number of Events')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 2. Demand Estimation

Now we'll use the `DemandEstimator` class to build a model that predicts attendance based on event characteristics.
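The evaluation step in the next cell reports MAE, RMSE, and R². As a reminder of what each measures, here is a hand computation with NumPy on four made-up actual/predicted attendance pairs (illustrative values, not output from this notebook):

```python
import numpy as np

# Made-up example: actual vs. predicted attendance for four events.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))          # average absolute miss, in attendees
rmse = np.sqrt(np.mean(errors ** 2))   # squares errors first, so large misses weigh more
# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.2f}, RMSE={rmse:.2f}, R²={r2:.3f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; a wide gap between the two points to a few large misses rather than many small ones.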

In [None]:
# Initialize demand estimator
demand_estimator = DemandEstimator()

# Split data into training and testing sets
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(events_df, test_size=0.2, random_state=42)
print(f"Training set: {len(train_df)} events")
print(f"Testing set: {len(test_df)} events")

# Train demand estimation model
model = demand_estimator.train_model(train_df, demand_col='attendance')

if model is not None:
    # Make predictions on test set
    predictions_df = demand_estimator.predict_demand(test_df)
    
    # Evaluate model performance
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    
    y_true = test_df['attendance']
    y_pred = predictions_df['predicted_demand']
    
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    
    print(f"Model Evaluation Metrics:")
    print(f"Mean Absolute Error: {mae:.2f}")
    print(f"Root Mean Squared Error: {rmse:.2f}")
    print(f"R² Score: {r2:.2f}")
    
    # Visualize actual vs. predicted attendance
    plt.figure(figsize=(10, 6))
    plt.scatter(y_true, y_pred, alpha=0.5)
    plt.plot([y_true.min(), y_true.max()], [y_true.min(), y_true.max()], 'r--')
    plt.title('Actual vs. Predicted Attendance')
    plt.xlabel('Actual Attendance')
    plt.ylabel('Predicted Attendance')
    plt.tight_layout()
    plt.show()
    
    # Analyze demand factors
    analysis = demand_estimator.analyze_demand_factors(train_df)
    
    # Plot feature importance
    if 'feature_importance' in analysis:
        feature_importance = pd.DataFrame(analysis['feature_importance'])
        plt.figure(figsize=(10, 6))
        sns.barplot(x='Importance', y='Feature', data=feature_importance)
        plt.title('Feature Importance for Demand Estimation')
        plt.xlabel('Importance')
        plt.tight_layout()
        plt.show()
    
    # Print price elasticity
    if 'price_elasticity' in analysis and analysis['price_elasticity'] is not None:
        print(f"Price Elasticity of Demand: {analysis['price_elasticity']:.2f}")
        print(f"Interpretation: A 1% increase in price leads to a {abs(analysis['price_elasticity']):.2f}% {'decrease' if analysis['price_elasticity'] < 0 else 'increase'} in demand")
else:
    print("Failed to train demand estimation model")
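How `analyze_demand_factors` computes `price_elasticity` isn't shown in this notebook. One common approach, sketched here as an assumption rather than as the module's actual method, is to regress log demand on log price: under a constant-elasticity model `Q = A * P**e`, the slope of that fit is the elasticity `e`. The data below is simulated with a known elasticity of -1.5 so the estimate can be checked. Note that in the synthetic branch of Section 1, `price_value` and `attendance` are drawn independently, so any elasticity estimated from that dataset reflects noise rather than a real price effect.

```python
import numpy as np

# Simulated constant-elasticity demand: Q = A * P**e with a known e = -1.5.
rng = np.random.default_rng(0)
true_elasticity = -1.5
prices = rng.uniform(5, 100, 200)
noise = rng.normal(0, 0.1, 200)  # multiplicative noise, additive in log space
demand = 5000 * prices ** true_elasticity * np.exp(noise)

# In logs the model is linear: log Q = log A + e * log P,
# so the slope of an ordinary least-squares fit of log Q on log P estimates e.
slope, intercept = np.polyfit(np.log(prices), np.log(demand), deg=1)
print(f"Estimated elasticity: {slope:.2f} (true value {true_elasticity})")
```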

## 3. Price Optimization

Now we'll use the `DynamicPricingModel` class to optimize prices based on the demand model and other factors.

In [None]:
# Initialize pricing model
pricing_model = DynamicPricingModel()

# Train pricing model
model = pricing_model.train_model(train_df, price_col='price_value')

if model is not None:
    # Select a sample event for optimization
    sample_event = test_df.iloc[0].to_dict()
    
    print(f"Sample Event: {sample_event['title']}")
    print(f"Current Price: ${sample_event['price_value']:.2f}")
    print(f"Current Attendance: {sample_event['attendance']}")
    
    # Optimize price for revenue maximization
    revenue_price = pricing_model.optimize_price(sample_event, objective='revenue')
    print(f"\nRevenue-Maximizing Price: ${revenue_price:.2f}")
    
    # Optimize price for profit maximization
    profit_price = pricing_model.optimize_price(sample_event, objective='profit')
    print(f"Profit-Maximizing Price: ${profit_price:.2f}")
    
    # Generate price tiers
    tiers = pricing_model.generate_price_tiers(sample_event, num_tiers=3)
    
    print("\nRecommended Price Tiers:")
    for tier in tiers:
        print(f"{tier['name']}: ${tier['price']:.2f} - {', '.join(tier['features'])}")
    
    # Visualize price-demand curve
    fig = pricing_model.visualize_price_demand_curve(sample_event)
    plt.show()
else:
    print("Failed to train pricing model")
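The internals of `optimize_price` aren't shown here. The sketch below assumes a hypothetical linear demand curve `Q(P) = a - b*P` and a hypothetical per-attendee cost `c` to illustrate why the revenue and profit objectives generally recommend different prices. A grid search over candidate prices recovers the analytic optima `a / (2b)` for revenue and `(a + b*c) / (2b)` for profit; because each extra attendee also adds cost, the profit-maximizing price sits above the revenue-maximizing one whenever `c > 0`.

```python
import numpy as np

# Hypothetical linear demand curve: Q(P) = a - b * P, clipped at zero.
a, b = 500.0, 5.0   # attendees at a $0 price, attendees lost per $1 increase
unit_cost = 10.0    # assumed variable cost per attendee (c)

prices = np.linspace(0, 100, 1001)              # candidate prices in $0.10 steps
demand = np.clip(a - b * prices, 0, None)

revenue = prices * demand
profit = (prices - unit_cost) * demand

revenue_price = prices[np.argmax(revenue)]      # analytic optimum: a / (2b)
profit_price = prices[np.argmax(profit)]        # analytic optimum: (a + b*c) / (2b)
print(f"Revenue-maximizing price: ${revenue_price:.2f}")
print(f"Profit-maximizing price:  ${profit_price:.2f}")
```

A grid search is used instead of a closed form so the same loop works for any demand function, including a fitted model's predictions.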

## 4. Pricing Strategy Simulation

Now we'll use the `PricingSimulation` class to simulate different pricing strategies and compare their performance.

In [None]:
# Initialize simulation
simulation = PricingSimulation()

# Generate scenarios
scenarios_df = simulation.generate_event_scenarios(test_df, num_scenarios=3)
print(f"Generated {len(scenarios_df)} event scenarios")

# Define pricing strategies
strategies = [
    {'name': 'Fixed Price', 'type': 'fixed', 'params': {}},
    {'name': 'Revenue Maximizing', 'type': 'optimize', 'params': {'objective': 'revenue'}},
    {'name': 'Profit Maximizing', 'type': 'optimize', 'params': {'objective': 'profit'}},
    {'name': 'Early Bird Discount', 'type': 'time_based', 'params': {'early_discount': 0.2, 'regular_price': 1.0, 'late_premium': 0.1}},
    {'name': 'Demand Based', 'type': 'demand_based', 'params': {'low_demand_discount': 0.15, 'high_demand_premium': 0.15}}
]

# Simulate pricing strategies
results_df = simulation.simulate_pricing_strategies(scenarios_df, strategies=strategies)
print(f"Simulated {len(strategies)} pricing strategies for {len(scenarios_df)} events")

# Display simulation results
results_df.head()
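The `time_based` and `demand_based` strategy types are configured only through their `params` dictionaries; how `PricingSimulation` applies them isn't shown. Below is a minimal sketch of rules that would consume those parameters. The cutoff days, the `demand_ratio` input (expected attendance divided by capacity), and its thresholds are hypothetical choices for illustration, not values taken from the simulation module.

```python
def time_based_price(base_price: float, days_until_event: int,
                     early_discount: float = 0.2, regular_price: float = 1.0,
                     late_premium: float = 0.1,
                     early_cutoff_days: int = 30, late_cutoff_days: int = 7) -> float:
    """Early-bird / regular / last-minute pricing (hypothetical cutoffs)."""
    if days_until_event > early_cutoff_days:       # early-bird window
        return base_price * (1 - early_discount)
    if days_until_event <= late_cutoff_days:       # last-minute window
        return base_price * (1 + late_premium)
    return base_price * regular_price              # regular window


def demand_based_price(base_price: float, demand_ratio: float,
                       low_demand_discount: float = 0.15, high_demand_premium: float = 0.15,
                       low_threshold: float = 0.5, high_threshold: float = 0.9) -> float:
    """Discount weak demand, add a premium for strong demand (hypothetical thresholds).

    demand_ratio is expected attendance divided by venue capacity.
    """
    if demand_ratio < low_threshold:
        return base_price * (1 - low_demand_discount)
    if demand_ratio > high_threshold:
        return base_price * (1 + high_demand_premium)
    return base_price


# Example: a $50 ticket under each rule
print(time_based_price(50.0, days_until_event=45))   # early bird
print(time_based_price(50.0, days_until_event=3))    # last minute
print(demand_based_price(50.0, demand_ratio=0.95))   # high demand
```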

In [None]:
# Analyze simulation results
analysis = simulation.analyze_simulation_results(results_df)

print(f"Best Revenue Strategy: {analysis['best_revenue_strategy']}")
print(f"Revenue Improvement: {analysis['revenue_improvement']['improvement_pct']:.2f}%")
print(f"Best Profit Strategy: {analysis['best_profit_strategy']}")
print(f"Profit Improvement: {analysis['profit_improvement']['improvement_pct']:.2f}%")

# Display strategy summary
strategy_summary = pd.DataFrame(analysis['strategy_summary'])
strategy_summary
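`analyze_simulation_results` isn't shown either, but the columns plotted in Section 5 (`revenue_sum`, `profit_sum`) and the Fixed Price comparison in Section 6 suggest a per-strategy aggregation followed by a comparison with the fixed-price baseline. A pandas sketch of that idea on a made-up four-row results table follows; it is an assumption about what the method does, not its actual code. If the baseline total is zero or negative (possible for profit), the percentage improvement is undefined or misleading.

```python
import pandas as pd

# Made-up simulation output: one row per (event, strategy).
results = pd.DataFrame({
    'event_id': [1, 1, 2, 2],
    'strategy': ['Fixed Price', 'Revenue Maximizing'] * 2,
    'revenue':  [1000.0, 1200.0, 800.0, 900.0],
    'profit':   [400.0, 450.0, 300.0, 330.0],
})

# Named aggregation produces revenue_sum / profit_sum columns, one row per strategy.
summary = (results.groupby('strategy', as_index=False)
                  .agg(revenue_sum=('revenue', 'sum'), profit_sum=('profit', 'sum')))

# Compare the top strategy's total revenue with the Fixed Price baseline.
baseline = summary.loc[summary['strategy'] == 'Fixed Price', 'revenue_sum'].iloc[0]
best = summary.loc[summary['revenue_sum'].idxmax()]
improvement_pct = (best['revenue_sum'] - baseline) / baseline * 100
print(f"Best revenue strategy: {best['strategy']} (+{improvement_pct:.2f}% vs. Fixed Price)")
```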

## 5. Visualization and Analysis

Let's create some additional visualizations to better understand the simulation results.

In [None]:
# Revenue by strategy
plt.figure(figsize=(12, 6))
sns.barplot(x='strategy', y='revenue_sum', data=strategy_summary)
plt.title('Total Revenue by Pricing Strategy')
plt.xlabel('Strategy')
plt.ylabel('Total Revenue ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Profit by strategy
plt.figure(figsize=(12, 6))
sns.barplot(x='strategy', y='profit_sum', data=strategy_summary)
plt.title('Total Profit by Pricing Strategy')
plt.xlabel('Strategy')
plt.ylabel('Total Profit ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Price change by strategy
plt.figure(figsize=(12, 6))
sns.boxplot(x='strategy', y='price_change_pct', data=results_df)
plt.title('Price Change Percentage by Strategy')
plt.xlabel('Strategy')
plt.ylabel('Price Change (%)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Demand change by strategy
plt.figure(figsize=(12, 6))
sns.boxplot(x='strategy', y='demand_change_pct', data=results_df)
plt.title('Demand Change Percentage by Strategy')
plt.xlabel('Strategy')
plt.ylabel('Demand Change (%)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Price vs. Demand scatter plot by strategy
plt.figure(figsize=(12, 8))
sns.scatterplot(x='optimized_price', y='expected_demand', hue='strategy', size='revenue', data=results_df, sizes=(50, 200), alpha=0.7)
plt.title('Price vs. Demand by Strategy')
plt.xlabel('Price ($)')
plt.ylabel('Expected Demand')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

## 6. Recommendations and Insights

Based on our analysis, we can provide the following recommendations for event pricing:

In [None]:
# Get the best strategy
best_strategy = analysis['best_profit_strategy']

# Filter results for the best strategy
best_results = results_df[results_df['strategy'] == best_strategy]

# Compare the best strategy with the Fixed Price baseline
fixed_results = results_df[results_df['strategy'] == 'Fixed Price']
avg_price_change = best_results['price_change_pct'].mean()
avg_demand_change = best_results['demand_change_pct'].mean()
# Overall (total, not per-event average) improvement in revenue and profit
revenue_improvement = best_results['revenue'].sum() / fixed_results['revenue'].sum() - 1
profit_improvement = best_results['profit'].sum() / fixed_results['profit'].sum() - 1

print(f"Recommended Pricing Strategy: {best_strategy}")
print(f"Average Price Change: {avg_price_change:.2f}%")
print(f"Average Demand Change: {avg_demand_change:.2f}%")
print(f"Overall Revenue Improvement: {revenue_improvement*100:.2f}%")
print(f"Overall Profit Improvement: {profit_improvement*100:.2f}%")

# Key insights
print("\nKey Insights and Recommendations:")
print("1. Price elasticity is a critical factor in determining optimal pricing strategy.")

elasticity = getattr(pricing_model, 'price_elasticity', None)
if elasticity is not None:
    if elasticity < -1:
        print(f"   - Demand is elastic (elasticity = {elasticity:.2f}), suggesting that lower prices may increase revenue.")
    elif elasticity > -1:
        print(f"   - Demand is inelastic (elasticity = {elasticity:.2f}), suggesting that higher prices may increase revenue.")
    else:
        print(f"   - Demand is unit elastic (elasticity = {elasticity:.2f}), suggesting that price changes won't significantly affect revenue.")

print("2. Time-based pricing strategies can be effective for maximizing revenue and attendance.")
print("   - Early bird discounts can increase attendance and create early cash flow.")
print("   - Last-minute premiums can capture higher willingness to pay from late deciders.")

print("3. Different event categories may benefit from different pricing strategies.")
print("   - Consider the specific characteristics of each event when setting prices.")
print("   - Factors like venue capacity, fixed costs, and competition should be taken into account.")

print("4. Price tiering can increase overall revenue by capturing different willingness to pay.")
print("   - Offer multiple price points with different value propositions.")
print("   - Ensure clear differentiation between tiers to justify price differences.")
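The elasticity rule in the cell above (elastic below -1, inelastic above -1) follows from the constant-elasticity model `Q = A * P**e`, under which revenue is `R = P * Q = A * P**(1 + e)`. Raising the price therefore lowers revenue when `e < -1`, raises it when `e > -1`, and leaves it unchanged at `e = -1`. A quick numerical check for a 10% price increase, assuming constant elasticity:

```python
def revenue_change_pct(elasticity: float, price_increase: float = 0.10) -> float:
    """Percent change in revenue when price rises by price_increase, assuming Q = A * P**elasticity."""
    # R = A * P**(1 + elasticity), so the constant A cancels in the ratio R_new / R_old.
    ratio = (1 + price_increase) ** (1 + elasticity)
    return (ratio - 1) * 100


for e in (-2.0, -1.0, -0.5):
    print(f"elasticity {e:+.1f}: 10% price rise -> revenue {revenue_change_pct(e):+.2f}%")
```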

## 7. Next Steps

To further improve the dynamic pricing model, consider the following next steps:

1. **Collect more data**: Gather more event data, especially with actual attendance and revenue figures, to improve model accuracy.

2. **Incorporate more features**: Add more features such as weather forecasts, competing events, and social media sentiment.

3. **Implement A/B testing**: Test different pricing strategies on similar events to measure real-world impact.

4. **Develop real-time pricing**: Create a system that can adjust prices in real-time based on current demand and other factors.

5. **Integrate with event management systems**: Connect the pricing model with event management platforms for seamless implementation.

6. **Create a user-friendly dashboard**: Develop an interactive dashboard for event organizers to visualize and implement pricing recommendations.

7. **Expand to more event types**: Adapt the model for different types of events and experiences beyond the current scope.

## Conclusion

This notebook has demonstrated the complete workflow for a dynamic pricing model for local events and experiences. We've shown how to:

1. Load and preprocess event data
2. Build a demand estimation model
3. Optimize prices based on various objectives
4. Simulate different pricing strategies
5. Analyze and visualize the results

The model gives event organizers a way to set prices that maximize revenue or profit, depending on their goals and constraints. The simulation compares each strategy against a fixed-price baseline; because those gains are estimated from a demand model (and, without real data, from a synthetic dataset), they should be confirmed with real attendance data and A/B tests before being relied on.