# Demo: Habit Formation and Usage Segmentation Analysis with vivainsights

This notebook demonstrates how to analyze **habit formation** and **usage segmentation** using the **vivainsights** Python package. These analyses are particularly powerful for understanding technology adoption patterns, especially for new tools like **Microsoft Copilot** and other digital collaboration technologies.

## Why Habit and Usage Analysis Matters

Understanding how people form habits and segment into different usage patterns is crucial for:

🚀 **Technology Adoption:**
- Track how employees adopt new tools like Copilot
- Identify early adopters vs. late adopters
- Monitor the progression from trial to habitual use

📊 **Organizational Insights:**
- Understand which groups form productive habits faster
- Identify training and support needs
- Measure the success of change management initiatives

🎯 **Strategic Decision Making:**
- Optimize rollout strategies for new technologies
- Allocate resources based on usage patterns
- Design targeted interventions for different user segments

## What You'll Learn

In this comprehensive walkthrough, you will:

1. **Understand Habit Formation Analysis** with `identify_habit()`
   - How to define and measure digital habits
   - Different parameter configurations for various scenarios
   - Visualization techniques for habit patterns

2. **Master Usage Segmentation** with `identify_usage_segments()`
   - Classify users into meaningful segments (Non-user, Low User, Novice User, Habitual User, Power User)
   - Compare different time window approaches (4-week vs 12-week)
   - Create custom segmentation parameters for specific technologies

3. **Technology Adoption Case Studies**
   - Analyze Copilot adoption patterns
   - Compare adoption across different applications (Teams, Outlook, Excel, etc.)
   - Track progression from initial usage to habitual behavior

4. **Advanced Parameter Tuning**
   - Customize thresholds for different technologies
   - Optimize time windows for your analysis needs
   - Handle edge cases and data quality considerations

## Key Functions Covered

### `identify_habit(data, metric, threshold, width, max_window)`
Identifies whether individuals have formed habits based on consistent usage patterns over time.

### `identify_usage_segments(data, metric, version, threshold, width, max_window, power_thres)`
Classifies users into usage segments to understand the distribution of engagement levels across your organization.

Let's begin our journey into understanding digital behavior patterns! 🎯

In [None]:
# Import necessary libraries
import vivainsights as vi
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")

print("📚 Libraries imported successfully!")
print(f"vivainsights version: {getattr(vi, '__version__', 'unknown')}")
print(f"pandas version: {pd.__version__}")
print(f"numpy version: {np.__version__}")

# Set display options for better DataFrame viewing
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

## Step 1: Load and Explore the Demo Data

Let's start by loading the sample Person Query dataset and examining the available metrics, particularly those related to technology usage that would be relevant for habit and usage analysis.

In [None]:
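# Load the demo data
pq_data = vi.load_pq_data()

# Make sure MetricDate is a datetime so the date summaries below work
pq_data['MetricDate'] = pd.to_datetime(pq_data['MetricDate'])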

print(f"📈 Dataset Overview:")
print(f"  Shape: {pq_data.shape[0]:,} rows × {pq_data.shape[1]} columns")
print(f"  Date range: {pq_data['MetricDate'].min().date()} to {pq_data['MetricDate'].max().date()}")
print(f"  Unique employees: {pq_data['PersonId'].nunique():,}")
print(f"  Time periods: {pq_data['MetricDate'].nunique()} unique weeks")

# Display basic info
print(f"\n📋 Sample of the data:")
pq_data.head()

In [None]:
# Explore metrics available for habit and usage analysis
print("🔍 Exploring available metrics for habit analysis:")
print()

# Get all numeric columns that could represent usage metrics
numeric_cols = pq_data.select_dtypes(include=[np.number]).columns.tolist()

# Filter for potential technology/usage metrics
usage_metrics = [col for col in numeric_cols if any(keyword in col.lower() 
                for keyword in ['email', 'meeting', 'chat', 'call', 'copilot', 'teams', 'outlook', 'multitask'])]

print(f"📊 Potential usage metrics for analysis ({len(usage_metrics)} found):")
for i, metric in enumerate(usage_metrics, 1):
    if metric in pq_data.columns:
        mean_val = pq_data[metric].mean()
        std_val = pq_data[metric].std()
        non_zero_pct = (pq_data[metric] > 0).mean() * 100
        print(f"  {i:2}. {metric:<25} | Mean: {mean_val:6.1f} | Std: {std_val:6.1f} | Non-zero: {non_zero_pct:5.1f}%")

# Check for Copilot-specific metrics (common in newer datasets)
copilot_metrics = [col for col in pq_data.columns if 'copilot' in col.lower()]
if copilot_metrics:
    print(f"\n🤖 Copilot-specific metrics found ({len(copilot_metrics)}):")
    for metric in copilot_metrics:
        mean_val = pq_data[metric].mean()
        non_zero_pct = (pq_data[metric] > 0).mean() * 100
        print(f"    • {metric}: Mean = {mean_val:.1f}, Usage rate = {non_zero_pct:.1f}%")
else:
    print(f"\n🤖 No Copilot-specific metrics found in this dataset")
    print(f"    We'll use general collaboration metrics to demonstrate the concepts")

# Show distribution of a key metric for context
key_metric = 'Emails_sent' if 'Emails_sent' in pq_data.columns else usage_metrics[0]
print(f"\n📈 Distribution of '{key_metric}' (our primary demo metric):")
print(f"  Min: {pq_data[key_metric].min():.1f}")
print(f"  25th percentile: {pq_data[key_metric].quantile(0.25):.1f}")
print(f"  Median: {pq_data[key_metric].median():.1f}")
print(f"  75th percentile: {pq_data[key_metric].quantile(0.75):.1f}")
print(f"  Max: {pq_data[key_metric].max():.1f}")
print(f"  Weeks with usage (>0): {(pq_data[key_metric] > 0).sum():,} / {len(pq_data):,} ({(pq_data[key_metric] > 0).mean()*100:.1f}%)")

## Step 2: Understanding Habit Formation with `identify_habit()`

A **habit** in digital collaboration represents consistent, regular usage of a technology or behavior over time. The `identify_habit()` function helps us identify when employees have developed sustained usage patterns.

### Key Parameters Explained:

- **`threshold`**: Minimum weekly value of the metric for a week to count as a qualifying week (e.g., ≥1 email sent)
- **`width`**: Number of qualifying weeks needed within the window to form a habit (cannot exceed `max_window`)
- **`max_window`**: Length, in weeks, of the rolling window being evaluated (e.g., 4 or 12 weeks)

### Example Scenarios:
- **Conservative habit**: 9 out of 12 weeks with usage (width=9, max_window=12)
- **Responsive habit**: 4 out of 4 weeks with usage (width=4, max_window=4)
- **Flexible habit**: 6 out of 8 weeks with usage (width=6, max_window=8)
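To make the `threshold` / `width` / `max_window` rule concrete, here is a minimal pandas sketch of a rolling "at least `width` qualifying weeks in the last `max_window` weeks" check. It illustrates the rule as described above and is not the vivainsights implementation. It assumes one row per person per week with no missing weeks, and it lets the first weeks use a partial window (the package may treat early weeks differently). The helper `flag_habit` is hypothetical.

```python
import pandas as pd


def flag_habit(df, metric, threshold=1, width=4, max_window=4,
               person_col="PersonId", date_col="MetricDate"):
    """Sketch: flag person-weeks where `metric` >= `threshold` in at least
    `width` of that person's last `max_window` rows (weeks)."""
    out = df.sort_values([person_col, date_col]).copy()
    # 1 if this week counts as a qualifying week, else 0
    out["qualifies"] = (out[metric] >= threshold).astype(int)
    # Qualifying weeks in each person's trailing window; min_periods=1 lets
    # early weeks use a partial window (an assumption of this sketch)
    out["qualifying_weeks"] = (
        out.groupby(person_col)["qualifies"]
        .rolling(window=max_window, min_periods=1)
        .sum()
        .reset_index(level=0, drop=True)
    )
    out["habit"] = out["qualifying_weeks"] >= width
    return out


# Toy data: two people, six consecutive weeks each
weeks = pd.date_range("2024-01-07", periods=6, freq="W")
demo = pd.DataFrame({
    "PersonId": ["A"] * 6 + ["B"] * 6,
    "MetricDate": list(weeks) * 2,
    "Emails_sent": [3, 2, 0, 4, 5, 1, 0, 1, 0, 0, 2, 0],
})
flags = flag_habit(demo, "Emails_sent", threshold=1, width=3, max_window=4)
print(flags.groupby("PersonId")["habit"].last())
```

With `width=3, max_window=4`, person A reaches 3 qualifying weeks in week 4 and keeps the habit through week 6, while person B never gets past 2.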

Let's explore different parameter combinations!

In [None]:
# Example 1: Conservative habit analysis (12-week window)
# This is suitable for established technologies where we want to see sustained adoption

print("📊 Example 1: Conservative Habit Analysis (12-week rolling window)")
print("=" * 60)
print("Use case: Measuring established behavior patterns")
print("Parameters: threshold=1, width=9, max_window=12")
print("Interpretation: Users need to show usage in 9 of the last 12 weeks")
print()

# Analyze email sending habits with conservative parameters
habit_conservative = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',
    threshold=1,      # At least 1 email per week
    width=9,          # 9 qualifying weeks required
    max_window=12,    # Within a 12-week rolling window
    return_type='summary'
)

print("📈 Conservative Habit Analysis Results:")
for key, value in habit_conservative.items():
    if isinstance(value, float):
        if 'pct' in key.lower() or '%' in key:
            print(f"  {key}: {value:.1%}")
        else:
            print(f"  {key}: {value:.1f}")
    elif isinstance(value, (int, np.integer)):
        print(f"  {key}: {value:,}")
    else:
        print(f"  {key}: {value}")

print(f"\n💡 Insights:")
print(f"  • This conservative approach identifies truly habitual users")
print(f"  • Good for measuring adoption of established tools")
print(f"  • Higher bar means fewer people qualify as having 'habits'")
print(f"  • Useful for understanding sustained engagement patterns")

In [None]:
# Example 2: Responsive habit analysis (4-week window)
# This is suitable for new technology adoption where we want to detect habits forming quickly

print("\n📊 Example 2: Responsive Habit Analysis (4-week rolling window)")
print("=" * 60)
print("Use case: Measuring rapid adoption of new technologies (like Copilot)")
print("Parameters: threshold=1, width=4, max_window=4")
print("Interpretation: Users need to show usage in all 4 consecutive weeks")
print()

# Analyze with responsive parameters
habit_responsive = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',
    threshold=1,      # At least 1 email per week
    width=4,          # All 4 weeks required
    max_window=4,     # Within a 4-week rolling window
    return_type='summary'
)

print("📈 Responsive Habit Analysis Results:")
for key, value in habit_responsive.items():
    if isinstance(value, float):
        if 'pct' in key.lower() or '%' in key:
            print(f"  {key}: {value:.1%}")
        else:
            print(f"  {key}: {value:.1f}")
    elif isinstance(value, (int, np.integer)):
        print(f"  {key}: {value:,}")
    else:
        print(f"  {key}: {value}")

print(f"\n💡 Insights:")
print(f"  • This responsive approach captures emerging habits quickly")
print(f"  • Perfect for new technology rollouts (Copilot, new tools)")
print(f"  • More sensitive to recent behavioral changes")
print(f"  • Helps identify early adopters who develop habits fast")

# Compare the two approaches
print(f"\n🔄 Comparison of Conservative vs Responsive:")
print(f"  Conservative (12w): {habit_conservative['Most recent week - % of pop with habit']:.1%} have habits")
print(f"  Responsive (4w):    {habit_responsive['Most recent week - % of pop with habit']:.1%} have habits")
print(f"  Difference: {abs(habit_conservative['Most recent week - % of pop with habit'] - habit_responsive['Most recent week - % of pop with habit']):.1%} percentage points")

In [None]:
# Example 3: Custom habit analysis for specific technology adoption scenarios

print("\n📊 Example 3: Custom Parameters for Different Technology Adoption Scenarios")
print("=" * 70)

# Scenario A: New Copilot feature rollout (lenient to capture early adoption)
print("🤖 Scenario A: New Copilot Feature Rollout")
print("   Goal: Capture early adopters and encourage continued usage")
print("   Parameters: threshold=1, width=2, max_window=4 (2 out of 4 weeks)")

habit_copilot_early = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',  # Simulating Copilot usage with email data
    threshold=1,
    width=2,
    max_window=4,
    return_type='summary'
)

copilot_early_pct = habit_copilot_early['Most recent week - % of pop with habit']
print(f"   Result: {copilot_early_pct:.1%} showing early adoption patterns")

# Scenario B: Critical business tool (strict requirements)
print(f"\n💼 Scenario B: Critical Business Tool Adoption")
print("   Goal: Ensure consistent usage of essential business tool")
print("   Parameters: threshold=2, width=7, max_window=8 (7 out of 8 weeks, min 2 uses)")

habit_critical_tool = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',
    threshold=2,    # Higher threshold - more intensive usage required
    width=7,        # Very consistent usage required
    max_window=8,
    return_type='summary'
)

critical_tool_pct = habit_critical_tool['Most recent week - % of pop with habit']
print(f"   Result: {critical_tool_pct:.1%} meeting critical tool usage standards")

# Scenario C: Moderate engagement tool
print(f"\n🎯 Scenario C: Moderate Engagement Tool")
print("   Goal: Balance between accessibility and meaningful usage")
print("   Parameters: threshold=1, width=5, max_window=6 (5 out of 6 weeks)")

habit_moderate = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',
    threshold=1,
    width=5,
    max_window=6,
    return_type='summary'
)

moderate_pct = habit_moderate['Most recent week - % of pop with habit']
print(f"   Result: {moderate_pct:.1%} showing moderate engagement habits")

# Summary comparison
print(f"\n📊 SCENARIO COMPARISON SUMMARY:")
print(f"  {'Scenario':<25} {'Habit Rate':<12} {'Use Case'}")
print(f"  {'-'*25} {'-'*12} {'-'*50}")
print(f"  {'Copilot Early Adoption':<25} {copilot_early_pct:<11.1%} {'Encourage trial & exploration'}")
print(f"  {'Critical Business Tool':<25} {critical_tool_pct:<11.1%} {'Ensure essential tool mastery'}")
print(f"  {'Moderate Engagement':<25} {moderate_pct:<11.1%} {'Balance accessibility & commitment'}")

print(f"\n💡 Key Takeaways:")
print(f"  • Lenient parameters (2/4 weeks) capture more users - good for new tech")
print(f"  • Strict parameters (7/8 weeks) identify truly committed users")
print(f"  • Threshold affects minimum usage intensity per qualifying week")
print(f"  • Choose parameters based on technology maturity and business goals")

In [None]:
# Visualizing habit formation over time
print("📈 Visualizing Habit Formation Patterns Over Time")
print("=" * 50)

print("Creating time series visualization of habit formation...")
print("This shows how the percentage of people with habits changes over time")

# Create time series plot for the moderate scenario (good balance for visualization)
habit_plot = vi.identify_habit(
    data=pq_data,
    metric='Emails_sent',
    threshold=1,
    width=5,
    max_window=6,
    return_type='plot',
    plot_mode='time'
)

# Display the plot
habit_plot.show()

print(f"\n📊 What this visualization shows:")
print(f"  • Blue bars: Percentage of people with habits each week")
print(f"  • Grey bars: Percentage of people without habits each week")
print(f"  • Trends over time reveal adoption patterns and seasonal effects")
print(f"  • Peaks and valleys may correspond to organizational events or changes")

print(f"\n🎯 How to interpret habit formation plots:")
print(f"  • Increasing blue (habit) percentage = successful adoption")
print(f"  • Decreasing blue percentage = potential issues or natural fluctuation")
print(f"  • Sudden changes = investigate organizational events during those periods")
print(f"  • Stable patterns = mature technology with consistent usage")

## Step 3: Understanding Usage Segmentation with `identify_usage_segments()`

**Usage segmentation** goes beyond habit formation to classify users into meaningful categories based on their usage patterns. This is particularly powerful for understanding the full spectrum of technology adoption across your organization.

### The Five Usage Segments:

1. **Non-user** (0 usage): Haven't adopted the technology
2. **Low User** (>0 but <1 average): Minimal, sporadic usage  
3. **Novice User** (≥1 average, no habit): Regular usage but not habitual
4. **Habitual User** (habit formed, <power threshold): Consistent, habitual usage
5. **Power User** (habit + high usage): Heavy, habitual usage (champions/super users)

### Key Parameters:

- **`version`**: "12w" (12-week), "4w" (4-week), or None (custom)
- **`power_thres`**: Threshold for distinguishing power users from habitual users
- **Custom parameters**: When version=None, specify threshold, width, max_window, power_thres
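The segment rules listed above can be sketched for a single person's recent weekly values. This is a simplified illustration of the rules as this notebook describes them, not the package code. The helper `classify_segment` is hypothetical, and it assumes exactly one value per week, oldest first. The defaults mirror the 4-week version (threshold=1, width=4, max_window=4, power_thres=15).

```python
def classify_segment(weekly_values, threshold=1, width=4, max_window=4, power_thres=15):
    """Sketch: assign one person to a usage segment from their last
    `max_window` weekly values (oldest first). Returns None if there is
    not yet enough history to fill the window."""
    window = list(weekly_values)[-max_window:]
    if len(window) < max_window:
        return None
    avg = sum(window) / max_window
    qualifying_weeks = sum(1 for v in window if v >= threshold)
    has_habit = qualifying_weeks >= width

    # Rules are checked from the most to the least engaged segment
    if has_habit and avg >= power_thres:
        return "Power User"
    if has_habit:
        return "Habitual User"
    if avg >= 1:
        return "Novice User"
    if sum(window) > 0:
        return "Low User"
    return "Non-user"


examples = {
    "steady heavy": [20, 18, 16, 25],
    "steady light": [2, 1, 3, 1],
    "irregular": [0, 5, 0, 3],
    "sporadic": [0, 1, 0, 0],
    "none": [0, 0, 0, 0],
}
for label, values in examples.items():
    print(f"{label:<13} -> {classify_segment(values)}")
```

Passing `width=9, max_window=12` mirrors the 12-week definition. Under these rules, the habit check comes before the average check. So someone with 1 action in 9 of 12 weeks is Habitual even though they average under 1 action per week.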

Let's explore different segmentation approaches!

In [None]:
# Example 1: Standard 12-week usage segmentation
print("📊 Example 1: Standard 12-Week Usage Segmentation")
print("=" * 55)
print("Use case: Comprehensive view of technology adoption maturity")
print("Built-in parameters: threshold=1, width=9, max_window=12, power_thres=15")
print()

# Analyze usage segments with 12-week approach
segments_12w = vi.identify_usage_segments(
    data=pq_data,
    metric='Emails_sent',
    version='12w',
    return_type='table'
)

print("📈 12-Week Usage Segmentation Results:")
print("(Shows count of users in each segment by week)")
print()
print(segments_12w.head(10))  # Show first 10 weeks

# Get summary statistics for most recent week
latest_week = segments_12w.iloc[-1]
total_users = latest_week.sum()

print(f"\n📊 Most Recent Week Breakdown ({segments_12w.index[-1].date()}):")
print(f"  Total users: {total_users:,}")
print()
for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
    count = latest_week.get(segment, 0)  # 0 if a segment is absent that week
    pct = (count / total_users) * 100
    print(f"  {segment:<15}: {count:4.0f} users ({pct:5.1f}%)")

print(f"\n💡 12-Week Segmentation Insights:")
print(f"  • Conservative approach - habits must be sustained across a 12-week window")
print(f"  • Requires 9 out of 12 weeks of usage to be 'habitual'")
print(f"  • Power users need 15+ average uses per week")
print(f"  • Good for measuring mature technology adoption")
print(f"  • Shows established, sustained usage patterns")

In [None]:
# Example 2: Responsive 4-week usage segmentation
print("\n📊 Example 2: Responsive 4-Week Usage Segmentation")
print("=" * 55)
print("Use case: Quick feedback on new technology adoption progress")
print("Built-in parameters: threshold=1, width=4, max_window=4, power_thres=15")
print()

# Analyze usage segments with 4-week approach
segments_4w = vi.identify_usage_segments(
    data=pq_data,
    metric='Emails_sent',
    version='4w',
    return_type='table'
)

print("📈 4-Week Usage Segmentation Results:")
print("(Shows count of users in each segment by week)")
print()
print(segments_4w.head(10))  # Show first 10 weeks

# Get summary statistics for most recent week
latest_week_4w = segments_4w.iloc[-1]
total_users_4w = latest_week_4w.sum()

print(f"\n📊 Most Recent Week Breakdown ({segments_4w.index[-1].date()}):")
print(f"  Total users: {total_users_4w:,}")
print()
for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
    count = latest_week_4w.get(segment, 0)  # 0 if a segment is absent that week
    pct = (count / total_users_4w) * 100
    print(f"  {segment:<15}: {count:4.0f} users ({pct:5.1f}%)")

print(f"\n💡 4-Week Segmentation Insights:")
print(f"  • Responsive approach - habits are judged on the latest 4 weeks only")
print(f"  • Requires all 4 consecutive weeks of usage to be 'habitual'")
print(f"  • Same power user threshold (15+ uses/week)")
print(f"  • Perfect for new technology rollouts and rapid feedback")
print(f"  • Captures emerging patterns quickly")

# Compare 12w vs 4w approaches
print(f"\n🔄 Comparison: 12-Week vs 4-Week Segmentation")
print(f"  {'Segment':<15} {'12-Week':<10} {'4-Week':<10} {'Difference'}")
print(f"  {'-'*15} {'-'*10} {'-'*10} {'-'*15}")

for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
    pct_12w = (latest_week.get(segment, 0) / total_users) * 100
    pct_4w = (latest_week_4w.get(segment, 0) / total_users_4w) * 100

    print(f"  {segment:<15} {pct_12w:7.1f}%   {pct_4w:7.1f}%   {pct_4w - pct_12w:+6.1f}pp")

print(f"\n📈 Key Differences:")
print(f"  • 4-week reacts faster: consistent recent users reach 'Habitual' sooner")
print(f"  • 12-week tolerates up to 3 missed weeks but needs a longer usage history")
print(f"  • Which version shows more 'Habitual' or 'Novice' users depends on how regular usage is")
print(f"  • Choose based on your analysis timeframe and technology maturity")

In [None]:
# Example 3: Custom segmentation for Copilot adoption analysis
print("\n🤖 Example 3: Custom Segmentation for Copilot Technology Adoption")
print("=" * 65)
print("Use case: Specialized analysis for AI/Copilot tool adoption")
print("Custom parameters optimized for new AI technology:")
print("  • threshold=1 (any usage counts)")
print("  • width=3 (3 out of 6 weeks for habit - more lenient)")
print("  • max_window=6 (shorter evaluation period)")
print("  • power_thres=5 (assumes AI tool usage is less frequent than email, so a lower bar for power users)")
print()

# Create custom segmentation for Copilot-style analysis
segments_copilot = vi.identify_usage_segments(
    data=pq_data,
    metric='Emails_sent',  # Simulating Copilot usage
    version=None,  # Use custom parameters
    threshold=1,
    width=3,
    max_window=6,
    power_thres=5,  # Lower threshold for AI tools
    return_type='table'
)

print("📈 Custom Copilot-Style Segmentation Results:")
print("(Shows count of users in each segment by week)")
print()
print(segments_copilot.tail(10))  # Show last 10 weeks

# Get summary statistics for most recent week
latest_week_copilot = segments_copilot.iloc[-1]
total_users_copilot = latest_week_copilot.sum()

print(f"\n📊 Most Recent Week Breakdown ({segments_copilot.index[-1].date()}):")
print(f"  Total users: {total_users_copilot:,}")
print()
for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
    count = latest_week_copilot.get(segment, 0)  # 0 if a segment is absent that week
    pct = (count / total_users_copilot) * 100
    print(f"  {segment:<15}: {count:4.0f} users ({pct:5.1f}%)")

print(f"\n💡 Custom Copilot Segmentation Insights:")
print(f"  • Lenient habit formation (3/6 weeks vs 9/12 weeks)")
print(f"  • Lower power user threshold (5 vs 15 uses/week)")
print(f"  • Optimized for AI tools where usage patterns differ")
print(f"  • Faster identification of early adopters")
print(f"  • Better suited for technologies with lower baseline usage")

# Show all three approaches side by side
print(f"\n📊 COMPLETE COMPARISON: All Three Segmentation Approaches")
print(f"  {'Segment':<15} {'12-Week':<10} {'4-Week':<10} {'Copilot':<10}")
print(f"  {'-'*15} {'-'*10} {'-'*10} {'-'*10}")

for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
    pct_12w = (latest_week.get(segment, 0) / total_users) * 100
    pct_4w = (latest_week_4w.get(segment, 0) / total_users_4w) * 100
    pct_copilot = (latest_week_copilot.get(segment, 0) / total_users_copilot) * 100

    print(f"  {segment:<15} {pct_12w:7.1f}%   {pct_4w:7.1f}%   {pct_copilot:7.1f}%")

print(f"\n🎯 When to Use Each Approach:")
print(f"  📅 12-Week: Mature technologies, long-term behavior analysis")
print(f"  ⚡ 4-Week:  General new technology rollouts, quick feedback")
print(f"  🤖 Custom:  AI/specialized tools, unique usage patterns, specific business needs")

In [None]:
# Visualizing usage segments over time
print("\n📈 Visualizing Usage Segment Evolution Over Time")
print("=" * 52)

print("Creating stacked time series visualization...")
print("This shows how the distribution of user segments changes over time")

# Create visualization for the 4-week approach (good balance)
segments_plot = vi.identify_usage_segments(
    data=pq_data,
    metric='Emails_sent',
    version='4w',
    return_type='plot'
)

# Display the plot
segments_plot.show()

print(f"\n📊 How to Read Usage Segment Plots:")
print(f"  • Each colored band represents a different usage segment")
print(f"  • Typical palette (confirm against the plot legend):")
print(f"    - Grey: Non-users (haven't adopted)")
print(f"    - Light grey: Low users (minimal usage)")
print(f"    - Light blue: Novice users (regular but not habitual)")
print(f"    - Medium blue: Habitual users (consistent usage)")
print(f"    - Dark blue: Power users (heavy, habitual usage)")

print(f"\n🎯 What to Look For:")
print(f"  📈 Growing Habitual and Power User bands = successful adoption")
print(f"  📉 Shrinking Non-user band = fewer people who haven't adopted")
print(f"  🔄 Movement between segments = natural progression")
print(f"  📊 Stable patterns = mature adoption phase")
print(f"  ⚡ Sudden changes = investigate organizational events")

print(f"\n💡 Business Applications:")
print(f"  • Track technology rollout success")
print(f"  • Identify when adoption plateaus")
print(f"  • Measure impact of training programs")
print(f"  • Plan resource allocation for support")
print(f"  • Set realistic adoption targets")

## Step 4: Advanced Analysis - Multi-Metric Technology Adoption

Real-world technology adoption often involves multiple related metrics. For example, Copilot adoption might be measured across different applications (Teams, Outlook, Excel, Word, PowerPoint). Let's demonstrate how to analyze multi-metric adoption patterns.
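One way to analyze several application metrics together is to derive cross-application columns and pass them as the `metric` argument, the same way `Emails_sent` is passed in the earlier examples. Below is a minimal sketch. The output column names `Copilot_Total` and `Copilot_AppsUsed` are hypothetical. The input columns are assumed to be numeric weekly action counts, such as the simulated `Copilot_*` metrics in this step.

```python
import pandas as pd


def add_cross_app_metrics(df, app_metrics):
    """Sketch: add two per person-week columns derived from several app metrics.

    Copilot_Total    - sum of actions across the listed apps
    Copilot_AppsUsed - number of listed apps with at least one action
    """
    out = df.copy()
    out["Copilot_Total"] = out[app_metrics].sum(axis=1)
    out["Copilot_AppsUsed"] = (out[app_metrics] > 0).sum(axis=1)
    return out


demo = pd.DataFrame({
    "PersonId": ["A", "B", "C"],
    "Copilot_Teams": [2, 0, 0],
    "Copilot_Word": [1, 0, 3],
})
combined = add_cross_app_metrics(demo, ["Copilot_Teams", "Copilot_Word"])
print(combined)
```

In this notebook, `add_cross_app_metrics(pq_copilot, copilot_metrics)` could be followed by `vi.identify_usage_segments(..., metric='Copilot_Total', version='4w', return_type='table')`. That would segment people on overall Copilot activity, assuming the function accepts any numeric weekly column as `metric`.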

In [None]:
# Simulate multi-metric Copilot adoption analysis
print("🤖 Multi-Metric Copilot Adoption Analysis")
print("=" * 45)
print("Simulating Copilot usage across multiple applications")
print()

# Create simulated Copilot metrics from existing data
# In real scenarios, you would have actual Copilot metrics
pq_copilot = pq_data.copy()

# Simulate Copilot metrics with realistic patterns
np.random.seed(42)  # For reproducible results

# Simulate Copilot adoption with lower usage rates (typical for new AI tools)
pq_copilot['Copilot_Teams'] = np.random.poisson(0.8, len(pq_data))
pq_copilot['Copilot_Outlook'] = np.random.poisson(0.6, len(pq_data))
pq_copilot['Copilot_Excel'] = np.random.poisson(0.4, len(pq_data))
pq_copilot['Copilot_Word'] = np.random.poisson(0.5, len(pq_data))
pq_copilot['Copilot_PowerPoint'] = np.random.poisson(0.3, len(pq_data))

# List of Copilot metrics for analysis
copilot_metrics = [
    'Copilot_Teams',
    'Copilot_Outlook', 
    'Copilot_Excel',
    'Copilot_Word',
    'Copilot_PowerPoint'
]

print("📊 Simulated Copilot Usage Statistics:")
for metric in copilot_metrics:
    mean_usage = pq_copilot[metric].mean()
    usage_rate = (pq_copilot[metric] > 0).mean() * 100
    max_usage = pq_copilot[metric].max()
    print(f"  {metric:<20}: Avg {mean_usage:.1f}/week, {usage_rate:4.1f}% usage rate, Max {max_usage}")

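print(f"\n💡 Simulated Usage Patterns (set by the Poisson rates above, not observed data):")
print(f"  • Teams highest (0.8/week), reflecting its communication focus")
print(f"  • Outlook second (0.6/week), reflecting email integration")
print(f"  • Word and Excel lower (0.5 and 0.4/week), task-specific usage")
print(f"  • PowerPoint lowest (0.3/week), presentation creation")
print(f"  • Draws are independent across rows, so any 'habits' here arise by chance")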
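The simulation above draws every person-week independently, so it has no person-level consistency for habit analysis to detect. If you want simulated data where some people are reliably heavier users than others, one hypothetical alternative gives each person a stable propensity and scales the Poisson rate by it. The propensity is a gamma-distributed multiplier with mean 1. `simulate_person_level` is an illustrative helper, not part of vivainsights.

```python
import numpy as np
import pandas as pd


def simulate_person_level(df, rate, seed=42, person_col="PersonId", shape=0.5):
    """Hypothetical simulation: Poisson weekly counts whose rate varies by person.

    Each person gets one multiplier drawn from Gamma(shape, scale=1/shape),
    which has mean 1, so the overall average stays close to `rate` while
    individuals differ consistently from week to week.
    """
    rng = np.random.default_rng(seed)
    persons = df[person_col].unique()
    propensity = pd.Series(rng.gamma(shape, 1 / shape, size=len(persons)), index=persons)
    # Expected weekly count for every row = base rate x that person's multiplier
    lam = rate * df[person_col].map(propensity).to_numpy()
    return rng.poisson(lam)


demo = pd.DataFrame({"PersonId": ["A"] * 50 + ["B"] * 50})
draws = simulate_person_level(demo, rate=0.8, seed=1)
print(pd.Series(draws).groupby(demo["PersonId"]).mean())
```

For example, `pq_copilot['Copilot_Teams'] = simulate_person_level(pq_copilot, rate=0.8, seed=42)` would replace the independent draw for Teams. A smaller `shape` makes people more different from one another.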

In [None]:
# Analyze habit formation across multiple Copilot applications
print("🎯 Habit Formation Analysis Across Copilot Applications")
print("=" * 55)

# Illustrative parameters for AI tool adoption (starting points, not package defaults)
ai_habit_params = {
    'threshold': 2,    # A week qualifies with at least 2 Copilot actions
    'width': 2,        # At least 2 qualifying weeks...
    'max_window': 8    # ...within an 8-week rolling window (AI adoption takes time)
}

copilot_habits = {}

for metric in copilot_metrics:
    print(f"\n📱 Analyzing {metric} habit formation...")

    # Summarize habit formation for this Copilot application, using the same
    # return_type and summary key as the earlier identify_habit() examples
    habit_result = vi.identify_habit(
        data=pq_copilot,
        metric=metric,
        threshold=ai_habit_params['threshold'],
        width=ai_habit_params['width'],
        max_window=ai_habit_params['max_window'],
        return_type='summary'
    )

    copilot_habits[metric] = habit_result

    # Share of people with a habit in the most recent week
    habit_rate = habit_result['Most recent week - % of pop with habit']
    print(f"  Habit rate (most recent week): {habit_rate:.1%}")

print(f"\n🔍 Key Insights:")
print(f"  • With simulated data, differences between apps mirror the Poisson rates chosen above")
print(f"  • With real Copilot metrics, compare these rates to see where usage becomes habitual")
print(f"  • Task-specific tools (Excel, PowerPoint) may need longer windows to show habits")
print(f"  • Consider role-based analysis for deeper insights")
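The habit summary reports the share of people with a habit, not how long each person took to get there. If you also want time to first habit, a minimal pandas sketch follows. It applies a rolling "`width` of the last `max_window` weeks" rule, assumes one row per person per week with no gaps, counts weeks from each person's first row, and allows partial windows at the start. These are assumptions, not the package's behavior. `weeks_to_first_habit` is a hypothetical helper.

```python
import pandas as pd


def weeks_to_first_habit(df, metric, threshold=1, width=2, max_window=8,
                         person_col="PersonId", date_col="MetricDate"):
    """Sketch: 1-based week (counted from each person's first row) at which
    `metric` >= `threshold` first holds in at least `width` of the last
    `max_window` weeks. People who never reach it get NaN."""
    out = df.sort_values([person_col, date_col]).copy()
    out["qualifies"] = (out[metric] >= threshold).astype(int)
    out["qualifying_weeks"] = (
        out.groupby(person_col)["qualifies"]
        .rolling(window=max_window, min_periods=1)  # partial windows allowed early on
        .sum()
        .reset_index(level=0, drop=True)
    )
    out["week_number"] = out.groupby(person_col).cumcount() + 1
    first = (
        out.loc[out["qualifying_weeks"] >= width]
        .groupby(person_col)["week_number"]
        .min()
    )
    # Keep everyone, with NaN for people who never formed a habit
    return first.reindex(out[person_col].unique())


weeks = pd.date_range("2024-01-07", periods=6, freq="W")
demo = pd.DataFrame({
    "PersonId": ["A"] * 6 + ["B"] * 6,
    "MetricDate": list(weeks) * 2,
    "Emails_sent": [3, 2, 0, 4, 5, 1, 0, 1, 0, 0, 2, 0],
})
strict = weeks_to_first_habit(demo, "Emails_sent", threshold=1, width=3, max_window=4)
lenient = weeks_to_first_habit(demo, "Emails_sent")  # 2 qualifying weeks within 8
print(strict)
print(lenient)
```

In the toy data, A first meets 3 of 4 weeks in week 4 and B never does. With the lenient defaults, A qualifies in week 2 and B in week 5. On the simulated data, `weeks_to_first_habit(pq_copilot, 'Copilot_Teams', threshold=2, width=2, max_window=8).describe()` would summarize the spread.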

In [None]:
# Comparative usage segmentation across Copilot applications
print("🏷️ Usage Segmentation Across Copilot Applications")
print("=" * 50)

# Analyze usage segments for each Copilot application
copilot_segments = {}

for metric in copilot_metrics:
    print(f"\n🔍 {metric} Usage Segmentation:")

    # Apply the built-in 4-week segmentation (responsive, suited to new tools)
    segment_result = vi.identify_usage_segments(
        data=pq_copilot,
        metric=metric,
        version='4w',
        return_type='table'
    )

    copilot_segments[metric] = segment_result

    # Segment distribution in the most recent week, as in the earlier examples
    latest = segment_result.iloc[-1]
    total = latest.sum()

    print(f"  📊 Segment Distribution (most recent week):")
    for segment in ['Non-user', 'Low User', 'Novice User', 'Habitual User', 'Power User']:
        pct = (latest.get(segment, 0) / total) * 100 if total else 0.0
        print(f"    {segment:<13}: {pct:5.1f}%")

print(f"\n🎯 Cross-Application Insights:")
print(f"  • Compare adoption rates across different Copilot tools")
print(f"  • Identify which applications drive power user behavior")
print(f"  • Spot opportunities for cross-application training")
print(f"  • Plan targeted interventions for low-adoption tools")

## Step 5: Best Practices and Parameter Selection

When working with habit formation and usage segmentation analysis for technology adoption, parameter selection is crucial for meaningful insights. Here are key considerations:

In [None]:
# Best practices summary for technology adoption analysis
print("📋 Best Practices for Technology Adoption Analysis")
print("=" * 50)

print("\n🎯 Parameter Selection Guidelines:")
print()

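print("🔧 identify_habit() Parameters (illustrative starting points, not package defaults):")
print("  threshold (minimum weekly actions for a week to qualify):")
print("    • Traditional apps: 3-5 uses/week")
print("    • AI/New tech: 1-2 uses/week (lower barrier)")
print("    • Communication tools: 5-10 uses/week")
print()
print("  width (qualifying weeks required; cannot exceed max_window):")
print("    • Lenient: about half the window (e.g., 2 of 4, 3 of 6)")
print("    • Standard: most of the window (e.g., 5 of 6, 6 of 8)")
print("    • Strict: all or nearly all weeks (e.g., 4 of 4, 9 of 12)")
print()
print("  max_window (rolling window length in weeks):")
print("    • 4 weeks: responsive, surfaces emerging habits during a rollout")
print("    • 6-8 weeks: balance between responsiveness and stability")
print("    • 12 weeks: conservative, confirms sustained usage of established tools")

print("\n🏷️ identify_usage_segments() Parameters:")
print("  version:")
print("    • '4w': initial adoption and monthly reviews (responsive)")
print("    • '12w': mature adoption and quarterly reviews (conservative)")
print("    • None: custom threshold, width, max_window and power_thres")
print("  power_thres:")
print("    • 15 average actions/week in the built-in versions; lower it for low-frequency tools")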

print("\n💡 Technology-Specific Recommendations:")
print()
print("🤖 AI Tools (Copilot, ChatGPT, etc.):")
print("  • Lower thresholds (1-2 uses/week)")
print("  • Windows of 4 weeks to spot early habits, 8-12 weeks to confirm them")
print("  • Focus on consistent low usage vs. sporadic high usage")
print()
print("💬 Communication Tools (Teams, Slack):")
print("  • Higher thresholds (5-10 uses/week)")
print("  • Shorter habit formation windows (2-4 weeks)")
print("  • Daily usage patterns important")
print()
print("📊 Productivity Tools (Office Suite):")
print("  • Moderate thresholds (2-4 uses/week)")
print("  • Standard windows (4-8 weeks)")
print("  • Consider role-based analysis")
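Because results are sensitive to these choices, it helps to sweep a small grid of `width` and `max_window` values and compare the share of people with a habit in the most recent week. Below is a self-contained pandas sketch. It assumes one row per person per week and allows partial windows at the start. The helper names are hypothetical. With the package itself, the same loop could call `vi.identify_habit(..., return_type='summary')` for each combination, as the earlier cells do.

```python
import itertools

import pandas as pd


def latest_habit_share(df, metric, threshold, width, max_window,
                       person_col="PersonId", date_col="MetricDate"):
    """Sketch: share of people meeting the rolling habit rule in the latest week."""
    out = df.sort_values([person_col, date_col]).copy()
    out["qualifies"] = (out[metric] >= threshold).astype(int)
    out["qualifying_weeks"] = (
        out.groupby(person_col)["qualifies"]
        .rolling(window=max_window, min_periods=1)
        .sum()
        .reset_index(level=0, drop=True)
    )
    latest = out[out[date_col] == out[date_col].max()]
    return float((latest["qualifying_weeks"] >= width).mean())


def habit_sensitivity(df, metric, threshold=1, widths=(2, 4, 6), windows=(4, 8, 12)):
    """Sketch: habit share for every valid (width, max_window) combination."""
    rows = []
    for width, window in itertools.product(widths, windows):
        if width > window:
            continue  # cannot need more qualifying weeks than the window holds
        share = latest_habit_share(df, metric, threshold, width, window)
        rows.append({"width": width, "max_window": window, "habit_share": share})
    return pd.DataFrame(rows)


# Toy data: A uses every week, B every other week, C only in the last week
weeks = pd.date_range("2024-01-07", periods=12, freq="W")
demo = pd.DataFrame({
    "PersonId": [p for p in "ABC" for _ in range(12)],
    "MetricDate": list(weeks) * 3,
    "Emails_sent": [1] * 12 + [1, 0] * 6 + [0] * 11 + [1],
})
grid = habit_sensitivity(demo, "Emails_sent")
print(grid.pivot(index="width", columns="max_window", values="habit_share"))
```

In the toy data, B's every-other-week pattern meets 4 of 8 weeks but not 4 of 4. That is the kind of sensitivity the 12-week vs 4-week comparison earlier in the notebook reflects.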

In [None]:
# Strategic recommendations for technology adoption programs
print("🚀 Strategic Recommendations for Technology Adoption")
print("=" * 55)

print("\n📈 Implementation Strategy:")
print()
print("1️⃣ Baseline Assessment:")
print("  • Run 4-week usage segmentation for current state")
print("  • Identify Non-users and Low Users for intervention")
print("  • Establish benchmark metrics before training")
print()
print("2️⃣ Intervention Planning:")
print("  • Target Low Users with basic training")
print("  • Convert Novice Users to Habitual Users")
print("  • Leverage Power Users as champions/trainers")
print()
print("3️⃣ Progress Monitoring:")
print("  • Weekly usage tracking during rollout")
print("  • 4-week habit formation analysis")
print("  • Quarterly usage segment reassessment")
print()
print("4️⃣ Success Metrics:")
print("  • Increase in Habitual User + Power User %")
print("  • Reduction in Non-user %")
print("  • Faster time-to-habit formation")
print("  • Cross-application usage correlation")

print("\n🎯 Focus Areas for Different User Segments:")
print()
print("👥 Non-users (no usage in the window):")
print("  • Basic awareness and onboarding")
print("  • Remove technical barriers")
print("  • Demonstrate clear value proposition")
print()
print("📊 Low Users (some usage, averaging under 1 action/week):")
print("  • Targeted training on core features")
print("  • Use case identification")
print("  • Peer support programs")
print()
print("🌱 Novice Users (averaging 1+ actions/week, no habit yet):")
print("  • Advanced feature training")
print("  • Integration with daily workflows")
print("  • Habit formation support")
print()
print("✅ Habitual Users (habit formed, below the power threshold):")
print("  • Champion/trainer opportunities")
print("  • Advanced use case exploration")
print("  • Cross-application usage")
print()
print("⭐ Power Users (habit formed and averaging power_thres or more):")
print("  • Innovation and feedback programs")
print("  • Advanced feature beta testing")
print("  • Mentoring other segments")

## Summary and Next Steps

This notebook demonstrated how to use `identify_habit()` and `identify_usage_segments()` for technology adoption analysis, with a particular focus on AI tools like Copilot. The combination of these functions provides powerful insights for driving successful technology rollouts and user engagement.

In [None]:
# Summary of key takeaways
print("📝 Key Takeaways from Habits & Usage Analysis")
print("=" * 45)

print("\n🔍 What We Learned:")
print()
print("✅ Function Capabilities:")
print("  • identify_habit(): Detects when users form consistent usage patterns")
print("  • identify_usage_segments(): Classifies users into 5 behavioral segments")
print("  • Both functions are highly configurable for different scenarios")
print()
print("✅ Parameter Flexibility:")
print("  • Different thresholds for different technology types")
print("  • Adjustable time windows for various adoption timelines")
print("  • Conservative vs. responsive analysis approaches")
print()
print("✅ Technology Adoption Insights:")
print("  • Parameters should reflect each tool's expected usage frequency")
print("  • Shorter windows surface new habits; longer windows confirm them")
print("  • Multi-metric analysis reveals cross-application patterns")

print("\n🚀 Next Steps:")
print()
print("1️⃣ Apply to Your Data:")
print("  • Load your organization's collaboration metrics")
print("  • Run baseline usage segmentation analysis")
print("  • Identify habit formation patterns")
print()
print("2️⃣ Customize Parameters:")
print("  • Adjust thresholds based on your technology type")
print("  • Select appropriate time windows for your rollout timeline")
print("  • Test different parameter combinations")
print()
print("3️⃣ Build Action Plans:")
print("  • Target interventions by user segment")
print("  • Monitor progress with regular re-analysis")
print("  • Measure success through improved habit formation")
print()
print("4️⃣ Scale Analysis:")
print("  • Analyze multiple technologies simultaneously")
print("  • Compare adoption across departments/roles")
print("  • Track long-term adoption trends")

print(f"\n💡 Remember: Technology adoption is a journey, not a destination!")
print(f"    Use these tools to guide users through that journey effectively.")