# Demo Notebook for the vivainsights Python package

Welcome to the comprehensive demo of the **vivainsights** Python package! This notebook showcases the full analytical capabilities available for Microsoft Viva Insights data analysis.

**vivainsights** is a powerful Python library designed to help you:
- 📊 **Visualize** collaboration patterns and organizational metrics
- 🔍 **Analyze** employee engagement and wellbeing indicators  
- 📈 **Identify** trends, outliers, and areas for improvement
- 🌐 **Explore** collaboration networks and organizational dynamics
- ⚡ **Generate** actionable insights for leaders and HR teams

This demo covers the major function categories with worked examples using the sample Person Query data that ships with the package.

For more information about the package:
- [📚 Documentation](https://microsoft.github.io/vivainsights-py/) - Complete API reference and guides
- [💻 GitHub Repository](https://github.com/microsoft/vivainsights-py/) - Source code and issue tracking
- [🎯 Use Cases](https://microsoft.github.io/vivainsights-py/about.html) - Real-world applications and examples

## Getting Started: Loading Data and Libraries

The **vivainsights** package comes with built-in sample datasets that mirror the structure of real Viva Insights exports. The `load_pq_data()` function loads a representative Person Query dataset containing:

- **Individual metrics**: Collaboration hours, email activity, meeting patterns
- **Organizational attributes**: Function, level, organization, manager status  
- **Time series data**: Weekly observations for trend analysis
- **Network data**: Collaboration patterns across the organization

Let's start by installing the package (here, from a local clone of the repository), loading the library, and exploring the sample data:

In [None]:
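import subprocess
import sys

def install_local_package(path_to_package):
    # pip.main() is not a supported API; invoke pip through the current interpreter instead
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', path_to_package])

# Example: replace with the path to your own local clone of the repository
install_local_package('C:\\Users\\martinchan\\OneDrive - Microsoft\\Documents\\GitHub\\vivainsights-py')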

In [None]:
import vivainsights as vi

# load in-built datasets
pq_data = vi.load_pq_data() # load and assign in-built person query

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
pq_data.head()

`extract_hr()` returns all the HR or organizational attributes it identifies within the target DataFrame:

In [None]:
vi.extract_hr(pq_data)

## Core Visualization Functions

The **vivainsights** package provides a suite of visualization functions, each designed for a specific analytical need. Most of them share a common set of parameters:

**Key Parameters:**
- `data` - Your Viva Insights dataset (Person Query, Meeting Query, etc.)
- `metric` - The collaboration metric to analyze (e.g., 'Collaboration_hours', 'Emails_sent')
- `hrvar` - Organizational grouping variable (e.g., 'Organization', 'LevelDesignation')  
- `mingroup` - Minimum group size to display (privacy protection)
- `return_type` - Output format: `'plot'` (visualization) or `'table'` (summary data); some functions accept further values, such as `'gini'` for `create_lorenz()`
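
To make the role of `mingroup` concrete, here is a minimal plain-Python sketch of a minimum-group-size filter. It assumes that group size means the number of distinct people in a group; the rows and the helper `groups_meeting_mingroup` are hypothetical and are not part of **vivainsights**, whose internal implementation may differ.

```python
from collections import defaultdict

# Hypothetical person-week rows: (PersonId, Organization)
rows = [
    ("p1", "Finance"), ("p1", "Finance"),
    ("p2", "Finance"),
    ("p3", "Finance"),
    ("p4", "Legal"), ("p4", "Legal"), ("p4", "Legal"),
]

def groups_meeting_mingroup(rows, mingroup):
    """Return {group: distinct person count} for groups with at least `mingroup` people."""
    people = defaultdict(set)
    for person_id, group in rows:
        people[group].add(person_id)  # repeated weeks for one person count once
    return {g: len(ids) for g, ids in people.items() if len(ids) >= mingroup}

kept = groups_meeting_mingroup(rows, mingroup=3)
print(kept)
```

In this sketch "Legal" has three rows but only one person, so it is dropped, while "Finance" is kept with three people.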

### Bar Charts: Comparing Groups

The `create_bar()` function creates person-averaged bar charts: it first averages the metric for each individual across their weekly observations, then averages those individual values within each group. Every person therefore carries equal weight in their group's average, regardless of how many weeks of data they have.
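
The two-step averaging can be illustrated with a small plain-Python sketch. The rows and the helper `person_averaged_mean` are hypothetical and only illustrate the idea; this is not the package's own code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical person-week rows: (PersonId, Organization, Emails_sent)
rows = [
    ("p1", "Finance", 10), ("p1", "Finance", 20),
    ("p1", "Finance", 30), ("p1", "Finance", 40),
    ("p2", "Finance", 100),
]

def person_averaged_mean(rows):
    """Average per person first, then average those person-level values per group."""
    per_person = defaultdict(list)
    for person_id, group, value in rows:
        per_person[(group, person_id)].append(value)
    per_group = defaultdict(list)
    for (group, _person), values in per_person.items():
        per_group[group].append(mean(values))
    return {group: mean(avgs) for group, avgs in per_group.items()}

person_avg = person_averaged_mean(rows)
row_avg = mean(value for _, _, value in rows)  # naive average over all rows
print(person_avg, row_avg)
```

Here the naive row-level average is pulled towards `p1`, who contributes four weeks of data, whereas the person-averaged figure weights `p1` and `p2` equally.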

In [None]:
plot_bar = vi.create_bar(data=pq_data, metric='Emails_sent', hrvar='Organization', mingroup=5)

You can also ask the function to return a summary table by setting the `return_type` parameter to `'table'`. The summary table can then be copied to the clipboard with `export()`.

In [None]:
tb = vi.create_bar(data=pq_data, metric='Emails_sent', hrvar='Organization', mingroup=5, return_type='table')
print(tb)

In [None]:
vi.export(tb)

Here are some other visual outputs, and their accompanying summary table outputs:

In [None]:
plot_line = vi.create_line(data=pq_data, metric='Emails_sent', hrvar='LevelDesignation', mingroup=5, return_type='plot')

In [None]:
vi.create_line(data=pq_data, metric='Emails_sent', hrvar='Organization', mingroup=5, return_type='table').head()

### Distribution & Inequality Analysis

Understanding how metrics are distributed across your organization is crucial for identifying patterns, outliers, and inequality. The **vivainsights** package provides powerful functions for distribution analysis that go beyond simple averages.

**Distribution Functions:**
- `create_boxplot()` - Visualizes metric distributions and identifies outliers across groups
- `create_lorenz()` - Analyzes inequality using Lorenz curves and Gini coefficients

These functions are particularly valuable when averages hide what is happening underneath. For example, while the average "Collaboration hours" might appear healthy across your organization, the distribution could reveal that a small group of employees is carrying an unsustainable collaboration load while others collaborate very little.

In [None]:
plot_box = vi.create_boxplot(data=pq_data, metric='Emails_sent', hrvar='Organization', mingroup=5, return_type='plot')

In [None]:
vi.create_boxplot(data=pq_data, metric='Emails_sent', hrvar='Organization', mingroup=5, return_type='table')

### Lorenz Curve: Analyzing Inequality

The `create_lorenz()` function helps you understand inequality within your data by plotting Lorenz curves and calculating Gini coefficients. This is particularly useful for identifying whether certain metrics are concentrated among a small subset of employees.

A Gini coefficient close to 0 indicates equality (everyone has similar values), while a value close to 1 indicates high inequality (few people have most of the value).
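
As a rough illustration of how a Gini coefficient behaves at these two extremes, the sketch below uses the standard sorted-rank formula in plain Python. The helper `gini` is hypothetical; `create_lorenz()` may compute its value differently.

```python
def gini(values):
    """Gini coefficient of non-negative values using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n over sorted values
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

equal = gini([5, 5, 5, 5])          # everyone has the same value
concentrated = gini([0, 0, 0, 10])  # one person holds everything
print(equal, concentrated)
```

With a finite population of `n` people, this formula gives at most `(n - 1) / n`, so the fully concentrated four-person example yields 0.75 rather than exactly 1.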

In [None]:
# Lorenz curve for collaboration hours - shows inequality across the population
lorenz_plot = vi.create_lorenz(data=pq_data, metric='Collaboration_hours', return_type='plot')

In [None]:
# Get the Gini coefficient to quantify inequality
gini_coef = vi.create_lorenz(data=pq_data, metric='Collaboration_hours', return_type='gini')
print(f"Gini coefficient for Collaboration Hours: {gini_coef:.3f}")
print(f"Interpretation: {'High inequality' if gini_coef > 0.5 else 'Moderate inequality' if gini_coef > 0.3 else 'Low inequality'}")

### Incidence Analysis: Understanding Thresholds

The `create_inc()` function helps you understand what percentage of your population exceeds certain thresholds for key metrics. This is crucial for identifying employees who might be at risk of burnout or disengagement.

For example, you might want to know what percentage of employees in each organization have collaboration hours above a healthy threshold.
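
The underlying calculation can be sketched in plain Python as follows. This sketch assumes that each person's weekly values are averaged first and the threshold is then applied to that average; the rows and the helper `incidence_above` are hypothetical, and `create_inc()` may differ in its details.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical person-week rows: (PersonId, Organization, Collaboration_hours)
rows = [
    ("p1", "Finance", 25), ("p1", "Finance", 23),
    ("p2", "Finance", 10), ("p2", "Finance", 14),
    ("p3", "Legal", 30), ("p3", "Legal", 8),
    ("p4", "Legal", 22), ("p4", "Legal", 21),
    ("p5", "Legal", 15), ("p5", "Legal", 17),
]

def incidence_above(rows, threshold):
    """Share of people per group whose average value exceeds `threshold`."""
    per_person = defaultdict(list)
    for person_id, group, value in rows:
        per_person[(group, person_id)].append(value)
    flags = defaultdict(list)
    for (group, _person), values in per_person.items():
        flags[group].append(mean(values) > threshold)
    return {group: sum(f) / len(f) for group, f in flags.items()}

incidence = incidence_above(rows, threshold=20)
print(incidence)
```

Note that `p3` has one week above 20 hours but an average of 19, so under this assumption `p3` is not counted as exceeding the threshold.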

In [None]:
# Incidence analysis: What % of people have >20 collaboration hours per week?
inc_plot = vi.create_inc(
    data=pq_data,
    metric='Collaboration_hours',
    hrvar='Organization',
    threshold=20,  # Threshold of 20 hours per week
    position='above',  # Looking at people above this threshold
    mingroup=5,
    return_type='plot'
)

In [None]:
# Get the exact percentages as a table
inc_table = vi.create_inc(
    data=pq_data,
    metric='Collaboration_hours',
    hrvar='Organization',
    threshold=20,
    position='above',
    mingroup=5,
    return_type='table'
)
print("Percentage of employees with >20 collaboration hours per week:")
print(inc_table)

## Exploratory Data Analysis

### Multi-dimensional Ranking: Finding Top Contributors and Risk Groups

The `create_rank()` function is one of the most powerful tools for rapid organizational exploration. It allows you to:

- **Compare multiple organizational dimensions** simultaneously
- **Identify top and bottom performers** across any metric  
- **Discover hidden patterns** in your organizational structure
- **Prioritize attention** by ranking all groups by importance

This function is particularly valuable for leadership teams who need to quickly understand where to focus their attention across complex organizational hierarchies.

In [None]:
vi.create_rank(
    data=pq_data,
    metric='Collaboration_hours',
    hrvar = ['Organization', 'FunctionType', 'LevelDesignation', 'SupervisorIndicator'],
    mingroup=5,
    return_type = 'table'
)

This can be visualized as well:

In [None]:
plot_rank = vi.create_rank(
    data=pq_data,
    metric='Collaboration_hours',
    hrvar = ['Organization', 'FunctionType', 'LevelDesignation', 'SupervisorIndicator'],
    mingroup=5,
    return_type = 'plot'
)

### Validating / exploring the data

Since HR variables (organizational attributes) are a key part of the analysis process, it is worth exploring and validating them before starting the analysis. `hrvar_count()` counts the distinct people in each group of an HR variable:

In [None]:
plot_hrcount = vi.hrvar_count(data=pq_data, hrvar='Organization', return_type='plot')

In [None]:
vi.hrvar_count(data=pq_data, hrvar='Organization', return_type='table')

## Additional Examples

Below are additional examples using the demo dataset `pq_data` for some of the newer functions in **vivainsights**.

### Bubble Plot: `create_bubble()`

The `create_bubble()` function visualizes the relationship between two metrics, with bubble size representing group size. This is useful for comparing two metrics across organizational groups.

In [None]:
# Bubble plot: Collaboration_hours vs. Multitasking_hours by Organization
bubble_plot = vi.create_bubble(
    data=pq_data,
    metric_x="Collaboration_hours",
    metric_y="Multitasking_hours",
    hrvar="Organization",
    mingroup=5,
    return_type="plot"
)

### Trend Plot: `create_trend()`

The `create_trend()` function provides a week-by-week heatmap view of a selected metric, grouped by an HR attribute. This helps identify trends and hotspots over time.

In [None]:
# Trend plot: Collaboration_hours by LevelDesignation
trend_plot = vi.create_trend(
    data=pq_data,
    metric="Collaboration_hours",
    hrvar="LevelDesignation",
    mingroup=5,
    return_type="plot"
)

### Key Metrics Scan: `keymetrics_scan()`

The `keymetrics_scan()` function summarizes multiple key metrics across a grouping variable, returning either a heatmap or a summary table. This is useful for a high-level scan of organizational health.

In [None]:
# Key metrics scan: heatmap by Organization
keymetrics_plot = vi.keymetrics_scan(
    data=pq_data,
    hrvar="Organization",
    mingroup=5,
    return_type="plot"
)

In [None]:
# Key metrics scan: summary table by Organization
keymetrics_table = vi.keymetrics_scan(
    data=pq_data,
    hrvar="Organization",
    mingroup=5,
    return_type="table"
)
keymetrics_table.head()

## Network Analysis & Flow Visualization

**vivainsights** includes powerful functions for analyzing collaboration networks and visualizing flows between organizational groups.

### Sankey Diagrams: Visualizing Organizational Flows

The `create_sankey()` function creates flow diagrams that show how people are distributed across different organizational attributes. This is particularly useful for understanding organizational structure and identifying potential silos.

In [None]:
# First, create a summary table for the Sankey diagram
# Sankey diagrams need aggregated data showing flows between two variables
sankey_data = pq_data.groupby(['Organization', 'LevelDesignation'])['PersonId'].nunique().reset_index(name='n')

# Create Sankey diagram showing flow from Organization to Level
sankey_plot = vi.create_sankey(
    data=sankey_data,
    var1='Organization',  # Left side of diagram
    var2='LevelDesignation',  # Right side of diagram  
    count='n'  # The flow volume
)

## Advanced Analytics

### Information Value Analysis: Predictive Insights

The `create_IV()` function helps you understand which variables are most predictive of a binary outcome. This is particularly useful for identifying factors associated with outcomes such as engagement, performance, or retention.

Information Value (IV) measures the predictive strength of variables:
- IV < 0.02: Not useful for prediction
- 0.02 ≤ IV < 0.1: Weak predictive power  
- 0.1 ≤ IV < 0.3: Medium predictive power
- 0.3 ≤ IV < 0.5: Strong predictive power
- IV ≥ 0.5: Very strong (potentially suspicious)
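
For intuition about where these numbers come from, here is a minimal plain-Python sketch of the standard weight-of-evidence formulation of Information Value for one binned predictor. The helper `information_value` and the counts are hypothetical, and `create_IV()` may bin and compute differently.

```python
import math

def information_value(bins):
    """IV for one predictor.

    bins: list of (n_events, n_non_events) per bin; every count must be > 0.
    """
    total_events = sum(e for e, _ in bins)
    total_non_events = sum(n for _, n in bins)
    iv = 0.0
    for events, non_events in bins:
        pct_events = events / total_events
        pct_non_events = non_events / total_non_events
        woe = math.log(pct_events / pct_non_events)  # weight of evidence for this bin
        iv += (pct_events - pct_non_events) * woe
    return iv

# Hypothetical counts for a predictor split into three bins
iv = information_value([(10, 40), (30, 30), (60, 30)])
no_signal = information_value([(10, 10), (20, 20)])  # same mix in every bin
print(round(iv, 3), no_signal)
```

A predictor whose bins all have the same event mix scores 0, while the three-bin example lands in the "very strong" band of the scale above.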

In [None]:
# Information Value analysis: Which factors predict high Copilot usage?
# First, create a binary outcome variable for high Copilot usage (> median)
pq_data_iv = vi.load_pq_data()
copilot_median = pq_data_iv['Copilot_actions_taken_in_Teams'].median()
pq_data_iv['High_Copilot'] = (pq_data_iv['Copilot_actions_taken_in_Teams'] > copilot_median).astype(int)

# Define predictor variables
predictors = ['Email_hours', 'Meeting_hours', 'Uninterrupted_hours']

# Run Information Value analysis
vi.create_IV(
    data=pq_data_iv,
    predictors=predictors,
    outcome='High_Copilot',
    return_type='plot'
)

## Putting It All Together: Analysis Best Practices

The **vivainsights** package provides a comprehensive toolkit for organizational analytics. Here are some best practices for effective analysis:

### 1. Start with Exploration
- Use `create_rank()` and `keymetrics_scan()` to get a high-level overview
- Apply `hrvar_count()` to understand your population segments
- Check data quality with `identify_outlier()` and related functions

### 2. Dive Deep with Distribution Analysis  
- Use `create_boxplot()` to identify outliers and understand spread
- Apply `create_lorenz()` to assess inequality and concentration
- Leverage `create_inc()` to understand threshold exceedances

### 3. Understand Relationships
- Use `create_bubble()` to explore relationships between two metrics
- Apply `create_trend()` to identify patterns over time
- Visualize organizational flows with `create_sankey()`

### 4. Advanced Insights
- Use `create_IV()` for predictive analytics and identifying key drivers
- Apply network analysis functions for collaboration insights

### 5. Export and Share
- Most functions shown here accept `return_type='table'` for extracting the underlying data
- Use `export()` to copy tables to clipboard for easy sharing
- Combine multiple analyses to tell compelling data stories

This comprehensive approach ensures you can uncover meaningful insights about collaboration, engagement, and organizational health using Viva Insights data.