# Week 5 — Applied SaaS Notebook

This notebook is part of the 'Applied ML Foundations for SaaS Analytics' course. Conversational, mentor-style guidance is provided throughout.

In [None]:
from IPython.display import HTML
HTML('''
<style>
details {
  margin: 10px 0;
  padding: 8px 12px;
  border: 1px solid #d9e2ec;
  border-radius: 8px;
  background: #f9fbfd;
}
details summary {
  font-weight: 600;
  color: #0056b3;
  cursor: pointer;
}
details[open] {
  background: #f1f7ff;
  border-color: #c3d4f0;
}
details pre {
  background: #f8f9fa;
  padding: 8px;
  border-radius: 6px;
}
</style>
''')

## Scenario — Prepare a 3-panel KPI dashboard for the CEO

We will create time series, distribution, and segmentation plots to tell a clear story about adoption and revenue.
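
As a starting point, the three panels can share one figure. The sketch below is a minimal layout, not the course's reference dashboard: all data is synthetic, and the plan names (`free`, `pro`, `enterprise`) and revenue figures are made-up placeholders to swap for the real tables.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): replace with the course tables.
rng = np.random.default_rng(0)
days = pd.date_range('2024-01-01', periods=60, freq='D')
dau = pd.Series(rng.integers(80, 120, size=len(days)), index=days)
usage = rng.poisson(lam=5, size=500)
revenue_by_plan = pd.Series({'free': 0.0, 'pro': 4200.0, 'enterprise': 9100.0})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Panel 1: time series (adoption over time)
axes[0].plot(dau.index, dau.values)
axes[0].set_title('Daily active users')
axes[0].tick_params(axis='x', rotation=45)

# Panel 2: distribution (how usage is spread across users)
axes[1].hist(usage, bins=20)
axes[1].set_title('Usage count per user')

# Panel 3: segmentation (revenue split by plan)
axes[2].bar(list(revenue_by_plan.index), revenue_by_plan.values)
axes[2].set_title('Revenue by plan')

fig.tight_layout()
plt.show()
```

Keeping all three panels on one row makes the figure easy to paste into a slide; a 3x1 column layout may read better if the time series needs more width.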


## Hands-on

Make a boxplot of `usage_count` by `plan_type` (join `feature_usage` with `subscriptions`).


<details>
<summary>💡 Hint</summary>

Break the problem into smaller steps. First attach `plan_type` to each usage row by joining `feature_usage` with `subscriptions`, then group the joined table by plan. If you need vectorized operations, convert each group's counts to NumPy arrays. Think about edge cases: users missing from one of the tables, zero counts, and extreme values that stretch the plot's scale.

</details>

<details>
<summary>✅ Solution (example)</summary>

```python
# Example solution snippet — adapt to your dataset & question.
import pandas as pd
import numpy as np

# Load data (adjust path as needed)
df = pd.read_csv('../data/feature_usage.csv', parse_dates=['date'], low_memory=False)

# Example: compute total usage per user and return top users
user_usage = df.groupby('user_id')['usage_count'].sum().reset_index(name='total_usage')
top_users = user_usage.sort_values('total_usage', ascending=False).head(10)
top_users
```

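**Why this works:** `groupby('user_id')` collapses the usage rows into one total per user, and sorting by `total_usage` in descending order surfaces the ten heaviest users. This covers the aggregation step only. For the boxplot itself, join the usage rows to `subscriptions` to attach `plan_type`, then plot `usage_count` by `plan_type`. The snippet stays in pandas throughout; if you need faster numeric-only operations, convert a column to a NumPy array with `.to_numpy()`.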

</details>
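
For the boxplot task itself, here is one possible sketch on tiny synthetic tables. It assumes both tables share a `user_id` key and that `subscriptions` carries `plan_type`; the plan names and the `unknown` label for unmatched users are illustrative choices, not part of the course data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Tiny synthetic tables standing in for feature_usage and subscriptions (assumption).
feature_usage = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3, 4, 5],
    'usage_count': [3, 7, 0, 12, 5, 2, 9],
})
subscriptions = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'plan_type': ['free', 'free', 'pro', 'pro'],
})

# A left join keeps usage rows whose user has no subscription record (user 5 here).
merged = feature_usage.merge(subscriptions, on='user_id', how='left')
merged['plan_type'] = merged['plan_type'].fillna('unknown')

# One NumPy array of usage counts per plan, in sorted plan order.
groups = {plan: g['usage_count'].to_numpy() for plan, g in merged.groupby('plan_type')}
labels = list(groups)

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(list(groups.values()))
ax.set_xticks(list(range(1, len(labels) + 1)))
ax.set_xticklabels(labels)
ax.set_xlabel('plan_type')
ax.set_ylabel('usage_count')
ax.set_title('Usage count by plan type')
fig.tight_layout()
plt.show()
```

The left join is a deliberate choice: an inner join would silently drop usage from users without a subscription row, which can hide a data-quality problem. On the real data, `sns.boxplot(data=merged, x='plan_type', y='usage_count')` should give the same picture in one call.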

In [None]:

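import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Subscriptions table (loaded here, not used in this plot)
subs = pd.read_csv('../data/subscriptions.csv', parse_dates=['signup_date', 'churn_date'])
# nrows reads only the first 50,000 rows (not a random sample) to keep plotting fast
events = pd.read_csv('../data/user_events.csv', parse_dates=['timestamp'], nrows=50000)

# Daily active users (DAU): distinct user_id values per calendar day
events['date'] = events['timestamp'].dt.normalize()  # keeps datetime dtype for plotting
dau = events.groupby('date')['user_id'].nunique().reset_index(name='dau')

plt.figure(figsize=(10, 3))
plt.plot(dau['date'], dau['dau'])
plt.title('Daily Active Users (first 50,000 events)')
plt.xlabel('Date')
plt.ylabel('Active users')
plt.tight_layout()
plt.show()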


## Reflection

Which single chart would you show in a 30-second update to the CEO?
