
# **Root Cause Analysis**

Diagnosing a change in a KPI or other metric is a common analytics task. The change might be a sharp increase or decrease, or a steady shift over time. Either way, finding the root cause tells you whether the change needs action and where to act.

## Steps to take

### 1. Clarify the scope of the change
Before you dig into the data, make sure you understand exactly what you are evaluating. Clarifying questions help:

* What is the definition of the metric?
* Why is this metric important?
* What is the magnitude of the change?
* Over what time frame was the change observed, and was it sudden or gradual?
* What time period is it being compared to?
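
Pin down the magnitude precisely. A rate that moves from 10% to 8% has fallen by 2 percentage points, which is also a 20% relative decline, and stakeholders often mean different things by "dropped 20%". Below is a minimal sketch in plain Python. The function names and sample numbers are hypothetical. `largest_step` is only a rough heuristic for "sudden vs gradual", not a formal change-point test.

```python
def pct_change(current: float, baseline: float) -> float:
    """Relative change from baseline to current, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def pp_change(current_rate: float, baseline_rate: float) -> float:
    """Absolute change between two rates expressed in percent, i.e. percentage points."""
    return current_rate - baseline_rate


def largest_step(series: list[float]) -> tuple[int, float]:
    """Return (index of the day after the step, step size) for the largest day-over-day move.

    A crude sudden-vs-gradual check: one big step suggests a sudden change.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    k = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return k + 1, diffs[k]


# Hypothetical figures for illustration only.
print(round(pct_change(8.0, 10.0), 1))  # -20.0 (relative change, %)
print(pp_change(8.0, 10.0))             # -2.0 (percentage points)
print(largest_step([100, 101, 99, 100, 85, 84, 86]))  # (4, -15)
```

If one day-over-day step accounts for most of the total change, the change was probably sudden. In that case, line the step up against release and campaign dates.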


### 2. Hypothesize contributing factors

Metric changes are almost always due to one or more of the following factors: product changes, seasonal factors, competition, mix shift, and data quality. To generate hypotheses, ask whether the change was:

* Accidental? Is there a problem with the data pipeline?
* Natural? Is the data seasonal? Does the same change appear at the same time of week, month, or year?
* Internal? Was there a recent product launch, change, or bug fix? Did a marketing campaign start or end?
* External? What is the competition doing? Was there a significant world event? Was there an external technical change, such as a browser or operating system update?
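
You can check the "natural" question directly by comparing this year's week-over-week change with the change over the same weeks a year earlier. Below is a minimal sketch. It assumes the metric is available as daily totals keyed by date. The helper names and numbers are hypothetical.

```python
from datetime import date, timedelta


def weekly_total(daily: dict[date, float], start: date) -> float:
    """Sum a daily metric over the 7 days beginning at `start` (missing days count as 0)."""
    return sum(daily.get(start + timedelta(days=d), 0) for d in range(7))


def wow_change(daily: dict[date, float], week_start: date) -> float:
    """Week-over-week % change for the week beginning at `week_start`."""
    prev = weekly_total(daily, week_start - timedelta(days=7))
    if prev == 0:
        raise ValueError("no data in the comparison week")
    curr = weekly_total(daily, week_start)
    return (curr - prev) / prev * 100


def seasonal_comparison(daily: dict[date, float], week_start: date) -> tuple[float, float]:
    """WoW change this year vs the same weeks one year earlier.

    Going back 364 days (52 weeks) keeps weekdays aligned, so Mondays are compared with Mondays.
    """
    this_year = wow_change(daily, week_start)
    last_year = wow_change(daily, week_start - timedelta(days=364))
    return this_year, last_year


# Hypothetical daily DAU: a 15% drop this year vs a 3% dip in the same weeks last year.
week = date(2024, 10, 7)
last = week - timedelta(days=364)
daily = {}
for d in range(7):
    daily[week - timedelta(days=7 - d)] = 1000  # comparison week, this year
    daily[week + timedelta(days=d)] = 850       # week with the change, this year
    daily[last - timedelta(days=7 - d)] = 1000  # same comparison week, last year
    daily[last + timedelta(days=d)] = 970       # same week, last year

this_year, last_year = seasonal_comparison(daily, week)
print(round(this_year, 1), round(last_year, 1))  # -15.0 -3.0
```

A dip of similar size last year points to seasonality. A much larger change this year, as in this example, means something else is going on.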


### 3. Validate each contributing factor

For each hypothesis, look for evidence in the data that confirms or rules it out:

* Segment the metric to see whether the change is isolated to particular groups, such as age, gender, device type, operating system, browser type, location, language, length of use, new vs returning users, or other user types.
* Look at upstream metrics: has behavior shifted elsewhere, or is there a bigger issue with the product?
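
Below is a minimal sketch of segmentation across two periods. It assumes the metric is already aggregated per segment, for example DAU by operating system. `segment_contributions` and the figures are hypothetical. Ranking segments by their share of the total change shows where to look first.

```python
def segment_contributions(before: dict[str, float], after: dict[str, float]):
    """Change per segment and its share of the overall change.

    before/after map a segment label to the metric total in the comparison
    period and in the period with the change. Returns rows of
    (segment, before, after, delta, pct_change, share_of_total_change),
    sorted by absolute delta so the biggest movers come first.
    """
    total_delta = sum(after.values()) - sum(before.values())
    rows = []
    for seg in sorted(set(before) | set(after)):
        b, a = before.get(seg, 0), after.get(seg, 0)
        delta = a - b
        pct = delta / b * 100 if b else float("nan")  # undefined for brand-new segments
        share = delta / total_delta * 100 if total_delta else float("nan")
        rows.append((seg, b, a, delta, pct, share))
    return sorted(rows, key=lambda r: abs(r[3]), reverse=True)


# Hypothetical DAU by operating system.
before = {"iOS": 60_000, "Android": 40_000}
after = {"iOS": 48_000, "Android": 39_500}
for seg, b, a, delta, pct, share in segment_contributions(before, after):
    print(f"{seg:8} {delta:+8,} {pct:+6.1f}% {share:5.0f}% of total change")
```

Shares can exceed 100% or turn negative when segments move in opposite directions. That is itself a sign that the aggregate hides offsetting changes.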

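### 4. Classify each factor

Decide which category each factor falls into:

* **Root cause:** the primary driver of the change
* **Contributing:** made the change larger but does not explain it on its own
* **Correlated:** coincided with the change but explains little or none of it
* **Unrelated:** no meaningful link to the change in timing or magnitude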

## Applying the Framework: Examples

### **Scenario #1**

You’re a data analyst at a fintech company and notice that **the app’s daily active users (DAU)** have dropped by **15% week-over-week**.
<br>
#### **1. Clarify the scope of the change**

Before jumping into the data, you clarify what’s being measured and compared.

* **Metric definition:**
  DAU = number of unique users who open and actively use the app on a given day.

* **Importance:**
  DAU is a key engagement metric that indicates user retention and app health.

* **Magnitude of change:**
  15% decrease — significant enough to warrant investigation.

* **Time frame:**
  Drop started **on Monday, 7 October**, and continued throughout the week.

* **Comparison period:**
  Compared to **30 September – 6 October** (previous week).

* **Pattern:**
  The drop appears **sudden**, not gradual.
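
Writing the DAU definition as code forces two ambiguous choices into the open: which events count as "actively use", and which time zone defines a day. Below is a sketch that assumes a raw event log of `(user_id, timestamp, event_name)` tuples. `ACTIVE_EVENTS` and the event names are hypothetical, not a real tracking schema.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical event names that count as "actively using" the app;
# passive events such as a received push notification are excluded.
ACTIVE_EVENTS = {"app_open", "screen_view", "transaction"}


def daily_active_users(events) -> dict[date, int]:
    """Unique active users per calendar day.

    events: iterable of (user_id, timestamp, event_name). The calendar day is taken
    from the timestamp as given, so timestamps should already be in the reporting time zone.
    """
    users_by_day: dict[date, set] = defaultdict(set)
    for user_id, ts, name in events:
        if name in ACTIVE_EVENTS:
            users_by_day[ts.date()].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


events = [
    ("u1", datetime(2024, 10, 7, 9, 0), "app_open"),
    ("u1", datetime(2024, 10, 7, 18, 30), "transaction"),   # same user, same day: counted once
    ("u2", datetime(2024, 10, 7, 12, 0), "push_received"),  # passive event: not counted
    ("u2", datetime(2024, 10, 8, 8, 15), "app_open"),
]
print(daily_active_users(events))
```

Changing either choice changes the metric. A shift in the definition or its inputs therefore belongs on the list of accidental causes.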
<br>

#### **2. Hypothesize contributing factors**

You brainstorm potential causes under four categories:

| Category                         | Possible Cause                                                                                    |
| -------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Accidental (Data/Tech issue)** | Tracking script failed or event ingestion stopped in the data pipeline                            |
| **Natural (Seasonality)**        | Early October might have lower activity due to holidays in key markets                            |
| **Internal (Product/Marketing)** | A new app version was released on 6 October; a push notification campaign also ended on 5 October |
| **External (Environment)**       | A competitor launched a major promotion; iOS 18 update rolled out globally                        |

<br>

#### **3. Validate each contributing factor**

You test each hypothesis with data:

| Hypothesis            | Validation                                                            | Result                                                               |
| --------------------- | --------------------------------------------------------------------- | -------------------------------------------------------------------- |
| **Tracking failure**  | Check event logs in the data warehouse and pipeline health dashboards | ❌ *Ruled out: no failures detected; data pipeline healthy*           |
| **Seasonality**       | Review last year’s DAU trend for the same period                      | ⚠️ *Minor dip last year (~3%), not 15%*                              |
| **App version issue** | Segment DAU by app version                                            | ✅ *Users on version 6.2 dropped by 25%, while older versions stable* |
| **Campaign ended**    | Correlate DAU with marketing campaign dates                           | ✅ *Engagement fell immediately after campaign ended*                 |
| **Competitor promo**  | Check app store reviews and social media buzz                         | ⚠️ *Competitor campaign trending, but timing slightly later*         |
| **iOS 18 update**     | Segment DAU by OS                                                     | ✅ *iOS users show 20% decline, Android stable*                       |
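
The app version and iOS hypotheses overlap, because version 6.2 users and iOS users are partly the same people. Cutting DAU by one dimension at a time therefore cannot show which factor matters, but crossing the two dimensions can. Below is a minimal sketch with hypothetical figures. `pct_change_by_cell` is an illustrative helper.

```python
def pct_change_by_cell(
    before: dict[tuple[str, str], float],
    after: dict[tuple[str, str], float],
) -> dict[tuple[str, str], float]:
    """% change in DAU for each (os, app_version) cell present in the comparison period."""
    return {
        cell: (after.get(cell, 0) - b) / b * 100
        for cell, b in before.items()
        if b  # skip empty cells to avoid dividing by zero
    }


# Hypothetical DAU by (OS, app version) for the two weeks.
before = {("iOS", "6.1"): 30_000, ("iOS", "6.2"): 30_000,
          ("Android", "6.1"): 20_000, ("Android", "6.2"): 20_000}
after = {("iOS", "6.1"): 29_500, ("iOS", "6.2"): 21_000,
         ("Android", "6.1"): 19_900, ("Android", "6.2"): 19_800}

for (os_name, version), pct in sorted(pct_change_by_cell(before, after).items()):
    print(f"{os_name:8} {version}  {pct:+.1f}%")
```

In these made-up numbers, both one-way cuts look bad: version 6.2 falls about 18% and iOS about 16%. Yet only the combination of iOS and 6.2 drops sharply, which is the pattern the summary below describes. One caveat applies. Users move onto a new version as it rolls out, so a version segment's size changes for reasons unrelated to any bug. Comparing the share of each cell's users who are active is more robust than comparing raw counts.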
<br>

#### **4. Classify each factor**

You classify the findings to communicate impact clearly:

| Factor                                   | Category         | Explanation                                 |
| ---------------------------------------- | ---------------- | ------------------------------------------- |
| App version 6.2 causing lower engagement | **Root cause**   | Users experienced login issues after update |
| iOS 18 rollout                           | **Contributing** | Compatibility bug increased crash rate      |
| Campaign ended                           | **Contributing** | Temporary engagement boost removed          |
| Seasonality                              | **Correlated**   | Slight dip typical this time of year        |
| Competitor promotion                     | **Unrelated**    | Timing doesn’t align fully with drop        |

<br>

#### **Root Cause Summary**

The **main driver** of the DAU drop was **a login issue introduced in app version 6.2**, which affected mostly **iOS users after the iOS 18 update**.
The decline was **amplified by the end of a marketing campaign** that had been sustaining engagement.

<br>

#### **Next Steps**

* Roll out a **hotfix** to resolve login issues for iOS users.
* **Extend** the push campaign or re-engage affected users once the issue is fixed.
* **Monitor** DAU recovery and crash logs over the next 7 days.




### **Scenario #2**

You’re a marketing analyst at a fintech company. You notice that the **conversion rate from paid search campaigns** dropped from **5.2% to 3.4% week-over-week**, a fall of 1.8 percentage points or roughly a **35% relative decline**.



#### **1. Clarify the scope of the change**

Before analysing, you define exactly what’s being measured and compared.

* **Metric definition:**
  Conversion Rate = (Number of approved credit applications ÷ Number of paid search clicks).

* **Importance:**
  This metric measures the efficiency of paid search spend — lower conversion means lower ROI.

* **Magnitude of change:**
  Roughly 35% relative decline (1.8 percentage points), significant and unusual.

* **Time frame:**
  Drop observed starting **Monday, 14 October**, continuing for the entire week.

* **Comparison period:**
  Compared against **7–13 October**.

* **Pattern:**
  The drop was **sudden**, not gradual, suggesting a specific trigger.
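
A rate can fall because its numerator fell, its denominator rose, or both, and each case points to different causes. Below is a minimal sketch that compares the rate with its components. `rate_breakdown` and both sets of counts are hypothetical, chosen so that each reproduces the 5.2% to 3.4% drop.

```python
def rate_breakdown(prev: tuple[int, int], curr: tuple[int, int]) -> dict[str, float]:
    """Compare a conversion rate and its components between two periods.

    prev/curr: (approved applications, paid search clicks) for each period.
    """
    (a0, c0), (a1, c1) = prev, curr
    r0, r1 = a0 / c0 * 100, a1 / c1 * 100  # conversion rate in percent
    return {
        "rate_prev_pct": r0,
        "rate_curr_pct": r1,
        "rate_change_pp": r1 - r0,
        "rate_change_rel_pct": (r1 - r0) / r0 * 100,
        "approvals_change_pct": (a1 - a0) / a0 * 100,
        "clicks_change_pct": (c1 - c0) / c0 * 100,
    }


# Two hypothetical ways to get the same 5.2% -> 3.4% drop.
case_a = rate_breakdown((520, 10_000), (340, 10_000))  # fewer approvals, flat traffic
case_b = rate_breakdown((260, 5_000), (340, 10_000))   # more approvals, traffic doubled

for label, case in (("fewer approvals", case_a), ("more clicks", case_b)):
    print(label, {k: round(v, 1) for k, v in case.items()})
```

In the first case, fewer clicks turned into approvals, which points at the funnel (for example, the landing page). In the second case, approvals rose but traffic doubled, which points at traffic quality (for example, keyword mix). Approvals can also arrive days after the click. Attributing each approval to its click date keeps the numerator and denominator aligned to the same traffic.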

<br>

#### **2. Hypothesize contributing factors**

You brainstorm potential explanations under the four diagnostic lenses:

| Category                             | Possible Cause                                                                                          |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------- |
| **Accidental (Data/Tracking issue)** | Pixel or tag misfiring on the landing page; analytics integration broken after recent update            |
| **Natural (Seasonality)**            | Lower demand after payday week; fewer credit applications mid-month                                     |
| **Internal (Marketing/Product)**     | Change in ad copy or landing page test; budget reallocation to new keywords; new eligibility criteria   |
| **External (Market/Environment)**    | Competitors increased bids; new credit regulation announcement; macro events (e.g., interest rate news) |

<br>

#### **3. Validate each contributing factor**

You test each hypothesis using the data:

| Hypothesis                       | Validation                                                                 | Result                                                                                    |
| -------------------------------- | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| **Pixel/tag issue**              | Check tag health in Google Tag Manager and conversions logged in analytics | ❌ *Ruled out: tag firing correctly; conversion data complete*                              |
| **Seasonality**                  | Compare with same week last month and last year                            | ⚠️ *Small dip mid-month typical (~5%), not 35%*                                           |
| **Ad copy or landing page test** | Review recent A/B experiments and creative changes                         | ✅ *New landing page launched 13 Oct with different headline and longer form*              |
| **Keyword mix or bid change**    | Compare spend by keyword and CPA before/after 14 Oct                       | ✅ *More spend shifted to generic “credit score” keywords with lower intent*               |
| **Competitor activity**          | Review Auction Insights report in Google Ads                               | ⚠️ *Slight increase in competitor impression share (+5%), not enough to explain decline*  |
| **External event**               | Check finance news and regulation updates                                  | ✅ *New FCA guidance on credit affordability published on 14 Oct; customers more hesitant* |
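
The keyword finding is a mix shift. Overall conversion can fall even when no keyword group converts worse, simply because more clicks land in a lower-converting group. A shift-share decomposition separates that effect from a genuine fall in within-group conversion, which is what a worse landing page would produce. Below is a minimal sketch. `mix_rate_decomposition`, the group names, and the counts are hypothetical.

```python
def mix_rate_decomposition(
    before: dict[str, tuple[int, int]],
    after: dict[str, tuple[int, int]],
) -> tuple[float, float, float]:
    """Split a change in overall conversion rate into mix and rate effects.

    before/after map a keyword group to (clicks, conversions). Assumes the same
    groups appear in both periods and every group has clicks. Returns
    (total, mix, rate) in percentage points, where
      mix  = sum((s1 - s0) * r0)  -> traffic shifting between groups
      rate = sum(s1 * (r1 - r0))  -> each group converting differently
    s is a group's share of clicks, r its conversion rate; mix + rate == total.
    """
    if set(before) != set(after):
        raise ValueError("before and after must contain the same groups")

    def shares_and_rates(data: dict[str, tuple[int, int]]) -> dict[str, tuple[float, float]]:
        # (share of total clicks, conversion rate) for each group.
        total_clicks = sum(clicks for clicks, _ in data.values())
        return {g: (clicks / total_clicks, conv / clicks) for g, (clicks, conv) in data.items()}

    def overall_rate(data: dict[str, tuple[int, int]]) -> float:
        return sum(conv for _, conv in data.values()) / sum(clicks for clicks, _ in data.values())

    b, a = shares_and_rates(before), shares_and_rates(after)
    mix = sum((a[g][0] - b[g][0]) * b[g][1] for g in before)
    rate = sum(a[g][0] * (a[g][1] - b[g][1]) for g in before)
    total = overall_rate(after) - overall_rate(before)
    return total * 100, mix * 100, rate * 100


# Hypothetical clicks and approvals by keyword group (5.2% -> 3.4% overall).
before = {"high_intent": (4_000, 320), "generic": (6_000, 200)}
after = {"high_intent": (3_000, 225), "generic": (7_000, 115)}
total, mix, rate = mix_rate_decomposition(before, after)
print(f"total {total:+.2f} pp = mix {mix:+.2f} pp + rate {rate:+.2f} pp")
# total -1.80 pp = mix -0.47 pp + rate -1.33 pp
```

With these made-up numbers, roughly a quarter of the 1.8-point drop comes from the shift in keyword mix. The rest comes from lower conversion within each group. That is the pattern you would expect if a landing page change were the root cause and the keyword shift a contributor. Other conventions split the total slightly differently, for example by weighting the rate effect by the old shares and adding an interaction term.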

<br>

#### **4. Classify each factor**

You group findings into categories based on impact and causality:

| Factor                            | Category         | Explanation                                         |
| --------------------------------- | ---------------- | --------------------------------------------------- |
| New landing page with longer form | **Root cause**   | Increased friction caused lower conversions         |
| Shift to generic keywords         | **Contributing** | Attracted lower-intent traffic, reducing efficiency |
| FCA guidance announcement         | **Contributing** | Temporarily reduced customer confidence in applying |
| Mid-month seasonality             | **Correlated**   | Minor expected dip                                  |
| Competitor bid changes            | **Unrelated**    | Not large enough to explain the trend               |

<br>

#### **Root Cause Summary**

The **primary driver** of the 35% conversion drop was the **new landing page** launched on 13 October, which introduced **extra form fields** and a **less engaging headline**, reducing completion rates.
This was **amplified** by a **shift toward lower-intent keywords** and **external market news** (FCA guidance) impacting user sentiment.

<br>

#### **Next Steps**

* Roll back to the **previous high-performing landing page**, or A/B test it against the new one.
* Rebalance budget to **high-intent keywords** (e.g., “apply for credit card” instead of “check credit score”).
* Monitor **conversion recovery** and **bounce rate** post-adjustment.
* Add an **annotation to dashboards** documenting the event for future trend analysis.

