# Lecture Notes on Enhancements, A/B Testing, and Metrics Classification

## Overview
In this lecture, we will explore the process of adding enhancements to existing systems, focusing on practical applications of A/B testing and the crucial metrics to measure success. 

### Key Objectives
1. Understand the importance of enhancements in software development.
2. Learn the process of A/B testing and its significance in feature launches.
3. Classify different types of metrics used for measuring success, including North Star metrics, primary metrics, secondary metrics, and guardrail metrics.
4. Discuss the challenges associated with enhancements and A/B testing.

## Enhancements in Existing Systems

### Definition of Enhancements
Enhancements are improvements or additions made to an existing system or project. Unlike building a system from scratch, enhancements involve modifying or improving what is already developed.

### Challenges with Enhancements
- Employees often feel intimidated when tasked with enhancements because the original system may have been created by someone else.
- It's essential to understand the architecture and logic of the existing system to ensure changes do not disrupt functionality.

### Importance of Asking Clarifying Questions
When faced with enhancements, especially during an interview or project setup, it is vital to ask clarifying questions, such as:
- What specific metrics are we trying to optimize?
- What are the main business objectives aligned with this enhancement?

## A/B Testing Overview

### Definition of A/B Testing
A/B testing is a statistical method used to compare two versions of a feature or product (Version A vs. Version B) to determine which one performs better based on specific metrics.

### Importance of A/B Testing
1. **Validation of Changes**: Confirms that implemented changes actually improve user engagement or performance.
2. **Data-Driven Decision Making**: A/B testing allows teams to make informed choices about features based on quantitative analysis.
3. **Performance Optimization**: Helps in identifying the most effective user interfaces, features, or functionalities.

## Steps in A/B Testing

### Step 1: Sample Size Determination
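Determining the right sample size is a critical aspect of A/B testing. With too few observations the test may fail to detect a real difference; with more than necessary, the experiment runs longer without adding useful information.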

#### Key Components for Determining Sample Size
1. **Significance Level (α)**: The probability of a false positive, that is, of concluding there is a difference when the observed result is due only to random chance. It is commonly set at α = 0.05, which corresponds to a 95% confidence level (1 - α).

   $$
   P(\text{False Positive}) = \alpha = 0.05
   $$

2. **Effect Size**: The smallest difference in performance that is meaningful to detect. For proportions, it is the gap between the baseline rate $$ P_1 $$ and the target rate $$ P_2 $$.

   $$
   \text{Effect Size} = P_2 - P_1
   $$

3. **Statistical Power (1 - β)**: The probability of correctly rejecting the null hypothesis when it is false, typically targeted at 80%.

   $$
   \text{Power} = 1 - \beta
   $$
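
The three components above can be combined into a per-group sample size. The sketch below assumes a two-sided test on two proportions and uses the common normal-approximation formula $$ n = (z_{1-\alpha/2} + z_{1-\beta})^2 \, [P_1(1-P_1) + P_2(1-P_2)] / (P_2 - P_1)^2 $$. The function name and the 10% to 12% rates are illustrative assumptions, not values from the lecture.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate observations needed in EACH group (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two group variances
    effect = p2 - p1                               # effect size P2 - P1
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: baseline rate of 10%, smallest lift worth detecting is 2 points.
n = sample_size_per_group(0.10, 0.12)
print(f"Observations needed per group: {n}")
```

Note how the required sample size grows quickly as the effect size shrinks, because the effect size is squared in the denominator.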

### Step 2: Choosing the Right Statistical Test
Selecting an appropriate statistical test helps in accurately analyzing the data gathered from the A/B test.

- **Z-Test for Proportions**: Appropriate when comparing a rate or proportion between two groups, where $$ P_1 $$ and $$ P_2 $$ are the observed proportions and $$ n_1 $$ and $$ n_2 $$ are the group sizes.

   $$
   Z = \frac{(P_1 - P_2)}{\sqrt{P(1 - P) \cdot \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
   $$

   Where $$ P $$ is the pooled proportion calculated as:

   $$
   P = \frac{(n_1P_1 + n_2P_2)}{(n_1 + n_2)}
   $$
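
The two formulas above translate directly into a small helper. This is a minimal sketch using only the Python standard library; the function name and the example rates are assumptions for illustration. It also returns a two-sided p-value, which is not part of the formulas above but leads to the same decision as comparing $$ |Z| $$ with a critical value.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)                # pooled proportion P
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # area in both tails
    return z, p_value


# Hypothetical data: 10% vs. 12% conversion with 1,000 users in each group.
z, p_value = two_proportion_z_test(0.10, 1000, 0.12, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```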

## Metrics Classification

1. ### North Star Metrics
   - **Definition**: This is the primary metric that a business or product aims to improve. It reflects the core value users derive from the product.
   - **Examples**: Daily Active Users (DAUs), Monthly Active Users (MAUs), or total engagement time on a platform like Instagram.

2. ### Primary Metrics
   - **Definition**: Metrics that evaluate the immediate impact of a new feature or alteration.
   - **Examples**: The number of users engaging with a new reaction feature on Instagram stories within a specific timeframe.

3. ### Secondary Metrics
   - **Definition**: Supportive metrics providing additional insights linked to primary metrics.
   - **Examples**: Repeat usage rates of features, engagement rates from notifications regarding the new reaction feature.

4. ### Guardrail Metrics
   - **Definition**: Critical measures that ensure that a new feature does not adversely impact existing functionalities or user satisfaction. 
   - **Examples**: App crash rates, overall user satisfaction ratings, and engagement with existing features, none of which should worsen after the change.
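
One way to make this classification concrete is to tag each experiment metric with its role and check the guardrails before looking at anything else. The sketch below is only an illustration: the class names, metric names, readings, and the 1% tolerance are all made-up assumptions, not values from the lecture or from any real experiment.

```python
from dataclasses import dataclass
from enum import Enum


class MetricRole(Enum):
    NORTH_STAR = "north_star"
    PRIMARY = "primary"
    SECONDARY = "secondary"
    GUARDRAIL = "guardrail"


@dataclass
class MetricResult:
    name: str
    role: MetricRole
    control: float
    treatment: float
    higher_is_better: bool = True

    def relative_change(self) -> float:
        return (self.treatment - self.control) / self.control


def violated_guardrails(results, tolerance=0.01):
    """Names of guardrail metrics that worsened by more than `tolerance` (relative)."""
    violated = []
    for r in results:
        if r.role is not MetricRole.GUARDRAIL:
            continue
        change = r.relative_change()
        worsening = -change if r.higher_is_better else change
        if worsening > tolerance:
            violated.append(r.name)
    return violated


# Hypothetical readings for a new story-reaction feature.
results = [
    MetricResult("story_reactions_per_user", MetricRole.PRIMARY, 1.20, 1.32),
    MetricResult("app_crash_rate", MetricRole.GUARDRAIL, 0.010, 0.012,
                 higher_is_better=False),
    MetricResult("satisfaction_rating", MetricRole.GUARDRAIL, 4.40, 4.41),
]
print(violated_guardrails(results))
```

In this made-up case the primary metric improves, but the crash-rate guardrail worsens by 20%, which would argue against launching regardless of the primary result. A real check would also test each change for statistical significance rather than compare raw values.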

## Example Problem: Enhancements with A/B Testing

### Problem Statement
You work at Amazon, comparing two parcel types (A and B) for shipment. In a trial of 200 shipments (100 of each type), Parcel A had an observed damage rate of **0.4** and Parcel B an observed damage rate of **0.6**. Your task is to determine whether the difference is statistically significant and, if so, which parcel is better.

### Solution Steps:
1. **Define Hypotheses**:
   - **Null Hypothesis (H0)**: $$ P_A = P_B $$
   - **Alternate Hypothesis (H1)**: $$ P_A \neq P_B $$

2. **Data Collection**:
   - Damage Probability for Parcel A: $$ P_A = 0.4 $$ 
   - Damage Probability for Parcel B: $$ P_B = 0.6 $$ 
   - Sample Sizes: $$ n_A = n_B = 100 $$

3. **Calculate Pooled Proportion**:
   $$
   P = \frac{(100 \cdot 0.4 + 100 \cdot 0.6)}{200} = 0.5
   $$

4. **Calculate the Z-score**:
   $$
   Z = \frac{(0.4 - 0.6)}{\sqrt{0.5 \cdot (1 - 0.5) \cdot \left(\frac{1}{100} + \frac{1}{100}\right)}}
   $$

   Continuing the equation leads to:
   $$
   Z \approx \frac{-0.2}{0.0707} \approx -2.828
   $$

5. **Determine Critical Values**:
   For a two-tailed test at $$ \alpha = 0.05 $$:
   - Critical Z-values: $$ \pm 1.96 $$

6. **Conclusion**:
   Since $$ |Z| \approx 2.828 $$ exceeds $$ 1.96 $$, we reject H0, indicating a statistically significant difference between the parcels. Because Parcel A's damage rate (0.4) is lower than Parcel B's (0.6), Parcel A is the better choice.
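
The solution steps can be reproduced numerically as a check. This sketch uses only the Python standard library and the numbers from the problem statement; the variable names are illustrative. The critical value is derived from α rather than hard-coded.

```python
from math import sqrt
from statistics import NormalDist

p_a, p_b = 0.4, 0.6   # damage rates for Parcel A and Parcel B
n_a = n_b = 100       # shipments per parcel type
alpha = 0.05          # significance level for a two-tailed test

# Step 3: pooled proportion
pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)

# Step 4: Z-score
std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / std_err

# Step 5: critical value for a two-tailed test
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 6: decision
reject_h0 = abs(z) > z_crit
print(f"pooled = {pooled:.2f}, z = {z:.3f}, critical = ±{z_crit:.2f}, "
      f"reject H0: {reject_h0}")
```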

### Recap of Key Concepts
1. **Hypothesis Testing**: Importance of setting up H0 and H1.
2. **Statistical Tests**: Application of Z-tests for proportions.
3. **Critical Value Analysis**: Importance of comparing calculated statistics to critical values.
4. **Impact Metrics**: Recognizing the role of North Star, primary, secondary, and guardrail metrics in assessing feature success.

## Additional Insights on A/B Testing
### Control vs Treatment Groups
- **Control Group**: The users (or, in the parcel example, shipments) that keep the existing experience (e.g., Parcel A).
- **Treatment Group**: The users (or shipments) that receive the new feature or variant (e.g., Parcel B).

### Understanding Power Analysis
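- **Power**: The probability of correctly detecting an effect that actually exists. A standard power of 0.80 implies an 80% chance of detecting a true effect of the assumed size.
- **Power Analysis**: The calculation that links power, the significance level (α), the effect size, and the sample size, so that fixing any three determines the fourth.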
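
To illustrate, the sketch below estimates the power of a two-sided two-proportion Z-test for the parcel example (damage rates 0.4 vs. 0.6, 100 shipments each), treating those rates as the true values. It relies on a normal approximation and ignores the negligible chance of rejecting in the wrong direction; the function name is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test with equal group sizes."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    pooled = (p1 + p2) / 2
    # Standard error under H0 (pooled) and under the alternative (unpooled).
    se_null = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return NormalDist().cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)


power = approximate_power(0.4, 0.6, n_per_group=100)
print(f"Approximate power: {power:.2f}")
```

Under these assumptions the power comes out just above the conventional 0.80 target, so 100 shipments per parcel type is about the minimum needed to reliably detect a 0.2 difference in damage rates.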

## Summary
These notes covered enhancements to existing systems, the steps of an A/B test (sample size determination and the Z-test for proportions), the classification of metrics, and a worked example based on Amazon parcel shipments. Together, these support informed, data-driven decisions about feature changes.