
# DSA Phase 2 Analysis

## 1. Introduction and Motivation
This study investigates which mentor characteristics and mentoring practices have a meaningful effect on student retention on the Coachify coaching platform.
The main goal is to identify changes in mentor practice that could raise student engagement and retention, which would in turn improve the platform's profitability and effectiveness.

## 2. Data Collection and Preprocessing
The data were collected from two sources:

- The Coachify student database, which included student enrollment information (mentor assignment, membership start month, dropout month).
- A survey administered to 30 mentors, gathering data about their weekly call frequency, average call duration, messaging habits, and primary communication format (video or voice).

**Data Cleaning Steps:**
- Students who initially purchased long-term packages were excluded to avoid bias towards artificially high retention.
- Mentors with fewer than 3 students were excluded to reduce statistical noise.
- Mentor and student names were anonymized.
- Weekly survey responses were multiplied by 4 to approximate monthly activity levels.

The cleaned dataset included 22 mentors and approximately 116 students.
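
The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the exact procedure used in the study: the column names `student_id`, `mentor_id`, and `long_term_package` are hypothetical, and the sketch assumes the 3-student threshold is applied after long-term packages are removed, which the description does not specify.

```python
import pandas as pd


def clean_students(students: pd.DataFrame, min_students: int = 3) -> pd.DataFrame:
    """Drop long-term packages, then drop mentors with too few students."""
    # Hypothetical boolean column marking students who bought a long-term package.
    kept = students[~students["long_term_package"].astype(bool)]
    # Number of remaining students per mentor, aligned row by row with `kept`.
    counts = kept.groupby("mentor_id")["student_id"].transform("count")
    return kept[counts >= min_students].reset_index(drop=True)


def weekly_to_monthly(survey: pd.DataFrame, weekly_cols: list) -> pd.DataFrame:
    """Multiply weekly survey answers by 4 to approximate monthly activity."""
    out = survey.copy()  # leave the caller's frame untouched
    out[weekly_cols] = out[weekly_cols] * 4
    return out
```

Applying the threshold after the exclusion means a mentor can fall below 3 students only because of removed long-term packages; swapping the two steps would keep such mentors.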

## 3. Retention Metric Definition
Retention was defined as a student continuing from the 1st month of membership into the 2nd month. The share of students who did so is reported as the pass rate:

\[ \text{Pass Rate} = \frac{\text{Number of students continuing to 2nd month}}{\text{Total number of students}} \]

This metric was used because:
- It captures early-stage student engagement, which is critical in subscription-based models.
- It provides a clear, binary outcome (continued/dropped) ideal for statistical comparison.
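
As a rough sketch of how this metric could be computed, the snippet below derives a pass rate for each mentor. It assumes the rate is calculated per mentor, as the `Pass_Rate` column used later suggests. The input columns `mentor_id` and `months_active` (number of months a student stayed) are hypothetical stand-ins for whatever the student database actually provides.

```python
import pandas as pd


def pass_rate_by_mentor(students: pd.DataFrame) -> pd.Series:
    """Share of each mentor's students who continued into the 2nd month."""
    # Binary outcome: True if the student stayed for at least two months.
    continued = students["months_active"] >= 2
    # The mean of a boolean series is the fraction of True values.
    return continued.groupby(students["mentor_id"]).mean().rename("Pass_Rate")
```

For example, a mentor with four students of whom two continued into the second month would get a pass rate of 2 / 4 = 0.5.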

## 4. Research Questions
The analysis aimed to answer the following research questions:

| No. | Research Question |
|----|--------------------|
| RQ1 | Does the mentor's YKS ranking affect student retention? |
| RQ2 | Does the primary communication format (video or voice) impact student retention? |
| RQ3 | Is there a relationship between the average call duration and student retention? |
| RQ4 | Does the frequency of weekly calls affect retention? |
| RQ5 | Does the frequency of sending progress monitoring messages affect retention? |
| RQ6 | Does the number of weekly communication days correlate with student retention? |


In [None]:
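import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# RQ1: load the merged sheet that holds YKS ranking and pass rate.
df = pd.read_excel("/content/DSA_Student_List_Final.xlsx", sheet_name="Merged")

# Drop rows with missing values so the correlation uses complete pairs only.
rq1 = df[["YKS_Ranking", "Pass_Rate"]].dropna()

# Scatter plot of YKS ranking against pass rate.
plt.figure(figsize=(8, 5))
plt.scatter(rq1["YKS_Ranking"], rq1["Pass_Rate"], color="blue")
plt.xlabel("YKS Ranking")
plt.ylabel("Pass Rate")
plt.title("YKS Ranking vs. Pass Rate")
plt.grid(True)
plt.show()

# Pearson correlation coefficient and its two-sided p-value.
corr, pval = pearsonr(rq1["YKS_Ranking"], rq1["Pass_Rate"])
print("Pearson correlation:", round(corr, 3))
print("p-value:", round(pval, 4))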
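
The same correlation step could be wrapped in a small helper so it can be reused for the other numeric research questions (RQ3 to RQ6). This is only a sketch: the helper name `correlate_with_pass_rate` is made up for illustration, and the names of the other survey columns in the merged sheet are not shown here, so they would need to be filled in.

```python
import pandas as pd
from scipy.stats import pearsonr


def correlate_with_pass_rate(data: pd.DataFrame, column: str,
                             target: str = "Pass_Rate"):
    """Return (r, p-value, n) for the Pearson correlation of column vs. target."""
    # Use only rows where both values are present.
    pairs = data[[column, target]].dropna()
    r, p = pearsonr(pairs[column], pairs[target])
    return round(float(r), 3), round(float(p), 4), len(pairs)
```

Reporting the number of complete pairs alongside the coefficient is useful here because, with roughly 22 mentors, a few missing survey answers can noticeably change the sample size.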