# Evoastra Ventures Intern Assessment Task
**Duration: 35 Minutes | Total Points: 100**

---

## Candidate Information

- **Name:** Shubham Chorge  
- **Email:** shubhamchorge777@gmail.com
- **Phone:** 8468840991  
- **College/University:** Ajennkya D Y Patil University, Pune  
- **Course/Branch:** BCA (AIML)  
- **Start Time:** 16-02-2026
- **End Time:** 18-02-2026


# SECTION A: Data Understanding & Basic Analysis


## Question 1

### Customer Age:
**Data Type:** Numerical (Integer; discrete, often treated as continuous)  
Used for segmentation, correlation analysis, and regression modeling.

### Gender:
**Data Type:** Categorical (Nominal)  
Used for customer segmentation and targeted marketing.

### Total Purchase Amount:
**Data Type:** Numerical (Continuous/Float)  
Used for revenue analysis and profitability modeling.

### Churn:
**Data Type:** Binary Categorical (0/1)  
Target variable for churn prediction classification models.


## Question 2

a) Product categories with highest revenue:
→ Descriptive Analytics (Group-by & Aggregation)

b) Predicting customer churn:
→ Predictive Analytics (Classification Models)

c) Age and spending relationship:
→ Correlation / Regression Analysis

d) Payment method preferences:
→ Segmentation & Cross-tabulation Analysis
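A minimal sketch of how parts a), c) and d) could look in pandas. The rows below are hypothetical toy data, not the assessment dataset; only the column names are taken from it.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the dataset used in this assessment.
df = pd.DataFrame({
    "Product Category": ["Electronics", "Books", "Electronics", "Home", "Books"],
    "Total Purchase Amount": [1200.0, 300.0, 800.0, 450.0, 150.0],
    "Payment Method": ["Credit Card", "PayPal", "Credit Card", "Cash", "PayPal"],
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Customer Age": [25, 41, 33, 58, 47],
})

# a) Descriptive: total revenue per category, highest first
revenue_by_category = (
    df.groupby("Product Category")["Total Purchase Amount"]
      .sum()
      .sort_values(ascending=False)
)

# c) Pearson correlation between age and spending
age_spend_corr = df["Customer Age"].corr(df["Total Purchase Amount"])

# d) Cross-tabulation of payment method by gender
payment_by_gender = pd.crosstab(df["Payment Method"], df["Gender"])

print(revenue_by_category)
print(age_spend_corr)
print(payment_by_gender)
```

Part b) needs a trained classifier and is not covered by this sketch.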


## Question 3 – Data Quality Assessment

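### Issue 1:
Missing Values  
Detection: `df.isnull().sum()` (count of nulls per column)

### Issue 2:
Duplicate Records  
Detection: `df.duplicated().sum()` (count of fully duplicated rows)

### Issue 3:
Data Inconsistency (Price × Quantity mismatch)  
Detection: Validation rule check that recomputes `Product Price` × `Quantity` and compares it with `Total Purchase Amount`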


```python
import pandas as pd

# Load dataset
df = pd.read_csv("/content/ecommerce_customer_data_custom_ratios.csv")

# Check missing values
print("Missing Values:\n", df.isnull().sum())

# Check duplicates
print("\nDuplicate Records:", df.duplicated().sum())

# Validate Total Purchase Amount
df["Calculated_Total"] = df["Product Price"] * df["Quantity"]
print("\nMismatch Records:",
      (df["Calculated_Total"] != df["Total Purchase Amount"]).sum())
```


```
Missing Values:
 Customer ID                  0
Purchase Date                0
Product Category             1
Product Price                1
Quantity                     1
Total Purchase Amount        1
Payment Method               1
Customer Age                 1
Returns                  25751
Customer Name                1
Age                          1
Gender                       1
Churn                        1
dtype: int64

Duplicate Records: 0

Mismatch Records: 134570
```
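The strict `!=` comparison used above also counts rows with missing values (NaN never equals anything) and rows that differ only by floating-point rounding. A hedged sketch of a more tolerant check on hypothetical toy data is given here; it assumes `Total Purchase Amount` is meant to equal `Product Price` × `Quantity`, which the very large mismatch count suggests may not hold for this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: one exact match, one rounding-only difference,
# one genuine mismatch, and one row with a missing price.
df = pd.DataFrame({
    "Product Price": [10.0, 0.1, 25.0, np.nan],
    "Quantity": [2, 3, 4, 1],
    "Total Purchase Amount": [20.0, 0.3, 90.0, 50.0],
})

calculated = df["Product Price"] * df["Quantity"]

# Only rows where both values are present can be validated
comparable = calculated.notna() & df["Total Purchase Amount"].notna()

# np.isclose tolerates tiny floating-point differences (0.1 * 3 is not exactly 0.3)
matches = np.isclose(calculated, df["Total Purchase Amount"])
mismatch_mask = comparable & ~matches

strict_count = int((calculated != df["Total Purchase Amount"]).sum())
tolerant_count = int(mismatch_mask.sum())

print("Strict mismatches:", strict_count)
print("Tolerant mismatches:", tolerant_count)
```

If the tolerant count on the real data stays close to the strict count, the mismatch is a property of the data rather than a rounding artifact.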


# SECTION B: Customer Analysis & Business Intelligence


## Question 4

### Profit Formula:
Net Profit = Average Purchase × (1 – Return Rate) × Profit Margin (20%) – Acquisition Cost (180)


```python
import pandas as pd

# Given Data
segments = {
    "Young": {"avg_purchase": 850, "return_rate": 0.12, "churn": 0.25},
    "Middle": {"avg_purchase": 1200, "return_rate": 0.08, "churn": 0.15},
    "Senior": {"avg_purchase": 950, "return_rate": 0.15, "churn": 0.30}
}

acquisition_cost = 180
profit_margin = 0.20

results = {}

for seg, data in segments.items():
    # Revenue that remains after returns
    effective_revenue = data["avg_purchase"] * (1 - data["return_rate"])
    profit = effective_revenue * profit_margin
    net_profit = profit - acquisition_cost
    retention = 1 - data["churn"]

    results[seg] = {
        "Net Profit": round(net_profit, 2),
        "Retention Rate": retention
    }

pd.DataFrame(results).T
```


| Segment | Net Profit | Retention Rate |
|---------|-----------:|---------------:|
| Young   | -30.4      | 0.75           |
| Middle  | 40.8       | 0.85           |
| Senior  | -18.5      | 0.70           |


### Result Interpretation:

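- The Middle-aged segment is the only one with a positive net profit per customer (40.8); Young (-30.4) and Senior (-18.5) do not recover the acquisition cost of 180 on a single purchase.
- The Middle-aged segment also has the highest retention rate (0.85).
- Therefore, Middle-aged customers have the highest CLV.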
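The CLV ranking can be checked with a simple lifetime model. The sketch below is an assumption, not part of the given task: it treats each period as one average purchase and uses an expected customer lifetime of 1 / churn periods, with the acquisition cost paid once.

```python
# Segment figures as given in the question
segments = {
    "Young": {"avg_purchase": 850, "return_rate": 0.12, "churn": 0.25},
    "Middle": {"avg_purchase": 1200, "return_rate": 0.08, "churn": 0.15},
    "Senior": {"avg_purchase": 950, "return_rate": 0.15, "churn": 0.30},
}

acquisition_cost = 180
profit_margin = 0.20


def simple_clv(avg_purchase, return_rate, churn):
    """Assumed model: margin per period times expected periods, minus acquisition cost."""
    margin_per_period = avg_purchase * (1 - return_rate) * profit_margin
    expected_periods = 1 / churn  # mean lifetime under a constant churn rate
    return margin_per_period * expected_periods - acquisition_cost


clv = {name: round(simple_clv(**vals), 2) for name, vals in segments.items()}
best_segment = max(clv, key=clv.get)

print(clv)
print("Highest CLV:", best_segment)
```

Under this assumed model the Middle-aged segment still ranks first, which is consistent with the interpretation above.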


## Question 5 – Strategic Recommendations

### Strategy 1:
Focus on retention marketing for the middle-aged segment.

### Strategy 2:
Reduce return rates in the young and senior segments through clearer product information and better engagement.


## Question 6

### Data Analysis Plan:
- Compare average purchase value by category
- Analyze return rate differences across categories
- Compare churn rates across categories
- Fit a logistic regression with Product Category as a predictor of churn
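The first three steps of the plan can be sketched as a single grouped summary. The rows are hypothetical toy data, and the sketch assumes `Returns` and `Churn` are stored as 0/1 flags so that their mean is a rate.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the assessment dataset.
df = pd.DataFrame({
    "Product Category": ["Electronics", "Electronics", "Electronics",
                         "Books", "Books", "Home"],
    "Total Purchase Amount": [1200.0, 900.0, 1500.0, 200.0, 300.0, 600.0],
    "Returns": [1, 0, 1, 0, 0, 0],
    "Churn": [1, 1, 0, 0, 0, 1],
})

# One row per category: average purchase, return rate, churn rate, order count
category_summary = df.groupby("Product Category").agg(
    avg_purchase=("Total Purchase Amount", "mean"),
    return_rate=("Returns", "mean"),
    churn_rate=("Churn", "mean"),
    orders=("Churn", "size"),
)

print(category_summary.sort_values("churn_rate", ascending=False))
```

Categories with few orders give unstable rates, so the `orders` column is kept next to the rates for context.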

### Action Plan:
- Warranty & support for Electronics
- Post-purchase engagement
- Installment payment options


# SECTION C: Research Methodology & Predictive Analytics


## Question 7

### Feature Selection:
Age, Gender, Product Category, Price, Quantity, Payment Method, Returns

### Preprocessing:
- Handle missing values
- Encode categorical variables
- Train-test split (stratified on Churn)
- Scale features (fit the scaler on the training set only)
- Handle class imbalance (SMOTE, applied to the training set only)

### Evaluation Metrics:
Accuracy, Precision, Recall, F1-score, ROC-AUC
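A minimal sketch of the preprocessing and evaluation steps with scikit-learn, on synthetic data with random labels. It is not the final model: class weights are used here as a simpler stand-in for SMOTE (which lives in the separate imbalanced-learn package), and the logistic regression is only an assumed baseline classifier.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names follow the selected features.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "Customer Age": rng.integers(18, 70, n).astype(float),
    "Product Price": rng.uniform(10, 500, n),
    "Quantity": rng.integers(1, 6, n).astype(float),
    "Returns": rng.integers(0, 2, n).astype(float),
    "Gender": rng.choice(["Male", "Female"], n),
    "Product Category": rng.choice(["Electronics", "Books", "Home", "Clothing"], n),
    "Payment Method": rng.choice(["Credit Card", "PayPal", "Cash"], n),
})
y = (rng.random(n) < 0.2).astype(int)  # roughly 20% churners, random labels

numeric = ["Customer Age", "Product Price", "Quantity", "Returns"]
categorical = ["Gender", "Product Category", "Payment Method"]

# Impute, then scale numeric columns and one-hot encode categorical columns
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# class_weight="balanced" replaces SMOTE in this sketch
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Stratified split keeps the churn ratio similar in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model.fit(X_train, y_train)  # preprocessing is fitted on training data only

proba = model.predict_proba(X_test)[:, 1]
pred = model.predict(X_test)
auc = roc_auc_score(y_test, proba)
report = classification_report(y_test, pred, zero_division=0)

print(report)
print("ROC-AUC:", round(auc, 3))
```

Because the labels are random, the scores here carry no meaning; the point is the order of operations, with all fitting done inside the pipeline after the split.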


## Question 8

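### Challenge 1:
Imbalanced Dataset  
Solution: Apply SMOTE to the training set only, so the test set keeps the real class distribution

### Challenge 2:
Model Interpretability  
Solution: SHAP analysis to show which features drive each churn prediction

### Challenge 3:
Retention Cost  
Solution: Target high-probability churners with high CLV, so retention spend goes where the expected payoff is largest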
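One way to make the targeting rule for Challenge 3 concrete is an expected-gain threshold. Everything in this sketch is hypothetical: the customer records, the offer cost, and the assumed share of churners an offer would retain.

```python
# Hypothetical scored customers: churn probability from a model, CLV estimate
customers = [
    {"id": "C1", "churn_prob": 0.80, "clv": 1200.0},
    {"id": "C2", "churn_prob": 0.75, "clv": 90.0},
    {"id": "C3", "churn_prob": 0.10, "clv": 1500.0},
    {"id": "C4", "churn_prob": 0.55, "clv": 700.0},
]

offer_cost = 50.0   # assumed cost of one retention offer
save_rate = 0.30    # assumed share of would-be churners the offer retains


def expected_gain(customer):
    """Expected value saved by the offer minus what the offer costs."""
    return customer["churn_prob"] * save_rate * customer["clv"] - offer_cost


# Keep only customers where the offer is expected to pay for itself
targets = sorted(
    (c for c in customers if expected_gain(c) > 0),
    key=expected_gain,
    reverse=True,
)
target_ids = [c["id"] for c in targets]

print(target_ids)
```

In this toy example a likely churner with low CLV (C2) and a valuable but loyal customer (C3) are both skipped, which is the intent of the rule.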


# SECTION D: Professional Communication & Problem-Solving


## Question 9

I would re-evaluate churn and returns using month-wise cohort analysis to detect short-term churn patterns. The initial correlation may have ignored temporal effects. I would build visual dashboards showing churn spikes after returns and clearly communicate that although overall correlation is weak, immediate churn risk post-return is significant and requires targeted intervention.
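A minimal sketch of the month-wise comparison on hypothetical toy data. It assumes `Returns` and `Churn` are 0/1 flags and uses the purchase month as the cohort; measuring churn immediately after a return would also need a churn date, which is not among the columns shown in this assessment.

```python
import pandas as pd

# Hypothetical toy orders; column names follow the assessment dataset.
orders = pd.DataFrame({
    "Purchase Date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-01-25",
        "2023-02-03", "2023-02-14", "2023-02-28",
    ]),
    "Returns": [1, 0, 1, 1, 0, 0],
    "Churn": [1, 0, 1, 1, 0, 1],
})

# Cohort = calendar month of the purchase
orders["Cohort Month"] = orders["Purchase Date"].dt.to_period("M")

# Churn rate per cohort, split by whether the order was returned
churn_by_cohort = (
    orders.groupby(["Cohort Month", "Returns"])["Churn"]
          .mean()
          .unstack("Returns")
          .rename(columns={0: "No return", 1: "Returned"})
)

print(churn_by_cohort)
```

A table like this, plotted per month, is the kind of view the dashboard would show to contrast churn after returns with churn after kept orders.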


## Question 10

### Priority 1:
Ensure data cleaning and validation.

### Priority 2:
Define KPIs aligned with revenue and churn reduction.

### Priority 3:
Maintain structured collaboration with dashboards and milestone tracking.


# Self-Assessment

Completed within 35 minutes: Yes  
Most Time: Section B  
Most Challenging: Section C  

Confidence:
Section A: 9  
Section B: 9  
Section C: 8  
Section D: 9  

---

I confirm that I have completed this assessment independently.

**Digital Signature:** Shubham Chorge  
**Final Submission Time:** 19-02-2026
