# 🏆 60-MINUTE DATA SCIENCE HACKATHON CHALLENGE

## 🎯 THE BATTLE: Team A vs Team B

**Duration:** 60 minutes  
**Teams:** 2 teams of ~5 people each  
**Prize:** Glory + Bragging Rights 🏅

---

## 📋 SCENARIO

You work for **FastBank**, a digital bank losing customers to competitors. Your task: analyze customer data, build a churn prediction model, and present insights to save the company!

**Dataset:** `quick_hackathon_data.csv`
- 300 customers
- 10 features
- Target: `churn` (0=Stay, 1=Leave)

---

## ⚠️ CRITICAL RULES

### 🍺 THE DRINKING GAME
**Every Git error = One sip!** 🍺
- Merge conflict? Drink!
- Push rejected? Drink!
- Wrong branch? Drink!
- Forgot to pull? Drink!

### 💻 GitHub Requirements (25% of score!)
1. **Must use GitHub** for ALL changes
2. **Minimum 10 commits** from the team
3. **All team members** must commit at least once
4. **Descriptive commit messages** required
5. **No local-only work** - everything must be pushed!
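
The commit rules above can be checked mechanically before time runs out. The following is a minimal sketch, assuming you paste in the author names printed by `git log --format=%an`; the function name `check_commit_rules` and the sample roster are hypothetical and not part of the challenge materials.

```python
from collections import Counter

def check_commit_rules(authors, team, min_commits=10):
    """Check the team rules: total commit count and one commit per member.

    authors: one author name per commit, e.g. the lines printed by
             `git log --format=%an` (assumed input format).
    team:    names of all team members (hypothetical roster).
    """
    counts = Counter(authors)
    missing = [name for name in team if counts[name] == 0]
    return {
        "total_commits": len(authors),
        "enough_commits": len(authors) >= min_commits,
        "members_without_commits": missing,
    }

# Example with made-up data: 4 commits so far, Dave has not committed yet.
log_authors = ["Alice", "Bob", "Alice", "Carol"]
report = check_commit_rules(log_authors, team=["Alice", "Bob", "Carol", "Dave"])
print(report)
```

Author names in the log must match the roster exactly, so agree on your Git `user.name` values during setup.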

---

## 📊 TASKS BREAKDOWN (60 minutes)

### **Part 1: EDA & Visualization** (25 minutes) - Carles Style
**Points: 25**

1. Load dataset and show first 10 rows **(2 pts)**
2. Check for missing values **(2 pts)**
3. Create distribution plot for `age` **(3 pts)**
4. Create bar chart showing churn by `account_type` **(4 pts)**
5. Create correlation heatmap **(4 pts)**
6. Analyze text feedback length distribution **(5 pts)**
7. Write 3 insights in markdown **(5 pts)**

### **Part 2: Feature Engineering & ML** (30 minutes) - Jordi Style
**Points: 50**

8. Handle missing values **(5 pts)**
9. Create 5+ text features from `feedback` **(10 pts)**
10. Encode categorical variables **(5 pts)**
11. Create 2 interaction features **(5 pts)**
12. Train/test split **(3 pts)**
13. Train Logistic Regression **(5 pts)**
14. Train Random Forest **(5 pts)**
15. Evaluate with classification report **(7 pts)**
16. Feature importance visualization **(5 pts)**

### **Part 3: Presentation** (5 minutes)
**Points: 25**

17. Clear notebook organization **(5 pts)**
18. Markdown documentation **(5 pts)**
19. GitHub commit quality **(10 pts)**
20. Final 2-minute pitch **(5 pts)**

**Total: 100 points**

---

## 🔧 GITHUB SETUP (Do this FIRST! - 5 minutes)

### Team A Repository Name:
```
fastbank-teamA-hackathon-2024
```

### Team B Repository Name:
```
fastbank-teamB-hackathon-2024
```

### Initial Setup (One person per team):

```bash
# 1. Create repo on GitHub
# 2. Clone it
git clone https://github.com/[username]/fastbank-team[A/B]-hackathon-2024.git
cd fastbank-team[A/B]-hackathon-2024

# 3. Create files
touch hackathon_solution.ipynb
# Copy dataset here

# 4. Create .gitignore
# printf expands \n in every shell; plain echo does not do so in bash
printf '*.pyc\n__pycache__/\n.ipynb_checkpoints/\n.DS_Store\n' > .gitignore

# 5. Initial commit
git add .
git commit -m "🚀 Team [A/B] initial setup"
git push origin main

# 6. Add all team members as collaborators on GitHub
# Settings > Collaborators > Add people
```

### Everyone else:
```bash
git clone https://github.com/[username]/fastbank-team[A/B]-hackathon-2024.git
cd fastbank-team[A/B]-hackathon-2024
```

---

## 🎮 WORKFLOW (IMPORTANT!)

### Before ANY work:
```bash
git pull origin main
```

### After completing a task:
```bash
git add hackathon_solution.ipynb
git commit -m "[Name] Completed Task X: [description]"
git pull origin main  # Check for conflicts!
git push origin main
```

### Example commit messages:
- `"[Alice] Task 1: Loaded dataset and displayed first 10 rows"`
- `"[Bob] Task 3: Created age distribution plot"`
- `"[Carol] Task 9: Engineered 7 text features from feedback"`
- `"[Dave] Task 13: Trained logistic regression, accuracy 0.78"`
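
If you want to confirm that a message follows this convention before committing, a small regular expression is enough. The sketch below is illustrative only: the pattern and the helper `parse_commit_message` are assumptions derived from the examples above, and it accepts both the `[Name] Task N: ...` and the `[Name] Completed Task N: ...` forms.

```python
import re

# Accepts "[Name] Task N: text" and "[Name] Completed Task N: text".
COMMIT_PATTERN = re.compile(
    r"^\[(?P<name>[^\]]+)\] (?:Completed )?Task (?P<task>\d+): (?P<description>.+)$"
)

def parse_commit_message(message):
    """Return (name, task number, description), or None if the format is off."""
    match = COMMIT_PATTERN.match(message)
    if match is None:
        return None
    return match["name"], int(match["task"]), match["description"]

good = parse_commit_message("[Bob] Task 3: Created age distribution plot")
bad = parse_commit_message("fixed stuff")
print(good)
print(bad)
```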

---

## 📊 DATASET DESCRIPTION

**File:** `quick_hackathon_data.csv`

| Column | Type | Description |
|--------|------|-------------|
| customer_id | int | Unique identifier |
| age | int | Customer age |
| income | float | Annual income (some missing!) |
| credit_score | int | Credit score (some missing!) |
| account_balance | int | Current balance |
| num_transactions | int | Monthly transactions |
| city | str | Customer city |
| account_type | str | Basic/Premium/Gold |
| feedback | str | **TEXT** - Customer feedback |
| **churn** | int | **TARGET** - 0=Stay, 1=Leave |

---

## 🏃 TEAM STRATEGY TIPS

### Divide & Conquer:
- **Person 1:** Tasks 1-3 (Load data, check nulls, age plot)
- **Person 2:** Tasks 4-5 (Churn bar chart, correlation heatmap)
- **Person 3:** Tasks 6-7 (Text analysis, insights)
- **Person 4:** Tasks 8-11 (Feature engineering)
- **Person 5:** Tasks 12-16 (Model training, evaluation)

### Git Coordination:
1. **Communicate!** - "I'm working on Task X"
2. **Pull before push** - Always!
3. **Commit often** - Every task completion
4. **Descriptive messages** - Helps everyone
5. **Stay calm on conflicts** - They happen! (And you drink 🍺)

---

## ⏱️ TIME MANAGEMENT

```
0:00-0:05   Setup GitHub (5 min)
0:05-0:30   Part 1: EDA & Viz (25 min)
0:30-1:00   Part 2: Features & ML (30 min)
1:00        STOP! Prepare 2-min presentation
```

---

## 🏆 SCORING RUBRIC

### Technical (75 points):
- **EDA completeness** (25 pts)
- **Feature engineering** (25 pts)
- **Model performance** (25 pts)

### Process (25 points):
- **GitHub commits** (10 pts) - Quality & quantity
- **Team collaboration** (5 pts) - Everyone contributes
- **Documentation** (5 pts) - Clear markdown
- **Presentation** (5 pts) - Clear insights

### Bonus Points:
- **+5:** First team to complete all tasks
- **+3:** Best feature engineering creativity
- **+2:** Best data visualization
- **-2:** Per team member with 0 commits 😱

---

## 🚦 READY?

### Pre-Flight Checklist:
- [ ] GitHub repo created
- [ ] All team members have access
- [ ] Dataset downloaded
- [ ] Jupyter notebook open
- [ ] Timer ready
- [ ] Drinks ready 🍺

### On Your Marks...

### Get Set...

### GO! ⏱️

---

# START YOUR SOLUTION BELOW ⬇️

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries loaded!")

✅ Libraries loaded!


---
## 🔍 PART 1: EDA & VISUALIZATION (Tasks 1-7)

### Task 1: Load dataset (2 pts)

In [4]:
# Load the dataset and preview the first 10 rows
df = pd.read_csv('quick_hackathon_data.csv')
df.head(10)

|   | customer_id | age | income | credit_score | account_balance | num_transactions | city | account_type | feedback | churn |
|---|-------------|-----|--------|--------------|-----------------|------------------|------|--------------|----------|-------|
| 0 | 1 | 63 | NaN | 831.0 | 45737 | 178 | Houston | Premium | Not happy with fees. | 0 |
| 1 | 2 | 20 | 92431.0 | 416.0 | 22305 | 135 | Houston | Basic | Poor customer support. | 1 |
| 2 | 3 | 46 | 58106.0 | 453.0 | 39902 | 182 | Houston | Gold | Terrible experience, will switch banks. | 0 |
| 3 | 4 | 52 | 115093.0 | NaN | 35976 | 43 | NYC | Basic | Great service! Very satisfied. | 0 |
| 4 | 5 | 56 | NaN | 818.0 | 12115 | 104 | NYC | Basic | Not happy with fees. | 0 |
| 5 | 6 | 35 | 65205.0 | 750.0 | 41787 | 176 | NYC | Gold | Great service! Very satisfied. | 1 |
| 6 | 7 | 37 | 36928.0 | 611.0 | 20915 | 70 | NYC | Premium | Excellent! Best bank ever! | 1 |
| 7 | 8 | 60 | 82887.0 | NaN | 30030 | 186 | LA | Basic | Terrible experience, will switch banks. | 1 |
| 8 | 9 | 40 | 35745.0 | 800.0 | 12959 | 189 | NYC | Basic | Good experience overall. | 0 |
| 9 | 10 | 51 | 99714.0 | 393.0 | 49060 | 170 | Houston | Premium | Not happy with fees. | 1 |


### Task 2: Check missing values (2 pts)

In [None]:
# YOUR CODE HERE
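
One possible approach, shown as a hint and not as the reference solution: count missing values per column and express them as a share of rows. The `sample` frame below is a stand-in built from five rows of the Task 1 preview; in the notebook you would apply the same calls to `df`.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data: five rows copied from the Task 1 preview.
sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "income": [np.nan, 92431.0, 58106.0, 115093.0, np.nan],
    "credit_score": [831.0, 416.0, 453.0, np.nan, 818.0],
    "account_type": ["Premium", "Basic", "Gold", "Basic", "Basic"],
})

# Count and share of missing values per column, largest first.
missing = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "pct_missing": sample.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)
print(missing)
```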


### Task 3: Age distribution plot (3 pts)

In [None]:
# YOUR CODE HERE


### Task 4: Churn by account_type bar chart (4 pts)

In [None]:
# YOUR CODE HERE
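
As a hint, the numbers behind this chart can come from a single `groupby`: the mean of a 0/1 target within each group is that group's churn rate. The sketch below uses only the ten preview rows from Task 1, so the rates it prints are illustrative and say nothing about the full dataset.

```python
import pandas as pd

# The ten preview rows shown under Task 1, reduced to the two columns needed.
sample = pd.DataFrame({
    "account_type": ["Premium", "Basic", "Gold", "Basic", "Basic",
                     "Gold", "Premium", "Basic", "Basic", "Premium"],
    "churn": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1],
})

# Mean of a 0/1 target per group is the churn rate for that group.
churn_rate = (
    sample.groupby("account_type")["churn"].mean().sort_values(ascending=False)
)
print(churn_rate)
```

Calling `churn_rate.plot.bar()` on the result is one way to turn it into the bar chart.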


### Task 5: Correlation heatmap (4 pts)

In [None]:
# YOUR CODE HERE
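
A hint for the heatmap: text columns such as `city` and `feedback` cannot be correlated, so it is safer to select the numeric columns first. This sketch assumes a small sample taken from the Task 1 preview and stops at the correlation matrix.

```python
import pandas as pd

# Five rows from the Task 1 preview (customers 2, 3, 6, 7 and 9).
sample = pd.DataFrame({
    "age": [20, 46, 35, 37, 40],
    "account_balance": [22305, 39902, 41787, 20915, 12959],
    "num_transactions": [135, 182, 176, 70, 189],
    "city": ["Houston", "Houston", "NYC", "NYC", "NYC"],
    "churn": [1, 0, 1, 1, 0],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = sample.select_dtypes(include="number").corr()
print(corr.round(2))
```

The resulting matrix can be passed to `sns.heatmap(corr, annot=True, cmap="coolwarm")` for the plot itself.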


### Task 6: Text feedback length analysis (5 pts)

In [None]:
# YOUR CODE HERE
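
One way to start, offered as a sketch: derive a character-length column and compare it across the target. The `sample` frame holds five feedback strings and churn labels taken from the Task 1 preview; the column name `feedback_length` is an assumption.

```python
import pandas as pd

sample = pd.DataFrame({
    "feedback": [
        "Not happy with fees.",
        "Poor customer support.",
        "Terrible experience, will switch banks.",
        "Great service! Very satisfied.",
        "Excellent! Best bank ever!",
    ],
    "churn": [0, 1, 0, 0, 1],
})

# Character length of each feedback string.
sample["feedback_length"] = sample["feedback"].str.len()

print(sample["feedback_length"].describe())
# Average length for customers who stayed (0) versus left (1).
print(sample.groupby("churn")["feedback_length"].mean())
```

For the distribution plot, `sns.histplot` on the new column is a reasonable choice.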


### Task 7: Write 3 key insights (5 pts)

**Insight 1:**

[YOUR INSIGHT HERE]

**Insight 2:**

[YOUR INSIGHT HERE]

**Insight 3:**

[YOUR INSIGHT HERE]

---
## 🔨 PART 2: FEATURE ENGINEERING & ML (Tasks 8-16)

### Task 8: Handle missing values (5 pts)

In [None]:
# YOUR CODE HERE
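
A hedged example of one common strategy, median imputation with a missingness flag. Whether the median is the right fill value for `income` and `credit_score` is a judgment call; the `_was_missing` column names are made up for this sketch.

```python
import numpy as np
import pandas as pd

# The two columns with gaps, using five rows from the Task 1 preview.
sample = pd.DataFrame({
    "income": [np.nan, 92431.0, 58106.0, 115093.0, np.nan],
    "credit_score": [831.0, 416.0, 453.0, np.nan, 818.0],
})

filled = sample.copy()
for col in ["income", "credit_score"]:
    # Keep a flag so the model can still see which values were imputed.
    filled[f"{col}_was_missing"] = filled[col].isna().astype(int)
    # Replace gaps with the column median, which is robust to outliers.
    filled[col] = filled[col].fillna(filled[col].median())
print(filled)
```

In a stricter workflow the medians would be computed on the training split only, to avoid leaking information from the test set.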


### Task 9: Create 5+ text features from feedback (10 pts)

In [None]:
# YOUR CODE HERE
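
To illustrate what "text features" can mean here, the sketch below derives six simple numeric columns from a feedback string. The helper `add_text_features` and both word lists are hypothetical; pick your own vocabulary after reading the actual feedback values.

```python
import pandas as pd

# Hypothetical sentiment word lists, chosen by eye from the Task 1 preview.
NEGATIVE_WORDS = {"not", "poor", "terrible", "switch"}
POSITIVE_WORDS = {"great", "excellent", "best", "good", "satisfied"}

def add_text_features(df, col="feedback"):
    """Return a copy of df with simple text features derived from `col`."""
    out = df.copy()
    text = out[col].fillna("")
    # Lowercased word tokens with punctuation stripped, one list per row.
    tokens = text.str.lower().str.findall(r"[a-z']+")
    out["char_count"] = text.str.len()
    out["word_count"] = tokens.str.len()
    out["exclamation_count"] = text.str.count("!")
    out["negative_words"] = tokens.apply(
        lambda ws: sum(w in NEGATIVE_WORDS for w in ws)
    )
    out["positive_words"] = tokens.apply(
        lambda ws: sum(w in POSITIVE_WORDS for w in ws)
    )
    out["avg_word_length"] = tokens.apply(
        lambda ws: sum(len(w) for w in ws) / len(ws) if ws else 0.0
    )
    return out

sample = pd.DataFrame({"feedback": [
    "Great service! Very satisfied.",
    "Terrible experience, will switch banks.",
    "Not happy with fees.",
]})
features = add_text_features(sample)
print(features.drop(columns="feedback"))
```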


### Task 10: Encode categorical variables (5 pts)

In [None]:
# YOUR CODE HERE
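
One option, sketched under the assumption that `city` and `account_type` are treated as unordered categories: one-hot encode them with `pd.get_dummies`. The notebook also imports `LabelEncoder`, which is an alternative if you prefer a single integer column per feature.

```python
import pandas as pd

sample = pd.DataFrame({
    "city": ["Houston", "NYC", "LA", "NYC"],
    "account_type": ["Premium", "Basic", "Basic", "Gold"],
})

# One 0/1 column per category value; the original text columns are dropped.
encoded = pd.get_dummies(sample, columns=["city", "account_type"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

For Logistic Regression you may want to pass `drop_first=True` so the dummy columns are not perfectly collinear.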


### Task 11: Create 2 interaction features (5 pts)

In [None]:
# YOUR CODE HERE
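
Interaction features combine two existing columns into one. The two ratios below are only examples, with made-up names, built from three rows of the Task 1 preview; other pairs may carry more signal.

```python
import pandas as pd

# Customers 2, 3 and 6 from the Task 1 preview.
sample = pd.DataFrame({
    "income": [92431.0, 58106.0, 65205.0],
    "account_balance": [22305, 39902, 41787],
    "num_transactions": [135, 182, 176],
})

# Ratio feature: how much of a year's income sits in the account.
sample["balance_to_income"] = sample["account_balance"] / sample["income"]
# Per-transaction feature: balance relative to monthly activity.
sample["balance_per_transaction"] = (
    sample["account_balance"] / sample["num_transactions"]
)
print(sample.round(3))
```

Ratios produce `inf` when the denominator is zero and `NaN` when it is missing, so create them after Task 8 and check for zero values first.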


### Task 12: Train/test split (3 pts)

In [None]:
# YOUR CODE HERE


### Task 13: Train Logistic Regression (5 pts)

In [None]:
# YOUR CODE HERE


### Task 14: Train Random Forest (5 pts)

In [None]:
# YOUR CODE HERE


### Task 15: Evaluation with classification report (7 pts)

In [None]:
# YOUR CODE HERE
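
Tasks 12 to 15 chain together, so here is a compact sketch of the whole sequence on synthetic data. Everything about `X` and `y` is invented for illustration; the hyperparameters, the 80/20 split, and the scaling step are assumptions, not requirements of the challenge.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature matrix (300 rows, 4 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Task 12: stratify so both splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Tasks 13 and 14: scale inputs for the linear model; trees do not need it.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Task 15: one classification report per model on the held-out split.
reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    reports[name] = classification_report(
        y_test, model.predict(X_test), target_names=["Stay", "Leave"]
    )
    print(name)
    print(reports[name])
```

With only 300 customers the test split holds about 60 rows, so small differences between the two models should be read with caution.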


### Task 16: Feature importance visualization (5 pts)

In [None]:
# YOUR CODE HERE
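
A possible shape for this plot, again on synthetic data: read `feature_importances_` from a fitted Random Forest, sort the values, and draw a horizontal bar chart. The feature names are borrowed from the dataset description, but the values and the target here are randomly generated.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data in which only `credit_score` drives the target.
feature_names = ["age", "income", "credit_score", "num_transactions"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=feature_names)
y = (X["credit_score"] + rng.normal(scale=0.5, size=300) < 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, sorted so the largest bar ends up on top.
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values()

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(importances.index, importances.values)
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Random Forest feature importance (synthetic data)")
fig.tight_layout()
```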


---
## 🎤 FINAL PRESENTATION (2 minutes)

**Prepare to present:**
1. Key insights from EDA
2. Features you engineered
3. Model performance
4. Recommendations to reduce churn

**Keep it short and impactful!** ⏱️