# Accessing a Group of Data with `get_group()`

## Import Modules

In [1]:
import pandas as pd

print("Pandas version:", pd.__version__)

Pandas version: 2.3.1


## Preparing the DataFrame

In [3]:
df = pd.read_csv('./../data/titanicfull.csv')
df.head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.3375,B5,S
1,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
2,1,0,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,S
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1,2,113781,151.55,C22 C26,S
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1,2,113781,151.55,C22 C26,S


## Accessing a group from already-grouped data with `get_group()`

In [4]:
grouped_df = df.groupby('sex')

In [6]:
grouped_df.get_group('female').head(10)

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.3375,B5,S
2,1,0,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,S
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1,2,113781,151.55,C22 C26,S
6,1,1,"Andrews, Miss. Kornelia Theodosia",female,63.0,1,0,13502,77.9583,D7,S
8,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2,0,11769,51.4792,C101,S
11,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18.0,1,0,PC 17757,227.525,C62 C64,C
12,1,1,"Aubart, Mme. Leontine Pauline",female,24.0,0,0,PC 17477,69.3,B35,C
13,1,1,"Barber, Miss. Ellen ""Nellie""",female,26.0,0,0,19877,78.85,,S
17,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50.0,0,1,PC 17558,247.5208,B58 B60,C
18,1,1,"Bazzani, Miss. Albina",female,32.0,0,0,11813,76.2917,D15,C


In [7]:
grouped_df = df.groupby('survived')

In [8]:
grouped_df.get_group(1).head(10)

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.3375,B5,S
1,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
5,1,1,"Anderson, Mr. Harry",male,48.0,0,0,19952,26.55,E12,S
6,1,1,"Andrews, Miss. Kornelia Theodosia",female,63.0,1,0,13502,77.9583,D7,S
8,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2,0,11769,51.4792,C101,S
11,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18.0,1,0,PC 17757,227.525,C62 C64,C
12,1,1,"Aubart, Mme. Leontine Pauline",female,24.0,0,0,PC 17477,69.3,B35,C
13,1,1,"Barber, Miss. Ellen ""Nellie""",female,26.0,0,0,19877,78.85,,S
14,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
17,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50.0,0,1,PC 17558,247.5208,B58 B60,C


## 📋 Summary: Accessing a Group of Data with `get_group()`

### 🎯 Key Concept

**`get_group()`** is a pandas method that lets us **access a specific subset of the data** from a GroupBy object. It is most useful when we want to **focus the analysis on one particular group** without performing an aggregation.

### 🔧 Syntax and Usage

```python
# Basic syntax
grouped_object = df.groupby('column_name')
specific_group = grouped_object.get_group('group_value')
```
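
To make the syntax concrete without depending on the Titanic file, here is a minimal sketch on a small made-up frame (the `toy` data and variable names are illustrative assumptions, not part of the notebook). It shows that the returned subset keeps the original row labels and matches a plain boolean filter on the same column.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file used above
toy = pd.DataFrame({
    "sex": ["female", "male", "female", "male"],
    "age": [29.0, 0.92, 2.0, 30.0],
})

toy_groups = toy.groupby("sex")
toy_female = toy_groups.get_group("female")

# The result keeps the original row labels of the matching rows
print(toy_female.index.tolist())

# It contains the same rows as a boolean filter on the same column
print(toy_female.equals(toy[toy["sex"] == "female"]))
```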

| Parameter | Description | Example |
|-----------|-------------|---------|
| **name** | Name/value of the group to access | `'female'`, `1`, `('A', 'B')` |
| **obj** | DataFrame to take the group from (optional; deprecated since pandas 2.1) | Defaults to the object `groupby` was called on |
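
In recent pandas releases (2.1 and later, as far as I know) the `obj` argument is deprecated, and the suggested replacement is to look up the row positions of a group through the GroupBy object's `indices` mapping. The sketch below assumes a small made-up frame named `toy`; it is an illustration of that idea, not code from the notebook.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "sex": ["female", "male", "female"],
    "fare": [211.34, 151.55, 151.55],
})
toy_groups = toy.groupby("sex")

# Integer positions of the rows that belong to the "female" group
positions = toy_groups.indices["female"]

# Take those positions from a frame that has the same row order
picked = toy.iloc[positions]
print(picked)
```

For the frame the GroupBy was built from, this selects the same rows as `toy_groups.get_group("female")`.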

### 💡 Practical Examples from the Notebook

```python
# 1. Group by sex
grouped_df = df.groupby('sex')
female_passengers = grouped_df.get_group('female')

# 2. Group by survival status
grouped_df = df.groupby('survived')
survivors = grouped_df.get_group(1)  # Survived
non_survivors = grouped_df.get_group(0)  # Did not survive
```

### 🔍 Usage Scenarios

#### **1. Single Column Grouping**
```python
# Group by passenger class
class_groups = df.groupby('pclass')
first_class = class_groups.get_group(1)
second_class = class_groups.get_group(2)
third_class = class_groups.get_group(3)

print(f"First class passengers: {len(first_class)}")
print(f"Second class passengers: {len(second_class)}")
print(f"Third class passengers: {len(third_class)}")
```

#### **2. Multiple Column Grouping**
```python
# Group by several columns
multi_groups = df.groupby(['sex', 'pclass'])

# Access a specific group with a tuple
female_first_class = multi_groups.get_group(('female', 1))
male_third_class = multi_groups.get_group(('male', 3))

# Analyze one particular group
survival_rate = female_first_class['survived'].mean()
print(f"Survival rate female first class: {survival_rate:.2%}")
```
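
When grouping by several columns it helps to inspect which tuple keys actually exist before calling `get_group()`. The following is a small sketch under the assumption of a made-up frame `toy`; it lists the keys through the `groups` mapping and only fetches a group whose key is present.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "sex": ["female", "male", "female", "male"],
    "pclass": [1, 3, 1, 1],
    "survived": [1, 0, 0, 1],
})
toy_multi = toy.groupby(["sex", "pclass"])

# Each key is a tuple with one entry per grouping column
available = list(toy_multi.groups.keys())
print(available)

key = ("female", 1)
rate = None
if key in toy_multi.groups:
    rate = toy_multi.get_group(key)["survived"].mean()
    print(f"Survival rate for {key}: {rate:.2%}")
```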

#### **3. Categorical Data Grouping**
```python
# Group by port of embarkation
port_groups = df.groupby('embarked')
southampton = port_groups.get_group('S')
cherbourg = port_groups.get_group('C')
queenstown = port_groups.get_group('Q')

# Analysis per port
for port, data in port_groups:
    print(f"Port {port}: {len(data)} passengers, "
          f"survival rate: {data['survived'].mean():.2%}")
```

### 🆚 Alternative Methods for Accessing a Group

| Method | Syntax | Advantages | Disadvantages |
|--------|--------|------------|---------------|
| **get_group()** | `grouped.get_group('value')` | Simple, direct access | Only one group per call |
| **Boolean indexing** | `df[df['col'] == 'value']` | Flexible, multiple conditions | The condition must be rewritten for each value |
| **query()** | `df.query("col == 'value'")` | Readable for complex conditions | String-based; parsing the expression can make it slower on small frames |
| **Iteration** | `for name, group in grouped:` | Access all groups | Needs a loop structure |
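
Because `get_group()` returns one group per call, fetching several groups means either concatenating several calls or switching to a boolean mask. The sketch below compares the two on a made-up frame (`toy` and `wanted` are illustrative assumptions); both select the same rows, and only the row order differs.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "embarked": ["S", "C", "Q", "S", "C"],
    "fare": [7.25, 71.28, 7.75, 8.05, 30.0],
})
wanted = ["S", "C"]

# Option A: one get_group() call per key, then concatenate
toy_ports = toy.groupby("embarked")
via_groups = pd.concat([toy_ports.get_group(k) for k in wanted])

# Option B: a single boolean mask
via_mask = toy[toy["embarked"].isin(wanted)]

# Same rows; only the row order differs (group order vs. original order)
print(via_groups.sort_index().equals(via_mask))
```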

### 📊 Performance Comparison

```python
import time

# Method 1: get_group() (the timing includes building the GroupBy object)
start = time.perf_counter()
grouped = df.groupby('sex')
female_data1 = grouped.get_group('female')
time1 = time.perf_counter() - start

# Method 2: Boolean indexing
start = time.perf_counter()
female_data2 = df[df['sex'] == 'female']
time2 = time.perf_counter() - start

# Method 3: query()
start = time.perf_counter()
female_data3 = df.query("sex == 'female'")
time3 = time.perf_counter() - start

print(f"get_group(): {time1:.6f} seconds")
print(f"Boolean indexing: {time2:.6f} seconds")
print(f"query(): {time3:.6f} seconds")
```
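
A single wall-clock measurement like the one above is noisy, and the `get_group()` figure also includes the cost of building the GroupBy object. A possible refinement, sketched here with the standard-library `timeit` module on a synthetic frame (the `toy` data and the run count of 100 are arbitrary assumptions), repeats each statement and separates the reused-GroupBy case from the re-grouping case. The numbers depend on the machine and the data, so no expected output is shown.

```python
import timeit

import pandas as pd

# Synthetic data for illustration only
toy = pd.DataFrame({"sex": ["female", "male"] * 5_000, "age": range(10_000)})
toy_groups = toy.groupby("sex")  # built once, outside the timed statements

runs = 100
timings = {
    "get_group (reused groupby)": timeit.timeit(
        lambda: toy_groups.get_group("female"), number=runs),
    "groupby + get_group": timeit.timeit(
        lambda: toy.groupby("sex").get_group("female"), number=runs),
    "boolean indexing": timeit.timeit(
        lambda: toy[toy["sex"] == "female"], number=runs),
    "query": timeit.timeit(
        lambda: toy.query("sex == 'female'"), number=runs),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds / runs * 1e3:.3f} ms per call")
```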

### 🚀 Advanced Use Cases

#### **1. Exploratory Data Analysis**
```python
def analyze_group(grouped_obj, group_name, target_col='survived'):
    """Detailed analysis for one specific group"""
    group_data = grouped_obj.get_group(group_name)
    
    analysis = {
        'count': len(group_data),
        'survival_rate': group_data[target_col].mean(),
        'age_stats': {
            'mean': group_data['age'].mean(),
            'median': group_data['age'].median(),
            'std': group_data['age'].std()
        },
        'fare_stats': {
            'mean': group_data['fare'].mean(),
            'median': group_data['fare'].median()
        }
    }
    
    return analysis

# Usage
sex_groups = df.groupby('sex')
female_analysis = analyze_group(sex_groups, 'female')
male_analysis = analyze_group(sex_groups, 'male')
```

#### **2. Comparative Analysis**
```python
def compare_groups(df, group_col, groups_to_compare):
    """Compare several groups"""
    grouped = df.groupby(group_col)
    comparison = {}
    
    for group_name in groups_to_compare:
        group_data = grouped.get_group(group_name)
        comparison[group_name] = {
            'size': len(group_data),
            'survival_rate': group_data['survived'].mean(),
            'avg_age': group_data['age'].mean(),
            'avg_fare': group_data['fare'].mean()
        }
    
    return pd.DataFrame(comparison).T

# Usage
class_comparison = compare_groups(df, 'pclass', [1, 2, 3])
print(class_comparison)
```
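
A comparison table of this kind can often be produced without a per-group loop by using named aggregation on the GroupBy object. The sketch below is an alternative under that assumption, run on a made-up frame `toy` rather than the Titanic data; the output column names mirror the ones used in `compare_groups`.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "survived": [1, 0, 1, 0, 0],
    "age": [29.0, 30.0, 25.0, None, 22.0],
    "fare": [211.34, 151.55, 26.0, 7.25, 8.05],
})

# One row per group, one named column per statistic
summary = toy.groupby("pclass").agg(
    size=("survived", "size"),
    survival_rate=("survived", "mean"),
    avg_age=("age", "mean"),
    avg_fare=("fare", "mean"),
)
print(summary)
```

The loop with `get_group()` remains useful when each group needs custom logic that does not fit an aggregation function.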

#### **3. Data Validation & Quality Checks**
```python
def validate_groups(df, group_col):
    """Validate the data per group"""
    grouped = df.groupby(group_col)
    validation_report = {}
    
    for group_name in df[group_col].unique():
        if pd.notna(group_name):  # Skip NaN groups
            group_data = grouped.get_group(group_name)
            validation_report[group_name] = {
                'missing_values': group_data.isnull().sum().to_dict(),
                'duplicates': group_data.duplicated().sum(),
                'data_types': group_data.dtypes.to_dict()
            }
    
    return validation_report

# Usage
validation = validate_groups(df, 'pclass')
```

### 🔍 Error Handling & Edge Cases

```python
def safe_get_group(grouped_obj, group_name):
    """Safely get a group, with error handling"""
    try:
        return grouped_obj.get_group(group_name)
    except KeyError:
        print(f"Group '{group_name}' not found in the data")
        available_groups = list(grouped_obj.groups.keys())
        print(f"Available groups: {available_groups}")
        return None

# Usage
sex_groups = df.groupby('sex')
unknown_group = safe_get_group(sex_groups, 'unknown')  # Safe handling

# Check available groups
print("Available groups:", list(sex_groups.groups.keys()))
print("Group sizes:", sex_groups.size())
```
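
An alternative to catching `KeyError` is to test for the key first and return an empty frame, so that downstream code can keep calling DataFrame methods. This is a sketch only; the helper name `get_group_or_empty` and the `toy` data are hypothetical.

```python
import pandas as pd

def get_group_or_empty(df, group_col, group_name):
    """Return the group if its key exists, else an empty frame with the same columns."""
    grouped = df.groupby(group_col)
    if group_name in grouped.groups:
        return grouped.get_group(group_name)
    # Zero rows, but the original columns and dtypes are preserved
    return df.iloc[0:0]

# Hypothetical toy data for illustration
toy = pd.DataFrame({"sex": ["female", "male"], "age": [29.0, 30.0]})

found = get_group_or_empty(toy, "sex", "female")
missing = get_group_or_empty(toy, "sex", "unknown")
print(len(found), len(missing))
```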

### 💡 Best Practices

#### **1. Efficient Group Access**
```python
# ✅ Reuse the grouped object for multiple accesses
grouped = df.groupby('sex')
female_data = grouped.get_group('female')
male_data = grouped.get_group('male')

# ❌ Avoid re-grouping for every access
female_data = df.groupby('sex').get_group('female')  # Inefficient
male_data = df.groupby('sex').get_group('male')      # Inefficient
```

#### **2. Memory Management**
```python
# ✅ For large datasets, process the groups one at a time
def process_groups_efficiently(df, group_col):
    grouped = df.groupby(group_col)
    results = {}
    
    for group_name in df[group_col].unique():
        if pd.notna(group_name):
            group_data = grouped.get_group(group_name)
            # Process group_data
            results[group_name] = group_data.describe()
            # group_data is rebound on the next iteration, so only one subset is referenced at a time
    
    return results
```
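
The loop above filters out missing keys by hand with `pd.notna`. Iterating over the GroupBy object directly gives the same one-group-at-a-time behavior, and rows with a missing key are left out by default because `groupby()` uses `dropna=True`. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical toy data with one missing group key
toy = pd.DataFrame({
    "embarked": ["S", "C", None, "S"],
    "fare": [7.25, 71.28, 80.0, 8.05],
})

# Iterating yields (key, sub-frame) pairs; rows with a missing key are skipped
sizes = {port: len(rows) for port, rows in toy.groupby("embarked")}
print(sizes)

# dropna=False keeps the missing-key rows as a group of their own
n_with_missing = toy.groupby("embarked", dropna=False).ngroups
print(n_with_missing)
```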

#### **3. Documentation & Readability**
```python
def get_passenger_group(df, group_type, group_value):
    """
    Get specific passenger group from Titanic dataset
    
    Args:
        df: Titanic DataFrame
        group_type: Column to group by ('sex', 'pclass', 'embarked')
        group_value: Specific value to extract
    
    Returns:
        DataFrame: Subset of passengers matching criteria
    """
    grouped = df.groupby(group_type)
    return grouped.get_group(group_value)

# Usage with clear documentation
female_passengers = get_passenger_group(df, 'sex', 'female')
first_class = get_passenger_group(df, 'pclass', 1)
```

### 🎯 Common Business Use Cases

| Scenario | Implementation | Business Value |
|----------|----------------|----------------|
| **Customer Segmentation** | `get_group('premium_customers')` | Targeted marketing |
| **Regional Analysis** | `get_group('asia_pacific')` | Regional strategies |
| **Product Categories** | `get_group('electronics')` | Category performance |
| **Time Period Analysis** | `get_group('Q1_2024')` | Seasonal insights |
| **User Behavior** | `get_group('high_engagement')` | User experience optimization |

### 🔍 Key Takeaways

- ✅ **`get_group()`** gives direct access to one specific group of a GroupBy object
- ✅ **Reuse grouped objects** when accessing several groups, so the grouping is not rebuilt for every call
- ✅ **Handle `KeyError`** with `try`/`except` for groups that may not exist
- ✅ **Well suited to EDA** and detailed per-segment analysis
- ✅ **Combine it with analysis functions** for deeper insight
- ✅ **Process one group at a time** on large datasets, so that only one group's subset is held in addition to the original frame
- ✅ **Document the grouping logic** for maintainability

**`get_group()` is an essential tool for segment-specific analysis and detailed data exploration!** 🚀