# Selecting rows with multiple criteria

## Import Modules

```python
import pandas as pd

print("Pandas version:", pd.__version__)
```

```
Pandas version: 2.3.1
```


## Preparing the DataFrame

```python
df = pd.read_csv('./../data/titanicfull.csv')
df.head()
```

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |
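The `Unnamed: 0` column in this output typically appears when a DataFrame was saved with `to_csv()` including its index and then read back without telling pandas which column is the index. Assuming that is what happened with `titanicfull.csv` (the notebook does not say), passing `index_col=0` to `read_csv` turns that column back into the index. A small self-contained sketch using an in-memory CSV with made-up values:

```python
import io

import pandas as pd

# Hypothetical frame; to_csv() writes its index as an unnamed first column
original = pd.DataFrame({"pclass": [1, 1, 3], "survived": [1, 0, 1]})
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back naively: the unnamed column becomes "Unnamed: 0"
buffer.seek(0)
with_unnamed = pd.read_csv(buffer)

# index_col=0 uses that first column as the index instead
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)

print(list(with_unnamed.columns))
print(list(clean.columns))
```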


## Selecting rows with multiple criteria

```python
df[(df['sex']=='female') & (df['age'] >= 60) & (df['embarked'] == 'S') & (df['survived'] == 1)]
```

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 61 | 1 | 1 | Cavendish, Mrs. Tyrell William (Julia Florence... | female | 76.0 | 1 | 0 | 19877 | 78.85 | C46 | S |
| 83 | 1 | 1 | Crosby, Mrs. Edward Gifford (Catherine Elizabe... | female | 64.0 | 1 | 1 | 112901 | 26.55 | B26 | S |
| 116 | 1 | 1 | Fortune, Mrs. Mark (Mary McDougald) | female | 60.0 | 1 | 4 | 19950 | 263.0 | C23 C25 C27 | S |
| 1261 | 3 | 1 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 |  | S |


```python
df[
    (df['sex'] == 'female') &
    (df['age'] >= 60) &
    (df['embarked'] == 'S') &
    (df['survived'] == 1)
]
```

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 61 | 1 | 1 | Cavendish, Mrs. Tyrell William (Julia Florence... | female | 76.0 | 1 | 0 | 19877 | 78.85 | C46 | S |
| 83 | 1 | 1 | Crosby, Mrs. Edward Gifford (Catherine Elizabe... | female | 64.0 | 1 | 1 | 112901 | 26.55 | B26 | S |
| 116 | 1 | 1 | Fortune, Mrs. Mark (Mary McDougald) | female | 60.0 | 1 | 4 | 19950 | 263.0 | C23 C25 C27 | S |
| 1261 | 3 | 1 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 |  | S |


```python
k1 = df['sex'] == 'female'
k2 = df['age'] >= 60
k3 = df['embarked'] == 'S'
k4 = df['survived'] == 1

df[k1 & k2 & k3 & k4]
```

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 61 | 1 | 1 | Cavendish, Mrs. Tyrell William (Julia Florence... | female | 76.0 | 1 | 0 | 19877 | 78.85 | C46 | S |
| 83 | 1 | 1 | Crosby, Mrs. Edward Gifford (Catherine Elizabe... | female | 64.0 | 1 | 1 | 112901 | 26.55 | B26 | S |
| 116 | 1 | 1 | Fortune, Mrs. Mark (Mary McDougald) | female | 60.0 | 1 | 4 | 19950 | 263.0 | C23 C25 C27 | S |
| 1261 | 3 | 1 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 |  | S |


## 📋 Summary: Selecting Rows with Multiple Criteria

### 🎯 Key Concept

**Selecting rows with multiple criteria** is one of the most common operations in data analysis. pandas expresses each condition as a boolean Series (a *mask*), combines masks element-wise with **logical operators**, and keeps the rows where the combined mask is `True` through **boolean indexing**.
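To make the mechanism concrete, here is a minimal sketch on a hypothetical four-row DataFrame (the values are made up, not taken from the Titanic file): each comparison yields a boolean Series, `&` combines them row by row, and indexing with the result keeps only the `True` rows.

```python
import pandas as pd

# Hypothetical toy data shaped like two Titanic columns
toy = pd.DataFrame({
    "sex": ["female", "male", "female", "female"],
    "age": [63.0, 70.0, 25.0, 60.0],
})

# Each comparison returns a boolean Series aligned to toy.index
is_female = toy["sex"] == "female"   # [True, False, True, True]
is_senior = toy["age"] >= 60         # [True, True, False, True]

# & combines the masks element-wise; toy[mask] keeps rows where mask is True
mask = is_female & is_senior         # [True, False, False, True]
print(toy[mask])
```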

### 🔧 Three Main Approaches

| Approach | Syntax | Pros | Cons |
|------------|--------|-----------|------------|
| **Inline** | `df[(cond1) & (cond2)]` | Compact; quick for simple conditions | Hard to read for complex conditions |
| **Multi-line** | One condition per line inside `df[ ]` | More readable | Still verbose with many conditions |
| **Variable-based** | `k1 = cond1; df[k1 & k2]` | Very readable; masks are reusable | More variables to name and track |

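### 🔍 Logical Operators in pandas

These operators work element-wise on boolean Series. Because `&`, `|` and `^` bind more tightly than comparison operators such as `==` and `>`, each comparison must sit inside its own parentheses.

| Operator | Function | Example | Note |
|----------|--------|--------|---------|
| **&** | AND | `(cond1) & (cond2)` | **Parentheses required!** |
| **\|** | OR | `(cond1) \| (cond2)` | **Parentheses required!** |
| **~** | NOT | `~(condition)` | Negates the condition; wrap a comparison in parentheses |
| **^** | XOR | `(cond1) ^ (cond2)` | Exclusive OR: `True` when exactly one side is `True` |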
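A small sketch of `|`, `~` and `^` next to `&`, on hypothetical toy data; every comparison sits inside its own parentheses or variable before being combined.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"age": [10.0, 35.0, 8.0], "sex": ["female", "male", "male"]})

is_child = toy["age"] < 18          # [True, False, True]
is_male = toy["sex"] == "male"      # [False, True, True]

both = is_child & is_male           # AND: [False, False, True]
either = is_child | is_male         # OR:  [True, True, True]
not_male = ~is_male                 # NOT: [True, False, False]
exactly_one = is_child ^ is_male    # XOR: [True, True, False]

print(toy[exactly_one])
```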

### ⚠️ Common Pitfalls

```python
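# ❌ WRONG - no parentheses
# & binds more tightly than > and ==, so this parses as
# df['age'] > (30 & df['sex']) == 'male' and raises an error
df[df['age'] > 30 & df['sex'] == 'male']

# ✅ CORRECT - each comparison in its own parentheses
df[(df['age'] > 30) & (df['sex'] == 'male')]

# ❌ WRONG - Python's `and` instead of `&`
# `and` needs a single True/False, so pandas raises
# "ValueError: The truth value of a Series is ambiguous"
df[(df['age'] > 30) and (df['sex'] == 'male')]

# ✅ CORRECT - element-wise `&`
df[(df['age'] > 30) & (df['sex'] == 'male')]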
```
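Both failures can be reproduced on a hypothetical three-row DataFrame. The sketch below catches the exceptions so it runs to the end; the exact exception type and message for the missing-parentheses case may differ between pandas versions, so it only records the type name.

```python
import pandas as pd

# Hypothetical toy data
people = pd.DataFrame({"age": [25, 40, 50], "sex": ["male", "female", "male"]})

# 1) No parentheses: Python reads people["age"] > (30 & people["sex"]) == "male"
precedence_error = None
try:
    people[people["age"] > 30 & people["sex"] == "male"]
except (TypeError, ValueError) as exc:
    precedence_error = type(exc).__name__

# 2) `and` tries to collapse a whole Series into one True/False
and_error = None
try:
    people[(people["age"] > 30) and (people["sex"] == "male")]
except ValueError as exc:
    and_error = str(exc)

# 3) Correct: parenthesized comparisons joined with &
ok = people[(people["age"] > 30) & (people["sex"] == "male")]
print(precedence_error, "|", and_error)
print(ok)
```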

### 💡 Practical Examples from This Notebook

```python
# 1. Inline approach - hard to read
df[(df['sex']=='female') & (df['age'] >= 60) & (df['embarked'] == 'S') & (df['survived'] == 1)]

# 2. Multi-line approach - more readable
df[
    (df['sex'] == 'female') &
    (df['age'] >= 60) &
    (df['embarked'] == 'S') &
    (df['survived'] == 1)
]

# 3. Variable-based approach - most readable
k1 = df['sex'] == 'female'
k2 = df['age'] >= 60
k3 = df['embarked'] == 'S'
k4 = df['survived'] == 1
df[k1 & k2 & k3 & k4]
```
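Because the approaches differ only in layout, they build the same mask and therefore return the same rows. A quick check on a hypothetical stand-in for the four Titanic columns used above (the multi-line form is the same expression as the inline one, so it is not repeated):

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data
df = pd.DataFrame({
    "sex": ["female", "female", "male", "female"],
    "age": [63.0, 30.0, 65.0, 76.0],
    "embarked": ["S", "S", "S", "C"],
    "survived": [1, 1, 1, 1],
})

inline = df[(df['sex'] == 'female') & (df['age'] >= 60) & (df['embarked'] == 'S') & (df['survived'] == 1)]

k1 = df['sex'] == 'female'
k2 = df['age'] >= 60
k3 = df['embarked'] == 'S'
k4 = df['survived'] == 1
variable_based = df[k1 & k2 & k3 & k4]

# Same mask, same selected rows
print(inline.equals(variable_based))
```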

### 🚀 Advanced Techniques

#### **1. Using `query()` Method**
```python
# More readable for complex conditions
df.query("sex == 'female' and age >= 60 and embarked == 'S' and survived == 1")

# With a Python variable: prefix its name with @ inside the query string
min_age = 60
df.query("sex == 'female' and age >= @min_age")
```
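A sketch on hypothetical toy data showing that a `query()` string with `@`-referenced variables selects the same rows as the equivalent boolean mask (`port` is an extra illustrative variable):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "sex": ["female", "male", "female"],
    "age": [63.0, 70.0, 45.0],
    "embarked": ["S", "S", "S"],
})

min_age = 60
port = "S"

# Inside the query string, @name refers to a variable in the calling scope
via_query = df.query("sex == 'female' and age >= @min_age and embarked == @port")
via_mask = df[(df["sex"] == "female") & (df["age"] >= min_age) & (df["embarked"] == port)]

print(via_query.equals(via_mask))
```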

#### **2. Using `isin()` for Multiple Values**
```python
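# Multiple values in one column
df[df['embarked'].isin(['S', 'C']) & (df['age'] > 30)]

# isin() for the ports, between() for the age range
ports = ['S', 'C']
# age is a float column (e.g. 0.92), so isin(range(20, 40)) would only match
# whole-number ages; between() tests the interval [20, 40) instead
df[df['embarked'].isin(ports) & df['age'].between(20, 40, inclusive='left')]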
```
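`isin()` tests exact equality, which is why it suits categorical columns like `embarked` but not numeric ranges: on a float column, `isin(range(20, 40))` only matches the whole numbers 20 to 39. A minimal comparison with `between()` on hypothetical ages:

```python
import pandas as pd

# Hypothetical ages, including fractional values
ages = pd.Series([19.5, 20.0, 29.5, 39.0, 40.0])

# Exact membership test against the integers 20..39: 29.5 is silently dropped
by_isin = ages.isin(range(20, 40))

# Interval test; inclusive="left" mirrors range()'s half-open [20, 40)
by_between = ages.between(20, 40, inclusive="left")

print(pd.DataFrame({"age": ages, "isin": by_isin, "between": by_between}))
```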

#### **3. Combining with String Methods**
```python
# str.contains() combined with other criteria
# na=False treats missing names as "no match" instead of NaN
df[
    df['name'].str.contains('Mrs|Miss', na=False) &
    (df['age'] > 25) &
    (df['survived'] == 1)
]
```
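On a column with missing values, `str.contains()` without `na=` leaves missing entries in its result instead of `False`, and pandas refuses to use such a result as a row mask. A sketch on hypothetical names, assuming pandas 2.x with the default object dtype for strings:

```python
import pandas as pd

# Hypothetical names, one of them missing
names = pd.Series(["Andrews, Miss. Kornelia", "Allison, Mr. Hudson", None])

# Without na=..., the missing name gives a missing result (NaN on object dtype)
raw = names.str.contains("Mrs|Miss")

# na=False maps missing names to False, giving a clean boolean mask
mask = names.str.contains("Mrs|Miss", na=False)

print(raw.tolist())
print(mask.tolist())
```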

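### 📊 Performance Comparison

The star ratings below are rough, qualitative impressions rather than measured benchmarks. Actual speed depends on data size and on whether the optional `numexpr` package is installed, which `query()` uses when available.

| Method | Speed | Readability | Memory | Use Case |
|--------|-------|-------------|--------|----------|
| **Boolean Indexing** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | General purpose |
| **query()** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Complex conditions |
| **`loc` with a boolean mask** | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Filtering rows and selecting columns together |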
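A sketch of the `loc` variant on hypothetical data: `.loc[row_mask, column_list]` filters rows and selects columns in one step.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "sex": ["female", "male", "female"],
    "age": [63.0, 70.0, 30.0],
    "fare": [77.96, 10.5, 26.55],
})

mask = (df["sex"] == "female") & (df["age"] >= 60)

# Rows where mask is True, restricted to the listed columns
selected = df.loc[mask, ["name", "fare"]]
print(selected)
```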

### 🔍 Real-World Use Cases

#### **Business Intelligence**
```python
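# Hypothetical customer DataFrame (columns are illustrative, not from the Titanic data)
# High-value customers
high_value = (df['purchase_amount'] > 1000) & (df['frequency'] > 5)

# Likely churned: still marked 'active' but no login since 2024-01-01
# (assumes last_login has been parsed to datetime64)
churned = (df['last_login'] < '2024-01-01') & (df['status'] == 'active')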
```
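Comparing `last_login` with `'2024-01-01'` is only a date comparison if the column is `datetime64`; on plain strings it is a text comparison, which happens to agree for ISO `YYYY-MM-DD` strings but not for other formats. A sketch on a hypothetical customer table that parses the column first:

```python
import pandas as pd

# Hypothetical customer table; last_login arrives as text, as it often does from CSV
customers = pd.DataFrame({
    "last_login": ["2023-11-20", "2024-03-05", "2023-06-30"],
    "status": ["active", "active", "cancelled"],
})

# Parse to datetime64 so "<" compares dates rather than strings
customers["last_login"] = pd.to_datetime(customers["last_login"])

cutoff = pd.Timestamp("2024-01-01")
likely_churned = (customers["last_login"] < cutoff) & (customers["status"] == "active")
print(customers[likely_churned])
```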

#### **Data Quality Checks**
```python
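# Hypothetical columns: email, phone, age, salary
# Adults with no contact information at all
incomplete = df['email'].isna() & df['phone'].isna() & (df['age'] > 18)

# Outliers: above the 95th percentile, or negative (impossible) salaries
outliers = (df['salary'] > df['salary'].quantile(0.95)) | (df['salary'] < 0)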
```
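The same masks can be tried on a tiny hypothetical customer table (all column names and values are invented for illustration). Summing a boolean mask counts its `True` values, which is a handy way to report how many rows a check flags:

```python
import pandas as pd

# Hypothetical customer records
customers = pd.DataFrame({
    "email": ["a@x.com", None, None, "d@x.com"],
    "phone": [None, None, "555-0100", None],
    "age": [30, 45, 17, 52],
    "salary": [4000.0, -10.0, 3500.0, 90000.0],
})

incomplete = customers["email"].isna() & customers["phone"].isna() & (customers["age"] > 18)
outliers = (customers["salary"] > customers["salary"].quantile(0.95)) | (customers["salary"] < 0)

# True counts as 1 when summed
print("incomplete:", int(incomplete.sum()), "outliers:", int(outliers.sum()))
```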

#### **Scientific Analysis**
```python
# Experimental group
experiment = (df['treatment'] == 'A') & (df['age'].between(18, 65)) & (df['gender'] == 'F')

# Control group with matching criteria
control = (df['treatment'] == 'B') & (df['age'].between(18, 65)) & (df['gender'] == 'F')
```

### 🎯 Best Practices

#### **1. Readability First**
```python
# ✅ Use descriptive variable names
is_female = df['sex'] == 'female'
is_senior = df['age'] >= 60
from_southampton = df['embarked'] == 'S'
survived = df['survived'] == 1

result = df[is_female & is_senior & from_southampton & survived]
```

#### **2. Performance Optimization**
```python
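# ⚠️ With &, every mask is computed over ALL rows before they are combined,
# so reordering conditions inside a single df[...] does not reduce the work.
# When one condition is expensive (e.g. string matching), filter first with a
# cheap, selective mask, then evaluate the expensive one on the smaller subset.
# 'rare_category' is a placeholder column name.
subset = df[df['rare_category'] == 'specific_value']
result = subset[subset['name'].str.contains('Mrs', na=False)]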
```
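A sketch on hypothetical data: swapping the operands of `&` yields an identical mask, while staged filtering runs the more expensive string match only on the rows left after the cheap check and still ends up with the same rows.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "category": ["rare", "common", "common", "rare"],
    "name": ["Mrs. A", "Mrs. B", "Mr. C", "Mr. D"],
})

cheap = df["category"] == "rare"
expensive = df["name"].str.contains("Mrs", na=False)

# & does not short-circuit: both full masks exist before combining
print((cheap & expensive).equals(expensive & cheap))

# Staged: the string match runs only on the rows that passed the cheap check
subset = df[cheap]
staged = subset[subset["name"].str.contains("Mrs", na=False)]
print(staged)
```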

#### **3. Error Prevention**
```python
# ✅ Handle missing values explicitly
# (NaN > 30 is already False; notna() makes the intent visible to readers)
safe_filter = (
    df['age'].notna() & 
    (df['age'] > 30) & 
    df['name'].notna()
)
```
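Where missing values really bite is negation. Comparisons with `NaN` evaluate to `False`, so `~` flips them to `True` and missing rows slip into the "not" side. A minimal sketch on a hypothetical age Series:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([25.0, np.nan, 40.0])

over_30 = ages > 30                          # NaN compares as False: [False, False, True]
not_over_30 = ~over_30                       # the NaN row flips to True: [True, True, False]
safe_not_over_30 = ages.notna() & ~over_30   # exclude missing explicitly: [True, False, False]

print(pd.DataFrame({"age": ages, "naive": not_over_30, "safe": safe_not_over_30}))
```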

### 🔧 Advanced Patterns

#### **1. Dynamic Filtering**
```python
def filter_passengers(dataframe, **criteria):
    """Filter rows dynamically: a list/tuple value means isin(), anything else means ==."""
    mask = pd.Series(True, index=dataframe.index)  # start with every row selected
    
    for column, value in criteria.items():
        if isinstance(value, (list, tuple)):
            mask &= dataframe[column].isin(value)
        else:
            mask &= dataframe[column] == value
    
    return dataframe[mask]

# Usage
result = filter_passengers(df, sex='female', embarked=['S', 'C'], survived=1)
```
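An alternative sketch for combining an arbitrary list of masks: fold them with `functools.reduce` and `operator.and_`. The data and the `masks` list below are hypothetical:

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical passenger rows
df = pd.DataFrame({
    "sex": ["female", "female", "male", "female"],
    "embarked": ["S", "Q", "S", "C"],
    "survived": [1, 1, 1, 0],
})

# Any number of masks, e.g. assembled from user input
masks = [
    df["sex"] == "female",
    df["embarked"].isin(["S", "C"]),
    df["survived"] == 1,
]

# reduce applies & pairwise: (m1 & m2) & m3
combined = reduce(operator.and_, masks)
result = df[combined]
print(result)
```

With an empty list, `reduce` raises `TypeError`; passing an all-`True` starting value such as `pd.Series(True, index=df.index)` as its third argument avoids that, which is the same idea as the `mask` initialization in `filter_passengers`.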

#### **2. Conditional Aggregation**
```python
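# Filter with multiple criteria, then aggregate per embarkation port
# survived: count = passengers, sum = survivors, mean = survival rate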
summary = df[
    (df['sex'] == 'female') & 
    (df['age'] > 30)
].groupby('embarked').agg({
    'survived': ['count', 'sum', 'mean'],
    'age': ['mean', 'std']
})
```
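As a variant, named aggregation (available in `DataFrameGroupBy.agg` since pandas 0.25) produces flat, self-describing column names instead of the MultiIndex columns that a dict of lists creates. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical passenger rows
df = pd.DataFrame({
    "sex": ["female", "female", "female", "male"],
    "age": [35.0, 40.0, 50.0, 45.0],
    "embarked": ["S", "S", "C", "S"],
    "survived": [1, 0, 1, 0],
})

summary = (
    df[(df["sex"] == "female") & (df["age"] > 30)]
    .groupby("embarked")
    .agg(
        passengers=("survived", "count"),
        survivors=("survived", "sum"),
        survival_rate=("survived", "mean"),
        mean_age=("age", "mean"),
    )
)
print(summary)
```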

### 🎯 Key Takeaways

- ✅ **Always wrap each comparison in parentheses** when combining with logical operators (`&`, `|`, `~`)
- ✅ **Use `&`, `|`, `~` instead of `and`, `or`, `not`** for pandas masks
- ✅ **The variable-based approach** is the most readable for complex conditions
- ✅ **The `query()` method** reads well for long conditions and can reference Python variables with `@`
- ✅ **Condition order does not change the cost** of a single `&` mask; pre-filter with a cheap mask only when a later condition is expensive
- ✅ **Handle missing values**: comparisons with NaN return `False`, and `~` turns them into `True`
- ✅ **Use `isin()`** to match several values in one column

**Multiple-criteria filtering is the foundation for more advanced data analysis and business intelligence!** 🚀