# Sorting data by a specific column

## Import Modules

In [1]:
import pandas as pd

print("Pandas version:", pd.__version__)

Pandas version: 2.3.1


## Preparing the DataFrame

In [3]:
df = pd.read_csv('./../data/titanicfull.csv')
df.head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.3375,B5,S
1,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
2,1,0,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,S
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1,2,113781,151.55,C22 C26,S
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1,2,113781,151.55,C22 C26,S


## Sorting data by a specific column

In [4]:
df.sort_values(by='age').head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
763,3,1,"Dean, Miss. Elizabeth Gladys ""Millvina""",female,0.17,1,2,C.A. 2315,20.575,,S
747,3,0,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S
1240,3,1,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
427,2,1,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
658,3,1,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C


In [5]:
df.sort_values(by='age', ascending=False).head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
14,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
61,1,1,"Cavendish, Mrs. Tyrell William (Julia Florence...",female,76.0,1,0,19877,78.85,C46,S
1235,3,0,"Svensson, Mr. Johan",male,74.0,0,0,347060,7.775,,S
135,1,0,"Goldschmidt, Mr. George B",male,71.0,0,0,PC 17754,34.6542,A5,C
9,1,0,"Artagaveytia, Mr. Ramon",male,71.0,0,0,PC 17609,49.5042,,C


In [6]:
df.sort_values(by=['survived', 'age']).head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
747,3,0,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S
1111,3,0,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S
826,3,0,"Goodwin, Master. Sidney Leonard",male,1.0,5,2,CA 2144,46.9,,S
937,3,0,"Klasen, Miss. Gertrud Emilia",female,1.0,1,1,350405,12.1833,,S
1101,3,0,"Panula, Master. Eino Viljami",male,1.0,4,1,3101295,39.6875,,S


In [13]:
df.sort_values(by=['survived', 'age'], ascending=[False, True]).head()

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
763,3,1,"Dean, Miss. Elizabeth Gladys ""Millvina""",female,0.17,1,2,C.A. 2315,20.575,,S
1240,3,1,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
427,2,1,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
657,3,1,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
658,3,1,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C


## 📋 Summary: Sorting Data by a Specific Column

### 🎯 Key Concept

**`sort_values()`** is the pandas method for **sorting a DataFrame by the values in one or more columns**. Sorting is a basic step in data analysis that makes it easier to spot patterns, outliers, and trends.

### 🔧 Syntax and Parameters

```python
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, 
                     kind='quicksort', na_position='last', 
                     ignore_index=False, key=None)
```

| Parameter | Description | Default | Example |
|-----------|-------------|---------|---------|
| **by** | Column(s) to sort by (string or list) | Required | `'age'` or `['survived', 'age']` |
| **ascending** | Sort ascending (True/False, or a list with one entry per column) | True | `False` or `[False, True]` |
| **inplace** | Modify the original DataFrame and return `None` | False | `True` |
| **na_position** | Where NaN values go ('first'/'last') | 'last' | `'first'` |
| **ignore_index** | Relabel the result index as 0..n-1 | False | `True` |
| **kind** | Sorting algorithm (applied only when sorting by a single column) | 'quicksort' | 'mergesort', 'heapsort', 'stable' |
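
To make the `na_position` and `ignore_index` rows concrete, here is a minimal sketch on a small toy DataFrame. The `toy` frame and its values are made up for illustration and are not taken from the Titanic file.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not from titanicfull.csv)
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [30.0, np.nan, 2.0, 45.0],
})

# Defaults: ascending order, NaN rows last, original index labels kept
default_sorted = toy.sort_values(by="age")

# NaN rows first, and the index relabeled 0..n-1
nan_first = toy.sort_values(by="age", na_position="first", ignore_index=True)

print(default_sorted)
print(nan_first)
```

Because `inplace` defaults to `False`, both calls return new DataFrames and `toy` keeps its original row order.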

### 💡 Practical Examples from the Notebook

```python
# 1. Single column sorting - ascending (default)
df.sort_values(by='age').head()

# 2. Single column sorting - descending
df.sort_values(by='age', ascending=False).head()

# 3. Multiple column sorting - default ascending
df.sort_values(by=['survived', 'age']).head()

# 4. Multiple column sorting - mixed ascending/descending
df.sort_values(by=['survived', 'age'], ascending=[False, True]).head()
```

### 🔍 Sorting Techniques

#### **1. Single Column Sorting**
```python
# Numerical data
age_sorted = df.sort_values('age')
fare_sorted = df.sort_values('fare', ascending=False)

# Categorical data
sex_sorted = df.sort_values('sex')  # Female first (alphabetical)
class_sorted = df.sort_values('pclass')  # 1, 2, 3

# String data
name_sorted = df.sort_values('name')  # Alphabetical order
```

#### **2. Multiple Column Sorting (Hierarchical)**
```python
# Primary sort by survived, secondary by age
hierarchical_sort = df.sort_values(['survived', 'age'], ascending=[False, True])

# Complex hierarchy: class -> sex -> age
complex_sort = df.sort_values(
    ['pclass', 'sex', 'age'], 
    ascending=[True, True, False]
)

# Business logic: Priority sorting
priority_sort = df.sort_values(
    ['survived', 'pclass', 'fare'], 
    ascending=[False, True, False]
)
```

#### **3. Handling Missing Values**
```python
# NaN values last (default)
df.sort_values('age', na_position='last')

# NaN values first
df.sort_values('age', na_position='first')

# Drop NaN before sorting
df.dropna(subset=['age']).sort_values('age')
```

### 🚀 Advanced Sorting Techniques

#### **1. Custom Sorting with a Key Function**
```python
# Sort by absolute deviation from the mean age (closest to the mean first)
df.sort_values('age', key=lambda x: abs(x - x.mean()))

# Sort by string length
df.sort_values('name', key=lambda x: x.str.len())

# Sort by the month of a date column
# (the Titanic data has no date column, so this uses a small illustrative frame)
dates_df = pd.DataFrame({'date': pd.to_datetime(['2024-03-01', '2023-01-15', '2024-02-10'])})
dates_df.sort_values('date', key=lambda x: x.dt.month)

# Custom business logic
def priority_key(series):
    """Custom priority for passenger class"""
    priority_map = {1: 3, 2: 2, 3: 1}  # First class = highest priority
    return series.map(priority_map)

df.sort_values('pclass', key=priority_key, ascending=False)
```

#### **2. Conditional Sorting**
```python
# Survivors first (youngest to oldest), then non-survivors (oldest to youngest)
def conditional_sort(df):
    survivors = df[df['survived'] == 1].sort_values('age')
    non_survivors = df[df['survived'] == 0].sort_values('age', ascending=False)
    return pd.concat([survivors, non_survivors])

sorted_conditional = conditional_sort(df)
```

#### **3. Category-based Sorting**
```python
# Define custom categorical order
df['pclass_cat'] = pd.Categorical(
    df['pclass'], 
    categories=[1, 2, 3], 
    ordered=True
)

# Sort by the category order
df.sort_values('pclass_cat')

# Custom category order
embarked_order = ['S', 'C', 'Q']  # Custom priority
df['embarked_cat'] = pd.Categorical(
    df['embarked'], 
    categories=embarked_order, 
    ordered=True
)
df.sort_values('embarked_cat')
```

### 📊 Sorting Different Data Types

| Data Type | Example | Special Considerations |
|-----------|---------|----------------------|
| **Numeric** | `sort_values('age')` | Handle NaN, outliers |
| **String** | `sort_values('name')` | Case sensitivity, locale |
| **Date** | `sort_values('date')` | Timezone, format |
| **Boolean** | `sort_values('survived')` | False < True (here `survived` is stored as 0/1) |
| **Categorical** | Custom order via categories | Define a meaningful order |
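
The "case sensitivity" note for strings can be illustrated with a short sketch. The `names` frame below is a made-up example. By default strings are compared by code point, so uppercase letters sort before lowercase ones; passing a `key` that lowercases the values is one way to get a case-insensitive order.

```python
import pandas as pd

# Toy data (hypothetical) mixing capitalized and lowercase names
names = pd.DataFrame({"name": ["bob", "Alice", "carol", "Dave"]})

# Default comparison: uppercase letters come before lowercase ones
case_sensitive = names.sort_values("name")

# Case-insensitive order: compare lowercased values, keep the original text
case_insensitive = names.sort_values("name", key=lambda s: s.str.lower())

print(case_sensitive["name"].tolist())
print(case_insensitive["name"].tolist())
```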

### 🎯 Real-World Use Cases

#### **1. Business Analytics**
```python
# Customer analysis - High value customers first
customers_sorted = customers_df.sort_values(
    ['total_spent', 'recency', 'frequency'], 
    ascending=[False, True, False]
)

# Sales performance - Multi-criteria ranking
sales_ranking = sales_df.sort_values(
    ['region', 'revenue', 'profit_margin'], 
    ascending=[True, False, False]
)
```

#### **2. Financial Analysis**
```python
# Stock analysis - Performance ranking
stocks_ranked = stocks_df.sort_values(
    ['sector', 'market_cap', 'pe_ratio'], 
    ascending=[True, False, True]
)

# Portfolio optimization - Risk-return sorting
portfolio_sorted = portfolio_df.sort_values(
    ['risk_category', 'expected_return'], 
    ascending=[True, False]
)
```

#### **3. Educational Data**
```python
# Student ranking - Multiple criteria
student_ranking = students_df.sort_values(
    ['grade_level', 'gpa', 'test_score'], 
    ascending=[True, False, False]
)

# Course popularity
course_popularity = courses_df.sort_values(
    ['enrollment_count', 'rating', 'completion_rate'], 
    ascending=[False, False, False]
)
```

### ⚡ Performance Considerations

#### **1. Algorithm Choice**
```python
# `kind` is only applied when sorting by a single column or label
# quicksort (default) - usually fast, but not guaranteed to be stable
df.sort_values('age', kind='quicksort')

# mergesort - stable sort (preserves equal elements order)
df.sort_values('age', kind='mergesort')

# heapsort - guaranteed O(n log n) worst case
df.sort_values('age', kind='heapsort')
```
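
As a rough illustration of why stability matters, the sketch below sorts a made-up `toy` frame in two passes. Because the second pass uses a stable algorithm (`kind='stable'`), rows that tie on `pclass` keep the name order from the first pass, which matches a single two-column sort.

```python
import pandas as pd

# Toy data (hypothetical)
toy = pd.DataFrame({
    "pclass": [3, 1, 3, 1, 2],
    "name": ["e", "a", "d", "b", "c"],
})

# Pass 1: order by the secondary key
by_name = toy.sort_values("name")

# Pass 2: a stable sort on the primary key keeps the name order within ties
stable = by_name.sort_values("pclass", kind="stable")

# The same result from one multi-column call
direct = toy.sort_values(["pclass", "name"])

print(stable)
```

In practice the single multi-column call is simpler; the two-pass version is shown only to make the meaning of "stable" visible.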

#### **2. Memory Optimization**
```python
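# In-place sorting mutates df and returns None; pandas may still copy data
# internally, so do not rely on it to reduce memory use
df.sort_values('age', inplace=True)

# Relabel the index 0..n-1 for clean numbering
df.sort_values('age', ignore_index=True)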

# Sort only relevant columns
subset_sorted = df[['age', 'fare', 'survived']].sort_values('age')
```
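
One practical detail about `inplace=True` is that the call returns `None`, so its result should not be assigned or chained. A minimal sketch on a made-up `toy` frame, alongside the more common reassignment style:

```python
import pandas as pd

# Toy data (hypothetical)
toy = pd.DataFrame({"age": [30, 2, 45]})

# inplace=True reorders toy itself and returns None
returned = toy.sort_values("age", inplace=True)

# Reassignment style: build a new sorted frame with a fresh 0..n-1 index
rebound = pd.DataFrame({"age": [30, 2, 45]}).sort_values("age", ignore_index=True)

print(returned)          # None
print(toy.index.tolist())
print(rebound.index.tolist())
```

Whether in-place sorting uses less memory depends on pandas internals, so it is probably best treated as a style choice rather than an optimization.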

### 🔍 Common Patterns & Best Practices

#### **1. Data Exploration**
```python
# Quick data overview
print("Youngest passengers:")
print(df.sort_values('age').head())

print("\nOldest passengers:")
print(df.sort_values('age', ascending=False).head())

print("\nHighest fare passengers:")
print(df.sort_values('fare', ascending=False).head())
```

#### **2. Outlier Detection**
```python
# Identify outliers through sorting
extreme_ages = pd.concat([
    df.sort_values('age').head(5),        # Youngest
    df.sort_values('age', ascending=False).head(5)  # Oldest (NaN ages sort last, so tail() would return them)
])

extreme_fares = pd.concat([
    df.sort_values('fare', ascending=False).head(10),  # Most expensive
    df.sort_values('fare').head(10)                    # Cheapest
])
```
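
When only the top or bottom few rows are needed, `DataFrame.nlargest` and `DataFrame.nsmallest` are an alternative to a full sort followed by `head()`. The sketch below uses a made-up `toy` frame that includes a missing fare.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), with one missing fare
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "fare": [7.25, 512.33, np.nan, 26.0, 0.0],
})

# Two most expensive and two cheapest fares
top2 = toy.nlargest(2, "fare")
bottom2 = toy.nsmallest(2, "fare")

print(top2)
print(bottom2)
```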

#### **3. Ranking Implementation**
```python
# Create rankings
df_ranked = df.sort_values('fare', ascending=False).reset_index(drop=True)
df_ranked['fare_rank'] = range(1, len(df_ranked) + 1)

# Percentile rankings
df['age_percentile'] = df['age'].rank(pct=True)
df_with_percentiles = df.sort_values('age_percentile', ascending=False)
```
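
Numbering rows with `range()` after a sort gives tied values different ranks. If ties should share a rank, `Series.rank` is one option. A small sketch on made-up data:

```python
import pandas as pd

# Toy data (hypothetical) with a tie at 50.0
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "fare": [50.0, 80.0, 50.0, 10.0],
})

# method="min" gives tied fares the same (lowest) rank; highest fare is rank 1
toy["fare_rank"] = toy["fare"].rank(method="min", ascending=False).astype(int)

print(toy.sort_values("fare_rank"))
```

Here the two 50.0 fares both receive rank 2 and the next fare receives rank 4.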

### 🛠️ Advanced Patterns

#### **1. Multi-step Sorting**
```python
def multi_step_sort(df):
    """Complex business logic sorting"""
    # Step 1: Survivors first
    survivors = df[df['survived'] == 1]
    non_survivors = df[df['survived'] == 0]
    
    # Step 2: Within each group, sort by class then age
    survivors_sorted = survivors.sort_values(['pclass', 'age'])
    non_survivors_sorted = non_survivors.sort_values(['pclass', 'age'])
    
    # Step 3: Combine
    return pd.concat([survivors_sorted, non_survivors_sorted])

final_sorted = multi_step_sort(df)
```
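
For this particular split-and-concatenate logic, a single multi-column `sort_values` call appears to give the same row order. The sketch below checks that on a small made-up `toy` frame; it is not a proof for every dataset.

```python
import pandas as pd

# Toy data (hypothetical)
toy = pd.DataFrame({
    "survived": [0, 1, 0, 1, 1],
    "pclass":   [3, 2, 1, 1, 2],
    "age":      [22.0, 35.0, 54.0, 4.0, 28.0],
})

def multi_step_sort(df):
    """Survivors first; within each group sort by class, then age."""
    survivors = df[df["survived"] == 1].sort_values(["pclass", "age"])
    non_survivors = df[df["survived"] == 0].sort_values(["pclass", "age"])
    return pd.concat([survivors, non_survivors])

# One call: survived descending, then pclass and age ascending
one_call = toy.sort_values(
    ["survived", "pclass", "age"], ascending=[False, True, True]
)

print(multi_step_sort(toy).equals(one_call))
```

The multi-step form is still useful when each group needs a genuinely different rule, for example different sort columns per group.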

#### **2. Dynamic Sorting**
```python
def dynamic_sort(df, sort_criteria):
    """Dynamic sorting berdasarkan user input"""
    columns = sort_criteria['columns']
    orders = sort_criteria.get('ascending', [True] * len(columns))
    
    return df.sort_values(columns, ascending=orders)

# Usage
criteria = {
    'columns': ['survived', 'pclass', 'age'],
    'ascending': [False, True, True]
}
result = dynamic_sort(df, criteria)
```

#### **3. Sorting with Grouping**
```python
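# Sort within groups
def sort_within_groups(df, group_col, sort_col, ascending=True):
    """Sort values within each group (groups appear in ascending key order)"""
    # A two-key sort orders rows by group, then by sort_col inside each group.
    # It avoids groupby().apply(), which emits a DeprecationWarning about
    # grouping columns on recent pandas (2.2+).
    return df.sort_values(
        [group_col, sort_col], ascending=[True, ascending], ignore_index=True
    )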

# Sort ages within each class
class_age_sorted = sort_within_groups(df, 'pclass', 'age', ascending=True)
```
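
A related pattern is "top N per group": sort once, then take the first rows of each group with `groupby(...).head(n)`. The sketch below uses a made-up `toy` frame to pick the two oldest passengers per class.

```python
import pandas as pd

# Toy data (hypothetical)
toy = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 3],
    "age": [40.0, 20.0, 60.0, 30.0, 35.0, 5.0],
})

# Sort by class, oldest first within each class, then keep two rows per class
oldest_two = (
    toy.sort_values(["pclass", "age"], ascending=[True, False])
       .groupby("pclass")
       .head(2)
)

print(oldest_two)
```

`head(2)` keeps the rows in their sorted order, so a class with fewer than two passengers simply contributes what it has.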

### 🎯 Key Takeaways

- ✅ **`sort_values()`** is the core method for ordering and exploring data
- ✅ **Multiple-column sorting** takes a list of columns for hierarchical ordering
- ✅ **Mixed ascending/descending** order takes a list of booleans, one per column
- ✅ **Missing values** are placed with the `na_position` parameter
- ✅ **Custom sorting logic** goes through the `key` parameter
- ✅ **In-place sorting** mutates the DataFrame and returns `None`; it is not a reliable way to save memory
- ✅ **Sorting combines with grouping** for per-group ordering
- ✅ **The `kind` parameter** selects the algorithm (single-column sorts only); choose a stable one when tie order matters

**Sorting is a gateway to understanding your data; mastering it makes data exploration far more effective!** 🚀