# Sorting and Ranking in Pandas

## Overview

**Sorting** = Arranging data in a specific order (ascending/descending)  
**Ranking** = Assigning numerical positions to values

### Why Sorting & Ranking?

📊 **Common Use Cases:**
- Top 10 products by sales
- Bottom performers identification
- Leaderboard creation
- Percentile calculations
- Competition rankings
- Grade assignments

### Key Methods

| Method | Purpose | Returns |
|--------|---------|----------|
| `sort_values()` | Sort by column values | Sorted DataFrame/Series |
| `sort_index()` | Sort by index | Sorted DataFrame/Series |
| `rank()` | Assign ranks | Ranks as numbers |
| `nlargest()` | Get top N rows | Top N rows |
| `nsmallest()` | Get bottom N rows | Bottom N rows |
| `argsort()` | Get sorted indices | Integer positions |
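
`argsort()` is listed in the table but does not appear in the worked examples in this section, so here is a minimal sketch on a made-up Series (the labels and values are illustrative assumptions, not part of the student dataset). It returns the integer positions that would sort the values, which can then be passed to `iloc`.

```python
import pandas as pd

# Hypothetical scores, used only for illustration
scores = pd.Series([88, 95, 70], index=['a', 'b', 'c'])

# argsort() returns the integer positions that would sort the values ascending
order = scores.argsort()

# Reorder the Series by position; with no ties this matches scores.sort_values()
sorted_scores = scores.iloc[order.to_numpy()]
print(sorted_scores)
```
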

### What We'll Learn
1. ✅ Basic sorting (single & multiple columns)
2. ✅ Index sorting
3. ✅ Ranking methods (average, min, max, dense, first)
4. ✅ Ranking within groups
5. ✅ Percentiles and quantiles
6. ✅ Real-world applications

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)

print("✅ Libraries imported")
print(f"Pandas version: {pd.__version__}")

## Sample Dataset: Student Performance

We'll use a student exam scores dataset to demonstrate sorting and ranking.

In [None]:
# Create sample student data
np.random.seed(42)

students = pd.DataFrame({
    'student_id': range(1, 21),
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 
             'Frank', 'Grace', 'Henry', 'Ivy', 'Jack',
             'Kate', 'Leo', 'Mia', 'Noah', 'Olivia',
             'Peter', 'Quinn', 'Ryan', 'Sophia', 'Tom'],
    'class': np.random.choice(['A', 'B', 'C'], 20),
    'math_score': np.random.randint(60, 100, 20),
    'science_score': np.random.randint(55, 100, 20),
    'english_score': np.random.randint(65, 100, 20),
    'attendance_%': np.random.randint(70, 100, 20)
})

# Calculate total and average
students['total_score'] = (students['math_score'] + 
                           students['science_score'] + 
                           students['english_score'])
students['avg_score'] = students['total_score'] / 3

print("Student Performance Data:")
print(students)
print(f"\nShape: {students.shape}")
print(f"Columns: {students.columns.tolist()}")

## 1. Sorting by Values (sort_values)

### Syntax

```python
df.sort_values(by='column', ascending=True, inplace=False)
```

### Parameters

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `by` | Required | Column name(s) | Column(s) to sort by |
| `ascending` | True | True/False | Sort order |
| `inplace` | False | True/False | Modify original |
| `na_position` | 'last' | 'first'/'last' | Where to put NaN |
| `kind` | 'quicksort' | 'quicksort'/'mergesort'/'heapsort'/'stable' | Algorithm (only applied when sorting by a single column) |
| `ignore_index` | False | True/False | Reset index after sort |

### Key Points
- `ascending=True` → Low to High (A to Z)
- `ascending=False` → High to Low (Z to A)
- Can sort by multiple columns
- Original DataFrame unchanged (unless `inplace=True`)
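
A minimal sketch of the key points above on a tiny made-up frame (the names and scores are illustrative assumptions): the call returns a new object, and the original keeps its order unless the result is assigned back.

```python
import pandas as pd

# Small illustrative frame; values are made up
df = pd.DataFrame({'name': ['Ann', 'Ben', 'Cal'], 'score': [70, 90, 80]})

# Returns a new, sorted DataFrame; df itself is left untouched
by_score = df.sort_values('score', ascending=False)

# The original row labels travel with the rows unless ignore_index=True
by_score_reset = df.sort_values('score', ascending=False, ignore_index=True)

print(df['name'].tolist())             # original order is preserved
print(by_score.index.tolist())         # original labels, reordered
print(by_score_reset.index.tolist())   # fresh 0..n-1 labels
```
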

In [None]:
print("=== BASIC SORTING ===\n")

# Example 1: Sort by single column (ascending)
print("Example 1: Sort by math score (ascending)")
sorted_math_asc = students.sort_values('math_score')
print(sorted_math_asc[['name', 'math_score']].head())
print()

# Example 2: Sort by single column (descending)
print("Example 2: Sort by math score (descending)")
sorted_math_desc = students.sort_values('math_score', ascending=False)
print(sorted_math_desc[['name', 'math_score']].head())
print()

# Example 3: Sort by total score to find top performers
print("Example 3: Top 5 students by total score")
top_students = students.sort_values('total_score', ascending=False).head(5)
print(top_students[['name', 'math_score', 'science_score', 'english_score', 'total_score']])
print()

# Example 4: Sort by average score
print("Example 4: Students sorted by average score")
sorted_avg = students.sort_values('avg_score', ascending=False)
print(sorted_avg[['name', 'avg_score']].head(10))
print()

# Example 5: Sort alphabetically by name
print("Example 5: Sort alphabetically by name")
sorted_name = students.sort_values('name')
print(sorted_name[['name', 'total_score']].head(10))

## 2. Sorting by Multiple Columns

### Syntax

```python
# Sort by multiple columns
df.sort_values(by=['col1', 'col2'], ascending=[True, False])
```

### How It Works

1. **Primary Sort**: First column
2. **Secondary Sort**: Second column (within ties of first)
3. **Tertiary Sort**: Third column (and so on...)

### Example Logic

Sort by `class` (A→Z), then by `total_score` (high→low):
```
Class A: 280, 275, 260
Class B: 290, 265, 250
Class C: 285, 270, 255
```

### Use Cases
- Sort by department, then by salary
- Sort by date, then by amount
- Sort by category, then by priority
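
The primary/secondary logic described above can be checked on a small frame. This is a sketch with made-up rows that mirror the `class` / `total_score` layout; the values are illustrative only.

```python
import pandas as pd

# Illustrative data mirroring the class/total_score layout used in this section
df = pd.DataFrame({
    'class': ['B', 'A', 'C', 'A', 'B', 'C'],
    'total_score': [265, 275, 285, 280, 290, 270],
})

# Primary key: class ascending; secondary key: total_score descending within each class
result = df.sort_values(by=['class', 'total_score'], ascending=[True, False])
print(result)
```

The `ascending` list is matched to `by` position by position, so the two lists need the same length.
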

In [None]:
print("=== SORTING BY MULTIPLE COLUMNS ===\n")

# Example 1: Sort by class, then by total score
print("Example 1: Sort by class (A→Z), then total score (high→low)")
sorted_multi = students.sort_values(
    by=['class', 'total_score'],
    ascending=[True, False]
)
print(sorted_multi[['name', 'class', 'total_score']].head(15))
print()

# Example 2: Sort by attendance, then by average score
print("Example 2: Sort by attendance (high→low), then avg_score (high→low)")
sorted_attendance = students.sort_values(
    by=['attendance_%', 'avg_score'],
    ascending=[False, False]
)
print(sorted_attendance[['name', 'attendance_%', 'avg_score']].head(10))
print()

# Example 3: Three-level sorting
print("Example 3: Sort by class, math score, then name")
sorted_three = students.sort_values(
    by=['class', 'math_score', 'name'],
    ascending=[True, False, True]
)
print(sorted_three[['name', 'class', 'math_score']].head(15))
print()

# Example 4: Reset index after sorting
print("Example 4: Sort and reset index")
sorted_reset = students.sort_values('total_score', ascending=False, ignore_index=True)
print("Index after sorting with ignore_index=True:")
print(sorted_reset[['name', 'total_score']].head())
print("Note: Index is now 0, 1, 2, 3, 4... (reset)")

## 3. Sorting by Index (sort_index)

### When to Use
- After operations that shuffle data
- When index has meaning (dates, IDs)
- Before merging on index
- To restore original order

### Syntax

```python
df.sort_index(ascending=True, axis=0)
```

### Parameters

| Parameter | Options | Description |
|-----------|---------|-------------|
| `axis` | 0 (rows), 1 (columns) | Which axis to sort |
| `ascending` | True/False | Sort order |
| `inplace` | True/False | Modify original |
| `kind` | 'quicksort', 'mergesort', etc. | Algorithm |

### Index vs Values
- **`sort_index()`**: Sorts by row/column labels
- **`sort_values()`**: Sorts by data values
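
To make the distinction concrete, here is a minimal sketch on a made-up Series whose labels and values sort differently (both are illustrative assumptions).

```python
import pandas as pd

# Illustrative Series with string labels; values are made up
s = pd.Series([30, 10, 20], index=['b', 'c', 'a'])

by_label = s.sort_index()    # orders by the labels: a, b, c
by_value = s.sort_values()   # orders by the data: 10, 20, 30

print(by_label)
print(by_value)
```
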

In [None]:
print("=== SORTING BY INDEX ===\n")

# Example 1: Shuffle then sort by index
print("Example 1: Restore original order after shuffling")
shuffled = students.sample(frac=1, random_state=42)
print("After shuffling (first 5):")
print(shuffled[['name', 'total_score']].head())
print(f"Index: {shuffled.index[:5].tolist()}")
print()

restored = shuffled.sort_index()
print("After sort_index() (first 5):")
print(restored[['name', 'total_score']].head())
print(f"Index: {restored.index[:5].tolist()}")
print()

# Example 2: Sort index descending
print("Example 2: Sort index in descending order")
sorted_desc = students.sort_index(ascending=False)
print(sorted_desc[['name', 'total_score']].head())
print()

# Example 3: Set custom index and sort
print("Example 3: Sort by custom index (student names)")
students_named_idx = students.set_index('name')
sorted_by_name = students_named_idx.sort_index()
print(sorted_by_name[['class', 'total_score']].head())
print()

# Example 4: Sort columns by name
print("Example 4: Sort columns alphabetically")
print("Original columns:", students.columns.tolist())
sorted_cols = students.sort_index(axis=1)
print("Sorted columns:", sorted_cols.columns.tolist())
print(sorted_cols.head(3))

## 4. Ranking (rank method)

### What is Ranking?

**Ranking** assigns a numerical position to each value in a dataset.

### Syntax

```python
df['rank'] = df['column'].rank(method='average', ascending=True)
```

### Ranking Methods (Handling Ties)

**Example scores:** [95, 90, 90, 85, 80], ranked with the default `ascending=True` (lowest score gets rank 1)

| Method | Ranks | Description | Use Case |
|--------|-------|-------------|----------|
| **average** | [5, 3.5, 3.5, 2, 1] | Average of tied ranks | Statistical analysis |
| **min** | [5, 3, 3, 2, 1] | Lowest rank in tie | Conservative ranking |
| **max** | [5, 4, 4, 2, 1] | Highest rank in tie | Generous ranking |
| **dense** | [4, 3, 3, 2, 1] | Like min, but no gaps in ranks | Leaderboards |
| **first** | [5, 3, 4, 2, 1] | Rank by order | Order matters |
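
A compact way to compare the five tie-handling methods side by side is to build one column per method. This is a sketch using the example scores and the default `ascending=True`; the `comparison` name is just illustrative.

```python
import pandas as pd

scores = pd.Series([95, 90, 90, 85, 80])

# One column per tie-handling method, using the default ascending=True
methods = ['average', 'min', 'max', 'dense', 'first']
comparison = pd.DataFrame({m: scores.rank(method=m) for m in methods})
comparison.insert(0, 'score', scores)
print(comparison)
```
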

### Visual Example

```
Scores:   100    95    95    90    85

average:   5    3.5   3.5    2     1
min:       5     3     3     2     1
max:       5     4     4     2     1
dense:     4     3     3     2     1   ← No gaps!
first:     5     3     4     2     1   ← Order matters
```

### Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `method` | 'average' | How to handle ties |
| `ascending` | True | Rank order (True: lowest=1) |
| `na_option` | 'keep' | 'keep', 'top', 'bottom' |
| `pct` | False | Return percentile ranks |
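
The `na_option` and `pct` parameters are easiest to see on a few values. A minimal sketch follows; the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values with one missing entry
s = pd.Series([10, 20, np.nan, 20])

kept = s.rank()                       # na_option='keep' (default): NaN stays NaN
bottom = s.rank(na_option='bottom')   # NaN is ranked after every real value

# pct=True rescales ranks to fractions of the total, in the range (0, 1]
pct = pd.Series([10, 20, 20, 40]).rank(pct=True)

print(kept.tolist())
print(bottom.tolist())
print(pct.tolist())
```
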

In [None]:
print("=== RANKING METHODS ===\n")

# Create simple example with ties
simple_df = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'score': [95, 90, 90, 85, 80]
})

print("Sample data with ties (Bob and Charlie both scored 90):")
print(simple_df)
print()

# Demonstrate all ranking methods
print("Different ranking methods for handling ties:\n")

# Method 1: average (default)
simple_df['rank_average'] = simple_df['score'].rank(method='average', ascending=False)
print("1. AVERAGE method (average of tied ranks):")
print(simple_df[['student', 'score', 'rank_average']])
print("   Bob & Charlie: (2+3)/2 = 2.5")
print()

# Method 2: min
simple_df['rank_min'] = simple_df['score'].rank(method='min', ascending=False)
print("2. MIN method (minimum rank in tie):")
print(simple_df[['student', 'score', 'rank_min']])
print("   Bob & Charlie: both get 2")
print()

# Method 3: max
simple_df['rank_max'] = simple_df['score'].rank(method='max', ascending=False)
print("3. MAX method (maximum rank in tie):")
print(simple_df[['student', 'score', 'rank_max']])
print("   Bob & Charlie: both get 3")
print()

# Method 4: dense
simple_df['rank_dense'] = simple_df['score'].rank(method='dense', ascending=False)
print("4. DENSE method (no gaps in ranking):")
print(simple_df[['student', 'score', 'rank_dense']])
print("   Bob & Charlie: both get 2, next rank is 3 (not 4)")
print()

# Method 5: first
simple_df['rank_first'] = simple_df['score'].rank(method='first', ascending=False)
print("5. FIRST method (order in data matters):")
print(simple_df[['student', 'score', 'rank_first']])
print("   Bob gets 2 (appears first), Charlie gets 3")
print()

# Summary comparison
print("="*60)
print("COMPARISON OF ALL METHODS:")
print("="*60)
print(simple_df)

## 5. Practical Ranking Applications

### Real-World Use Cases

**1. Class Rankings**
- Rank students by performance
- Identify top/bottom performers
- Grade assignment

**2. Sales Leaderboards**
- Rank salespeople by revenue
- Monthly/quarterly rankings
- Performance bonuses

**3. Percentiles**
- "You scored better than 75% of students"
- Standardized test scores
- Performance benchmarking

**4. Top N Selection**
- Top 10 products
- Bottom 5 performers
- Qualification cutoffs

### Ascending vs Descending

```python
# ascending=True: Lowest value gets rank 1
# [10, 20, 30] → [1, 2, 3]
df['rank_asc'] = df['value'].rank(ascending=True)

# ascending=False: Highest value gets rank 1
# [10, 20, 30] → [3, 2, 1]
df['rank_desc'] = df['value'].rank(ascending=False)
```

In [None]:
print("=== PRACTICAL RANKING APPLICATIONS ===\n")

# Application 1: Overall class rankings
print("Application 1: Overall Class Rankings")
print("-" * 60)
students['overall_rank'] = students['total_score'].rank(method='min', ascending=False)
leaderboard = students.sort_values('overall_rank')[['name', 'total_score', 'overall_rank']].head(10)
print(leaderboard)
print()

# Application 2: Subject-wise rankings
print("Application 2: Subject-wise Rankings (Math)")
print("-" * 60)
students['math_rank'] = students['math_score'].rank(method='min', ascending=False)
math_toppers = students.sort_values('math_rank')[['name', 'math_score', 'math_rank']].head(5)
print(math_toppers)
print()

# Application 3: Percentile ranking
print("Application 3: Percentile Rankings")
print("-" * 60)
students['percentile'] = students['total_score'].rank(pct=True) * 100
print("Students with their percentile scores:")
print(students[['name', 'total_score', 'percentile']].sort_values('percentile', ascending=False).head(10))
print("\nInterpretation: a percentile of 95 means the student scored at or above roughly 95% of students")
print()

# Application 4: Grade assignment based on ranks
print("Application 4: Grade Assignment (Top 20% = A, Next 20% = B, etc.)")
print("-" * 60)

def assign_grade(percentile):
    if percentile >= 80:
        return 'A'
    elif percentile >= 60:
        return 'B'
    elif percentile >= 40:
        return 'C'
    elif percentile >= 20:
        return 'D'
    else:
        return 'F'

students['grade'] = students['percentile'].apply(assign_grade)
print(students[['name', 'total_score', 'percentile', 'grade']].sort_values('total_score', ascending=False))
print()
print("Grade distribution:")
print(students['grade'].value_counts().sort_index())
print()

# Application 5: Identify top and bottom performers
print("Application 5: Top 3 and Bottom 3 Students")
print("-" * 60)
print("Top 3:")
print(students.nlargest(3, 'total_score')[['name', 'total_score', 'avg_score']])
print("\nBottom 3:")
print(students.nsmallest(3, 'total_score')[['name', 'total_score', 'avg_score']])

## 6. Ranking Within Groups

### Why Group-wise Ranking?

Sometimes you need rankings **within categories**, not globally.

### Examples
- Top student in **each class**
- Best salesperson in **each region**
- Highest revenue product in **each category**

### Syntax

```python
df['rank_in_group'] = df.groupby('category')['value'].rank(
    method='dense',
    ascending=False
)
```

### How It Works

```
Class A: [95, 90, 85] → Ranks [1, 2, 3]
Class B: [92, 88, 80] → Ranks [1, 2, 3]
Class C: [98, 85, 75] → Ranks [1, 2, 3]
```

Each group gets its own ranking system!

### Use Cases
- Compare performance within teams
- Department-wise rankings
- Category-wise top products
- Regional leaderboards
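
For "top N per group" specifically, an alternative to ranking and filtering is to sort once and take the first rows of each group. This is a sketch on made-up data (names, classes, and scores are illustrative assumptions).

```python
import pandas as pd

# Illustrative data; class labels and scores are made up
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cal', 'Dee', 'Eli', 'Fay'],
    'class': ['A', 'A', 'A', 'B', 'B', 'B'],
    'score': [85, 95, 90, 80, 92, 88],
})

# Sort once, then keep the first two rows of each class
top2 = (
    df.sort_values(['class', 'score'], ascending=[True, False])
      .groupby('class')
      .head(2)
)
print(top2)
```

Unlike a filter on a dense rank, `head(2)` returns at most two rows per group even when scores tie, so the better choice depends on how ties should be treated.
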

In [None]:
print("=== RANKING WITHIN GROUPS ===\n")

# Example 1: Rank students within each class
print("Example 1: Rank students within their class")
print("-" * 60)
students['rank_in_class'] = students.groupby('class')['total_score'].rank(
    method='dense',
    ascending=False
)

class_rankings = students.sort_values(['class', 'rank_in_class'])
print(class_rankings[['name', 'class', 'total_score', 'rank_in_class']].head(15))
print()

# Example 2: Top student in each class
print("Example 2: Top student in each class")
print("-" * 60)
top_per_class = students[students['rank_in_class'] == 1].sort_values('class')
print(top_per_class[['name', 'class', 'total_score', 'rank_in_class']])
print()

# Example 3: Top 2 students in each class
print("Example 3: Top 2 students in each class")
print("-" * 60)
top2_per_class = students[students['rank_in_class'] <= 2].sort_values(['class', 'rank_in_class'])
print(top2_per_class[['name', 'class', 'total_score', 'rank_in_class']])
print()

# Example 4: Percentile within class
print("Example 4: Percentile ranking within class")
print("-" * 60)
students['percentile_in_class'] = students.groupby('class')['total_score'].rank(pct=True) * 100
print(students[['name', 'class', 'total_score', 'percentile_in_class']].sort_values(['class', 'percentile_in_class'], ascending=[True, False]).head(15))
print()

# Example 5: Compare global vs class rank
print("Example 5: Global Rank vs Class Rank Comparison")
print("-" * 60)
comparison = students[['name', 'class', 'total_score', 'overall_rank', 'rank_in_class']].sort_values('overall_rank')
print(comparison.head(10))
print("\nNote: A student can be #1 in their class but not #1 overall!")

## 7. Quick Top/Bottom Selection: nlargest() & nsmallest()

### Why Use These?

**Often faster than a full sort** when you only need the top/bottom N rows and N is small relative to the dataset.

### Comparison

```python
# ❌ Slower: Sort entire DataFrame, then take top 5
df.sort_values('score', ascending=False).head(5)

# ✅ Faster: Directly get top 5
df.nlargest(5, 'score')
```

### Syntax

```python
# Get top N rows
df.nlargest(n, columns)

# Get bottom N rows
df.nsmallest(n, columns)
```

### Multiple Columns

```python
# Top 5 by math score, then by science score
df.nlargest(5, ['math_score', 'science_score'])
```

### When to Use
- ✅ Need only top/bottom N (e.g., top 10)
- ✅ Large dataset, small N
- ✅ Quick exploratory analysis
- ❌ Need full sorted dataset
- ❌ Need middle values
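
One detail worth knowing when N cuts through a tie is the `keep` parameter of `nlargest()` / `nsmallest()`. A minimal sketch on a made-up frame shows the difference between the default and `keep='all'`.

```python
import pandas as pd

# Illustrative frame: Bob and Charlie tie at the cutoff
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [95, 90, 90, 85],
})

top2_first = df.nlargest(2, 'score')             # keep='first' (default): exactly 2 rows
top2_all = df.nlargest(2, 'score', keep='all')   # keeps every row tied at the cutoff

print(top2_first)
print(top2_all)
```
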

In [None]:
print("=== NLARGEST & NSMALLEST ===\n")

# Example 1: Top 5 by total score
print("Example 1: Top 5 students by total score")
print("-" * 60)
top5 = students.nlargest(5, 'total_score')
print(top5[['name', 'total_score', 'avg_score']])
print()

# Example 2: Bottom 5 by total score
print("Example 2: Bottom 5 students by total score")
print("-" * 60)
bottom5 = students.nsmallest(5, 'total_score')
print(bottom5[['name', 'total_score', 'avg_score']])
print()

# Example 3: Multiple columns (tiebreaker)
print("Example 3: Top 5 by math score, tiebreak by science score")
print("-" * 60)
top5_multi = students.nlargest(5, ['math_score', 'science_score'])
print(top5_multi[['name', 'math_score', 'science_score', 'total_score']])
print()

# Example 4: Top performers with high attendance
print("Example 4: Top 5 students with highest attendance")
print("-" * 60)
top_attendance = students.nlargest(5, 'attendance_%')
print(top_attendance[['name', 'attendance_%', 'total_score']])
print()

# Example 5: Series nlargest
print("Example 5: Top 5 math scores (Series)")
print("-" * 60)
top_math_series = students['math_score'].nlargest(5)
print(top_math_series)
print()

# Example 6: Performance comparison
print("Example 6: Performance Comparison (Large Dataset)")
print("-" * 60)
import time

# Create large dataset
large_df = pd.DataFrame({
    'value': np.random.randint(1, 1000, 10000)
})

# Method 1: sort_values + head
start = time.perf_counter()
result1 = large_df.sort_values('value', ascending=False).head(10)
time1 = time.perf_counter() - start

# Method 2: nlargest
start = time.perf_counter()
result2 = large_df.nlargest(10, 'value')
time2 = time.perf_counter() - start

print(f"sort_values().head(10): {time1:.6f} seconds")
print(f"nlargest(10):           {time2:.6f} seconds")
# A single run is noisy, so report the ratio without assuming which method wins
print(f"\nTime ratio (sort_values / nlargest): {time1/time2:.2f}x")
print("\nRecommendation: Prefer nlargest/nsmallest for top/bottom N selection, and benchmark on your own data")

## 8. Advanced Sorting Techniques

### Custom Sort with Key Function

```python
# Sort by custom logic
df['temp_col'] = df['col'].apply(custom_function)
df = df.sort_values('temp_col').drop('temp_col', axis=1)
```
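
pandas 1.1 and later also accept a `key` argument on `sort_values()`, which avoids the temporary column. A minimal sketch, assuming that version or newer; the frame and its values are made up. The key function receives the whole column as a Series and should return a Series of the same length.

```python
import pandas as pd

# Illustrative frame; names and values are made up
df = pd.DataFrame({'name': ['bob', 'alice', 'Charlie'], 'value': [-3, 1, 2]})

# Case-insensitive sort: the key is applied to the whole column at once
by_name = df.sort_values('name', key=lambda col: col.str.lower())

# Sort by magnitude without creating a helper column
by_abs = df.sort_values('value', key=lambda col: col.abs())

print(by_name)
print(by_abs)
```
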

### Sort by Absolute Values

```python
df['abs_value'] = df['value'].abs()
df.sort_values('abs_value')
```

### Sort with NA Handling

```python
# Put NaN at the end
df.sort_values('col', na_position='last')

# Put NaN at the beginning
df.sort_values('col', na_position='first')
```

### Sort by Index Level (MultiIndex)

```python
df.sort_index(level=0)  # Sort by first index level
df.sort_index(level=[0, 1])  # Sort by multiple levels
```
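
A small sketch of level-based sorting follows. The two-level index (`class`, `student_id`) and the scores are illustrative assumptions, not part of the student dataset.

```python
import pandas as pd

# Illustrative two-level index: (class, student_id)
idx = pd.MultiIndex.from_tuples(
    [('B', 2), ('A', 2), ('B', 1), ('A', 1)],
    names=['class', 'student_id'],
)
df = pd.DataFrame({'score': [70, 80, 90, 60]}, index=idx)

# Sort by the outer level first, then the inner level
by_both = df.sort_index(level=[0, 1])

# Sort by the inner level only; sort_remaining=False skips sorting by the other level
by_inner = df.sort_index(level=1, sort_remaining=False)

print(by_both)
print(by_inner)
```
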

### Stable Sorting

```python
# Preserves original order for equal values
df.sort_values('col', kind='mergesort')  # Stable
df.sort_values('col', kind='quicksort')  # Unstable (faster)
```
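
What "stable" means is easiest to see with a tie. In this sketch (made-up names and scores), Bob and Cara share a score and Bob comes first in the data, so a stable sort keeps Bob ahead of Cara.

```python
import pandas as pd

# Illustrative frame: Bob and Cara tie on score, and Bob appears first
df = pd.DataFrame({
    'name': ['Dan', 'Bob', 'Cara', 'Abe'],
    'score': [90, 80, 80, 70],
})

# A stable algorithm keeps tied rows in their original relative order
stable = df.sort_values('score', kind='mergesort')
print(stable)
```
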

In [None]:
print("=== ADVANCED SORTING TECHNIQUES ===\n")

# Example 1: Sort by absolute difference from average
print("Example 1: Sort by distance from class average")
print("-" * 60)
class_avg = students['total_score'].mean()
students['diff_from_avg'] = abs(students['total_score'] - class_avg)
closest_to_avg = students.sort_values('diff_from_avg').head(5)
print(f"Class average: {class_avg:.2f}")
print(closest_to_avg[['name', 'total_score', 'diff_from_avg']])
print()

# Example 2: Sort by score improvement potential
print("Example 2: Students with most improvement potential")
print("-" * 60)
students['max_potential'] = 300 - students['total_score']  # Max possible = 300
needs_improvement = students.sort_values('max_potential', ascending=False).head(5)
print(needs_improvement[['name', 'total_score', 'max_potential']])
print()

# Example 3: Custom sort - prioritize by attendance, then score
print("Example 3: Reward system - High attendance + Good score")
print("-" * 60)
students['reward_score'] = (students['attendance_%'] * 0.3 + 
                            students['avg_score'] * 0.7)
reward_list = students.sort_values('reward_score', ascending=False).head(10)
print(reward_list[['name', 'attendance_%', 'avg_score', 'reward_score']])
print()

# Example 4: Sort with NaN handling
print("Example 4: Handling missing values in sorting")
print("-" * 60)
# Introduce some NaN values
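students_with_nan = students.copy()
# Cast to float first so the column can hold NaN without an implicit upcast
students_with_nan['math_score'] = students_with_nan['math_score'].astype(float)
students_with_nan.loc[0:2, 'math_score'] = np.nan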

print("Sort with NaN last:")
sorted_nan_last = students_with_nan.sort_values('math_score', na_position='last')
print(sorted_nan_last[['name', 'math_score']].tail())
print()

print("Sort with NaN first:")
sorted_nan_first = students_with_nan.sort_values('math_score', na_position='first')
print(sorted_nan_first[['name', 'math_score']].head())
print()

# Example 5: Sort by categorical order
print("Example 5: Custom categorical order (Grade levels)")
print("-" * 60)
students_copy = students.copy()
students_copy['class'] = pd.Categorical(
    students_copy['class'],
    categories=['A', 'B', 'C'],
    ordered=True
)
sorted_cat = students_copy.sort_values(['class', 'total_score'], ascending=[True, False])
print(sorted_cat[['name', 'class', 'total_score']].head(15))

## 9. Comprehensive Real-World Example

### Business Scenario: Academic Performance Analysis

**Task**: Create a complete student performance report with:
1. Overall rankings
2. Class-wise rankings
3. Subject toppers
4. Grade assignments
5. Improvement tracking
6. Award recommendations

We'll combine all sorting and ranking techniques learned.

In [None]:
print("="*70)
print("COMPREHENSIVE STUDENT PERFORMANCE REPORT")
print("="*70)
print()

# Section 1: Overall Rankings
print("1. OVERALL SCHOOL RANKINGS")
print("-" * 70)
students['overall_rank'] = students['total_score'].rank(method='min', ascending=False)
students['percentile'] = students['total_score'].rank(pct=True) * 100

overall_report = students.sort_values('overall_rank')[[
    'name', 'class', 'total_score', 'avg_score', 'overall_rank', 'percentile'
]].head(10)
print("Top 10 Students:")
print(overall_report)
print()

# Section 2: Class-wise Rankings
print("2. CLASS-WISE RANKINGS")
print("-" * 70)
students['class_rank'] = students.groupby('class')['total_score'].rank(
    method='min', ascending=False
)

for class_name in sorted(students['class'].unique()):
    print(f"\nClass {class_name} - Top 3:")
    class_top = students[students['class'] == class_name].nsmallest(3, 'class_rank')
    print(class_top[['name', 'total_score', 'class_rank']])
print()

# Section 3: Subject Toppers
print("3. SUBJECT TOPPERS")
print("-" * 70)
print("\nMath Topper:")
math_topper = students.nlargest(1, 'math_score')
print(math_topper[['name', 'class', 'math_score']])

print("\nScience Topper:")
science_topper = students.nlargest(1, 'science_score')
print(science_topper[['name', 'class', 'science_score']])

print("\nEnglish Topper:")
english_topper = students.nlargest(1, 'english_score')
print(english_topper[['name', 'class', 'english_score']])
print()

# Section 4: Grade Assignment
print("4. GRADE DISTRIBUTION")
print("-" * 70)

def assign_grade_comprehensive(row):
    percentile = row['percentile']
    if percentile >= 90:
        return 'A+'
    elif percentile >= 80:
        return 'A'
    elif percentile >= 70:
        return 'B+'
    elif percentile >= 60:
        return 'B'
    elif percentile >= 50:
        return 'C+'
    elif percentile >= 40:
        return 'C'
    else:
        return 'D'

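# Ordered categorical so grades sort as A+, A, B+, ... rather than alphabetically
grade_order = ['A+', 'A', 'B+', 'B', 'C+', 'C', 'D']
students['final_grade'] = pd.Categorical(
    students.apply(assign_grade_comprehensive, axis=1),
    categories=grade_order,
    ordered=True
)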

grade_dist = students['final_grade'].value_counts().sort_index()
print("Grade Distribution:")
print(grade_dist)
print()

print("Students by Grade:")
grade_summary = students.sort_values(['final_grade', 'total_score'], ascending=[True, False])
print(grade_summary[['name', 'total_score', 'percentile', 'final_grade']].head(15))
print()

# Section 5: Special Categories
print("5. SPECIAL RECOGNITION CATEGORIES")
print("-" * 70)

# Star Performers (Top 10%)
star_performers = students[students['percentile'] >= 90]
print(f"\n⭐ STAR PERFORMERS (Top 10%): {len(star_performers)} students")
print(star_performers[['name', 'class', 'total_score']].sort_values('total_score', ascending=False))
print()

# Perfect Attendance + Good Performance
excellent_attendance = students[
    (students['attendance_%'] >= 95) & 
    (students['percentile'] >= 70)
].sort_values('total_score', ascending=False)
print(f"\n📚 DEDICATION AWARD (95%+ Attendance + Top 30%): {len(excellent_attendance)} students")
print(excellent_attendance[['name', 'attendance_%', 'total_score']])
print()

# All-Rounder (Good in all subjects)
all_rounders = students[
    (students['math_score'] >= 80) &
    (students['science_score'] >= 80) &
    (students['english_score'] >= 80)
].sort_values('total_score', ascending=False)
print(f"\n🏆 ALL-ROUNDER AWARD (80+ in all subjects): {len(all_rounders)} students")
print(all_rounders[['name', 'math_score', 'science_score', 'english_score']])
print()

# Section 6: Summary Statistics
print("="*70)
print("SUMMARY STATISTICS")
print("="*70)
print(f"Total Students: {len(students)}")
print(f"Average Score: {students['avg_score'].mean():.2f}")
print(f"Highest Score: {students['total_score'].max()} ({students.loc[students['total_score'].idxmax(), 'name']})")
print(f"Lowest Score: {students['total_score'].min()} ({students.loc[students['total_score'].idxmin(), 'name']})")
print(f"Average Attendance: {students['attendance_%'].mean():.2f}%")
print()
print("Class-wise Performance:")
class_performance = students.groupby('class')['total_score'].agg(['mean', 'max', 'min', 'count'])
class_performance.columns = ['Avg_Score', 'Max_Score', 'Min_Score', 'Num_Students']
print(class_performance)
print("="*70)

## 10. Method Comparison & Decision Guide

### When to Use Each Method?

| Task | Best Method | Why? |
|------|-------------|------|
| **Get top 10 items** | `nlargest(10)` | Faster than full sort |
| **Get bottom 5 items** | `nsmallest(5)` | Faster than full sort |
| **Need full sorted data** | `sort_values()` | Complete sorting |
| **Restore original order** | `sort_index()` | Sorts by index |
| **Assign positions** | `rank()` | Returns ranks |
| **Top N per group** | `groupby().rank()` | Group-wise ranking |
| **Percentile scores** | `rank(pct=True)` | Relative position |
| **Leaderboard (no gaps)** | `rank(method='dense')` | Dense ranking |
| **Competition ranking** | `rank(method='min')` | Conservative |

### Performance Comparison

```
Dataset size: 1,000,000 rows
Task: Get top 10
(Illustrative timings only; actual numbers depend on hardware, dtype, and pandas version)

sort_values().head(10):  ~500ms  ← Sorts everything
nlargest(10):            ~50ms   ← Only finds top 10 ✅
```

### Ranking Method Selection

| Scenario | Method | Reason |
|----------|--------|--------|
| **Competition (Olympics)** | `min` | Ties share lowest rank |
| **Leaderboard (Gaming)** | `dense` | No rank gaps |
| **Statistical Analysis** | `average` | Mathematically sound |
| **First-come wins** | `first` | Order matters |
| **Generous grading** | `max` | Ties get highest rank |

### Sort vs Rank

```python
# SORT: Changes order of rows
df.sort_values('score')          # → Rearranged DataFrame

# RANK: Adds rank column, keeps order
df['rank'] = df['score'].rank()  # → Same order + rank column
```
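
A runnable version of the same contrast, as a sketch on a made-up three-row frame:

```python
import pandas as pd

# Illustrative frame; names and scores are made up
df = pd.DataFrame({'name': ['Ann', 'Ben', 'Cal'], 'score': [70, 90, 80]})

sorted_df = df.sort_values('score')   # rows move: Ann, Cal, Ben
df['rank'] = df['score'].rank()       # rows stay put; a rank column is added

print(sorted_df)
print(df)
```
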

In [None]:
print("=== QUICK REFERENCE GUIDE ===\n")

# Create demo DataFrame
demo_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [95, 90, 90, 85]
})

print("Sample Data:")
print(demo_df)
print("\n" + "="*70 + "\n")

# 1. Sort ascending
print("1. Sort by score (ascending):")
print("   df.sort_values('score')")
print(demo_df.sort_values('score'))
print()

# 2. Sort descending
print("2. Sort by score (descending):")
print("   df.sort_values('score', ascending=False)")
print(demo_df.sort_values('score', ascending=False))
print()

# 3. Top 2
print("3. Get top 2:")
print("   df.nlargest(2, 'score')")
print(demo_df.nlargest(2, 'score'))
print()

# 4. Bottom 2
print("4. Get bottom 2:")
print("   df.nsmallest(2, 'score')")
print(demo_df.nsmallest(2, 'score'))
print()

# 5. Rank (average)
print("5. Rank (average method):")
print("   df['rank'] = df['score'].rank(ascending=False)")
demo_df['rank_avg'] = demo_df['score'].rank(method='average', ascending=False)
print(demo_df)
print()

# 6. Rank (dense)
print("6. Rank (dense method):")
print("   df['rank'] = df['score'].rank(method='dense', ascending=False)")
demo_df['rank_dense'] = demo_df['score'].rank(method='dense', ascending=False)
print(demo_df)
print()

# 7. Percentile
print("7. Percentile ranking:")
print("   df['percentile'] = df['score'].rank(pct=True) * 100")
demo_df['percentile'] = demo_df['score'].rank(pct=True) * 100
print(demo_df)

print("\n" + "="*70)
print("COMMON PATTERNS")
print("="*70)
print("""
# Top 10 students
df.nlargest(10, 'score')

# Sort by multiple columns
df.sort_values(['class', 'score'], ascending=[True, False])

# Rank within groups
df['rank'] = df.groupby('class')['score'].rank(method='dense', ascending=False)

# Top student in each class
df[df['rank'] == 1]

# Percentile ranking
df['percentile'] = df['score'].rank(pct=True) * 100

# Restore original order
df.sort_index()
""")

## 11. Best Practices & Common Pitfalls

### Best Practices ✅

**1. Choose the Right Method**
```python
# ✅ For top N: Use nlargest
df.nlargest(10, 'score')

# ❌ Don't sort everything if you need top N
df.sort_values('score', ascending=False).head(10)  # Slower
```

**2. Use inplace=False (Default)**
```python
# ✅ Create new sorted DataFrame
sorted_df = df.sort_values('score')

# ⚠️ Avoid inplace (prevents chaining, optimization)
df.sort_values('score', inplace=True)
```

**3. Reset Index After Sorting**
```python
# ✅ Reset index for clean numbering
df.sort_values('score', ignore_index=True)

# Or
df.sort_values('score').reset_index(drop=True)
```

**4. Choose Appropriate Ranking Method**
```python
# Leaderboard: dense (no gaps)
df['rank'] = df['score'].rank(method='dense', ascending=False)

# Competition: min (ties share lowest)
df['rank'] = df['score'].rank(method='min', ascending=False)

# Statistical: average
df['rank'] = df['score'].rank(method='average', ascending=False)
```

**5. Handle Missing Values**
```python
# Decide where NaN should go
df.sort_values('score', na_position='last')  # NaN at end
df.sort_values('score', na_position='first')  # NaN at start
```

### Common Pitfalls ❌

**1. Forgetting ascending Parameter**
```python
# ❌ Wrong: Lowest score gets rank 1
df['rank'] = df['score'].rank()  # Default: ascending=True

# ✅ Correct: Highest score gets rank 1
df['rank'] = df['score'].rank(ascending=False)
```

**2. Not Handling Ties Properly**
```python
# ❌ Using wrong method for use case
df['rank'] = df['score'].rank(method='first')  # Order-dependent

# ✅ Use appropriate method
df['rank'] = df['score'].rank(method='dense')  # For leaderboards
```
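
The sketch below illustrates what "order-dependent" means for `method='first'`. It ranks the same hypothetical scores twice, once in each row order: the tied students swap ranks under `first`, while `dense` gives the same answer either way.

```python
import pandas as pd

# Hypothetical scores: Ann and Cal are tied
scores = pd.Series([90, 85, 90], index=['Ann', 'Ben', 'Cal'])
reordered = scores.loc[['Cal', 'Ben', 'Ann']]  # same data, different row order

first_a = scores.rank(method='first', ascending=False)
first_b = reordered.rank(method='first', ascending=False)
dense_a = scores.rank(method='dense', ascending=False)
dense_b = reordered.rank(method='dense', ascending=False)

# Under 'first', Ann's rank depends on where her row sits
print(first_a['Ann'], first_b['Ann'])

# Under 'dense', tied students always share a rank
print(dense_a['Ann'], dense_b['Ann'])
```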

**3. Sorting Large DataFrames Unnecessarily**
```python
# ❌ Slow: Sort the entire DataFrame just to take the top 10
df.sort_values('score', ascending=False).head(10)

# ✅ Fast: Use nlargest
df.nlargest(10, 'score')
```

**4. Not Resetting Index**
```python
# ❌ Confusing indices after sort
sorted_df = df.sort_values('score')
# Index: [15, 3, 22, 8, ...]  ← Original indices

# ✅ Clean indices
sorted_df = df.sort_values('score', ignore_index=True)
# Index: [0, 1, 2, 3, ...]  ← Sequential
```

**5. Modifying During Iteration**
```python
# ❌ Don't sort while iterating
for idx, row in df.iterrows():
    df = df.sort_values('score')  # Bad! Re-sorts the whole frame on every row

# ✅ Sort once before iteration
sorted_df = df.sort_values('score')
for idx, row in sorted_df.iterrows():
    pass  # Process each row here...
```

### Performance Tips 🚀

**1. Use categorical for repeated sorting**
```python
df['category'] = pd.Categorical(df['category'], 
                                categories=['A', 'B', 'C'], 
                                ordered=True)
df.sort_values('category')  # Sorts by the declared order; often faster than sorting strings
```
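
Apart from any speed benefit, an ordered categorical also changes what "sorted" means. This sketch uses a hypothetical `tasks` frame to show the difference between alphabetical order and the declared category order.

```python
import pandas as pd

# Hypothetical data for illustration
tasks = pd.DataFrame({
    'task': ['t1', 't2', 't3', 't4'],
    'priority': ['medium', 'high', 'low', 'high'],
})

# Plain strings sort alphabetically
alphabetical = tasks.sort_values('priority')['priority'].tolist()

# An ordered categorical sorts by the declared category order instead
tasks['priority'] = pd.Categorical(
    tasks['priority'], categories=['low', 'medium', 'high'], ordered=True
)
logical = tasks.sort_values('priority')['priority'].tolist()

print(alphabetical)
print(logical)
```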

**2. Sort before groupby operations**
```python
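# groupby already sorts group keys by default (sort=True), so pre-sorting
# may or may not help; profile on your own data before relying on it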
df.sort_values('group_col').groupby('group_col').sum()
```

**3. Use kind parameter for specific needs**
```python
# Stable sort (preserves order of equal elements)
df.sort_values('score', kind='mergesort')

# Default algorithm: usually fast, but not guaranteed to be stable
df.sort_values('score', kind='quicksort')
```
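
Stability is easiest to see with two sorts in a row. In this sketch (hypothetical `entries` data), the frame is first sorted by name and then stably sorted by score, so rows with equal scores stay in name order. The result matches a single multi-column sort.

```python
import pandas as pd

# Hypothetical data with tied scores
entries = pd.DataFrame({
    'name': ['Dee', 'Ann', 'Cal', 'Ben'],
    'score': [85, 90, 85, 90],
})

# Step 1: sort by the secondary key
by_name = entries.sort_values('name')

# Step 2: a stable sort on the primary key keeps name order within equal scores
stable = by_name.sort_values('score', kind='mergesort')

# Same ordering as sorting on both keys at once
multi_key = entries.sort_values(['score', 'name'])

print(stable['name'].tolist())
print(stable['name'].tolist() == multi_key['name'].tolist())
```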

## 12. Practice Exercises

### Beginner Level

1. **Sort students by name alphabetically**
   - Use `sort_values()` on the name column

2. **Find the student with the highest math score**
   - Use `nlargest(1, 'math_score')`

3. **Rank students by total score**
   - Use `rank(ascending=False)`

4. **Get bottom 5 students by attendance**
   - Use `nsmallest(5, 'attendance_%')`

5. **Sort by class, then by name**
   - Use `sort_values(['class', 'name'])`

### Intermediate Level

6. **Find top 3 students in each class**
   - Use `groupby('class')` + `rank()` + filter

7. **Calculate percentile ranks**
   - Use `rank(pct=True) * 100`

8. **Assign letter grades based on percentile**
   - Create custom function + apply

9. **Find students who rank differently in math vs overall**
   - Compare math rank and overall rank

10. **Sort by average score, with ties broken by attendance**
    - Use `sort_values(['avg_score', 'attendance_%'])`

### Advanced Level

11. **Create a reward score: 70% performance + 30% attendance**
    - Calculate composite score, then rank

12. **Find students in top 25% of ALL subjects**
    - Calculate percentiles for each subject

13. **Identify "most improved" students (furthest from average)**
    - Calculate distance from mean, rank by absolute value

14. **Create class leaderboard with dense ranking**
    - Group by class + dense rank

15. **Find students who are top 3 in their class but not top 10 overall**
    - Compare class rank and overall rank with conditions

### Challenge Problems

16. **Olympic-style ranking: Ties share minimum rank, next rank skipped**
    - Use `rank(method='min', ascending=False)`

17. **Create a "balance score" ranking students by how evenly they perform across subjects**
    - Calculate standard deviation across subjects, rank by lowest std

18. **Find the median-performing student in each class**
    - Use percentile rank ≈ 50%

19. **Rank students by total score, but penalize for poor attendance (<80%)**
    - Adjust score based on attendance before ranking

20. **Create a "surprise performer" metric: high score despite low attendance**
    - Calculate ratio: score / attendance, then rank

In [None]:
print("=== PRACTICE EXERCISE SOLUTIONS ===\n")
print("Try to solve the exercises above, then check solutions here!\n")

# Solution 1: Sort by name
print("Solution 1: Sort students alphabetically")
sorted_by_name = students.sort_values('name')
print(sorted_by_name[['name', 'total_score']].head())
print()

# Solution 2: Highest math score
print("Solution 2: Student with highest math score")
top_math = students.nlargest(1, 'math_score')
print(top_math[['name', 'math_score']])
print()

# Solution 6: Top 3 in each class
print("Solution 6: Top 3 students in each class")
students['class_rank'] = students.groupby('class')['total_score'].rank(
    method='dense', ascending=False
)
top3_per_class = students[students['class_rank'] <= 3].sort_values(['class', 'class_rank'])
print(top3_per_class[['name', 'class', 'total_score', 'class_rank']])
print()

# Solution 11: Reward score
print("Solution 11: Reward score (70% performance + 30% attendance)")
students['reward_score'] = (students['avg_score'] * 0.7 + 
                            students['attendance_%'] * 0.3)
students['reward_rank'] = students['reward_score'].rank(ascending=False)
reward_leaders = students.sort_values('reward_rank').head(5)
print(reward_leaders[['name', 'avg_score', 'attendance_%', 'reward_score', 'reward_rank']])
print()

# Solution 17: Balance score
print("Solution 17: Most balanced performers (low std deviation)")
students['score_std'] = students[['math_score', 'science_score', 'english_score']].std(axis=1)
students['balance_rank'] = students['score_std'].rank(ascending=True)  # Lower std = better
balanced = students.sort_values('balance_rank').head(5)
print(balanced[['name', 'math_score', 'science_score', 'english_score', 'score_std', 'balance_rank']])
print("\nNote: Lower std deviation = more consistent across subjects")

print("\n" + "="*70)
print("Try solving the remaining exercises on your own!")
print("="*70)

## Summary & Quick Reference Card

### Core Concepts Learned

✅ **Sorting**
- `sort_values()` - Sort by column values
- `sort_index()` - Sort by index
- Single and multiple column sorting
- Ascending vs descending order

✅ **Ranking**
- `rank()` - Assign numerical positions
- 5 ranking methods: average, min, max, dense, first
- Percentile rankings
- Group-wise ranking

✅ **Quick Selection**
- `nlargest()` - Get top N rows
- `nsmallest()` - Get bottom N rows
- Performance advantages

✅ **Advanced Techniques**
- Custom sorting logic
- Handling ties
- NA value positioning
- Multi-level sorting

---

### Quick Reference Card

```python
# SORTING
df.sort_values('col')                          # Sort ascending
df.sort_values('col', ascending=False)         # Sort descending
df.sort_values(['col1', 'col2'])              # Multi-column sort
df.sort_index()                                # Sort by index

# TOP/BOTTOM SELECTION
df.nlargest(10, 'col')                         # Top 10
df.nsmallest(5, 'col')                         # Bottom 5

# RANKING
df['rank'] = df['col'].rank()                  # Average method
df['rank'] = df['col'].rank(ascending=False)   # High to low
df['rank'] = df['col'].rank(method='dense')    # No gaps
df['rank'] = df['col'].rank(method='min')      # Min rank in tie
df['rank'] = df['col'].rank(pct=True) * 100    # Percentile

# GROUP-WISE RANKING
df['rank'] = df.groupby('category')['value'].rank()

# TOP N PER GROUP
df['rank'] = df.groupby('cat')['val'].rank(ascending=False)
top_per_group = df[df['rank'] <= 3]
```
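
The "top N per group" recipe can be tried end to end on a small, hypothetical `sales` frame. One assumption in this sketch: it uses `method='first'` so that ties are broken by row order and each group yields exactly N rows. With the default `average` method, tied rows can share a fractional rank and the filter may return more or fewer than N rows.

```python
import pandas as pd

# Hypothetical data: Dee and Eli are tied in the West region
sales = pd.DataFrame({
    'region': ['East', 'East', 'East', 'West', 'West', 'West'],
    'rep': ['Ann', 'Ben', 'Cal', 'Dee', 'Eli', 'Fay'],
    'revenue': [300, 500, 400, 250, 250, 100],
})

# Rank within each region, highest revenue first
sales['rank'] = sales.groupby('region')['revenue'].rank(
    method='first', ascending=False
)

# Keep the top 2 per region and order the result for display
top2 = sales[sales['rank'] <= 2].sort_values(['region', 'rank'])
print(top2[['region', 'rep', 'revenue', 'rank']])
```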

---

### Method Selection Guide

| Task | Use This |
|------|----------|
| Need full sorted dataset | `sort_values()` |
| Need only top/bottom N | `nlargest()` / `nsmallest()` |
| Assign positions | `rank()` |
| Leaderboard (no gaps) | `rank(method='dense')` |
| Competition ranking | `rank(method='min')` |
| Percentile scores | `rank(pct=True)` |
| Top N per category | `groupby()` + `rank()` |
| Restore original order | `sort_index()` |
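
The last row of the guide relies on sorting keeping the original index labels. A brief sketch on a hypothetical `original` frame shows the round trip, and why it stops working once `ignore_index=True` has discarded those labels.

```python
import pandas as pd

# Hypothetical data for illustration
original = pd.DataFrame({'name': ['Ann', 'Ben', 'Cal'], 'score': [70, 95, 80]})

# Sorting keeps each row's original index label
ranked = original.sort_values('score', ascending=False)

# sort_index() puts the rows back in their starting order
restored = ranked.sort_index()

# With ignore_index=True the old labels are gone, so sort_index() changes nothing
renumbered = original.sort_values('score', ascending=False, ignore_index=True)

print(ranked.index.tolist())
print(restored.equals(original))
print(renumbered.sort_index()['name'].tolist())
```

If you expect to need the original order later, keep the index as it is, or reset it without `drop=True` so the old labels survive as a column.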

---

### Ranking Methods Comparison

**Scores: [95, 90, 90, 85]** (ranked with `ascending=False`, so the highest score gets rank 1)

| Method | Ranks | Use Case |
|--------|-------|----------|
| average | [1, 2.5, 2.5, 4] | Statistical analysis |
| min | [1, 2, 2, 4] | Olympic-style |
| max | [1, 3, 3, 4] | Generous grading |
| dense | [1, 2, 2, 3] | Leaderboards |
| first | [1, 2, 3, 4] | Order matters |
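
The table can be reproduced directly. This sketch ranks the same four scores with each method (using `ascending=False`) and collects the results side by side. The variable names are illustrative only.

```python
import pandas as pd

scores = pd.Series([95, 90, 90, 85])

# Rank the same scores with every tie-breaking method
methods = ['average', 'min', 'max', 'dense', 'first']
comparison = pd.DataFrame(
    {m: scores.rank(method=m, ascending=False) for m in methods}
)
comparison.insert(0, 'score', scores)

print(comparison)
```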

---

### Key Takeaways

1. 🎯 **Choose the right method** for your use case
2. ⚡ **Use `nlargest/nsmallest`** for top/bottom N (faster)
3. 🏆 **Pick appropriate ranking method** for ties
4. 📊 **Group-wise ranking** for category comparisons
5. 🔄 **Reset index** after sorting for clean numbering

---

### Next Steps

After mastering sorting and ranking:
1. **Grouping & Aggregation** - Advanced group operations
2. **Window Functions** - Rolling calculations
3. **Time Series** - Date-based sorting and ranking
4. **Visualization** - Plot rankings and distributions
5. **Statistical Analysis** - Percentile-based analysis

---

**Happy Sorting & Ranking! 🚀📊**