# PySpark Example: RDD.takeOrdered()

This notebook demonstrates how to use **`takeOrdered()`** on an RDD of `(str, int)` tuples (employee name, salary).

## 🔑 What is `takeOrdered()`?
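- **`rdd.takeOrdered(n, key=None)`** returns the **`n` smallest elements** of the RDD as a Python list, in ascending order (compared by `key` if one is given).
- Unlike `take(n)`, which returns the first `n` elements in partition order without sorting, `takeOrdered(n)` returns a **globally ordered** result. It does not sort the whole RDD: each partition keeps only its own `n` smallest elements, and those short candidate lists are merged on the driver, so no full shuffle is needed.
- Default ordering is Python's natural ascending order (tuples compare element by element).
- Pass `key=lambda x: ...` to customize the sort key.
- The result is loaded into the driver's memory, so keep `n` small.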
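To make the "no full sort" point concrete, here is a minimal, Spark-free model of the two operations. It assumes a hypothetical three-partition split of the sample data; `take_ordered_model` and `take_model` are illustrative names, not PySpark functions. PySpark's own `takeOrdered` follows roughly the same pattern (per-partition `heapq.nsmallest`, then a merge), but treat this as a sketch of the idea rather than the library's code.

```python
import heapq
from itertools import chain, islice

def take_ordered_model(partitions, n, key=None):
    """Local model of takeOrdered: n smallest per partition, then merge."""
    # Step 1 (on executors in Spark): each partition keeps only its n smallest items.
    candidates = [heapq.nsmallest(n, part, key=key) for part in partitions]
    # Step 2 (on the driver): merge the short candidate lists one by one,
    # keeping only the n smallest seen so far.
    result = []
    for part_top in candidates:
        result = heapq.nsmallest(n, result + part_top, key=key)
    return result

def take_model(partitions, n):
    """Local model of take: first n items in partition order, no sorting."""
    return list(islice(chain.from_iterable(partitions), n))

# Hypothetical split of the sample data into three partitions.
partitions = [
    [("Alice", 95000), ("Bob", 120000)],
    [("Carol", 70000), ("David", 105000)],
    [("Eva", 60000), ("Frank", 85000)],
]

print(take_model(partitions, 3))
# [('Alice', 95000), ('Bob', 120000), ('Carol', 70000)]
print(take_ordered_model(partitions, 3, key=lambda x: x[1]))
# [('Eva', 60000), ('Carol', 70000), ('Frank', 85000)]
```

Because each partition contributes at most `n` candidates, the data moved to the driver grows with `n` times the number of partitions, not with the size of the RDD.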

```python
from pyspark import SparkConf, SparkContext

# Initialize Spark in local mode, using all available cores
conf = SparkConf().setAppName("TakeOrderedExample").setMaster("local[*]")
sc = SparkContext.getOrCreate(conf)

# Sample data: (employee_name, salary)
data = [
    ("Alice", 95000),
    ("Bob", 120000),
    ("Carol", 70000),
    ("David", 105000),
    ("Eva", 60000),
    ("Frank", 85000)
]

rdd = sc.parallelize(data)
print(rdd.collect())
```

## 1️⃣ Smallest 3 salaries (ascending order by default)

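```python
# Sort by salary (x[1]); without a key, tuples would be compared by name first
smallest_salaries = rdd.takeOrdered(3, key=lambda x: x[1])
print("Lowest 3 salaries:", smallest_salaries)
# Lowest 3 salaries: [('Eva', 60000), ('Carol', 70000), ('Frank', 85000)]
```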
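For data small enough to hold locally, `takeOrdered(n, key)` should agree with a full Python sort followed by a slice, which makes a convenient reference when testing Spark code. The helper `reference_take_ordered` below is a hypothetical name used only for illustration.

```python
def reference_take_ordered(records, n, key=None):
    """Plain-Python reference: full sort, then keep the first n."""
    return sorted(records, key=key)[:n]

# Same sample data as the Spark cell, as a local list.
data = [("Alice", 95000), ("Bob", 120000), ("Carol", 70000),
        ("David", 105000), ("Eva", 60000), ("Frank", 85000)]

expected_lowest = reference_take_ordered(data, 3, key=lambda x: x[1])
print(expected_lowest)
# [('Eva', 60000), ('Carol', 70000), ('Frank', 85000)]
```

With a live `SparkContext`, a test could then check `assert rdd.takeOrdered(3, key=lambda x: x[1]) == expected_lowest`.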

## 2️⃣ Top 3 highest salaries

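```python
# Negating the numeric salary turns "n smallest" into "n largest"
highest_salaries = rdd.takeOrdered(3, key=lambda x: -x[1])
print("Top 3 highest salaries:", highest_salaries)
# Top 3 highest salaries: [('Bob', 120000), ('David', 105000), ('Alice', 95000)]
```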
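The `-x[1]` trick only works because salaries are numbers. The sketch below uses Python's `heapq` locally (as a stand-in for Spark) to show that a negated key selects the same elements as an explicit "largest n", and that negation fails for string keys. In PySpark itself, `rdd.top(3, key=lambda x: x[1])` returns the largest elements in descending order without any negation, and it also works for string keys.

```python
import heapq

data = [("Alice", 95000), ("Bob", 120000), ("Carol", 70000),
        ("David", 105000), ("Eva", 60000), ("Frank", 85000)]

# Negating a numeric key turns "n smallest" into "n largest".
by_negated_key = heapq.nsmallest(3, data, key=lambda x: -x[1])
# heapq.nlargest makes the same selection without negation.
by_nlargest = heapq.nlargest(3, data, key=lambda x: x[1])
print(by_negated_key == by_nlargest)  # True

# Negation fails for non-numeric keys such as names.
try:
    heapq.nsmallest(3, data, key=lambda x: -x[0])
except TypeError as err:
    print("TypeError:", err)
```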

## 3️⃣ Alphabetically first 3 employees

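```python
# Sort by name (x[0]); with unique names this matches the default tuple ordering
alphabetical_first3 = rdd.takeOrdered(3, key=lambda x: x[0])
print("Alphabetically first 3 employees:", alphabetical_first3)
# Alphabetically first 3 employees: [('Alice', 95000), ('Bob', 120000), ('Carol', 70000)]
```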
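Two related orderings, sketched locally with `sorted(...)[:3]` as a stand-in for `takeOrdered(3, ...)`: calling it with no key (tuples compare by name first, so `rdd.takeOrdered(3)` gives the same result here), and a composite key that ranks by salary descending with ties broken by name, which in PySpark would be `rdd.takeOrdered(3, key=lambda x: (-x[1], x[0]))`. The extra record `("Gina", 120000)` is hypothetical, added only to create a tie.

```python
data = [("Alice", 95000), ("Bob", 120000), ("Carol", 70000),
        ("David", 105000), ("Eva", 60000), ("Frank", 85000)]

# With no key, tuples compare element by element, so the name decides first.
default_first3 = sorted(data)[:3]
print(default_first3)
# [('Alice', 95000), ('Bob', 120000), ('Carol', 70000)]

# Hypothetical extra record that ties with Bob on salary.
with_tie = data + [("Gina", 120000)]

# Composite key: highest salary first, ties broken alphabetically by name.
ranked = sorted(with_tie, key=lambda x: (-x[1], x[0]))[:3]
print(ranked)
# [('Bob', 120000), ('Gina', 120000), ('David', 105000)]
```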

## 🔎 Summary
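- `takeOrdered(n)` is a good fit for **global "top N" queries** where `n` is small enough for the result to fit on the driver.
- Common use cases:
  - Lowest or highest `n` values (a generalization of `min` / `max` to `n` elements).
  - Lexicographic (string) ordering.
  - Deterministic top-N selection by any key. This is not random sampling; use `sample()` or `takeSample()` for that.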