# Sorting and Limiting Results in SQL

This notebook covers:
- Sorting results with ORDER BY
- Ascending vs descending order (ASC/DESC)
- Limiting result sets with LIMIT
- Multiple sort criteria
- Combining WHERE, ORDER BY, and LIMIT
- Performance considerations

In [None]:
import duckdb
import pandas as pd

# Connect to the movies database
conn = duckdb.connect('movies.db')

# This will make our query results look nice
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)

print("Connected to database successfully!")

In [None]:
def run_query(query, show_results=True, show_all=False):
    """Execute a SQL query, optionally display results, and return them as a pandas DataFrame
    
    Args:
        query (str): SQL query to execute
        show_results (bool): Whether to display the results
        show_all (bool): If True, shows all results. If False, shows first 5 rows
        
    Returns:
        pandas.DataFrame: Query results
    """
    try:
        # Execute query and get results
        result = conn.execute(query).df()
        
        # Display results if requested
        if show_results:
            if show_all:
                display(result)
            else:
                display(result.head())
        
        return result
    except Exception as e:
        print(f"Error executing query: {e}")
        return None

## ORDER BY Basics

The `ORDER BY` clause sorts results in ascending (ASC) or descending (DESC) order:
```sql
SELECT column1, column2
FROM table_name
ORDER BY column1 [ASC|DESC];
```

If no direction is specified, `ORDER BY` sorts in ascending order (ASC).

Example: Get movies sorted by release year (newest first)

In [None]:
query = """
SELECT title, release_year
FROM movies
ORDER BY release_year DESC;
"""
run_query(query)
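
To see the default direction in isolation, here is a minimal sketch. It assumes a tiny in-memory table built with Python's built-in `sqlite3` module as a stand-in for `movies.db`; the rows and the `demo_conn` name are made up for illustration, and the same `ORDER BY` statements should behave the same way in DuckDB.

```python
import sqlite3

# Hypothetical stand-in for the movies table (made-up rows, not from movies.db)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Movie A", 2019), ("Movie B", 2023), ("Movie C", 2021)],
)

# No direction given, so ascending order is used
default_order = demo_conn.execute(
    "SELECT title, release_year FROM movies ORDER BY release_year"
).fetchall()

# Explicit ASC gives the same result as the default
explicit_asc = demo_conn.execute(
    "SELECT title, release_year FROM movies ORDER BY release_year ASC"
).fetchall()

# DESC reverses the order
descending = demo_conn.execute(
    "SELECT title, release_year FROM movies ORDER BY release_year DESC"
).fetchall()

print(default_order)  # [('Movie A', 2019), ('Movie C', 2021), ('Movie B', 2023)]
print(descending)     # [('Movie B', 2023), ('Movie C', 2021), ('Movie A', 2019)]

demo_conn.close()
```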

🏋️ Challenge: Get all actors sorted by their birth year (oldest first)

In [None]:
query = """
-- Write query here
"""
run_query(query)

## Multiple Sort Criteria

You can sort by multiple columns, each with its own direction. Later columns only break ties among rows that are equal on the earlier columns:
```sql
SELECT columns
FROM table
ORDER BY column1 ASC, column2 DESC;
```

Example: Sort movies by release year (newest first), then alphabetically by title within each year

In [None]:
query = """
SELECT title, release_year
FROM movies
ORDER BY release_year DESC, title ASC;
"""
run_query(query)
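
The tie-breaking role of the second sort column is easier to see on a small table. The sketch below again assumes an in-memory `sqlite3` table with made-up rows as a stand-in for `movies.db`; two movies share the year 2022, so only their relative order depends on `title`.

```python
import sqlite3

# Hypothetical stand-in table; two rows deliberately share a release year
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Beta", 2022), ("Alpha", 2022), ("Gamma", 2021), ("Delta", 2023)],
)

# Primary key: release_year (newest first); tie-breaker: title (A to Z)
sorted_rows = demo_conn.execute(
    """
    SELECT title, release_year
    FROM movies
    ORDER BY release_year DESC, title ASC
    """
).fetchall()

for row in sorted_rows:
    print(row)
# ('Delta', 2023)
# ('Alpha', 2022)
# ('Beta', 2022)
# ('Gamma', 2021)

demo_conn.close()
```

The title column never moves a row across years; it only orders "Alpha" before "Beta" inside 2022.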

🏋️ Challenge: Get actors sorted by birth year (youngest first) and then alphabetically by name

In [None]:
query = """
-- Write query here
"""
run_query(query)

## LIMIT Clause

Use `LIMIT` to cap the number of rows returned. Without an `ORDER BY`, which rows you get back is not guaranteed:
```sql
SELECT columns
FROM table
LIMIT number_of_rows;
```

Example: Get the 5 most recent movies

In [None]:
query = """
SELECT title, release_year
FROM movies
ORDER BY release_year DESC
LIMIT 5;
"""
run_query(query, show_all=True)
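
As a small illustration of how `LIMIT` interacts with sorting, the sketch below assumes an in-memory `sqlite3` table with made-up rows as a stand-in for `movies.db`. It shows that `LIMIT` is applied after the sort, and that asking for more rows than exist simply returns every row.

```python
import sqlite3

# Hypothetical stand-in table with four made-up rows
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Old One", 1999), ("Mid One", 2010), ("New One", 2023), ("Newer One", 2024)],
)

# Sort first, then keep only the first two rows of the sorted result
top_two = demo_conn.execute(
    """
    SELECT title, release_year
    FROM movies
    ORDER BY release_year DESC, title ASC
    LIMIT 2
    """
).fetchall()

# A limit larger than the table is not an error; all rows come back
at_most_ten = demo_conn.execute(
    "SELECT title, release_year FROM movies ORDER BY release_year DESC LIMIT 10"
).fetchall()

print(top_two)           # [('Newer One', 2024), ('New One', 2023)]
print(len(at_most_ten))  # 4

demo_conn.close()
```

Adding `title` as a second sort key keeps the top-N result deterministic if several movies share a release year.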

🏋️ Challenge: Get the 3 oldest actors in the database

In [None]:
query = """
-- Write query here
"""
run_query(query, show_all=True)

## Combining WHERE, ORDER BY, and LIMIT

You can combine these clauses, but they must appear in this order:
```sql
SELECT columns
FROM table
WHERE condition
ORDER BY column
LIMIT number;
```

Example: Get the 3 most recent movies released after 2020

In [None]:
query = """
SELECT title, release_year
FROM movies
WHERE release_year > 2020
ORDER BY release_year DESC
LIMIT 3;
"""
run_query(query, show_all=True)
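
To make the clause-order rule concrete, here is a sketch that runs one correctly ordered query and one with `WHERE` placed after `ORDER BY`. It assumes an in-memory `sqlite3` table with made-up rows as a stand-in for `movies.db`; SQLite reports the misordered query as an `OperationalError`, and DuckDB should reject it with a parser error of its own.

```python
import sqlite3

# Hypothetical stand-in table with made-up rows
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("A", 2019), ("B", 2021), ("C", 2022), ("D", 2023), ("E", 2024)],
)

# Correct order: WHERE, then ORDER BY, then LIMIT
recent = demo_conn.execute(
    """
    SELECT title, release_year
    FROM movies
    WHERE release_year > 2020
    ORDER BY release_year DESC
    LIMIT 3
    """
).fetchall()
print(recent)  # [('E', 2024), ('D', 2023), ('C', 2022)]

# Wrong order: WHERE after ORDER BY is a syntax error
misordered = """
SELECT title, release_year
FROM movies
ORDER BY release_year DESC
WHERE release_year > 2020
LIMIT 3
"""
rejected = False
try:
    demo_conn.execute(misordered)
except sqlite3.OperationalError as err:
    rejected = True
    print(f"Rejected: {err}")

demo_conn.close()
```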

🏋️ Challenge: Find the 5 youngest actors born before 2000

In [None]:
query = """
-- Write query here
"""
run_query(query, show_all=True)

## Performance Considerations

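When using ORDER BY and LIMIT:
- ORDER BY can be computationally expensive on large datasets, because every row that passes the filter has to be sorted
- LIMIT reduces the number of rows returned; combined with ORDER BY, the database still has to examine every candidate row to find the top ones, though many engines can avoid a full sort in that case
- In databases that can read rows in index order (B-tree indexes in SQLite or PostgreSQL, for example), an index on the sort column can avoid the sort step entirely
- Filter with WHERE where possible so that fewer rows need to be sorted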
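
One way to check the effect of an index on sorting is to ask the database for its query plan. The sketch below is an assumption-heavy illustration: it uses SQLite (via Python's built-in `sqlite3`) rather than this notebook's DuckDB connection, with a made-up table and a hypothetical index name `idx_movies_year`. DuckDB's planner and index types work differently, so treat this as a demonstration of the general idea, not of DuckDB's behavior.

```python
import sqlite3

# Hypothetical stand-in table filled with 1000 made-up rows
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
demo_conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(f"Movie {i}", 1980 + i % 45) for i in range(1000)],
)

top_n = "SELECT title, release_year FROM movies ORDER BY release_year DESC LIMIT 10"


def plan_details(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in demo_conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index, SQLite has to sort the scanned rows itself
plan_without_index = plan_details(top_n)

# An index on the sort column (also holding title) lets SQLite read rows in order
demo_conn.execute("CREATE INDEX idx_movies_year ON movies (release_year, title)")
plan_with_index = plan_details(top_n)

print(plan_without_index)
print(plan_with_index)

demo_conn.close()
```

In the first plan SQLite reports a temporary B-tree for the `ORDER BY`; in the second that step disappears because the index already supplies the rows in sorted order. The exact plan wording varies between SQLite versions.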

Example: Efficient query combining all concepts

In [None]:
query = """
SELECT title, release_year
FROM movies
WHERE release_year >= 2000  -- Filter first
ORDER BY release_year DESC, title ASC  -- Then sort
LIMIT 10;  -- Finally limit
"""
run_query(query, show_all=True)

## Solutions

Here are solutions to the challenges:

### Challenge 1: Actors by Birth Year

In [None]:
query = """
SELECT name, birth_year
FROM actors
ORDER BY birth_year ASC;
"""
run_query(query)

### Challenge 2: Actors by Birth Year and Name

In [None]:
query = """
SELECT name, birth_year
FROM actors
ORDER BY birth_year DESC, name ASC;
"""
run_query(query)

### Challenge 3: Three Oldest Actors

In [None]:
query = """
SELECT name, birth_year
FROM actors
ORDER BY birth_year ASC
LIMIT 3;
"""
run_query(query, show_all=True)

### Challenge 4: Five Youngest Pre-2000 Actors

In [None]:
query = """
SELECT name, birth_year
FROM actors
WHERE birth_year < 2000
ORDER BY birth_year DESC
LIMIT 5;
"""
run_query(query, show_all=True)

In [None]:
conn.close()
print("Database connection closed.")

## Key Points to Remember
- ORDER BY sorts results (ASC by default)
- You can sort by multiple columns
- LIMIT restricts the number of rows returned
- Clause order matters: WHERE → ORDER BY → LIMIT
- Consider performance implications when sorting large datasets
- Always close your database connection when finished