# 1.3 Basic SELECT Queries

This section introduces the fundamental SELECT statement - the foundation of data retrieval in SQL.

## Learning Objectives
By the end of this section, you will be able to:
- Write basic SELECT statements to retrieve data
- Select specific columns vs. all columns
- Use aliases to rename columns
- Apply LIMIT to control result size
- Remove duplicate rows with DISTINCT
- Understand query execution order

## Prerequisites
- Completed sections 1.1 and 1.2
- Understanding of table structure
- Sample data from previous sections

## Database Connection

Let's connect to our database and verify our sample data is ready.

In [None]:
import sqlite3
import pandas as pd
from IPython.display import display

# Connect to our database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Verify our tables exist
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()

print("✅ Database connection established!")
print(f"📚 Available tables: {[table[0] for table in tables]}")

## The SELECT Statement Structure

The basic syntax of a SELECT statement:

```sql
SELECT column1, column2, ...
FROM table_name;
```

### Key Components:
- **SELECT**: Specifies which columns to retrieve
- **FROM**: Specifies which table to query
- **;**: Ends the SQL statement
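
As a minimal, self-contained sketch of these components, the cell below runs a SELECT against a throwaway in-memory SQLite database. The `demo_employees` table and its two rows are invented for illustration and are not part of the course database.

```python
import sqlite3

# Throwaway in-memory database; demo_employees is a hypothetical table
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE demo_employees (first_name TEXT, last_name TEXT, salary INTEGER)"
)
demo.executemany(
    "INSERT INTO demo_employees VALUES (?, ?, ?)",
    [("Ada", "Lovelace", 5000), ("Alan", "Turing", 6000)],
)

# SELECT names the columns to return, FROM names the table to read
cur = demo.execute("SELECT first_name, salary FROM demo_employees;")
selected_columns = [col[0] for col in cur.description]  # column names of the result
selected_rows = cur.fetchall()

print(selected_columns)
print(selected_rows)
demo.close()
```

Only the two columns named after SELECT come back; `last_name` is stored in the table but is left out of the result.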

In [None]:
# Example 1: Select all columns with *
print("📊 Example 1: Select all columns from departments")
df = pd.read_sql_query("SELECT * FROM departments", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Select specific columns
print("📊 Example 2: Select specific columns from employees")
df = pd.read_sql_query("SELECT first_name, last_name, email FROM employees", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Select with calculated columns
print("📊 Example 3: Select with calculations")
df = pd.read_sql_query("""
SELECT 
    first_name, 
    last_name, 
    salary,
    salary * 12 as annual_salary
FROM employees
""", conn)
display(df)

## Column Aliases

Aliases allow you to rename columns in your query results for better readability.

### Alias Syntax:
- `column_name AS alias_name`
- `column_name alias_name` (AS is optional)
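
The two alias forms can be compared side by side in a small sketch. It assumes a hypothetical `demo_departments` table in an in-memory database and reads the resulting column names from `cursor.description`.

```python
import sqlite3

# Hypothetical single-row table, used only to inspect result column names
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_departments (dept_id INTEGER, dept_name TEXT)")
demo.execute("INSERT INTO demo_departments VALUES (1, 'Engineering')")

cur = demo.execute("""
    SELECT
        dept_id AS id_with_as,              -- explicit AS
        dept_id id_without_as,              -- AS omitted
        dept_name AS "Department Name"      -- quotes allow spaces in the alias
    FROM demo_departments
""")
alias_names = [col[0] for col in cur.description]
alias_row = cur.fetchone()

print(alias_names)
print(alias_row)
demo.close()
```

Both forms produce the same kind of result column. Writing AS explicitly tends to be easier to read, since a missing comma between two column names would otherwise be silently treated as an alias.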

In [None]:
# Example: Using aliases for better column names
print("🏷️ Using Column Aliases:")

df = pd.read_sql_query("""
SELECT 
    dept_id AS "Department ID",
    dept_name AS "Department Name",
    location AS "Office Location",
    budget AS "Annual Budget",
    is_active AS "Currently Active"
FROM departments
""", conn)
display(df)

print("\n" + "="*30 + "\n")

# Example: Aliases with calculations
print("🧮 Aliases with Calculations:")
df = pd.read_sql_query("""
SELECT 
    project_name AS "Project",
    budget AS "Budget ($)",
    -- Divide by 1000.0: SQLite truncates integer / integer division
    budget / 1000.0 AS "Budget (K$)",
    ROUND(budget / 1000.0, 1) AS "Budget (Rounded K$)"
FROM projects
""", conn)
display(df)

## Limiting Results with LIMIT

The LIMIT clause controls how many rows are returned in your result set. Without an ORDER BY clause, the database does not guarantee which rows those are.

### LIMIT Syntax:
- `LIMIT n` - Returns first n rows
- `LIMIT n OFFSET m` - Returns n rows starting from row m+1
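
The following sketch illustrates both forms on a hypothetical `demo_items` table holding the numbers 1 to 5. An ORDER BY is included so that "first" is well defined.

```python
import sqlite3

# Hypothetical table with the values 1..5
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_items (n INTEGER)")
demo.executemany("INSERT INTO demo_items VALUES (?)", [(i,) for i in range(1, 6)])

# LIMIT 3: the first three rows of the sorted result
first_three = demo.execute(
    "SELECT n FROM demo_items ORDER BY n LIMIT 3"
).fetchall()

# LIMIT 3 OFFSET 1: skip one row, then return the next three (rows 2 through 4)
rows_two_to_four = demo.execute(
    "SELECT n FROM demo_items ORDER BY n LIMIT 3 OFFSET 1"
).fetchall()

print(first_three)       # [(1,), (2,), (3,)]
print(rows_two_to_four)  # [(2,), (3,), (4,)]
demo.close()
```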

In [None]:
# Example 1: Using LIMIT to get first 3 employees
print("🔢 Example 1: First 3 employees")
df = pd.read_sql_query("SELECT * FROM employees LIMIT 3", conn)
display(df)

print("\n" + "="*30 + "\n")

# Example 2: Using LIMIT with OFFSET
print("🔢 Example 2: Employees 2-4 (OFFSET demonstration)")
df = pd.read_sql_query("SELECT first_name, last_name, salary FROM employees LIMIT 3 OFFSET 1", conn)
display(df)

print("\n" + "="*30 + "\n")

# Example 3: Top 2 highest budget projects
print("🔢 Example 3: Projects ordered by budget (we'll learn ORDER BY in detail later)")
df = pd.read_sql_query("""
SELECT project_name, budget 
FROM projects 
ORDER BY budget DESC 
LIMIT 2
""", conn)
display(df)

## DISTINCT - Removing Duplicates

The DISTINCT keyword removes duplicate rows from your result set.

### DISTINCT Usage:
- `SELECT DISTINCT column` - Unique values from one column
- `SELECT DISTINCT col1, col2` - Unique combinations of multiple columns
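
A short sketch of the difference between the two forms, using a hypothetical `demo_projects` table that contains one repeated row:

```python
import sqlite3

# Hypothetical table; ("active", 1) appears twice on purpose
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_projects (status TEXT, priority INTEGER)")
demo.executemany(
    "INSERT INTO demo_projects VALUES (?, ?)",
    [("active", 1), ("active", 1), ("active", 2), ("done", 1)],
)

# One column: each status value appears once
distinct_status = demo.execute(
    "SELECT DISTINCT status FROM demo_projects ORDER BY status"
).fetchall()

# Two columns: uniqueness applies to the (status, priority) pair as a whole
distinct_pairs = demo.execute(
    "SELECT DISTINCT status, priority FROM demo_projects ORDER BY status, priority"
).fetchall()

print(distinct_status)  # [('active',), ('done',)]
print(distinct_pairs)   # [('active', 1), ('active', 2), ('done', 1)]
demo.close()
```

Note that `active` still appears twice in the second result, because DISTINCT compares whole rows rather than individual columns.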

In [None]:
# Example 1: Distinct departments (from employees table)
print("🎯 Example 1: Distinct department IDs from employees")
df = pd.read_sql_query("SELECT DISTINCT dept_id FROM employees", conn)
display(df)

print("\n" + "="*30 + "\n")

# Example 2: Distinct locations
print("🎯 Example 2: Distinct office locations")
df = pd.read_sql_query("SELECT DISTINCT location FROM departments", conn)
display(df)

print("\n" + "="*30 + "\n")

# Example 3: Distinct combinations
print("🎯 Example 3: Distinct status and priority combinations from projects")
df = pd.read_sql_query("SELECT DISTINCT status, priority FROM projects ORDER BY status, priority", conn)
display(df)

## Combining Multiple Concepts

Let's combine what we've learned into more complex queries.

In [None]:
# Complex example combining the concepts so far (WHERE and ORDER BY are previews of later sections)
print("🚀 Complex Query Example:")
print("Get unique employee information with calculated annual salary, limited results")

df = pd.read_sql_query("""
SELECT DISTINCT
    first_name AS "First Name",
    last_name AS "Last Name", 
    email AS "Email Address",
    salary AS "Monthly Salary",
    salary * 12 AS "Annual Salary",
    dept_id AS "Department"
FROM employees
WHERE salary > 70000
ORDER BY salary DESC
LIMIT 3
""", conn)
display(df)

print("\n" + "="*30 + "\n")

# Another example with projects
print("🎯 Project Summary Example:")
df = pd.read_sql_query("""
SELECT 
    project_name AS "Project Name",
    status AS "Current Status",
    budget / 1000.0 AS "Budget (K$)",
    priority AS "Priority Level",
    CASE 
        WHEN priority >= 4 THEN 'High'
        WHEN priority = 3 THEN 'Medium' 
        ELSE 'Low'
    END AS "Priority Label"
FROM projects
ORDER BY priority DESC, budget DESC
LIMIT 4
""", conn)
display(df)

## Practice Exercises

Practice writing SELECT statements with the concepts you've learned:

1. **Basic Selection**: Get all information from the customers table
2. **Specific Columns**: Select only company names and emails from customers
3. **With Aliases**: Create meaningful column names for a departments query
4. **Limited Results**: Get the first 2 projects with their names and budgets
5. **Distinct Values**: Find all unique countries from the customers table

Try each exercise yourself first; sample solutions follow:

In [None]:
# Exercise 1: Basic selection from customers
print("Exercise 1: All customer information")
df = pd.read_sql_query("SELECT * FROM customers", conn)
display(df)

print("\n" + "="*30 + "\n")

# Exercise 2: Specific columns
print("Exercise 2: Company names and emails")
df = pd.read_sql_query("""
SELECT company_name, email 
FROM customers
""", conn)
display(df)

print("\n" + "="*30 + "\n")

# Exercise 3: Using aliases
print("Exercise 3: Departments with meaningful column names")
df = pd.read_sql_query("""
SELECT 
    dept_name AS "Department",
    location AS "Office Location", 
    budget AS "Annual Budget ($)",
    CASE 
        WHEN is_active = 1 THEN 'Active'
        ELSE 'Inactive'
    END AS "Status"
FROM departments
""", conn)
display(df)

print("\n" + "="*30 + "\n")

# Exercise 4: Limited results
print("Exercise 4: First 2 projects with names and budgets")
df = pd.read_sql_query("""
SELECT 
    project_name AS "Project Name",
    budget AS "Budget ($)"
FROM projects
LIMIT 2
""", conn)
display(df)

print("\n" + "="*30 + "\n")

# Exercise 5: Distinct values
print("Exercise 5: Unique countries from customers")
df = pd.read_sql_query("SELECT DISTINCT country FROM customers", conn)
display(df)

## Query Performance Tips

### Best Practices for SELECT queries:
1. **Specify columns**: Use specific column names instead of `*` when possible
2. **Use LIMIT**: Add LIMIT to prevent accidentally large result sets
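3. **Column order**: The order of columns in the SELECT list only affects how results are displayed; "most selective column first" is a rule of thumb for multi-column indexes, not for the column list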
4. **Aliases**: Use meaningful aliases for calculated columns

### Understanding Query Execution:
1. **FROM** - Identify the source table
2. **WHERE** - Filter rows (covered in section 1.4)
3. **SELECT** - Determine which columns to return
4. **DISTINCT** - Remove duplicates if specified
5. **ORDER BY** - Sort results (covered in a later section)
6. **LIMIT** - Restrict number of rows returned
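
The position of LIMIT at the end of this order can be checked with a small sketch. It assumes a hypothetical `demo_staff` table and shows that both sorting and duplicate removal happen before rows are cut off.

```python
import sqlite3

# Hypothetical table: two staff members share dept_id 1
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_staff (name TEXT, dept_id INTEGER, salary INTEGER)")
demo.executemany(
    "INSERT INTO demo_staff VALUES (?, ?, ?)",
    [("A", 1, 3000), ("B", 1, 9000), ("C", 2, 7000), ("D", 3, 5000)],
)

# ORDER BY runs before LIMIT: we get the two highest salaries,
# not the first two stored rows sorted afterwards
top_two = demo.execute(
    "SELECT name FROM demo_staff ORDER BY salary DESC LIMIT 2"
).fetchall()

# DISTINCT runs before LIMIT: we get two different departments,
# even though the first two stored rows share dept_id 1
two_depts = demo.execute(
    "SELECT DISTINCT dept_id FROM demo_staff ORDER BY dept_id LIMIT 2"
).fetchall()

print(top_two)    # [('B',), ('C',)]
print(two_depts)  # [(1,), (2,)]
demo.close()
```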

In [None]:
# Demonstration of query performance considerations
print("📈 Query Performance Examples:")

# Less efficient - selecting all columns
print("❌ Less efficient (selecting all columns):")
df = pd.read_sql_query("SELECT * FROM employees", conn)
print(f"Columns returned: {len(df.columns)}")

print("\n" + "="*30 + "\n")

# More efficient - selecting only needed columns
print("✅ More efficient (selecting specific columns):")
df = pd.read_sql_query("SELECT first_name, last_name FROM employees", conn)
print(f"Columns returned: {len(df.columns)}")

print("\n📊 Memory usage difference is significant with larger datasets!")

## Section Summary

In this section, you mastered the fundamentals of data retrieval:

✅ **SELECT Statement**: The foundation of all data queries  
✅ **Column Selection**: Choosing specific columns vs. selecting all (*)  
✅ **Aliases**: Creating meaningful column names with AS  
✅ **LIMIT**: Controlling result set size for performance  
✅ **DISTINCT**: Removing duplicate rows from results  
✅ **Query Structure**: Understanding the logical flow of SELECT queries  

### Key SQL Commands:
- `SELECT column1, column2` - Specify columns to retrieve
- `SELECT *` - Retrieve all columns  
- `AS alias_name` - Create column aliases
- `LIMIT n` - Restrict number of rows
- `DISTINCT` - Remove duplicate rows
- `CASE WHEN` - Conditional logic in SELECT

### Query Structure Pattern:
```sql
SELECT [DISTINCT] column_list
FROM table_name  
[WHERE conditions]
[ORDER BY columns]
[LIMIT number];
```

### Next Steps
In section 1.4, you'll learn how to filter your data using WHERE clauses to find exactly the information you need.