# Lab Module 3: Row-Level Functions

(Run the cell below first to ensure connectivity.)

In [None]:
%load_ext sql

%sql postgresql://admin:password@postgres:5432/postgres

## Challenge 1: Standardizing Location Names (String Functions)
- **Context**: The marketing team wants to export customer data for a mailing campaign, but they require all city names to be strictly uppercase to match their printing software requirements. 
- **Task**: Write a query that lists the `customer_id` and `customer_city` from the `customers` table. Convert `customer_city` to all uppercase letters. Alias the derived column as `standardized_city`.
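
As a hint, the sketch below shows how `UPPER` behaves on a made-up string literal rather than on the lab tables. The value and the alias `example_upper` are illustrative assumptions.

```sql
-- UPPER converts every letter in the string to uppercase.
SELECT UPPER('sao paulo') AS example_upper;  -- 'SAO PAULO'
```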

In [None]:
%%sql
-- Write your solution here


## Challenge 2: Calculating Freight Density (Numeric Functions)
- **Context**: The logistics team is auditing shipping rates and needs to see product weights. To simplify the view, they want each weight expressed in kilograms, rounded to one decimal place.
- **Task**: Select the `product_id` and `product_weight_g` from the `products` table. Create a calculated column that converts grams to kilograms (divide by 1000) and rounds the result to 1 decimal place. Alias this as `weight_kg`.
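
One thing to watch for, sketched here with literal numbers instead of the real column: in PostgreSQL, dividing one integer by another discards the fraction. Whether this matters depends on how `product_weight_g` is typed in your schema, which is not stated here.

```sql
SELECT
    1500 / 1000             AS integer_division,  -- 1: the fraction is discarded
    ROUND(1500 / 1000.0, 1) AS rounded_kg;        -- 1.5: a numeric divisor keeps it
```

The two-argument form of `ROUND` is defined for `NUMERIC` values, so a floating-point column may need a cast to `NUMERIC` first.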

In [None]:
%%sql
-- Write your solution here


## Challenge 3: Extracting Order Year (Date Functions)
- **Context**: Analysts are performing a quick check on the distribution of orders. They need to extract just the year from the purchase timestamp to verify data range. 
- **Task**: Write a query against the `orders` table. Select `order_id` and `order_purchase_timestamp`. Create a new column `order_year` that extracts the integer year from the timestamp.
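
A small sketch of `EXTRACT` on a made-up timestamp literal. Depending on the PostgreSQL version, the result is `NUMERIC` or `DOUBLE PRECISION`, so an explicit cast is one way to guarantee an integer.

```sql
SELECT
    EXTRACT(YEAR FROM TIMESTAMP '2018-03-15 10:30:00')                  AS year_raw,
    CAST(EXTRACT(YEAR FROM TIMESTAMP '2018-03-15 10:30:00') AS INTEGER) AS year_int;  -- 2018
```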

In [None]:
%%sql
-- Write your solution here
   

## Challenge 4: Generating Location Slugs (String Concatenation)
- **Context**: The web development team needs a "Location Slug" for SEO purposes, combining the city and state for every seller. 
- **Task**: Select `seller_id` from the `sellers` table. Create a new column named `seller_slug` that combines `seller_city` and `seller_state` separated by a hyphen (e.g., "rio de janeiro-RJ"). Ensure the city is lowercased.
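
PostgreSQL offers two common ways to join strings. The sketch below uses literals only, and the aliases are illustrative.

```sql
SELECT
    LOWER('Rio de Janeiro') || '-' || 'RJ'       AS with_operator,  -- 'rio de janeiro-RJ'
    CONCAT(LOWER('Rio de Janeiro'), '-', 'RJ')   AS with_concat;    -- 'rio de janeiro-RJ'
```

The two differ on missing data: `||` returns NULL if any operand is NULL, while `CONCAT` skips NULL arguments.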

In [None]:
%%sql
-- Write your solution here


## Challenge 5: Anonymizing Customer Keys (String Substring)
- **Context**: For a privacy compliance report, you need to display only the first 5 characters of the `customer_unique_id`.
- **Task**: Query the `customers` table. Select `customer_id` and a new column `masked_unique_id` that contains only the first 5 characters of `customer_unique_id`.
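
Either `LEFT` or `SUBSTRING` can take a prefix of a string. The sketch below runs both on a made-up value.

```sql
SELECT
    LEFT('a1b2c3d4e5', 5)                 AS with_left,       -- 'a1b2c'
    SUBSTRING('a1b2c3d4e5' FROM 1 FOR 5)  AS with_substring;  -- 'a1b2c' (positions start at 1)
```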

In [None]:
%%sql
-- Write your solution here


## Challenge 6: Order Approval Lag (Date Arithmetic)
- **Context**: Customer Service wants to know the gap between purchasing and approval. 
- **Task**: Select `order_id` from the `orders` table. Calculate the difference between `order_approved_at` and `order_purchase_timestamp`. Alias this column as `approval_time`.
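
Subtracting one timestamp from another in PostgreSQL yields an `INTERVAL`. A sketch with two made-up timestamps:

```sql
SELECT TIMESTAMP '2018-01-02 12:00:00' - TIMESTAMP '2018-01-01 10:30:00' AS example_interval;
-- 1 day 01:30:00
```

If either timestamp is NULL, the difference is NULL as well.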

In [None]:
%%sql
-- Write your solution here


## Challenge 7: Ceiling Calculation for Box Sizing (Numeric Math)
- **Context**: Packaging boxes come in 10cm increments. The warehouse needs to know the "Box Height" required for each product. 
- **Task**: Select `product_id` and `product_height_cm` from the `products` table. Calculate a `required_box_height` by dividing the product height by 10, using the CEILING function to round up to the nearest integer, and then multiplying by 10.
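
The arithmetic can be checked on literal heights first. The sketch below assumes nothing about the column's type and shows why an integer divisor can silently give the wrong box size.

```sql
SELECT
    CEILING(23 / 10.0) * 10 AS box_for_23cm,       -- 30: 2.3 rounds up to 3
    CEILING(30 / 10.0) * 10 AS box_for_30cm,       -- 30: exact multiples are unchanged
    CEILING(23 / 10) * 10   AS integer_pitfall;    -- 20: 23 / 10 is already 2 before CEILING runs
```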

In [None]:
%%sql
-- Write your solution here


## Challenge 8: Handling Null Delivery Dates (COALESCE)
- **Context**: Some orders have not been delivered yet, resulting in NULL values in the delivered column. For reporting, these should display as 'In Transit'. 
- **Task**: Select `order_id` and `order_delivered_customer_date` from the `orders` table. Create a column `delivery_status_label`. If the delivery date is NULL, return 'In Transit'. If it is not NULL, return 'Delivered'. (Hint: You can use CASE or COALESCE with casting, but a CASE statement is often clearer for changing data types).
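
To illustrate the hint, the sketch below applies both approaches to a NULL timestamp literal. The labels 'missing' and 'present' are placeholders, not the labels the task asks for.

```sql
SELECT
    -- CASE tests for NULL and returns text on both branches.
    CASE
        WHEN CAST(NULL AS TIMESTAMP) IS NULL THEN 'missing'
        ELSE 'present'
    END AS with_case,
    -- COALESCE needs both arguments to share a type, hence the cast to text.
    COALESCE(CAST(CAST(NULL AS TIMESTAMP) AS VARCHAR), 'missing') AS with_coalesce;
```

Note that `COALESCE` returns the original value (here, the timestamp as text) whenever it is not NULL, so it cannot substitute a second label on its own.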

In [None]:
%%sql
-- Write your solution here


## Challenge 9: Formatting Prices (Casting & Concatenation)
- **Context**: The frontend team needs raw data formatted as a currency string for a quick prototype view. 
- **Task**: From `order_items`, select `order_id` and `price`. Create a column `price_formatted` that converts the `price` to a string (VARCHAR) and prepends 'R$ ' to it.
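
A sketch of casting and prefixing on a literal price. The `TO_CHAR` line is an optional alternative, not something the task requires; its format mask is an assumption about how many digits a price may have.

```sql
SELECT
    CAST(129.90 AS VARCHAR)           AS as_text,      -- '129.90'
    'R$ ' || CAST(129.90 AS VARCHAR)  AS with_prefix,  -- 'R$ 129.90'
    -- TO_CHAR with a mask forces two decimal places regardless of the stored scale.
    'R$ ' || TO_CHAR(129.9, 'FM9999990.00') AS fixed_decimals;
```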

In [None]:
%%sql
-- Write your solution here


## Challenge 10: Categorizing Product Weight (Simple CASE)
- **Context**: Logistics wants to segment products into 'Light' and 'Heavy' for conveyor belt routing. 
- **Task**: Select `product_id` and `product_weight_g` from the `products` table. Create a column `weight_category`. If the weight is less than 1000g, label it 'Light'. Otherwise, label it 'Heavy'.

In [None]:
%%sql
-- Write your solution here


## Challenge 11: Shipping Days Calculation (Date Extraction)
- **Context**: The Operations Director needs to know exactly how many full days it took for the carrier to deliver the package after picking it up. 
- **Task**: Select `order_id` from the `orders` table. Calculate the number of days between `order_delivered_carrier_date` and `order_delivered_customer_date`. Use `EXTRACT(DAY FROM ...)` on the difference between the two timestamps, or cast both timestamps to `DATE` before subtracting, so that the result is a whole number of days. Alias as `transit_days`.
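
The two approaches do not always agree. The sketch below uses two made-up timestamps that are 2 days and 12 hours apart but fall on dates 3 days apart.

```sql
SELECT
    -- Day component of the interval: counts completed 24-hour periods.
    EXTRACT(DAY FROM TIMESTAMP '2018-01-05 08:00:00'
                   - TIMESTAMP '2018-01-02 20:00:00')   AS full_days,      -- 2
    -- Date subtraction: counts calendar-day boundaries crossed.
    CAST(TIMESTAMP '2018-01-05 08:00:00' AS DATE)
      - CAST(TIMESTAMP '2018-01-02 20:00:00' AS DATE)   AS calendar_days;  -- 3
```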

In [None]:
%%sql
-- Write your solution here


## Challenge 12: Fixing Data Entry Issues (REPLACE)
- **Context**: A data entry error caused some product categories to use underscores instead of spaces. The report needs them to be readable. 
- **Task**: Select `product_category_name` from the `products` table. Create a column `readable_category` that replaces all underscores (_) with spaces.
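
`REPLACE` takes the source string, the text to find, and the replacement text. A sketch on an illustrative category-style literal:

```sql
-- Every occurrence of '_' is replaced, not just the first one.
SELECT REPLACE('cama_mesa_banho', '_', ' ') AS example_readable;  -- 'cama mesa banho'
```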

In [None]:
%%sql
-- Write your solution here


## Challenge 13: Volume Calculation (Multi-Column Math)
- **Context**: Warehouse optimization requires the volume of every product to calculate storage density. 
- **Task**: Select `product_id` from the `products` table. Calculate the volume in cubic centimeters (`product_length_cm` * `product_height_cm` * `product_width_cm`). Alias as `product_volume_cm3`.

In [None]:
%%sql
-- Write your solution here


## Challenge 14: On-Time Delivery Check (Conditional Logic with Dates)
- **Context**: We need to flag orders that were delivered later than the estimated delivery date. 
- **Task**: Select `order_id`, `order_delivered_customer_date`, and `order_estimated_delivery_date` from the `orders` table. Create a column `delivery_performance`. If the actual delivery date is greater than the estimate, label 'Late'; otherwise, label 'On Time'. Handle NULLs in actual delivery as 'Pending'.
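
Branch order matters when NULLs are involved, because a comparison against NULL is neither true nor false and falls through to `ELSE`. The sketch below uses a NULL literal and placeholder labels to show the difference.

```sql
SELECT
    -- NULL is tested first, so it gets its own label.
    CASE
        WHEN CAST(NULL AS TIMESTAMP) IS NULL THEN 'missing'
        WHEN CAST(NULL AS TIMESTAMP) > TIMESTAMP '2018-01-10' THEN 'after'
        ELSE 'not after'
    END AS null_checked_first,   -- 'missing'
    -- Without the NULL test, the comparison is unknown and ELSE applies.
    CASE
        WHEN CAST(NULL AS TIMESTAMP) > TIMESTAMP '2018-01-10' THEN 'after'
        ELSE 'not after'
    END AS null_falls_to_else;   -- 'not after'
```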

In [None]:
%%sql
-- Write your solution here


## Challenge 15: Validating Zip Codes (Length Check)
- **Context**: The system expects all Zip Code prefixes to be exactly 5 digits. We need to flag any rows that look suspicious (though the schema defines them as VARCHAR(10)). 
- **Task**: Select `customer_id` and `customer_zip_code_prefix` from the `customers` table. Create a column `is_valid_length`. If the length of the zip code prefix is 5, return 'Valid', otherwise return 'Check'.

In [None]:
%%sql
-- Write your solution here


## Challenge 16: Safe Division (NULLIF)
- **Context**: You are calculating the ratio of freight value to price. However, some items might be free giveaways (price = 0), which would cause a division by zero error. 
- **Task**: Select `order_id` and `order_item_id` from the `order_items` table. Calculate `freight_value` divided by `price`. Use NULLIF on the `price` to handle zeros (turning them into NULLs) to avoid errors. Alias as `freight_ratio`.
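
`NULLIF(a, b)` returns NULL when the two arguments are equal and returns `a` otherwise. A sketch with literal values:

```sql
SELECT
    15.0 / NULLIF(0, 0)    AS zero_price,    -- NULL instead of a division-by-zero error
    15.0 / NULLIF(60.0, 0) AS normal_price;  -- a ratio of 0.25, unaffected by NULLIF
```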

In [None]:
%%sql
-- Write your solution here


## Challenge 17: Order Month for Reporting
- **Context**: Management wants to see the "Purchase Month" for every order, to prepare for a future monthly report. 
- **Task**: Select `order_id` and `order_purchase_timestamp` from the `orders` table. Create a new column `purchase_month` that extracts the month from `order_purchase_timestamp`.

In [None]:
%%sql
-- Write your solution here


## Challenge 18: Complex Shipping Logic (Multi-Branch CASE)
- **Context**: Shipping rules are complex. If the product weighs less than 500g, it's 'Small Packet'. If it weighs between 500g and 2000g (inclusive), it's 'Parcel'. Anything over 2000g is 'Cargo'.
- **Task**: Select `product_id` and `product_weight_g` from the `products` table. Write a CASE statement to apply these rules. Alias as `shipping_mode`.

In [None]:
%%sql
-- Write your solution here

## Challenge 19: Filter by Function Result (WHERE clause integration)
- **Context**: Sometimes, users enter city names with trailing spaces. We need to find sellers where the length of their city name is greater than 20 characters (perhaps to audit for bad data). 
- **Task**: Select `seller_id` and `seller_city` from the `sellers` table. Filter the results to only show rows where the LENGTH of `seller_city` is strictly greater than 20.
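
A function call can appear directly in `WHERE`, while a column alias defined in the same `SELECT` cannot. The sketch below uses an inline `VALUES` list with made-up city names instead of the `sellers` table.

```sql
-- Trailing spaces count toward LENGTH; TRIM removes them.
SELECT
    LENGTH('sao paulo   ')       AS raw_length,      -- 12
    LENGTH(TRIM('sao paulo   ')) AS trimmed_length;  -- 9

-- Filtering on a function result: only the 22-character name is returned.
SELECT city
FROM (VALUES ('sao paulo'), ('vitoria de santo antao')) AS t(city)
WHERE LENGTH(city) > 20;
```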

In [None]:
%%sql
-- Write your solution here

## Challenge 20: The "Clean Data" Export (Combination)
- **Context**: Construct a final clean dataset for the customers table. The requirements are strict: the ID must be lowercase, the Zip Code must be cast to an integer, and the State must be uppercase.
- **Task**: From the `customers` table, select `customer_id` (converted to lowercase, aliased as `customer_id`), `customer_zip_code_prefix` (cast to integer, aliased as `zip_int`), and `customer_state` (converted to uppercase, aliased as `customer_state`). Filter to only show customers from the state 'SP'.
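
One side effect of the integer cast is worth seeing on a literal before applying it to the column. The zip-style value below is made up.

```sql
-- Leading zeros are lost once the text becomes a number.
SELECT CAST('01310' AS INTEGER) AS example_zip_int;  -- 1310
```

A value containing anything other than digits (and optional surrounding whitespace or a sign) would make the cast raise an error.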

In [None]:
%%sql
-- Write your solution here