# Homework Assignment - Lesson 3: Data Transformation with dplyr - Part 1

**Student Name:** SOLUTION KEY

**Due Date:** [Insert Due Date Here]

**Objective:** Learn to use dplyr functions (`select()`, `filter()`, `arrange()`) and the pipe operator (`%>%`) for data transformation and analysis.

---

## Instructions

- Complete all tasks in this notebook
- Use the pipe operator (`%>%`) wherever possible to chain operations
- Ensure your code is well-commented and easy to understand
- Run all cells to verify your code works correctly
- Answer all reflection questions at the end

---

## Part 1: Data Import and Setup

In this section, you'll import the retail transactions dataset and perform initial exploration.

**Dataset:** `retail_transactions.csv` - This dataset contains transaction records from a retail business with information about customers, products, dates, amounts, and quantities.

In [4]:
# Load required libraries
library(tidyverse)

# Set working directory to the data folder
# Note: this absolute path is machine-specific; adjust it to where the data lives on your system
setwd("/Users/humphrjk/GitHub/ai-homework-grader-clean/data")
# OR skip setwd() and use a relative path in read_csv (if the notebook is in the project root):
# transactions <- read_csv("data/retail_transactions.csv")

# Task 1.1: Import the retail_transactions.csv file
transactions <- read_csv("retail_transactions.csv")

# Display success message
cat("Data imported successfully!\n")
cat("Dataset dimensions:", nrow(transactions), "rows x", ncol(transactions), "columns\n")

Rows: 500 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): CustomerName, CustomerCity, ProductName, ProductCategory
dbl  (4): TransactionID, CustomerID, TotalAmount, Quantity
date (1): TransactionDate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Data imported successfully!
Dataset dimensions: 500 rows x 9 columns


In [5]:
# Task 1.2: Initial Exploration

# Display the first 10 rows
cat("First 10 rows of the dataset:\n")
head(transactions, 10)

# Check the structure of the dataset
cat("\nDataset structure:\n")
str(transactions)

# Display column names (their data types are shown by str() above)
cat("\nColumn names:\n")
names(transactions)

First 10 rows of the dataset:


TransactionID,CustomerID,CustomerName,CustomerCity,ProductName,ProductCategory,TotalAmount,Quantity,TransactionDate
<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<date>
1,81,Customer 39,Chicago,Adidas Jacket,Clothing,632.39,3,2024-03-09
2,13,Customer 63,Philadelphia,Samsung TV,Music,114.28,3,2024-12-08
3,18,Customer 98,Chicago,Adidas Jacket,Computers,1289.24,7,2024-01-22
4,76,Customer 39,Houston,Dell Laptop,Computers,885.4,2,2024-07-02
5,86,Customer 45,New York,Nike Shoes,Computers,95.95,5,2024-08-13
6,37,Customer 8,Philadelphia,Adidas Jacket,Electronics,1126.34,2,2024-04-15
7,45,Customer 83,New York,HP Printer,Clothing,78.71,3,2024-05-02
8,11,Customer 60,Chicago,Samsung TV,Music,871.93,3,2024-04-30
9,13,Customer 69,Houston,iPhone 14,Music,1347.56,8,2024-08-08
10,55,Customer 24,Chicago,Sony Headphones,Books,633.51,1,2024-06-23



Dataset structure:
spc_tbl_ [500 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ TransactionID  : num [1:500] 1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID     : num [1:500] 81 13 18 76 86 37 45 11 13 55 ...
 $ CustomerName   : chr [1:500] "Customer 39" "Customer 63" "Customer 98" "Customer 39" ...
 $ CustomerCity   : chr [1:500] "Chicago" "Philadelphia" "Chicago" "Houston" ...
 $ ProductName    : chr [1:500] "Adidas Jacket" "Samsung TV" "Adidas Jacket" "Dell Laptop" ...
 $ ProductCategory: chr [1:500] "Clothing" "Music" "Computers" "Computers" ...
 $ TotalAmount    : num [1:500] 632 114 1289 885 96 ...
 $ Quantity       : num [1:500] 3 3 7 2 5 2 3 3 8 1 ...
 $ TransactionDate: Date[1:500], format: "2024-03-09" "2024-12-08" ...
 - attr(*, "spec")=
  .. cols(
  ..   TransactionID = col_double(),
  ..   CustomerID = col_double(),
  ..   CustomerName = col_character(),
  ..   CustomerCity = col_character(),
  ..   ProductName = col_character(),


## Part 2: Column Selection with `select()`

Practice different methods of selecting columns from your dataset.

In [6]:
# Task 2.1: Basic Selection
basic_info <- transactions %>%
  select(TransactionID, CustomerID, ProductName, TotalAmount)

# Display the result
cat("Basic info dataset (first 5 rows):\n")
head(basic_info, 5)

Basic info dataset (first 5 rows):


TransactionID,CustomerID,ProductName,TotalAmount
<dbl>,<dbl>,<chr>,<dbl>
1,81,Adidas Jacket,632.39
2,13,Samsung TV,114.28
3,18,Adidas Jacket,1289.24
4,76,Dell Laptop,885.4
5,86,Nike Shoes,95.95


In [7]:
# Task 2.2: Range Selection
customer_details <- transactions %>%
  select(CustomerID:CustomerCity)

# Display the result
cat("Customer details (first 5 rows):\n")
head(customer_details, 5)

Customer details (first 5 rows):


CustomerID,CustomerName,CustomerCity
<dbl>,<chr>,<chr>
81,Customer 39,Chicago
13,Customer 63,Philadelphia
18,Customer 98,Chicago
76,Customer 39,Houston
86,Customer 45,New York


In [8]:
# Task 2.3: Pattern-Based Selection

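# Create 'date_columns' with columns starting with "Date" or "Time"
# Note: no column in this dataset starts with "Date" or "Time" (TransactionDate
# only ends with "Date"), so this selection is expected to return zero columns
date_columns <- transactions %>%
  select(starts_with("Date") | starts_with("Time"))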

# Create 'amount_columns' with columns containing the word "Amount"
amount_columns <- transactions %>%
  select(contains("Amount"))

# Display column names for verification
cat("Date/Time columns:", names(date_columns), "\n")
cat("Amount columns:", names(amount_columns), "\n")

Date/Time columns:  
Amount columns: TotalAmount 
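
The empty "Date/Time columns" result above follows from how the selection helpers match names. The sketch below uses a hypothetical two-row stand-in (`demo`, not the homework data) with the same column names to compare `starts_with()`, `ends_with()`, and `contains()`; it assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in that reuses three column names from the homework data
demo <- tibble(
  TransactionID   = c(1, 2),
  TotalAmount     = c(632.39, 114.28),
  TransactionDate = as.Date(c("2024-03-09", "2024-12-08"))
)

# Prefix match: no column name begins with "Date", so nothing is selected
prefix_cols <- demo %>% select(starts_with("Date"))

# Suffix match: "TransactionDate" ends with "Date"
suffix_cols <- demo %>% select(ends_with("Date"))

# Substring match: finds "Date" anywhere in the column name
substring_cols <- demo %>% select(contains("Date"))

names(prefix_cols)
names(suffix_cols)
names(substring_cols)
```

If the goal is to capture the date column in this dataset, `ends_with("Date")` or `contains("Date")` would likely be the helper to reach for.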


In [9]:
# Task 2.4: Exclusion Selection
no_ids <- transactions %>%
  select(-TransactionID, -CustomerID)

# Display column names for verification
cat("Columns after removing IDs:", names(no_ids), "\n")
cat("Number of columns:", ncol(no_ids), "\n")

Columns after removing IDs: CustomerName CustomerCity ProductName ProductCategory TotalAmount Quantity TransactionDate 
Number of columns: 7 


## Part 3: Row Filtering with `filter()`

Learn to filter rows based on various conditions.

In [10]:
# Task 3.1: Single Condition Filtering

# Filter transactions with TotalAmount > $100
high_value_transactions <- transactions %>%
  filter(TotalAmount > 100)

# Filter transactions from "Electronics" category
electronics_transactions <- transactions %>%
  filter(ProductCategory == "Electronics")

# Display results
cat("High value transactions (>$100):", nrow(high_value_transactions), "rows\n")
cat("Electronics transactions:", nrow(electronics_transactions), "rows\n")

High value transactions (>$100): 470 rows
Electronics transactions: 93 rows


In [11]:
# Task 3.2: Multiple Condition Filtering (AND)
ny_bulk_purchases <- transactions %>%
  filter(TotalAmount > 50 & Quantity > 1 & CustomerCity == "New York")

# Display results
cat("NY bulk purchases:", nrow(ny_bulk_purchases), "rows\n")
if(nrow(ny_bulk_purchases) > 0) {
  head(ny_bulk_purchases)
}

NY bulk purchases: 75 rows


TransactionID,CustomerID,CustomerName,CustomerCity,ProductName,ProductCategory,TotalAmount,Quantity,TransactionDate
<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<date>
5,86,Customer 45,New York,Nike Shoes,Computers,95.95,5,2024-08-13
7,45,Customer 83,New York,HP Printer,Clothing,78.71,3,2024-05-02
15,1,Customer 52,New York,Nike Shoes,Clothing,602.79,5,2024-01-05
22,80,Customer 4,New York,iPhone 14,Books,1424.99,7,2024-02-01
25,97,Customer 17,New York,iPhone 14,Movies,999.24,2,2024-02-03
29,3,Customer 23,New York,Samsung TV,Clothing,1392.13,5,2024-04-30


In [12]:
# Task 3.3: Multiple Condition Filtering (OR)
entertainment_transactions <- transactions %>%
  filter(ProductCategory %in% c("Books", "Music", "Movies"))

# Display results
cat("Entertainment transactions:", nrow(entertainment_transactions), "rows\n")
if(nrow(entertainment_transactions) > 0) {
  head(entertainment_transactions)
}

Entertainment transactions: 227 rows


TransactionID,CustomerID,CustomerName,CustomerCity,ProductName,ProductCategory,TotalAmount,Quantity,TransactionDate
<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<date>
2,13,Customer 63,Philadelphia,Samsung TV,Music,114.28,3,2024-12-08
8,11,Customer 60,Chicago,Samsung TV,Music,871.93,3,2024-04-30
9,13,Customer 69,Houston,iPhone 14,Music,1347.56,8,2024-08-08
10,55,Customer 24,Chicago,Sony Headphones,Books,633.51,1,2024-06-23
11,100,Customer 95,Philadelphia,HP Printer,Movies,572.43,6,2024-12-09
14,19,Customer 100,Phoenix,Nike Shoes,Books,32.29,3,2024-12-11
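
Task 3.3 is labeled as OR filtering, yet the solution uses `%in%`. A minimal sketch on a hypothetical five-row tibble (`demo`, not the homework data) suggests the two forms select the same rows when the column has no missing values:

```r
library(dplyr)

# Hypothetical stand-in with one row per category
demo <- tibble(
  TransactionID   = 1:5,
  ProductCategory = c("Clothing", "Music", "Computers", "Books", "Movies")
)

# Membership test: compact when there are several allowed values
with_in <- demo %>%
  filter(ProductCategory %in% c("Books", "Music", "Movies"))

# Explicit OR: the same condition written out term by term
with_or <- demo %>%
  filter(ProductCategory == "Books" |
           ProductCategory == "Music" |
           ProductCategory == "Movies")

# Compare the two results
all.equal(with_in, with_or)
```

The `%in%` form tends to be easier to extend, since adding a category means adding one element to the vector rather than another `|` clause.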


In [13]:
# Task 3.4: Date-Based Filtering
# Filter transactions from March 2024
march_transactions <- transactions %>%
  filter(month(TransactionDate) == 3 & year(TransactionDate) == 2024)

# Display results
cat("March 2024 transactions:", nrow(march_transactions), "rows\n")

March 2024 transactions: 41 rows
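
The `month()` and `year()` helpers used above come from lubridate, which recent tidyverse releases attach automatically. As a hedged alternative that needs only dplyr and base R dates, the same month can be expressed as a half-open date range. The tibble `demo` below is hypothetical and chosen to straddle the month boundaries.

```r
library(dplyr)

# Hypothetical stand-in: dates just before, inside, and just after March 2024
demo <- tibble(
  TransactionID   = 1:4,
  TransactionDate = as.Date(c("2024-02-29", "2024-03-01", "2024-03-31", "2024-04-01"))
)

# Half-open range: on or after March 1 and strictly before April 1
march_demo <- demo %>%
  filter(TransactionDate >= as.Date("2024-03-01"),
         TransactionDate <  as.Date("2024-04-01"))

march_demo$TransactionID
```

Separating conditions with commas inside `filter()` combines them with AND, just like `&`.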


In [14]:
# Task 3.5: Advanced Filtering Challenge

# Step 1: Find customers who bought Electronics
electronics_customers <- transactions %>%
  filter(ProductCategory == "Electronics") %>%
  pull(CustomerID) %>%
  unique()

# Step 2: Find customers who bought Clothing
clothing_customers <- transactions %>%
  filter(ProductCategory == "Clothing") %>%
  pull(CustomerID) %>%
  unique()

# Step 3: Find customers who bought both
both_categories_customers <- intersect(electronics_customers, clothing_customers)

# Display results
cat("Customers who bought both Electronics and Clothing:", length(both_categories_customers), "customers\n")

Customers who bought both Electronics and Clothing: 38 customers
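
The three-step approach above is needed because a single row has only one category, so combining both conditions in one `filter()` matches nothing. The sketch below illustrates this on a hypothetical six-row tibble (`demo`, not the homework data):

```r
library(dplyr)

# Hypothetical stand-in: customer 1 bought in both categories, the others did not
demo <- tibble(
  CustomerID      = c(1, 1, 2, 3, 3, 4),
  ProductCategory = c("Electronics", "Clothing", "Electronics",
                      "Clothing", "Clothing", "Books")
)

# Row-level AND: no single row is both Electronics and Clothing
row_level <- demo %>%
  filter(ProductCategory == "Electronics", ProductCategory == "Clothing")

# Customer-level AND: collect the IDs per category, then intersect them
electronics_ids <- demo %>%
  filter(ProductCategory == "Electronics") %>%
  pull(CustomerID) %>%
  unique()

clothing_ids <- demo %>%
  filter(ProductCategory == "Clothing") %>%
  pull(CustomerID) %>%
  unique()

both_ids <- intersect(electronics_ids, clothing_ids)

# All transactions belonging to customers who bought in both categories
both_rows <- demo %>% filter(CustomerID %in% both_ids)

nrow(row_level)
both_ids
```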


## Part 4: Data Sorting with `arrange()`

Practice sorting data by single and multiple columns.

In [15]:
# Task 4.1: Single Column Sorting

# Sort by TotalAmount ascending
transactions_by_amount_asc <- transactions %>%
  arrange(TotalAmount)

# Sort by TotalAmount descending
transactions_by_amount_desc <- transactions %>%
  arrange(desc(TotalAmount))

# Display top 5 of each
cat("Lowest amounts:\n")
head(transactions_by_amount_asc %>% select(CustomerName, ProductName, TotalAmount), 5)

cat("\nHighest amounts:\n")
head(transactions_by_amount_desc %>% select(CustomerName, ProductName, TotalAmount), 5)

Lowest amounts:


CustomerName,ProductName,TotalAmount
<chr>,<chr>,<dbl>
Customer 95,Adidas Jacket,27.66
Customer 100,Nike Shoes,32.29
Customer 50,Adidas Jacket,35.01
Customer 83,Samsung TV,36.37
Customer 69,Dell Laptop,37.33



Highest amounts:


CustomerName,ProductName,TotalAmount
<chr>,<chr>,<dbl>
Customer 60,Sony Headphones,1499.52
Customer 28,Dell Laptop,1491.96
Customer 81,Sony Headphones,1491.62
Customer 79,iPhone 14,1488.95
Customer 20,HP Printer,1487.44


In [16]:
# Task 4.2: Multiple Column Sorting
transactions_by_city_amount <- transactions %>%
  arrange(CustomerCity, desc(TotalAmount))

# Display first 10 rows
cat("Transactions sorted by city, then amount:\n")
head(transactions_by_city_amount %>% select(CustomerCity, CustomerName, ProductName, TotalAmount), 10)

Transactions sorted by city, then amount:


CustomerCity,CustomerName,ProductName,TotalAmount
<chr>,<chr>,<chr>,<dbl>
Chicago,Customer 18,iPhone 14,1487.43
Chicago,Customer 20,HP Printer,1476.49
Chicago,Customer 97,Dell Laptop,1459.42
Chicago,Customer 49,Dell Laptop,1453.94
Chicago,Customer 70,Nike Shoes,1428.35
Chicago,Customer 71,Samsung TV,1424.38
Chicago,Customer 62,Sony Headphones,1416.35
Chicago,Customer 11,Nike Shoes,1407.51
Chicago,Customer 99,HP Printer,1388.73
Chicago,Customer 49,Nike Shoes,1370.57
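
In a multi-column `arrange()`, later columns only break ties within earlier ones. The sketch below shows this on a hypothetical four-row tibble (`demo`, with values borrowed from the rows shown earlier): cities are sorted alphabetically first, and amounts are sorted in descending order within each city.

```r
library(dplyr)

# Hypothetical stand-in with two cities and two transactions each
demo <- tibble(
  CustomerCity = c("Houston", "Chicago", "Houston", "Chicago"),
  TotalAmount  = c(885.40, 632.39, 1347.56, 1289.24)
)

# City ascending first; within each city, amount descending
sorted_demo <- demo %>%
  arrange(CustomerCity, desc(TotalAmount))

sorted_demo
```

Swapping the argument order to `arrange(desc(TotalAmount), CustomerCity)` would instead sort by amount overall and use city only to break ties between equal amounts.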


In [17]:
# Task 4.3: Date-Based Sorting
transactions_chronological <- transactions %>%
  arrange(TransactionDate)

# Display first 5 transactions chronologically
cat("Earliest transactions:\n")
head(transactions_chronological %>% select(TransactionDate, CustomerName, ProductName, TotalAmount), 5)

Earliest transactions:


TransactionDate,CustomerName,ProductName,TotalAmount
<date>,<chr>,<chr>,<dbl>
2024-01-01,Customer 62,Sony Headphones,1416.35
2024-01-01,Customer 99,Sony Headphones,1326.27
2024-01-02,Customer 83,Nike Shoes,808.93
2024-01-02,Customer 61,Adidas Jacket,502.72
2024-01-02,Customer 69,Adidas Jacket,277.7


## Part 5: Chaining Operations

Combine multiple dplyr operations using the pipe operator.

In [18]:
# Task 5.1: Simple Chain
premium_purchases <- transactions %>%
  filter(TotalAmount > 75) %>%
  select(CustomerName, ProductName, TotalAmount, CustomerCity) %>%
  arrange(desc(TotalAmount))

# Display results
cat("Premium purchases (>$75):\n")
head(premium_purchases, 10)

Premium purchases (>$75):


CustomerName,ProductName,TotalAmount,CustomerCity
<chr>,<chr>,<dbl>,<chr>
Customer 60,Sony Headphones,1499.52,New York
Customer 28,Dell Laptop,1491.96,New York
Customer 81,Sony Headphones,1491.62,Phoenix
Customer 79,iPhone 14,1488.95,Phoenix
Customer 20,HP Printer,1487.44,Phoenix
Customer 18,iPhone 14,1487.43,Chicago
Customer 83,iPhone 14,1484.31,Philadelphia
Customer 20,HP Printer,1476.49,Chicago
Customer 10,Dell Laptop,1473.27,Houston
Customer 14,iPhone 14,1471.59,Phoenix


In [19]:
# Task 5.2: Complex Chain
recent_tech_purchases <- transactions %>%
  filter(ProductCategory %in% c("Electronics", "Computers")) %>%
  select(TransactionDate, CustomerName, ProductName, TotalAmount) %>%
  arrange(desc(TransactionDate), desc(TotalAmount)) %>%
  head(20)

# Display results
cat("Recent tech purchases (top 20):\n")
print(recent_tech_purchases)

Recent tech purchases (top 20):
# A tibble: 20 × 4
   TransactionDate CustomerName ProductName     TotalAmount
   <date>          <chr>        <chr>                 <dbl>
 1 2024-12-30      Customer 27  Sony Headphones        540.
 2 2024-12-30      Customer 33  Sony Headphones        207.
 3 2024-12-25      Customer 39  Samsung TV             121.
 4 2024-12-24      Customer 25  Sony Headphones        700.
 5 2024-12-23      Customer 85  iPhone 14             1433.
 6 2024-12-19      Customer 73  Dell Laptop           1221.
 7 2024-12-13      Customer 17  Dell Laptop            141.
 8 2024-12-12      Customer 6   Nike Shoes             247.
 9 2024-12-11      Customer 54  Adidas Jacket         1232.
10 2024-12-10      Customer 19  Nike Shoes             193.
11 202

In [20]:
# Task 5.3: Business Intelligence Chain
high_value_customers <- transactions %>%
  filter(TotalAmount > 200) %>%
  select(CustomerName, CustomerCity, ProductName, TotalAmount) %>%
  arrange(CustomerName, desc(TotalAmount))

# Display results
cat("High-value customers:\n")
head(high_value_customers, 15)

High-value customers:


CustomerName,CustomerCity,ProductName,TotalAmount
<chr>,<chr>,<chr>,<dbl>
Customer 1,Houston,Adidas Jacket,1132.28
Customer 1,Los Angeles,HP Printer,1075.9
Customer 1,New York,Sony Headphones,1004.78
Customer 1,Phoenix,Sony Headphones,834.08
Customer 1,New York,Sony Headphones,584.87
Customer 1,Phoenix,Samsung TV,347.76
Customer 1,Houston,HP Printer,259.44
Customer 10,Houston,Dell Laptop,1473.27
Customer 10,Houston,HP Printer,769.09
Customer 10,Houston,Samsung TV,488.05


## Part 6: Data Analysis Questions

Answer the following questions using the datasets you've created.

In [21]:
# Question 6.1: Transaction Volume
# Count transactions in each filtered dataset

cat("Transaction counts by dataset:\n")
cat("High value transactions:", nrow(high_value_transactions), "\n")
cat("Electronics transactions:", nrow(electronics_transactions), "\n")
cat("NY bulk purchases:", nrow(ny_bulk_purchases), "\n")
cat("Entertainment transactions:", nrow(entertainment_transactions), "\n")
cat("March transactions:", nrow(march_transactions), "\n")
cat("Premium purchases:", nrow(premium_purchases), "\n")
cat("Recent tech purchases:", nrow(recent_tech_purchases), "\n")
cat("High value customers:", nrow(high_value_customers), "\n")

Transaction counts by dataset:
High value transactions: 470 
Electronics transactions: 93 
NY bulk purchases: 75 
Entertainment transactions: 227 
March transactions: 41 
Premium purchases: 483 
Recent tech purchases: 20 
High value customers: 434 


In [22]:
# Question 6.2: Top Customers
# Find the customer who appears most frequently in high_value_customers

if (nrow(high_value_customers) > 0) {
  customer_frequency <- high_value_customers %>%
    count(CustomerName, sort = TRUE) %>%  # rows per customer, most frequent first
    slice_max(n, n = 1)                   # keep the highest count (ties are all kept)

  cat("Most frequent high-value customer:\n")
  print(customer_frequency)
} else {
  cat("No high-value customers found\n")
}


In [None]:
# Question 6.3: Product Analysis
# Find top 5 most expensive transactions in entertainment_transactions

if (nrow(entertainment_transactions) > 0) {
  top_entertainment <- entertainment_transactions %>%
    arrange(desc(TotalAmount)) %>%  # most expensive transactions first
    head(5)                         # keep the top 5

  cat("Top 5 most expensive entertainment transactions:\n")
  print(top_entertainment)
} else {
  cat("No entertainment transactions found\n")
}

In [None]:
# Question 6.4: Geographic Analysis
# Find the city with the highest single transaction amount

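highest_transaction_by_city <- transactions_by_city_amount %>%
  arrange(desc(TotalAmount)) %>%         # largest transaction overall first
  select(CustomerCity, TotalAmount) %>%  # keep only the city and the amount
  head(1)                                # the single highest transaction

cat("City with highest single transaction:\n")
print(highest_transaction_by_city)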

## Part 7: Reflection Questions

Please answer the following questions in the markdown cells below.

### Question 7.1: Pipe Operator Benefits

**How does using the pipe operator (`%>%`) improve code readability compared to nested function calls? Provide a specific example from your homework.**

The pipe operator makes code much more readable by allowing operations to flow from left to right, matching how we naturally think about data transformations. Instead of nesting functions inside each other, we can chain operations in a logical sequence. For example, in Task 5.1, I used `transactions %>% filter(TotalAmount > 75) %>% select(CustomerName, ProductName, TotalAmount, CustomerCity) %>% arrange(desc(TotalAmount))` which reads like instructions: "take transactions, then filter for amounts over $75, then select specific columns, then sort by amount descending." Without pipes, this would be `arrange(select(filter(transactions, TotalAmount > 75), CustomerName, ProductName, TotalAmount, CustomerCity), desc(TotalAmount))` which is much harder to read because you have to work from the inside out.
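
To check that the piped and nested forms described above really are interchangeable, the sketch below runs both on a hypothetical four-row tibble (`demo`, not the homework data) and compares the results:

```r
library(dplyr)

# Hypothetical stand-in with the four columns used in Task 5.1
demo <- tibble(
  CustomerName = c("Customer 39", "Customer 83", "Customer 100", "Customer 98"),
  ProductName  = c("Adidas Jacket", "HP Printer", "Nike Shoes", "Adidas Jacket"),
  TotalAmount  = c(632.39, 78.71, 32.29, 1289.24),
  CustomerCity = c("Chicago", "New York", "Phoenix", "Chicago")
)

# Piped form: reads in the order the steps are applied
piped <- demo %>%
  filter(TotalAmount > 75) %>%
  select(CustomerName, ProductName, TotalAmount, CustomerCity) %>%
  arrange(desc(TotalAmount))

# Nested form: the same steps, read from the innermost call outward
nested <- arrange(
  select(
    filter(demo, TotalAmount > 75),
    CustomerName, ProductName, TotalAmount, CustomerCity
  ),
  desc(TotalAmount)
)

# Compare the two results
all.equal(piped, nested)
```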


### Question 7.2: Filtering Strategy

**When filtering data for business analysis, what are the trade-offs between being very specific (many conditions) versus being more general (fewer conditions)? How might this affect your insights?**

Being very specific with multiple filter conditions gives you precise, targeted results but risks excluding relevant data and may result in very small sample sizes that aren't statistically meaningful. For example, filtering for high-value electronics purchases in New York gives specific insights but might miss important patterns in other cities or categories. Being more general with fewer conditions provides broader insights and larger sample sizes, making patterns more reliable, but may include noise or irrelevant data that obscures key findings. The best approach depends on your business question: use specific filters when you need actionable insights for a particular segment (like targeting high-value customers in a specific city), and use general filters when exploring overall trends or identifying new opportunities. I found that starting general and then adding specific conditions helped me understand both the big picture and the details.
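
The "start general, then add conditions" approach can be sketched by tracking how the row count shrinks at each step. The tibble `demo` and its values below are hypothetical and serve only to illustrate the trade-off.

```r
library(dplyr)

# Hypothetical stand-in with six transactions
demo <- tibble(
  CustomerCity    = c("New York", "New York", "Chicago", "New York", "Houston", "New York"),
  ProductCategory = c("Electronics", "Clothing", "Electronics", "Clothing", "Electronics", "Books"),
  TotalAmount     = c(1200, 90, 800, 150, 950, 40)
)

# Step 1: one general condition
general <- demo %>% filter(TotalAmount > 100)

# Step 2: add a category condition
narrower <- general %>% filter(ProductCategory == "Electronics")

# Step 3: add a city condition
specific <- narrower %>% filter(CustomerCity == "New York")

# Row counts after each step show how quickly the sample shrinks
step_counts <- c(general = nrow(general),
                 narrower = nrow(narrower),
                 specific = nrow(specific))
step_counts
```

Watching the counts this way makes it easier to notice when a filter has become so narrow that the remaining rows can no longer support a reliable conclusion.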


### Question 7.3: Sorting Importance

**Why is data sorting important in business analytics? Provide three specific business scenarios where sorting data would be crucial for decision-making.**

Data sorting is crucial because it helps identify patterns, prioritize actions, and make data-driven decisions quickly. Here are three specific scenarios:

1. **Customer Prioritization**: Sorting customers by total spending (descending) helps identify your most valuable customers who should receive priority service, exclusive offers, or personal account management. In my analysis, sorting by TotalAmount surfaced the customers behind the largest individual transactions, which is a first step toward focusing retention efforts where they matter most.

2. **Inventory Management**: Sorting products by sales volume and date helps identify fast-moving items that need frequent restocking versus slow-moving items that tie up capital. By sorting transactions chronologically and by product category, managers can spot seasonal trends and adjust inventory levels before stockouts or overstock situations occur.

3. **Performance Monitoring**: Sorting sales by date and amount helps track daily, weekly, or monthly performance trends. Sorting by city and amount (as in Task 4.2) reveals which geographic markets are performing best, helping executives decide where to expand operations, increase marketing spend, or investigate underperformance. This geographic sorting was particularly useful for identifying regional opportunities in my homework analysis.

 

### Question 7.4: Real-World Application

**Describe a real business scenario where you might need to combine `select()`, `filter()`, and `arrange()` operations. What insights would you be trying to gain?**

A real-world scenario would be analyzing customer churn risk for a subscription business. I would start by filtering for customers whose subscription renewal is coming up in the next 30 days and whose recent purchase frequency has declined (filter for specific date range and low activity). Then I would select only the relevant columns like CustomerID, CustomerName, LastPurchaseDate, TotalSpent, and SubscriptionEndDate to focus on actionable information. Finally, I would arrange by TotalSpent descending to prioritize high-value customers who are at risk of churning. This combined analysis would help the retention team identify which valuable customers need immediate outreach, what their purchase history looks like, and when to contact them before their subscription expires. The insights gained would directly inform retention campaigns, personalized offers, and resource allocation for the customer success team. This is similar to what I did in Task 5.3 where I identified high-value customers and sorted them to prioritize business actions.
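
The churn scenario above can be sketched as a single chain. Everything in this example is hypothetical: the `subscribers` tibble, its values, the fixed reference date `today`, and the thresholds (renewal within 30 days, no purchase in the last 60 days) are assumptions made for illustration.

```r
library(dplyr)

# Fixed reference date so the example is reproducible (assumption)
today <- as.Date("2024-06-01")

# Hypothetical subscriber table using the column names from the answer above
subscribers <- tibble(
  CustomerID          = 1:4,
  CustomerName        = c("Customer A", "Customer B", "Customer C", "Customer D"),
  LastPurchaseDate    = as.Date(c("2024-02-10", "2024-05-20", "2024-01-15", "2024-03-01")),
  TotalSpent          = c(2400, 1800, 5200, 300),
  SubscriptionEndDate = as.Date(c("2024-06-15", "2024-06-20", "2024-06-28", "2024-09-01"))
)

at_risk <- subscribers %>%
  # Renewal falls within the next 30 days and the last purchase is over 60 days old
  filter(SubscriptionEndDate >= today,
         SubscriptionEndDate <= today + 30,
         LastPurchaseDate < today - 60) %>%
  # Keep only the columns the retention team needs
  select(CustomerID, CustomerName, LastPurchaseDate, TotalSpent, SubscriptionEndDate) %>%
  # Highest-value customers first
  arrange(desc(TotalSpent))

at_risk
```

In practice, `today` would probably be replaced with `Sys.Date()`, and the thresholds would be tuned to the business's renewal cycle.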


## Summary and Submission

### What You've Learned

In this homework, you've practiced:
- Using `select()` for column selection with various methods
- Using `filter()` for row filtering with single and multiple conditions
- Using `arrange()` for sorting data by single and multiple columns
- Chaining operations with the pipe operator (`%>%`)
- Analyzing business data to generate insights

### Submission Checklist

Before submitting, ensure you have:
- [ ] Completed all code tasks
- [ ] Run all cells successfully
- [ ] Answered all reflection questions
- [ ] Used proper commenting in your code
- [ ] Used the pipe operator where appropriate
- [ ] Verified your results make sense

### Next Steps

In the next lesson, you'll learn about:
- `mutate()` for creating new columns
- `summarize()` for calculating summary statistics
- `group_by()` for grouped operations
- Advanced data transformation techniques