# Homework Assignment - Lesson 3: Data Transformation with dplyr - Part 1

**Student Name:** [Deon Schoeman]

**Due Date:** [09/21/2025]

**Objective:** Learn to use dplyr functions (`select()`, `filter()`, `arrange()`) and the pipe operator (`%>%`) for data transformation and analysis.

---

## Instructions

- Complete all tasks in this notebook
- Use the pipe operator (`%>%`) wherever possible to chain operations
- Ensure your code is well-commented and easy to understand
- Run all cells to verify your code works correctly
- Answer all reflection questions at the end

---

## Part 1: Data Import and Setup

In this section, you'll import the retail transactions dataset and perform initial exploration.

**Dataset:** `retail_transactions.csv` - This dataset contains transaction records from a retail business with information about customers, products, dates, amounts, and quantities.

In [None]:
# Load required libraries (the dplyr verbs and the %>% pipe come from the tidyverse)
library(tidyverse)

# Set working directory if needed
# setwd("/workspaces/Assignment-3-Data-Transformation-with-dplyr---Part-1/data/")


# Task 1.1: Import the retail_transactions.csv file
# Create a data frame named 'transactions'

# Your code here:
transactions <- read.csv("retail_transactions.csv")

# Display success message
cat("Data imported successfully!\n")
cat("Dataset dimensions:", nrow(transactions), "rows x", ncol(transactions), "columns\n")

Data imported successfully!
Dataset dimensions: 500 rows x 9 columns


In [6]:
# Task 1.2: Initial Exploration

# Display the first 10 rows
cat("First 10 rows of the dataset:\n")
head(transactions, n = 10)


# Check the structure of the dataset
cat("\nDataset structure:\n")
str(transactions)


# Display column names and their data types
cat("\nColumn names:\n")
sapply(transactions, class)

First 10 rows of the dataset:


| | TransactionID | CustomerID | CustomerName | CustomerCity | ProductName | ProductCategory | TotalAmount | Quantity | TransactionDate |
|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<chr>` |
| 1 | 1 | 81 | Customer 39 | Chicago | Adidas Jacket | Clothing | 632.39 | 3 | 2024-03-09 |
| 2 | 2 | 13 | Customer 63 | Philadelphia | Samsung TV | Music | 114.28 | 3 | 2024-12-08 |
| 3 | 3 | 18 | Customer 98 | Chicago | Adidas Jacket | Computers | 1289.24 | 7 | 2024-01-22 |
| 4 | 4 | 76 | Customer 39 | Houston | Dell Laptop | Computers | 885.4 | 2 | 2024-07-02 |
| 5 | 5 | 86 | Customer 45 | New York | Nike Shoes | Computers | 95.95 | 5 | 2024-08-13 |
| 6 | 6 | 37 | Customer 8 | Philadelphia | Adidas Jacket | Electronics | 1126.34 | 2 | 2024-04-15 |
| 7 | 7 | 45 | Customer 83 | New York | HP Printer | Clothing | 78.71 | 3 | 2024-05-02 |
| 8 | 8 | 11 | Customer 60 | Chicago | Samsung TV | Music | 871.93 | 3 | 2024-04-30 |
| 9 | 9 | 13 | Customer 69 | Houston | iPhone 14 | Music | 1347.56 | 8 | 2024-08-08 |
| 10 | 10 | 55 | Customer 24 | Chicago | Sony Headphones | Books | 633.51 | 1 | 2024-06-23 |



Dataset structure:
'data.frame':	500 obs. of  9 variables:
 $ TransactionID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID     : int  81 13 18 76 86 37 45 11 13 55 ...
 $ CustomerName   : chr  "Customer 39" "Customer 63" "Customer 98" "Customer 39" ...
 $ CustomerCity   : chr  "Chicago" "Philadelphia" "Chicago" "Houston" ...
 $ ProductName    : chr  "Adidas Jacket" "Samsung TV" "Adidas Jacket" "Dell Laptop" ...
 $ ProductCategory: chr  "Clothing" "Music" "Computers" "Computers" ...
 $ TotalAmount    : num  632 114 1289 885 96 ...
 $ Quantity       : int  3 3 7 2 5 2 3 3 8 1 ...
 $ TransactionDate: chr  "2024-03-09" "2024-12-08" "2024-01-22" "2024-07-02" ...

Column names:


## Part 2: Column Selection with `select()`

Practice different methods of selecting columns from your dataset.

In [7]:
# Task 2.1: Basic Selection
# Create 'basic_info' with TransactionID, CustomerID, ProductName, and TotalAmount

basic_info <- transactions %>%
  select(TransactionID, CustomerID, ProductName, TotalAmount)


# Display the result
cat("Basic info dataset (first 5 rows):\n")
head(basic_info, 5)

Basic info dataset (first 5 rows):


| | TransactionID | CustomerID | ProductName | TotalAmount |
|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<dbl>` |
| 1 | 1 | 81 | Adidas Jacket | 632.39 |
| 2 | 2 | 13 | Samsung TV | 114.28 |
| 3 | 3 | 18 | Adidas Jacket | 1289.24 |
| 4 | 4 | 76 | Dell Laptop | 885.4 |
| 5 | 5 | 86 | Nike Shoes | 95.95 |


In [8]:
# Task 2.2: Range Selection
# Create 'customer_details' with all columns from CustomerID to CustomerCity (inclusive)

customer_details <- transactions %>%
  select(CustomerID:CustomerCity)


# Display the result
cat("Customer details (first 5 rows):\n")
head(customer_details, 5)

Customer details (first 5 rows):


| | CustomerID | CustomerName | CustomerCity |
|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` |
| 1 | 81 | Customer 39 | Chicago |
| 2 | 13 | Customer 63 | Philadelphia |
| 3 | 18 | Customer 98 | Chicago |
| 4 | 76 | Customer 39 | Houston |
| 5 | 86 | Customer 45 | New York |


In [9]:
# Task 2.3: Pattern-Based Selection

# Create 'date_columns' with columns starting with "Date" or "Time"
date_columns <- transactions %>%
  select(starts_with("date"), starts_with("time"))


# Create 'amount_columns' with columns containing the word "Amount"
amount_columns <- transactions %>%
  select(contains("Amount"))


# Display column names for verification
cat("Date/Time columns:", names(date_columns), "\n")
cat("Amount columns:", names(amount_columns), "\n")

Date/Time columns:  
Amount columns: TotalAmount 
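
The empty Date/Time result follows from how the selection helpers match: `starts_with()` only matches a prefix, and the one date column in this dataset, `TransactionDate`, has "Date" at the end of its name. A minimal sketch on a made-up one-row data frame (`toy` is hypothetical; only its column names mirror the dataset) compares the helpers:

```r
library(dplyr)

# Hypothetical one-row data frame whose column names mirror the dataset
toy <- data.frame(
  TransactionID   = 1L,
  TotalAmount     = 632.39,
  TransactionDate = "2024-03-09"
)

# starts_with() only matches a prefix, so nothing is selected here
prefix_match <- toy %>% select(starts_with("Date"), starts_with("Time"))

# ends_with() and contains() both pick up TransactionDate
suffix_match <- toy %>% select(ends_with("Date"))
substr_match <- toy %>% select(contains("Date"))

names(prefix_match)  # character(0)
names(suffix_match)  # "TransactionDate"
names(substr_match)  # "TransactionDate"
```

Whether `ends_with()` or `contains()` is the better fit depends on how strictly the task's "starting with" wording is meant to be read.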


In [10]:
# Task 2.4: Exclusion Selection
# Create 'no_ids' without TransactionID and CustomerID columns

no_ids <- transactions %>%
  select(-TransactionID, -CustomerID)


# Display column names for verification
cat("Columns after removing IDs:", names(no_ids), "\n")
cat("Number of columns:", ncol(no_ids), "\n")

Columns after removing IDs: CustomerName CustomerCity ProductName ProductCategory TotalAmount Quantity TransactionDate 
Number of columns: 7 


## Part 3: Row Filtering with `filter()`

Learn to filter rows based on various conditions.

In [13]:
# Task 3.1: Single Condition Filtering

# Filter transactions with TotalAmount > $100
high_value_transactions <- transactions %>%
  filter(TotalAmount > 100)


# Filter transactions from "Electronics" category 
electronics_transactions <- transactions %>%
  filter(ProductCategory == "Electronics")


# Display results
cat("High value transactions (>$100):", nrow(high_value_transactions), "rows\n")
cat("Electronics transactions:", nrow(electronics_transactions), "rows\n")

High value transactions (>$100): 470 rows
Electronics transactions: 93 rows


In [14]:
# Task 3.2: Multiple Condition Filtering (AND)
# Filter for TotalAmount > $50 AND Quantity > 1 AND CustomerCity == "New York"

ny_bulk_purchases <- transactions %>%
  filter(TotalAmount > 50, CustomerCity == "New York")


# Display results
cat("NY bulk purchases:", nrow(ny_bulk_purchases), "rows\n")
if(nrow(ny_bulk_purchases) > 0) {
  head(ny_bulk_purchases)
}

NY bulk purchases: 90 rows


| | TransactionID | CustomerID | CustomerName | CustomerCity | ProductName | ProductCategory | TotalAmount | Quantity | TransactionDate |
|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<chr>` |
| 1 | 5 | 86 | Customer 45 | New York | Nike Shoes | Computers | 95.95 | 5 | 2024-08-13 |
| 2 | 7 | 45 | Customer 83 | New York | HP Printer | Clothing | 78.71 | 3 | 2024-05-02 |
| 3 | 15 | 1 | Customer 52 | New York | Nike Shoes | Clothing | 602.79 | 5 | 2024-01-05 |
| 4 | 17 | 2 | Customer 38 | New York | Sony Headphones | Electronics | 520.91 | 1 | 2024-06-16 |
| 5 | 22 | 80 | Customer 4 | New York | iPhone 14 | Books | 1424.99 | 7 | 2024-02-01 |
| 6 | 25 | 97 | Customer 17 | New York | iPhone 14 | Movies | 999.24 | 2 | 2024-02-03 |
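
The task description lists three conditions, while the filter as run applies two of them (`TotalAmount` and `CustomerCity`), which is why a row with `Quantity` 1 appears in the output above. A sketch with all three conditions, using a small made-up data frame (`toy_tx` is hypothetical, not the homework data):

```r
library(dplyr)

# Hypothetical transactions; only the columns needed for the filter
toy_tx <- data.frame(
  TransactionID = 1:4,
  CustomerCity  = c("New York", "New York", "Chicago", "New York"),
  TotalAmount   = c(95.95, 520.91, 632.39, 40.00),
  Quantity      = c(5L, 1L, 3L, 2L)
)

# Comma-separated conditions inside filter() are combined with AND
ny_bulk <- toy_tx %>%
  filter(TotalAmount > 50, Quantity > 1, CustomerCity == "New York")

ny_bulk$TransactionID  # 1
```

On the real data, adding `Quantity > 1` would be expected to lower the row count relative to the 90 reported here.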


In [18]:
# Task 3.3: Multiple Condition Filtering (OR)
# Filter for ProductCategory = "Books" OR "Music" OR "Movies"

entertainment_transactions <- transactions %>%
  filter(ProductCategory %in% c("Books", "Music", "Movies"))


# Display results
cat("Entertainment transactions:", nrow(entertainment_transactions), "rows\n")
if(nrow(entertainment_transactions) > 0) {
  head(entertainment_transactions)
}

Entertainment transactions: 227 rows


| | TransactionID | CustomerID | CustomerName | CustomerCity | ProductName | ProductCategory | TotalAmount | Quantity | TransactionDate |
|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<chr>` |
| 1 | 2 | 13 | Customer 63 | Philadelphia | Samsung TV | Music | 114.28 | 3 | 2024-12-08 |
| 2 | 8 | 11 | Customer 60 | Chicago | Samsung TV | Music | 871.93 | 3 | 2024-04-30 |
| 3 | 9 | 13 | Customer 69 | Houston | iPhone 14 | Music | 1347.56 | 8 | 2024-08-08 |
| 4 | 10 | 55 | Customer 24 | Chicago | Sony Headphones | Books | 633.51 | 1 | 2024-06-23 |
| 5 | 11 | 100 | Customer 95 | Philadelphia | HP Printer | Movies | 572.43 | 6 | 2024-12-09 |
| 6 | 14 | 19 | Customer 100 | Phoenix | Nike Shoes | Books | 32.29 | 3 | 2024-12-11 |


In [20]:
# Task 3.4: Date-Based Filtering
# Filter transactions from March 2024
# Note: Adjust the date format and column name based on your actual data

march_transactions <- transactions %>%
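  # TransactionDate is stored as character, so convert it explicitly before comparing
  filter(as.Date(TransactionDate) >= as.Date("2024-03-01"),
         as.Date(TransactionDate) <= as.Date("2024-03-31"))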


# Display results
cat("March 2024 transactions:", nrow(march_transactions), "rows\n")

March 2024 transactions: 41 rows
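
Because `TransactionDate` is read in as character (`chr`), it is worth converting it with `as.Date()` before comparing it against dates. A sketch of the same March filter written with dplyr's `between()`, assuming ISO `YYYY-MM-DD` strings as in the structure output (`toy_dates` is a hypothetical data frame):

```r
library(dplyr)

# Hypothetical dates around the March boundaries
toy_dates <- data.frame(
  TransactionID   = 1:4,
  TransactionDate = c("2024-02-29", "2024-03-01", "2024-03-31", "2024-04-01")
)

# between() is inclusive on both ends
march_only <- toy_dates %>%
  filter(between(as.Date(TransactionDate),
                 as.Date("2024-03-01"),
                 as.Date("2024-03-31")))

march_only$TransactionID  # 2 3
```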


In [23]:
# Task 3.5: Advanced Filtering Challenge
# Find customers who made purchases in both "Electronics" AND "Clothing" categories
# Hint: This requires identifying customers who appear in both categories

# Step 1: Find customers who bought Electronics
electronics_customers <- transactions %>%
  filter(ProductCategory == "Electronics")


# Step 2: Find customers who bought Clothing
clothing_customers <- transactions %>%
  filter(ProductCategory == "Clothing")


# Step 3: Find customers who bought both
both_categories_customers <- inner_join(electronics_customers, clothing_customers)


# Display results
cat("Customers who bought both Electronics and Clothing:", length(both_categories_customers), "customers\n")

[1m[22mJoining with `by = join_by(TransactionID, CustomerID, CustomerName,
CustomerCity, ProductName, ProductCategory, TotalAmount, Quantity,
TransactionDate)`


Customers who bought both Electronics and Clothing: 9 customers
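
Two details in the code above are worth noting. `inner_join()` without a `by` argument joins on every shared column, so it only matches rows that are identical in both inputs, and `length()` on a data frame returns its number of columns, which is where the 9 comes from. One possible way to get the customers themselves is to intersect the two ID vectors. A sketch on a made-up data frame (`toy_tx` is hypothetical, and treating `CustomerID` as the customer key is an assumption):

```r
library(dplyr)

# Hypothetical purchases: customers 1 and 3 bought in both categories
toy_tx <- data.frame(
  CustomerID      = c(1L, 1L, 2L, 3L, 3L, 4L),
  ProductCategory = c("Electronics", "Clothing", "Electronics",
                      "Clothing", "Electronics", "Books")
)

electronics_ids <- toy_tx %>%
  filter(ProductCategory == "Electronics") %>%
  pull(CustomerID)

clothing_ids <- toy_tx %>%
  filter(ProductCategory == "Clothing") %>%
  pull(CustomerID)

# intersect() keeps each ID that appears in both vectors, once
both_ids <- intersect(electronics_ids, clothing_ids)

both_ids          # 1 3
length(both_ids)  # 2
```

Here `length()` is applied to a vector, so it counts customers rather than columns.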


## Part 4: Data Sorting with `arrange()`

Practice sorting data by single and multiple columns.

In [24]:
# Task 4.1: Single Column Sorting

# Sort by TotalAmount ascending
transactions_by_amount_asc <- transactions %>%
  arrange(TotalAmount)


# Sort by TotalAmount descending
transactions_by_amount_desc <- transactions %>%
  arrange(desc(TotalAmount))


# Display top 5 of each
cat("Lowest amounts:\n")
head(transactions_by_amount_asc %>% select(CustomerName, ProductName, TotalAmount), 5)

cat("\nHighest amounts:\n")
head(transactions_by_amount_desc %>% select(CustomerName, ProductName, TotalAmount), 5)

Lowest amounts:


| | CustomerName | ProductName | TotalAmount |
|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` |
| 1 | Customer 95 | Adidas Jacket | 27.66 |
| 2 | Customer 100 | Nike Shoes | 32.29 |
| 3 | Customer 50 | Adidas Jacket | 35.01 |
| 4 | Customer 83 | Samsung TV | 36.37 |
| 5 | Customer 69 | Dell Laptop | 37.33 |



Highest amounts:


| | CustomerName | ProductName | TotalAmount |
|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` |
| 1 | Customer 60 | Sony Headphones | 1499.52 |
| 2 | Customer 28 | Dell Laptop | 1491.96 |
| 3 | Customer 81 | Sony Headphones | 1491.62 |
| 4 | Customer 79 | iPhone 14 | 1488.95 |
| 5 | Customer 20 | HP Printer | 1487.44 |


In [25]:
# Task 4.2: Multiple Column Sorting
# Sort by CustomerCity (ascending), then by TotalAmount (descending)

transactions_by_city_amount <- transactions %>%
  arrange(CustomerCity, desc(TotalAmount))


# Display first 10 rows
cat("Transactions sorted by city, then amount:\n")
head(transactions_by_city_amount %>% select(CustomerCity, CustomerName, ProductName, TotalAmount), 10)

Transactions sorted by city, then amount:


| | CustomerCity | CustomerName | ProductName | TotalAmount |
|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | Chicago | Customer 18 | iPhone 14 | 1487.43 |
| 2 | Chicago | Customer 20 | HP Printer | 1476.49 |
| 3 | Chicago | Customer 97 | Dell Laptop | 1459.42 |
| 4 | Chicago | Customer 49 | Dell Laptop | 1453.94 |
| 5 | Chicago | Customer 70 | Nike Shoes | 1428.35 |
| 6 | Chicago | Customer 71 | Samsung TV | 1424.38 |
| 7 | Chicago | Customer 62 | Sony Headphones | 1416.35 |
| 8 | Chicago | Customer 11 | Nike Shoes | 1407.51 |
| 9 | Chicago | Customer 99 | HP Printer | 1388.73 |
| 10 | Chicago | Customer 49 | Nike Shoes | 1370.57 |


In [28]:
# Task 4.3: Date-Based Sorting
# Sort by TransactionDate chronologically (oldest first)

transactions_chronological <- transactions %>%
  arrange(TransactionDate)


# Display first 5 transactions chronologically
cat("Earliest transactions:\n")
head(transactions_chronological %>% select(TransactionDate, CustomerName, ProductName, TotalAmount), 5)

Earliest transactions:


| | TransactionDate | CustomerName | ProductName | TotalAmount |
|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | 2024-01-01 | Customer 62 | Sony Headphones | 1416.35 |
| 2 | 2024-01-01 | Customer 99 | Sony Headphones | 1326.27 |
| 3 | 2024-01-02 | Customer 83 | Nike Shoes | 808.93 |
| 4 | 2024-01-02 | Customer 61 | Adidas Jacket | 502.72 |
| 5 | 2024-01-02 | Customer 69 | Adidas Jacket | 277.7 |


## Part 5: Chaining Operations

Combine multiple dplyr operations using the pipe operator.

In [31]:
# Task 5.1: Simple Chain
# Filter TotalAmount > $75, select specific columns, arrange by TotalAmount descending

premium_purchases <- transactions %>%
  filter(TotalAmount > 75) %>%
  select(TransactionDate, CustomerName, ProductCategory, ProductName, TotalAmount) %>%
  arrange(desc(TotalAmount))


# Display results
cat("Premium purchases (>$75):\n")
head(premium_purchases, 10)

Premium purchases (>$75):


| | TransactionDate | CustomerName | ProductCategory | ProductName | TotalAmount |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | 2024-03-03 | Customer 60 | Electronics | Sony Headphones | 1499.52 |
| 2 | 2024-08-15 | Customer 28 | Computers | Dell Laptop | 1491.96 |
| 3 | 2024-10-18 | Customer 81 | Books | Sony Headphones | 1491.62 |
| 4 | 2024-10-23 | Customer 79 | Computers | iPhone 14 | 1488.95 |
| 5 | 2024-05-02 | Customer 20 | Electronics | HP Printer | 1487.44 |
| 6 | 2024-07-15 | Customer 18 | Computers | iPhone 14 | 1487.43 |
| 7 | 2024-12-24 | Customer 83 | Movies | iPhone 14 | 1484.31 |
| 8 | 2024-05-29 | Customer 20 | Electronics | HP Printer | 1476.49 |
| 9 | 2024-04-14 | Customer 10 | Books | Dell Laptop | 1473.27 |
| 10 | 2024-11-16 | Customer 14 | Computers | iPhone 14 | 1471.59 |


In [32]:
# Task 5.2: Complex Chain
# Filter for Electronics/Computers, select columns, arrange by date/amount, keep top 20

recent_tech_purchases <- transactions %>%
  filter(ProductCategory %in% c("Electronics", "Computers")) %>%
  select(TransactionDate, TotalAmount, CustomerName, ProductCategory, ProductName) %>%
  arrange(TransactionDate, TotalAmount) %>%
  head(20)



# Display results
cat("Recent tech purchases (top 20):\n")
print(recent_tech_purchases)

Recent tech purchases (top 20):
   TransactionDate TotalAmount CustomerName ProductCategory     ProductName
1       2024-01-01     1326.27  Customer 99     Electronics Sony Headphones
2       2024-01-01     1416.35  Customer 62     Electronics Sony Headphones
3       2024-01-02      277.70  Customer 69     Electronics   Adidas Jacket
4       2024-01-03      463.25   Customer 2     Electronics      HP Printer
5       2024-01-08      185.86  Customer 56       Computers      HP Printer
6       2024-01-08      462.73  Customer 76       Computers Sony Headphones
7       2024-01-12      997.09  Customer 76       Computers Sony Headphones
8       2024-01-14       96.83  Customer 98     Electronics     Dell Laptop
9       2024-01-16      216.36  Customer 44     Electronics   Adidas Jacket
10      2024-01-16     1011.00  Customer 28       Computers   Adidas Jacket
11      2024-01-17      584.87   Customer 1     Electronics Sony Headphones
12      2024-01-18     1136.57  Customer 42       Comput
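
The name `recent_tech_purchases` suggests newest first, whereas `arrange(TransactionDate, TotalAmount)` sorts oldest first. If newest-first is what the task intends (an assumption, since the task only says "arrange by date/amount"), wrapping the columns in `desc()` is one option. A sketch on hypothetical data (`toy_tx`):

```r
library(dplyr)

# Hypothetical tech purchases with ISO-formatted date strings
toy_tx <- data.frame(
  TransactionDate = c("2024-01-01", "2024-12-24", "2024-07-15", "2024-12-24"),
  TotalAmount     = c(1326.27, 1484.31, 1487.43, 99.00)
)

# Newest date first; within a date, largest amount first
newest_first <- toy_tx %>%
  arrange(desc(TransactionDate), desc(TotalAmount)) %>%
  head(2)

newest_first$TotalAmount  # 1484.31 99.00
```

Sorting the strings works here only because `YYYY-MM-DD` text sorts in the same order as the dates it represents.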

In [35]:
# Task 5.3: Business Intelligence Chain
# Identify high-value repeat customers (TotalAmount > $200)

high_value_customers <- transactions %>%
  filter(TotalAmount > 200) %>%
  select(CustomerName, TotalAmount, ProductCategory, TransactionDate) %>%
  arrange(CustomerName, TransactionDate)


# Display results
cat("High-value customers:\n")
head(high_value_customers, 15)

High-value customers:


| | CustomerName | TotalAmount | ProductCategory | TransactionDate |
|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<chr>` | `<chr>` |
| 1 | Customer 1 | 584.87 | Electronics | 2024-01-17 |
| 2 | Customer 1 | 259.44 | Music | 2024-04-27 |
| 3 | Customer 1 | 834.08 | Computers | 2024-07-24 |
| 4 | Customer 1 | 1132.28 | Movies | 2024-10-02 |
| 5 | Customer 1 | 347.76 | Books | 2024-11-05 |
| 6 | Customer 1 | 1075.9 | Movies | 2024-12-21 |
| 7 | Customer 1 | 1004.78 | Clothing | 2024-12-22 |
| 8 | Customer 10 | 488.05 | Electronics | 2024-02-12 |
| 9 | Customer 10 | 1473.27 | Books | 2024-04-14 |
| 10 | Customer 10 | 769.09 | Clothing | 2024-09-18 |


## Part 6: Data Analysis Questions

Answer the following questions using the datasets you've created.

In [36]:
# Question 6.1: Transaction Volume
# Count transactions in each filtered dataset

cat("Transaction counts by dataset:\n")
cat("High value transactions:", nrow(high_value_transactions), "\n")
cat("Electronics transactions:", nrow(electronics_transactions), "\n")
cat("NY bulk purchases:", nrow(ny_bulk_purchases), "\n")
cat("Entertainment transactions:", nrow(entertainment_transactions), "\n")
cat("March transactions:", nrow(march_transactions), "\n")
cat("Premium purchases:", nrow(premium_purchases), "\n")
cat("Recent tech purchases:", nrow(recent_tech_purchases), "\n")
cat("High value customers:", nrow(high_value_customers), "\n")

Transaction counts by dataset:
High value transactions: 470 
Electronics transactions: 93 
NY bulk purchases: 90 
Entertainment transactions: 227 
March transactions: 41 
Premium purchases: 483 
Recent tech purchases: 20 
High value customers: 434 


In [42]:
# Question 6.2: Top Customers
# Find the customer who appears most frequently in high_value_customers

if(nrow(high_value_customers) > 0) {
  customer_frequency <- high_value_customers %>%
    count(high_value_customers$CustomerName)
  
  
  cat("Most frequent high-value customer:\n")
  print(customer_frequency)
} else {
  cat("No high-value customers found\n")
}

Most frequent high-value customer:
   high_value_customers$CustomerName  n
1                         Customer 1  7
2                        Customer 10  3
3                       Customer 100  2
4                        Customer 11  8
5                        Customer 12  3
6                        Customer 13  5
7                        Customer 14  4
8                        Customer 15  6
9                        Customer 16  4
10                       Customer 17  5
11                       Customer 18 12
12                       Customer 19  1
13                        Customer 2  3
14                       Customer 20  7
15                       Customer 21  6
16                       Customer 22  4
17                       Customer 23  6
18                       Customer 24  4
19                       Customer 25 10
20                       Customer 26  5
21                       Customer 27  5
22                       Customer 28  3
23                       Customer 29  4
24   
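
`count()` accepts bare column names, so the `high_value_customers$` prefix is not needed, and `sort = TRUE` puts the largest counts first so that the most frequent customer is the first row. A sketch on hypothetical data (`toy_hv` stands in for `high_value_customers`):

```r
library(dplyr)

# Hypothetical high-value rows; only the customer column matters here
toy_hv <- data.frame(
  CustomerName = c("Customer 18", "Customer 1", "Customer 18",
                   "Customer 25", "Customer 18", "Customer 25")
)

# Count rows per customer and sort by descending count
customer_frequency <- toy_hv %>%
  count(CustomerName, sort = TRUE)

head(customer_frequency, 1)  # Customer 18 with n = 3
```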

In [45]:
# Question 6.3: Product Analysis
# Find top 5 most expensive transactions in entertainment_transactions

if(nrow(entertainment_transactions) > 0) {
  top_entertainment <- entertainment_transactions %>%
    arrange(desc(TotalAmount)) %>%
    select(TotalAmount, Quantity, CustomerName, ProductCategory, TransactionDate) %>%
    head(5)

  
  
  cat("Top 5 most expensive entertainment transactions:\n")
  print(top_entertainment)
} else {
  cat("No entertainment transactions found\n")
}

Top 5 most expensive entertainment transactions:
  TotalAmount Quantity CustomerName ProductCategory TransactionDate
1     1491.62        3  Customer 81           Books      2024-10-18
2     1484.31        3  Customer 83          Movies      2024-12-24
3     1473.27        1  Customer 10           Books      2024-04-14
4     1459.42        3  Customer 97           Books      2024-10-28
5     1456.72        2  Customer 47           Books      2024-02-02


In [48]:
# Question 6.4: Geographic Analysis
# Find the city with the highest single transaction amount

highest_transaction_by_city <- transactions_by_city_amount %>%
  arrange(desc(TotalAmount)) %>%
  select(CustomerCity, TotalAmount, CustomerName, ProductCategory, TransactionDate) %>%
  head(1)


cat("City with highest single transaction:\n")
print(highest_transaction_by_city)

City with highest single transaction:
  CustomerCity TotalAmount CustomerName ProductCategory TransactionDate
1     New York     1499.52  Customer 60     Electronics      2024-03-03
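
An alternative to sorting and keeping the first row is dplyr's `slice_max()`, which keeps the row or rows with the largest value of a column. A sketch on hypothetical data (`toy_tx`):

```r
library(dplyr)

# Hypothetical per-city transactions
toy_tx <- data.frame(
  CustomerCity = c("Chicago", "New York", "Houston"),
  TotalAmount  = c(1487.43, 1499.52, 885.40)
)

# Keep the single row with the highest TotalAmount
top_city <- toy_tx %>%
  slice_max(TotalAmount, n = 1) %>%
  select(CustomerCity, TotalAmount)

top_city$CustomerCity  # "New York"
```

By default `slice_max()` keeps ties, so it can return more than one row when several transactions share the maximum.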


## Part 7: Reflection Questions

Please answer the following questions in the markdown cells below.

### Question 7.1: Pipe Operator Benefits

**How does using the pipe operator (`%>%`) improve code readability compared to nested function calls? Provide a specific example from your homework.**

Your answer here: In Task 5.2 (Complex Chain), there was a lot of filtering, selecting, and arranging. If I had to write that as one nested call on a single line, it would have gotten out of hand really quickly. I made a typo while writing that code, and it was much easier to find with the pipe operator because each step sits on its own line; hunting for it inside one long nested expression would not have been fun. In short, the pipe helps me and whoever comes after me read the code more easily and understand what is going on.
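
To make the comparison concrete, here is a sketch of the Task 5.2 steps written both ways on a small made-up data frame (`toy_tx` is hypothetical, and `head(2)` stands in for `head(20)`):

```r
library(dplyr)

# Hypothetical transactions with the columns used in the chain
toy_tx <- data.frame(
  ProductCategory = c("Electronics", "Books", "Computers", "Electronics"),
  TransactionDate = c("2024-03-03", "2024-04-14", "2024-01-08", "2024-01-01"),
  TotalAmount     = c(1499.52, 1473.27, 185.86, 1326.27)
)

# Nested: has to be read from the innermost call outward
nested <- head(
  arrange(
    select(
      filter(toy_tx, ProductCategory %in% c("Electronics", "Computers")),
      TransactionDate, TotalAmount
    ),
    TransactionDate, TotalAmount
  ),
  2
)

# Piped: reads top to bottom, one step per line
piped <- toy_tx %>%
  filter(ProductCategory %in% c("Electronics", "Computers")) %>%
  select(TransactionDate, TotalAmount) %>%
  arrange(TransactionDate, TotalAmount) %>%
  head(2)

identical(nested, piped)  # TRUE
```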


### Question 7.2: Filtering Strategy

**When filtering data for business analysis, what are the trade-offs between being very specific (many conditions) versus being more general (fewer conditions)? How might this affect your insights?**

Your answer here: It all depends on what I am trying to accomplish. If I narrow the data too much, there is a good chance I will miss key data or patterns somewhere else; however, that doesn't mean I should use all the data and clutter everything up. I think it's important to plan beforehand what the scope of the analysis is and what should be included or excluded.


### Question 7.3: Sorting Importance

**Why is data sorting important in business analytics? Provide three specific business scenarios where sorting data would be crucial for decision-making.**

Your answer here:

1. Sorting transactions by date in ascending order and counting the number of transactions might help a grocery store find its busiest times of the month and whether that pattern recurs from month to month. That would help ensure the store is properly stocked.
2. A company looking for its top 5 best-selling products, sales employees, cities, or customers.
3. A company looking to find its top operational expenses, either across the years or by its biggest investment in a particular area.

### Question 7.4: Real-World Application

**Describe a real business scenario where you might need to combine `select()`, `filter()`, and `arrange()` operations. What insights would you be trying to gain?**

Your answer here: A company wants to find its top operating expenses for the year, across different areas/cities, that are more than $500. I would select OpExpenseCategories, OpExpenseSubCategories, ExpenseAmount, City, LocationNameID, TransactionID, and TransactionDate. I would filter ExpenseAmount to be greater than $500. Then I would arrange the data by TransactionDate, ExpenseAmount, City, and LocationNameID, all ascending. This should lay out the bigger expenses neatly and show each city's and location's expenses from lowest to highest.
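
A minimal sketch of that chain, assuming an `expenses` data frame with the column names from the answer (the data frame and every value in it are hypothetical, and `OpExpenseSubCategories` is left out to keep the example short):

```r
library(dplyr)

# Hypothetical expense records
expenses <- data.frame(
  TransactionID       = 1:4,
  TransactionDate     = as.Date(c("2025-01-10", "2025-01-10",
                                  "2025-02-03", "2025-02-03")),
  City                = c("Houston", "Chicago", "Chicago", "Houston"),
  LocationNameID      = c("H1", "C1", "C2", "H1"),
  OpExpenseCategories = c("Rent", "Utilities", "Rent", "Supplies"),
  ExpenseAmount       = c(4200, 650, 3900, 120)
)

# Keep expenses over $500, pick the reporting columns, then sort
large_expenses <- expenses %>%
  filter(ExpenseAmount > 500) %>%
  select(OpExpenseCategories, ExpenseAmount, City,
         LocationNameID, TransactionID, TransactionDate) %>%
  arrange(TransactionDate, ExpenseAmount, City, LocationNameID)

large_expenses$TransactionID  # 2 1 3
```

Because `arrange()` sorts by its arguments in order, putting `City` first instead of `TransactionDate` would group each city's expenses together.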


## Summary and Submission

### What You've Learned

In this homework, you've practiced:
- Using `select()` for column selection with various methods
- Using `filter()` for row filtering with single and multiple conditions
- Using `arrange()` for sorting data by single and multiple columns
- Chaining operations with the pipe operator (`%>%`)
- Analyzing business data to generate insights

### Submission Checklist

Before submitting, ensure you have:
- [ ] Completed all code tasks
- [ ] Run all cells successfully
- [ ] Answered all reflection questions
- [ ] Used proper commenting in your code
- [ ] Used the pipe operator where appropriate
- [ ] Verified your results make sense

### Next Steps

In the next lesson, you'll learn about:
- `mutate()` for creating new columns
- `summarize()` for calculating summary statistics
- `group_by()` for grouped operations
- Advanced data transformation techniques


### 🚀 Ready to Submit?

**Easy Submission Steps (No Command Line Required!):**

1. Save this notebook (Ctrl+S or File → Save)
2. Use VS Code Source Control:
   - Click the Source Control icon in the left sidebar (tree branch symbol)
   - Click the "+" button next to your notebook file
   - Type a message: `Submit homework 1 - [Your Name]`
   - Click "Commit"
   - Click "Sync Changes" or "Push"
3. Verify on GitHub: Go to your repository online and confirm your notebook appears with your completed work

📖 Need help? See GITHUB_CLASSROOM_SUBMISSION.md for detailed instructions.

