# Homework Assignment - Lesson 7: String Manipulation and Date/Time Data

**Student Name:** Carson Sloan

**Student ID:** 02149749

**Date Submitted:** 10/12/25

**Due Date:** 10/12/25

---

## Objective

Master string manipulation with `stringr` and date/time operations with `lubridate` for real-world business data cleaning and analysis.

## Learning Goals

By completing this assignment, you will:
- Clean and standardize messy text data using `stringr` functions
- Parse and manipulate dates using `lubridate` functions
- Extract information from text and dates for business insights
- Combine string and date operations for customer segmentation
- Create business-ready reports from raw data

## Instructions

- Complete all tasks in this notebook
- Write your code in the designated TODO sections
- Use the pipe operator (`%>%`) wherever possible
- Add comments explaining your logic
- Run all cells to verify your code works
- Answer all reflection questions

## Datasets

You will work with three CSV files:
- `customer_feedback.csv` - Customer reviews with messy text
- `transaction_log.csv` - Transaction records with dates
- `product_catalog.csv` - Product descriptions needing standardization

---

## Part 1: Data Import and Initial Exploration

**Business Context:** Before cleaning data, you must understand its structure and quality issues.

**Your Tasks:**
1. Load required packages (`tidyverse` and `lubridate`)
2. Import all three CSV files from the `data/` directory
3. Examine the structure and identify data quality issues
4. Display sample rows to understand the data

In [1]:
# Task 1.1: Load Required Packages
# TODO: Load tidyverse (includes stringr)
library(tidyverse)

# TODO: Load lubridate
library(lubridate)

cat("✅ Packages loaded successfully!\n")

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become erro

✅ Packages loaded successfully!


In [3]:
# Task 1.2: Import Datasets

# Load tidyverse if not already loaded
library(tidyverse)

# TODO: Import customer_feedback.csv into a variable called 'feedback'
feedback <- read_csv("/workspaces/Carsons-stuff/data/customer_feedback.csv")

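# TODO: Import transaction_log.csv into a variable called 'transactions'
# Note: the file read here is retail_transactions.csv, not the transaction_log.csv named in the task
transactions <- read_csv("/workspaces/Carsons-stuff/data/retail_transactions.csv")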

# TODO: Import product_catalog.csv into a variable called 'products'
products <- read_csv("/workspaces/Carsons-stuff/data/product_catalog.csv")

# Print confirmation and row counts
cat("✅ Data imported successfully!\n")
cat("Feedback rows:", nrow(feedback), "\n")
cat("Transaction rows:", nrow(transactions), "\n")
cat("Product rows:", nrow(products), "\n")


Rows: 100 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): Feedback_Text, Contact_Info
dbl  (2): FeedbackID, CustomerID
date (1): Feedback_Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 500 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): CustomerName, CustomerCity, ProductName, ProductCategory
dbl  (4): TransactionID, CustomerID, TotalAmount, Quantity
date (1): TransactionDate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 75 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Product_Description, Category, In_Stock
dbl (2): Pro

✅ Data imported successfully!
Feedback rows: 100 
Transaction rows: 500 
Product rows: 75 
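
The import messages above show `read_csv()` guessing each column's type. As a minimal sketch, assuming a small inline CSV (hypothetical toy data, not one of the assignment files), column types can instead be declared up front with `col_types`, which also silences the column-specification message:

```r
library(readr)

# Toy CSV text standing in for a file on disk (hypothetical data).
csv_text <- I("FeedbackID,Feedback_Text,Feedback_Date\n1,Great item,2024-02-23\n2,Poor quality,2023-09-02\n")

# Declare the column types explicitly instead of relying on guessing.
toy_feedback <- read_csv(
  csv_text,
  col_types = cols(
    FeedbackID    = col_integer(),
    Feedback_Text = col_character(),
    Feedback_Date = col_date(format = "%Y-%m-%d")
  )
)

str(toy_feedback)
```

Declaring types this way makes a malformed date or ID surface as a parsing problem at import time rather than later in the analysis.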


In [4]:
# Task 1.3: Initial Data Exploration

cat("=== CUSTOMER FEEDBACK DATA ===\n")
# TODO: Display structure of feedback using str()
str(feedback)

# TODO: Display first 5 rows of feedback
head(feedback, 5)

cat("\n=== TRANSACTION DATA ===\n")
# TODO: Display structure of transactions
str(transactions)

# TODO: Display first 5 rows of transactions
head(transactions, 5)

cat("\n=== PRODUCT CATALOG DATA ===\n")
# TODO: Display structure of products
str(products)

# TODO: Display first 5 rows of products
head(products, 5)



=== CUSTOMER FEEDBACK DATA ===
spc_tbl_ [100 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ FeedbackID   : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID   : num [1:100] 12 40 34 1 47 13 13 37 49 23 ...
 $ Feedback_Text: chr [1:100] "Highly recommend this item" "Excellent service" "Poor quality control" "average product, nothing special" ...
 $ Contact_Info : chr [1:100] "bob.wilson@test.org" "555-123-4567" "jane_smith@company.com" "jane_smith@company.com" ...
 $ Feedback_Date: Date[1:100], format: "2024-02-23" "2024-01-21" ...
 - attr(*, "spec")=
  .. cols(
  ..   FeedbackID = col_double(),
  ..   CustomerID = col_double(),
  ..   Feedback_Text = col_character(),
  ..   Contact_Info = col_character(),
  ..   Feedback_Date = col_date(format = "")
  .. )
 - attr(*, "problems")=<externalptr> 


FeedbackID,CustomerID,Feedback_Text,Contact_Info,Feedback_Date
<dbl>,<dbl>,<chr>,<chr>,<date>
1,12,Highly recommend this item,bob.wilson@test.org,2024-02-23
2,40,Excellent service,555-123-4567,2024-01-21
3,34,Poor quality control,jane_smith@company.com,2023-09-02
4,1,"average product, nothing special",jane_smith@company.com,2023-08-21
5,47,AMAZING customer support!!!,555-123-4567,2023-04-24



=== TRANSACTION DATA ===
spc_tbl_ [500 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ TransactionID  : num [1:500] 1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID     : num [1:500] 81 13 18 76 86 37 45 11 13 55 ...
 $ CustomerName   : chr [1:500] "Customer 39" "Customer 63" "Customer 98" "Customer 39" ...
 $ CustomerCity   : chr [1:500] "Chicago" "Philadelphia" "Chicago" "Houston" ...
 $ ProductName    : chr [1:500] "Adidas Jacket" "Samsung TV" "Adidas Jacket" "Dell Laptop" ...
 $ ProductCategory: chr [1:500] "Clothing" "Music" "Computers" "Computers" ...
 $ TotalAmount    : num [1:500] 632 114 1289 885 96 ...
 $ Quantity       : num [1:500] 3 3 7 2 5 2 3 3 8 1 ...
 $ TransactionDate: Date[1:500], format: "2024-03-09" "2024-12-08" ...
 - attr(*, "spec")=
  .. cols(
  ..   TransactionID = col_double(),
  ..   CustomerID = col_double(),
  ..   CustomerName = col_character(),
  ..   CustomerCity = col_character(),
  ..   ProductName = col_character()

TransactionID,CustomerID,CustomerName,CustomerCity,ProductName,ProductCategory,TotalAmount,Quantity,TransactionDate
<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<date>
1,81,Customer 39,Chicago,Adidas Jacket,Clothing,632.39,3,2024-03-09
2,13,Customer 63,Philadelphia,Samsung TV,Music,114.28,3,2024-12-08
3,18,Customer 98,Chicago,Adidas Jacket,Computers,1289.24,7,2024-01-22
4,76,Customer 39,Houston,Dell Laptop,Computers,885.4,2,2024-07-02
5,86,Customer 45,New York,Nike Shoes,Computers,95.95,5,2024-08-13



=== PRODUCT CATALOG DATA ===
spc_tbl_ [75 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ProductID          : num [1:75] 1 2 3 4 5 6 7 8 9 10 ...
 $ Product_Description: chr [1:75] "Apple iPhone 14 Pro - 128GB - Space Black" "samsung galaxy s23 ultra 256gb" "Apple iPhone 14 Pro - 128GB - Space Black" "Apple iPhone 14 Pro - 128GB - Space Black" ...
 $ Category           : chr [1:75] "TV" "TV" "Audio" "Shoes" ...
 $ Price              : num [1:75] 964 1817 853 649 586 ...
 $ In_Stock           : chr [1:75] "Limited" "Yes" "Yes" "Yes" ...
 - attr(*, "spec")=
  .. cols(
  ..   ProductID = col_double(),
  ..   Product_Description = col_character(),
  ..   Category = col_character(),
  ..   Price = col_double(),
  ..   In_Stock = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 


ProductID,Product_Description,Category,Price,In_Stock
<dbl>,<chr>,<chr>,<dbl>,<chr>
1,Apple iPhone 14 Pro - 128GB - Space Black,TV,963.53,Limited
2,samsung galaxy s23 ultra 256gb,TV,1817.44,Yes
3,Apple iPhone 14 Pro - 128GB - Space Black,Audio,852.79,Yes
4,Apple iPhone 14 Pro - 128GB - Space Black,Shoes,648.58,Yes
5,samsung galaxy s23 ultra 256gb,Electronics,586.35,Limited


## Part 2: String Cleaning and Standardization

**Business Context:** Product names and feedback text often have inconsistent formatting that prevents accurate analysis.

**Your Tasks:**
1. Clean product names (remove extra spaces, standardize case)
2. Standardize product categories
3. Clean customer feedback text
4. Extract customer names from feedback

**Key Functions:** `str_trim()`, `str_squish()`, `str_to_lower()`, `str_to_upper()`, `str_to_title()`
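
The key functions differ in ways that matter for cleaning. A short sketch on a made-up string (the value of `messy` is an assumption for illustration) shows that `str_trim()` only strips the ends, `str_squish()` also collapses interior runs of whitespace, and `str_to_title()` lowercases everything after the first letter of each word:

```r
library(stringr)

messy <- "  apple   iPhone 14  "

trimmed  <- str_trim(messy)          # strips leading/trailing whitespace only
squished <- str_squish(messy)        # also collapses repeated interior whitespace
titled   <- str_to_title(squished)   # capitalizes each word, lowercases the rest

print(c(trimmed = trimmed, squished = squished, titled = titled))
```

Note that title case turns "iPhone" into "Iphone", so brand names with internal capitals may need a follow-up correction.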

In [7]:
# Task 2.1: Clean Product Names

library(stringr)  # for string manipulation functions

products_clean <- products %>%
  mutate(
    # Remove extra spaces (leading, trailing, and repeated interior) and convert to Title Case
    product_name_clean = str_to_title(str_squish(Product_Description))
  )

# Display before and after
cat("Product Name Cleaning Results:\n")
products_clean %>%
  select(Product_Description, product_name_clean) %>%
  head(10) %>%
  print()


Product Name Cleaning Results:
# A tibble: 10 × 2
   Product_Description                         product_name_clean               
   <chr>                                       <chr>                            
 1 "Apple iPhone 14 Pro - 128GB - Space Black" "Apple Iphone 14 Pro - 128gb - S…
 2 "samsung galaxy s23 ultra 256gb"            "Samsung Galaxy S23 Ultra 256gb" 
 3 "Apple iPhone 14 Pro - 128GB - Space Black" "Apple Iphone 14 Pro - 128gb - S…
 4 "Apple iPhone 14 Pro - 128GB - Space Black" "Apple Iphone 14 Pro - 128gb - S…
 5 "samsung galaxy s23 ultra 256gb"            "Samsung Galaxy S23 Ultra 256gb" 
 6 "Apple iPhone 14 Pro - 128GB - Space Black" "Apple Iphone 14 Pro 

In [8]:
# Task 2.2: Standardize Product Categories

library(stringr)  # for string manipulation

products_clean <- products_clean %>%
  mutate(
    # Remove extra spaces and convert to Title Case
    category_clean = str_to_title(str_trim(Category))
  )

# Show unique categories before and after
cat("Original categories:\n")
print(unique(products$Category))

cat("\nCleaned categories:\n")
print(unique(products_clean$category_clean))


Original categories:
[1] "TV"          "Audio"       "Shoes"       "Electronics" "Computers"  

Cleaned categories:
[1] "Tv"          "Audio"       "Shoes"       "Electronics" "Computers"  
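
The cleaned categories above show a side effect of title-casing: the acronym "TV" becomes "Tv". One possible follow-up, sketched here on a toy vector with a hypothetical lookup named `acronym_fixes` (not part of the assignment), is to restore known acronyms after title-casing:

```r
library(stringr)

categories <- c("TV", "Audio", " shoes ", "ELECTRONICS")

# Hypothetical lookup: regex pattern on the left, replacement on the right.
acronym_fixes <- c("\\bTv\\b" = "TV")

title_cased    <- str_to_title(str_squish(categories))
category_clean <- str_replace_all(title_cased, acronym_fixes)

print(category_clean)
```

The word boundaries (`\\b`) keep the replacement from touching longer words that merely contain "Tv".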


In [12]:
# Task 2.3: Clean Customer Feedback Text

library(stringr)

feedback_clean <- feedback %>%
  mutate(
    # Convert text to lowercase and remove extra whitespace
    feedback_clean = str_squish(str_to_lower(Feedback_Text))
  )

# Display sample
cat("Feedback Cleaning Sample:\n")
feedback_clean %>%
  select(Feedback_Text, feedback_clean) %>%
  head(5) %>%
  print()



Feedback Cleaning Sample:
# A tibble: 5 × 2
  Feedback_Text                    feedback_clean                  
  <chr>                            <chr>                           
1 Highly recommend this item       highly recommend this item      
2 Excellent service                excellent service               
3 Poor quality control             poor quality control            
4 average product, nothing special average product, nothing special
5 AMAZING customer support!!!      amazing customer support!!!     


## Part 3: Pattern Detection and Extraction

**Business Context:** Identifying products with specific features and extracting specifications helps with inventory management and marketing.

**Your Tasks:**
1. Identify products with specific keywords (wireless, premium, gaming)
2. Extract numerical specifications from product names
3. Detect sentiment words in customer feedback
4. Extract email addresses from feedback

**Key Functions:** `str_detect()`, `str_extract()`, `str_count()`
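
Keyword detection is sensitive to how the pattern is written. The sketch below uses made-up product names (an assumption for illustration) to contrast a bare substring pattern with a case-insensitive, whole-word pattern built with `regex()` and `\\b`:

```r
library(stringr)

# Hypothetical product names chosen to expose false positives.
product_names <- c(
  "Sony Wireless Headphones",
  "Apple iPhone 14 Pro",
  "Protein Bar",
  "GAMING Mouse"
)

# A bare "pro" also matches inside "Protein".
loose_premium <- str_detect(product_names, regex("pro", ignore_case = TRUE))

# Word boundaries restrict the match to whole words.
strict_premium <- str_detect(
  product_names,
  regex("\\b(pro|premium|deluxe)\\b", ignore_case = TRUE)
)

is_gaming <- str_detect(product_names, regex("\\bgam(ing|er)\\b", ignore_case = TRUE))

print(data.frame(product_names, loose_premium, strict_premium, is_gaming))
```

Using `regex(..., ignore_case = TRUE)` also avoids having to lowercase the column before matching.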

In [13]:
# Task 3.1: Detect Product Features

library(stringr)

products_clean <- products_clean %>%
  mutate(
    # Create feature flags (case-insensitive detection)
    is_wireless = str_detect(str_to_lower(product_name_clean), "wireless"),
    is_premium  = str_detect(str_to_lower(product_name_clean), "pro|premium|deluxe"),
    is_gaming   = str_detect(str_to_lower(product_name_clean), "gaming|gamer")
  )

# Display results
cat("Product Feature Detection:\n")
products_clean %>%
  select(product_name_clean, is_wireless, is_premium, is_gaming) %>%
  head(10) %>%
  print()

# Summary statistics
cat("\nFeature Summary:\n")
cat("Wireless products:", sum(products_clean$is_wireless), "\n")
cat("Premium products:", sum(products_clean$is_premium), "\n")
cat("Gaming products:", sum(products_clean$is_gaming), "\n")


Product Feature Detection:
# A tibble: 10 × 4
   product_name_clean                          is_wireless is_premium is_gaming
   <chr>                                       <lgl>       <lgl>      <lgl>    
 1 "Apple Iphone 14 Pro - 128gb - Space Black" FALSE       TRUE       FALSE    
 2 "Samsung Galaxy S23 Ultra 256gb"            FALSE       FALSE      FALSE    
 3 "Apple Iphone 14 Pro - 128gb - Space Black" FALSE       TRUE       FALSE    
 4 "Apple Iphone 14 Pro - 128gb - Space Black" FALSE       TRUE       FALSE    
 5 "Samsung Galaxy S23 Ultra 256gb"            FALSE       FALSE      FALSE    
 6 "Apple Iphone 14 Pro - 128gb - Space Black" FALSE       TRUE       FALSE    
 7 "Dell Xps 13 Laptop - In

In [16]:
# Task 3.2: Extract Product Specifications
# TODO: Create a new column 'size_number' that extracts the first number from product_name
# Hint: Use str_extract() with pattern "\\d+" to match one or more digits

products_clean <- products_clean %>%
  mutate(
    # Your code here:
    size_number = str_extract(Product_Description, "\\d+")
  )

# Display products with extracted sizes
cat("Extracted Product Specifications:\n")
products_clean %>%
  filter(!is.na(size_number)) %>%
  select(product_name_clean, size_number) %>%
  head(10) %>%
  print()

Extracted Product Specifications:
# A tibble: 10 × 2
   product_name_clean                          size_number
   <chr>                                       <chr>      
 1 "Apple Iphone 14 Pro - 128gb - Space Black" 14         
 2 "Samsung Galaxy S23 Ultra 256gb"            23         
 3 "Apple Iphone 14 Pro - 128gb - Space Black" 14         
 4 "Apple Iphone 14 Pro - 128gb - Space Black" 14         
 5 "Samsung Galaxy S23 Ultra 256gb"            23         
 6 "Apple Iphone 14 Pro - 128gb - Space Black" 14         
 7 "Dell Xps 13 Laptop - Intel I7 - 16gb Ram"  13         
 8 "Nike Air Max 270 - Size 10 - Black/White"  270        
 9 "Lg 55\" 4k Smart Tv - Oled Display"

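In the output above, the first number in each description is usually a model number (14, 23, 270) rather than a size. As a sketch of a more targeted extraction, assuming storage capacity is the specification of interest (that choice is an assumption, not part of the task), a lookahead can pull only the digits that are followed by "GB":

```r
library(stringr)

descriptions <- c(
  "Apple iPhone 14 Pro - 128GB - Space Black",
  "samsung galaxy s23 ultra 256gb",
  "Nike Air Max 270 - Size 10 - Black/White"
)

# First run of digits, as in Task 3.2.
first_number <- str_extract(descriptions, "\\d+")

# Digits immediately followed by "gb" (any case, optional space); NA if absent.
storage_gb <- as.integer(
  str_extract(descriptions, regex("\\d+(?=\\s*gb)", ignore_case = TRUE))
)

print(data.frame(descriptions, first_number, storage_gb))
```
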
In [17]:
# Task 3.3: Simple Sentiment Analysis
# TODO: Create three new columns:
#   - positive_words: count of positive words ("great", "excellent", "love", "amazing")
#   - negative_words: count of negative words ("bad", "terrible", "hate", "awful")
#   - sentiment_score: positive_words - negative_words
# Hint: Use str_count() to count pattern occurrences

feedback_clean <- feedback_clean %>%
  mutate(
    # Your code here:
    positive_words = str_count(feedback_clean, "great|excellent|love|amazing"),
    negative_words = str_count(feedback_clean, "bad|terrible|hate|awful"),
    sentiment_score = positive_words - negative_words
  )

# Display sentiment analysis results
cat("Sentiment Analysis Results:\n")
feedback_clean %>%
  select(feedback_clean, positive_words, negative_words, sentiment_score) %>%
  head(10) %>%
  print()

# Summary
cat("\nOverall Sentiment Summary:\n")
cat("Average sentiment score:", mean(feedback_clean$sentiment_score), "\n")
cat("Positive reviews:", sum(feedback_clean$sentiment_score > 0), "\n")
cat("Negative reviews:", sum(feedback_clean$sentiment_score < 0), "\n")

Sentiment Analysis Results:
# A tibble: 10 × 4
   feedback_clean                  positive_words negative_words sentiment_score
   <chr>                                    <int>          <int>           <int>
 1 highly recommend this item                   0              0               0
 2 excellent service                            1              0               1
 3 poor quality control                         0              0               0
 4 average product, nothing speci…              0              0               0
 5 amazing customer support!!!                  1              0               1
 6 amazing customer support!!!                  1              0               1
 7 average product, nothing speci…              0              0               0
 8 good value for money                         0    

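The word lists in Task 3.3 are matched as substrings, so "love" also counts inside "glove" and "bad" inside "badge". A small sketch on invented reviews (hypothetical text, not from `customer_feedback.csv`) compares the substring counts with whole-word counts:

```r
library(stringr)

# Hypothetical reviews chosen to trigger substring false positives.
reviews <- c("lovely glove, great value", "not bad at all", "a badge of honor")

loose_pos  <- str_count(reviews, "great|excellent|love|amazing")
strict_pos <- str_count(reviews, "\\b(great|excellent|love|amazing)\\b")

loose_neg  <- str_count(reviews, "bad|terrible|hate|awful")
strict_neg <- str_count(reviews, "\\b(bad|terrible|hate|awful)\\b")

print(data.frame(reviews, loose_pos, strict_pos, loose_neg, strict_neg))
```

Neither version handles negation ("not bad" still counts as negative), which is a known limit of simple word counting.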
## Part 4: Date Parsing and Component Extraction

**Business Context:** Transaction dates need to be parsed and analyzed to understand customer behavior patterns.

**Your Tasks:**
1. Parse transaction dates from text to Date objects
2. Extract date components (year, month, day, weekday)
3. Identify weekend vs weekday transactions
4. Extract quarter and month names

**Key Functions:** `ymd()`, `mdy()`, `dmy()`, `year()`, `month()`, `day()`, `wday()`, `quarter()`
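
Each parser is named after the order of the date components in the text. A brief sketch with three made-up strings that all encode the same day shows how the choice of function follows the input format:

```r
library(lubridate)

iso_text <- "2024-03-09"   # year-month-day
us_text  <- "03/09/2024"   # month/day/year
eu_text  <- "09.03.2024"   # day.month.year

parsed <- c(ymd(iso_text), mdy(us_text), dmy(eu_text))
print(parsed)

# Confirms the result is a Date object rather than text.
print(is.Date(parsed))
```

Picking the wrong parser for an ambiguous string such as "03/09/2024" silently swaps day and month, so it is worth checking a few rows against the source.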

In [21]:
# Task 4.1: Parse Transaction Dates
# TODO: Create a new column 'date_parsed' that parses the transaction_date column
# Hint: Check the format of transaction_date first, then use ymd(), mdy(), or dmy()

transactions_clean <- transactions %>%
  mutate(
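    # Your code here:
    # read_csv() already parsed TransactionDate as a Date (year-month-day),
    # so ymd() returns the same values
    date_parsed = ymd(TransactionDate)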
  )

# Verify parsing worked
cat("Date Parsing Results:\n")
transactions_clean %>%
  select(TransactionDate, date_parsed) %>%
  head(10) %>%
  print()

Date Parsing Results:
# A tibble: 10 × 2
   TransactionDate date_parsed
   <date>          <date>     
 1 2024-03-09      2024-03-09 
 2 2024-12-08      2024-12-08 
 3 2024-01-22      2024-01-22 
 4 2024-07-02      2024-07-02 
 5 2024-08-13      2024-08-13 
 6 2024-04-15      2024-04-15 
 7 2024-05-02      2024-05-02 
 8 2024-04-30      2024-04-30 
 9 2024-08-08      2024-08-08 
10 2024-06-23      2024-06-23 


In [22]:
# Task 4.2: Extract Date Components
# TODO: Create the following new columns:
#   - trans_year: Extract year from date_parsed
#   - trans_month: Extract month number from date_parsed
#   - trans_month_name: Extract month name (use label=TRUE, abbr=FALSE)
#   - trans_day: Extract day of month from date_parsed
#   - trans_weekday: Extract weekday name (use label=TRUE, abbr=FALSE)
#   - trans_quarter: Extract quarter from date_parsed

transactions_clean <- transactions_clean %>%
  mutate(
    # Your code here:
    trans_year = year(date_parsed),
    trans_month = month(date_parsed),
    trans_month_name = month(date_parsed, label = TRUE, abbr = FALSE),
    trans_day = day(date_parsed),
    trans_weekday = wday(date_parsed, label = TRUE, abbr = FALSE),
    trans_quarter = quarter(date_parsed)
  )

# Display results
cat("Date Component Extraction:\n")
transactions_clean %>%
  select(date_parsed, trans_month_name, trans_weekday, trans_quarter) %>%
  head(10) %>%
  print()

Date Component Extraction:
# A tibble: 10 × 4
   date_parsed trans_month_name trans_weekday trans_quarter
   <date>      <ord>            <ord>                 <int>
 1 2024-03-09  March            Saturday                  1
 2 2024-12-08  December         Sunday                    4
 3 2024-01-22  January          Monday                    1
 4 2024-07-02  July             Tuesday                   3
 5 2024-08-13  August           Tuesday                   3
 6 2024-04-15  April            Monday                    2
 7 2024-05-02  May              Thursday                  2
 8 2024-04-30  April            Tuesday                   2
 9 2024-08-08  August           Thursday                  3
10 2024-06-23  June             Sunday                    2


In [23]:
# Task 4.3: Identify Weekend Transactions
# TODO: Create a new column 'is_weekend' that is TRUE if the transaction was on Saturday or Sunday
# Hint: Use wday() which returns 1 for Sunday and 7 for Saturday
# Hint: Use %in% c(1, 7) to check if day is weekend

transactions_clean <- transactions_clean %>%
  mutate(
    # Your code here:
    is_weekend = wday(date_parsed) %in% c(1, 7)
  )

# Summary
cat("Weekend vs Weekday Transactions:\n")
table(transactions_clean$is_weekend) %>% print()

cat("\nPercentage of weekend transactions:",
    round(sum(transactions_clean$is_weekend) / nrow(transactions_clean) * 100, 1), "%\n")

Weekend vs Weekday Transactions:

FALSE  TRUE 
  363   137 

Percentage of weekend transactions: 27.4 %
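
The `%in% c(1, 7)` check relies on `wday()` numbering Sunday as 1, which is the default but can be changed through the `lubridate.week.start` option. A minimal sketch that pins the numbering explicitly, using three dates from the output above:

```r
library(lubridate)

# Saturday, Sunday, Monday (weekdays as shown in the Task 4.2 output).
dates <- as.Date(c("2024-03-09", "2024-12-08", "2024-01-22"))

# week_start = 7 fixes Sunday = 1 ... Saturday = 7 regardless of session options.
day_number <- wday(dates, week_start = 7)
is_weekend <- day_number %in% c(1, 7)

print(data.frame(dates, day_number, is_weekend))
```
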


## Part 5: Date Calculations and Customer Recency Analysis

**Business Context:** Understanding how recently customers transacted helps identify at-risk customers for re-engagement campaigns.

**Your Tasks:**
1. Calculate days since each transaction
2. Categorize customers by recency (Recent, Moderate, Old)
3. Identify customers who haven't transacted in 90+ days
4. Calculate average days between transactions per customer

**Key Functions:** `today()`, date arithmetic, `case_when()`
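
Task 4 in the list above (average days between transactions per customer) can be approached with `lag()` inside each customer group. This is only a sketch on a toy table; the data values are invented, and the column names mirror those used in this notebook:

```r
library(dplyr)

# Hypothetical transactions for two customers.
toy_transactions <- tibble(
  CustomerID  = c(1, 1, 1, 2, 2),
  date_parsed = as.Date(c("2024-01-01", "2024-01-11", "2024-01-31",
                          "2024-02-01", "2024-02-08"))
)

avg_gap <- toy_transactions %>%
  arrange(CustomerID, date_parsed) %>%
  group_by(CustomerID) %>%
  # Days between each transaction and the customer's previous one (NA for the first).
  mutate(gap_days = as.numeric(date_parsed - lag(date_parsed))) %>%
  summarise(avg_days_between = mean(gap_days, na.rm = TRUE), .groups = "drop")

print(avg_gap)
```

Customers with a single transaction would get `NaN` here, since they have no gap to average.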

In [25]:
# Task 5.1: Calculate Days Since Transaction
# TODO: Create a new column 'days_since' that calculates days from date_parsed to today()
# Hint: Use as.numeric(today() - date_parsed)

transactions_clean <- transactions_clean %>%
  mutate(
    # Your code here:
    days_since = as.numeric(today() - date_parsed)
  )

# Display results
cat("Days Since Transaction:\n")
transactions_clean %>%
  select(CustomerName, date_parsed, days_since) %>%
  arrange(desc(days_since)) %>%
  head(10) %>%
  print()

Days Since Transaction:
# A tibble: 10 × 3
   CustomerName date_parsed days_since
   <chr>        <date>           <dbl>
 1 Customer 62  2024-01-01         650
 2 Customer 99  2024-01-01         650
 3 Customer 83  2024-01-02         649
 4 Customer 61  2024-01-02         649
 5 Customer 69  2024-01-02         649
 6 Customer 63  2024-01-03         648
 7 Customer 2   2024-01-03         648
 8 Customer 46  2024-01-03         648
 9 Customer 52  2024-01-05         646
10 Customer 15  2024-01-06         645


In [27]:
# Task 5.2: Categorize by Recency
# TODO: Create a new column 'recency_category' using case_when():
#   - "Recent" if days_since <= 30
#   - "Moderate" if days_since <= 90
#   - "At Risk" if days_since > 90

transactions_clean <- transactions_clean %>%
  mutate(
    # Your code here:
    recency_category = case_when(
      days_since <= 30 ~ "Recent",
      days_since <= 90 ~ "Moderate",
      days_since > 90 ~ "At Risk"
    )
  )

# Display distribution
cat("Recency Category Distribution:\n")
table(transactions_clean$recency_category) %>% print()

# Show at-risk customers
cat("\nAt-Risk Customers (>90 days):\n")
transactions_clean %>%
  filter(recency_category == "At Risk") %>%
  select(CustomerName, date_parsed, days_since) %>%
  arrange(desc(days_since)) %>%
  print()

Recency Category Distribution:

At Risk 
    500 

At-Risk Customers (>90 days):
# A tibble: 500 × 3
   CustomerName date_parsed days_since
   <chr>        <date>           <dbl>
 1 Customer 62  2024-01-01         650
 2 Customer 99  2024-01-01         650
 3 Customer 83  2024-01-02         649
 4 Customer 61  2024-01-02         649
 5 Customer 69  2024-01-02         649
 6 Customer 63  2024-01-03         648
 7 Customer 2   2024-01-03         648
 8 Customer 46  2024-01-03         648
 9 Customer 52  2024-01-05         646
10 Customer 15  2024-01-06         645
# ℹ 490 more rows

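All 500 transactions fall into "At Risk" because `today()` is well after the 2024 transaction dates, and the result also changes each day the notebook is rerun. One alternative, sketched on toy dates and assuming the latest transaction date is an acceptable reference point (an assumption, not the assignment's requirement), is to fix the reference date:

```r
library(dplyr)

# Hypothetical transaction dates.
toy_dates <- tibble(
  date_parsed = as.Date(c("2024-12-30", "2024-11-15", "2024-03-09"))
)

# Fixed reference: the most recent transaction in the data.
reference_date <- max(toy_dates$date_parsed)

toy_recency <- toy_dates %>%
  mutate(
    days_since = as.numeric(reference_date - date_parsed),
    recency_category = case_when(
      days_since <= 30 ~ "Recent",
      days_since <= 90 ~ "Moderate",
      TRUE             ~ "At Risk"
    )
  )

print(toy_recency)
```
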

## Part 6: Combined String and Date Operations

**Business Context:** Create personalized customer outreach messages based on purchase recency.

**Your Tasks:**
1. Extract first names from customer names
2. Create personalized messages based on recency
3. Analyze transaction patterns by weekday
4. Identify best customers (recent + high value)

**Key Functions:** Combine `str_extract()`, date calculations, `case_when()`, `group_by()`, `summarize()`

In [28]:
# Task 6.1: Extract First Names and Create Personalized Messages
# TODO: Create two new columns:
#   - first_name: Extract first name from customer_name (everything before first space)
#   - personalized_message: Create message based on recency_category
#     * Recent: "Hi [name]! Thanks for your recent purchase!"
#     * Moderate: "Hi [name], we miss you! Check out our new products."
#     * At Risk: "Hi [name], it's been a while! Here's a special offer for you."
# Hint: Use str_extract() with pattern "^\\w+" for first name
# Hint: Use paste() to combine strings in case_when()

customer_outreach <- transactions_clean %>%
  mutate(
    # Your code here:
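    # Names in this dataset look like "Customer 39", so the first word is "Customer"
    first_name = str_extract(CustomerName, "^\\w+"),
    # paste() joins its arguments with a space, which leaves a space between
    # the name and the punctuation that follows it ("Hi Customer , ...")
    personalized_message = case_when(
      recency_category == "Recent" ~ paste("Hi", first_name, "! Thanks for your recent purchase!"),
      recency_category == "Moderate" ~ paste("Hi", first_name, ", we miss you! Check out our new products."),
      recency_category == "At Risk" ~ paste("Hi", first_name, ", it's been a while! Here's a special offer for you.")
    )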
  )

# Display personalized messages
cat("Personalized Customer Messages:\n")
customer_outreach %>%
  select(CustomerName, first_name, days_since, personalized_message) %>%
  head(10) %>%
  print()

Personalized Customer Messages:
# A tibble: 10 × 4
   CustomerName first_name days_since personalized_message                      
   <chr>        <chr>           <dbl> <chr>                                     
 1 Customer 39  Customer          582 Hi Customer , it's been a while! Here's a…
 2 Customer 63  Customer          308 Hi Customer , it's been a while! Here's a…
 3 Customer 98  Customer          629 Hi Customer , it's been a while! Here's a…
 4 Customer 39  Customer          467 Hi Customer , it's been a while! Here's a…
 5 Customer 45  Customer          425 Hi Customer , it's been a while! Here's a…
 6 Customer 8   Customer          545 Hi Customer , it's been a while! Here's a…
 7 Customer 83  Customer          528 Hi Customer , it's been a while! Here's a…
 8 Customer 60  Customer          530 H

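The messages above read "Hi Customer , it's been a while!" with a stray space before the comma, because `paste()` separates its arguments with a space by default. A short sketch with made-up first names shows two ways to control the spacing, `paste0()` and `str_glue()`:

```r
library(stringr)

# Hypothetical first names for illustration.
first_name <- c("Alice", "Bob")

spaced <- paste("Hi", first_name, ", we miss you!")     # default sep = " "
tight  <- paste0("Hi ", first_name, ", we miss you!")   # no separator inserted
glued  <- str_glue("Hi {first_name}, we miss you!")     # template-style interpolation

print(spaced)
print(tight)
print(glued)
```

Either `paste0()` or `str_glue()` can be used inside `case_when()` in the same way as `paste()`.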
In [30]:
# Task 6.2: Analyze Transaction Patterns by Weekday
# TODO: Group by trans_weekday and calculate:
#   - transaction_count: number of transactions
#   - total_amount: sum of amount (if available)
#   - avg_amount: average amount per transaction
# TODO: Arrange by transaction_count descending

weekday_patterns <- transactions_clean %>%
  # Your code here:
  group_by(trans_weekday) %>%
  summarise(
    transaction_count = n()
  ) %>%
  arrange(desc(transaction_count))

# Display results
cat("Transaction Patterns by Weekday:\n")
print(weekday_patterns)

# Identify busiest day
busiest_day <- weekday_patterns$trans_weekday[1]
cat("\n🔥 Busiest day:", as.character(busiest_day), "\n")

Transaction Patterns by Weekday:
# A tibble: 7 × 2
  trans_weekday transaction_count
  <ord>                     <int>
1 Thursday                     81
2 Tuesday                      80
3 Saturday                     80
4 Wednesday                    75
5 Monday                       71
6 Sunday                       57
7 Friday                       56

🔥 Busiest day: Thursday 

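The task also asks for `total_amount` and `avg_amount`, which the summary above does not compute. A sketch of the fuller summary on a toy table, assuming the amount column is `TotalAmount` as in the imported transaction data (the values here are invented):

```r
library(dplyr)

# Hypothetical transactions.
toy_weekdays <- tibble(
  trans_weekday = c("Monday", "Monday", "Tuesday"),
  TotalAmount   = c(100, 50, 30)
)

weekday_summary <- toy_weekdays %>%
  group_by(trans_weekday) %>%
  summarise(
    transaction_count = n(),
    total_amount      = sum(TotalAmount, na.rm = TRUE),
    avg_amount        = mean(TotalAmount, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(transaction_count))

print(weekday_summary)
```
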

In [31]:
# Task 6.3: Monthly Transaction Analysis
# TODO: Group by trans_month_name and calculate:
#   - transaction_count
#   - unique_customers: use n_distinct(customer_name)
# TODO: Arrange by trans_month (to show chronological order)

monthly_patterns <- transactions_clean %>%
  # Your code here:
  group_by(trans_month_name, trans_month) %>%
  summarise(
    transaction_count = n(),
    unique_customers = n_distinct(CustomerName),
    .groups = "drop"
  ) %>%
  arrange(trans_month)

# Display results
cat("Monthly Transaction Patterns:\n")
print(monthly_patterns)

Monthly Transaction Patterns:
# A tibble: 12 × 4
   trans_month_name trans_month transaction_count unique_customers
   <ord>                  <dbl>             <int>            <int>
 1 January                    1                42               39
 2 February                   2                43               36
 3 March                      3                41               34
 4 April                      4                43               36
 5 May                        5                32               28
 6 June                       6                40               32
 7 July                       7                47               35
 8 August                     8                43               34
 9 September                  9                35               30
10 October                 

## Part 7: Business Intelligence Summary

**Business Context:** Create an executive summary that combines all your analyses into actionable insights.

**Your Tasks:**
1. Calculate key metrics across all datasets
2. Identify top products and categories
3. Summarize customer sentiment
4. Provide data-driven recommendations

In [32]:
# Task 7.1: Create Business Intelligence Dashboard

cat("\n", rep("=", 60), "\n")
cat("         BUSINESS INTELLIGENCE SUMMARY\n")
cat(rep("=", 60), "\n\n")

# Product Analysis
cat("📦 PRODUCT ANALYSIS\n")
cat(rep("─", 30), "\n")
# TODO: Calculate and display:
#   - Total number of products
#   - Number of wireless products
#   - Number of premium products
#   - Most common category

cat("Total Products:", nrow(products_clean), "\n")
cat("Wireless Products:", sum(products_clean$is_wireless, na.rm = TRUE), "\n")
cat("Premium Products:", sum(products_clean$is_premium, na.rm = TRUE), "\n")
most_common_cat <- names(sort(table(products_clean$category), decreasing = TRUE))[1]
cat("Most Common Category:", most_common_cat, "\n")


# Customer Sentiment
cat("\n💬 CUSTOMER SENTIMENT\n")
cat(rep("─", 30), "\n")
# TODO: Calculate and display:
#   - Total feedback entries
#   - Average sentiment score
#   - Percentage of positive reviews
#   - Percentage of negative reviews

cat("Total Feedback Entries:", nrow(feedback_clean), "\n")
cat("Average Sentiment Score:", round(mean(feedback_clean$sentiment_score), 2), "\n")
pct_positive <- round(sum(feedback_clean$sentiment_score > 0) / nrow(feedback_clean) * 100, 1)
pct_negative <- round(sum(feedback_clean$sentiment_score < 0) / nrow(feedback_clean) * 100, 1)
cat("Positive Reviews:", pct_positive, "%\n")
cat("Negative Reviews:", pct_negative, "%\n")


# Transaction Patterns
cat("\n📊 TRANSACTION PATTERNS\n")
cat(rep("─", 30), "\n")
# TODO: Calculate and display:
#   - Total transactions
#   - Date range (earliest to latest)
#   - Busiest weekday
#   - Weekend transaction percentage

cat("Total Transactions:", nrow(transactions_clean), "\n")
earliest_date <- min(transactions_clean$date_parsed, na.rm = TRUE)
latest_date <- max(transactions_clean$date_parsed, na.rm = TRUE)
cat("Date Range:", format(earliest_date, "%Y-%m-%d"), "to", format(latest_date, "%Y-%m-%d"), "\n")
busiest <- names(sort(table(transactions_clean$trans_weekday), decreasing = TRUE))[1]
cat("Busiest Weekday:", busiest, "\n")
weekend_pct <- round(sum(transactions_clean$is_weekend) / nrow(transactions_clean) * 100, 1)
cat("Weekend Transactions:", weekend_pct, "%\n")


# Customer Recency
cat("\n👥 CUSTOMER RECENCY\n")
cat(rep("─", 30), "\n")
# TODO: Calculate and display:
#   - Number of recent customers (< 30 days)
#   - Number of at-risk customers (> 90 days)
#   - Percentage needing re-engagement

recent_customers <- sum(transactions_clean$recency_category == "Recent", na.rm = TRUE)
at_risk_customers <- sum(transactions_clean$recency_category == "At Risk", na.rm = TRUE)
pct_at_risk <- round(at_risk_customers / nrow(transactions_clean) * 100, 1)

cat("Recent Customers (<30 days):", recent_customers, "\n")
cat("At-Risk Customers (>90 days):", at_risk_customers, "\n")
cat("Needing Re-engagement:", pct_at_risk, "%\n")



 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 
         BUSINESS INTELLIGENCE SUMMARY
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 

üì¶ PRODUCT ANALYSIS
‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ 
Total Products: 75 
Wireless Products: 17 
Premium Products: 13 


“Unknown or uninitialised column: `category`.”


Most Common Category: 

üí¨ CUSTOMER SENTIMENT
‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ 
Total Feedback Entries: 100 
Average Sentiment Score: 0.18 
Positive Reviews: 30 %
Negative Reviews: 20 %

üìä TRANSACTION PATTERNS
‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ 
Total Transactions: 500 
Date Range: 2024-01-01 to 2024-12-31 
Busiest Weekday: Thursday 
Weekend Transactions: 27.4 %

üë• CUSTOMER RECENCY
‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ ‚îÄ 
Recent Customers (<30 days): 0 
At-Risk Customers (>90 days): 500 
Needing Re-engagement: 100 %
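
The recency figures above count rows of `transactions_clean`, so "500" is a number of transactions rather than customers. Every row landing in "At Risk" is also what one would expect if recency was measured from `today()`, since the data ends on 2024-12-31. A minimal sketch of one alternative follows: it takes each customer's last purchase and measures recency against the latest date in the data. The toy tibble, the customer names, and the "Moderate" range of 30 to 90 days are assumptions for illustration; only the `<30` and `>90` thresholds come from the cell above.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for transactions_clean (names and dates are made up)
toy_transactions <- tibble::tibble(
  CustomerName = c("Ana", "Ana", "Ben", "Cleo", "Dev"),
  date_parsed  = ymd(c("2024-03-01", "2024-12-20", "2024-08-15",
                       "2024-11-10", "2024-12-31"))
)

# Anchor recency to the last date in the data instead of today()
reference_date <- max(toy_transactions$date_parsed, na.rm = TRUE)

customer_recency <- toy_transactions %>%
  group_by(CustomerName) %>%
  # One row per customer: keep only the most recent purchase
  summarise(last_purchase = max(date_parsed), .groups = "drop") %>%
  mutate(
    days_since = as.numeric(reference_date - last_purchase),
    recency_category = case_when(
      days_since < 30  ~ "Recent",
      days_since <= 90 ~ "Moderate",
      TRUE             ~ "At Risk"
    )
  )

print(customer_recency)
```

Counting rows of `customer_recency` by `recency_category` then gives customer counts, and the percentages are taken over customers rather than transactions.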


In [33]:
# Task 7.2: Identify Top Products by Category
# TODO: Group products by category_clean and count products in each
# TODO: Arrange by count descending
# TODO: Display top 5 categories

top_categories <- products_clean %>%
  # Your code here:
  group_by(category_clean) %>%
  summarise(
    product_count = n()
  ) %>%
  arrange(desc(product_count)) %>%
  head(5)

cat("Top Product Categories:\n")
print(top_categories)

Top Product Categories:
# A tibble: 5 × 2
  category_clean product_count
  <chr>                  <int>
1 Electronics               21
2 Computers                 15
3 Audio                     14
4 Tv                        14
5 Shoes                     11

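The same grouping can also fill in the "Most Common Category" line from Task 7.1, which printed blank because `products_clean` has no column named `category`; the cleaned column is `category_clean`. A minimal sketch, using a small made-up tibble in place of `products_clean` (the rows are assumptions; only the column name comes from the notebook):

```r
library(dplyr)

# Hypothetical stand-in for products_clean; only category_clean is assumed
toy_products <- tibble::tibble(
  category_clean = c("Electronics", "Audio", "Electronics",
                     "Tv", "Audio", "Electronics")
)

# count(sort = TRUE) is shorthand for group_by() + summarise(n()) + arrange(desc())
category_counts <- toy_products %>%
  count(category_clean, sort = TRUE, name = "product_count")

# The first row is the most common category
most_common_cat <- category_counts$category_clean[1]
cat("Most Common Category:", most_common_cat, "\n")
```

Reading the top row of the sorted counts avoids the `names(sort(table(...)))` idiom and reuses the column that Task 7.2 already groups by.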

## Part 8: Reflection Questions

Answer the following questions based on your analysis. Write your answers in the markdown cells below.

### Question 8.1: Data Quality Impact

**How did cleaning the text data (removing spaces, standardizing case) improve your ability to analyze the data? Provide specific examples from your homework.**

Your answer here:
Cleaning the text data let me turn messy, inconsistent values into standardized formats that can be counted, grouped, and filtered reliably. For example, when I grouped by `category_clean` to find the top product categories, each category appeared once with an accurate count instead of being split across several rows.
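
A minimal sketch of this effect, using a made-up tibble rather than the homework data (the raw column name `category` and its values are assumptions):

```r
library(dplyr)
library(stringr)

# Hypothetical messy categories: same labels, different spacing and case
toy_products <- tibble::tibble(
  category = c("  electronics", "ELECTRONICS ", "Electronics", "audio", " Audio  ")
)

# Grouping on the raw text treats every variant as its own category
raw_counts <- toy_products %>%
  count(category)

# Squish whitespace and standardize case before grouping
clean_counts <- toy_products %>%
  mutate(category_clean = category %>% str_squish() %>% str_to_title()) %>%
  count(category_clean, sort = TRUE)

print(raw_counts)    # five rows, one per spelling variant
print(clean_counts)  # two rows: Electronics and Audio
```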


### Question 8.2: Pattern Detection Value

**What business insights did you gain from detecting patterns in product names (wireless, premium, gaming)? How could a business use this information?**

Your answer here:
Detecting patterns turns unstructured product names into flags a business can act on, which supports data-driven decisions about product development, marketing, inventory, and competitive positioning. In this catalog, 17 of the 75 products are wireless and 13 are premium. A business could use the premium flag to check whether premium products actually command higher prices.
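
A minimal sketch of that price check, assuming a hypothetical `price` column and made-up product names (neither is taken from `product_catalog.csv`):

```r
library(dplyr)
library(stringr)

# Hypothetical catalog rows; names and prices are illustrative only
toy_products <- tibble::tibble(
  product_name = c("Premium Wireless Headphones", "Wired Mouse",
                   "Gaming Keyboard", "Premium Laptop Stand"),
  price = c(199, 25, 89, 59)
)

# Flag products by keywords in the lower-cased name
flagged <- toy_products %>%
  mutate(
    name_lower  = str_to_lower(product_name),
    is_wireless = str_detect(name_lower, "wireless"),
    is_premium  = str_detect(name_lower, "premium"),
    is_gaming   = str_detect(name_lower, "gaming")
  )

# Compare average price for premium vs. non-premium products
premium_summary <- flagged %>%
  group_by(is_premium) %>%
  summarise(product_count = n(), avg_price = mean(price), .groups = "drop")

print(premium_summary)
```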


### Question 8.3: Date Analysis Importance

**Why is analyzing transaction dates by weekday and month important for business operations? Provide at least three specific business applications.**

Your answer here:

1. Staff scheduling: Managers can use past weekday and monthly patterns to avoid over-staffing on slow days and to add staff on busy ones.
2. Inventory management: Businesses can stock up ahead of their busier seasons.
3. Marketing and promotions: Businesses can market more heavily during their busier seasons or launch sales on their slower days.


### Question 8.4: Customer Recency Strategy

**Based on your recency analysis, what specific actions would you recommend for customers in each category (Recent, Moderate, At Risk)? How would you prioritize these actions?**

Your answer here:



### Question 8.5: Sentiment Analysis Application

**How could the sentiment analysis you performed be used to improve products or customer service? What are the limitations of this simple sentiment analysis approach?**

Your answer here:
The sentiment analysis can be used to find products that aren't doing well so they can be pulled. It could also help identify unhappy customers for damage control. A limitation of this simple approach is that it can't really distinguish mildly positive or negative reviews from strongly positive or negative ones.
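
A minimal sketch of that limitation, assuming a keyword-count score (positive word hits minus negative word hits). The word lists and reviews are made up, and the homework's actual scoring rule may differ:

```r
library(dplyr)
library(stringr)

# Hypothetical word lists, not the ones used in the homework
positive_words <- c("good", "great", "love")
negative_words <- c("bad", "terrible", "broke")

toy_feedback <- tibble::tibble(
  feedback_text = c(
    "Good, I suppose. Nothing special.",
    "So good it changed how I work every single day!",
    "Terrible. It broke after a week."
  )
)

scored <- toy_feedback %>%
  mutate(
    text_lower    = str_to_lower(feedback_text),
    # Count keyword hits using an alternation pattern such as "good|great|love"
    positive_hits = str_count(text_lower, str_c(positive_words, collapse = "|")),
    negative_hits = str_count(text_lower, str_c(negative_words, collapse = "|")),
    sentiment_score = positive_hits - negative_hits
  )

print(select(scored, feedback_text, sentiment_score))
```

The lukewarm first review and the enthusiastic second review receive the same score, because the method only counts keyword hits and has no notion of intensity.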


### Question 8.6: Real-World Application

**Describe a real business scenario where you would need to combine string manipulation and date analysis (like you did in this homework). What insights would you be trying to discover?**

Your answer here:
A business would combine string manipulation and date analysis when looking for patterns between products and seasons, such as a spike in gift-set sales around Christmas or an increase in video game sales leading into summer. The goal would be to discover which products sell most at which times of year so the business can stock up ahead of those seasons.
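
A minimal sketch of that scenario, with made-up sales rows (the column names, product names, and dates are assumptions, not homework data):

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical sales log with free-text product names and US-style dates
toy_sales <- tibble::tibble(
  product_name = c("Holiday Gift Set", "Video Game Bundle", "gift set deluxe",
                   "Video Game Controller", "Desk Lamp"),
  sale_date = c("12/05/2024", "05/28/2024", "12/18/2024",
                "06/02/2024", "03/14/2024")
)

seasonal <- toy_sales %>%
  mutate(
    # Date step: parse the text date and pull out the month name
    sale_month = month(mdy(sale_date), label = TRUE, abbr = FALSE),
    # String step: classify products by keywords in the lower-cased name
    name_lower = str_to_lower(product_name),
    product_group = case_when(
      str_detect(name_lower, "gift set")   ~ "Gift Set",
      str_detect(name_lower, "video game") ~ "Video Game",
      TRUE                                 ~ "Other"
    )
  ) %>%
  count(product_group, sale_month, name = "units_sold")

print(seasonal)
```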


## Summary and Submission

### What You've Accomplished

In this homework, you've successfully:
- ✅ Cleaned and standardized messy text data using `stringr` functions
- ✅ Detected patterns and extracted information from text
- ✅ Parsed dates and extracted temporal components using `lubridate`
- ✅ Calculated customer recency for segmentation
- ✅ Analyzed transaction patterns by time periods
- ✅ Combined string and date operations for business insights
- ✅ Created personalized customer communications
- ✅ Generated executive-ready business intelligence summaries

### Key Skills Mastered

**String Manipulation:**
- `str_trim()`, `str_squish()` - Whitespace handling
- `str_to_lower()`, `str_to_upper()`, `str_to_title()` - Case conversion
- `str_detect()` - Pattern detection
- `str_extract()` - Information extraction
- `str_count()` - Pattern counting

**Date/Time Operations:**
- `ymd()`, `mdy()`, `dmy()` - Date parsing
- `year()`, `month()`, `day()`, `wday()` - Component extraction
- `quarter()` - Period extraction
- `today()` - Current date
- Date arithmetic - Calculating differences

**Business Applications:**
- Data cleaning and standardization
- Customer segmentation by recency
- Sentiment analysis
- Pattern identification
- Temporal trend analysis
- Personalized communication

### Submission Checklist

Before submitting, ensure you have:
- [ ] Entered your name, student ID, and date at the top
- [ ] Completed all code tasks (Parts 1-7)
- [ ] Run all cells successfully without errors
- [ ] Answered all reflection questions (Part 8)
- [ ] Used proper commenting in your code
- [ ] Used the pipe operator (`%>%`) where appropriate
- [ ] Verified your results make business sense
- [ ] Checked for any remaining TODO comments

### Grading Criteria

Your homework will be evaluated on:
- **Code Correctness (40%)**: All tasks completed correctly
- **Code Quality (20%)**: Clean, well-commented, efficient code
- **Business Understanding (20%)**: Demonstrates understanding of business applications
- **Reflection Questions (15%)**: Thoughtful, complete answers
- **Presentation (5%)**: Professional formatting and organization

### Next Steps

In Lesson 8, you'll learn:
- Advanced data wrangling with complex pipelines
- Sophisticated conditional logic with `case_when()`
- Data validation and quality checks
- Creating reproducible analysis workflows
- Professional best practices for business analytics

**Great work on completing this assignment! 🎉**