# **CHALLENGE: VALENTINE'S DAY GIFTS ANALYSIS**

---

## **YOUR MISSION**

**Main Goal:**  
*Explore the Valentine's Day gifts dataset and discover interesting insights!*

**What You'll Do:**  
You have freedom to explore, but here's a suggested path:

1. **Load and explore** the dataset - what's in it?
2. **Clean** any messy data
3. **Analyze** interesting patterns - be creative!
4. **Visualize** 2-3 findings
5. **Share** 2-3 insights you discovered

**Remember:** There's no single "right answer" - focus on telling an interesting story with data!

**Dataset Link**: https://www.kaggle.com/datasets/kanchana1990/2024-amazon-best-sellers-top-valentine-gifts

---

## **EVALUATION CRITERIA**

### **Your analysis will be evaluated on:**

**1. Data Exploration & Cleaning (30 points)**
- ✓ Successfully loaded the dataset
- ✓ Examined data structure (shape, columns, types)
- ✓ Identified and handled missing/duplicate data
- ✓ Code runs without errors

**2. Analysis Quality (30 points)**
- ✓ Asked interesting questions about the data
- ✓ Used appropriate methods (filtering, grouping, calculations)
- ✓ Analysis makes logical sense
- ✓ Considered data limitations

**3. Visualizations (20 points)**
- ✓ Created 2-3 clear, readable charts
- ✓ Charts have titles and labels
- ✓ Visualizations support your insights
- ✓ Appropriate chart types for the data

**4. Insights & Communication (20 points)**
- ✓ Stated 2-3 clear findings
- ✓ Findings are supported by the data
- ✓ Explained what the findings mean
- ✓ Writing is clear and organized

**Bonus Points (up to +10)**
- Feature engineering creativity
- Advanced visualizations
- Statistical analysis
- Unique insights

---

## **SUGGESTED QUESTIONS TO EXPLORE**

**Pick 2-3 questions that interest you, or create your own!**

**About Ratings & Reviews:**
- What percentage of products have excellent ratings (at least 80% of their reviews are 5-star)?
- Is there a relationship between number of reviews and rating quality?
- Which products have surprisingly few reviews despite good ratings?

**About Brands:**
- Which brands appear most frequently?
- Do certain brands tend to have higher ratings?
- Are there any "hidden gem" brands?

**About Product Types:**
- What types of gifts are most popular? (look at titles)
- Do different product categories have different rating patterns?
- What's the price range of top-rated items?

**Your Own Questions:**
- What else are you curious about?

---

## **I. SETUP & DATA LOADING**

**Import the necessary libraries**

**Hint:** You'll need pandas, numpy, matplotlib, and seaborn.

In [None]:
# YOUR CODE HERE


**Load the dataset**

**Hint:** Use `pd.read_csv('valentines_gift.csv')`

In [None]:
# YOUR CODE HERE


---

## **II. INITIAL EXPLORATION**

**Get to know your data!**

**Preview the first few rows**

In [None]:
# Display first 5-10 rows

# YOUR CODE HERE


**Examine the data structure**

**Hint:** Use `.info()` to see each column's data type and non-null count

In [None]:
# YOUR CODE HERE


**Get summary statistics**

**Hint:** Use `.describe()` for numerical columns

In [None]:
# YOUR CODE HERE


**Check for missing values**

**Hint:** Use `.isnull().sum()` to count missing values per column
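
As a minimal sketch of this check, the snippet below builds a tiny made-up DataFrame and reports both the count and the percentage of missing values per column. The column names (`title`, `brand`, `price`) and the values are placeholders for illustration, not necessarily what the Kaggle file contains.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset; column names are hypothetical.
toy = pd.DataFrame({
    "title": ["Rose Bear", "Chocolate Box", "Love Mug", "Card Set"],
    "brand": ["Acme", None, "Acme", None],
    "price": [19.99, np.nan, 12.50, 5.00],
})

missing_count = toy.isnull().sum()          # missing values per column
missing_share = toy.isnull().mean() * 100   # same, as a percentage of rows

# Combine both views into one small report table.
missing_report = pd.DataFrame({
    "missing": missing_count,
    "percent": missing_share.round(1),
})
print(missing_report)
```

Seeing the percentage next to the raw count makes it easier to decide later whether a column is worth keeping.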

In [None]:
# YOUR CODE HERE


---

## **III. DATA CLEANING**

**Clean your data before analysis!**

**Check for and remove duplicates**

**Hint:** Use `.duplicated().sum()` to count duplicate rows, then `.drop_duplicates()` if needed (it returns a new DataFrame, so assign the result back)
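
A small sketch of the duplicate check on made-up data (the `title` and `brand` columns are hypothetical). It shows that `.drop_duplicates()` leaves the original DataFrame untouched unless you keep the returned result.

```python
import pandas as pd

# Toy data with one exact duplicate row; column names are hypothetical.
toy = pd.DataFrame({
    "title": ["Rose Bear", "Rose Bear", "Love Mug"],
    "brand": ["Acme", "Acme", "Bolt"],
})

# duplicated() flags rows that repeat an earlier row exactly.
n_dupes = toy.duplicated().sum()

# drop_duplicates() returns a new DataFrame; toy itself is unchanged.
deduped = toy.drop_duplicates().reset_index(drop=True)

print("duplicates found:", n_dupes)
print("before:", toy.shape, "after:", deduped.shape)
```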

In [None]:
# YOUR CODE HERE


**Handle missing values**

**Hint:** You can:
- Fill with a default value (like 'Unknown' or 0)
- Drop columns that are mostly empty
- Drop rows with missing values in important columns

Think about what makes sense for each column!
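
The three strategies above could look roughly like this on a toy DataFrame. All column names, the `'Unknown'` default, and the 50% threshold are assumptions chosen for illustration; pick what fits the real columns.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
toy = pd.DataFrame({
    "title": ["Rose Bear", "Chocolate Box", None, "Card Set"],
    "brand": ["Acme", None, "Bolt", None],
    "notes": [None, None, None, "gift wrap"],
    "price": [19.99, 8.50, 12.50, 5.00],
})

cleaned = toy.copy()

# 1. Fill with a default value.
cleaned["brand"] = cleaned["brand"].fillna("Unknown")

# 2. Drop columns that are mostly empty (here: more than 50% missing).
mostly_empty = cleaned.columns[cleaned.isnull().mean() > 0.5]
cleaned = cleaned.drop(columns=mostly_empty)

# 3. Drop rows with a missing value in an important column.
cleaned = cleaned.dropna(subset=["title"])

print(cleaned)
```

Working on a copy keeps the raw data available, which helps when you want to report how many rows or columns each step removed.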

In [None]:
# Decide your strategy and implement it

# YOUR CODE HERE


---

## **IV. ANALYSIS**

**This is where you explore! Pick 2-3 questions to investigate.**

### **Analysis 1:**

**What question are you investigating?**  
_(Write your question here before coding)_

In [None]:
# Write code to answer your first question

# YOUR CODE HERE


### **Analysis 2:**

**What question are you investigating?**  
_(Write your question here before coding)_

In [None]:
# Write code to answer your second question

# YOUR CODE HERE


### **Analysis 3 (Optional):**

**What question are you investigating?**  
_(Write your question here before coding)_

In [None]:
# Write code to answer your third question

# YOUR CODE HERE


---

## **V. VISUALIZATIONS & INSIGHTS**

**Create 2-3 charts to illustrate your findings!**

**Tips:**
- Every chart should have a title
- Label your axes clearly
- Choose appropriate chart types (bar, scatter, histogram, etc.)
- Make sure the chart actually helps explain your insight
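
As a rough illustration of these tips, here is a bar chart with a title and labeled axes, built from made-up brand names. The `brand` column and its values are placeholders, not taken from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the brand names are invented for this sketch.
toy = pd.DataFrame({"brand": ["Acme", "Bolt", "Acme", "Cupid", "Acme", "Bolt"]})

# Count how often each brand appears (sorted from most to least frequent).
brand_counts = toy["brand"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(brand_counts.index, brand_counts.values)
ax.set_title("Number of products per brand (toy data)")
ax.set_xlabel("Brand")
ax.set_ylabel("Number of products")
fig.tight_layout()
```

A bar chart suits counts per category; for two numeric columns a scatter plot is usually the better fit, and for one numeric column a histogram.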

### **Visualization 1:**

**What are you showing?**  
_(Describe your chart before creating it)_

In [None]:
# Create your first visualization

# YOUR CODE HERE


**What did you find?**  
_(Write your insight here - what does the data show?)_

### **Visualization 2:**

**What are you showing?**  
_(Describe your chart)_

In [None]:
# Create your second visualization

# YOUR CODE HERE


**What did you find?**  
_(Write your insight here - what does the data show?)_

### **Visualization 3 (Optional):**

**What are you showing?**  
_(Describe your chart)_

In [None]:
# Create your third visualization

# YOUR CODE HERE


**What did you find?**  
_(Write your insight here - what does the data show?)_

---

## **VI. SUMMARY & CONCLUSION**

**Summarize what you learned!**

**Write your 2-3 key findings:**

1. _(Your first finding - what did you discover?)_

2. _(Your second finding)_

3. _(Your third finding - optional)_

**Overall conclusion:**  
_(What's the big picture? What does this tell us about Valentine's Day gifts on Amazon?)_

In [None]:
# Optional: Add any summary statistics or final calculations here

# YOUR CODE HERE


---

## **BONUS: ADVANCED ANALYSIS (OPTIONAL)**

**Only attempt these if you finish the main analysis!**

*These are challenging - don't worry if you don't get to them.*

### **Bonus Challenge 1: Create a Popularity Score**

**Task:** Engineer a new feature that combines rating quality with review quantity.

**Why?** A product with a 5.0 rating but only 2 reviews isn't as "popular" as one with a 4.8 rating and 10,000 reviews.

**Hint:** Try something like: `popularity = average_rating * np.log1p(review_count)`, where `np.log1p(x)` computes log(x + 1)

The log transformation helps balance products with wildly different review counts.
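
A minimal sketch of the suggested score on three made-up products. The column names `average_rating` and `review_count` are assumptions; the real dataset may name or structure these differently, so you may need to derive them first.

```python
import numpy as np
import pandas as pd

# Toy data; names and numbers are invented for illustration.
toy = pd.DataFrame({
    "title": ["Tiny Seller", "Crowd Favorite", "Solid Pick"],
    "average_rating": [5.0, 4.8, 4.5],
    "review_count": [2, 10_000, 500],
})

# np.log1p(x) computes log(x + 1), so a product with 0 reviews scores 0
# instead of raising a math error.
toy["popularity"] = toy["average_rating"] * np.log1p(toy["review_count"])

ranked = toy.sort_values("popularity", ascending=False)
print(ranked[["title", "average_rating", "review_count", "popularity"]])
```

With this formula the 4.8-rated product with 10,000 reviews ranks above the 5.0-rated product with 2 reviews, which matches the intuition described above. Other weightings are equally defensible; this is one option, not the required answer.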

In [None]:
# Create and use your popularity score

# YOUR CODE HERE


### **Bonus Challenge 2: Advanced Visualization**

**Task:** Create a correlation heatmap or another advanced visualization.

**Hint:** Use `sns.heatmap()` with a correlation matrix of numerical variables.
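
One possible shape for the heatmap, sketched on randomly generated numbers. The column names are hypothetical and the random values carry no meaning; the point is only the sequence of steps: select numeric columns, compute the correlation matrix, then plot it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data; columns are placeholders, not the dataset's real columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "review_count": rng.integers(10, 5000, size=50),
    "price": rng.uniform(5, 60, size=50),
    "average_rating": rng.uniform(3.5, 5.0, size=50),
    "brand": ["Acme"] * 50,  # non-numeric column, excluded below
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between numeric columns (toy data)")
fig.tight_layout()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale symmetric, so weak correlations are not visually exaggerated.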

In [None]:
# Create an advanced visualization

# YOUR CODE HERE


---

## **GREAT WORK!**

### **Before Submitting, Check:**

- ✓ All code cells run without errors
- ✓ You've answered at least 2 questions
- ✓ You've created 2-3 visualizations
- ✓ Your insights are clearly stated
- ✓ You've explained what your findings mean

### **Tips for Success:**

✓ **Quality over quantity** - Better to do a few analyses well than many poorly

✓ **Tell a story** - Connect your analyses into a narrative

✓ **Be specific** - "Many products have good ratings" is vague. "78% of products have at least 80% five-star reviews" is specific!

✓ **Show your work** - Explain why you chose certain analyses

✓ **Be honest** - If data is limited or unclear, say so!

---

**Good luck!**