# ISM6251 - Week 2 Assignment
# Retail Sales Analysis: From Data to Insights

**Student Name:** [Enter your name here]

**Date:** [Enter submission date]

---

## Business Context

### The Situation

You've just been hired as a Data Analyst at **TechMart**, a mid-sized electronics retail chain with 15 stores across Florida. The company has been in business for 10 years but has only recently started collecting detailed sales data. The CEO has asked you to analyze the last quarter's sales data (Q4 2024) to help inform strategic decisions for 2025.

TechMart sells various electronic products including:
- **Laptops** - High-margin items, average price $800-1500
- **Smartphones** - Fast-moving items, average price $400-1000
- **Headphones** - Accessory items, average price $50-300
- **Tablets** - Moderate sellers, average price $300-800
- **Smart Watches** - Growing category, average price $200-500

### The Challenge

The CEO wants to understand:
1. **Sales Performance**: Which products and stores are performing best?
2. **Trends**: Are there any patterns in daily sales that could inform staffing decisions?
3. **Regional Differences**: Do different regions prefer different products?
4. **Revenue Drivers**: What products contribute most to overall revenue?

### Your Mission

Using Python and its data science libraries (NumPy, Pandas, and Matplotlib), you will:
- Load and explore the sales data
- Clean and prepare the data for analysis
- Perform calculations to derive business insights
- Create visualizations to communicate your findings
- Provide actionable recommendations based on your analysis

### The Data

You have been provided with sales transaction data that includes:
- Transaction date
- Store location (city)
- Product category
- Units sold
- Unit price
- Total revenue per transaction

---

## Assignment Instructions

### Grading Criteria (Total: 100 points)

- **Data Loading and Exploration (20 points)**
  - Properly load the data
  - Explore basic statistics
  - Identify data quality issues

- **Data Manipulation with Pandas (25 points)**
  - Create calculated columns
  - Group and aggregate data
  - Handle any data issues

- **Numerical Analysis with NumPy (15 points)**
  - Perform array operations
  - Calculate statistics efficiently

- **Visualization with Matplotlib (25 points)**
  - Create at least 4 meaningful charts
  - Properly label and format visualizations

- **Business Insights and Communication (15 points)**
  - Clear narrative throughout
  - Actionable recommendations
  - Professional presentation

### Important Notes

1. **Write clear markdown explanations** for each step
2. **Comment your code** to show understanding
3. **Interpret your results** in business terms
4. **Be creative** but stay within the scope of Week 2 materials

---

## Part 1: Setup and Data Generation (10 points)

Since this is a learning exercise, we'll generate realistic sales data. In a real scenario, you would load this from a CSV file or database.

### Task 1.1: Import Required Libraries
Import all necessary libraries for this analysis.

In [None]:
# Import required libraries (provided for you; run this cell as-is)
# Uses: pandas, numpy, matplotlib.pyplot, and datetime

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib for better display
%matplotlib inline
plt.style.use('default')

print("Libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

### Task 1.2: Generate Sales Data
Run this cell to create our dataset. In practice, you would use `pd.read_csv()` to load real data.
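
For reference, here is a minimal sketch of what loading from a CSV might look like. The column names mirror the generated dataset, but the inline CSV text and the names `sample_csv` and `sample_df` are made up for illustration; with a real file, a path such as a `.csv` filename would take the place of the `io.StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
sample_csv = io.StringIO(
    "Date,Store,Product,Units_Sold,Unit_Price,Revenue\n"
    "2024-10-01,Miami,Laptop,6,1000.00,6000.00\n"
    "2024-10-01,Tampa,Headphones,20,50.00,1000.00\n"
)

# parse_dates converts the Date column to datetime64 while loading
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])

print(sample_df.dtypes)
print(sample_df.shape)
```

Parsing dates at load time means the `.dt` accessor is available on the `Date` column right away.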

In [None]:
# Generate sample data (DO NOT MODIFY THIS CELL)
def generate_sales_data():
    # Define parameters
    stores = ['Miami', 'Orlando', 'Tampa', 'Jacksonville', 'Tallahassee']
    products = {
        'Laptop': (800, 1500, 5, 15),
        'Smartphone': (400, 1000, 10, 30),
        'Headphones': (50, 300, 15, 50),
        'Tablet': (300, 800, 8, 20),
        'Smart Watch': (200, 500, 10, 25)
    }
    
    # Generate dates for Q4 2024 (Oct, Nov, Dec)
    start_date = datetime(2024, 10, 1)
    end_date = datetime(2024, 12, 31)
    
    data = []
    current_date = start_date
    
    while current_date <= end_date:
        # More sales on weekends
        is_weekend = current_date.weekday() >= 5
        transactions_today = np.random.randint(40, 80) if is_weekend else np.random.randint(20, 50)
        
        for _ in range(transactions_today):
            store = np.random.choice(stores)
            product = np.random.choice(list(products.keys()))
            min_price, max_price, min_units, max_units = products[product]
            
            # Add some store preferences
            if store == 'Miami' and product == 'Laptop':
                max_units = int(max_units * 1.3)
            elif store == 'Orlando' and product == 'Smart Watch':
                max_units = int(max_units * 1.2)
            
            units = np.random.randint(min_units, max_units)
            unit_price = np.random.uniform(min_price, max_price)
            
            # Black Friday week boost (Nov 24-30; Cyber Monday 2024, Dec 2, is not included)
            if current_date.month == 11 and 24 <= current_date.day <= 30:
                units = int(units * 2.5)
                unit_price *= 0.8  # 20% discount
            
            data.append({
                'Date': current_date,
                'Store': store,
                'Product': product,
                'Units_Sold': units,
                'Unit_Price': round(unit_price, 2),
                'Revenue': round(units * unit_price, 2)
            })
        
        current_date += timedelta(days=1)
    
    return pd.DataFrame(data)

# Generate the data
df = generate_sales_data()
print(f"Dataset created with {len(df)} transactions")
print(f"Date range: {df['Date'].min()} to {df['Date'].max()}")

---

## Part 2: Data Exploration (20 points)

### Task 2.1: Initial Data Inspection
**Your Task:** Explore the structure and content of the dataset. What can you learn about the data?

In [None]:
# TODO: Display the first 10 rows of the dataset
# Hint: Use df.head(10)



In [None]:
# TODO: Display basic information about the dataset
# Hint: Use df.info() and df.shape



In [None]:
# TODO: Display summary statistics for numerical columns
# Hint: Use df.describe()



**Your Interpretation:** [Write 2-3 sentences about what you learned from the data exploration]

### Task 2.2: Categorical Data Analysis
**Your Task:** Understand the categorical variables in the dataset.
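
As a quick illustration of the methods commonly used for categorical columns, the sketch below runs them on a toy Series. The values and the variable names are invented for this example and are not part of the assignment data.

```python
import pandas as pd

# Toy column for illustration only (not the assignment data)
toy_cities = pd.Series(["Miami", "Tampa", "Miami", "Orlando", "Miami", "Tampa"])

distinct_values = toy_cities.unique()    # distinct values, in order of appearance
distinct_count = toy_cities.nunique()    # number of distinct values
frequency = toy_cities.value_counts()    # occurrences per value, most frequent first

print(distinct_values)
print(distinct_count)
print(frequency)
```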

In [None]:
# TODO: Find unique stores and products
# Display the unique values and their counts

print("Unique Stores:")
# Your code here

print("\nUnique Products:")
# Your code here

print("\nTransactions by Store:")
# Your code here - use value_counts()


---

## Part 3: Data Manipulation with Pandas (25 points)

### Task 3.1: Add Calculated Columns
**Your Task:** Enhance the dataset with additional calculated fields that will help in analysis.
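
One way to derive calendar fields from a datetime column is the `.dt` accessor. The sketch below uses a toy DataFrame with made-up column names (`When`, `Month_Name`, `Day_Name`, `Weekend_Flag`); it shows the technique and is not the assignment solution.

```python
import pandas as pd

# Toy dates for illustration: 2024-10-04 is a Friday, 2024-10-05 a Saturday,
# and 2024-11-03 a Sunday
toy = pd.DataFrame({"When": pd.to_datetime(["2024-10-04", "2024-10-05", "2024-11-03"])})

toy["Month_Name"] = toy["When"].dt.month_name()   # e.g. "October"
toy["Day_Name"] = toy["When"].dt.day_name()       # e.g. "Friday"

# dayofweek counts Monday as 0, so Saturday and Sunday are 5 and 6
toy["Weekend_Flag"] = toy["When"].dt.dayofweek >= 5

print(toy)
```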

In [None]:
# TODO: Add the following columns to the dataframe:
# 1. 'Month' - Extract month name from Date
# 2. 'Day_of_Week' - Extract day name from Date
# 3. 'Is_Weekend' - Boolean indicating whether the sale was on a weekend (Saturday or Sunday)

# Extract month
df['Month'] = # Your code here

# Extract day of week
df['Day_of_Week'] = # Your code here

# Create weekend indicator
df['Is_Weekend'] = # Your code here

# Display first few rows to verify
df.head()

### Task 3.2: Grouping and Aggregation
**Your Task:** Use Pandas groupby to answer key business questions.
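
The split-apply-combine pattern behind `groupby` can be sketched on a small toy table. The column names (`Region`, `Item`, `Qty`, `Sales`) and all values are invented for illustration.

```python
import pandas as pd

# Toy transactions for illustration only
toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Item": ["A", "B", "A", "A", "B"],
    "Qty": [2, 4, 1, 3, 5],
    "Sales": [20.0, 80.0, 10.0, 30.0, 100.0],
})

# Single-column aggregation: one total per group, returned as a Series
sales_by_region = toy.groupby("Region")["Sales"].sum()

# A different aggregation per column, grouped by two keys
summary = toy.groupby(["Region", "Item"]).agg({"Qty": "sum", "Sales": "mean"})

print(sales_by_region)
print(summary)
```

Because the single-column result is a Series indexed by group, methods such as `idxmax()`, `max()`, and `min()` apply to it directly.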

In [None]:
# TODO: Calculate total revenue by store
revenue_by_store = # Your code here

print("Total Revenue by Store:")
print(revenue_by_store)
print(f"\nBest performing store: {revenue_by_store.idxmax()}")
print(f"Revenue difference between best and worst: ${revenue_by_store.max() - revenue_by_store.min():,.2f}")

In [None]:
# TODO: Calculate average units sold per transaction for each product
avg_units_by_product = # Your code here

print("Average Units Sold per Transaction by Product:")
# Your code to display results

In [None]:
# TODO: Create a summary table showing for each store and product:
# - Total units sold
# - Total revenue
# - Average unit price

store_product_summary = df.groupby(['Store', 'Product']).agg({
    'Units_Sold': # Your code here,
    'Revenue': # Your code here,
    'Unit_Price': # Your code here
}).round(2)

print("Store-Product Performance Summary:")
print(store_product_summary.head(10))

### Task 3.3: Time-based Analysis
**Your Task:** Analyze sales patterns over time.
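
Time-based summaries usually come down to grouping by a date or by a key derived from it. The sketch below assumes a toy table with made-up columns `When` and `Sales`; the labels `Weekend` and `Weekday` are also just illustrative.

```python
import pandas as pd

# Toy data: two sales on a Friday, one on a Saturday, one on a Sunday
toy = pd.DataFrame({
    "When": pd.to_datetime(["2024-10-04", "2024-10-04", "2024-10-05", "2024-10-06"]),
    "Sales": [100.0, 50.0, 300.0, 200.0],
})

# One total per calendar day; the result is indexed by date
per_day = toy.groupby("When")["Sales"].sum()

# Group by a label computed on the fly from the date
day_type = (toy["When"].dt.dayofweek >= 5).map({True: "Weekend", False: "Weekday"})
by_day_type = toy.groupby(day_type)["Sales"].agg(["sum", "mean", "count"])

print(per_day)
print(by_day_type)
```

Comparing the `mean` column alongside `sum` helps separate "more transactions" from "larger transactions" when weekends and weekdays differ.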

In [None]:
# TODO: Calculate daily total revenue
daily_revenue = # Your code here

print(f"Average daily revenue: ${daily_revenue.mean():,.2f}")
print(f"Best sales day: {daily_revenue.idxmax()} with ${daily_revenue.max():,.2f}")
print(f"Worst sales day: {daily_revenue.idxmin()} with ${daily_revenue.min():,.2f}")

In [None]:
# TODO: Compare weekend vs weekday sales
weekend_comparison = # Your code here

print("Weekend vs Weekday Sales:")
# Your code to display the comparison

---

## Part 4: Numerical Analysis with NumPy (15 points)

### Task 4.1: Statistical Calculations
**Your Task:** Use NumPy to perform efficient numerical calculations.
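
The NumPy functions typically used for this kind of summary are shown below on toy arrays (the names `x`, `y`, and `r` are placeholders for illustration). Note in particular that `np.corrcoef` returns a matrix rather than a single number.

```python
import numpy as np

# Toy values for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0   # perfectly linear in x

print(np.mean(x))                   # arithmetic mean
print(np.median(x))                 # middle value
print(np.std(x))                    # population standard deviation (ddof=0 by default)
print(np.percentile(x, [25, 75]))   # 25th and 75th percentiles

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry [0, 1] is the correlation between the two inputs
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]
print(corr_matrix.shape, round(r, 3))
```

Pandas computes the sample standard deviation (`ddof=1`) by default, so `np.std` and `df.describe()` will report slightly different values for the same column.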

In [None]:
# TODO: Convert revenue to NumPy array and calculate statistics
revenue_array = np.array(df['Revenue'])

# Calculate statistics using NumPy
revenue_mean = # Your code here
revenue_median = # Your code here
revenue_std = # Your code here
revenue_percentile_25 = # Your code here
revenue_percentile_75 = # Your code here

print("Revenue Statistics (using NumPy):")
print(f"Mean: ${revenue_mean:,.2f}")
print(f"Median: ${revenue_median:,.2f}")
print(f"Standard Deviation: ${revenue_std:,.2f}")
print(f"25th Percentile: ${revenue_percentile_25:,.2f}")
print(f"75th Percentile: ${revenue_percentile_75:,.2f}")

In [None]:
# TODO: Calculate correlation between Units_Sold and Revenue
units_array = # Your code here
revenue_array = # Your code here

correlation = # Your code here - use np.corrcoef(...)[0, 1]; corrcoef returns a 2x2 matrix, and the f-string below needs a single number

print(f"Correlation between Units Sold and Revenue: {correlation:.3f}")
print("\nInterpretation: [Write what this correlation means in business terms]")

### Task 4.2: Array Operations
**Your Task:** Demonstrate NumPy's efficiency with array operations.
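
"Array operations" here means element-wise (vectorized) arithmetic that replaces an explicit Python loop. A minimal sketch on toy prices follows; the 10% discount rate and all names are arbitrary example values, unrelated to the assignment's cost assumption.

```python
import numpy as np

# Toy prices for illustration; the 10% rate is an arbitrary example value
prices = np.array([100.0, 250.0, 40.0])
discount_rate = 0.10

# Element-wise operations: no Python loop is needed
discounts = prices * discount_rate
net_prices = prices - discounts

# A boolean mask selects only the elements that meet a condition
expensive = net_prices[net_prices > 50]

print(net_prices)
print(net_prices.sum())
print(expensive)
```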

In [None]:
# TODO: Calculate profit (assuming cost is 30% of revenue)
# Use NumPy array operations

revenue_array = np.array(df['Revenue'])
cost_array = # Your code here - multiply revenue by 0.3
profit_array = # Your code here - subtract cost from revenue

# Add profit to dataframe
df['Profit'] = profit_array

print(f"Total Revenue: ${revenue_array.sum():,.2f}")
print(f"Total Cost: ${cost_array.sum():,.2f}")
print(f"Total Profit: ${profit_array.sum():,.2f}")
print(f"Profit Margin: {(profit_array.sum() / revenue_array.sum() * 100):.1f}%")

---

## Part 5: Data Visualization with Matplotlib (25 points)

### Task 5.1: Revenue by Store (Bar Chart)
**Your Task:** Create a bar chart showing total revenue by store.
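
Value labels on bars are the least obvious item on the checklist below, so here is a small sketch on toy data. It assumes Matplotlib 3.4 or newer, where `Axes.bar_label` is available; the labels, values, and colors are invented for illustration.

```python
import matplotlib.pyplot as plt

# Toy values for illustration only
labels = ["North", "South", "East"]
values = [120.0, 95.5, 143.2]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(labels, values, color=["tab:blue", "tab:orange", "tab:green"])

# bar_label (Matplotlib 3.4+) writes a formatted value above each bar
value_labels = ax.bar_label(bars, fmt="%.1f", padding=3)

ax.set_title("Toy Bar Chart")
ax.set_ylabel("Value")
ax.grid(True, alpha=0.3, axis="y")
fig.tight_layout()
```

On older Matplotlib versions, a loop over the bars calling `ax.text(...)` at each bar's height achieves a similar result.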

In [None]:
# TODO: Create a bar chart of revenue by store
plt.figure(figsize=(10, 6))

# Your code here
# Remember to:
# - Use the revenue_by_store data from earlier
# - Add title, labels, and grid
# - Use colors effectively
# - Add value labels on bars

plt.title('Total Revenue by Store - Q4 2024', fontsize=14, fontweight='bold')
plt.xlabel('Store Location')
plt.ylabel('Revenue ($)')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Business Insight: [Write 1-2 sentences about what this chart reveals]")

### Task 5.2: Product Performance (Horizontal Bar Chart)
**Your Task:** Create a horizontal bar chart showing average units sold by product.

In [None]:
# TODO: Create a horizontal bar chart
plt.figure(figsize=(10, 6))

# Your code here
# Use plt.barh() for horizontal bars

plt.title('Average Units Sold per Transaction by Product', fontsize=14, fontweight='bold')
plt.xlabel('Average Units Sold')
plt.ylabel('Product Category')
plt.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

print("Business Insight: [Write 1-2 sentences about what this chart reveals]")

### Task 5.3: Daily Revenue Trend (Line Chart)
**Your Task:** Create a line chart showing daily revenue over time.
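
A reference line and a shaded date band can be added with `axhline` and `axvspan`. The sketch below uses a toy daily series and an arbitrary highlighted window, both invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy daily series for illustration only
days = pd.date_range("2024-01-01", periods=10, freq="D")
toy_series = pd.Series(np.arange(10, dtype=float) * 10 + 50, index=days)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(toy_series.index, toy_series.values, marker="o", label="Daily value")

# Horizontal reference line at the mean of the series
mean_line = ax.axhline(toy_series.mean(), color="red", linestyle="--", label="Average")

# Shaded band marking a date range of interest (an arbitrary toy window here)
band = ax.axvspan(pd.Timestamp("2024-01-04"), pd.Timestamp("2024-01-06"),
                  color="orange", alpha=0.2, label="Highlighted period")

ax.legend()
fig.autofmt_xdate()   # tilt the date labels so they do not overlap
fig.tight_layout()
```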

In [None]:
# TODO: Create a line chart of daily revenue
plt.figure(figsize=(14, 6))

# Your code here
# - Plot daily_revenue data
# - Add a horizontal line for average
# - Highlight the Black Friday period (Nov 24-30 in the generated data) if visible

plt.title('Daily Revenue Trend - Q4 2024', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Revenue ($)')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

print("Business Insight: [Write 1-2 sentences about what this chart reveals]")

### Task 5.4: Product Mix by Store (Grouped Bar Chart)
**Your Task:** Create a grouped bar chart showing product sales across stores.

In [None]:
# TODO: Create a grouped bar chart
# This is more complex - show revenue by product for each store

# Prepare data
pivot_data = df.pivot_table(values='Revenue', index='Store', columns='Product', aggfunc='sum')

# Create the plot
fig, ax = plt.subplots(figsize=(12, 6))

# Your code here
# Hint: Use pivot_data.plot(kind='bar', ax=ax)

plt.title('Product Revenue Distribution by Store', fontsize=14, fontweight='bold')
plt.xlabel('Store Location')
plt.ylabel('Revenue ($)')
plt.legend(title='Product', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("Business Insight: [Write 1-2 sentences about what this chart reveals]")

---

## Part 6: Business Insights and Recommendations (15 points)

### Task 6.1: Key Findings Summary
**Your Task:** Summarize your most important findings from the analysis.

### Key Findings

Based on my analysis of TechMart's Q4 2024 sales data, here are the key findings:

1. **Top Performing Store:** [Your finding here]
   - [Supporting data/evidence]

2. **Best Selling Product:** [Your finding here]
   - [Supporting data/evidence]

3. **Sales Patterns:** [Your finding here]
   - [Supporting data/evidence]

4. **Regional Preferences:** [Your finding here]
   - [Supporting data/evidence]

5. **Revenue Drivers:** [Your finding here]
   - [Supporting data/evidence]

### Task 6.2: Business Recommendations
**Your Task:** Provide actionable recommendations based on your analysis.

### Recommendations for TechMart Leadership

Based on the data analysis, I recommend the following actions for 2025:

#### 1. Store Operations
**Recommendation:** [Your recommendation]
**Rationale:** [Data-driven justification]
**Expected Impact:** [Potential business impact]

#### 2. Product Strategy
**Recommendation:** [Your recommendation]
**Rationale:** [Data-driven justification]
**Expected Impact:** [Potential business impact]

#### 3. Staffing Optimization
**Recommendation:** [Your recommendation]
**Rationale:** [Data-driven justification]
**Expected Impact:** [Potential business impact]

#### 4. Marketing Focus
**Recommendation:** [Your recommendation]
**Rationale:** [Data-driven justification]
**Expected Impact:** [Potential business impact]

### Next Steps
To further improve our analysis and decision-making, I recommend:
1. [Suggestion for additional data to collect]
2. [Suggestion for deeper analysis]
3. [Suggestion for tracking metrics]

---

## Part 7: Technical Reflection (Bonus: 5 points)

**Your Task:** Reflect on the tools and techniques used in this analysis.

### Technical Reflection

#### What I Learned
[Write 2-3 sentences about what you learned from this assignment]

#### Challenges Faced
[Describe any challenges you encountered and how you solved them]

#### Tools Comparison
If I had to do this analysis in Excel instead of Python:
- **What would be harder:** [Your thoughts]
- **What would be easier:** [Your thoughts]
- **Time estimate:** [How much longer/shorter would it take?]

#### Future Learning
Based on this assignment, I want to learn more about:
1. [Topic 1]
2. [Topic 2]
3. [Topic 3]

---

## Submission Checklist

Before submitting, ensure you have:

- [ ] Added your name at the top of the notebook
- [ ] Completed all code cells (no TODO comments remaining)
- [ ] Written business interpretations for all findings
- [ ] Created at least 4 visualizations with proper labels
- [ ] Provided actionable business recommendations
- [ ] Checked that all code cells run without errors
- [ ] Spell-checked your written responses
- [ ] Saved the notebook with all outputs visible

### How to Submit
1. Save this notebook with all outputs (File → Save)
2. Export as HTML (File → Download as → HTML)
3. Submit both the .ipynb and .html files to Canvas
4. Due date: [Check Canvas for deadline]

---

**Good luck with your analysis!**