# Python Data Structures for Data Analytics

## Learning Objectives
- Master Lists, Tuples, Dictionaries, and Sets
- Understand when to use each data structure
- Apply data structures to business scenarios
- Learn common operations and methods

## Data Sources
- Synthetic business data for practical learning

## 1. Lists - Ordered, Mutable Collections

In [None]:
# Business data in lists
products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"]
prices = [999.99, 25.50, 75.00, 299.99, 149.99]
monthly_sales = [45, 120, 85, 30, 90]

print("Products:", products)
print("Prices:", prices)
print("Monthly Sales:", monthly_sales)

In [None]:
# List operations for business analysis
print("=== PRODUCT ANALYSIS ===")
print("First product:", products[0])
print("Last product:", products[-1])
print("Top 3 products:", products[:3])
print("Total products:", len(products))

# Adding new product
products.append("Webcam")
prices.append(89.99)
monthly_sales.append(65)

print("\nAfter adding Webcam:")
print("Products:", products)
print("Total products now:", len(products))

In [None]:
# List comprehensions for quick calculations
revenue_per_product = [price * units for price, units in zip(prices, monthly_sales)]

print("=== REVENUE ANALYSIS ===")
for product, revenue in zip(products, revenue_per_product):
    print(f"{product}: ${revenue:,.2f}")

total_revenue = sum(revenue_per_product)
average_revenue = total_revenue / len(revenue_per_product)
max_revenue = max(revenue_per_product)
best_product = products[revenue_per_product.index(max_revenue)]

print(f"\nTotal Revenue: ${total_revenue:,.2f}")
print(f"Average per Product: ${average_revenue:,.2f}")
print(f"Best Performing: {best_product} (${max_revenue:,.2f})")
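
Beyond the single best product, a full ranking is often more useful in a report. The following is a minimal sketch, assuming the same product data as the cells above (including the appended Webcam). It pairs each product with its revenue and orders the pairs with `sorted`. The names `ranked`, `top_product`, and `top_revenue` are illustrative and not part of the original notebook.

```python
# Sample data copied from the cells above (including the appended Webcam)
products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones", "Webcam"]
prices = [999.99, 25.50, 75.00, 299.99, 149.99, 89.99]
monthly_sales = [45, 120, 85, 30, 90, 65]

# zip pairs items by position, so no manual index is needed
revenue_per_product = [price * units for price, units in zip(prices, monthly_sales)]

# Sort (product, revenue) pairs from highest to lowest revenue
ranked = sorted(
    zip(products, revenue_per_product),
    key=lambda pair: pair[1],
    reverse=True,
)

print("=== REVENUE RANKING ===")
for rank, (product, revenue) in enumerate(ranked, start=1):
    print(f"{rank}. {product}: ${revenue:,.2f}")

# The first pair in the ranking is the best performer
top_product, top_revenue = ranked[0]
print(f"\nBest Performing: {top_product} (${top_revenue:,.2f})")
```

Note that `zip` stops at the shortest input, so the three lists must be kept the same length for the ranking to cover every product.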

## 2. Tuples - Immutable Collections

In [None]:
# Using tuples for fixed data
company_info = ("DataCorp Inc.", "Technology", 2020, "New York")
product_categories = ("Hardware", "Software", "Services", "Consulting")

print("Company Info:", company_info)
print("Product Categories:", product_categories)
print("Company Name:", company_info[0])
print("Industry:", company_info[1])

# Tuples are immutable - item assignment raises a TypeError:
# company_info[0] = "New Corp"  # Uncomment to see the error
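
To see the immutability rule without crashing the notebook, the assignment can be wrapped in `try`/`except`. This is a small sketch using the same `company_info` tuple. The name `renamed_info` is hypothetical and only illustrates how a modified copy can be built.

```python
company_info = ("DataCorp Inc.", "Technology", 2020, "New York")

try:
    company_info[0] = "New Corp"  # tuples do not support item assignment
except TypeError as error:
    print("Cannot modify tuple:", error)

# To "change" a tuple, build a new one from the pieces of the old one
renamed_info = ("New Corp",) + company_info[1:]

print("Original:", company_info)
print("New tuple:", renamed_info)
```

The original tuple is left untouched. Only the new tuple carries the changed name.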

In [None]:
# Tuple unpacking for clean code
name, industry, year, location = company_info
print(f"Welcome to {name}")
print(f"Industry: {industry}")
print(f"Founded: {year}")
print(f"Location: {location}")

# Returning multiple values from a function as a tuple
def get_sales_data():
    return (150000, 45, 12.5)  # revenue, customers, growth%

revenue, customers, growth = get_sales_data()
print(f"\nSales Data: ${revenue:,} revenue, {customers} customers, {growth}% growth")
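
When a function returns several values, a plain tuple relies on the caller remembering the order. One possible refinement, sketched here under the assumption that the same three figures are returned, is `typing.NamedTuple`, which keeps tuple behavior but adds field names. The class name `SalesData` and its field names are hypothetical.

```python
from typing import NamedTuple


class SalesData(NamedTuple):
    """Hypothetical record type for the three values returned below."""
    revenue: int
    customers: int
    growth_pct: float


def get_sales_data() -> SalesData:
    # Same figures as the plain-tuple version of this function
    return SalesData(revenue=150000, customers=45, growth_pct=12.5)


sales = get_sales_data()

# Fields can be read by name...
print(f"Revenue: ${sales.revenue:,}")
print(f"Growth: {sales.growth_pct}%")

# ...and the result still unpacks like an ordinary tuple
revenue, customers, growth = sales
print(f"{customers} customers")
```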

## 3. Dictionaries - Key-Value Pairs

In [None]:
# Business data in dictionaries
customer = {
    "customer_id": "CUST001",
    "name": "Alice Johnson",
    "email": "alice@email.com",
    "age": 34,
    "city": "San Francisco",
    "total_spent": 2450.75,
    "premium_member": True
}

print("Customer Data:")
for key, value in customer.items():
    print(f"{key.replace('_', ' ').title()}: {value}")

In [None]:
# Dictionary operations
print("\n=== CUSTOMER ANALYSIS ===")
print("Customer Name:", customer["name"])
print("Is Premium Member:", customer.get("premium_member", False))  # .get() falls back to False if the key is missing

# Update customer data
customer["total_spent"] += 299.99  # New purchase
customer["last_purchase"] = "2024-01-15"  # Add new field

print(f"Updated Total Spent: ${customer['total_spent']:.2f}")
print("All customer keys:", list(customer.keys()))
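
The difference between square-bracket access and `.get()` matters most when a key is absent. The sketch below uses a trimmed copy of the customer record. The `phone` key is deliberately missing and is only an illustrative assumption.

```python
customer = {
    "customer_id": "CUST001",
    "name": "Alice Johnson",
    "total_spent": 2450.75,
}

# Square brackets raise KeyError when the key does not exist
try:
    phone = customer["phone"]
except KeyError as error:
    print("Missing key:", error)

# .get() returns a default instead and does not modify the dictionary
phone = customer.get("phone", "not provided")
print("Phone:", phone)

# setdefault() inserts the default only if the key is absent
customer.setdefault("premium_member", False)

# update() changes several fields in one call
customer.update({
    "total_spent": customer["total_spent"] + 299.99,
    "last_purchase": "2024-01-15",
})

print(f"Updated Total Spent: ${customer['total_spent']:.2f}")
print("All customer keys:", list(customer))
```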

In [None]:
# Nested dictionaries for complex data
business_data = {
    "company": {
        "name": "Tech Solutions Inc",
        "employees": 150,
        "revenue": 5000000
    },
    "products": {
        "software": ["Analytics Pro", "CRM Enterprise", "Security Suite"],
        "services": ["Consulting", "Support", "Training"]
    },
    "performance": {
        "q1": 1200000,
        "q2": 1350000,
        "q3": 1450000,
        "q4": 1000000
    }
}

print("Company Name:", business_data["company"]["name"])
print("Software Products:", business_data["products"]["software"])
print(f"Q2 Revenue: ${business_data['performance']['q2']:,}")
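
Nested dictionaries can also be summarized, not only read one key at a time. The following sketch assumes a shortened copy of `business_data` and shows aggregation over the inner `performance` dictionary, plus chained `.get()` calls for keys that may be missing. The `locations` and `headquarters` keys are hypothetical and intentionally absent.

```python
business_data = {
    "company": {"name": "Tech Solutions Inc", "employees": 150, "revenue": 5000000},
    "performance": {"q1": 1200000, "q2": 1350000, "q3": 1450000, "q4": 1000000},
}

performance = business_data["performance"]

# Aggregate over the inner dictionary's values
total_performance = sum(performance.values())

# max() with key=performance.get returns the key with the largest value
best_quarter = max(performance, key=performance.get)

for quarter, amount in performance.items():
    share = amount / total_performance
    print(f"{quarter.upper()}: ${amount:,} ({share:.1%} of the year)")

print(f"Best quarter: {best_quarter.upper()}")

# Chained .get() with an empty-dict default avoids KeyError on missing branches
headquarters = business_data.get("locations", {}).get("headquarters", "unknown")
print("Headquarters:", headquarters)
```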

## 4. Sets - Unique Collections

In [None]:
# Using sets for unique data
all_cities = {"New York", "San Francisco", "Chicago", "Boston", "Austin"}
east_coast_cities = {"New York", "Boston", "Miami"}
west_coast_cities = {"San Francisco", "Los Angeles", "Seattle"}

print("All Cities:", all_cities)
print("East Coast:", east_coast_cities)
print("West Coast:", west_coast_cities)

# Set operations
print("\n=== GEOGRAPHIC ANALYSIS ===")
print("Cities in both:", all_cities & east_coast_cities)  # Intersection
print("All unique cities:", all_cities | west_coast_cities)  # Union
print("In All Cities but not East Coast:", all_cities - east_coast_cities)  # Difference
print("Is Chicago in dataset?", "Chicago" in all_cities)
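
Sets support a few more operations that fit this kind of geographic comparison. This is a short sketch using the same three city sets. It covers symmetric difference, subset tests, and disjointness.

```python
all_cities = {"New York", "San Francisco", "Chicago", "Boston", "Austin"}
east_coast_cities = {"New York", "Boston", "Miami"}
west_coast_cities = {"San Francisco", "Los Angeles", "Seattle"}

# Symmetric difference: cities that appear in exactly one of the two sets
in_one_only = all_cities ^ east_coast_cities
print("In exactly one set:", sorted(in_one_only))

# Subset test: False here, because Miami is not in all_cities
print("East Coast fully covered?", east_coast_cities <= all_cities)

# Disjoint test: True when the two sets share no elements
print("Coasts share no cities?", east_coast_cities.isdisjoint(west_coast_cities))
```

Because sets are unordered, `sorted()` is used here to get a stable, readable print order.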

In [None]:
# Practical set example - Customer segmentation
website_visitors = {"alice@email.com", "bob@company.com", "charlie@tech.org"}
newsletter_subscribers = {"bob@company.com", "diana@startup.com", "eve@business.net"}
premium_customers = {"alice@email.com", "eve@business.net"}

print("=== CUSTOMER SEGMENT ANALYSIS ===")
print("Visitors who subscribe:", website_visitors & newsletter_subscribers)
print("Premium subscribers:", premium_customers & newsletter_subscribers)
print("All contacts:", website_visitors | newsletter_subscribers)
print("Subscribers not premium:", newsletter_subscribers - premium_customers)
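
Set operations can be chained to carve out narrower segments, and the result can be combined with a dictionary to build a per-contact profile. The sketch below assumes the same three email sets. The names `all_contacts`, `unengaged_visitors`, and `profiles` are illustrative.

```python
website_visitors = {"alice@email.com", "bob@company.com", "charlie@tech.org"}
newsletter_subscribers = {"bob@company.com", "diana@startup.com", "eve@business.net"}
premium_customers = {"alice@email.com", "eve@business.net"}

# Union of all three sets: every known contact, without duplicates
all_contacts = website_visitors | newsletter_subscribers | premium_customers

# Chained difference: visitors who neither subscribe nor hold premium status
unengaged_visitors = website_visitors - newsletter_subscribers - premium_customers

# Dictionary comprehension: one membership profile per contact
profiles = {
    email: {
        "visitor": email in website_visitors,
        "subscriber": email in newsletter_subscribers,
        "premium": email in premium_customers,
    }
    for email in sorted(all_contacts)
}

print("Total contacts:", len(all_contacts))
print("Unengaged visitors:", unengaged_visitors)
for email, flags in profiles.items():
    print(email, flags)
```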

## 5. Comprehensive Business Example

In [None]:
# Putting it all together - Sales Analysis System

# List of sales transactions (each transaction is a dictionary)
sales_data = [
    {"product": "Laptop", "price": 999.99, "quantity": 2, "region": "North"},
    {"product": "Mouse", "price": 25.50, "quantity": 10, "region": "South"},
    {"product": "Keyboard", "price": 75.00, "quantity": 5, "region": "North"},
    {"product": "Monitor", "price": 299.99, "quantity": 3, "region": "East"},
    {"product": "Laptop", "price": 999.99, "quantity": 1, "region": "West"},
    {"product": "Headphones", "price": 149.99, "quantity": 8, "region": "South"},
]

# Analysis using different data structures
total_revenue = sum(sale["price"] * sale["quantity"] for sale in sales_data)
regions = {sale["region"] for sale in sales_data}
products_sold = [sale["product"] for sale in sales_data]
# Number of transactions per product (not units sold)
product_counts = {product: products_sold.count(product) for product in set(products_sold)}

print("=== SALES ANALYSIS REPORT ===")
print(f"Total Revenue: ${total_revenue:,.2f}")
print(f"Regions with sales: {regions}")
print("Transactions per product:", product_counts)

# Regional analysis using dictionary
regional_sales = {}
for sale in sales_data:
    region = sale["region"]
    amount = sale["price"] * sale["quantity"]
    regional_sales[region] = regional_sales.get(region, 0) + amount

print("\nRegional Sales:")
for region, sales in regional_sales.items():
    print(f"{region}: ${sales:,.2f}")
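
Counting how often a product appears in the transaction list is not the same as counting units sold. As a sketch under the same sample transactions, `collections.Counter` can accumulate quantities per product, and `sorted` can rank the regional totals. The names `units_sold` and `ranked_regions` are illustrative.

```python
from collections import Counter

sales_data = [
    {"product": "Laptop", "price": 999.99, "quantity": 2, "region": "North"},
    {"product": "Mouse", "price": 25.50, "quantity": 10, "region": "South"},
    {"product": "Keyboard", "price": 75.00, "quantity": 5, "region": "North"},
    {"product": "Monitor", "price": 299.99, "quantity": 3, "region": "East"},
    {"product": "Laptop", "price": 999.99, "quantity": 1, "region": "West"},
    {"product": "Headphones", "price": 149.99, "quantity": 8, "region": "South"},
]

# Counter starts missing keys at 0, so quantities can be added directly
units_sold = Counter()
for sale in sales_data:
    units_sold[sale["product"]] += sale["quantity"]

# Revenue per region, accumulated with dict.get as in the cell above
regional_sales = {}
for sale in sales_data:
    amount = sale["price"] * sale["quantity"]
    regional_sales[sale["region"]] = regional_sales.get(sale["region"], 0) + amount

# Rank regions from highest to lowest revenue
ranked_regions = sorted(regional_sales.items(), key=lambda item: item[1], reverse=True)

print("Units sold:", dict(units_sold))
print("Top seller by units:", units_sold.most_common(1))
for region, amount in ranked_regions:
    print(f"{region}: ${amount:,.2f}")
```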

## Practice Exercises

In [None]:
# Exercise 1: Employee Management System
employees = [
    {"name": "Alice", "department": "Engineering", "salary": 85000},
    {"name": "Bob", "department": "Marketing", "salary": 65000},
    {"name": "Charlie", "department": "Engineering", "salary": 95000},
    {"name": "Diana", "department": "Sales", "salary": 75000},
]

# Your task: Calculate average salary by department
department_salaries = {}
department_counts = {}

for emp in employees:
    dept = emp["department"]
    salary = emp["salary"]
    department_salaries[dept] = department_salaries.get(dept, 0) + salary
    department_counts[dept] = department_counts.get(dept, 0) + 1

print("Average Salaries by Department:")
for dept in department_salaries:
    avg_salary = department_salaries[dept] / department_counts[dept]
    print(f"{dept}: ${avg_salary:,.2f}")
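
The same average can be computed by first grouping salaries per department and then averaging each group. This is one possible alternative, not the only solution to the exercise. It assumes the same employee records and uses `collections.defaultdict` and `statistics.mean` from the standard library. The names `salaries_by_dept` and `average_by_dept` are illustrative.

```python
from collections import defaultdict
from statistics import mean

employees = [
    {"name": "Alice", "department": "Engineering", "salary": 85000},
    {"name": "Bob", "department": "Marketing", "salary": 65000},
    {"name": "Charlie", "department": "Engineering", "salary": 95000},
    {"name": "Diana", "department": "Sales", "salary": 75000},
]

# defaultdict(list) creates an empty list the first time a department is seen
salaries_by_dept = defaultdict(list)
for emp in employees:
    salaries_by_dept[emp["department"]].append(emp["salary"])

# Average each group with a dictionary comprehension
average_by_dept = {dept: mean(salaries) for dept, salaries in salaries_by_dept.items()}

print("Average Salaries by Department:")
for dept, avg_salary in average_by_dept.items():
    print(f"{dept}: ${avg_salary:,.2f}")
```

Keeping the grouped lists around makes it straightforward to compute other statistics per department later, such as `min`, `max`, or headcount via `len`.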

In [None]:
# Exercise 2: Inventory Management
inventory = ["laptop", "mouse", "keyboard", "monitor", "laptop", "mouse"]

# Your task: Find unique products and count of each
unique_products = set(inventory)
product_counts = {}

for product in inventory:
    product_counts[product] = product_counts.get(product, 0) + 1

print("Unique Products:", unique_products)
print("Product Counts:", product_counts)

## Key Takeaways
- **Lists**: Ordered, mutable collections - perfect for sequences of items
- **Tuples**: Ordered, immutable collections - use for fixed data that shouldn't change
- **Dictionaries**: Key-value pairs - ideal for structured data with labels
- **Sets**: Unique, unordered collections - great for membership testing and set operations (union, intersection, difference)
- **Choose the right structure** for your data and the operations you need