# 🐍 Module 1: Python Fundamentals

**⏱️ Time**: 2-3 hours | **🎯 Difficulty**: 🟢 Beginner

## 📱 30-Second Summary
**Learn Python for data science in 2-3 hours.** Master variables, strings, functions, and data structures through real examples like cleaning customer data and handling errors. Skip "hello world" and go straight to practical skills you'll use daily in data work.

**🎯 You'll build**: Text cleaning functions, error-handling data processors, and reusable code patterns.

## 🚀 Quick Start Options

**Choose your learning style:**

1. **📓 Interactive Learning** (Recommended): Run each cell below step-by-step
2. **🎬 Full Demo**: Run the complete script first, then experiment
3. **🧪 Playground First**: Jump straight to experimentation

---

# 📖 Core Concepts

Let's learn Python the practical way - with real examples you'll actually use!

## 🔢 Variables and Data Types

Python is **dynamically typed**: a variable's type comes from the value assigned to it at runtime, so you never declare types up front. This makes it beginner-friendly!

In [None]:
# 🎯 Real-world example: Customer data
customer_name = "Alice Johnson"  # str (string) - any text
customer_age = 28                  # int (integer) - whole numbers
account_balance = 1250.75         # float - decimal numbers
is_premium = True                  # bool (boolean) - True/False

# 🎨 f-strings: Modern way to format text
print(f"👋 Customer: {customer_name}")
print(f"💰 Balance: ${account_balance:,.2f}")  # :,.2f adds commas + 2 decimals
print(f"⭐ Premium: {is_premium}")

# 🔍 Check data types
print(f"\n📊 Data Types:")
print(f"'{customer_name}' is a {type(customer_name).__name__}")
print(f"{customer_age} is an {type(customer_age).__name__}")
print(f"{account_balance} is a {type(account_balance).__name__}")
print(f"{is_premium} is a {type(is_premium).__name__}")
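Dynamic typing also means the same name can point at values of different types over time, and Python will not silently convert between them. The short sketch below illustrates this; the variable names `value` and `result` are made up for the example.

```python
# Hypothetical names, used only for this illustration
value = 28
print(type(value).__name__)   # int

value = "28"                  # same name, now bound to a string
print(type(value).__name__)   # str

# Python does not convert types implicitly: str + int raises TypeError
try:
    result = value + 1
except TypeError as exc:
    print(f"TypeError caught: {exc}")
    result = int(value) + 1   # convert explicitly instead

print(result)                 # 29
```

This is why numeric data that arrives as text (for example from a CSV file) usually needs an explicit `int()` or `float()` call before you can do arithmetic on it.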

## 🧵 String Manipulation

**Why strings matter in data science**: Real data is messy! Names come as "john", "JOHN", "  John  ". String methods clean this up.

In [None]:
# 🧹 Common data cleaning scenario
messy_name = "  john doe  "
messy_company = "DATA CORP"
messy_email = "JOHN.DOE@COMPANY.COM"

# 🎯 String methods that save your life
clean_name = messy_name.strip().title()  # Remove spaces, capitalize properly
clean_company = messy_company.title()    # Fix capitalization
clean_email = messy_email.lower()        # Emails should be lowercase

print("🧹 Before and After:")
print(f"Name: '{messy_name}' → '{clean_name}'")
print(f"Company: '{messy_company}' → '{clean_company}'")
print(f"Email: '{messy_email}' → '{clean_email}'")

# 🔀 Split and join - super useful for data processing
full_name = "Alice Johnson Smith"
name_parts = full_name.split(" ")  # Split into list
first_name = name_parts[0]
last_name = name_parts[-1]  # -1 gets the last item

print(f"\n✂️ Name splitting:")
print(f"Full name: {full_name}")
print(f"Parts: {name_parts}")
print(f"First: {first_name}, Last: {last_name}")
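Two edge cases are worth knowing before you rely on these methods for real data: `.title()` capitalizes the letter after any non-letter character (including apostrophes), and `.split(" ")` treats every single space as a separator. The sample strings below are made up to show both behaviors.

```python
# Hypothetical sample values chosen to expose edge cases
apostrophe_name = "don't stop".title()
print(apostrophe_name)                 # Don'T Stop

spaced_name = "Alice  Johnson"         # two spaces between the words

# split(" ") yields an empty string for the extra space
parts_single_space = spaced_name.split(" ")
print(parts_single_space)              # ['Alice', '', 'Johnson']

# split() with no argument splits on any run of whitespace
parts_whitespace = spaced_name.split()
print(parts_whitespace)                # ['Alice', 'Johnson']
```

For messy input, calling `.split()` with no argument is usually the safer default.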

## 🗂️ Data Structures

Choose the right tool for the job! Each data structure has a specific purpose.

In [None]:
# 📋 Lists: Ordered, allows duplicates (like a to-do list)
shopping_list = ['apples', 'bread', 'milk', 'apples']  # Duplicates OK
shopping_list.append('eggs')  # Add item
print(f"🛒 Shopping: {shopping_list}")

# 🎯 Sets: Unique items only (like a membership roster)
unique_customers = {'Alice', 'Bob', 'Alice', 'Charlie'}  # The duplicate 'Alice' is dropped
unique_customers.add('Diana')  # Add member
print(f"👥 Unique customers: {unique_customers}")  # Sets are unordered, so print order may vary

# 🗃️ Dictionaries: Key-value pairs (like a phone book)
customer_info = {
    'name': 'Alice Johnson',
    'age': 28,
    'city': 'New York',
    'premium': True
}
customer_info['email'] = 'alice@email.com'  # Add new info
print(f"📇 Customer: {customer_info}")
print(f"📧 Email: {customer_info['email']}")

# 📍 Tuples: Immutable sequences (like GPS coordinates)
office_location = (40.7128, -74.0060)  # (latitude, longitude)
print(f"🌍 Office location: {office_location}")
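Each structure also has a signature operation that shows why you would pick it. The sketch below reuses the example data from the cell above; the `phone` key and the variable names `is_member` and `tuple_is_immutable` are assumptions added for illustration.

```python
customer_info = {'name': 'Alice Johnson', 'age': 28}
unique_customers = {'Alice', 'Bob', 'Charlie'}
office_location = (40.7128, -74.0060)

# Dictionaries: .get() returns a fallback instead of raising KeyError
phone = customer_info.get('phone', 'not provided')
print(phone)                     # not provided

# Sets: membership tests are fast
is_member = 'Bob' in unique_customers
print(is_member)                 # True

# Tuples: unpack into names, but items cannot be reassigned
latitude, longitude = office_location
try:
    office_location[0] = 0.0
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True
print(tuple_is_immutable)        # True
```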

## 🔀 Control Flow

Make decisions and repeat tasks - the foundation of all programming logic.

In [None]:
# 🎯 Real business logic example
account_balance = 1250.75
transaction_amount = 200.00

# 🔍 if/elif/else for decision making
if account_balance >= transaction_amount:
    new_balance = account_balance - transaction_amount
    print(f"✅ Transaction approved! New balance: ${new_balance:,.2f}")
elif account_balance >= transaction_amount * 0.9:  # 90% coverage
    print(f"⚠️ Low balance warning! Current: ${account_balance:,.2f}")
else:
    print(f"❌ Transaction declined! Insufficient funds.")

# 🔄 Loops for automation
print(f"\n📊 Processing transactions...")
transactions = [100, 250, 75, 500, 150]

total_processed = 0
for amount in transactions:
    total_processed += amount
    print(f"💳 Processed: ${amount:,.2f} (Running total: ${total_processed:,.2f})")

print(f"\n💰 Total transactions processed: ${total_processed:,.2f}")
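With the sample values above, only the first branch of the `if/elif/else` ever runs. One way to see all three branches is to wrap the same conditions in a small function and call it with different balances. The function name `check_transaction` and its return strings are hypothetical, chosen just for this sketch.

```python
def check_transaction(balance, amount):
    """Return which branch of the decision logic applies (illustrative only)."""
    if balance >= amount:
        return "approved"
    elif balance >= amount * 0.9:   # balance covers at least 90% of the amount
        return "low balance"
    else:
        return "declined"

# Branches are checked top to bottom; the first true condition wins
print(check_transaction(1250.75, 200.00))  # approved
print(check_transaction(190.00, 200.00))   # low balance
print(check_transaction(100.00, 200.00))   # declined

# The running-total loop can also be written with the built-in sum()
transactions = [100, 250, 75, 500, 150]
total_processed = sum(transactions)
print(total_processed)                     # 1075
```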

## ⚙️ Functions

Write reusable code that does one thing well. Functions are the building blocks of clean, maintainable code.

In [None]:
# 🧹 Real data cleaning function
def clean_customer_name(raw_name, default="Unknown"):
    """
    Clean and standardize customer names.
    Args: raw_name (str), default (str)
    Returns: str (cleaned name)
    """
    if not raw_name or not raw_name.strip():
        return default
    
    # Clean: remove extra spaces, proper capitalization
    cleaned = raw_name.strip().title()
    
    # Remove multiple spaces
    while '  ' in cleaned:
        cleaned = cleaned.replace('  ', ' ')
    
    return cleaned

# 🧪 Test the function with messy data
messy_names = ["  alice   johnson  ", "BOB SMITH", "", "   ", "charlie brown"]

print("🧹 Name cleaning results:")
for name in messy_names:
    clean = clean_customer_name(name)
    print(f"'{name}' → '{clean}'")
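The `while` loop in `clean_customer_name` collapses repeated spaces, but it does not touch tabs or newlines, and the function raises an error on non-string input such as a number. A possible alternative, sketched below under the hypothetical name `clean_customer_name_v2`, splits on any whitespace and rejoins with single spaces. It is one option, not a replacement you must adopt.

```python
def clean_customer_name_v2(raw_name, default="Unknown"):
    """Variant that normalizes all whitespace and tolerates non-string input."""
    if not isinstance(raw_name, str):
        return default              # covers None, numbers, and so on

    words = raw_name.split()        # splits on any run of spaces, tabs, newlines
    if not words:
        return default              # empty or whitespace-only input

    return " ".join(words).title()

print(clean_customer_name_v2("  alice   johnson  "))  # Alice Johnson
print(clean_customer_name_v2("alice\tjohnson"))       # Alice Johnson
print(clean_customer_name_v2("   "))                  # Unknown
print(clean_customer_name_v2(None))                   # Unknown
```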

## 🛡️ Error Handling

Real data is messy and unpredictable. Handle errors gracefully to prevent crashes.

In [None]:
# 🎯 Real scenario: Converting user input to numbers
def safe_number_convert(value, default=0):
    """Safely convert messy input to numbers"""
    try:
        # Try to convert to float first (handles ints too)
        return float(value)
    except ValueError:
        print(f"⚠️ Could not convert '{value}' to number, using {default}")
        return default
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        return default

# 🧪 Test with realistic messy data
messy_data = ["123.45", "67", "abc", "", "45.67x", None, "99.99"]

print("🔢 Number conversion results:")
total = 0
for item in messy_data:
    number = safe_number_convert(item)
    total += number
    print(f"Input: {item} → Output: {number}")

print(f"\n💰 Total: ${total:,.2f}")
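In the test data above, the strings `"abc"`, `""`, and `"45.67x"` raise `ValueError`, while `None` raises `TypeError` and is caught by the broad `except Exception` branch. One thing `float()` does accept is text like `"nan"` or `"inf"`, which would quietly spoil a running total. The sketch below shows a stricter variant under the hypothetical name `strict_number_convert`; treating non-finite values as invalid is an assumption about what your data needs.

```python
import math

def strict_number_convert(value, default=0.0):
    """Convert to float, rejecting bad input and non-finite values."""
    try:
        number = float(value)
    except (ValueError, TypeError):   # bad text, or a non-numeric type like None
        return default

    if not math.isfinite(number):     # float("nan") and float("inf") both parse
        return default
    return number

samples = ["123.45", " 67 ", "abc", None, "nan", "inf"]
converted = [strict_number_convert(item) for item in samples]
print(converted)  # [123.45, 67.0, 0.0, 0.0, 0.0, 0.0]
```

Naming the exact exceptions you expect keeps genuinely unexpected bugs visible instead of hiding them behind a default value.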

---

# 🧪 Your Playground

**Time to experiment!** Try modifying the examples above or create your own:

In [None]:
# 🎮 Playground 1: Customer data processing
# Try changing the values and see what happens!

customer = {
    'name': '  JANE   DOE  ',
    'email': 'JANE.DOE@COMPANY.COM',
    'age': '25',  # Note: this is a string!
    'balance': '1500.75'
}

# Clean up the data
clean_customer = {
    'name': customer['name'].strip().title(),
    'email': customer['email'].lower(),
    'age': int(customer['age']),
    'balance': float(customer['balance'])
}

print("Before:", customer)
print("After:", clean_customer)

In [None]:
# 🎮 Playground 2: Build your own function
# Create a function that generates email addresses from names

def generate_email(first_name, last_name, domain="company.com"):
    """
    Generate professional email addresses
    Example: generate_email('John', 'Doe') → 'john.doe@company.com'
    """
    # Your code here! Try to:
    # 1. Convert names to lowercase
    # 2. Remove extra spaces
    # 3. Combine with domain
    
    email = f"{first_name.strip().lower()}.{last_name.strip().lower()}@{domain}"
    return email

# Test your function
test_names = [
    ('Alice', 'Johnson'),
    ('  BOB  ', '  SMITH  '),
    ('Charlie', 'Brown')
]

print("📧 Generated emails:")
for first, last in test_names:
    email = generate_email(first, last)
    print(f"{first} {last} → {email}")

In [None]:
# 🎮 Playground 3: Free exploration
# This cell is yours! Try anything you want:
# - Create variables with your own data
# - Write functions that solve real problems
# - Practice string methods and data structures

# Example: Analyze some text
text = "Python is awesome for data science and machine learning!"

print(f"Original: {text}")
print(f"Length: {len(text)} characters")
print(f"Words: {len(text.split())} words")
print(f"Uppercase: {text.upper()}")

# Your turn! Add your own experiments below:



---

# 🎯 Practice Challenges

Ready to test your skills? Try these realistic scenarios:

In [None]:
# 🥉 Challenge 1: Email validator
# Write a function that checks if an email looks valid
# (contains @ and a domain)

def is_valid_email(email):
    """
    Basic email validation
    Should return True for valid emails, False otherwise
    """
    # Your code here!
    # Hints: Check for '@' and '.' in the right places
    pass  # Replace this with your solution

# Test cases (uncomment when ready)
# test_emails = ['user@domain.com', 'invalid.email', 'test@test', 'good@example.org']
# for email in test_emails:
#     result = is_valid_email(email)
#     print(f'{email} → {result}')

In [None]:
# 🥈 Challenge 2: Data aggregator
# Calculate total sales by region from messy data

sales_data = [
    {'region': '  North  ', 'amount': '1000.50'},
    {'region': 'SOUTH', 'amount': '750.25'},
    {'region': 'north', 'amount': '500.00'},
    {'region': 'South  ', 'amount': '1200.75'},
    {'region': 'EAST', 'amount': '800.00'},
    {'region': 'east', 'amount': '650.50'}
]

# Your task: Clean the data and sum by region
# Expected output: {'North': 1500.50, 'South': 1951.00, 'East': 1450.50}

def aggregate_sales(data):
    """Aggregate sales data by region"""
    results = {}
    # Your code here!
    # Hints: Clean region names, convert amounts to float, sum by region
    return results

# Test your function
# totals = aggregate_sales(sales_data)
# print('Sales by region:', totals)

In [None]:
# 🥇 Challenge 3: Error-proof CSV processor
# Process a list of customer records, handling all possible errors

messy_customers = [
    'Alice,28,alice@email.com,1500.75',
    'Bob,thirty,bob.email.com,abc',
    'Charlie,,charlie@test.com,2500.00',
    ',25,diana@email.com,1000.50',
    'Eve,22,eve@email.com'  # Missing balance
]

def process_customers(raw_data):
    """
    Process messy customer data with full error handling
    Return list of clean customer dictionaries
    """
    clean_customers = []
    
    # Your code here!
    # Handle: missing fields, invalid data types, malformed emails
    # Use try/except blocks and provide sensible defaults
    
    return clean_customers

# Test your function
# processed = process_customers(messy_customers)
# for customer in processed:
#     print(customer)

---

# 📚 Quick Reference Cheat Sheet

Keep this handy while you code!

## 🧵 Essential String Methods
```python
# Clean messy text (use these daily!)
name = "  JOHN DOE  "
clean = name.strip().title()  # "John Doe"
email = name.strip().lower().replace(" ", ".") + "@company.com"  # "john.doe@company.com"

# Split and join
words = "apple,banana,orange".split(",")  # ['apple', 'banana', 'orange']
sentence = " ".join(words)  # "apple banana orange"
```

## 🗂️ Data Structure Quick Picks
```python
# When to use what:
my_list = [1, 2, 3, 2]        # Order matters, allows duplicates
my_set = {1, 2, 3}            # Unique items only, fast lookups
my_dict = {"key": "value"}    # Key-value pairs, fast access
my_tuple = (1, 2, 3)          # Immutable, perfect for coordinates
```

## 🛡️ Error Handling Pattern
```python
# A common pattern for data processing
# (risky_operation, data, and default_value are placeholders for your own code)
try:
    result = risky_operation(data)
except ValueError as e:
    print(f"Data error: {e}")
    result = default_value
except Exception as e:
    print(f"Unexpected error: {e}")
    result = None
```

---

# ✅ Self-Assessment

**🟢 Beginner Checkpoints**:
- [ ] I can create and use variables of different types
- [ ] I can clean messy text data using string methods
- [ ] I can choose the right data structure for different tasks
- [ ] I can write functions with proper error handling

**🟡 Intermediate Challenges**:
- [ ] I can process real messy datasets without crashing
- [ ] I can build reusable functions for data cleaning
- [ ] I can handle edge cases and unexpected input gracefully
- [ ] I can debug my own code when things go wrong

**Ready for Module 2?** ✅ You should feel confident with strings, functions, and basic error handling!

---

## 🚀 Next Steps

**Congratulations!** 🎉 You've mastered Python fundamentals!

**Continue your journey:**
- 📊 **[Module 2: Data Manipulation](../2_data_manipulation/)** - Master NumPy & Pandas
- 📈 **[Module 3: Data Visualization](../3_data_visualization/)** - Create stunning charts
- 🤖 **[Module 4: Statistics & ML](../4_statistics_ml/)** - Build predictive models

**Keep practicing!** The best way to learn programming is by doing. Try building small projects with what you've learned.