# Python Basics for Data Science and Web Development

This notebook covers Python fundamentals that are essential for working with Databricks, Streamlit, and Pydantic.

## Learning Objectives
- Understand Python data types and structures
- Master control flow and functions
- Learn object-oriented programming basics
- Explore Python's ecosystem for data science


## 1. Data Types and Variables

Python has several built-in data types that are fundamental for data processing:

In [None]:
# Basic data types
name = "Alice"              # String
age = 30                    # Integer
salary = 75000.50          # Float
is_employee = True         # Boolean
department = None          # NoneType

print(f"Name: {name} (type: {type(name)})")
print(f"Age: {age} (type: {type(age)})")
print(f"Salary: {salary} (type: {type(salary)})")
print(f"Is Employee: {is_employee} (type: {type(is_employee)})")
print(f"Department: {department} (type: {type(department)})")
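
Values read from files or web forms usually arrive as strings, so converting between the types above comes up often. The following is a minimal sketch of conversion and runtime type checks; the variable names `raw_age` and `raw_salary` are illustrative and not part of the notebook.

```python
# Hypothetical raw inputs, as they might arrive from a CSV file or a form
raw_age = "30"
raw_salary = "75000.50"

age = int(raw_age)            # str -> int
salary = float(raw_salary)    # str -> float

# isinstance() is the usual way to check a type at runtime
print(isinstance(age, int))
print(isinstance(salary, float))

# bool is a subclass of int, so True also passes an int check
print(isinstance(True, int))

# Invalid input raises ValueError instead of returning a default
try:
    int("thirty")
except ValueError as exc:
    print(f"Could not convert: {exc}")
```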

### String Operations

In [None]:
# String operations - essential for data processing
text = "  Python Data Science  "

print(f"Original: '{text}'")
print(f"Stripped: '{text.strip()}'")
print(f"Lower: '{text.lower()}'")
print(f"Upper: '{text.upper()}'")
print(f"Split: {text.strip().split()}")

# F-string formatting (very useful for Streamlit)
user_data = {"name": "John", "score": 85.7}
formatted_message = f"User {user_data['name']} achieved a score of {user_data['score']:.1f}%"
print(formatted_message)
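
The `:.1f` in the f-string above is a format specifier. A few other specifiers are handy for reports and dashboards; this sketch uses made-up values (`revenue`, `completion`, `label`) purely for illustration.

```python
# Illustrative values, not taken from the notebook's data
revenue = 1234567.891
completion = 0.857
label = "Total"

formatted_revenue = f"${revenue:,.2f}"   # thousands separator, two decimals
formatted_pct = f"{completion:.1%}"      # multiply by 100 and append a percent sign
padded_label = f"{label:<10}|"           # left-align in a 10-character field

print(formatted_revenue)
print(formatted_pct)
print(padded_label)
```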

## 2. Data Structures

Understanding Python's data structures is crucial for data manipulation:

In [None]:
# Lists - ordered, mutable collections
employees = ["Alice", "Bob", "Charlie", "Diana"]
salaries = [70000, 80000, 75000, 90000]

print(f"Employees: {employees}")
print(f"First employee: {employees[0]}")
print(f"Last employee: {employees[-1]}")
print(f"Number of employees: {len(employees)}")

# List operations
employees.append("Eve")
salaries.append(85000)
print(f"After adding Eve: {employees}")
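
Because `employees` and `salaries` are parallel lists, it can help to see how they are walked together. This is a small sketch using `zip()` and slicing on copies of the lists above; `salary_by_name` and `first_two` are names introduced here for illustration.

```python
employees = ["Alice", "Bob", "Charlie", "Diana"]
salaries = [70000, 80000, 75000, 90000]

# zip() pairs items by position and stops at the shorter input
for name, salary in zip(employees, salaries):
    print(f"{name}: ${salary:,}")

# The pairs convert directly into a dictionary
salary_by_name = dict(zip(employees, salaries))
print(salary_by_name)

# Slicing returns a new list and leaves the original unchanged
first_two = employees[:2]
print(first_two)
```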

In [None]:
# Dictionaries - key-value pairs (like JSON)
employee_record = {
    "id": 1001,
    "name": "Alice Johnson",
    "department": "Engineering",
    "salary": 75000,
    "skills": ["Python", "SQL", "Machine Learning"]
}

print(f"Employee: {employee_record['name']}")
print(f"Department: {employee_record['department']}")
print(f"Skills: {', '.join(employee_record['skills'])}")

# Dictionary methods
print(f"Keys: {list(employee_record.keys())}")
print(f"Has 'email' field: {'email' in employee_record}")
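
Indexing a dictionary with a missing key raises `KeyError`, which is why the membership test above matters. One possible way to handle optional fields is `dict.get()` with a default, sketched here on a shortened copy of the record (the email address is a placeholder).

```python
employee_record = {
    "id": 1001,
    "name": "Alice Johnson",
    "department": "Engineering",
}

# get() returns the default instead of raising KeyError
email = employee_record.get("email", "not provided")
print(f"Email: {email}")

# items() yields (key, value) pairs for iteration
for field, value in employee_record.items():
    print(f"{field}: {value}")

# update() overwrites existing keys and adds new ones
employee_record.update({"department": "Platform", "email": "alice@example.com"})
print(employee_record)
```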

In [None]:
# Sets - unique collections
python_skills = {"pandas", "numpy", "matplotlib", "scikit-learn"}
data_skills = {"pandas", "numpy", "sql", "tableau"}

print(f"Python skills: {python_skills}")
print(f"Data skills: {data_skills}")
print(f"Common skills: {python_skills.intersection(data_skills)}")
print(f"All skills: {python_skills.union(data_skills)}")
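
Sets also support operator shorthand for the methods shown above, plus difference operations. A brief sketch, reusing the same two sets; `departments` is an illustrative list added to show de-duplication.

```python
python_skills = {"pandas", "numpy", "matplotlib", "scikit-learn"}
data_skills = {"pandas", "numpy", "sql", "tableau"}

# & and | are equivalent to intersection() and union()
common = python_skills & data_skills
combined = python_skills | data_skills

# Difference: items in the first set but not the second
only_python = python_skills - data_skills

# Symmetric difference: items in exactly one of the two sets
in_one_only = python_skills ^ data_skills

# Converting a list to a set removes duplicates (sorted() gives a stable order)
departments = ["Engineering", "Marketing", "Engineering"]
unique_departments = sorted(set(departments))

print(only_python, in_one_only, unique_departments)
```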

## 3. Control Flow

Control flow is essential for data processing and business logic:

In [None]:
# If statements - decision making
def categorize_salary(salary):
    if salary >= 90000:
        return "Senior"
    elif salary >= 70000:
        return "Mid-level"
    else:
        return "Junior"

# Test the function
test_salaries = [65000, 75000, 95000]
for salary in test_salaries:
    category = categorize_salary(salary)
    print(f"Salary ${salary:,} is {category} level")

In [None]:
# Loops - iteration patterns
employees_data = [
    {"name": "Alice", "department": "Engineering", "salary": 75000},
    {"name": "Bob", "department": "Marketing", "salary": 68000},
    {"name": "Charlie", "department": "Engineering", "salary": 82000},
    {"name": "Diana", "department": "Sales", "salary": 71000}
]

# For loop with enumerate
print("Employee Summary:")
for i, employee in enumerate(employees_data, 1):
    level = categorize_salary(employee['salary'])
    print(f"{i}. {employee['name']} - {employee['department']} ({level})")

In [None]:
# List comprehensions - Pythonic way to create lists
# Get all engineering salaries
engineering_salaries = [emp['salary'] for emp in employees_data if emp['department'] == 'Engineering']
print(f"Engineering salaries: {engineering_salaries}")

# Create salary categories for all employees
salary_categories = [(emp['name'], categorize_salary(emp['salary'])) for emp in employees_data]
print(f"Salary categories: {salary_categories}")

# Dictionary comprehension
name_to_salary = {emp['name']: emp['salary'] for emp in employees_data}
print(f"Name to salary mapping: {name_to_salary}")
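
When the result of a comprehension is only fed into an aggregate such as `sum()`, a generator expression avoids building the intermediate list. A minimal sketch on a copy of the data above:

```python
employees_data = [
    {"name": "Alice", "department": "Engineering", "salary": 75000},
    {"name": "Bob", "department": "Marketing", "salary": 68000},
    {"name": "Charlie", "department": "Engineering", "salary": 82000},
    {"name": "Diana", "department": "Sales", "salary": 71000},
]

# Generator expressions use parentheses (implied inside a function call)
total_payroll = sum(emp["salary"] for emp in employees_data)

# any() and all() stop as soon as the answer is known
any_above_80k = any(emp["salary"] > 80000 for emp in employees_data)
all_above_60k = all(emp["salary"] > 60000 for emp in employees_data)

print(f"Total payroll: ${total_payroll:,}")
print(f"Anyone above 80k: {any_above_80k}")
print(f"Everyone above 60k: {all_above_60k}")
```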

## 4. Functions

Functions are building blocks for modular, reusable code:

In [None]:
# Function with default arguments
def calculate_bonus(salary, performance_rating=3, bonus_percentage=0.1):
    """
    Calculate employee bonus based on salary and performance.
    
    Args:
        salary (float): Employee's base salary
        performance_rating (int): Performance rating (1-5)
        bonus_percentage (float): Base bonus percentage
    
    Returns:
        float: Calculated bonus amount
    """
    multiplier = {
        1: 0.5,   # Below expectations
        2: 0.75,  # Meets some expectations  
        3: 1.0,   # Meets expectations
        4: 1.25,  # Exceeds expectations
        5: 1.5    # Outstanding
    }.get(performance_rating, 1.0)
    
    return salary * bonus_percentage * multiplier

# Test the function
test_cases = [
    (75000, 3),
    (80000, 5),
    (70000, 2)
]

for salary, rating in test_cases:
    bonus = calculate_bonus(salary, rating)
    print(f"Salary: ${salary:,}, Rating: {rating}, Bonus: ${bonus:,.2f}")
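
Besides default arguments, functions can accept a variable number of positional (`*args`) and keyword (`**kwargs`) arguments. The sketch below uses a hypothetical helper, `describe_employee`, that is not part of the notebook.

```python
def describe_employee(name, *skills, **details):
    """Build a one-line summary (hypothetical helper for illustration)."""
    # skills is a tuple of extra positional arguments
    skill_text = ", ".join(skills) if skills else "none listed"
    # details is a dict of extra keyword arguments, in call order
    detail_text = ", ".join(f"{key}={value}" for key, value in details.items())
    return f"{name} | skills: {skill_text} | {detail_text}"

summary = describe_employee(
    "Alice", "Python", "SQL", department="Engineering", salary=75000
)
print(summary)

# An existing dict can be unpacked into keyword arguments with **
extra = {"department": "Marketing"}
print(describe_employee("Bob", **extra))
```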

In [None]:
# Lambda functions - for simple operations
# Useful for data transformations
employees = ["alice johnson", "bob smith", "charlie brown"]

# Using lambda with map
formatted_names = list(map(lambda name: name.title(), employees))
print(f"Formatted names: {formatted_names}")

# Using lambda with filter
long_names = list(filter(lambda name: len(name) > 10, employees))
print(f"Long names: {long_names}")

# Using lambda with sorted
sorted_by_length = sorted(employees, key=lambda name: len(name))
print(f"Sorted by length: {sorted_by_length}")
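
The same `key=` pattern works on lists of dictionaries, which is a common shape for tabular data. A short sketch, assuming records like the ones used earlier in the notebook:

```python
employees_data = [
    {"name": "Alice", "salary": 75000},
    {"name": "Bob", "salary": 68000},
    {"name": "Charlie", "salary": 82000},
    {"name": "Diana", "salary": 71000},
]

# Sort by salary, highest first; sorted() returns a new list
by_salary_desc = sorted(employees_data, key=lambda emp: emp["salary"], reverse=True)
names_by_salary = [emp["name"] for emp in by_salary_desc]
print(names_by_salary)

# max() and min() accept the same key argument
top_earner = max(employees_data, key=lambda emp: emp["salary"])
print(f"Top earner: {top_earner['name']}")
```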

## 5. Object-Oriented Programming

Pydantic models are defined as Python classes, so classes and objects are worth understanding first:

In [None]:
class Employee:
    """Employee class demonstrating OOP concepts."""
    
    # Class variable
    company = "TechCorp"
    
    def __init__(self, name, department, salary):
        """Initialize an Employee instance."""
        self.name = name
        self.department = department
        self.salary = salary
        self._performance_history = []  # Non-public by convention (leading underscore)
    
    def __str__(self):
        """String representation of the employee."""
        return f"{self.name} - {self.department}"
    
    def __repr__(self):
        """Developer representation of the employee."""
        return f"Employee('{self.name}', '{self.department}', {self.salary})"
    
    def add_performance_review(self, rating, notes=""):
        """Add a performance review."""
        review = {
            "rating": rating,
            "notes": notes,
            "date": "2024-01-01"  # Simplified for demo
        }
        self._performance_history.append(review)
    
    def get_average_performance(self):
        """Calculate average performance rating."""
        if not self._performance_history:
            return None
        ratings = [review['rating'] for review in self._performance_history]
        return sum(ratings) / len(ratings)
    
    def promote(self, new_department, salary_increase):
        """Promote employee to new department with salary increase."""
        old_dept = self.department
        old_salary = self.salary
        
        self.department = new_department
        self.salary += salary_increase
        
        print(f"Promoted {self.name} from {old_dept} to {new_department}")
        print(f"Salary increased from ${old_salary:,} to ${self.salary:,}")

# Create and test Employee objects
alice = Employee("Alice Johnson", "Engineering", 75000)
bob = Employee("Bob Smith", "Marketing", 68000)

print(f"Employee: {alice}")
print(f"Representation: {repr(alice)}")
print(f"Company: {alice.company}")

In [None]:
# Add performance reviews and test methods
alice.add_performance_review(4, "Excellent technical skills")
alice.add_performance_review(5, "Led successful project")
alice.add_performance_review(4, "Great team collaboration")

print(f"Alice's average performance: {alice.get_average_performance():.1f}")

# Promote Alice
alice.promote("Senior Engineering", 10000)
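
As a stepping stone between a hand-written class and a declarative model, the standard library's `dataclasses` module generates `__init__`, `__repr__`, and `__eq__` from annotated fields. The sketch below is an illustration only; `EmployeeRecord` is a name invented here, and dataclasses do not validate the annotated types at runtime.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EmployeeRecord:
    """Illustrative dataclass; fields are declared with type annotations."""
    name: str
    department: str
    salary: float
    # default_factory gives each instance its own list
    skills: List[str] = field(default_factory=list)

alice = EmployeeRecord("Alice Johnson", "Engineering", 75000)
bob = EmployeeRecord("Bob Smith", "Marketing", 68000)
bob.skills.append("SQL")

print(alice)                 # uses the generated __repr__
print(alice == EmployeeRecord("Alice Johnson", "Engineering", 75000))  # generated __eq__
print(alice.skills, bob.skills)
```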

## 6. Error Handling

Proper error handling is crucial for robust applications:

In [None]:
def safe_divide(a, b):
    """Safely divide two numbers with error handling."""
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        print(f"Error: Cannot divide {a} by zero")
        return None
    except TypeError as e:
        print(f"Error: Invalid types for division - {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Test error handling
test_cases = [
    (10, 2),      # Normal case
    (10, 0),      # Division by zero
    ("10", 2),    # Type error
    (10, "abc")   # Type error
]

for a, b in test_cases:
    result = safe_divide(a, b)
    if result is not None:
        print(f"{a} / {b} = {result}")
    print()
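
Catching exceptions is one half of error handling; raising them with a clear message is the other. The following sketch shows one possible validation helper, `parse_salary`, which is hypothetical and not part of the notebook.

```python
def parse_salary(raw_value):
    """Convert raw input to a positive float, raising ValueError on bad input."""
    try:
        salary = float(raw_value)
    except (TypeError, ValueError):
        # "from None" hides the original traceback to keep the message short
        raise ValueError(f"Salary must be numeric, got {raw_value!r}") from None
    if salary <= 0:
        raise ValueError(f"Salary must be positive, got {salary}")
    return salary

for raw in ["75000", "abc", -5, None]:
    try:
        value = parse_salary(raw)
    except ValueError as exc:
        print(f"Rejected {raw!r}: {exc}")
    else:
        # The else branch runs only when no exception was raised
        print(f"Accepted {raw!r} -> {value}")
```

Raising a single exception type keeps the calling code simple: it only needs one `except ValueError` branch.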

## 7. Working with Files and JSON

File operations are essential for data processing:

In [None]:
import json
from pathlib import Path

# Sample data
employees_data = [
    {"id": 1, "name": "Alice Johnson", "department": "Engineering", "salary": 75000},
    {"id": 2, "name": "Bob Smith", "department": "Marketing", "salary": 68000},
    {"id": 3, "name": "Charlie Brown", "department": "Engineering", "salary": 82000}
]

# Write to JSON file
data_file = Path("../examples/employees.json")
data_file.parent.mkdir(parents=True, exist_ok=True)  # Create the directory (and any missing parents) if needed

with open(data_file, 'w') as f:
    json.dump(employees_data, f, indent=2)

print(f"Data written to {data_file}")

# Read from JSON file
with open(data_file, 'r') as f:
    loaded_data = json.load(f)

print(f"Loaded {len(loaded_data)} employee records:")
for employee in loaded_data:
    print(f"  {employee['name']} - ${employee['salary']:,}")
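
Files from outside sources are not always valid JSON. The sketch below round-trips data through a temporary directory and shows one way to handle a parse failure; `load_json_or_default` is a hypothetical helper, and the temporary path stands in for a real data location.

```python
import json
import tempfile
from pathlib import Path

records = [{"id": 1, "name": "Alice Johnson"}]

# Round trip through a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "employees.json"
    json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    loaded = json.loads(json_path.read_text(encoding="utf-8"))

print(loaded == records)

def load_json_or_default(text, default=None):
    """Parse JSON text, returning a default when the text is malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"Invalid JSON: {exc}")
        return default

print(load_json_or_default('{"a": 1}'))
print(load_json_or_default("{bad json"))
```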

## 8. Introduction to Python Libraries for Data Science

A brief look at standard-library modules that come up often in data work:

In [None]:
# Import commonly used libraries
import datetime as dt
import random
from collections import Counter, defaultdict
from typing import List, Dict, Optional

# Working with dates
today = dt.date.today()
hire_date = dt.date(2020, 1, 15)
years_employed = (today - hire_date).days / 365.25

print(f"Today: {today}")
print(f"Hire date: {hire_date}")
print(f"Years employed: {years_employed:.1f}")

# Counter for data analysis
departments = [emp['department'] for emp in employees_data]
dept_counts = Counter(departments)
print(f"Department distribution: {dict(dept_counts)}")

# defaultdict for grouping
employees_by_dept = defaultdict(list)
for emp in employees_data:
    employees_by_dept[emp['department']].append(emp['name'])

print(f"Employees by department: {dict(employees_by_dept)}")
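
`Counter` and `defaultdict` also cover simple aggregation, similar to a group-by. A minimal sketch on a copy of the sample records; `payroll_by_dept` is a name introduced here.

```python
from collections import Counter, defaultdict

employees_data = [
    {"name": "Alice Johnson", "department": "Engineering", "salary": 75000},
    {"name": "Bob Smith", "department": "Marketing", "salary": 68000},
    {"name": "Charlie Brown", "department": "Engineering", "salary": 82000},
]

# most_common(n) returns the n most frequent items as (item, count) pairs
dept_counts = Counter(emp["department"] for emp in employees_data)
top_dept, top_count = dept_counts.most_common(1)[0]
print(f"Largest department: {top_dept} ({top_count} employees)")

# defaultdict(int) starts every missing key at 0, so += works directly
payroll_by_dept = defaultdict(int)
for emp in employees_data:
    payroll_by_dept[emp["department"]] += emp["salary"]

print(dict(payroll_by_dept))
```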

## 9. Type Hints (Foundation for Pydantic)

Type hints improve code clarity and enable better tooling:

In [None]:
from typing import List, Dict, Optional, Union

def calculate_team_stats(employees: List[Dict[str, Union[str, int, float]]]) -> Dict[str, Union[int, float]]:
    """
    Calculate statistics for a team of employees.
    
    Args:
        employees: List of employee dictionaries
    
    Returns:
        Dictionary with team statistics
    """
    if not employees:
        return {"avg_salary": 0.0, "total_employees": 0, "total_payroll": 0.0, "min_salary": 0.0, "max_salary": 0.0}
    
    salaries = [emp['salary'] for emp in employees]
    
    return {
        "avg_salary": sum(salaries) / len(salaries),
        "total_employees": len(employees),
        "total_payroll": sum(salaries),
        "min_salary": min(salaries),
        "max_salary": max(salaries)
    }

# Test with type hints
stats = calculate_team_stats(employees_data)
print("Team Statistics:")
for key, value in stats.items():
    # Format by field name: salary sums are ints, so a float check would miss them
    if 'salary' in key or 'payroll' in key:
        print(f"  {key}: ${value:,.2f}")
    else:
        print(f"  {key}: {value}")
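
A `Dict[str, Union[...]]` hint says little about which keys a record has. One option from the standard library is `typing.TypedDict` (Python 3.8+), sketched below with a hypothetical `EmployeeDict`. Type hints are not enforced at runtime; they are read by type checkers and editors, whereas Pydantic adds runtime validation on top of the same annotations.

```python
from typing import List, Optional, TypedDict

class EmployeeDict(TypedDict):
    """Illustrative record shape; keys and value types are declared up front."""
    name: str
    department: str
    salary: float

def highest_paid(employees: List[EmployeeDict]) -> Optional[EmployeeDict]:
    """Return the record with the largest salary, or None for an empty list."""
    if not employees:
        return None
    return max(employees, key=lambda emp: emp["salary"])

team: List[EmployeeDict] = [
    {"name": "Alice", "department": "Engineering", "salary": 75000},
    {"name": "Charlie", "department": "Engineering", "salary": 82000},
]

top = highest_paid(team)
print(top["name"] if top else "No employees")
print(highest_paid([]))
```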

## 10. Practice Exercises

Try these exercises to reinforce your learning:

In [None]:
# Exercise 1: Create a function to filter employees by criteria
def filter_employees(employees: List[Dict], **criteria) -> List[Dict]:
    """
    Filter employees based on provided criteria.
    
    Example usage:
    filter_employees(data, department="Engineering", min_salary=70000)
    """
    filtered = []
    
    for emp in employees:
        match = True
        
        for key, value in criteria.items():
            if key.startswith('min_'):
                field = key[4:]  # Remove 'min_' prefix
                if emp.get(field, 0) < value:
                    match = False
                    break
            elif key.startswith('max_'):
                field = key[4:]  # Remove 'max_' prefix
                if emp.get(field, float('inf')) > value:
                    match = False
                    break
            else:
                if emp.get(key) != value:
                    match = False
                    break
        
        if match:
            filtered.append(emp)
    
    return filtered

# Test the filter function
engineering_high_earners = filter_employees(
    employees_data, 
    department="Engineering", 
    min_salary=75000
)

print("High-earning Engineers:")
for emp in engineering_high_earners:
    print(f"  {emp['name']} - ${emp['salary']:,}")

## Summary

This notebook covered the essential Python concepts you'll need for:

### For Pydantic:
- Data types and type hints
- Classes and object-oriented programming
- Error handling
- JSON serialization

### For Databricks:
- Data structures (lists, dictionaries)
- Control flow and functions
- File operations
- Data processing patterns

### For Streamlit:
- String formatting
- Functions for modular code
- Error handling for user interactions
- Data visualization preparation

## Next Steps
1. Practice with the exercises above
2. Move on to the Pydantic notebook for data modeling
3. Explore Databricks integration patterns
4. Build interactive Streamlit applications
