# Notebook 1: Introduction to Google Colab and Python Basics

## SECTION 1: GETTING STARTED WITH GOOGLE COLAB

Welcome to your first Python for Data Analysis class! This notebook will introduce you to Google Colab and Python basics, with a special focus on helping SQL users transition to Python.

In [None]:
# This is a code cell. You can run Python code here by clicking the play button
# on the left or pressing Shift+Enter.
print("Welcome to Google Colab!")

# EXERCISE: Run this cell to verify your environment is working properly.

In [None]:
# Google Colab provides many built-in Python libraries
# Let's check our Python version
import sys
print(f"Python version: {sys.version}")

## SECTION 2: PYTHON VARIABLES AND DATA TYPES

In Python, you can create variables without declaring their type. This is different from SQL, where column types must be defined when a table is created.
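
As a small illustration of this point, the sketch below rebinds one name to values of different types. The variable name `order_value` is made up for this example and does not appear elsewhere in the notebook.

```python
# Hypothetical variable for illustration: no type declaration is needed,
# and the same name can later refer to a value of a different type.
order_value = 120                      # starts as an int
first_type = type(order_value).__name__

order_value = 120.50                   # now a float
second_type = type(order_value).__name__

order_value = "120.50"                 # now a str (text, not a number)
third_type = type(order_value).__name__

print(first_type, second_type, third_type)
```

A SQL column would reject the text value once it was declared numeric; Python simply points the name at the new object.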

In [None]:
# Numeric types
integer_value = 42
float_value = 3.14159

# Text type
text_string = "Hello, Data Analyst!"
another_string = 'Single quotes work too'

# Boolean type
is_active = True
has_data = False

# The print() function outputs values to the console
print(integer_value)
print(float_value)
print(text_string)
print(is_active)

In [None]:
# You can check variable types with the type() function
print(type(integer_value))  # <class 'int'>
print(type(float_value))    # <class 'float'>
print(type(text_string))    # <class 'str'>
print(type(is_active))      # <class 'bool'>

### EXERCISE: Create your own variables with different data types and print their values and types.

Try creating variables to represent data you might find in an e-commerce dataset (like those in the Olist database).

In [None]:
# Your code here:



## SECTION 3: BASIC OPERATIONS

In [None]:
# Arithmetic operations
addition = 10 + 5
subtraction = 10 - 5
multiplication = 10 * 5
division = 10 / 5  # Returns float
integer_division = 10 // 3  # Floor division: rounds down to a whole number (3)
remainder = 10 % 3  # Returns remainder
exponentiation = 10 ** 2  # 10 to the power of 2

print(f"Addition: {addition}")
print(f"Subtraction: {subtraction}")
print(f"Multiplication: {multiplication}")
print(f"Division: {division}")
print(f"Integer Division: {integer_division}")
print(f"Remainder: {remainder}")
print(f"Exponentiation: {exponentiation}")
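
One detail of `//` is easier to see in action: it rounds down toward negative infinity rather than simply dropping the decimals. The difference only shows up with negative numbers. A minimal sketch, with variable names chosen just for this example:

```python
# Floor division rounds down; int() on a float truncates toward zero
positive_floor = 7 // 2       # same result as truncation for positive values
negative_floor = -7 // 2      # rounds down, away from zero
truncated = int(-7 / 2)       # truncates the decimals instead
negative_remainder = -7 % 2   # the remainder takes the sign of the divisor

print(positive_floor, negative_floor, truncated, negative_remainder)
```

The two operators stay consistent with each other: `(-7 // 2) * 2 + (-7 % 2)` gives back `-7`.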

In [None]:
# String operations
first_name = "Data"
last_name = "Analyst"

# Concatenation
full_name = first_name + " " + last_name
print(full_name)

# String methods
print(full_name.upper())  # Convert to uppercase
print(full_name.lower())  # Convert to lowercase
print(len(full_name))     # String length
print(full_name.replace("Data", "Python"))  # Replace substring

# String formatting (f-strings, introduced in Python 3.6)
age = 25
experience = 3.5
intro = f"I am a {full_name}, {age} years old with {experience} years of experience."
print(intro)

### EXERCISE: Create and manipulate your own variables to practice these operations.

Try performing calculations you might do with an e-commerce dataset, like calculating total order values or formatting product information.

In [None]:
# Your code here:



## SECTION 4: DATA STRUCTURES

### LISTS
Lists are similar to arrays in other languages, but a single list can store mixed data types.

In [None]:
# Creating a list
products = ["Laptop", "Headphones", "Mouse", "Keyboard"]
prices = [1200, 150, 25, 60]
mixed_list = ["Product", 42, 3.14, True, None]

print(products)
print(prices)
print(mixed_list)

# Accessing list elements (0-indexed)
print(f"First product: {products[0]}")  # Laptop
print(f"Last product: {products[-1]}")  # Keyboard

# Slicing lists [start:end:step] (the end index is not included)
print(f"First two products: {products[0:2]}")  # ['Laptop', 'Headphones']
print(f"Every other price: {prices[::2]}")     # [1200, 25]

In [None]:
# List methods
products.append("Monitor")  # Add to end
print(products)

products.insert(1, "Tablet")  # Insert at specific position
print(products)

products.remove("Mouse")  # Remove specific value
print(products)

popped_item = products.pop()  # Remove and return last item
print(f"Removed: {popped_item}, Remaining: {products}")

# List operations
total_items = len(products)
print(f"Total products: {total_items}")

sorted_prices = sorted(prices)
print(f"Sorted prices: {sorted_prices}")

### DICTIONARIES
Dictionaries store key-value pairs, similar to JSON objects.

In SQL terms, a dictionary is like a single row of a table, with the column names as keys.
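
To make the row analogy concrete, here is a minimal sketch in which each dictionary plays the role of one row and a list of dictionaries plays the role of a small table. The `order_rows` data and its column names are invented for illustration.

```python
# Hypothetical data: each dict is one "row"; the keys act as column names
order_rows = [
    {"order_id": 1, "customer": "Ana", "total": 59.90},
    {"order_id": 2, "customer": "Bruno", "total": 120.00},
    {"order_id": 3, "customer": "Carla", "total": 15.50},
]

# Pick one row by position, then one "column" by key
second_row = order_rows[1]
second_customer = second_row["customer"]

# The number of rows, loosely comparable to COUNT(*)
row_count = len(order_rows)

print(second_customer, row_count)
```

This list-of-dictionaries shape is also one that `pd.DataFrame` accepts as input, which is part of why the analogy is useful.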

In [None]:
product = {
    "name": "Laptop",
    "brand": "TechBrand",
    "price": 1200,
    "in_stock": True,
    "specs": {
        "cpu": "Intel i7",
        "ram": "16GB",
        "storage": "512GB SSD"
    }
}

print(product)

# Accessing dictionary values
print(f"Product name: {product['name']}")
print(f"CPU: {product['specs']['cpu']}")

# Dictionary methods
print(f"Keys: {product.keys()}")
print(f"Values: {product.values()}")
print(f"Items: {product.items()}")

# Adding/updating values
product["color"] = "Silver"
product["price"] = 1100  # Update existing key
print(product)

# Safe access with get() (returns None or default if key doesn't exist)
print(product.get("weight", "Not specified"))

### EXERCISE: Create your own lists and dictionaries and practice accessing and modifying them.

Try creating data structures that might represent e-commerce orders, customers, or products.

In [None]:
# Your code here:



## SECTION 5: CONTROL FLOW

### IF STATEMENTS

In [None]:
price = 75

if price > 100:
    print("High price")
elif price > 50:
    print("Medium price")
else:
    print("Low price")

# Multiple conditions
in_stock = True
on_sale = True

if in_stock and on_sale:
    print("Available and on sale!")
elif in_stock or on_sale:
    print("Either available or on sale")
else:
    print("Neither available nor on sale")

### LOOPS

In [None]:
# For loop with list
print("Products:")
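# The loop variable is named product_name (not product) so that it does not
# overwrite the product dictionary, which is used again further down
for product_name in products:
    print(f"- {product_name}")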

# For loop with range
print("Numbers from 0 to 4:")
for i in range(5):  # 0, 1, 2, 3, 4
    print(i)

# For loop with range (start, stop, step)
print("Even numbers from 2 to 10:")
for i in range(2, 11, 2):  # 2, 4, 6, 8, 10
    print(i)

# For loop with dictionary
print("Product details:")
for key, value in product.items():
    # Skip nested dictionaries for cleaner output
    if isinstance(value, dict):
        continue
    print(f"{key}: {value}")

In [None]:
# While loop
count = 5
print("Countdown:")
while count > 0:
    print(count)
    count -= 1
print("Go!")

# Loop control
print("Numbers divisible by 3 (from 1 to 10):")
for i in range(1, 11):
    if i % 3 != 0:
        continue  # Skip to next iteration
    print(i)

print("Loop until finding 'Mouse':")
for item in ["Laptop", "Keyboard", "Mouse", "Monitor"]:
    print(item)
    if item == "Mouse":
        print("Found Mouse!")
        break  # Exit loop

### EXERCISE: Write your own control flow examples with if statements and loops.

Try creating examples that handle situations from an e-commerce context, like processing orders with different statuses or calculating shipping based on customer location.

In [None]:
# Your code here:



## SECTION 6: FUNCTIONS

In [None]:
# Defining a simple function
def greet(name):
    return f"Hello, {name}!"

# Calling the function
message = greet("Data Analyst")
print(message)

In [None]:
# Function with multiple parameters
def calculate_total(price, quantity, tax_rate=0.1):
    subtotal = price * quantity
    tax = subtotal * tax_rate
    return subtotal + tax

# Calling with positional arguments
total1 = calculate_total(19.99, 3)
print(f"Total 1: ${total1:.2f}")

# Calling with keyword arguments
total2 = calculate_total(price=24.99, quantity=2, tax_rate=0.05)
print(f"Total 2: ${total2:.2f}")

In [None]:
# Function with multiple return values
def get_product_stats(products):
    count = len(products)
    if count == 0:
        return 0, None, None
    
    min_product = min(products, key=len)
    max_product = max(products, key=len)
    return count, min_product, max_product

# Unpacking multiple return values
count, shortest, longest = get_product_stats(["Laptop", "Mouse", "Keyboard", "Monitor"])
print(f"Products: {count}, Shortest: {shortest}, Longest: {longest}")

### EXERCISE: Write your own functions to solve problems related to data processing.

Try creating functions that would be useful in an e-commerce context, like calculating shipping costs, formatting product details, or filtering products by category.

In [None]:
# Your code here:



## SECTION 7: SQL TO PYTHON TRANSLATION

In SQL, you use queries to retrieve and manipulate data. In Python, we often use pandas (imported as pd) for similar operations.

In [None]:
import pandas as pd

# Create a DataFrame (similar to a SQL table)
data = {
    'product_id': [1, 2, 3, 4, 5],
    'name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Audio'],
    'price': [1200, 25, 60, 300, 150],
    'in_stock': [True, True, False, True, True]
}

products_df = pd.DataFrame(data)
print("Products DataFrame:")
print(products_df)

In [None]:
# SQL: SELECT * FROM products
# Python:
print("\nAll products (SELECT *):\n")
print(products_df)

# SQL: SELECT name, price FROM products
# Python:
print("\nSelect specific columns:\n")
print(products_df[['name', 'price']])

# SQL: SELECT * FROM products WHERE category = 'Accessory'
# Python:
print("\nFiltering (WHERE):\n")
accessories = products_df[products_df['category'] == 'Accessory']
print(accessories)

In [None]:
# SQL: SELECT category, AVG(price) FROM products GROUP BY category
# Python:
print("\nGrouping and aggregation (GROUP BY):\n")
category_avg = products_df.groupby('category')['price'].mean().reset_index()
print(category_avg)

# SQL: SELECT * FROM products ORDER BY price DESC
# Python:
print("\nSorting (ORDER BY):\n")
sorted_products = products_df.sort_values('price', ascending=False)
print(sorted_products)

# SQL: SELECT category, COUNT(*) FROM products GROUP BY category HAVING COUNT(*) > 1
# Python:
print("\nGROUP BY with HAVING:\n")
category_counts = products_df.groupby('category').size().reset_index(name='count')
print(category_counts[category_counts['count'] > 1])
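
The individual translations above can also be combined in one expression. The sketch below is one possible way to write a query with `WHERE ... AND ...` plus `ORDER BY`, not the only one. It rebuilds a trimmed copy of the product data so that it runs on its own, and the names `mask` and `result` are chosen just for this example.

```python
import pandas as pd

# Trimmed copy of the illustrative product data used in this section
products_df = pd.DataFrame({
    'name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Audio'],
    'price': [1200, 25, 60, 300, 150],
    'in_stock': [True, True, False, True, True],
})

# SQL: SELECT name, price FROM products
#      WHERE in_stock = TRUE AND price < 500
#      ORDER BY price DESC
# Combine conditions with & and wrap each comparison in parentheses
mask = products_df['in_stock'] & (products_df['price'] < 500)
result = (
    products_df.loc[mask, ['name', 'price']]
    .sort_values('price', ascending=False)
)
print(result)
```

Note that pandas uses `&` and `|` for combining row conditions, whereas plain Python `if` statements use `and` and `or`.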

### EXERCISE: Try writing your own pandas operations to mimic SQL queries.

Practice translating SQL queries you might use in your work to their pandas equivalents.

In [None]:
# Your code here:



## SECTION 8: PUTTING IT ALL TOGETHER

Let's create a function that performs a complete data analysis workflow.

In [None]:
def analyze_product_data(data):
    # Create DataFrame
    df = pd.DataFrame(data)
    
    # Calculate basic statistics
    total_products = len(df)
    avg_price = df['price'].mean()
    in_stock_count = df['in_stock'].sum()
    
    # Group by category
    category_stats = df.groupby('category').agg({
        'product_id': 'count',
        'price': ['mean', 'min', 'max'],
        'in_stock': 'sum'
    })
    
    # Print a formatted summary of the results
    print(f"Total Products: {total_products}")
    print(f"Average Price: ${avg_price:.2f}")
    print(f"In-Stock Products: {in_stock_count} ({in_stock_count/total_products:.1%})")
    print("\nCategory Statistics:")
    print(category_stats)
    
    # Return the most expensive product
    most_expensive = df.loc[df['price'].idxmax()]
    return most_expensive

# Run the analysis
print("\nProduct Analysis Results:")
top_product = analyze_product_data(data)
print("\nMost Expensive Product:")
print(top_product)

### FINAL EXERCISE: Expand the analyze_product_data function to include more analyses.

For example, add calculations for median price, price ranges, or out-of-stock percentages.

In [None]:
# Your code here:



## NEXT STEPS

Congratulations on completing this introduction to Python and Google Colab! 

### Tomorrow's class (Thursday, April 10)
We'll dive deeper into data structures and control flow, focusing on:
- Advanced data structures (nested lists and dictionaries)
- Complex control flow patterns
- List and dictionary comprehensions
- Error handling

### Assignment
Please complete the Week 1 Minor Assignment before next Wednesday's class. The instructions and starter code are available in the course repository.