# Python Programming Course Syllabus

Welcome to the Python Programming Course! This course is designed to introduce you to the fundamentals of programming using Python, focusing on computational thinking and problem-solving. It's a **self-paced**, **interactive**, and **open-source** learning experience. You'll engage with practical exercises, mini-projects, and unit tests directly within Jupyter notebooks, guided by the excellent 'Think Python 2e' textbook and the 'Python Data Science Handbook'. Let's get started on your journey to thinking like a computer scientist!

## Course Objectives

Upon successful completion of this course, students will be able to:
*   Understand basic Python syntax, data types, and control flow.
*   Write, debug, and troubleshoot Python programs effectively.
*   Work with fundamental data structures like strings, lists, dictionaries, and tuples.
*   Implement modular programming practices using functions, modules, and packages.
*   Perform basic data manipulation and visualization using Pandas and Matplotlib.
*   Apply computational thinking to solve real-world problems using Python.

## Course Structure

This course is divided into four main modules, each building upon the last to solidify your understanding of Python programming and computational thinking. Each week focuses on a specific topic with interactive labs, mini-projects, and unit tests.



### Module 1: The Foundation (Weeks 1–4)



*   **The Environment & 'Hello World'**: [week_01](week_01.ipynb)


*   **Variables, Expressions, and Statements**: [week_02](week_02.ipynb)


*   **Logic and Conditionals**: [week_03](week_03.ipynb)


*   **Iteration (Loops)**: [week_04](week_04.ipynb)


### Module 2: Data Structures (Weeks 5–9)



*   **Strings**: [week_05](week_05.ipynb)


*   **Lists**: [week_06](week_06.ipynb)


*   **Dictionaries**: [week_07](week_07.ipynb)


*   **Tuples**: [week_08](week_08.ipynb)


*   **Files**: [week_09](week_09.ipynb)


### Module 3: Modular Programming (Weeks 10–12)



*   **Functions**: [week_10](week_10.ipynb)


*   **Debugging & Error Handling**: [week_11](week_11.ipynb)


*   **Modules and Packages**: [week_12](week_12.ipynb)


### Module 4: The Capstone (Weeks 13–14)



*   **Data Science with Pandas (Part 1)**: [week_13](week_13.ipynb)


*   **Data Visualization with Matplotlib**: [week_14](week_14.ipynb)


## Grading and Assessment

This course is self-paced and focuses on practical application. Your progress will primarily be assessed through:
*   **Interactive Labs**: Completion of in-notebook exercises.
*   **Mini-Projects**: Application of weekly concepts to solve small, focused problems.
*   **Unit Tests**: Ensuring your code functions as expected by passing provided test cases.
*   **Final Project (Optional)**: A comprehensive project applying skills from across the course (details to be provided separately).

## Resources

*   **Primary Textbook**: *Think Python 2e* by Allen B. Downey (available online: [https://greenteapress.com/wp/think-python-2e/](https://greenteapress.com/wp/think-python-2e/))
*   **Data Science Reference**: *Python Data Science Handbook* by Jake VanderPlas (available online: [https://jakevdp.github.io/PythonDataScienceHandbook/](https://jakevdp.github.io/PythonDataScienceHandbook/))
*   **Development Environment**: Google Colaboratory (Colab) - online Jupyter notebooks that require no setup.
*   **Community Support**: Online forums, Q&A sites, and the course's discussion channels.

## Important Notes

*   **Self-Paced**: You can work through the material at your own speed.
*   **Interactive**: Actively engage with the code cells and modify them to deepen your understanding.
*   **Open-Source**: The course materials are designed to be transparent and extensible.



## Navigational Links

[<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_02.ipynb)


# Week 1: The Environment & 'Hello World'

Welcome to the first week of your Python Programming Course! This week introduces you to the fundamental concepts of programming: setting up your environment and writing your very first Python code. We'll cover basic operations, printing output, and how to identify and fix common errors.

### Reading: Chapter 1 of 'Think Python 2e'

For this week's foundational concepts, please refer to Chapter 1 of our primary textbook:
[Think Python 2e - Chapter 1](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Getting Started

This section provides hands-on exercises to familiarize you with the Colab environment and basic Python syntax. Interact with the code cells and modify them to test your understanding.

#### Exercise 1: Basic Math Operations

Python can act as a powerful calculator. Experiment with addition, subtraction, multiplication, and division.

**Try It Yourself:** Calculate the sum, difference, product, and quotient of two numbers (e.g., 25 and 7).

In [None]:
# Your code for basic math operations here
num1 = 25
num2 = 7

print(f'Sum: {num1 + num2}')
print(f'Difference: {num1 - num2}')
print(f'Product: {num1 * num2}')
print(f'Quotient: {num1 / num2}') # Note: Division in Python 3 returns a float
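
The comment above notes that `/` returns a float. As a minimal sketch (the names `true_quotient`, `floor_quotient`, and `remainder` are illustrative and not part of the exercise), floor division `//` and the remainder operator `%` give whole-number results instead:

```python
# Same numbers as the exercise above
num1 = 25
num2 = 7

true_quotient = num1 / num2    # true division: always a float
floor_quotient = num1 // num2  # floor division: rounds down to a whole number
remainder = num1 % num2        # what is left over after floor division

print(f'True division: {true_quotient}')
print(f'Floor division: {floor_quotient}')
print(f'Remainder: {remainder}')

# The two whole-number results recombine into the original value
print(floor_quotient * num2 + remainder == num1)
```

The last line is a quick self-check: multiplying the floor quotient back and adding the remainder should reproduce `num1`.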

#### Exercise 2: Printing Strings

The `print()` function is used to display output. You can print text (strings) by enclosing them in single or double quotes.

**Try It Yourself:** Print the phrase 'Hello, Python Learner!'

In [None]:
# Your code to print a string here
print('Hello, Python Learner!')
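
As a small sketch of the "single or double quotes" point above (the variable names are illustrative), both quote styles produce the same string, and choosing the other style is one way to include an apostrophe in the text:

```python
# Both quote styles create the same string
single = 'Hello, Python Learner!'
double = "Hello, Python Learner!"
print(single == double)

# Double quotes on the outside let the text contain an apostrophe
apostrophe = "It's a great day to learn Python."
print(apostrophe)
```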

#### Exercise 3: Intentional Syntax Error

Understanding errors is crucial for programming. Let's intentionally create a syntax error to see how Python responds.

**Try It Yourself:** Run the following code and observe the error message. What went wrong?

In [None]:
# This code has an intentional syntax error
print('This line is missing a closing quote)

#### Exercise 4: Fixing the Syntax Error

Now, let's fix the error from the previous exercise. Pay attention to the error message; it often gives clues about what needs to be corrected.

**Try It Yourself:** Correct the syntax error in the code below so it runs successfully.

In [None]:
# Fix the syntax error here
print('This line is missing a closing quote')

## Hello Script Project

For your first mini-project, you will create a simple script that interacts with the user.

**Task:** Ask the user for their name and then print a personalized greeting using their input. For example, if the user enters 'Alice', the output should be 'Hello, Alice! Welcome to Python programming!'

In [None]:
# Your 'Hello Script' solution here
name = input('What is your name? ')
print(f'Hello, {name}! Welcome to Python programming!')

## Unit Tests for Hello Script Project

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Hello Script. Run them and verify the output.

In [None]:
# Helper function to test the Hello Script's logic (non-interactive)
def run_hello_script_test(name_input):
    return f'Hello, {name_input}! Welcome to Python programming!'

print('--- Running Hello Script Unit Tests ---')
# Test Case 1: Standard name
expected_output_1 = 'Hello, Alice! Welcome to Python programming!'
actual_output_1 = run_hello_script_test('Alice')
assert actual_output_1 == expected_output_1, f'Test 1 Failed: Expected {expected_output_1}, got {actual_output_1}'
print('Test 1 Passed: Standard name handled correctly.')

# Test Case 2: Name with spaces
expected_output_2 = 'Hello, John Doe! Welcome to Python programming!'
actual_output_2 = run_hello_script_test('John Doe')
assert actual_output_2 == expected_output_2, f'Test 2 Failed: Expected {expected_output_2}, got {actual_output_2}'
print('Test 2 Passed: Name with spaces handled correctly.')

# Test Case 3: Empty name (input() returns '' if the user just presses Enter)
expected_output_3 = 'Hello, ! Welcome to Python programming!'
actual_output_3 = run_hello_script_test('')
assert actual_output_3 == expected_output_3, f'Test 3 Failed: Expected {expected_output_3}, got {actual_output_3}'
print('Test 3 Passed: Empty name handled correctly.')

print('\nAll Unit Tests Completed.')


## Hints/Solution (Optional, Expand to View)

This section contains a suggested implementation for the 'Hello Script' mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for 'Hello Script' mini-project
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# name_solution = input('What is your name? ')
# print(f'Hello, {name_solution}! Welcome to Python programming!')




## Navigational Links

[<-- Previous Week](week_01.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_03.ipynb)


# Week 2: Variables, Expressions, and Statements

Welcome to Week 2! This week, we will delve into the fundamental building blocks of Python programming: variables, expressions, and statements. Understanding these concepts is crucial for writing any meaningful Python code, as they allow you to store data, perform operations, and control the flow of your programs.

### Reading: Chapter 2 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 2 of our primary textbook:
[Think Python 2e - Chapter 2](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Exploring Variables and Data Types

This section provides hands-on exercises to solidify your understanding of variables, data types, and basic operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Understanding Data Types

Python determines the type of each value automatically, so you do not declare types yourself. The most common types you'll encounter are integers (`int`), floating-point numbers (`float`), and strings (`str`).

*   `int`: Whole numbers (e.g., 1, 100, -5)
*   `float`: Numbers with a decimal point (e.g., 1.0, 3.14, -0.5)
*   `str`: Sequences of characters, enclosed in single or double quotes (e.g., 'hello', "Python")

You can use the `type()` function to check the data type of any variable.

**Try It Yourself:** Create a variable for an integer, a float, and a string. Print the value and the type of each variable.

In [None]:
# Create variables of different types
my_integer = 10
my_float = 20.5
my_string = 'Hello Python'

# Print their values and types
print(f'Integer: {my_integer}, Type: {type(my_integer)}')
print(f'Float: {my_float}, Type: {type(my_float)}')
print(f'String: {my_string}, Type: {type(my_string)}')
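
A short sketch that extends the `type()` check above. It assumes the same example values, and the name `mixed_sum` is illustrative. It shows that combining an `int` with a `float` produces a `float`, and that the built-in `isinstance()` answers a yes/no question about a value's type:

```python
my_integer = 10
my_float = 20.5

# int + float gives a float
mixed_sum = my_integer + my_float
print(f'Value: {mixed_sum}, Type: {type(mixed_sum)}')

# isinstance() returns True or False instead of the type itself
print(isinstance(my_integer, int))
print(isinstance(my_float, int))
```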


#### Exercise 2: Variable Assignment and Basic Operations

Variables allow you to store values and refer to them by name throughout your program. You can perform various operations on these variables, including arithmetic operations.

**Try It Yourself:**
1.  Declare variables for the `length` and `width` of a rectangle and calculate its `area`.
2.  Declare a variable for the `radius` of a circle and calculate its `area` (use `3.14159` for pi).

Print the calculated areas.

In [None]:
# Rectangle area calculation
length = 15
width = 8
rectangle_area = length * width
print(f'The area of the rectangle is: {rectangle_area}')

# Circle area calculation
radius = 7
pi = 3.14159
circle_area = pi * radius**2
print(f'The area of the circle is: {circle_area}')


#### Exercise 3: Type Conversion and Error Handling

Sometimes, Python needs to convert data from one type to another, or you might need to do it explicitly. For example, input from the user is always read as a string, even if they type a number. Trying to add a string and a number raises a `TypeError`.

Python provides built-in functions like `int()`, `float()`, and `str()` to convert between types.

**Try It Yourself:** The following code tries to add a user's input age (which is a string) to the number 10. This will cause a `TypeError`. Modify the code to correctly convert the user's input to an integer before performing the addition.

In [None]:
# This code will produce a TypeError. Fix it!
# current_age_str = input('Enter your age: ')
# future_age = current_age_str + 10 # This causes a TypeError
# print(f'In 10 years, you will be {future_age} years old.')

# Corrected code:
try:
    current_age_str = input('Enter your age: ')
    current_age_int = int(current_age_str)
    future_age = current_age_int + 10
    print(f'In 10 years, you will be {future_age} years old.')
except ValueError:
    print('Invalid input. Please enter a valid integer for your age.')
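
The conversion functions can also be tried without typing anything in. In this sketch, `age_text` is a hypothetical stand-in for what `input()` would return, and the other names are illustrative:

```python
age_text = '25'  # stands in for the string that input() would return

# Adding a str and an int raises TypeError
try:
    age_text + 10
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(f'Error raised: {error_name}')

age_number = int(age_text)    # str -> int
price = float('19.99')        # str -> float
label = str(age_number + 10)  # int -> str

print(age_number + 10)
print(price)
print('In 10 years: ' + label)
```

Converting back with `str()` is what makes the final `+` work, because both sides are then strings.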


## Mini-Project: Unit Converter

For your second mini-project, you will create a simple unit converter. Your program should be able to convert units based on user input.

**Task:** Write a Python script that asks the user:
1.  What type of conversion they want (e.g., 'Fahrenheit to Celsius', 'Inches to Centimeters').
2.  The value they want to convert.
Then, perform the conversion and print the result. Remember to handle user input by converting it to the appropriate numeric type before calculation.

**Conversion formulas:**
*   Fahrenheit to Celsius: `(F - 32) * 5/9`
*   Inches to Centimeters: `inches * 2.54`

*Hint: Use `if/elif/else` statements to handle different conversion types (we will cover these more formally next week, but feel free to explore them for this project!).*

In [None]:
# Your 'Unit Converter' solution here
print('Welcome to the Unit Converter!')
print('Available conversions: ')
print('1. Fahrenheit to Celsius')
print('2. Inches to Centimeters')

while True:
    choice = input('Enter your choice (1 or 2): ')
    if choice in ('1', '2'):
        break
    else:
        print('Invalid choice. Please enter 1 or 2.')

try:
    value = float(input('Enter the value to convert: '))
    if choice == '1':
        celsius = (value - 32) * 5/9
        print(f'{value}°F is {celsius:.2f}°C')
    elif choice == '2':
        centimeters = value * 2.54
        print(f'{value} inches is {centimeters:.2f} cm')
except ValueError:
    print('Invalid input. Please enter a numerical value.')


## Unit Tests for Unit Converter

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Unit Converter. Run them and verify the output.

In [None]:
# Helper function to test Fahrenheit to Celsius conversion
def fahrenheit_to_celsius(fahrenheit):
    return (fahrenheit - 32) * 5/9

# Helper function to test Inches to Centimeters conversion
def inches_to_centimeters(inches):
    return inches * 2.54

print('--- Running Unit Converter Unit Tests ---')

# Test Case 1: Fahrenheit to Celsius (known value)
temp_f = 32
expected_c = 0.0
result_c = fahrenheit_to_celsius(temp_f)
assert abs(result_c - expected_c) < 0.01, f'Test 1 Failed: {temp_f}°F should be {expected_c}°C, got {result_c}°C'
print(f'Test 1 Passed: {temp_f}°F to {result_c:.2f}°C')

# Test Case 2: Fahrenheit to Celsius (another value)
temp_f = 212
expected_c = 100.0
result_c = fahrenheit_to_celsius(temp_f)
assert abs(result_c - expected_c) < 0.01, f'Test 2 Failed: {temp_f}°F should be {expected_c}°C, got {result_c}°C'
print(f'Test 2 Passed: {temp_f}°F to {result_c:.2f}°C')

# Test Case 3: Inches to Centimeters (known value)
length_in = 1
expected_cm = 2.54
result_cm = inches_to_centimeters(length_in)
assert abs(result_cm - expected_cm) < 0.01, f'Test 3 Failed: {length_in} inch should be {expected_cm} cm, got {result_cm} cm'
print(f'Test 3 Passed: {length_in} inch to {result_cm:.2f} cm')

# Test Case 4: Inches to Centimeters (another value)
length_in = 10
expected_cm = 25.4
result_cm = inches_to_centimeters(length_in)
assert abs(result_cm - expected_cm) < 0.01, f'Test 4 Failed: {length_in} inches should be {expected_cm} cm, got {result_cm} cm'
print(f'Test 4 Passed: {length_in} inches to {result_cm:.2f} cm')

print('\nAll Unit Tests Completed.')


## Hints/Solution (Optional, Expand to View)

This section contains a suggested implementation for the Unit Converter mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Unit Converter
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

def convert_temperature(value, unit_from, unit_to):
    if unit_from == 'Fahrenheit' and unit_to == 'Celsius':
        return (value - 32) * 5/9
    elif unit_from == 'Celsius' and unit_to == 'Fahrenheit':
        return (value * 9/5) + 32
    else:
        return None # Or raise an error for unsupported conversion

def convert_length(value, unit_from, unit_to):
    if unit_from == 'Inches' and unit_to == 'Centimeters':
        return value * 2.54
    elif unit_from == 'Centimeters' and unit_to == 'Inches':
        return value / 2.54
    else:
        return None

# Example usage of solution functions
# print(f'32F is {convert_temperature(32, "Fahrenheit", "Celsius"):.2f}C')
# print(f'10in is {convert_length(10, "Inches", "Centimeters"):.2f}cm')

print('Welcome to the Unit Converter (Solution)!')
print('Available conversions: ')
print('1. Fahrenheit to Celsius')
print('2. Inches to Centimeters')

while True:
    choice = input('Enter your choice (1 or 2): ')
    if choice in ('1', '2'):
        break
    else:
        print('Invalid choice. Please enter 1 or 2.')

try:
    value = float(input('Enter the value to convert: '))
    if choice == '1':
        celsius = convert_temperature(value, 'Fahrenheit', 'Celsius')
        print(f'{value}°F is {celsius:.2f}°C')
    elif choice == '2':
        centimeters = convert_length(value, 'Inches', 'Centimeters')
        print(f'{value} inches is {centimeters:.2f} cm')
except ValueError:
    print('Invalid input. Please enter a numerical value.')




## Navigational Links

[<-- Previous Week](week_02.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_04.ipynb)


# Week 3: Logic and Conditionals

Welcome to Week 3! This week, we will explore the fundamental concepts of conditional logic in Python. You'll learn about Boolean values, how to use logical operators (`and`, `or`, `not`), and how to control the flow of your programs using `if`, `elif`, and `else` statements. These tools are crucial for making decisions in your code and creating programs that can respond dynamically to different situations.

### Reading: Chapter 5 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 5 of our primary textbook:
[Think Python 2e - Chapter 5](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Conditional Logic

This section provides hands-on exercises to solidify your understanding of Boolean expressions, logical operators, and conditional statements. Experiment with the code cells and modify them to test different scenarios.
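
Before the exercises, here is a minimal sketch of how comparisons and the logical operators `and`, `or`, and `not` combine. The variable names and values (`age`, `has_ticket`) are hypothetical and chosen only for illustration:

```python
age = 20
has_ticket = True

# A comparison produces a Boolean value (True or False)
is_adult = age >= 18

# 'and' is True only when both sides are True
can_enter = is_adult and has_ticket

# 'not' is applied before 'or': (not has_ticket) or (age < 12)
needs_help = not has_ticket or age < 12

print(f'is_adult: {is_adult}')
print(f'can_enter: {can_enter}')
print(f'needs_help: {needs_help}')
```

Try changing `age` or `has_ticket` and predicting each result before running the cell.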

#### Exercise 1: Even or Odd Checker

Write a Python program that takes an integer as input from the user and determines whether the number is even or odd. Use the modulo operator (`%`) and an `if/else` statement to check for divisibility by 2.

**Try It Yourself:** Test your code with various positive and negative integers.

In [None]:
# Your Even or Odd Checker code here
number_str = input('Enter an integer: ')
number = int(number_str)

if number % 2 == 0:
    print(f'{number} is an even number.')
else:
    print(f'{number} is an odd number.')

## Mini-Project: Grade Calculator

**Task:** Write a Python script that asks the user for a numerical score (0-100) and outputs a letter grade (A, B, C, D, F) based on the following scale:

*   90-100: A
*   80-89: B
*   70-79: C
*   60-69: D
*   Below 60: F

Remember to handle potential non-numeric input or scores outside the 0-100 range gracefully (e.g., print an error message, though formal error handling will be covered later).

In [None]:
# Your Grade Calculator solution here
score_str = input('Enter the numerical score (0-100): ')
score = int(score_str)

if score < 0 or score > 100:
    print('Error: Score must be between 0 and 100.')
elif score >= 90:
    print('Grade: A')
elif score >= 80:
    print('Grade: B')
elif score >= 70:
    print('Grade: C')
elif score >= 60:
    print('Grade: D')
else:
    print('Grade: F')
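
The task asks for graceful handling of non-numeric input, which a bare `int()` call does not provide. One possible sketch follows. The helper name `parse_score` is hypothetical, and it uses the `try`/`except ValueError` pattern from the Week 2 lab:

```python
def parse_score(text):
    """Return the score as an int, or None if text is not an integer from 0 to 100."""
    try:
        score = int(text)
    except ValueError:
        # int() could not interpret the text as a whole number
        return None
    if score < 0 or score > 100:
        return None
    return score

# Sample strings stand in for what input() might return
for sample in ['85', 'abc', '150', '89.5']:
    print(f'{sample!r} -> {parse_score(sample)}')
```

A caller can then check for `None` and print an error message before working out the letter grade.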

## Unit Tests for Grade Calculator

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Grade Calculator. Run them and verify the output.

In [None]:
def calculate_grade(score):
    if score < 0 or score > 100:
        return 'Invalid Score'
    elif score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

print('--- Running Grade Calculator Unit Tests ---')
assert calculate_grade(95) == 'A', 'Test 1 Failed: 95 should be A'
assert calculate_grade(85) == 'B', 'Test 2 Failed: 85 should be B'
assert calculate_grade(75) == 'C', 'Test 3 Failed: 75 should be C'
assert calculate_grade(65) == 'D', 'Test 4 Failed: 65 should be D'
assert calculate_grade(55) == 'F', 'Test 5 Failed: 55 should be F'
assert calculate_grade(100) == 'A', 'Test 6 Failed: 100 should be A'
assert calculate_grade(0) == 'F', 'Test 7 Failed: 0 should be F'
assert calculate_grade(-1) == 'Invalid Score', 'Test 8 Failed: -1 should be Invalid Score'
assert calculate_grade(101) == 'Invalid Score', 'Test 9 Failed: 101 should be Invalid Score'
print('All Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Grade Calculator. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Grade Calculator
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

def calculate_grade_solution(score):
    if score < 0 or score > 100:
        return 'Invalid Score'
    elif score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

# Example usage (uncomment to test):
# print(f'Score 85: {calculate_grade_solution(85)}')
# print(f'Score 92: {calculate_grade_solution(92)}')
# print(f'Score 60: {calculate_grade_solution(60)}')
# print(f'Score -5: {calculate_grade_solution(-5)}')




## Navigational Links

[<-- Previous Week](week_03.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_05.ipynb)


# Week 4: Iteration (Loops)

Welcome to Week 4! This week, we will explore the powerful concept of iteration using loops in Python. Loops allow you to execute a block of code repeatedly, which is fundamental for automating tasks and processing collections of data. We'll cover `for` loops for iterating over sequences and `while` loops for repeating actions until a certain condition is met. Mastering loops is crucial for writing efficient and dynamic programs.

### Reading: Chapter 7 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 7 of our primary textbook:
[Think Python 2e - Chapter 7](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Understanding Loops

This section provides hands-on exercises to solidify your understanding of `for` and `while` loops. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Simple `for` Loop

The `for` loop is used to iterate over a sequence (like a list, tuple, string, or range).

**Try It Yourself:** Write a `for` loop that prints numbers from 0 to 4 (inclusive).

In [None]:
# Your simple for loop here
for i in range(5):
    print(i)
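
`range()` also accepts a start value and a step. A brief sketch follows (the variable names are illustrative). Wrapping each range in `list()` makes its contents visible:

```python
zero_to_four = list(range(5))      # stop only: starts at 0, stops before 5
one_to_five = list(range(1, 6))    # start and stop
evens = list(range(0, 10, 2))      # start, stop, step
countdown = list(range(5, 0, -1))  # a negative step counts down

print(zero_to_four)
print(one_to_five)
print(evens)
print(countdown)
```

In every form the stop value itself is excluded.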

#### Exercise 2: Counting with `while` Loop

The `while` loop repeats as long as a certain condition is true.

**Try It Yourself:** Write a `while` loop that prints numbers from 1 to 5 (inclusive).

In [None]:
# Your while loop here
count = 1
while count <= 5:
    print(count)
    count += 1
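
A `while` loop is most useful when the number of repetitions is not known in advance. As an illustrative sketch (the starting value and variable names are assumptions, not part of the exercise), this loop halves a number until it drops to 1 or below and counts how many halvings that took:

```python
value = 100
steps = 0

# Keep halving while the value is still above 1
while value > 1:
    value = value / 2
    steps += 1

print(f'Final value: {value}')
print(f'Number of halvings: {steps}')
```

If the loop body never changed `value`, the condition would stay true forever, so each pass must move the loop toward its exit condition.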

#### Exercise 3: Iterating over a String

Strings are sequences, so you can iterate through their characters using a `for` loop.

**Try It Yourself:** Print each character of the word 'Python' on a new line.

In [None]:
# Your string iteration here
word = 'Python'
for char in word:
    print(char)
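
When the position of each character is needed as well, the built-in `enumerate()` supplies the index alongside the character. A minimal sketch (the `positions` list is illustrative and only collects the lines for inspection):

```python
word = 'Python'
positions = []

# enumerate() yields (index, character) pairs, starting at index 0
for index, char in enumerate(word):
    positions.append(f'{index}: {char}')
    print(index, char)
```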

## Mini-Project: Multiplication Table Generator

**Task:** Write a Python script that asks the user for an integer and then prints its multiplication table from 1 to 10.

For example, if the user enters `5`, the output should be:
```
5 x 1 = 5
5 x 2 = 10
...
5 x 10 = 50
```

In [None]:
# Your Multiplication Table Generator solution here
num_str = input('Enter an integer to see its multiplication table: ')
num = int(num_str)

print(f'Multiplication Table for {num}:')
for i in range(1, 11):
    print(f'{num} x {i} = {num * i}')


## Unit Tests for Multiplication Table Generator

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Multiplication Table Generator. Run them and verify the output.

In [None]:
def generate_multiplication_table(number):
    table_output = []
    for i in range(1, 11):
        table_output.append(f'{number} x {i} = {number * i}')
    return table_output

print('--- Running Multiplication Table Unit Tests ---')

# Test Case 1: Number 5
table_5 = generate_multiplication_table(5)
expected_5_output = [
    '5 x 1 = 5', '5 x 2 = 10', '5 x 3 = 15', '5 x 4 = 20', '5 x 5 = 25',
    '5 x 6 = 30', '5 x 7 = 35', '5 x 8 = 40', '5 x 9 = 45', '5 x 10 = 50'
]
assert table_5 == expected_5_output, 'Test 1 Failed: Table for 5 is incorrect.'
print('Test 1 Passed: Table for 5 is correct.')

# Test Case 2: Number 1
table_1 = generate_multiplication_table(1)
expected_1_output = [
    '1 x 1 = 1', '1 x 2 = 2', '1 x 3 = 3', '1 x 4 = 4', '1 x 5 = 5',
    '1 x 6 = 6', '1 x 7 = 7', '1 x 8 = 8', '1 x 9 = 9', '1 x 10 = 10'
]
assert table_1 == expected_1_output, 'Test 2 Failed: Table for 1 is incorrect.'
print('Test 2 Passed: Table for 1 is correct.')

# Test Case 3: Number 10
table_10 = generate_multiplication_table(10)
expected_10_output = [
    '10 x 1 = 10', '10 x 2 = 20', '10 x 3 = 30', '10 x 4 = 40', '10 x 5 = 50',
    '10 x 6 = 60', '10 x 7 = 70', '10 x 8 = 80', '10 x 9 = 90', '10 x 10 = 100'
]
assert table_10 == expected_10_output, 'Test 3 Failed: Table for 10 is incorrect.'
print('Test 3 Passed: Table for 10 is correct.')

print('\nAll Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Multiplication Table Generator. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Multiplication Table Generator
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# num_input = input('Enter an integer: ')
# try:
#     num = int(num_input)
#     print(f'Multiplication Table for {num}:')
#     for i in range(1, 11):
#         print(f'{num} x {i} = {num * i}')
# except ValueError:
#     print('Invalid input. Please enter an integer.')




## Navigational Links

[<-- Previous Week](week_04.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_06.ipynb)


# Week 5: Strings

Welcome to Week 5! This week, we delve into one of the most fundamental and versatile data types in Python: strings. You will learn how to define, manipulate, and work with text data. We'll cover string indexing and slicing for accessing parts of a string, concatenation for joining strings, and essential string methods that allow you to modify and inspect string content. Understanding strings is crucial for handling text-based information in your programs.

### Reading: Chapter 8 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 8 of our primary textbook:
[Think Python 2e - Chapter 8](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Strings

This section provides hands-on exercises to solidify your understanding of string operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: String Indexing

Each character in a string has an index, starting from 0 for the first character. You can access individual characters using square brackets `[]`.

In [None]:
# Try It Yourself: Access individual characters of a string by index
my_string = 'Python'
print(f'First character: {my_string[0]}')
print(f'Third character: {my_string[2]}')
print(f'Last character: {my_string[-1]}')
print(f'Second to last character: {my_string[-2]}')
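
Because indexing starts at 0, the last valid index is one less than the string's length. The sketch below (variable names are illustrative) uses `len()` to find that index and shows that going one past it raises an `IndexError`:

```python
my_string = 'Python'
length = len(my_string)
last_index = length - 1

print(f'Length: {length}')
print(f'Character at last index: {my_string[last_index]}')

# Index 6 does not exist in a 6-character string
try:
    my_string[length]
    out_of_range = False
except IndexError:
    out_of_range = True
print(f'Index {length} out of range: {out_of_range}')
```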


#### Exercise 2: String Slicing

Slicing allows you to extract a portion (substring) of a string. The syntax is `[start:end]` (end is exclusive) or `[start:end:step]`.

In [None]:
# Try It Yourself: Slice the string 'Hello World'
greeting = 'Hello World'
print(f'Original string: {greeting}')
print(f'Substring from index 0 to 4: {greeting[0:5]}')  # 'Hello'
print(f'Substring from index 6 to end: {greeting[6:]}') # 'World'
print(f'Every other character: {greeting[::2]}')
print(f'Reverse string: {greeting[::-1]}')
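
Unlike indexing, slicing does not fail when a bound lies outside the string, because the bounds are quietly limited to the string's length. A short sketch with illustrative variable names:

```python
greeting = 'Hello World'

first_word = greeting[:5]   # an omitted start means "from the beginning"
last_three = greeting[-3:]  # negative bounds count from the end
clamped = greeting[6:100]   # an end past the string stops at the last character
empty = greeting[20:]       # a start past the string gives an empty string

print(repr(first_word))
print(repr(last_three))
print(repr(clamped))
print(repr(empty))
```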


#### Exercise 3: String Concatenation

You can combine two or more strings using the `+` operator.

In [None]:
# Try It Yourself: Join two strings
part1 = 'Python is '
part2 = 'awesome!'
full_sentence = part1 + part2
print(full_sentence)
# Using f-strings (formatted string literals) for concatenation
name = 'Alice'
age = 30
greeting_fstring = f'Hello, {name}! You are {age} years old.'
print(greeting_fstring)
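
Two related points, sketched with illustrative names: the string method `join()` combines a whole list of strings in one step, and `+` only works between strings, so a number must be converted with `str()` first:

```python
words = ['Python', 'is', 'awesome!']

# The string before .join() is placed between the items
sentence = ' '.join(words)
print(sentence)

age = 30
# 'Age: ' + age would raise TypeError, so convert the number explicitly
message = 'Age: ' + str(age)
print(message)
```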


#### Exercise 4: Common String Methods

Python strings come with many built-in methods for manipulation.

In [None]:
# Try It Yourself: Use various string methods
text = '  Learning Python is fun!  '
print(f"Original: '{text}'")
print(f'Uppercase: {text.upper()}')
print(f'Lowercase: {text.lower()}')
print(f"Stripped: '{text.strip()}'")
print(f"Replaced 'fun' with 'awesome': {text.replace('fun', 'awesome')}")
print(f"Starts with '  Learning': {text.startswith('  Learning')}")
print(f"Ends with 'fun!  ': {text.endswith('fun!  ')}")
print(f"Contains 'Python': {'Python' in text}")
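
A few more methods that often come up, sketched on the same example text (the variable names are illustrative). Note that string methods return new strings, and the original `text` is left unchanged:

```python
text = '  Learning Python is fun!  '

cleaned = text.strip()             # new string without surrounding spaces
words = cleaned.split()            # split on whitespace into a list of words
position = cleaned.find('Python')  # index of the first match
missing = cleaned.find('Java')     # find() returns -1 when there is no match

print(words)
print(position)
print(missing)
print(text == cleaned)             # the original still has its spaces
```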

## Mini-Project: Palindrome Checker

**Task:** Write a Python function that takes a string as input and returns `True` if it's a palindrome (reads the same forwards and backwards, ignoring case and non-alphanumeric characters), and `False` otherwise.

**Example:**
*   `'madam'` is a palindrome.
*   `'A man, a plan, a canal: Panama'` is a palindrome.
*   `'hello'` is not a palindrome.

In [None]:
# Your Palindrome Checker solution here
import re

def is_palindrome(s):
    # Remove non-alphanumeric characters and convert to lowercase
    s = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
    return s == s[::-1]

# Example usage:
# print(is_palindrome('madam'))       # True
# print(is_palindrome('racecar'))     # True
# print(is_palindrome('hello'))       # False
# print(is_palindrome('A man, a plan, a canal: Panama')) # True
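
For comparison, here is a sketch of the same idea without the `re` module, using only a loop and string methods from this week. The function name `is_palindrome_no_regex` is hypothetical. One difference to be aware of: `str.isalnum()` also accepts non-English letters, whereas the pattern `[^a-zA-Z0-9]` keeps only ASCII letters and digits.

```python
def is_palindrome_no_regex(s):
    # Build a cleaned copy holding only lowercase letters and digits
    cleaned = ''
    for ch in s:
        if ch.isalnum():
            cleaned += ch.lower()
    # A palindrome equals its own reverse
    return cleaned == cleaned[::-1]

print(is_palindrome_no_regex('madam'))
print(is_palindrome_no_regex('A man, a plan, a canal: Panama'))
print(is_palindrome_no_regex('hello'))
```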

## Unit Tests for Palindrome Checker

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Palindrome Checker. Run them and verify the output.

In [None]:
import re

def is_palindrome(s):
    s = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
    return s == s[::-1]

# Test Cases
print('--- Running Palindrome Checker Unit Tests ---')
assert is_palindrome('madam') == True, 'Test 1 Failed: madam should be True'
assert is_palindrome('racecar') == True, 'Test 2 Failed: racecar should be True'
assert is_palindrome('hello') == False, 'Test 3 Failed: hello should be False'
assert is_palindrome('A man, a plan, a canal: Panama') == True, 'Test 4 Failed: A man, a plan, a canal: Panama should be True'
assert is_palindrome('No lemon, no melon') == True, 'Test 5 Failed: No lemon, no melon should be True'
assert is_palindrome('Python') == False, 'Test 6 Failed: Python should be False'
assert is_palindrome('12321') == True, 'Test 7 Failed: 12321 should be True'
assert is_palindrome('Was it a car or a cat I saw?') == True, 'Test 8 Failed: Was it a car or a cat I saw? should be True'
print('\nAll Unit Tests Completed.')


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Palindrome Checker. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Palindrome Checker
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# import re
#
# def is_palindrome_solution(s):
#     s = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
#     return s == s[::-1]

# # Example usage:
# # print(is_palindrome_solution('madam'))
# # print(is_palindrome_solution('A man, a plan, a canal: Panama'))
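
If you would rather avoid regular expressions, the same idea can be sketched with `str.isalnum()`. The function name `is_palindrome_no_regex` is made up for this illustration. One assumption to keep in mind: `isalnum()` also accepts non-ASCII letters and digits, so on such input it may behave differently from the `[^a-zA-Z0-9]` pattern used above.

```python
def is_palindrome_no_regex(s):
    # Keep only letters and digits, lowercased, as a list of characters
    cleaned = [ch.lower() for ch in s if ch.isalnum()]
    # A palindrome reads the same forwards and backwards
    return cleaned == cleaned[::-1]

print(is_palindrome_no_regex('A man, a plan, a canal: Panama'))  # True
print(is_palindrome_no_regex('hello'))                           # False
```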


## Navigational Links

[<-- Previous Week](week_04.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_06.ipynb)


## Navigational Links

[<-- Previous Week](week_05.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_07.ipynb)


# Week 6: Lists

Welcome to Week 6! This week, we introduce one of Python's most powerful and fundamental data structures: lists. Lists are ordered, mutable collections of items, allowing you to store and manage multiple pieces of data in a single variable. You will learn how to create lists, access and modify their elements, perform various operations like adding and removing items, and iterate through them. Mastering lists is essential for handling collections of data effectively in Python.

### Reading: Chapter 10 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 10 of our primary textbook:
[Think Python 2e - Chapter 10](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Lists

This section provides hands-on exercises to solidify your understanding of list operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: List Creation and Access

Lists are created using square brackets `[]` and can contain items of different data types. You can access individual elements using their index (starting from 0).

In [None]:
# Try It Yourself: Create a list and access its elements
my_list = [10, 'hello', 3.14, True]
print(f'Original list: {my_list}')
print(f'First element: {my_list[0]}')
print(f'Second element: {my_list[1]}')
print(f'Last element: {my_list[-1]}')

#### Exercise 2: Modifying List Elements

Lists are mutable, meaning you can change their elements after creation. You can also add elements using `append()` or `insert()`, and remove them using `remove()` or `pop()`.

In [None]:
# Try It Yourself: Modify a list
fruits = ['apple', 'banana', 'cherry']
print(f'Initial list: {fruits}')

# Change an element
fruits[1] = 'orange'
print(f'After changing element: {fruits}')

# Add elements
fruits.append('grape') # Adds to the end
fruits.insert(0, 'kiwi') # Inserts at a specific index
print(f'After adding elements: {fruits}')

# Remove elements
fruits.remove('cherry') # Removes the first occurrence of a value
popped_fruit = fruits.pop() # Removes and returns the last element
print(f'After removing elements: {fruits}')
print(f'Popped fruit: {popped_fruit}')
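
Two details of `remove()` and `pop()` are easy to miss, so here is a small sketch on a fresh list (the values are illustrative): `remove()` raises a `ValueError` when the value is absent, and `pop()` accepts an optional index.

```python
fruits = ['kiwi', 'apple', 'orange']

# remove() raises ValueError if the value is missing,
# so check membership first (or catch the exception)
if 'mango' in fruits:
    fruits.remove('mango')
else:
    print('mango is not in the list')

# pop() takes an optional index; pop(0) removes the first element
first_fruit = fruits.pop(0)
print(f'Popped first fruit: {first_fruit}')
print(f'Remaining: {fruits}')
```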


#### Exercise 3: List Slicing and Iteration

Just like strings, you can slice lists to get sub-lists. Loops are commonly used to process each item in a list.

In [None]:
# Try It Yourself: Slice and iterate through a list
numbers = [10, 20, 30, 40, 50, 60, 70]
print(f'Original list: {numbers}')

# Slice the list
print(f'First three elements: {numbers[0:3]}')
print(f'Elements from index 2 to 5: {numbers[2:6]}')
print(f'Last two elements: {numbers[-2:]}')

# Iterate through the list
print('Iterating through elements:')
for num in numbers:
    print(num)
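
Slices also accept a third value, the step, and `enumerate()` gives you the index alongside each item while looping. A minimal sketch reusing the same `numbers` values:

```python
numbers = [10, 20, 30, 40, 50, 60, 70]

# A step of 2 takes every second element
every_second = numbers[::2]
print(f'Every second element: {every_second}')

# A step of -1 produces a reversed copy; the original is unchanged
reversed_copy = numbers[::-1]
print(f'Reversed copy: {reversed_copy}')

# enumerate() yields (index, item) pairs
for index, num in enumerate(numbers):
    print(f'Index {index}: {num}')
```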


## Mini-Project: Simple To-Do List Manager

**Task:** Create a simple console-based to-do list manager. Your program should allow the user to:
1.  **Add** a task to the list.
2.  **View** all tasks, showing their status (e.g., `[ ] task` or `[X] task`).
3.  **Mark** a task as complete.

Use a list to store your tasks. Each task could be a string, or you could consider storing tasks as dictionaries if you want to include more details (like completion status).

In [None]:
# Your To-Do List Manager solution here
tasks = [] # List to store tasks. Each task is a dictionary: {'name': 'Task name', 'completed': False}

def add_task(task_name):
    tasks.append({'name': task_name, 'completed': False})
    print(f'Task "{task_name}" added.')

def view_tasks():
    if not tasks:
        print('No tasks in the list.')
        return
    print('\n--- YOUR TASKS ---')
    for i, task in enumerate(tasks):
        status = '[X]' if task['completed'] else '[ ]'
        print(f'{i+1}. {status} {task["name"]}')
    print('------------------')

def mark_task_complete(task_index):
    if 0 <= task_index < len(tasks):
        tasks[task_index]['completed'] = True
        print(f'Task "{tasks[task_index]["name"]}" marked as complete.')
    else:
        print('Invalid task number.')

def delete_task(task_index):
    if 0 <= task_index < len(tasks):
        removed_task = tasks.pop(task_index)
        print(f'Task "{removed_task["name"]}" deleted.')
    else:
        print('Invalid task number.')

def run_todo_list():
    while True:
        print('\n1. Add Task')
        print('2. View Tasks')
        print('3. Mark Task Complete')
        print('4. Delete Task')
        print('5. Exit')
        choice = input('Enter your choice: ')

        if choice == '1':
            task_name = input('Enter task name: ')
            add_task(task_name)
        elif choice == '2':
            view_tasks()
        elif choice == '3':
            view_tasks()
            try:
                task_num = int(input('Enter task number to mark complete: ')) - 1
                mark_task_complete(task_num)
            except ValueError:
                print('Invalid input. Please enter a number.')
        elif choice == '4':
            view_tasks()
            try:
                task_num = int(input('Enter task number to delete: ')) - 1
                delete_task(task_num)
            except ValueError:
                print('Invalid input. Please enter a number.')
        elif choice == '5':
            print('Exiting To-Do List Manager.')
            break
        else:
            print('Invalid choice. Please try again.')

# Uncomment the line below to run the To-Do List Manager interactively
# run_todo_list()


## Unit Tests for To-Do List Manager

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your To-Do List Manager. Run them and verify the output.

In [None]:
# Reset tasks for testing
test_tasks = []

def test_add_task(task_name):
    global test_tasks
    test_tasks.append({'name': task_name, 'completed': False})

def test_mark_task_complete(task_number):
    global test_tasks
    if 1 <= task_number <= len(test_tasks):
        test_tasks[task_number - 1]['completed'] = True
        return True
    return False

# Test Cases
print('--- Testing Add Task ---')
test_tasks = []
test_add_task('Buy groceries')
assert len(test_tasks) == 1, 'Test Failed: Task not added'
assert test_tasks[0]['name'] == 'Buy groceries', 'Test Failed: Incorrect task name'
assert test_tasks[0]['completed'] == False, 'Test Failed: Task should not be completed initially'
print('Add Task tests passed!')

print('--- Testing Mark Task Complete ---')
test_tasks = []
test_add_task('Walk the dog')
test_add_task('Do laundry')
test_mark_task_complete(1)
assert test_tasks[0]['completed'] == True, 'Test Failed: Task 1 not marked complete'
assert test_tasks[1]['completed'] == False, 'Test Failed: Task 2 should not be complete'
assert not test_mark_task_complete(99), 'Test Failed: Marking non-existent task should return False'
print('Mark Task Complete tests passed!')

print('\nAll Mini-Project tests concluded.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the To-Do List Manager. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for To-Do List Manager
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# tasks_solution = []

# def add_task_solution(task_name):
#     tasks_solution.append({'name': task_name, 'completed': False})
#     print(f'Task "{task_name}" added.')

# def view_tasks_solution():
#     if not tasks_solution:
#         print('No tasks in the list.')
#         return
#     print('\n--- YOUR TASKS ---')
#     for i, task in enumerate(tasks_solution):
#         status = '[X]' if task['completed'] else '[ ]'
#         print(f'{i+1}. {status} {task["name"]}')
#     print('------------------')

# def mark_task_complete_solution(task_index):
#     if 0 <= task_index < len(tasks_solution):
#         tasks_solution[task_index]['completed'] = True
#         print(f'Task "{tasks_solution[task_index]["name"]}" marked as complete.')
#     else:
#         print('Invalid task number.')

# def delete_task_solution(task_index):
#     if 0 <= task_index < len(tasks_solution):
#         removed_task = tasks_solution.pop(task_index)
#         print(f'Task "{removed_task["name"]}" deleted.')
#     else:
#         print('Invalid task number.')

# def run_todo_list_solution():
#     while True:
#         print('\n1. Add Task')
#         print('2. View Tasks')
#         print('3. Mark Task Complete')
#         print('4. Delete Task')
#         print('5. Exit')
#         choice = input('Enter your choice: ')

#         if choice == '1':
#             task_name = input('Enter task name: ')
#             add_task_solution(task_name)
#         elif choice == '2':
#             view_tasks_solution()
#         elif choice == '3':
#             view_tasks_solution()
#             try:
#                 task_num = int(input('Enter task number to mark complete: ')) - 1
#                 mark_task_complete_solution(task_num)
#             except ValueError:
#                 print('Invalid input. Please enter a number.')
#         elif choice == '4':
#             view_tasks_solution()
#             try:
#                 task_num = int(input('Enter task number to delete: ')) - 1
#                 delete_task_solution(task_num)
#             except ValueError:
#                 print('Invalid input. Please enter a number.')
#         elif choice == '5':
#             print('Exiting To-Do List Manager.')
#             break
#         else:
#             print('Invalid choice. Please try again.')

# # Uncomment the line below to run the To-Do List Manager interactively
# # run_todo_list_solution()


## Unit Tests for Simple To-Do List Manager

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your To-Do List Manager. Run them and verify the output.

In [None]:
# Reset tasks for testing
test_tasks = []

def test_add_task(task_name):
    global test_tasks
    test_tasks.append({'name': task_name, 'completed': False})

def test_mark_task_complete(task_index):
    global test_tasks
    if 0 <= task_index < len(test_tasks):
        test_tasks[task_index]['completed'] = True

def test_delete_task(task_index):
    global test_tasks
    if 0 <= task_index < len(test_tasks):
        test_tasks.pop(task_index)

print('--- Running To-Do List Manager Unit Tests ---')

# Test Case 1: Add tasks
test_tasks = [] # Ensure clean state
test_add_task('Buy groceries')
test_add_task('Pay bills')
assert len(test_tasks) == 2, 'Test 1 Failed: Should have 2 tasks after adding.'
assert test_tasks[0]['name'] == 'Buy groceries', 'Test 1 Failed: First task name mismatch.'
print('Test 1 Passed: Tasks added correctly.')

# Test Case 2: Mark task complete
test_mark_task_complete(0)
assert test_tasks[0]['completed'] == True, 'Test 2 Failed: First task should be complete.'
print('Test 2 Passed: Task marked complete correctly.')

# Test Case 3: Delete task
test_delete_task(1) # Delete 'Pay bills'
assert len(test_tasks) == 1, 'Test 3 Failed: Should have 1 task after deleting.'
assert test_tasks[0]['name'] == 'Buy groceries', 'Test 3 Failed: Remaining task name mismatch.'
print('Test 3 Passed: Task deleted correctly.')

# Test Case 4: Invalid index for mark complete
initial_completed_state = test_tasks[0]['completed']
test_mark_task_complete(5) # Invalid index
assert test_tasks[0]['completed'] == initial_completed_state, 'Test 4 Failed: Task state should not change for invalid index.'
print('Test 4 Passed: Invalid index for mark complete handled.')

# Test Case 5: Invalid index for delete task
initial_task_count = len(test_tasks)
test_delete_task(5) # Invalid index
assert len(test_tasks) == initial_task_count, 'Test 5 Failed: Task count should not change for invalid index.'
print('Test 5 Passed: Invalid index for delete handled.')

print('\nAll Unit Tests Completed.')
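
The tests above exercise stand-alone copies of the functions that share a global `test_tasks` list. One alternative, sketched here under the assumption that you are free to change the function signatures, is to pass the list in as a parameter so each test can start from its own fresh list. The names `add_task_to` and `mark_complete_in` are hypothetical and do not appear in the mini-project.

```python
def add_task_to(task_list, task_name):
    # Append a new, not-yet-completed task to the given list
    task_list.append({'name': task_name, 'completed': False})

def mark_complete_in(task_list, task_index):
    # Return True on success, False if the index is out of range
    if 0 <= task_index < len(task_list):
        task_list[task_index]['completed'] = True
        return True
    return False

# Each test builds its own list, so no global state needs resetting
sample_tasks = []
add_task_to(sample_tasks, 'Buy groceries')
assert sample_tasks == [{'name': 'Buy groceries', 'completed': False}]
assert mark_complete_in(sample_tasks, 0) is True
assert mark_complete_in(sample_tasks, 5) is False
print('Parameterised tests passed.')
```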


## Navigational Links

[<-- Previous Week](week_05.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_07.ipynb)


## Navigational Links

[<-- Previous Week](week_06.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_08.ipynb)


# Week 7: Dictionaries

Welcome to Week 7! This week, we introduce another fundamental data structure in Python: dictionaries. A dictionary maps keys to values: instead of holding single values as elements, as a list does, it stores `key:value` pairs, and each value is looked up by its key. Since Python 3.7, dictionaries also preserve the order in which items were inserted. Understanding dictionaries is crucial for handling data where items are related by keys, such as user profiles, configuration settings, or data records. You will learn how to create, access, modify, and iterate through dictionaries.

### Reading: Chapter 11 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 11 of our primary textbook:
[Think Python 2e - Chapter 11](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Dictionaries

This section provides hands-on exercises to solidify your understanding of dictionary operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Dictionary Creation and Access

Dictionaries are created using curly braces `{}` and store data in `key:value` pairs. Keys must be unique and immutable (like strings, numbers, or tuples), while values can be of any data type. You can access values using their corresponding keys.

In [None]:
# Try It Yourself: Create a dictionary and access its elements
person = {
    'name': 'Alice',
    'age': 30,
    'city': 'New York'
}
print(f'Original dictionary: {person}')
print(f"Name: {person['name']}")
print(f"Age: {person['age']}")
print(f"City: {person['city']}")
# Trying to access a non-existent key will raise a KeyError
# print(person['gender']) # Uncomment to see KeyError
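
To avoid the `KeyError`, the `get()` method returns `None`, or a default you supply, when a key is missing. A minimal sketch using the same `person` data (the `'country'` key and `'Unknown'` default are illustrative):

```python
person = {'name': 'Alice', 'age': 30, 'city': 'New York'}

# get() returns None instead of raising KeyError for a missing key
gender = person.get('gender')
print(gender)

# A second argument supplies a default value
country = person.get('country', 'Unknown')
print(country)

# The 'in' operator checks whether a key exists
print('age' in person)
```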


#### Exercise 2: Modifying and Deleting Dictionary Items

Dictionaries are mutable. You can add new key-value pairs, change the value associated with an existing key, or delete key-value pairs.

In [None]:
# Try It Yourself: Modify and delete items in a dictionary
student = {
    'id': 101,
    'name': 'Bob',
    'major': 'Computer Science',
    'courses': ['Python', 'Data Structures']
}
print(f'Initial dictionary: {student}')

# Add a new key-value pair
student['gpa'] = 3.8
print(f'After adding GPA: {student}')

# Change an existing value
student['major'] = 'Software Engineering'
print(f'After changing major: {student}')

# Delete a key-value pair
del student['courses']
print(f'After deleting courses: {student}')

# Using .pop() to remove an item and get its value
removed_id = student.pop('id')
print(f'After popping id: {student}, Removed ID: {removed_id}')


#### Exercise 3: Iterating Through Dictionaries

You can iterate through a dictionary's keys, values, or key-value pairs using `for` loops with `.keys()`, `.values()`, or `.items()` methods, respectively.

In [None]:
# Try It Yourself: Iterate through a dictionary
inventory = {
    'apple': 50,
    'banana': 100,
    'orange': 75
}
print(f'Original inventory: {inventory}')

print('Iterating through keys:')
for item_name in inventory.keys():
    print(item_name)

print('Iterating through values:')
for quantity in inventory.values():
    print(quantity)

print('Iterating through key-value pairs:')
for item_name, quantity in inventory.items():
    print(f'{item_name}: {quantity}')
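
These views combine well with built-in functions. As a small illustrative sketch on the same `inventory` data, `sum()` totals the values and `sorted()` with a `key` function orders the pairs by quantity:

```python
inventory = {'apple': 50, 'banana': 100, 'orange': 75}

# Total of all quantities
total_items = sum(inventory.values())
print(f'Total items: {total_items}')

# Sort the (name, quantity) pairs by quantity, largest first
ranked = sorted(inventory.items(), key=lambda pair: pair[1], reverse=True)
for item_name, quantity in ranked:
    print(f'{item_name}: {quantity}')
```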


## Mini-Project: Simple Contact Book

**Task:** Create a simple console-based contact book that allows a user to store and manage names and phone numbers. Your program should have the following functionalities:
1.  **Add** a new contact (name and phone number).
2.  **Look up** a contact's phone number by name.
3.  **Update** an existing contact's phone number.
4.  **Delete** a contact.
5.  **List all** contacts.

Use a dictionary to store the contacts, where the contact's name is the key and the phone number is the value.

In [None]:
# Your Simple Contact Book solution here
contacts = {} # Stores name (lowercase) -> phone_number

def add_contact(name, phone_number):
    contacts[name.lower()] = phone_number
    print(f'Contact "{name}" added/updated.')

def lookup_contact(name):
    if name.lower() in contacts:
        print(f'{name}: {contacts[name.lower()]}')
        return contacts[name.lower()]
    else:
        print(f'Contact "{name}" not found.')
        return None

def update_contact(name, new_phone_number):
    if name.lower() in contacts:
        contacts[name.lower()] = new_phone_number
        print(f'Contact "{name}" updated.')
    else:
        print(f'Contact "{name}" not found. Use add_contact to add it.')

def delete_contact(name):
    if name.lower() in contacts:
        del contacts[name.lower()]
        print(f'Contact "{name}" deleted.')
    else:
        print(f'Contact "{name}" not found.')

def list_contacts():
    if not contacts:
        print('Contact book is empty.')
        return
    print('\n--- Your Contacts ---')
    for name, phone in sorted(contacts.items()):
        print(f'{name.capitalize()}: {phone}')
    print('---------------------')

def run_contact_book():
    while True:
        print('\n1. Add Contact')
        print('2. Look Up Contact')
        print('3. Update Contact')
        print('4. Delete Contact')
        print('5. List All Contacts')
        print('6. Exit')
        choice = input('Enter your choice: ')

        if choice == '1':
            name = input('Enter contact name: ')
            phone = input('Enter phone number: ')
            add_contact(name, phone)
        elif choice == '2':
            name = input('Enter contact name to look up: ')
            lookup_contact(name)
        elif choice == '3':
            name = input('Enter contact name to update: ')
            phone = input('Enter new phone number: ')
            update_contact(name, phone)
        elif choice == '4':
            name = input('Enter contact name to delete: ')
            delete_contact(name)
        elif choice == '5':
            list_contacts()
        elif choice == '6':
            print('Exiting Contact Book.')
            break
        else:
            print('Invalid choice. Please try again.')

# Uncomment the line below to run the Contact Book interactively
# run_contact_book()


## Unit Tests for Simple Contact Book

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Simple Contact Book. Run them and verify the output.

In [None]:
# Helper function for testing (to isolate test contacts from global 'contacts')
# This function is not meant to be run directly, but its components are used in the tests below.
test_contacts = {}

def add_test_contact(name, phone_number):
    test_contacts[name.lower()] = phone_number

def lookup_test_contact(name):
    return test_contacts.get(name.lower())

def update_test_contact(name, new_phone_number):
    if name.lower() in test_contacts:
        test_contacts[name.lower()] = new_phone_number
        return True
    return False

def delete_test_contact(name):
    if name.lower() in test_contacts:
        del test_contacts[name.lower()]
        return True
    return False

print('--- Running Simple Contact Book Unit Tests ---')

# Test Case 1: Add and Lookup Contact
test_contacts.clear()
add_test_contact('Alice', '111-222-3333')
assert lookup_test_contact('Alice') == '111-222-3333', 'Test 1 Failed: Alice lookup incorrect.'
assert lookup_test_contact('Bob') is None, 'Test 1 Failed: Non-existent contact lookup incorrect.'
print('Test 1 Passed: Add and Lookup Contact.')

# Test Case 2: Update Contact
add_test_contact('Bob', '444-555-6666') # Add Bob first
update_test_contact('Bob', '999-888-7777')
assert lookup_test_contact('Bob') == '999-888-7777', 'Test 2 Failed: Bob update incorrect.'
assert not update_test_contact('Charlie', '000-000-0000'), 'Test 2 Failed: Update non-existent contact should return False.'
print('Test 2 Passed: Update Contact.')

# Test Case 3: Delete Contact
add_test_contact('Eve', '123-123-1234')
assert delete_test_contact('Eve') is True, 'Test 3 Failed: Delete Eve incorrect.'
assert lookup_test_contact('Eve') is None, 'Test 3 Failed: Eve should be deleted.'
assert not delete_test_contact('Frank'), 'Test 3 Failed: Delete non-existent contact should return False.'
print('Test 3 Passed: Delete Contact.')

# Test Case 4: Case-Insensitive Lookup (assuming mini-project stores lowercase keys)
test_contacts.clear()
add_test_contact('John', '555-123-4567')
assert lookup_test_contact('john') == '555-123-4567', 'Test 4 Failed: Case-insensitive lookup incorrect.'
print('Test 4 Passed: Case-Insensitive Lookup.')

print('\nAll Simple Contact Book Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Simple Contact Book. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Simple Contact Book
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# contacts_solution = {}

# def add_contact_solution(name, phone_number):
#     contacts_solution[name.lower()] = phone_number
#     print(f'Contact "{name}" added/updated.')

# def lookup_contact_solution(name):
#     if name.lower() in contacts_solution:
#         print(f'{name}: {contacts_solution[name.lower()]}')
#         return contacts_solution[name.lower()]
#     else:
#         print(f'Contact "{name}" not found.')
#         return None

# def update_contact_solution(name, new_phone_number):
#     if name.lower() in contacts_solution:
#         contacts_solution[name.lower()] = new_phone_number
#         print(f'Contact "{name}" updated.')
#         return True
#     else:
#         print(f'Contact "{name}" not found. Use add_contact to add it.')
#         return False

# def delete_contact_solution(name):
#     if name.lower() in contacts_solution:
#         del contacts_solution[name.lower()]
#         print(f'Contact "{name}" deleted.')
#         return True
#     else:
#         print(f'Contact "{name}" not found.')
#         return False

# def list_contacts_solution():
#     if not contacts_solution:
#         print('Contact book is empty.')
#         return
#     print('\n--- Your Contacts ---')
#     for name, phone in sorted(contacts_solution.items()):
#         print(f'{name.capitalize()}: {phone}')
#     print('---------------------')

# def run_contact_book_solution():
#     while True:
#         print('\n1. Add Contact')
#         print('2. Look Up Contact')
#         print('3. Update Contact')
#         print('4. Delete Contact')
#         print('5. List All Contacts')
#         print('6. Exit')
#         choice = input('Enter your choice: ')

#         if choice == '1':
#             name = input('Enter contact name: ')
#             phone = input('Enter phone number: ')
#             add_contact_solution(name, phone)
#         elif choice == '2':
#             name = input('Enter contact name to look up: ')
#             lookup_contact_solution(name)
#         elif choice == '3':
#             name = input('Enter contact name to update: ')
#             phone = input('Enter new phone number: ')
#             update_contact_solution(name, phone)
#         elif choice == '4':
#             name = input('Enter contact name to delete: ')
#             delete_contact_solution(name)
#         elif choice == '5':
#             list_contacts_solution()
#         elif choice == '6':
#             print('Exiting Contact Book.')
#             break
#         else:
#             print('Invalid choice. Please try again.')

# # Uncomment the line below to run the Contact Book interactively
# # run_contact_book_solution()


## Navigational Links

[<-- Previous Week](week_06.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_08.ipynb)


## Navigational Links

[<-- Previous Week](week_07.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_09.ipynb)


# Week 8: Tuples

Welcome to Week 8! This week, we explore tuples, another fundamental data type in Python. Tuples are ordered, immutable collections of items, similar to lists but with the key difference that their contents cannot be changed after creation. You will learn how to create tuples, access their elements, and understand their practical uses, especially in scenarios involving data that should not be altered. We'll also cover tuple packing and unpacking, powerful features for working with multiple values efficiently.

### Reading: Chapter 12 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 12 of our primary textbook:
[Think Python 2e - Chapter 12](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Tuples

This section provides hands-on exercises to solidify your understanding of tuple operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Tuple Creation and Access

Tuples are created using parentheses `()` and can contain items of different data types. You can access individual elements using their index (starting from 0) or negative indexing (from the end).

In [None]:
# Try It Yourself: Create a tuple and access its elements
my_tuple = (10, 'python', 3.14, False, 'end')
print(f'Original tuple: {my_tuple}')
print(f'First element: {my_tuple[0]}')
print(f'Third element: {my_tuple[2]}')
print(f'Last element: {my_tuple[-1]}')
print(f'Second to last element: {my_tuple[-2]}')

#### Exercise 2: Tuple Slicing and Immutability

Like strings and lists, you can slice tuples to get sub-tuples. However, once created, you cannot change, add, or remove elements from a tuple. This immutability is a key characteristic of tuples.

**Try It Yourself:** Slice a tuple. Then try to change an element and observe the `TypeError`. Trying to add an element with `append()` fails too, with an `AttributeError`, because tuples have no such method.

In [None]:
# Try It Yourself: Slice a tuple
coordinates = (40.7128, -74.0060, 'New York')
print(f'Original coordinates: {coordinates}')
print(f'Latitude and Longitude: {coordinates[0:2]}') # Slicing
print(f'City: {coordinates[-1]}')

# UNCOMMENT THE LINE BELOW TO SEE THE ERROR (TypeError)
# coordinates[0] = 41.0 # This line would cause a TypeError: 'tuple' object does not support item assignment
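
If you would like to see the error without stopping the cell, one option is to catch it. The sketch below also shows a common workaround: building a new tuple from slices instead of modifying the existing one. The name `updated` is illustrative.

```python
coordinates = (40.7128, -74.0060, 'New York')

# Item assignment on a tuple raises TypeError; catch it to print the message
try:
    coordinates[0] = 41.0
except TypeError as error:
    print(f'TypeError: {error}')

# Build a new tuple instead: a one-element tuple plus a slice of the old one
updated = (41.0,) + coordinates[1:]
print(f'Updated copy: {updated}')
print(f'Original is unchanged: {coordinates}')
```

The trailing comma in `(41.0,)` is what makes it a one-element tuple rather than a parenthesised number.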


#### Exercise 3: Tuple Packing and Unpacking

Tuple packing is when multiple values are assigned to a single variable, forming a tuple. Unpacking is the reverse, where elements of a tuple are assigned to multiple variables.

**Try It Yourself:** Pack three values into a tuple, then unpack them into three separate variables.

In [None]:
# Tuple Packing
packed_data = 'Alice', 25, 'Engineer'  # Parentheses are optional for packing
print(f'Packed data (tuple): {packed_data}')
print(f'Type of packed_data: {type(packed_data)}')

# Tuple Unpacking
name, age, occupation = packed_data
print(f'Unpacked Name: {name}')
print(f'Unpacked Age: {age}')
print(f'Unpacked Occupation: {occupation}')

# Swapping variables using tuple packing/unpacking
a = 10
b = 20
print(f'Before swap: a={a}, b={b}')
a, b = b, a # Pythonic way to swap values
print(f'After swap: a={a}, b={b}')
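
Unpacking has two further uses worth trying, shown here as a short sketch with illustrative values: a starred name collects the leftover elements into a list, and functions that return a tuple, such as the built-in `divmod()`, can be unpacked directly.

```python
# A starred name gathers the remaining elements into a list
first, *rest = (1, 2, 3, 4)
print(first)  # 1
print(rest)   # [2, 3, 4]

# divmod() returns a (quotient, remainder) tuple that can be unpacked at once
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```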

## Mini-Project: Coordinate Tracker

**Task:** Create a simple console-based program that allows a user to store and display a list of geographical coordinates (latitude, longitude) for different locations. Each coordinate pair should be stored as a tuple.

Your program should have the following functionalities:
1.  **Add** a new location with its latitude and longitude.
2.  **View** all stored locations with their coordinates.
3.  **Find** a location by its approximate latitude/longitude (e.g., if multiple locations are near a given point).
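
For the third item, one possible reading of "approximate" is a simple tolerance check on each coordinate. The sketch below is only an illustration under that assumption: the function `find_nearby`, its `tolerance` parameter (in degrees), and the sample data are all hypothetical, and comparing raw degrees is not a true distance measure.

```python
# Sample data in the same shape as the tracker: (name, (latitude, longitude))
sample_locations = [
    ('New York', (40.7128, -74.0060)),
    ('Newark', (40.7357, -74.1724)),
    ('Los Angeles', (34.0522, -118.2437)),
]

def find_nearby(locations, latitude, longitude, tolerance=0.5):
    # Collect every location whose latitude and longitude are both
    # within `tolerance` degrees of the requested point
    matches = []
    for name, (lat, lon) in locations:
        if abs(lat - latitude) <= tolerance and abs(lon - longitude) <= tolerance:
            matches.append(name)
    return matches

print(find_nearby(sample_locations, 40.7, -74.0))
```

Note how `for name, (lat, lon) in locations` unpacks the nested tuple in a single step.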

In [None]:
# Your Coordinate Tracker solution here
locations = [] # Stores tuples: [(name, (latitude, longitude)), ...]

def add_location(name, latitude, longitude):
    try:
        lat = float(latitude)
        lon = float(longitude)
        locations.append((name, (lat, lon)))
        print(f'Location {name} with coordinates ({latitude}, {longitude}) added.')
    except ValueError:
        print('Invalid latitude or longitude. Please enter numerical values.')

def view_locations():
    if not locations:
        print('No locations tracked yet.')
        return
    print('\n--- Tracked Locations ---')
    for i, (name, coords) in enumerate(locations):
        print(f'{i+1}. {name}: Latitude {coords[0]:.4f}, Longitude {coords[1]:.4f}')
    print('-------------------------')

def find_location(name):
    found = False
    for loc_name, coords in locations:
        if loc_name.lower() == name.lower():
            print(f'Found {loc_name}: Latitude {coords[0]:.4f}, Longitude {coords[1]:.4f}')
            found = True
            return coords
    if not found:
        print(f'Location {name} not found.')
    return None

def run_coordinate_tracker():
    while True:
        print('\n1. Add Location')
        print('2. View All Locations')
        print('3. Find Location by Name')
        print('4. Exit')
        choice = input('Enter your choice: ')

        if choice == '1':
            name = input('Enter location name: ')
            latitude = input('Enter latitude: ')
            longitude = input('Enter longitude: ')
            add_location(name, latitude, longitude)
        elif choice == '2':
            view_locations()
        elif choice == '3':
            name = input('Enter location name to find: ')
            find_location(name)
        elif choice == '4':
            print('Exiting Coordinate Tracker.')
            break
        else:
            print('Invalid choice. Please try again.')

# Uncomment the line below to run the Coordinate Tracker interactively
# run_coordinate_tracker()
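
Item 3 of the task asks for an approximate coordinate search, while the cell above looks locations up by name. One possible way to add a coordinate search is sketched below. The function name `find_nearby`, the `tolerance` parameter (in degrees), and the sample data are assumptions made for illustration. Comparing raw degree differences is a simplification, not a true distance calculation.

```python
def find_nearby(locations, latitude, longitude, tolerance=0.5):
    """Return entries whose latitude and longitude are both within `tolerance` degrees."""
    matches = []
    for name, (lat, lon) in locations:
        if abs(lat - latitude) <= tolerance and abs(lon - longitude) <= tolerance:
            matches.append((name, (lat, lon)))
    return matches

# Hypothetical sample data in the same (name, (lat, lon)) shape as `locations`
sample_locations = [
    ('Paris', (48.8566, 2.3522)),
    ('Versailles', (48.8049, 2.1204)),
    ('London', (51.5074, -0.1278)),
]

nearby = find_nearby(sample_locations, 48.85, 2.30)
print([name for name, _ in nearby])
```

Passing the list in as a parameter, rather than reading the global `locations`, keeps the function easy to test with different data.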


## Unit Tests for Coordinate Tracker

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Coordinate Tracker. Run them and verify the output.

In [None]:
# Helper functions to isolate testing logic from interactive input
# Global list to simulate the `locations` list in the mini-project
_test_locations = []

def _test_add_location(name, latitude, longitude):
    try:
        lat = float(latitude)
        lon = float(longitude)
        _test_locations.append((name, (lat, lon)))
        return True
    except ValueError:
        return False

def _test_find_location(name):
    for loc_name, coords in _test_locations:
        if loc_name.lower() == name.lower():
            return coords
    return None

print('--- Running Coordinate Tracker Unit Tests ---')

# Test Case 1: Add location with valid coordinates
_test_locations.clear() # Reset for fresh test
assert _test_add_location('Paris', '48.8566', '2.3522') is True, 'Test 1 Failed: Should successfully add Paris.'
assert len(_test_locations) == 1, 'Test 1 Failed: Incorrect number of locations.'
print('Test 1 Passed: Successfully added a location.')

# Test Case 2: Add location with invalid coordinates
assert _test_add_location('Invalid City', 'abc', 'def') is False, 'Test 2 Failed: Should fail to add with invalid coords.'
assert len(_test_locations) == 1, 'Test 2 Failed: Should not add location with invalid coords.'
print('Test 2 Passed: Handled invalid coordinates correctly.')

# Test Case 3: Find existing location (case-insensitive)
coords_paris = _test_find_location('paris')
assert coords_paris == (48.8566, 2.3522), 'Test 3 Failed: Should find Paris coordinates (case-insensitive).'
print('Test 3 Passed: Found existing location case-insensitively.')

# Test Case 4: Find non-existing location
coords_london = _test_find_location('London')
assert coords_london is None, 'Test 4 Failed: Should not find non-existing location.'
print('Test 4 Passed: Handled non-existing location correctly.')

print('\nAll Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Coordinate Tracker. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Coordinate Tracker
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# _solution_locations = [] # Use a different list name to avoid conflicts

# def add_location_solution(name, latitude, longitude):
#     try:
#         lat = float(latitude)
#         lon = float(longitude)
#         _solution_locations.append((name, (lat, lon)))
#         print(f'Location {name} with coordinates ({latitude}, {longitude}) added.')
#     except ValueError:
#         print('Invalid latitude or longitude. Please enter numerical values.')

# def view_locations_solution():
#     if not _solution_locations:
#         print('No locations tracked yet.')
#         return
#     print('\n--- Tracked Locations ---')
#     for i, (name, coords) in enumerate(_solution_locations):
#         print(f'{i+1}. {name}: Latitude {coords[0]:.4f}, Longitude {coords[1]:.4f}')
#     print('-------------------------')

# def find_location_solution(name):
#     found = False
#     for loc_name, coords in _solution_locations:
#         if loc_name.lower() == name.lower():
#             print(f'Found {loc_name}: Latitude {coords[0]:.4f}, Longitude {coords[1]:.4f}')
#             found = True
#             return coords
#     if not found:
#         print(f'Location {name} not found.')
#     return None

# def run_coordinate_tracker_solution():
#     while True:
#         print('\n1. Add Location')
#         print('2. View All Locations')
#         print('3. Find Location by Name')
#         print('4. Exit')
#         choice = input('Enter your choice: ')

#         if choice == '1':
#             name = input('Enter location name: ')
#             latitude = input('Enter latitude: ')
#             longitude = input('Enter longitude: ')
#             add_location_solution(name, latitude, longitude)
#         elif choice == '2':
#             view_locations_solution()
#         elif choice == '3':
#             name = input('Enter location name to find: ')
#             find_location_solution(name)
#         elif choice == '4':
#             print('Exiting Coordinate Tracker.')
#             break
#         else:
#             print('Invalid choice. Please try again.')

# # Uncomment the line below to run the Coordinate Tracker interactively
# # run_coordinate_tracker_solution()


## Navigational Links

[<-- Previous Week](week_07.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_09.ipynb)


## Navigational Links

[<-- Previous Week](week_08.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_10.ipynb)


# Week 9: Files

Welcome to Week 9! This week marks an important step in your Python journey as we dive into file input/output (I/O). Until now, our programs have primarily interacted with the console, taking input and printing output. However, real-world applications often need to store and retrieve data persistently. You will learn how to open files, read their content, write new data to them, and manage them safely using Python's built-in file operations. Mastering file handling is essential for working with data files, configuration files, and log files.

### Reading: Chapter 14 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 14 of our primary textbook:
[Think Python 2e - Chapter 14](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Files

This section provides hands-on exercises to solidify your understanding of file operations in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Writing to a File

You can open a file in write mode (`'w'`) to create a new file or overwrite an existing one. Use `f.write()` to add content.

**Try It Yourself:** Write a few lines of text into a file named `my_first_file.txt`.

In [None]:
# Your code to write to a file here
file_name = 'my_first_file.txt'
f = open(file_name, 'w')
f.write('Hello, file handling!\n')
f.write('This is my first line.\n')
f.write('And this is the second line.\n')
f.close()
print(f'Content written to {file_name}')
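
Because `'w'` discards whatever the file already contained, reopening a file in write mode starts it from scratch. Append mode (`'a'`) is the standard alternative when earlier content should be kept. The sketch below contrasts the two. It uses a temporary directory and a made-up file name (`notes.txt`) so it does not touch your own files, and it relies on the `with` statement, which Exercise 3 covers.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical file name

    with open(path, 'w') as f:   # creates the file
        f.write('first\n')
    with open(path, 'w') as f:   # 'w' again: the previous content is discarded
        f.write('second\n')
    with open(path, 'a') as f:   # 'a': new content is added at the end
        f.write('third\n')

    with open(path, 'r') as f:
        final_content = f.read()

print(repr(final_content))
```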


#### Exercise 2: Reading from a File

To read from a file, open it in read mode (`'r'`). You can use `f.read()` to read the entire content, `f.readline()` to read one line, or iterate through the file object to read line by line.

**Try It Yourself:** Read and print the content of `my_first_file.txt`.

In [None]:
# Your code to read from a file here
file_name = 'my_first_file.txt'
f = open(file_name, 'r')
content = f.read()
print(f'\nContent of {file_name}:\n{content}')
f.close()

# Or read line by line
print('\nReading line by line:')
f = open(file_name, 'r')
for line in f:
    print(f'LINE: {line.strip()}') # .strip() removes newline characters
f.close()
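
The cell above demonstrates `f.read()` and iteration, but not `f.readline()`. As a small sketch of how it behaves, each call returns the next line including its trailing newline, and an empty string once the end of the file is reached. The file name `sample.txt` and its contents are assumptions for illustration, written to a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('alpha\nbeta\n')

    f = open(path, 'r')
    first_line = f.readline()    # next line, newline included
    second_line = f.readline()
    end_marker = f.readline()    # empty string signals end of file
    f.close()

print(repr(first_line), repr(second_line), repr(end_marker))
```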

#### Exercise 3: Using `with` statement

The `with` statement provides a cleaner and safer way to handle files. It ensures that the file is automatically closed, even if errors occur, preventing resource leaks.

**Try It Yourself:** Rewrite the file writing and reading examples using the `with` statement.

In [None]:
# Your code using 'with' statement here
file_name = 'my_second_file.txt'
with open(file_name, 'w') as f:
    f.write("This is a line written with 'with'.\n")
    f.write('It closes automatically.\n')
print(f'Content written to {file_name} using with statement.')

# Read content using 'with' statement
with open(file_name, 'r') as f:
    content = f.read()
    print(f'\nContent of {file_name} (read with with):\n{content}')
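
To see the "closed even if errors occur" guarantee in action, the sketch below raises a deliberate error inside a `with` block and then checks the file object's `closed` attribute. The simulated `RuntimeError` and the file name `demo.txt` are assumptions made purely for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    try:
        with open(path, 'w') as f:
            f.write('partial line\n')
            raise RuntimeError('simulated failure')  # deliberate error inside the block
    except RuntimeError as exc:
        error_message = str(exc)

    # The name f still exists here, but the file behind it has been closed
    closed_after_error = f.closed

print(f'Error: {error_message}; file closed: {closed_after_error}')
```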


## Mini-Project: Simple Log File Analyzer

**Task:** Create a Python script that simulates a simple log file and then analyzes it. Your script should:
1.  Generate a `log.txt` file with at least 10 lines. Include a mix of 'INFO', 'WARNING', and 'ERROR' messages.
2.  Read the `log.txt` file and count the occurrences of 'INFO', 'WARNING', and 'ERROR' messages.
3.  Print a summary of the counts.

In [None]:
# Your Simple Log File Analyzer solution here
import random
import os

log_file_name = 'log.txt'

# 1. Generate a log file
def generate_log_file(filename, num_lines=10):
    log_messages = ['INFO: User logged in.', 'WARNING: Disk space low.', 'ERROR: Database connection failed.', 'INFO: Data processed.']
    with open(filename, 'w') as f:
        for _ in range(num_lines):
            f.write(random.choice(log_messages) + '\n')
    print(f'Generated {num_lines} log entries in {filename}')

generate_log_file(log_file_name, 15)

# 2. Read the log.txt file and count the occurrences of 'INFO', 'WARNING', and 'ERROR' messages.
def analyze_log_file(filename):
    info_count = 0
    warning_count = 0
    error_count = 0
    try:
        with open(filename, 'r') as f:
            for line in f:
                if 'INFO' in line: info_count += 1
                elif 'WARNING' in line: warning_count += 1
                elif 'ERROR' in line: error_count += 1
    except FileNotFoundError:
        print(f'Error: File {filename} not found.')
        return None, None, None
    return info_count, warning_count, error_count

info, warning, error = analyze_log_file(log_file_name)

# 3. Print a summary of the counts
if info is not None:
    print(f'\n--- Log Analysis Summary ---')
    print(f'INFO messages: {info}')
    print(f'WARNING messages: {warning}')
    print(f'ERROR messages: {error}')
    print(f'----------------------------')

# Clean up the generated log file (optional)
# if os.path.exists(log_file_name):
#     os.remove(log_file_name)
#     print(f'Cleaned up {log_file_name}')
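
The `'INFO' in line` checks above match the word anywhere in a line, so a message such as `ERROR: INFO service down.` would be counted as INFO. One possible refinement, sketched below, reads the level from the text before the first colon and tallies it with `collections.Counter`. The function name `count_levels` and the sample lines are assumptions for illustration. The function accepts any iterable of lines, so it works on a list or on an open file.

```python
from collections import Counter

def count_levels(lines):
    """Count log levels, taking the level from the text before the first ':'."""
    counts = Counter()
    for line in lines:
        level, separator, _message = line.partition(':')
        level = level.strip()
        # Skip lines without a colon or with an unrecognized level
        if separator and level in ('INFO', 'WARNING', 'ERROR'):
            counts[level] += 1
    return counts

sample_lines = [
    'INFO: User logged in.\n',
    'ERROR: INFO service down.\n',   # contains 'INFO' but is an ERROR entry
    'WARNING: Disk space low.\n',
    'not a log line\n',
]
level_counts = count_levels(sample_lines)
print(level_counts['INFO'], level_counts['WARNING'], level_counts['ERROR'])
```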


## Unit Tests for Simple Log File Analyzer

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Log File Analyzer. Run them and verify the output.

In [None]:
import os
import random

# Re-define analyze_log_file for testing purposes (or ensure it's globally available)
def analyze_log_file_test(filename):
    info_count = 0
    warning_count = 0
    error_count = 0
    try:
        with open(filename, 'r') as f:
            for line in f:
                if 'INFO' in line: info_count += 1
                elif 'WARNING' in line: warning_count += 1
                elif 'ERROR' in line: error_count += 1
    except FileNotFoundError:
        return None, None, None
    return info_count, warning_count, error_count

print('--- Running Simple Log File Analyzer Unit Tests ---')

# Test Case 1: Basic log file with mixed entries
test_log_file_1 = 'test_log_1.txt'
with open(test_log_file_1, 'w') as f:
    f.write('INFO: User started\n')
    f.write('WARNING: Low memory\n')
    f.write('ERROR: Crash\n')
    f.write('INFO: Operation complete\n')
    f.write('WARNING: Disk full\n')
info, warning, error = analyze_log_file_test(test_log_file_1)
assert info == 2, f'Test 1 Failed: Expected 2 INFO, got {info}'
assert warning == 2, f'Test 1 Failed: Expected 2 WARNING, got {warning}'
assert error == 1, f'Test 1 Failed: Expected 1 ERROR, got {error}'
print('Test 1 Passed: Basic log analysis is correct.')
os.remove(test_log_file_1) # Clean up

# Test Case 2: Empty log file
test_log_file_2 = 'test_log_2.txt'
with open(test_log_file_2, 'w') as f:
    pass
info, warning, error = analyze_log_file_test(test_log_file_2)
assert info == 0 and warning == 0 and error == 0, 'Test 2 Failed: Empty log file analysis incorrect.'
print('Test 2 Passed: Empty log file handled correctly.')
os.remove(test_log_file_2) # Clean up

# Test Case 3: Log file not found
info, warning, error = analyze_log_file_test('non_existent_file.txt')
assert info is None and warning is None and error is None, 'Test 3 Failed: File not found handling incorrect.'
print('Test 3 Passed: File not found handled correctly.')

print('\nAll Unit Tests Completed.')


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Simple Log File Analyzer. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Simple Log File Analyzer
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# import random
# import os

# # Solution for generating log file
# def generate_log_file_solution(filename, num_lines=10):
#     log_messages_solution = ['INFO: User logged in.', 'WARNING: Disk space low.', 'ERROR: Database connection failed.', 'INFO: Data processed.']
#     with open(filename, 'w') as f:
#         for _ in range(num_lines):
#             f.write(random.choice(log_messages_solution) + '\n')
#     print(f'Generated {num_lines} log entries in {filename}')

# # Solution for analyzing log file
# def analyze_log_file_solution(filename):
#     info_count = 0
#     warning_count = 0
#     error_count = 0
#     try:
#         with open(filename, 'r') as f:
#             for line in f:
#                 if 'INFO' in line: info_count += 1
#                 elif 'WARNING' in line: warning_count += 1
#                 elif 'ERROR' in line: error_count += 1
#     except FileNotFoundError:
#         print(f'Error: File {filename} not found.')
#         return None, None, None
#     return info_count, warning_count, error_count

# log_file_name_solution = 'log_solution.txt'
# generate_log_file_solution(log_file_name_solution, 20)
# info, warning, error = analyze_log_file_solution(log_file_name_solution)

# if info is not None:
#     print(f'\n--- Solution Log Analysis Summary ---')
#     print(f'INFO messages: {info}')
#     print(f'WARNING messages: {warning}')
#     print(f'ERROR messages: {error}')
#     print(f'----------------------------')

# # Clean up the generated log file (optional)
# if os.path.exists(log_file_name_solution):
#     os.remove(log_file_name_solution)
#     print(f'Cleaned up {log_file_name_solution}')



## Navigational Links

[<-- Previous Week](week_09.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_11.ipynb)


# Week 10: Functions

Welcome to Week 10! This week, we will introduce functions, one of the most fundamental concepts in programming. Functions allow you to group related statements into a block of reusable code that performs a specific task. You'll learn how to define functions, pass arguments to them, receive return values, and understand the concept of variable scope. Mastering functions is crucial for writing modular, organized, and efficient programs.

### Reading: Chapter 3 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 3 of our primary textbook:
[Think Python 2e - Chapter 3](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Functions

This section provides hands-on exercises to solidify your understanding of functions in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Simple Function

A simple function is defined using the `def` keyword, followed by a name, parentheses `()`, and a colon `:`. The function body is indented.

In [None]:
# Try It Yourself: Define and call a simple function
def greet():
    print('Hello, Python programmer!')
    print('Welcome to functions!')

# Call the function
greet()

#### Exercise 2: Functions with Arguments

Arguments are values passed into a function when it is called. They allow functions to be more flexible and operate on different data.

In [None]:
# Try It Yourself: Define a function that takes an argument
def greet_person(name):
    print(f'Hello, {name}! How are you today?')

# Call the function with different arguments
greet_person('Alice')
greet_person('Bob')

#### Exercise 3: Functions Returning Values

Functions can send data back to the part of the program that called them using the `return` statement. This allows functions to compute a value that can be used elsewhere.

In [None]:
# Try It Yourself: Define a function that returns a value
def add_numbers(a, b):
    sum_result = a + b
    return sum_result

# Call the function and store the result
result = add_numbers(10, 5)
print(f'The sum is: {result}')

another_result = add_numbers(20, 30)
print(f'Another sum is: {another_result}')
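
A common point of confusion is the difference between printing a value and returning it. As a small sketch (the two function names are made up for this comparison), a function with no `return` statement hands back `None`, even if it printed something along the way.

```python
def add_and_print(a, b):
    print(a + b)        # shows the value, but the function returns nothing

def add_and_return(a, b):
    return a + b        # hands the value back to the caller

printed = add_and_print(2, 3)    # the call prints 5; printed is None
returned = add_and_return(2, 3)  # the call prints nothing; returned is 5

print(f'printed={printed}, returned={returned}')
```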


#### Exercise 4: Understanding Scope

Scope refers to the region of a program where a variable is accessible. Variables defined inside a function have local scope and cannot be accessed from outside that function. Variables defined outside any function have global scope.

In [None]:
# Global variable
global_message = 'I am a global variable.'

def my_function():
    # Local variable
    local_message = 'I am a local variable.'
    print(f'Inside function: {local_message}')
    print(f'Inside function can also access: {global_message}')

my_function()
print(f'Outside function: {global_message}')
# print(local_message) # Uncommenting this line would cause a NameError
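
Two consequences of these rules are worth seeing directly. The sketch below, using made-up names, shows that assigning to a name inside a function creates a new local variable rather than changing the global one, and that a local name really is gone once the function returns (the `NameError` is caught so the cell keeps running).

```python
message = 'global'

def shadow():
    message = 'local'   # assignment creates a new local variable named message
    return message

inside_value = shadow()
outside_value = message   # the global variable is unchanged

def make_local():
    hidden = 'only visible inside make_local'
    return len(hidden)

make_local()
try:
    hidden              # this name does not exist outside make_local
except NameError:
    name_error_raised = True
else:
    name_error_raised = False

print(inside_value, outside_value, name_error_raised)
```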


## Mini-Project: Simple Calculator

**Task:** Create a console-based calculator that performs basic arithmetic operations (addition, subtraction, multiplication, division). Your program should:
1.  Define a separate function for each operation (e.g., `add(x, y)`, `subtract(x, y)`).
2.  Ask the user for two numbers and the desired operation (+, -, *, /).
3.  Use the appropriate function to perform the calculation.
4.  Print the result.
5.  Handle division by zero gracefully.

In [None]:
# Your Simple Calculator solution here
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

def divide(x, y):
    if y == 0:
        return "Error! Division by zero."
    return x / y

print('Select operation:')
print('1.Add')
print('2.Subtract')
print('3.Multiply')
print('4.Divide')

while True:
    choice = input('Enter choice(1/2/3/4): ')
    if choice in ('1', '2', '3', '4'):
        try:
            num1 = float(input('Enter first number: '))
            num2 = float(input('Enter second number: '))
        except ValueError:
            print('Invalid input. Please enter numbers only!')
            continue

        if choice == '1':
            print(f'{num1} + {num2} = {add(num1, num2)}')
        elif choice == '2':
            print(f'{num1} - {num2} = {subtract(num1, num2)}')
        elif choice == '3':
            print(f'{num1} * {num2} = {multiply(num1, num2)}')
        elif choice == '4':
            print(f'{num1} / {num2} = {divide(num1, num2)}')

        next_calculation = input("Let's do the next calculation? (yes/no): ")
        if next_calculation == 'no':
            break
    else:
        print('Invalid Input')
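
The task statement mentions choosing the operation by symbol (+, -, *, /), whereas the cell above uses a numbered menu. One possible way to map symbols to functions is a dictionary, sketched here without `input()` so it can run unattended. The `OPERATIONS` table and the `calculate` helper are hypothetical names introduced for this example. The four arithmetic functions are repeated so the block runs on its own.

```python
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

def divide(x, y):
    if y == 0:
        return 'Error! Division by zero.'
    return x / y

# Hypothetical lookup table: operator symbol -> function
OPERATIONS = {'+': add, '-': subtract, '*': multiply, '/': divide}

def calculate(x, symbol, y):
    """Look up the function for `symbol` and apply it to x and y."""
    operation = OPERATIONS.get(symbol)
    if operation is None:
        return f'Unknown operation: {symbol}'
    return operation(x, y)

print(calculate(6, '*', 7))
print(calculate(1, '/', 0))
print(calculate(1, '%', 2))
```

With this layout, supporting a new operation means adding one function and one dictionary entry, with no extra `elif` branch.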


## Unit Tests for Simple Calculator

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Simple Calculator functions. Run them and verify the output.

In [None]:
# Define the functions for testing (if not already defined globally in notebook)
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

def divide(x, y):
    if y == 0:
        return 'Error! Division by zero.' # Match expected return from mini-project
    return x / y

print('--- Running Simple Calculator Unit Tests ---')

# Test Case 1: Addition
assert add(5, 3) == 8, 'Test 1 Failed: Addition incorrect.'
print('Test 1 Passed: Addition is correct.')

# Test Case 2: Subtraction
assert subtract(10, 4) == 6, 'Test 2 Failed: Subtraction incorrect.'
print('Test 2 Passed: Subtraction is correct.')

# Test Case 3: Multiplication
assert multiply(7, 6) == 42, 'Test 3 Failed: Multiplication incorrect.'
print('Test 3 Passed: Multiplication is correct.')

# Test Case 4: Division
assert divide(20, 5) == 4.0, 'Test 4 Failed: Division incorrect.'
print('Test 4 Passed: Division is correct.')

# Test Case 5: Division by zero
assert divide(10, 0) == 'Error! Division by zero.', 'Test 5 Failed: Division by zero handling incorrect.'
print('Test 5 Passed: Division by zero handled correctly.')

# Test Case 6: Negative numbers
assert add(-5, 2) == -3, 'Test 6 Failed: Addition with negative incorrect.'
assert subtract(2, 5) == -3, 'Test 7 Failed: Subtraction with negative incorrect.'
print('Test 6 & 7 Passed: Negative numbers handled correctly.')

print('\nAll Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Simple Calculator. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Simple Calculator
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# def add_solution(x, y):
#     return x + y
# def subtract_solution(x, y):
#     return x - y
# def multiply_solution(x, y):
#     return x * y
# def divide_solution(x, y):
#     if y == 0:
#         return "Error! Division by zero."
#     return x / y

# print('Select operation:')
# print('1.Add')
# print('2.Subtract')
# print('3.Multiply')
# print('4.Divide')

# while True:
#     choice = input('Enter choice(1/2/3/4): ')
#     if choice in ('1', '2', '3', '4'):
#         try:
#             num1 = float(input('Enter first number: '))
#             num2 = float(input('Enter second number: '))
#         except ValueError:
#             print('Invalid input. Please enter numbers only!')
#             continue

#         if choice == '1':
#             print(f'{num1} + {num2} = {add_solution(num1, num2)}')
#         elif choice == '2':
#             print(f'{num1} - {num2} = {subtract_solution(num1, num2)}')
#         elif choice == '3':
#             print(f'{num1} * {num2} = {multiply_solution(num1, num2)}')
#         elif choice == '4':
#             print(f'{num1} / {num2} = {divide_solution(num1, num2)}')

#         next_calculation = input("Let's do the next calculation? (yes/no): ")
#         if next_calculation == 'no':
#             break
#     else:
#         print('Invalid Input')



## Navigational Links

[<-- Previous Week](week_10.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_12.ipynb)


# Week 11: Debugging & Error Handling

Welcome to Week 11! This week, we tackle two essential aspects of programming: debugging and error handling. As programs grow in complexity, encountering errors becomes inevitable. You will learn how to identify, understand, and fix common types of errors (SyntaxError, TypeError, ValueError), and how to gracefully handle unexpected situations using `try`, `except`, `else`, and `finally` blocks. Mastering these skills will make your code more robust and your development process more efficient.

### Reading: Chapter 6 of 'Think Python 2e' & Python Docs

For a comprehensive understanding of this week's topics, please refer to:
*   [Think Python 2e - Chapter 6](https://greenteapress.com/wp/think-python-2e/)
*   Official Python documentation on Errors and Exceptions: [Python Docs on Errors and Exceptions](https://docs.python.org/3/tutorial/errors.html)

## Interactive Lab: Encountering and Handling Errors

This section provides hands-on exercises to help you understand different types of errors and how to handle them effectively. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: `SyntaxError`

A `SyntaxError` occurs when the Python interpreter encounters code that does not conform to the Python language's grammatical rules. This usually happens before any code is executed.

**Try It Yourself:** Run the following code cell and observe the `SyntaxError`. Identify what's wrong.

In [None]:
# Intentional SyntaxError: Missing closing parenthesis
print("Hello, Python"

#### Exercise 2: `TypeError`

A `TypeError` occurs when an operation or function is applied to an object of an inappropriate type (e.g., trying to add a number to a string).

**Try It Yourself:** Run the code. Observe the `TypeError`, then modify the code to fix it (e.g., by converting data types).

In [None]:
# Intentional TypeError
num_str = "10"
num_int = 5
# result = num_str + num_int # This will cause a TypeError
# print(result)

# Fix it by converting num_str to an int
result_fixed = int(num_str) + num_int
print(f'Fixed result: {result_fixed}')
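
Converting the value up front is one fix. Another option, sketched below, is to attempt the operation and catch the `TypeError` if it occurs. The helper name `add_as_numbers` is hypothetical, and the fallback of converting both sides with `int()` is just one design choice among several.

```python
def add_as_numbers(left, right):
    """Try a plain +, and fall back to converting both sides to int."""
    try:
        return left + right
    except TypeError:
        # Mixed types such as str + int end up here
        return int(left) + int(right)

print(add_as_numbers('10', 5))
print(add_as_numbers(2, 3))
```

Note that two strings never raise a `TypeError` with `+` (they are concatenated), so this helper only rescues mixed-type calls.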


#### Exercise 3: `ValueError`

A `ValueError` occurs when an operation or function receives an argument that has the right type but an inappropriate value (e.g., trying to convert a non-numeric string to an integer).

**Try It Yourself:** Run the code. Observe the `ValueError`, then use a `try-except` block to gracefully handle it.

In [None]:
# Intentional ValueError
invalid_num_str = "abc"
try:
    number = int(invalid_num_str) # This will cause a ValueError
    print(f'Converted number: {number}')
except ValueError:
    print('Error: Invalid literal for int(). Please enter a valid number.')


#### Exercise 4: Robust Error Handling with `try-except-else-finally`

For more complex scenarios, you can use `else` to run code when no exceptions occur, and `finally` to run code regardless of whether an exception occurred or not (e.g., for cleanup actions).

**Try It Yourself:** Modify the previous example to include `else` (print success message) and `finally` (print cleanup message).

In [None]:
# Robust Error Handling Example
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print('Error: Cannot divide by zero!')
        return None
    else:
        print(f'Division successful: {a} / {b} = {result}')
        return result
    finally:
        print('Division attempt completed.')

# Test cases
safe_divide(10, 2)
safe_divide(10, 0)
safe_divide(5, 2.5)
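
In `safe_divide` above, the `finally` message prints even though the `except` and `else` branches both contain a `return`. To make the order of the clauses explicit, the sketch below records which ones run. The function name `trace_divide` and the `steps` list are assumptions introduced for this demonstration.

```python
def trace_divide(a, b):
    """Return (result, steps), where steps lists the clauses that ran, in order."""
    steps = ['try']
    result = None
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append('except')   # runs only when the division fails
    else:
        steps.append('else')     # runs only when the try block succeeds
    finally:
        steps.append('finally')  # runs in both cases
    return result, steps

print(trace_divide(10, 2))
print(trace_divide(10, 0))
```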


## Mini-Project: Robust User Input

**Task:** Write a Python program that asks the user to enter their age. The program should:
1.  Continuously prompt the user until a valid integer age (between 1 and 120, inclusive) is entered.
2.  Handle `ValueError` if the input is not an integer.
3.  Handle any other unexpected errors.
4.  Print a confirmation message with the valid age.

In [None]:
# Your Robust User Input solution here
def get_valid_age():
    while True:
        try:
            age_str = input('Please enter your age (1-120): ')
            age = int(age_str)
            if 1 <= age <= 120:
                return age
            else:
                print('Age must be between 1 and 120.')
        except ValueError:
            print('Invalid input. Please enter a whole number for your age.')
        except Exception as e:
            print(f'An unexpected error occurred: {e}')

# Uncomment the line below to test the function interactively
# valid_age = get_valid_age()
# print(f'You entered a valid age: {valid_age}')


## Unit Tests for Robust User Input

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Robust User Input function. Since `input()` is interactive, we'll simulate inputs for testing purposes.

In [None]:
# Testable variant of get_valid_age(): it applies the same validation rules
# (a whole number between 1 and 120) but reads from a list of simulated inputs
# instead of calling input(). It returns None if no simulated input is valid.
def get_valid_age_testable(simulated_inputs):
    for age_str in simulated_inputs:
        try:
            age = int(age_str)
        except ValueError:
            continue  # Not a whole number: move on to the next simulated input
        if 1 <= age <= 120:
            return age
    return None

print('--- Running Robust User Input Unit Tests ---')

# Test Case 1: Valid input on first try
# Simulate user entering '30'
age1 = get_valid_age_testable(['30'])
assert age1 == 30, f'Test 1 Failed: Expected 30, got {age1}'
print('Test 1 Passed: Valid input on first try.')

# Test Case 2: Invalid input then valid input
# Simulate user entering 'abc', then '0', then '121', then '45'
age2 = get_valid_age_testable(['abc', '0', '121', '45'])
assert age2 == 45, f'Test 2 Failed: Expected 45 after retries, got {age2}'
print('Test 2 Passed: Handles invalid input and retries.')

# Test Case 3: Boundary values (minimum)
age3 = get_valid_age_testable(['1'])
assert age3 == 1, f'Test 3 Failed: Expected 1, got {age3}'
print('Test 3 Passed: Handles minimum boundary value.')

# Test Case 4: Boundary values (maximum)
age4 = get_valid_age_testable(['120'])
assert age4 == 120, f'Test 4 Failed: Expected 120, got {age4}'
print('Test 4 Passed: Handles maximum boundary value.')

# Test Case 5: Out of range (low) then valid
age5 = get_valid_age_testable(['-5', '10'])
assert age5 == 10, f'Test 5 Failed: Expected 10 after invalid low input, got {age5}'
print('Test 5 Passed: Handles out of range (low) and retries.')

# Test Case 6: Out of range (high) then valid
age6 = get_valid_age_testable(['150', '99'])
assert age6 == 99, f'Test 6 Failed: Expected 99 after invalid high input, got {age6}'
print('Test 6 Passed: Handles out of range (high) and retries.')

print('\nAll Unit Tests Completed.')
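
Another way to simulate input, without writing a separate test helper, is to temporarily replace `input` using `unittest.mock.patch` from the standard library. The sketch below repeats the mini-project's `get_valid_age` so it runs on its own. The list of simulated answers is made up for the example.

```python
from unittest.mock import patch

def get_valid_age():
    while True:
        try:
            age_str = input('Please enter your age (1-120): ')
            age = int(age_str)
            if 1 <= age <= 120:
                return age
            else:
                print('Age must be between 1 and 120.')
        except ValueError:
            print('Invalid input. Please enter a whole number for your age.')

# While the patch is active, each call to input() returns the next list item
with patch('builtins.input', side_effect=['abc', '0', '45']):
    patched_age = get_valid_age()

print(f'Age returned with patched input: {patched_age}')
```

This exercises the real function, including its retry loop and printed messages, rather than a copy of its logic.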

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Robust User Input mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Robust User Input
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

# def get_valid_age_solution():
#     while True:
#         try:
#             age_str = input('Please enter your age (1-120): ')
#             age = int(age_str)
#             if 1 <= age <= 120:
#                 return age
#             else:
#                 print('Age must be between 1 and 120.')
#         except ValueError:
#             print('Invalid input. Please enter a whole number for your age.')
#         except Exception as e:
#             print(f'An unexpected error occurred: {e}')

# # Uncomment the line below to test the function interactively
# # valid_age = get_valid_age_solution()
# # print(f'You entered a valid age: {valid_age}')



## Navigational Links

[<-- Previous Week](week_11.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_13.ipynb)


# Week 12: Modules and Packages

Welcome to Week 12! This week, we will explore modules and packages, essential tools for organizing and reusing Python code. Modules allow you to logically organize your Python code, grouping related code into a file. Packages allow you to organize related modules into a directory hierarchy. You'll learn how to import and use both built-in and custom modules, enhancing the modularity and maintainability of your programs. Mastering modular programming is key to building larger, more complex applications effectively.

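To make the difference between a module and a package concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports a module from it. The names `course_shapes`, `area.py`, and `rectangle_area` are hypothetical and chosen only for illustration; they are not part of the course files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a tiny package in a temporary directory (all names are hypothetical):
#   course_shapes/
#       __init__.py   <- marks the directory as a regular package
#       area.py       <- a module inside the package
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "course_shapes"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "area.py").write_text(
    "def rectangle_area(width, height):\n"
    "    return width * height\n"
)

# Make the temporary directory importable, then import the module from the package.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

from course_shapes import area

print(area.rectangle_area(3, 4))  # 12
```

The dotted name `course_shapes.area` mirrors the directory layout: the package is the folder, and the module is the file inside it.
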
### Reading: Chapter 15 of 'Think Python 2e'

For a comprehensive understanding of this week's topics, please refer to Chapter 15 of our primary textbook:
[Think Python 2e - Chapter 15](https://greenteapress.com/wp/think-python-2e/)

## Interactive Lab: Working with Modules

This section provides hands-on exercises to solidify your understanding of using and creating modules in Python. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Using Built-in `math` Module

Python comes with a rich standard library, including modules for various functionalities. The `math` module provides mathematical functions.

**Try It Yourself:** Calculate the square root of 64 and the value of pi using the `math` module.

In [None]:
# Your code to use the math module here
import math

num = 64
print(f'The square root of {num} is: {math.sqrt(num)}')
print(f'The value of pi is: {math.pi}')

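The exercise above uses the plain `import math` form. As a brief sketch, the same module can also be imported in two other common ways; which one to use is largely a readability choice.

```python
import math                 # import the whole module; names need the math. prefix
from math import sqrt, pi   # import selected names directly into this namespace
import math as m            # import the module under a shorter alias

print(math.sqrt(64))  # 8.0
print(sqrt(64))       # 8.0
print(m.pi == pi)     # True
```

Keeping the module prefix (`math.sqrt`) makes it obvious where a function comes from, which helps once a program imports several modules.
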
#### Exercise 2: Using Built-in `random` Module

The `random` module provides functions for generating random numbers.

**Try It Yourself:** Generate a random integer between 1 and 10 (inclusive) and a random floating-point number between 0.0 and 1.0.

In [None]:
# Your code to use the random module here
import random

print(f'Random integer between 1 and 10: {random.randint(1, 10)}')
print(f'Random float between 0.0 and 1.0: {random.random()}')

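Random output changes on every run, which makes results hard to reproduce. The following sketch, which goes slightly beyond the exercise, shows how `random.seed` fixes the starting point of the generator so the same sequence comes out again.

```python
import random

random.seed(42)  # fix the starting point of the generator
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)  # same seed, so the same sequence is produced again
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)  # True
```

Seeding is mainly useful for testing and debugging; leave it out when you want different numbers each time.
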
#### Exercise 3: Creating and Importing a Custom Module

You can create your own Python files as modules. Let's create a simple module named `my_utils.py` that contains a function, and then import and use it.

**Step 1:** Create a file named `my_utils.py` in the same directory as this notebook and add the following content:

```python
# my_utils.py
def reverse_string(s):
    return s[::-1]

def capitalize_words(s):
    return ' '.join([word.capitalize() for word in s.split()])
```

**Step 2:** Alternatively, run the cell below to create `my_utils.py` automatically.

**Try It Yourself:** Import `my_utils` and use its `reverse_string` and `capitalize_words` functions.

In [None]:
# Create my_utils.py (run this cell once)
with open('my_utils.py', 'w') as f:
    f.write('def reverse_string(s):\n')
    f.write('    return s[::-1]\n\n')
    f.write('def capitalize_words(s):\n')
    f.write('    return " ".join([word.capitalize() for word in s.split()])\n')
    f.write('\n')
print('Created my_utils.py with reverse_string and capitalize_words functions.')


In [None]:
# Import and use your custom module here
import my_utils

text = 'hello world'
reversed_text = my_utils.reverse_string(text)
capitalized_text = my_utils.capitalize_words(text)

print(f'Original text: {text}')
print(f'Reversed text: {reversed_text}')
print(f'Capitalized words: {capitalized_text}')

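Python imports a module only once per session, so editing a module file does not affect code that has already imported it. The sketch below illustrates this with a hypothetical module named `greeting_demo`, written to a temporary directory, and shows `importlib.reload` picking up the change.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a first version of a hypothetical module into a temporary directory.
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "greeting_demo.py"
module_path.write_text("def greet():\n    return 'hello'\n")

sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import greeting_demo

before = greeting_demo.greet()

# Edit the file on disk. The already-imported module object does not change...
module_path.write_text("def greet():\n    return 'hello, world'\n")
still_old = greeting_demo.greet()

# ...until the module is reloaded explicitly.
importlib.reload(greeting_demo)
after = greeting_demo.greet()

print(before, '|', still_old, '|', after)
```

In a notebook, restarting the kernel has the same effect as reloading, at the cost of losing all other state.
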

## Mini-Project: Modular Game - Guess the Number

**Task:** Create a simple 'Guess the Number' game. The game logic should be separated into a module. Your main program will import and use this module.

**Module (`game_logic.py`) should contain:**
*   `generate_random_number(lower_bound, upper_bound)`: Generates a random number within a given range.
*   `check_guess(secret_number, guess)`: Compares the guess to the secret number and returns 'Too high', 'Too low', or 'Correct'.

**Main program (`week_12.ipynb` code cell) should:**
1.  Import functions from `game_logic.py`.
2.  Set a range (e.g., 1 to 100).
3.  Use `generate_random_number` to get a secret number.
4.  Loop, asking the user for guesses and providing hints using `check_guess` until the correct number is found.
5.  Keep track of and display the number of attempts.

In [None]:
# Create game_logic.py (run this cell once)
with open('game_logic.py', 'w') as f:
    f.write('import random\n\n')
    f.write('def generate_random_number(lower_bound, upper_bound):\n')
    f.write('    return random.randint(lower_bound, upper_bound)\n\n')
    f.write('def check_guess(secret_number, guess):\n')
    f.write('    if guess < secret_number:\n')
    f.write("        return 'Too low'\n")
    f.write('    elif guess > secret_number:\n')
    f.write("        return 'Too high'\n")
    f.write('    else:\n')
    f.write("        return 'Correct'\n")
print('Created game_logic.py with game functions.')


In [None]:
# Your Modular Game - Guess the Number solution here
import game_logic

def play_guess_the_number():
    lower_bound = 1
    upper_bound = 100
    secret_number = game_logic.generate_random_number(lower_bound, upper_bound)
    attempts = 0
    print(f'I am thinking of a number between {lower_bound} and {upper_bound}.')

    while True:
        try:
            guess = int(input('Take a guess: '))
            attempts += 1
            result = game_logic.check_guess(secret_number, guess)
            print(result)
            if result == 'Correct':
                print(f'You guessed the number in {attempts} attempts!')
                break
        except ValueError:
            print('Invalid input. Please enter a number.')

# Uncomment the line below to play the game interactively
# play_guess_the_number()

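Because the game loop calls `input()`, it is awkward to test automatically. One possible approach, sketched here, is to feed the loop a list of scripted guesses instead. The helper `play_with_scripted_guesses` is hypothetical, and `check_guess` is repeated inline only so the sketch runs on its own.

```python
def check_guess(secret_number, guess):
    """Same contract as game_logic.check_guess, repeated so the sketch runs alone."""
    if guess < secret_number:
        return 'Too low'
    elif guess > secret_number:
        return 'Too high'
    return 'Correct'


def play_with_scripted_guesses(secret_number, guesses):
    """Hypothetical helper: run the game loop over a list of raw input strings.

    Returns the number of valid attempts used, or None if the guesses ran out
    before the secret number was found.
    """
    attempts = 0
    for raw_guess in guesses:
        try:
            guess = int(raw_guess)
        except ValueError:
            continue  # skip non-numeric input without counting an attempt
        attempts += 1
        if check_guess(secret_number, guess) == 'Correct':
            return attempts
    return None


print(play_with_scripted_guesses(42, ['50', 'abc', '30', '42']))  # 3
```

Passing the secret number in as a parameter, rather than generating it inside the function, is what makes the outcome predictable enough to assert on.
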

## Unit Tests for Modular Game

It's good practice to test your module's functions independently. Below are some example test cases for your `game_logic.py` functions.

In [None]:
import importlib
import game_logic
importlib.reload(game_logic)  # Reload module if it was modified

print('--- Running Game Logic Unit Tests ---')

# Test Case 1: generate_random_number within bounds
lower = 1
upper = 10
random_num = game_logic.generate_random_number(lower, upper)
assert lower <= random_num <= upper, f'Test 1 Failed: Number {random_num} not within {lower}-{upper} range.'
print('Test 1 Passed: generate_random_number works within bounds.')

# Test Case 2: check_guess - Too low
secret = 50
guess_low = 40
assert game_logic.check_guess(secret, guess_low) == 'Too low', 'Test 2 Failed: Should be Too low.'
print('Test 2 Passed: check_guess reports Too low correctly.')

# Test Case 3: check_guess - Too high
guess_high = 60
assert game_logic.check_guess(secret, guess_high) == 'Too high', 'Test 3 Failed: Should be Too high.'
print('Test 3 Passed: check_guess reports Too high correctly.')

# Test Case 4: check_guess - Correct
guess_correct = 50
assert game_logic.check_guess(secret, guess_correct) == 'Correct', 'Test 4 Failed: Should be Correct.'
print('Test 4 Passed: check_guess reports Correct correctly.')

print('\nAll Unit Tests Completed.')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Modular Game. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Modular Game - game_logic.py content
# import random
#
# def generate_random_number_solution(lower_bound, upper_bound):
#     return random.randint(lower_bound, upper_bound)
#
# def check_guess_solution(secret_number, guess):
#     if guess < secret_number:
#         return 'Too low'
#     elif guess > secret_number:
#         return 'Too high'
#     else:
#         return 'Correct'

# Suggested solution for the main game script
# import game_logic # Assuming game_logic.py is created as above
# import random # Only needed if random is used directly, not through game_logic
#
# def play_guess_the_number_solution():
#     lower_bound = 1
#     upper_bound = 100
#     secret_number = game_logic.generate_random_number(lower_bound, upper_bound)
#     attempts = 0
#     print(f'I am thinking of a number between {lower_bound} and {upper_bound}.')
#
#     while True:
#         try:
#             guess_str = input('Take a guess: ')
#             guess = int(guess_str)
#             attempts += 1
#             result = game_logic.check_guess(secret_number, guess)
#             print(result)
#             if result == 'Correct':
#                 print(f'You guessed the number in {attempts} attempts!')
#                 break
#         except ValueError:
#             print('Invalid input. Please enter a number.')
#
# # Uncomment the line below to play the game interactively
# # play_guess_the_number_solution()


## Navigational Links

[<-- Previous Week](week_11.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_13.ipynb)


## Navigational Links

[<-- Previous Week](week_12.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_14.ipynb)


# Week 13: Data Science with Pandas (Part 1)

Welcome to Week 13! This week marks your first step into the exciting world of Data Science. We will begin exploring the Pandas library, a foundational tool for data manipulation and analysis in Python. You'll learn how to work with Pandas Series and DataFrames, which are powerful data structures designed to make working with tabular data both easy and intuitive. Mastering Pandas is crucial for anyone looking to work with data in Python, enabling you to load, clean, transform, and analyze datasets efficiently.

### Reading: 'Think Python 2e' & 'Python Data Science Handbook'

For a comprehensive understanding of this week's topics, please refer to:
*   [Think Python 2e - Chapter 16](https://greenteapress.com/wp/think-python-2e/)
*   [Python Data Science Handbook - Chapter 2 (Introduction to NumPy)](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
*   [Python Data Science Handbook - Chapter 3 (Data Manipulation with Pandas)](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)


## Installation: Pandas Library

The projects in this module require the Pandas library. Run the following cell to install it if you haven't already. The leading `!` is IPython syntax, available in Jupyter and Colab, for running a shell command from a notebook cell.

In [1]:
!pip install pandas

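If you are unsure whether Pandas is already available, a quick check like the following sketch can help. It uses only the standard library to look for the package and prints the version when the import succeeds.

```python
import importlib.util

# find_spec returns None when the package cannot be found on the import path.
pandas_available = importlib.util.find_spec('pandas') is not None

if pandas_available:
    import pandas as pd
    print(f'pandas {pd.__version__} is installed.')
else:
    print('pandas is not installed; run the install cell above.')
```
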


## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to familiarize you with the basic data structures and operations in Pandas. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Creation and Basic Operations

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The row labels of a Series are called its index.

**Try It Yourself:** Create a Series from a Python list of temperatures and perform basic operations like getting the mean and maximum value.

In [4]:
import pandas as pd

temperatures = [20, 22, 25, 23, 21, 26, 24]
temp_series = pd.Series(temperatures, name='Daily Temperatures')
print('Temperature Series:')
print(temp_series)

print(f'\nMean temperature: {temp_series.mean()}°C')
print(f'Maximum temperature: {temp_series.max()}°C')
print(f'Minimum temperature: {temp_series.min()}°C')

Temperature Series:
0    20
1    22
2    25
3    23
4    21
5    26
6    24
Name: Daily Temperatures, dtype: int64

Mean temperature: 23.0°C
Maximum temperature: 26°C
Minimum temperature: 20°C


#### Exercise 2: Pandas DataFrame Creation and Access

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table. It is generally the most commonly used Pandas object.

**Try It Yourself:** Create a DataFrame from a dictionary of student data and access specific columns and rows.

In [None]:
import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [20, 21, 19, 22],
    'Major': ['CS', 'Math', 'Physics', 'CS'],
    'GPA': [3.8, 3.5, 3.9, 3.7]
}
students_df = pd.DataFrame(student_data)
print('Students DataFrame:')
print(students_df)

print(f'\nNames of students: {students_df["Name"].tolist()}')
print(f'Ages of students: {students_df["Age"].tolist()}')
print(f'Majors of students: {students_df["Major"].tolist()}')
print(f'GPA of students: {students_df["GPA"].tolist()}')

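The cell above accesses columns. For rows, a minimal sketch using the same student data might look like this: `.iloc` selects by integer position and `.loc` selects by index label.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [20, 21, 19, 22],
    'Major': ['CS', 'Math', 'Physics', 'CS'],
    'GPA': [3.8, 3.5, 3.9, 3.7],
})

first_row = students_df.iloc[0]      # by integer position; returns a Series
row_label_2 = students_df.loc[2]     # by index label (default labels are 0..n-1)
first_two = students_df.iloc[0:2]    # a slice of rows; returns a DataFrame
bob_gpa = students_df.loc[1, 'GPA']  # single cell: row label 1, column 'GPA'

print(first_row['Name'])    # Alice
print(row_label_2['Name'])  # Charlie
print(len(first_two))       # 2
print(bob_gpa)              # 3.5
```

With the default index the two accessors look interchangeable; they differ as soon as the index holds custom labels.
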
#### Exercise 3: Basic DataFrame Operations: Selection and Filtering

You can select columns using dictionary-like notation (`df['column']`) and filter rows using boolean indexing (e.g., `df[df['column'] > value]`).

**Try It Yourself:** Using a small DataFrame like the one in Exercise 2, select only the 'Name' column, filter to show only the rows where 'Age' is greater than 28, and then add a new 'Status' column.

In [None]:
import pandas as pd

# Ensure a DataFrame is available, similar to Exercise 2
data_dict = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data_dict)
print('Initial DataFrame:\n', df)

# 1. Select and print only the 'Name' column
print('\nName column:\n', df['Name'])

# 2. Filter and print rows where 'Age' is greater than 28
print('\nRows where Age > 28:\n', df[df['Age'] > 28])

# 3. Add a new column named 'Status' with sample values
df['Status'] = ['Active', 'Inactive', 'Active']
print("\nDataFrame with new 'Status' column:\n", df)

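Boolean indexing extends naturally to more than one condition. As a small sketch on similar sample data, conditions are combined with `&`, `|`, and `~`, and several columns can be selected at once by passing a list of names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Each comparison yields a boolean Series; wrap each in parentheses and
# combine with & (and), | (or), or ~ (not) rather than Python's and/or/not.
older_not_chicago = df[(df['Age'] > 28) & (df['City'] != 'Chicago')]

# Select several columns at once by passing a list of column names.
names_and_ages = df[['Name', 'Age']]

print(older_not_chicago['Name'].tolist())  # ['Bob']
print(list(names_and_ages.columns))        # ['Name', 'Age']
```

The parentheses matter: `&` binds more tightly than `>`, so leaving them out changes the meaning of the expression and usually raises an error.
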

## Mini-Project: Basic Data Analysis with City Population

**Task:** Perform basic data analysis on a small dataset of city populations. Your program should:
1.  Create a Pandas DataFrame from the provided data (or a dummy CSV if you wish).
2.  Display the first few rows of the DataFrame (`.head()`).
3.  Print the basic information about the DataFrame (`.info()`).
4.  Calculate and print the total population, average population, and the city with the maximum population.

In [None]:
import pandas as pd

# 1. Create a DataFrame for city population data
city_data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Population': [8419000, 3980000, 2705000, 2320000, 1680000],
    'State': ['NY', 'CA', 'IL', 'TX', 'AZ']
}
cities_df = pd.DataFrame(city_data)

# 2. Display the first few rows
print('First few rows of the DataFrame:')
print(cities_df.head())

# 3. Print basic information
print('\nBasic information:')
cities_df.info()

# 4. Calculate and print statistics
total_population = cities_df['Population'].sum()
average_population = cities_df['Population'].mean()
max_population_city = cities_df.loc[cities_df['Population'].idxmax()]

print(f'\nTotal Population: {total_population:,}')
print(f'Average Population: {average_population:,.0f}')
print(f"City with Maximum Population: {max_population_city['City']} ({max_population_city['Population']:,})")

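Beyond the single maximum, it is often useful to rank rows or get an overall summary. The following sketch, which is an optional extension of the task, uses `nlargest` and `describe` on the same city data.

```python
import pandas as pd

cities_df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Population': [8419000, 3980000, 2705000, 2320000, 1680000],
})

# The three most populous cities, largest first.
top_three = cities_df.nlargest(3, 'Population')

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric column.
summary = cities_df['Population'].describe()

print(top_three['City'].tolist())  # ['New York', 'Los Angeles', 'Chicago']
print(summary['max'])              # 8419000.0
```
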
## Unit Tests for Basic Data Analysis

It's good practice to test your data analysis steps to ensure correctness. Below are some example test cases.

In [None]:
import pandas as pd
import numpy as np

# Test data
test_city_data = {
    'City': ['A', 'B', 'C'],
    'Population': [100000, 200000, 50000],
    'State': ['X', 'Y', 'X']
}
test_df = pd.DataFrame(test_city_data)

print('--- Running Basic Data Analysis Unit Tests ---\n')

# Test 1: Total Population
expected_total_pop = 350000
calculated_total_pop = test_df['Population'].sum()
assert calculated_total_pop == expected_total_pop, f'Test 1 Failed: Expected total {expected_total_pop}, got {calculated_total_pop}'
print('Test 1 Passed: Total Population.')

# Test 2: Average Population
expected_avg_pop = 350000 / 3 # Approx 116666.666
calculated_avg_pop = test_df['Population'].mean()
assert np.isclose(calculated_avg_pop, expected_avg_pop), f'Test 2 Failed: Expected avg {expected_avg_pop}, got {calculated_avg_pop}'
print('Test 2 Passed: Average Population.')

# Test 3: City with Max Population
expected_max_city_name = 'B'
calculated_max_city = test_df.loc[test_df['Population'].idxmax()]['City']
assert calculated_max_city == expected_max_city_name, f'Test 3 Failed: Expected max city {expected_max_city_name}, got {calculated_max_city}'
print('Test 3 Passed: City with Max Population.')

print('\n--- All Basic Data Analysis Unit Tests Passed! ---')

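The tests in this section call Pandas methods directly. One way to make tests exercise your own code is to wrap the analysis in a function and assert on what it returns. The sketch below assumes a hypothetical helper named `summarize_population`; the name and return format are illustrative, not a required part of the mini-project.

```python
import pandas as pd


def summarize_population(df):
    """Hypothetical helper: return the total, mean, and largest city of a population table."""
    largest_row = df.loc[df['Population'].idxmax()]
    return {
        'total': int(df['Population'].sum()),
        'average': float(df['Population'].mean()),
        'largest_city': largest_row['City'],
    }


sample_df = pd.DataFrame({
    'City': ['A', 'B', 'C'],
    'Population': [100000, 200000, 50000],
})
result = summarize_population(sample_df)

assert result['total'] == 350000
assert abs(result['average'] - 350000 / 3) < 1e-6
assert result['largest_city'] == 'B'
print('summarize_population behaves as expected on the sample data.')
```
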
## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Basic Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Basic Data Analysis mini-project
# import pandas as pd

# city_data_solution = {
#     'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
#     'Population': [8419000, 3980000, 2705000, 2320000, 1680000],
#     'State': ['NY', 'CA', 'IL', 'TX', 'AZ']
# }
# cities_df_solution = pd.DataFrame(city_data_solution)

# # Display the first few rows
# print('First few rows of the DataFrame:')
# print(cities_df_solution.head())

# # Print basic information
# print('\nBasic information:')
# cities_df_solution.info()

# # Calculate and print statistics
# total_population_solution = cities_df_solution['Population'].sum()
# average_population_solution = cities_df_solution['Population'].mean()
# max_population_city_solution = cities_df_solution.loc[cities_df_solution['Population'].idxmax()]

# print(f'\nTotal Population: {total_population_solution:,}')
# print(f'Average Population: {average_population_solution:,.0f}')
# print(f"City with Maximum Population: {max_population_city_solution['City']} ({max_population_city_solution['Population']:,})")

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access elements by index and label, and perform basic arithmetic operations.

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.

In [None]:
import pandas as pd

# 1. Create a Pandas Series
data = [10, 20, 30, 40, 50]
index_labels = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index_labels)
print('Created Series:\n', s)

# 2. Access the element with label 'c'
print("\nElement with label 'c':", s['c'])

# 3. Add 5 to every element
s_plus_5 = s + 5
print('\nSeries after adding 5:\n', s_plus_5)

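Index labels do more than name the elements: arithmetic between two Series is matched by label rather than by position. The short sketch below uses two made-up Series to illustrate this.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Values are matched by label, not by position; labels present in only one
# Series produce NaN in the result.
total = s1 + s2

print(total['b'])          # 21.0
print(total.isna().sum())  # 2
```

The result is stored as floats because NaN, the marker for the unmatched labels `'a'` and `'d'`, is a floating-point value.
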
#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or a SQL table, or a dictionary of Series objects.

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.

In [None]:
import pandas as pd

# 1. Create a DataFrame from a dictionary
data_dict = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']}
df_from_dict = pd.DataFrame(data_dict)
print('DataFrame from dictionary:\n', df_from_dict)

# 2. Create another DataFrame from a list of dictionaries
data_list_of_dicts = [
    {'Name': 'Diana', 'Age': 28, 'City': 'Houston'},
    {'Name': 'Eve', 'Age': 22, 'City': 'Miami'}
]
df_from_list_of_dicts = pd.DataFrame(data_list_of_dicts)
print('\nDataFrame from list of dictionaries:\n', df_from_list_of_dicts)

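The description of a DataFrame as "a dictionary of Series objects" can be taken literally. As a brief sketch with made-up data, each Series becomes one column, and selecting a column gives a Series back.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie'])
ages = pd.Series([25, 30, 35])

# Each Series becomes one column; the dictionary keys become the column names.
people_df = pd.DataFrame({'Name': names, 'Age': ages})

# Selecting a single column gives back a Series.
age_column = people_df['Age']

print(type(age_column).__name__)  # Series
print(people_df.shape)            # (3, 2)
```
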

#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with sample values (e.g., `['New York', 'Los Angeles', 'Chicago']`) and print the updated DataFrame.

In [None]:
# Your code for Exercise 3 here


## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has provided you with a simulated dataset of product sales and wants you to perform a basic analysis using Pandas.


**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. For simplicity, you may use `Sales` directly as `Revenue`. If you instead derive a price per unit as `Sales / Units Sold` and multiply it back by `Units Sold`, guard against division by zero by falling back to `Sales` wherever `Units Sold` is 0 or NaN.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.

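The division-by-zero guard mentioned in step 2 can be sketched as follows. This is one possible approach on a small made-up table, not the only way to handle it: zero units are replaced with NaN before dividing, and `Revenue` falls back to `Sales` where no price could be computed.

```python
import numpy as np
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard'],
    'Sales': [1000, 100, 300],
    'Units Sold': [5, 0, 3],  # the zero is deliberate, to exercise the guard
})

# Replace zero units with NaN so the division yields NaN instead of inf.
units = sales_df['Units Sold'].replace(0, np.nan)
sales_df['Price per Unit'] = sales_df['Sales'] / units

# Revenue is price * units where that is defined; otherwise fall back to Sales.
sales_df['Revenue'] = (sales_df['Price per Unit'] * units).fillna(sales_df['Sales'])

print(sales_df['Revenue'].tolist())  # [1000.0, 100.0, 300.0]
```

Because price times units simply recovers `Sales`, using `Sales` as `Revenue` directly gives the same numbers with less code.
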

In [None]:
import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_sales = pd.DataFrame(data)
print("Initial Sales DataFrame:\n" + str(df_sales.head()))

# 2. Calculate Total Revenue (simplified as per instructions)
df_sales['Revenue'] = df_sales['Sales']
print("\nDataFrame with Revenue:\n" + str(df_sales.head()))

# 3. Group by Product
product_summary = df_sales.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_sales.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))

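The solution above passes a dictionary to `.agg()`. As an alternative sketch, Pandas also supports named aggregation, which lets you choose the output column names. The table and the names `total_revenue`, `total_units`, and `order_count` below are illustrative.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Keyboard'],
    'Revenue': [1000, 100, 1200, 300],
    'Units Sold': [5, 10, 6, 3],
})

# Named aggregation: each keyword becomes an output column, defined as
# (input column, aggregation function).
summary = sales_df.groupby('Product').agg(
    total_revenue=('Revenue', 'sum'),
    total_units=('Units Sold', 'sum'),
    order_count=('Revenue', 'count'),
)

# Sort so the product with the highest revenue comes first.
summary = summary.sort_values('total_revenue', ascending=False)

print(summary.index[0])                        # Laptop
print(summary.loc['Laptop', 'total_revenue'])  # 2200
```
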

## Unit Tests for Sales Data Analysis

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Sales Data Analysis. Run them and verify the output.

In [None]:
import pandas as pd
import numpy as np

# Helper function to run the analysis for testing
def run_sales_analysis(df_input):
    # Calculate Total Revenue
    df_input['Revenue'] = df_input['Sales']
    
    # Group by Product
    product_summary = df_input.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
    
    # Group by Region
    region_summary = df_input.groupby('Region')['Revenue'].sum().reset_index()
    
    # Find Top Selling Product (by Revenue)
    top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
    
    return df_input, product_summary, region_summary, top_product

# Test Cases
print('--- Running Sales Data Analysis Unit Tests ---')

# Test Case 1: Basic data
test_data_1 = {
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Keyboard'],
    'Region': ['East', 'East', 'West', 'North'],
    'Sales': [1000, 100, 1200, 300],
    'Units Sold': [5, 10, 6, 3]
}
df_test_1 = pd.DataFrame(test_data_1)
df_res_1, prod_res_1, reg_res_1, top_res_1 = run_sales_analysis(df_test_1.copy())
assert prod_res_1[prod_res_1['Product'] == 'Laptop']['Revenue'].iloc[0] == 2200, 'Test Case 1 Failed: Laptop revenue incorrect'
assert reg_res_1[reg_res_1['Region'] == 'East']['Revenue'].iloc[0] == 1100, 'Test Case 1 Failed: East region revenue incorrect'
assert top_res_1['Product'] == 'Laptop', 'Test Case 1 Failed: Top product incorrect'
print('Test Case 1 Passed: Basic data analysis is correct.')

# Test Case 2: All same product
test_data_2 = {
    'Product': ['Monitor', 'Monitor', 'Monitor'],
    'Region': ['South', 'North', 'South'],
    'Sales': [500, 700, 600],
    'Units Sold': [2, 3, 2]
}
df_test_2 = pd.DataFrame(test_data_2)
df_res_2, prod_res_2, reg_res_2, top_res_2 = run_sales_analysis(df_test_2.copy())
assert prod_res_2.shape[0] == 1, 'Test Case 2 Failed: Product summary row count incorrect'
assert prod_res_2['Product'].iloc[0] == 'Monitor', 'Test Case 2 Failed: Product name incorrect'
assert top_res_2['Product'] == 'Monitor', 'Test Case 2 Failed: Top product incorrect'
print('Test Case 2 Passed: All same product analysis is correct.')

print('\nAll Unit Tests Completed.')


## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to solidify your understanding of Pandas Series and DataFrames. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access elements by index and label, and perform basic arithmetic operations.

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.

In [None]:
# Your code for Exercise 1 here


#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or a SQL table, or a dictionary of Series objects.

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.

In [None]:
# Your code for Exercise 2 here


#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with sample values (e.g., `['New York', 'Los Angeles', 'Chicago']`) and print the updated DataFrame.

In [None]:
# Your code for Exercise 3 here


## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has asked you to simulate a dataset of product sales and perform a basic analysis on it using Pandas.

**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. For simplicity, you may set `Revenue` equal to `Sales`. If you also derive `Price per Unit = Sales / Units Sold`, guard against division by zero: where `Units Sold` is 0 or NaN, fall back to `Sales`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.
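
The division guard mentioned in step 2 can be sketched as follows. This is one possible reading of the fallback rule, shown on a hypothetical three-row frame named `toy`; it is not the only reasonable approach.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including a zero and a missing unit count
toy = pd.DataFrame({
    'Sales': [500, 300, 450],
    'Units Sold': [10, 0, np.nan],
})

units = toy['Units Sold']

# A row is safe to divide only if the unit count is present and non-zero
valid = units.notna() & (units != 0)

# Keep the quotient where valid; elsewhere fall back to the Sales value
toy['Price per Unit'] = (toy['Sales'] / units).where(valid, toy['Sales'])

print(toy)
```

`Series.where` keeps values where the condition is True and substitutes the second argument elsewhere, so the infinite and NaN quotients never reach the final column.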

In [None]:
import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_sales = pd.DataFrame(data)
print("Initial Sales DataFrame:\n" + str(df_sales.head()))

# 2. Calculate Total Revenue (simplified as per instructions)
df_sales['Revenue'] = df_sales['Sales']
print("\nDataFrame with Revenue:\n" + str(df_sales.head()))

# 3. Group by Product
product_summary = df_sales.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_sales.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))


## Unit Tests for Sales Data Analysis

It's good practice to test your code with various inputs to ensure it works correctly. Use the cell below to write test cases for your Sales Data Analysis, then run them and verify the output.

In [None]:
# Your Unit Tests for Sales Data Analysis here
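
As a starting point, here is a sketch of what such checks could look like. It assumes a small hand-made dataset (`fixture`) and a hypothetical helper `total_revenue_by`; neither comes from the mini-project code, and your own tests may be organized differently.

```python
import pandas as pd


def total_revenue_by(df, column):
    """Hypothetical helper: sum Revenue for each distinct value of `column`."""
    return df.groupby(column)['Revenue'].sum()


# Small hand-made dataset whose totals are easy to verify by hand
fixture = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Mouse'],
    'Region': ['East', 'East', 'West', 'West'],
    'Revenue': [900, 100, 600, 150],
})

by_product = total_revenue_by(fixture, 'Product')
by_region = total_revenue_by(fixture, 'Region')

# Per-group totals match the hand-computed values
assert by_product['Laptop'] == 1500
assert by_product['Mouse'] == 250
assert by_region['East'] == 1000
assert by_region['West'] == 750

# The top product is the one with the largest total
assert by_product.idxmax() == 'Laptop'

# Grouping must neither lose nor invent revenue
assert by_product.sum() == fixture['Revenue'].sum()

print("All checks passed")
```

Testing against a tiny fixed dataset rather than the random one keeps the expected values obvious, so a failing assertion points at the analysis logic and not at the data.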


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:
" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("
DataFrame with Revenue:
" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("
Grouped by Product:
" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("
Grouped by Region:
" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("
Top Selling Product (by Revenue):
" + str(top_product))


## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to solidify your understanding of Pandas Series and DataFrames. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access its elements by position or by label, and apply arithmetic operations to every element at once.
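
As a warm-up before the exercise, here is a minimal sketch of those three ideas. The weekday temperature data and all variable names are made up for illustration; they are not part of the exercise.

```python
import pandas as pd

# Hypothetical data: daily temperatures labeled by weekday
temps = pd.Series([18.5, 21.0, 19.5], index=['Mon', 'Tue', 'Wed'])

by_label = temps['Tue']        # label-based access
by_position = temps.iloc[0]    # position-based access
doubled = temps * 2            # arithmetic applies to every element

print(by_label)
print(by_position)
print(doubled)
```

Using `.iloc` for positions keeps the intent explicit when the index labels are not integers.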

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.

In [None]:
# Your code for Exercise 1 here


#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or a SQL table, or a dictionary of Series objects.
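
The "dictionary of Series objects" view can be made concrete with a short sketch. The column names and values below are hypothetical and chosen only to show that each column keeps its own data type.

```python
import pandas as pd

# Hypothetical columns; each one is a Series with its own dtype
names = pd.Series(['Eve', 'Frank', 'Grace'])
scores = pd.Series([88.5, 92.0, 79.0])

# The dictionary keys become the column names
df = pd.DataFrame({'Name': names, 'Score': scores})

print(df)
print(df.dtypes)            # one dtype per column
print(type(df['Score']))    # a single column is a Series
```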

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.

In [None]:
# Your code for Exercise 2 here


#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.
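
The sketch below shows the most common selection patterns on a small, made-up inventory table (the table and its column names are assumptions for illustration, not the exercise data).

```python
import pandas as pd

# Hypothetical inventory table
inventory = pd.DataFrame({
    'Item': ['Pen', 'Notebook', 'Stapler'],
    'Price': [1.5, 4.0, 12.0],
    'Stock': [200, 75, 10],
})

prices = inventory['Price']                      # one column -> Series
subset = inventory[['Item', 'Stock']]            # list of columns -> DataFrame
low_stock = inventory[inventory['Stock'] < 100]  # boolean mask keeps matching rows

# A new column computed from existing ones has one value per row
inventory['Value'] = inventory['Price'] * inventory['Stock']

print(prices)
print(subset)
print(low_stock)
print(inventory)
```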

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with one sample value per row (e.g., `['New York', 'Los Angeles', 'Chicago']` for a three-row DataFrame) and print the updated DataFrame.

In [None]:
# Your code for Exercise 3 here


## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has provided you with a simulated dataset of product sales and wants you to perform a basic analysis using Pandas.

**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. Assume `Price per Unit = Sales / Units Sold`, so that `Revenue = Price per Unit * Units Sold`. If you compute it this way, handle potential division by zero by setting `Revenue` to `Sales` wherever `Units Sold` is 0 or NaN. For simplicity, you may instead just use `Sales` as `Revenue`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.
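
Step 2 mentions guarding against division by zero. One possible way to do that is sketched below on three hand-written rows; the sample values and the variable names `units`, `valid`, and `price_per_unit` are assumptions for illustration, and other approaches (such as `np.where`) work just as well.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including a zero and a missing Units Sold value
sales = pd.DataFrame({
    'Sales': [500, 300, 450],
    'Units Sold': [5, 0, np.nan],
})

units = sales['Units Sold']
valid = units.notna() & (units != 0)

# Keep the quotient only where the denominator is usable; other rows become NaN
price_per_unit = (sales['Sales'] / units).where(valid)

# Revenue = price * units where valid, otherwise fall back to Sales
sales['Revenue'] = (price_per_unit * units).where(valid, sales['Sales'])

print(sales)
```

Building the `valid` mask first keeps the fallback rule in one place, so the same condition drives both the division and the substitution.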

In [None]:
# Your Sales Data Analysis solution here


## Unit Tests for Sales Data Analysis

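It's good practice to test your code with various inputs to ensure it works correctly. Write a few test cases for your Sales Data Analysis in the code cell provided for this section, using small inputs whose expected results you can work out by hand. Run them and verify the output.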
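
As a starting point, here is a minimal sketch of what such tests might look like. The helper `total_revenue_by` and the four-row `test_df` are hypothetical; adapt the names to match your own solution.

```python
import pandas as pd

def total_revenue_by(df, column):
    """Hypothetical helper: sum Revenue for each value of `column`."""
    return df.groupby(column)['Revenue'].sum()

# Small hand-written dataset so the expected totals can be checked by hand
test_df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Monitor'],
    'Region': ['East', 'West', 'West', 'East'],
    'Revenue': [900, 100, 700, 400],
})

by_product = total_revenue_by(test_df, 'Product')
by_region = total_revenue_by(test_df, 'Region')

assert by_product['Laptop'] == 1600
assert by_product['Mouse'] == 100
assert by_region['East'] == 1300
assert by_region['West'] == 800
assert by_product.idxmax() == 'Laptop'
# Grouping must not create or lose revenue
assert by_product.sum() == test_df['Revenue'].sum()

print("All tests passed.")
```

Fixed data is used here instead of random data so that every expected value is known in advance.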

In [None]:
# Your Unit Tests for Sales Data Analysis here


## Hints/Solution (Optional, Expand to View)

This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))
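
One detail worth knowing about step 5: `idxmax` returns only the first row label when several rows share the maximum. The sketch below, on a made-up summary table with a deliberate tie, shows one way to retrieve every tied product; it is an optional variation, not part of the suggested solution.

```python
import pandas as pd

# Hypothetical summary in which two products tie for the highest revenue
summary = pd.DataFrame({
    'Product': ['Keyboard', 'Laptop', 'Monitor'],
    'Revenue': [1200, 3000, 3000],
})

# idxmax returns the index label of the first maximum only
first_top = summary.loc[summary['Revenue'].idxmax(), 'Product']

# Comparing against the maximum keeps every tied row
all_top = summary[summary['Revenue'] == summary['Revenue'].max()]

print(first_top)
print(all_top)
```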


## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has provided you with a simulated dataset of product sales and wants you to perform a basic analysis using Pandas.

**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. For simplicity, you may use `Sales` directly as `Revenue`. If you also derive `Price per Unit = Sales / Units Sold`, guard against division by zero: where `Units Sold` is 0 or NaN, fall back to setting `Revenue` to `Sales`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.

In [None]:
# Your Sales Data Analysis solution here


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: the instructions allow using Sales directly as Revenue
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))
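
The suggested solution takes the simplified route for step 2. For readers who want to try the guarded `Price per Unit` variant mentioned in the instructions, one possible sketch is shown below. The three-row `sales` table is an assumption made for illustration (it deliberately includes a zero and a missing value), and replacing zeros with NaN before dividing is just one of several reasonable ways to avoid division by zero.

```python
import numpy as np
import pandas as pd

# Small hand-written sample (an assumption for illustration) that includes
# a zero and a missing value in 'Units Sold'.
sales = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard"],
    "Sales": [900, 200, 300],
    "Units Sold": [3, 0, np.nan],
})

# Treat 0 as missing so the division yields NaN instead of inf.
units = sales["Units Sold"].replace(0, np.nan)
sales["Price per Unit"] = sales["Sales"] / units

# Revenue = price per unit * units sold; fall back to Sales when the
# unit count is unusable (0 or NaN).
sales["Revenue"] = (sales["Price per Unit"] * units).fillna(sales["Sales"])

print(sales)
```

With this approach, rows with an unusable unit count keep a NaN `Price per Unit` but still receive a `Revenue` equal to `Sales`, which matches the fallback described in the instructions.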


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))
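
Step 2 of the instructions mentions guarding against division by zero, which the simplified solution above sidesteps by generating `Units Sold` values of at least 1. The following is a minimal sketch of one way to handle that case, assuming a small hand-made frame (`df_demo`, with illustrative values that are not part of the exercise data) containing a zero and a missing unit count.

```python
import numpy as np
import pandas as pd

# Tiny hand-made frame (illustrative values) with a zero and a missing unit count
df_demo = pd.DataFrame({
    'Sales': [500, 300, 250],
    'Units Sold': [5, 0, np.nan],
})

# Replace 0 with NaN so the division yields NaN instead of inf
units = df_demo['Units Sold'].replace(0, np.nan)
df_demo['Price per Unit'] = df_demo['Sales'] / units

# Revenue = price * units where the price is defined; otherwise fall back to Sales
df_demo['Revenue'] = (df_demo['Price per Unit'] * units).fillna(df_demo['Sales'])
print(df_demo)
```

Because `Price per Unit * Units Sold` equals `Sales` whenever the price is defined, the result matches the simplified approach of using `Sales` as `Revenue`. The sketch only makes the fallback explicit.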


## Navigational Links

[<-- Previous Week](week_12.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](week_14.ipynb)



# Week 14: Data Visualization with Matplotlib

Welcome to Week 14, the final week of Module 4! This week, we will explore Matplotlib, Python's most popular library for creating static, animated, and interactive visualizations. Building on our work with Pandas, you'll learn how to effectively represent your data graphically, uncovering patterns and insights. We'll cover fundamental plot types like line plots, scatter plots, and bar charts, as well as essential customization techniques such as adding labels, titles, and legends. Mastering data visualization is crucial for communicating your data analysis findings clearly and persuasively.

### Reading: Chapter 4 of 'Python Data Science Handbook'

For a comprehensive understanding of this week's topics, please refer to Chapter 4 of the open-source textbook, which covers Matplotlib in detail:
[Python Data Science Handbook - Chapter 4 (Visualization with Matplotlib)](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html)

## Installation: Matplotlib Library

The projects in this module require the Matplotlib library. Run the following cell to install it if you haven't already. The leading `!` tells Jupyter/Colab to run the rest of the line as a shell command.

In [None]:
!pip install matplotlib
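
To confirm that the installation worked, one quick check is to import the package and print its version. This is only a sanity-check sketch, and the version number you see will depend on your environment.

```python
import matplotlib

# If this import succeeds, Matplotlib is available in the current kernel
print(matplotlib.__version__)
```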

## Interactive Lab: Basic Data Visualization

This section provides hands-on exercises to familiarize you with creating various types of plots using Matplotlib. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Line Plot - Trends Over Time

Line plots are ideal for showing trends over a continuous range, such as time. They connect individual data points with line segments.

**Try It Yourself:** Plot the monthly average temperature for a year.

In [None]:
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
avg_temps = [2, 4, 8, 13, 17, 21, 23, 22, 18, 13, 7, 3]

plt.figure(figsize=(10, 6))
plt.plot(months, avg_temps, marker='o', linestyle='-', color='skyblue')
plt.title('Monthly Average Temperatures')
plt.xlabel('Month')
plt.ylabel('Average Temperature (°C)')
plt.grid(True)
plt.show()
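
Legends become useful as soon as a plot holds more than one series. The sketch below adds a second line and a legend using Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`). The second series, `avg_temps_city_b`, is a hypothetical one made up purely so there are two lines to label.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
avg_temps = [2, 4, 8, 13, 17, 21, 23, 22, 18, 13, 7, 3]
# Hypothetical second series (invented for illustration): City A shifted by 5 degrees
avg_temps_city_b = [t + 5 for t in avg_temps]

fig, ax = plt.subplots(figsize=(10, 6))
# The label= argument is what ax.legend() picks up for each line
ax.plot(months, avg_temps, marker='o', label='City A')
ax.plot(months, avg_temps_city_b, marker='s', label='City B (hypothetical)')
ax.set_title('Monthly Average Temperatures')
ax.set_xlabel('Month')
ax.set_ylabel('Average Temperature (°C)')
ax.legend()
plt.show()
```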

#### Exercise 2: Scatter Plot - Relationships Between Variables

Scatter plots are used to observe relationships between two different quantitative variables. They show individual data points as marks.

**Try It Yourself:** Plot a scatter plot of study hours versus exam scores to see if there's a correlation.

In [None]:
import matplotlib.pyplot as plt

study_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
exam_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

plt.figure(figsize=(8, 6))
plt.scatter(study_hours, exam_scores, color='lightcoral', marker='x')
plt.title('Study Hours vs. Exam Scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Score')
plt.grid(True)
plt.show()
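
A scatter plot suggests a relationship visually. To put a number on it, one option is the Pearson correlation coefficient, computed here with NumPy's `np.corrcoef` as a small sketch on the same data.

```python
import numpy as np

study_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
exam_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Because these example scores rise by exactly 5 points per study hour, the points lie on a straight line and `r` comes out as 1. Real data would typically give a value strictly between -1 and 1.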

#### Exercise 3: Bar Chart - Comparing Categories

Bar charts are excellent for comparing quantities among different categories. They use rectangular bars with lengths proportional to the values they represent.

**Try It Yourself:** Create a bar chart showing the sales figures for different product categories.

In [None]:
import matplotlib.pyplot as plt

product_categories = ['Electronics', 'Clothing', 'Books', 'Home Goods', 'Food']
sales = [15000, 12000, 8000, 10000, 18000]

plt.figure(figsize=(10, 6))
plt.bar(product_categories, sales, color='mediumseagreen')
plt.title('Sales by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Sales ($)')
plt.grid(axis='y', linestyle='--')
plt.show()

## Mini-Project: Sales Data Visualization

**Task:** You are given sales data for different regions over three quarters. Create a series of visualizations using Matplotlib to summarize and present this data.

**Data:**
```python
sales_data = {
    'Quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3', 'Q3'],
    'Region': ['East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [150, 200, 100, 180, 170, 220, 110, 190, 160, 210, 105, 185]
}
```

**Instructions:**
1.  **Create a DataFrame:** Convert the `sales_data` dictionary into a Pandas DataFrame.
2.  **Total Sales per Region (Bar Chart):** Create a bar chart showing the total sales for each region across all quarters. Add appropriate labels and a title.
3.  **Sales Trend per Region (Line Plot):** Create a line plot that shows the sales trend for each region over the three quarters. Each region should be a separate line. Add a legend, labels, and a title.
4.  **Sales Distribution (Histogram):** Create a histogram to visualize the distribution of individual sales figures. Add labels and a title.
5.  **Quarterly Sales (Pie Chart):** Calculate the total sales for each quarter and display it as a pie chart. Add labels and a title.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

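# Data: wide-format sales data (one sales column per quarter, plus a Product column).
# Note: this differs from the long-format `sales_data` given in the task description.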
sales_data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'Product': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Q1_Sales': [100, 150, 120, 130, 110, 140, 125, 135, 105, 145, 115, 120],
    'Q2_Sales': [110, 160, 130, 140, 120, 150, 135, 145, 115, 155, 125, 130],
    'Q3_Sales': [105, 155, 125, 135, 115, 145, 130, 140, 110, 150, 120, 125]
}
df_sales = pd.DataFrame(sales_data)
print("Original Sales Data:\n" + str(df_sales.head()))

# 1. Total Sales by Region (Bar Chart)
df_sales['Total_Sales'] = df_sales[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum(axis=1)
regional_sales = df_sales.groupby('Region')['Total_Sales'].sum().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
regional_sales.plot(kind='bar', color='skyblue')
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--')
plt.tight_layout()
plt.show()

# 2. Sales Distribution by Product (Pie Chart)
product_sales = df_sales.groupby('Product')['Total_Sales'].sum()

plt.figure(figsize=(8, 8))
plt.pie(product_sales, labels=product_sales.index, autopct='%1.1f%%', startangle=90, colors=plt.cm.Paired.colors)
plt.title('Sales Distribution by Product')
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.tight_layout()
plt.show()

# 3. Quarterly Sales Trend by Region (Line Plot)
quarterly_sales = df_sales.groupby('Region')[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum()

plt.figure(figsize=(12, 7))
for region in quarterly_sales.index:
    plt.plot(['Q1', 'Q2', 'Q3'], quarterly_sales.loc[region], marker='o', label=region)

plt.title('Quarterly Sales Trend by Region')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.grid(True, linestyle='--')
plt.legend(title='Region')
plt.tight_layout()
plt.show()


## Unit Tests for Sales Data Visualization

Testing visualizations programmatically can be complex, often involving image comparison. For simplicity, we'll focus on testing the data transformations that lead to the visualizations.

In [None]:
import pandas as pd
import numpy as np

# Helper function to generate the wide-format test data (one sales column per quarter)
def generate_test_sales_data():
    sales_data = {
        'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
        'Product': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
        'Q1_Sales': [100, 150, 120, 130, 110, 140, 125, 135, 105, 145, 115, 120],
        'Q2_Sales': [110, 160, 130, 140, 120, 150, 135, 145, 115, 155, 125, 130],
        'Q3_Sales': [105, 155, 125, 135, 115, 145, 130, 140, 110, 150, 120, 125]
    }
    return pd.DataFrame(sales_data)

# Helper function to run the data transformation parts of the analysis for testing
def run_sales_data_transformations(df_input):
    df_input['Total_Sales'] = df_input[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum(axis=1)
    regional_sales = df_input.groupby('Region')['Total_Sales'].sum().sort_values(ascending=False)
    product_sales = df_input.groupby('Product')['Total_Sales'].sum()
    quarterly_sales = df_input.groupby('Region')[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum()
    return df_input, regional_sales, product_sales, quarterly_sales

# Test Cases
print('--- Running Sales Data Visualization Unit Tests ---')

# Generate test data
df_test = generate_test_sales_data()
df_transformed, regional_sales_test, product_sales_test, quarterly_sales_test = run_sales_data_transformations(df_test.copy())

# Test 1: Verify Total_Sales column
expected_total_sales_first_row = 100 + 110 + 105
assert df_transformed['Total_Sales'].iloc[0] == expected_total_sales_first_row, f"Test 1 Failed: Total_Sales for first row incorrect. Expected {expected_total_sales_first_row}, got {df_transformed['Total_Sales'].iloc[0]}"
print('Test 1 Passed: Total_Sales calculation is correct.')

# Test 2: Verify regional_sales totals (e.g., North region)
expected_north_sales = df_test[df_test['Region'] == 'North'][['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum().sum()
assert regional_sales_test['North'] == expected_north_sales, f"Test 2 Failed: North regional sales incorrect. Expected {expected_north_sales}, got {regional_sales_test['North']}"
print('Test 2 Passed: North regional sales aggregation is correct.')

# Test 3: Verify product_sales totals (e.g., Product A)
expected_product_a_sales = df_test[df_test['Product'] == 'A'][['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum().sum()
assert product_sales_test['A'] == expected_product_a_sales, f"Test 3 Failed: Product A sales incorrect. Expected {expected_product_a_sales}, got {product_sales_test['A']}"
print('Test 3 Passed: Product A sales aggregation is correct.')

# Test 4: Verify quarterly_sales for a specific region and quarter (e.g., North Q1)
expected_north_q1_sales = df_test[df_test['Region'] == 'North']['Q1_Sales'].sum()
assert quarterly_sales_test.loc['North', 'Q1_Sales'] == expected_north_q1_sales, f"Test 4 Failed: North Q1 sales incorrect. Expected {expected_north_q1_sales}, got {quarterly_sales_test.loc['North', 'Q1_Sales']}"
print('Test 4 Passed: North Q1 sales aggregation is correct.')

print('\nAll Unit Tests Completed.')
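
Between testing only the data and comparing full images, there is a lighter middle ground: inspecting the Matplotlib objects a plot creates. The sketch below is one possible approach, not part of the course's test suite. It draws the regional bar chart from the task's `sales_data` and checks the bar heights and the title.

```python
import pandas as pd
import matplotlib.pyplot as plt

sales_data = {
    'Quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3', 'Q3'],
    'Region': ['East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [150, 200, 100, 180, 170, 220, 110, 190, 160, 210, 105, 185]
}
regional_sales = pd.DataFrame(sales_data).groupby('Region')['Sales'].sum()

fig, ax = plt.subplots()
bars = ax.bar(regional_sales.index, regional_sales.values)
ax.set_title('Total Sales per Region')

# Each bar is a Rectangle patch; its height is the value that was plotted
bar_heights = [bar.get_height() for bar in bars]
assert bar_heights == list(regional_sales.values), 'Bar heights do not match the aggregated totals'
assert ax.get_title() == 'Total Sales per Region', 'Unexpected plot title'
print('Plot object checks passed.')

plt.close(fig)  # Release the figure once the checks are done
```

Checks like these catch mistakes such as plotting the wrong column or forgetting a title, without depending on pixel-exact rendering.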


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Visualization mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Visualization mini-project
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Data (as provided in the mini-project description)
sales_data = {
    'Quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3', 'Q3'],
    'Region': ['East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [150, 200, 100, 180, 170, 220, 110, 190, 160, 210, 105, 185]
}
df_sales = pd.DataFrame(sales_data)
print("Original DataFrame:\n" + str(df_sales))

# 1. Create a DataFrame (already done above)

# 2. Total Sales per Region (Bar Chart)
regional_sales = df_sales.groupby('Region')['Sales'].sum().reset_index()

plt.figure(figsize=(10, 6))
plt.bar(regional_sales['Region'], regional_sales['Sales'], color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])
plt.title('Total Sales per Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

# 3. Sales Trend per Region (Line Plot)
quarterly_sales_by_region = df_sales.groupby(['Quarter', 'Region'])['Sales'].sum().unstack().fillna(0)

plt.figure(figsize=(12, 7))
for region in quarterly_sales_by_region.columns:
    plt.plot(quarterly_sales_by_region.index, quarterly_sales_by_region[region], marker='o', label=region)
plt.title('Sales Trend per Region Over Quarters')
plt.xlabel('Quarter')
plt.ylabel('Total Sales')
plt.legend(title='Region')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

# 4. Sales Distribution (Histogram)
plt.figure(figsize=(10, 6))
plt.hist(df_sales['Sales'], bins=5, color='lightsalmon', edgecolor='black')
plt.title('Distribution of Individual Sales Figures')
plt.xlabel('Sales Amount')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

# 5. Quarterly Sales (Pie Chart)
quarterly_total_sales = df_sales.groupby('Quarter')['Sales'].sum()

plt.figure(figsize=(8, 8))
plt.pie(quarterly_total_sales, labels=quarterly_total_sales.index, autopct='%1.1f%%', startangle=90, colors=['#ff9999','#66b3ff','#99ff99'])
plt.title('Total Sales Distribution by Quarter')
plt.ylabel('') # Hide y-label for pie chart
plt.show()
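
The task's `sales_data` is in long format (one row per quarter and region), while the other code cells in this section use a wide layout with one sales column per quarter. As a sketch of how the two shapes relate, `DataFrame.pivot` can reshape the long data into a wide table. The names `df_long` and `df_wide` are chosen here for illustration.

```python
import pandas as pd

sales_data = {
    'Quarter': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3', 'Q3'],
    'Region': ['East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South'],
    'Sales': [150, 200, 100, 180, 170, 220, 110, 190, 160, 210, 105, 185]
}
df_long = pd.DataFrame(sales_data)

# Reshape: one row per quarter, one column per region
df_wide = df_long.pivot(index='Quarter', columns='Region', values='Sales')
print(df_wide)
```

With the data in this shape, calling `df_wide.plot(marker='o')` draws one line per region, which is an alternative to the explicit loop used for the line plot in step 3.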


In [None]:
# Alternative solution using a wide-format dataset (one sales column per quarter)
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Data: wide-format variant used by the unit tests (not the `sales_data` from the task description)
sales_data_solution = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'Product': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Q1_Sales': [100, 150, 120, 130, 110, 140, 125, 135, 105, 145, 115, 120],
    'Q2_Sales': [110, 160, 130, 140, 120, 150, 135, 145, 115, 155, 125, 130],
    'Q3_Sales': [105, 155, 125, 135, 115, 145, 130, 140, 110, 150, 120, 125]
}
df_sales_solution = pd.DataFrame(sales_data_solution)

# Calculate Total Sales
df_sales_solution['Total_Sales'] = df_sales_solution[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum(axis=1)

# 1. Total Sales by Region (Bar Chart)
regional_sales_solution = df_sales_solution.groupby('Region')['Total_Sales'].sum().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
regional_sales_solution.plot(kind='bar', color='skyblue')
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--')
plt.tight_layout()
plt.show()

# 2. Sales Distribution by Product (Pie Chart)
product_sales_solution = df_sales_solution.groupby('Product')['Total_Sales'].sum()

plt.figure(figsize=(8, 8))
plt.pie(product_sales_solution, labels=product_sales_solution.index, autopct='%1.1f%%', startangle=90, colors=plt.cm.Paired.colors)
plt.title('Sales Distribution by Product')
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.tight_layout()
plt.show()

# 3. Quarterly Sales Trend by Region (Line Plot)
quarterly_sales_solution = df_sales_solution.groupby('Region')[['Q1_Sales', 'Q2_Sales', 'Q3_Sales']].sum()

plt.figure(figsize=(12, 7))
for region in quarterly_sales_solution.index:
    plt.plot(['Q1', 'Q2', 'Q3'], quarterly_sales_solution.loc[region], marker='o', label=region)

plt.title('Quarterly Sales Trend by Region')
plt.xlabel('Quarter')
plt.ylabel('Sales')
plt.grid(True, linestyle='--')
plt.legend(title='Region')
plt.tight_layout()
plt.show()


## Navigational Links

[<-- Previous Week](week_13.ipynb) | [<-- Back to Course Overview](course_overview.ipynb) | [Next Week -->](course_overview.ipynb)
