## Navigational Links

[<-- Back to Course Overview](course_overview.ipynb)


# Week 13: Data Science with Pandas (Part 1)

Welcome to Week 13! This week marks your first step into the exciting world of Data Science. We will begin exploring the Pandas library, a foundational tool for data manipulation and analysis in Python. You'll learn how to work with Pandas Series and DataFrames, which are powerful data structures designed to make working with tabular data both easy and intuitive. Mastering Pandas is crucial for anyone looking to work with data in Python, enabling you to load, clean, transform, and analyze datasets efficiently.

### Reading: 'Think Python 2e' & 'Python Data Science Handbook'

For a comprehensive understanding of this week's topics, please refer to:
*   [Think Python 2e - Chapter 16](https://greenteapress.com/wp/think-python-2e/)
*   [Python Data Science Handbook - Chapter 2 (Introduction to Pandas)](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-pandas.html)
*   [Python Data Science Handbook - Chapter 3 (Data Manipulation with Pandas)](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)


## Installation: Pandas Library

The projects in this module require the Pandas library. If you haven't installed it yet, run the following cell. The leading `!` is Jupyter/Colab syntax for running a shell command from a notebook cell.

In [1]:
!pip install pandas



## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to familiarize you with the basic data structures and operations in Pandas. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Creation and Basic Operations

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The row labels of a Series are called its index.
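
To make the idea of an index concrete, here is a minimal sketch. The fruit names and prices are made up for illustration and are not part of the course data.

```python
import pandas as pd

# A Series pairs each value with a label; the labels live in `.index`.
# (Hypothetical data, used only for illustration.)
prices = pd.Series([1.5, 2.0, 0.75], index=['apple', 'banana', 'cherry'])

print(prices.index.tolist())   # the labels
print(prices.to_numpy())       # the underlying values
print(prices['banana'])        # look up a value by its label

# Without an explicit index, Pandas assigns integer labels starting at 0.
default_labels = pd.Series([1.5, 2.0, 0.75])
print(default_labels.index.tolist())
```

The exercise below relies on the default integer index; a later exercise sets custom labels.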

**Try It Yourself:** Create a Series from a Python list of temperatures and perform basic operations like getting the mean and maximum value.

In [4]:
import pandas as pd

temperatures = [20, 22, 25, 23, 21, 26, 24]
temp_series = pd.Series(temperatures, name='Daily Temperatures')
print('Temperature Series:')
print(temp_series)

print(f'\nMean temperature: {temp_series.mean()}°C')
print(f'Maximum temperature: {temp_series.max()}°C')
print(f'Minimum temperature: {temp_series.min()}°C')

Temperature Series:
0    20
1    22
2    25
3    23
4    21
5    26
6    24
Name: Daily Temperatures, dtype: int64

Mean temperature: 23.0°C
Maximum temperature: 26°C
Minimum temperature: 20°C


#### Exercise 2: Pandas DataFrame Creation and Access

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table. It is generally the most commonly used Pandas object.
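
As a quick sketch of "columns of potentially different types", the snippet below builds a small DataFrame from invented inventory data (an assumption, not part of the course material) and inspects its shape and per-column types.

```python
import pandas as pd

# Hypothetical inventory data: one text column, one integer column, one float column.
inventory = pd.DataFrame({
    'Item': ['pen', 'notebook', 'stapler'],
    'Quantity': [120, 45, 8],
    'UnitPrice': [0.5, 2.25, 7.0],
})

print(inventory.shape)    # (number of rows, number of columns)
print(inventory.dtypes)   # one data type per column
```

Each column keeps its own data type, which is what distinguishes a DataFrame from a plain two-dimensional array.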

**Try It Yourself:** Create a DataFrame from a dictionary of student data and access specific columns and rows.

In [None]:
import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [20, 21, 19, 22],
    'Major': ['CS', 'Math', 'Physics', 'CS'],
    'GPA': [3.8, 3.5, 3.9, 3.7]
}
students_df = pd.DataFrame(student_data)
print('Students DataFrame:')
print(students_df)

print(f'\nNames of students: {students_df["Name"].tolist()}')
print(f'Ages of students: {students_df["Age"].tolist()}')
print(f'Majors of students: {students_df["Major"].tolist()}')
print(f'GPA of students: {students_df["GPA"].tolist()}')

#### Exercise 3: Basic DataFrame Operations: Selection and Filtering

You can select columns using dictionary-like notation (`df['column']`) and filter rows using boolean indexing (e.g., `df[df['column'] > value]`).
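
The same two ideas extend naturally to several columns and several conditions. The sketch below uses an invented `books` DataFrame (an assumption for illustration) to show a list-based column selection and a combined boolean filter.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
books = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Pages': [120, 340, 95, 410],
    'Rating': [4.1, 3.6, 4.8, 4.5],
})

# A list of names inside the brackets selects several columns at once.
title_and_rating = books[['Title', 'Rating']]

# Each comparison yields a boolean Series. Combine them with & (and) or | (or),
# wrapping every condition in its own parentheses.
long_and_good = books[(books['Pages'] > 300) & (books['Rating'] > 4.0)]

print(title_and_rating)
print(long_and_good)
```

Python's `and`/`or` keywords do not work element-wise on Series, which is why `&` and `|` are used here.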

**Try It Yourself:** Build a small DataFrame like the one in Exercise 2, select only the 'Name' column, and filter to show only rows where 'Age' is greater than 28. Then add a new 'Status' column.

In [None]:
import pandas as pd

# Ensure a DataFrame is available, similar to Exercise 2
data_dict = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data_dict)
print('Initial DataFrame:\n', df)

# 1. Select and print only the 'Name' column
print('\nName column:\n', df['Name'])

# 2. Filter and print rows where 'Age' is greater than 28
print('\nRows where Age > 28:\n', df[df['Age'] > 28])

# 3. Add a new column named 'Status' with sample values
df['Status'] = ['Active', 'Inactive', 'Active']
print("\nDataFrame with new 'Status' column:\n", df)


## Mini-Project: Basic Data Analysis with City Population

**Task:** Perform basic data analysis on a small dataset of city populations. Your program should:
1.  Create a Pandas DataFrame from the provided data (or a dummy CSV if you wish).
2.  Display the first few rows of the DataFrame (`.head()`).
3.  Print the basic information about the DataFrame (`.info()`).
4.  Calculate and print the total population, average population, and the city with the maximum population.
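
If you prefer the "dummy CSV" route mentioned in step 1, one possible sketch is shown below. The CSV text and town names are invented for illustration, and `io.StringIO` stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """City,Population,State
Springfield,150000,XX
Rivertown,82000,YY
Lakeside,240000,XX
"""

# pd.read_csv accepts a file path or any file-like object.
towns_df = pd.read_csv(io.StringIO(csv_text))

print(towns_df.head())
print(towns_df['Population'].sum())
```

With a real file, you would pass its path to `pd.read_csv` instead of the `StringIO` object.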

In [None]:
import pandas as pd

# 1. Create a DataFrame for city population data
city_data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Population': [8419000, 3980000, 2705000, 2320000, 1680000],
    'State': ['NY', 'CA', 'IL', 'TX', 'AZ']
}
cities_df = pd.DataFrame(city_data)

# 2. Display the first few rows
print('First few rows of the DataFrame:')
print(cities_df.head())

# 3. Print basic information
print('\nBasic information:')
cities_df.info()

# 4. Calculate and print statistics
total_population = cities_df['Population'].sum()
average_population = cities_df['Population'].mean()
max_population_city = cities_df.loc[cities_df['Population'].idxmax()]

print(f'\nTotal Population: {total_population:,}')
print(f'Average Population: {average_population:,.0f}')
print(f"City with Maximum Population: {max_population_city['City']} ({max_population_city['Population']:,})")

## Unit Tests for Basic Data Analysis

It's good practice to test your data analysis steps to ensure correctness. Below are some example test cases.
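
Besides plain `assert` statements, Pandas ships helpers in `pandas.testing` for comparing whole Series or DataFrames. The sketch below is one possible use, with an invented `scores` dataset (an assumption, not part of the mini-project).

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame({
    'Team': ['red', 'blue', 'red'],
    'Points': [3, 5, 4],
})

# Result under test: total points per team.
totals = scores.groupby('Team')['Points'].sum()

# Expected result, written out by hand (groupby sorts the group keys).
expected = pd.Series(
    [5, 7],
    index=pd.Index(['blue', 'red'], name='Team'),
    name='Points',
)

# Raises AssertionError with a readable diff if values, index, or name differ.
pd.testing.assert_series_equal(totals, expected)
print('Grouped totals match the expected Series.')
```

Comparing the whole object at once catches mismatches in labels and data types that a single-value check can miss.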

In [None]:
import pandas as pd
import numpy as np

# Test data
test_city_data = {
    'City': ['A', 'B', 'C'],
    'Population': [100000, 200000, 50000],
    'State': ['X', 'Y', 'X']
}
test_df = pd.DataFrame(test_city_data)

print('--- Running Basic Data Analysis Unit Tests ---\n')

# Test 1: Total Population
expected_total_pop = 350000
calculated_total_pop = test_df['Population'].sum()
assert calculated_total_pop == expected_total_pop, f'Test 1 Failed: Expected total {expected_total_pop}, got {calculated_total_pop}'
print('Test 1 Passed: Total Population.')

# Test 2: Average Population
expected_avg_pop = 350000 / 3 # Approx 116666.666
calculated_avg_pop = test_df['Population'].mean()
assert np.isclose(calculated_avg_pop, expected_avg_pop), f'Test 2 Failed: Expected avg {expected_avg_pop}, got {calculated_avg_pop}'
print('Test 2 Passed: Average Population.')

# Test 3: City with Max Population
expected_max_city_name = 'B'
calculated_max_city = test_df.loc[test_df['Population'].idxmax()]['City']
assert calculated_max_city == expected_max_city_name, f'Test 3 Failed: Expected max city {expected_max_city_name}, got {calculated_max_city}'
print('Test 3 Passed: City with Max Population.')

print('\n--- All Basic Data Analysis Unit Tests Passed! ---')

## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Basic Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Basic Data Analysis mini-project
# import pandas as pd

# city_data_solution = {
#     'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
#     'Population': [8419000, 3980000, 2705000, 2320000, 1680000],
#     'State': ['NY', 'CA', 'IL', 'TX', 'AZ']
# }
# cities_df_solution = pd.DataFrame(city_data_solution)

# # Display the first few rows
# print('First few rows of the DataFrame:')
# print(cities_df_solution.head())

# # Print basic information
# print('\nBasic information:')
# cities_df_solution.info()

# # Calculate and print statistics
# total_population_solution = cities_df_solution['Population'].sum()
# average_population_solution = cities_df_solution['Population'].mean()
# max_population_city_solution = cities_df_solution.loc[cities_df_solution['Population'].idxmax()]

# print(f'\nTotal Population: {total_population_solution:,}')
# print(f'Average Population: {average_population_solution:,.0f}')
# print(f"City with Maximum Population: {max_population_city_solution['City']} ({max_population_city_solution['Population']:,})")

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access elements by index and label, and perform basic arithmetic operations.

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.
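
Before attempting the steps above, it may help to see the difference between label-based and position-based access. The sketch below uses invented `readings` data (an assumption for illustration) rather than the exercise values.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
readings = pd.Series([1.2, 3.4, 5.6], index=['x', 'y', 'z'])

by_label = readings.loc['y']      # label-based lookup
by_position = readings.iloc[1]    # position-based lookup (0-indexed)

# Arithmetic is applied element-wise and the labels are preserved.
doubled = readings * 2

print(by_label, by_position)
print(doubled)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity about whether a key is a label or a position.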

In [None]:
import pandas as pd

# 1. Create a Pandas Series
data = [10, 20, 30, 40, 50]
index_labels = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index_labels)
print('Created Series:\n', s)

# 2. Access the element with label 'c'
print("\nElement with label 'c':", s['c'])

# 3. Add 5 to every element
s_plus_5 = s + 5
print('\nSeries after adding 5:\n', s_plus_5)

#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or a SQL table, or a dictionary of Series objects.

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.
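
The "dictionary of Series objects" view can be sketched as follows. The names and measurements are invented for illustration; the point is that Pandas aligns the Series on their labels when building the DataFrame.

```python
import pandas as pd

# Two hypothetical Series whose labels only partly overlap.
heights = pd.Series({'Ann': 165, 'Ben': 180, 'Cal': 172})
weights = pd.Series({'Ben': 75, 'Cal': 68, 'Dee': 59})

# Each Series becomes a column; rows are matched by label, not by position.
people = pd.DataFrame({'Height': heights, 'Weight': weights})
print(people)
```

Labels that appear in only one Series get `NaN` in the other column, so no values are silently paired with the wrong row.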

In [None]:
import pandas as pd

# 1. Create a DataFrame from a dictionary
data_dict = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']}
df_from_dict = pd.DataFrame(data_dict)
print('DataFrame from dictionary:\n', df_from_dict)

# 2. Create another DataFrame from a list of dictionaries
data_list_of_dicts = [
    {'Name': 'Diana', 'Age': 28, 'City': 'Houston'},
    {'Name': 'Eve', 'Age': 22, 'City': 'Miami'}
]
df_from_list_of_dicts = pd.DataFrame(data_list_of_dicts)
print('\nDataFrame from list of dictionaries:\n', df_from_list_of_dicts)


#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with sample values (e.g., `['New York', 'Los Angeles', 'Chicago']`) and print the updated DataFrame.

In [None]:
# Your code for Exercise 3 here


## Mini-Project: Sales Data Analysis
**Task:** You are a junior data analyst. Your manager has provided you with a simulated dataset of product sales and wants you to perform a basic analysis using Pandas.


**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. If you compute it via `Price per Unit = Sales / Units Sold`, guard against division by zero by setting `Revenue` to `Sales` wherever `Units Sold` is 0 or NaN. For simplicity, you may instead use `Sales` directly as `Revenue`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.
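
The division-by-zero guard mentioned in step 2 could be sketched as below. The `orders` data is invented for illustration, and replacing zeros with `NaN` before dividing is one possible approach rather than the required one.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including a zero and a missing value in 'Units Sold'.
orders = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Monitor'],
    'Sales': [900, 120, 450],
    'Units Sold': [3, 0, np.nan],
})

# Replace 0 with NaN so the division yields NaN instead of inf.
units = orders['Units Sold'].replace(0, np.nan)
orders['Price per Unit'] = orders['Sales'] / units

# Where units are missing or zero, fall back to using Sales as Revenue.
orders['Revenue'] = np.where(
    units.isna(),
    orders['Sales'],
    orders['Price per Unit'] * units,
)
print(orders)
```

Because price times units reproduces `Sales`, the simplified solution that copies `Sales` into `Revenue` gives the same totals.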


In [None]:
import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_sales = pd.DataFrame(data)
print("Initial Sales DataFrame:\n" + str(df_sales.head()))

# 2. Calculate Total Revenue (simplified as per instructions)
df_sales['Revenue'] = df_sales['Sales']
print("\nDataFrame with Revenue:\n" + str(df_sales.head()))

# 3. Group by Product
product_summary = df_sales.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_sales.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))


## Unit Tests for Sales Data Analysis

It's good practice to test your code with various inputs to ensure it works correctly. Below are some example test cases for your Sales Data Analysis. Run them and verify the output.

In [None]:
import pandas as pd
import numpy as np

# Helper function to run the analysis for testing
def run_sales_analysis(df_input):
    # Calculate Total Revenue
    df_input['Revenue'] = df_input['Sales']
    
    # Group by Product
    product_summary = df_input.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
    
    # Group by Region
    region_summary = df_input.groupby('Region')['Revenue'].sum().reset_index()
    
    # Find Top Selling Product (by Revenue)
    top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
    
    return df_input, product_summary, region_summary, top_product

# Test Cases
print('--- Running Sales Data Analysis Unit Tests ---')

# Test Case 1: Basic data
test_data_1 = {
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Keyboard'],
    'Region': ['East', 'East', 'West', 'North'],
    'Sales': [1000, 100, 1200, 300],
    'Units Sold': [5, 10, 6, 3]
}
df_test_1 = pd.DataFrame(test_data_1)
df_res_1, prod_res_1, reg_res_1, top_res_1 = run_sales_analysis(df_test_1.copy())
assert prod_res_1[prod_res_1['Product'] == 'Laptop']['Revenue'].iloc[0] == 2200, 'Test Case 1 Failed: Laptop revenue incorrect'
assert reg_res_1[reg_res_1['Region'] == 'East']['Revenue'].iloc[0] == 1100, 'Test Case 1 Failed: East region revenue incorrect'
assert top_res_1['Product'] == 'Laptop', 'Test Case 1 Failed: Top product incorrect'
print('Test Case 1 Passed: Basic data analysis is correct.')

# Test Case 2: All same product
test_data_2 = {
    'Product': ['Monitor', 'Monitor', 'Monitor'],
    'Region': ['South', 'North', 'South'],
    'Sales': [500, 700, 600],
    'Units Sold': [2, 3, 2]
}
df_test_2 = pd.DataFrame(test_data_2)
df_res_2, prod_res_2, reg_res_2, top_res_2 = run_sales_analysis(df_test_2.copy())
assert prod_res_2.shape[0] == 1, 'Test Case 2 Failed: Product summary row count incorrect'
assert prod_res_2['Product'].iloc[0] == 'Monitor', 'Test Case 2 Failed: Product name incorrect'
assert top_res_2['Product'] == 'Monitor', 'Test Case 2 Failed: Top product incorrect'
print('Test Case 2 Passed: All same product analysis is correct.')

print('\nAll Unit Tests Completed.')


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))


## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to solidify your understanding of Pandas Series and DataFrames. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access its elements by position or by label, and perform element-wise arithmetic on it.
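
To illustrate label-based access, position-based access, and element-wise arithmetic, here is a minimal sketch. The city temperatures and the names `temps` and `warmer` are made up for this example and deliberately differ from the exercise data.

```python
import pandas as pd

# Hypothetical data: temperatures in degrees Celsius, labeled by city
temps = pd.Series([18.5, 22.0, 15.5], index=['Oslo', 'Rome', 'Bonn'])

by_label = temps['Rome']       # label-based access
by_position = temps.iloc[0]    # position-based access
warmer = temps + 2             # arithmetic is applied to every element

print(by_label)
print(by_position)
print(warmer)
```

Using `.iloc` for positions keeps the intent explicit, which helps when the labels themselves are integers.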

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.

In [None]:
# Your code for Exercise 1 here


#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure whose columns can have different types. You can think of it as a spreadsheet, a SQL table, or a dictionary of Series objects that share one index.
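
The "dictionary of Series" view can be made concrete with a small sketch. The fruit data and the names `price`, `stock`, and `inventory` are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical data: two Series that share the same index
price = pd.Series([1.2, 0.5, 3.0], index=['apple', 'banana', 'cherry'])
stock = pd.Series([10, 25, 7], index=['apple', 'banana', 'cherry'])

# Each dictionary key becomes a column; the shared index becomes the row labels
inventory = pd.DataFrame({'price': price, 'stock': stock})

print(inventory)
print(inventory.dtypes)           # each column keeps its own data type
print(type(inventory['price']))   # selecting one column gives back a Series
```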

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.

In [None]:
# Your code for Exercise 2 here


#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.
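
As a rough sketch of what selecting, filtering, and adding a column look like, consider the example below. The `books` data is hypothetical and unrelated to the exercise.

```python
import pandas as pd

# Hypothetical data, unrelated to the exercise
books = pd.DataFrame({
    'Title': ['Dune', 'Emma', 'Ulysses'],
    'Pages': [412, 474, 730],
})

titles = books['Title']                    # one column, returned as a Series
long_books = books[books['Pages'] > 450]   # boolean mask keeps matching rows
books['Read'] = [True, False, False]       # new column: one value per row

print(titles)
print(long_books)
print(books)
```

A list assigned as a new column needs exactly one value per row; a length mismatch raises a `ValueError`.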

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with one sample value per row (e.g., `['New York', 'Los Angeles', 'Chicago']` for a three-row DataFrame) and print the updated DataFrame. The list must have as many values as the DataFrame has rows.

In [None]:
# Your code for Exercise 3 here


## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has provided you with a simulated dataset of product sales and wants you to perform a basic analysis using Pandas.

**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. If you derive it from `Price per Unit = Sales / Units Sold`, guard against division by zero by setting `Revenue` to `Sales` wherever `Units Sold` is 0 or NaN. For simplicity, you may instead use `Sales` directly as `Revenue`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.
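
Step 2 mentions guarding against division by zero. One possible approach, sketched here on a tiny hand-made DataFrame rather than the random data the task asks for, is to replace zero unit counts with NaN before dividing and then fall back to `Sales`. The name `sales_df` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the normal, zero, and missing cases
sales_df = pd.DataFrame({
    'Sales': [200, 300, 500],
    'Units Sold': [4, 0, np.nan],
})

# Replace 0 with NaN so the division yields NaN instead of inf
units = sales_df['Units Sold'].replace(0, np.nan)
sales_df['Price per Unit'] = sales_df['Sales'] / units

# Revenue = price * units where defined; otherwise fall back to Sales
sales_df['Revenue'] = (sales_df['Price per Unit'] * units).fillna(sales_df['Sales'])

print(sales_df)
```

Because price times units simply reproduces `Sales`, the shortcut of copying `Sales` into `Revenue` gives the same numbers for this data.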

In [None]:
# Your Sales Data Analysis solution here


## Unit Tests for Sales Data Analysis

It's good practice to test your code with various inputs to ensure it works correctly. Use the cell below to write a few test cases for your Sales Data Analysis, then run them and verify the output.

In [None]:
# Your Unit Tests for Sales Data Analysis here
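
As a starting point, the sketch below checks the grouping and top-product steps against a small hand-built DataFrame whose totals are easy to verify by hand. The data and the helper name `summarize_by_product` are hypothetical; adapt the checks to your own solution.

```python
import pandas as pd

def summarize_by_product(df):
    """Total Revenue and Units Sold per product (hypothetical helper)."""
    return (df.groupby('Product')
              .agg({'Revenue': 'sum', 'Units Sold': 'sum'})
              .reset_index())

# Small fixed dataset with known totals
test_df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Mouse'],
    'Revenue': [900, 100, 600, 150],
    'Units Sold': [2, 5, 1, 10],
})

summary = summarize_by_product(test_df)
totals = summary.set_index('Product')

# Laptop revenue: 900 + 600; Mouse units: 5 + 10
assert totals.loc['Laptop', 'Revenue'] == 1500
assert totals.loc['Mouse', 'Units Sold'] == 15

# The product with the largest summed Revenue should be Laptop
top = summary.loc[summary['Revenue'].idxmax(), 'Product']
assert top == 'Laptop'
print("All checks passed.")
```

Fixed input data makes the expected totals predictable, which random data without a seed would not.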


## Hints/Solution (Optional, Expand to View)

This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))


## Interactive Lab: Introduction to Pandas

This section provides hands-on exercises to solidify your understanding of Pandas Series and DataFrames. Experiment with the code cells and modify them to test different scenarios.

#### Exercise 1: Pandas Series Basics

A Pandas Series is a one-dimensional labeled array capable of holding any data type. You can create a Series, access elements by index and label, and perform basic arithmetic operations.
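
To make these operations concrete before you try them yourself, here is a small sketch that uses different data from the exercise. The fruit names and prices are made-up values for illustration only.

```python
import pandas as pd

# Hypothetical data: prices labeled by fruit name.
prices = pd.Series([1.5, 0.25, 3.0], index=["apple", "banana", "cherry"])

by_label = prices["banana"]    # access by label
by_position = prices.iloc[0]   # access by integer position
doubled = prices * 2           # arithmetic applies to every element

print(by_label, by_position)
print(doubled)
```

Note that arithmetic returns a new Series and keeps the original labels, so `doubled` is still indexed by fruit name.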

**Try It Yourself:**
1. Create a Pandas Series from a list of numbers `[10, 20, 30, 40, 50]` and label its indices as `['a', 'b', 'c', 'd', 'e']`.
2. Access the element with label 'c'.
3. Add 5 to every element in the Series and print the result.

In [None]:
# Your code for Exercise 1 here


#### Exercise 2: DataFrame Creation

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It's like a spreadsheet or a SQL table, or a dictionary of Series objects.
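
The sketch below shows two common ways to build a DataFrame, using made-up stationery data rather than the exercise's values. The column names `Item` and `Qty` are assumptions for illustration.

```python
import pandas as pd

# From a dictionary of lists: each key becomes a column.
df_from_dict = pd.DataFrame({
    "Item": ["pen", "notebook"],
    "Qty": [3, 1],
})

# From a list of dictionaries: each dictionary becomes a row.
df_from_rows = pd.DataFrame([
    {"Item": "stapler", "Qty": 2},
    {"Item": "eraser"},  # missing key -> NaN in that cell
])

print(df_from_dict)
print(df_from_rows)
```

The second form is convenient when records arrive one at a time, and it tolerates missing keys by filling the gap with NaN.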

**Try It Yourself:**
1. Create a DataFrame from a dictionary where keys are column names and values are lists (e.g., `{'Name': ['Alice', 'Bob'], 'Age': [25, 30]}`).
2. Create another DataFrame from a list of dictionaries (e.g., `[{'Name': 'Charlie', 'Age': 35}, {'Name': 'Diana', 'Age': 28}]`).
3. Print both DataFrames.

In [None]:
# Your code for Exercise 2 here


#### Exercise 3: Basic DataFrame Operations

DataFrames offer powerful ways to select and filter data.
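
As a quick illustration, the sketch below selects a column, filters rows with a boolean mask, and adds a derived column. The `inventory` table and its values are hypothetical and unrelated to the exercise data.

```python
import pandas as pd

# Hypothetical inventory table used only for illustration.
inventory = pd.DataFrame({
    "Item": ["pen", "notebook", "stapler"],
    "Qty": [30, 12, 4],
    "Price": [1.5, 3.5, 8.0],
})

items = inventory["Item"]                      # single column -> Series
low_stock = inventory[inventory["Qty"] < 15]   # boolean mask keeps matching rows
inventory["Value"] = inventory["Qty"] * inventory["Price"]  # new derived column

print(items)
print(low_stock)
print(inventory)
```

The comparison `inventory["Qty"] < 15` produces a Series of True/False values, and indexing with it keeps only the rows marked True.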

**Try It Yourself:**
Using the first DataFrame you created in Exercise 2 (or create a similar one with at least 3 rows and 3 columns):
1. Select and print only the 'Name' column.
2. Filter and print rows where 'Age' is greater than 28.
3. Add a new column named 'City' with sample values (e.g., `['New York', 'Los Angeles', 'Chicago']`) and print the updated DataFrame.

In [None]:
# Your code for Exercise 3 here



## Mini-Project: Sales Data Analysis

**Task:** You are a junior data analyst. Your manager has asked you to build a simulated dataset of product sales and perform a basic analysis on it using Pandas.

**Instructions:**
1.  **Create a DataFrame:** Create a Pandas DataFrame representing sales data. It should have at least the following columns:
    *   `Product` (e.g., 'Laptop', 'Mouse', 'Keyboard', 'Monitor')
    *   `Region` (e.g., 'East', 'West', 'North', 'South')
    *   `Sales` (random integer values between 100 and 1000)
    *   `Units Sold` (random integer values between 1 and 20)
    Ensure you have at least 10 rows of data.
2.  **Calculate Total Revenue:** Add a new column named `Revenue` to the DataFrame. For simplicity, you can use `Sales` as `Revenue`. If you also want a per-unit price, assume `Price per Unit = Sales / Units Sold` and guard against division by zero: when `Units Sold` is 0 or NaN, fall back to `Sales`.
3.  **Group by Product:** Calculate the total `Revenue` and `Units Sold` for each `Product`.
4.  **Group by Region:** Calculate the total `Revenue` for each `Region`.
5.  **Find Top Selling Product (by Revenue):** Identify which product generated the highest total revenue.

In [None]:
# Your Sales Data Analysis solution here


## Hints/Solution (Optional, Expand to View)
This section contains a suggested implementation for the Sales Data Analysis mini-project. Review it if you get stuck or want to compare your approach.

In [None]:
# Suggested solution for Sales Data Analysis
# You can modify the previous code cell for your own solution.
# This is just one way to implement it.

import pandas as pd
import numpy as np

# 1. Create a DataFrame
np.random.seed(42) # for reproducibility
data = {
    'Product': np.random.choice(['Laptop', 'Mouse', 'Keyboard', 'Monitor'], size=15),
    'Region': np.random.choice(['East', 'West', 'North', 'South'], size=15),
    'Sales': np.random.randint(100, 1001, size=15),
    'Units Sold': np.random.randint(1, 21, size=15) # Units Sold will always be >= 1 to avoid division by zero
}
df_solution = pd.DataFrame(data)
print("Initial DataFrame:\n" + str(df_solution.head()))

# 2. Calculate Total Revenue
# Simplified: Revenue is just Sales here as per instruction to simplify
df_solution['Revenue'] = df_solution['Sales']
print("\nDataFrame with Revenue:\n" + str(df_solution.head()))

# 3. Group by Product
product_summary = df_solution.groupby('Product').agg({'Revenue': 'sum', 'Units Sold': 'sum'}).reset_index()
print("\nGrouped by Product:\n" + str(product_summary))

# 4. Group by Region
region_summary = df_solution.groupby('Region')['Revenue'].sum().reset_index()
print("\nGrouped by Region:\n" + str(region_summary))

# 5. Find Top Selling Product (by Revenue)
top_product = product_summary.loc[product_summary['Revenue'].idxmax()]
print("\nTop Selling Product (by Revenue):\n" + str(top_product))
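
The solution above takes the simple route and uses `Sales` as `Revenue`, so it never divides. If you want to try the per-unit price from step 2, one possible way to guard the division is sketched below. The tiny `sales` table is made up so that it contains both a zero and a missing value, and falling back to `Sales` in those rows is an assumption based on the task wording rather than the only reasonable choice.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to include the problem cases: 0 and NaN units.
sales = pd.DataFrame({
    "Sales": [500, 300, 250],
    "Units Sold": [5, 0, np.nan],
})

units = sales["Units Sold"]
valid = units.notna() & (units != 0)

# where() keeps valid unit counts and replaces the rest with NaN,
# so the division never sees a zero denominator.
price = sales["Sales"] / units.where(valid)

# Rows that could not be divided fall back to the Sales value.
sales["Price per Unit"] = price.fillna(sales["Sales"])

print(sales)
```

In the suggested solution this guard is not strictly needed, because `Units Sold` is generated to be at least 1.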

