# Lesson 10 Activity: Working with Pandas

## Learning Objectives

By the end of this activity, you will be able to:
- Create Pandas Series and DataFrames
- Load data from CSV files
- Perform basic data exploration and analysis
- Calculate descriptive statistics
- Filter and manipulate DataFrame data

## Tips

- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names
- **Loading CSV files:** Use `pd.read_csv('filename.csv')`
- **Basic exploration:** Use `.head()`, `.tail()`, `.info()`, `.describe()`, and `.shape`
- **Filtering data:** Use conditions like `df[df['column'] > value]`
- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`
- **Adding columns:** Use `df['new_column'] = calculation`
- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods

**Remember:** Take your time with each step and test your code frequently!
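
The tips above can be seen together in one short sketch. The data here is hypothetical (the `item`, `price`, and `quantity` columns are made up for illustration and are not part of the activity files). Note that `.shape` is an attribute, while `.head()`, `.info()`, and `.describe()` are methods and need parentheses.

```python
import pandas as pd

# Hypothetical data for illustration only; not part of the activity files.
df = pd.DataFrame({
    "item": ["pen", "notebook", "stapler"],
    "price": [1.50, 4.00, 12.00],
    "quantity": [10, 5, 2],
})

# Basic exploration: .shape is an attribute, .head() is a method call.
print(df.shape)
print(df.head(2))

# Column selection: single brackets give a Series, double brackets a DataFrame.
prices = df["price"]
subset = df[["item", "price"]]

# Adding a column from a calculation on existing columns.
df["total"] = df["price"] * df["quantity"]

# Filtering with a boolean condition keeps only the rows where it is True.
expensive = df[df["price"] > 2]

# Statistics on a single column.
print(df["total"].sum())
```

Forgetting the parentheses on a method (for example `df.describe` instead of `df.describe()`) does not raise an error; it prints a "bound method" description instead of the result.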

In [10]:
import numpy as np
import pandas as pd


---
## Problem 1: Creating Your First DataFrame

**Scenario:** You're working at a bookstore and need to create a simple inventory system.

**Your Task:**
1. Create a DataFrame called `books_df` with the following data:
   - Book titles: ["Python Basics", "Data Science Handbook", "Web Development Guide"]
   - Authors: ["John Smith", "Jane Doe", "Mike Johnson"]
   - Prices: [29.99, 45.50, 35.00]
   - Stock: [15, 8, 12]

2. Display the DataFrame
3. Print the shape of the DataFrame
4. Display basic information about the DataFrame using `.info()`

In [3]:
# Step 1: Create the DataFrame
raw_data = {"titles": ["Python Basics", "Data Science Handbook", "Web Development Guide"],
            "authors": ["John Smith", "Jane Doe", "Mike Johnson"],
            "prices": [29.99, 45.50, 35.00],
            "stock": [15, 8, 12]}
books_df = pd.DataFrame(raw_data)

In [4]:
# Step 2: Display the DataFrame
print(books_df)

                  titles       authors  prices  stock
0          Python Basics    John Smith   29.99     15
1  Data Science Handbook      Jane Doe   45.50      8
2  Web Development Guide  Mike Johnson   35.00     12


In [5]:
# Step 3: Print the shape
print(books_df.shape)

(3, 4)


In [6]:
# Step 4: Display info
# .info() is a method, so it needs parentheses; it prints its summary directly
books_df.info()


---
## Problem 2: Loading and Exploring Student Data

**Scenario:** You're a teacher analyzing student performance data.

**Your Task:**
1. Load the `students.csv` file into a DataFrame called `students_df`
2. Display the first 3 rows using `.head()`
3. Display the last 2 rows using `.tail()`
4. Show descriptive statistics for numerical columns using `.describe()`
5. Find the average grade of all students

In [11]:
# Step 1: Load the CSV file
students_df = pd.read_csv("students.csv")

In [12]:
# Step 2: Display first 3 rows
print(students_df.head(3))

      name  age  grade  subject
0    Alice   20     85     Math
1      Bob   19     92  Science
2  Charlie   21     78     Math


In [13]:
# Step 3: Display last 2 rows
print(students_df.tail(2))

    name  age  grade  subject
6  Grace   20     90     Math
7  Henry   21     87  Science


In [14]:
# Step 4: Show descriptive statistics
# .describe() is a method, so it needs parentheses; it returns a table of statistics
print(students_df.describe())


In [15]:
# Step 5: Calculate average grade
print(students_df["grade"].mean())

87.125


---
## Problem 3: Data Filtering and Selection

**Scenario:** Continue working with the student data to find specific information.

**Your Task:**
1. Display only the 'name' and 'grade' columns from `students_df`
2. Find all students who scored above 85
3. Find all students studying 'Math'
4. Find the highest grade in the dataset
5. Count how many students are in each subject

In [18]:
# Step 1: Display only name and grade columns
columns = students_df[["name", "grade"]]
print(columns)

      name  grade
0    Alice     85
1      Bob     92
2  Charlie     78
3    Diana     88
4      Eva     95
5    Frank     82
6    Grace     90
7    Henry     87


In [19]:
# Step 2: Students with grades above 85
print(students_df[students_df["grade"] > 85])

    name  age  grade  subject
1    Bob   19     92  Science
3  Diana   20     88  Science
4    Eva   19     95     Math
6  Grace   20     90     Math
7  Henry   21     87  Science


In [20]:
# Step 3: Students studying Math
print(students_df[students_df["subject"] == "Math"])

      name  age  grade subject
0    Alice   20     85    Math
2  Charlie   21     78    Math
4      Eva   19     95    Math
6    Grace   20     90    Math


In [22]:
# Step 4: Highest grade
print(students_df["grade"].max())

95


In [38]:
# Step 5: Count students by subject
print(students_df["subject"].value_counts())


subject
Math       4
Science    4
Name: count, dtype: int64


---
## Problem 4: Sales Data Analysis

**Scenario:** You're analyzing sales data for an electronics store.

**Your Task:**
1. Load the `sales.csv` file into a DataFrame called `sales_df`
2. Calculate the total value for each product (price × quantity)
3. Add this as a new column called 'total_value' to the DataFrame
4. Find the product with the highest total value
5. Calculate the grand total of all sales
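
One possible approach to these steps is sketched below. The contents of `sales.csv` are not shown in this activity, so the sketch builds a small stand-in DataFrame; the column names `product`, `price`, and `quantity` and all values are assumptions and may not match the real file.

```python
import pandas as pd

# Stand-in for pd.read_csv("sales.csv"); column names and values are assumed.
sales_df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Monitor"],
    "price": [800.0, 20.0, 150.0],
    "quantity": [2, 10, 4],
})

# Steps 2 and 3: multiply the columns element-wise and store the result.
sales_df["total_value"] = sales_df["price"] * sales_df["quantity"]

# Step 4: idxmax() returns the index label of the largest value;
# .loc then retrieves that whole row.
top_row = sales_df.loc[sales_df["total_value"].idxmax()]
print(top_row["product"])

# Step 5: grand total across all rows.
grand_total = sales_df["total_value"].sum()
print(grand_total)
```

With the real file, the first statement would be replaced by the `pd.read_csv` call, and the column names adjusted to whatever the file actually contains.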

In [None]:
# Step 1: Load the sales data


In [None]:
# Step 2 & 3: Calculate total value and add as new column


In [None]:
# Step 4: Find product with highest total value


In [None]:
# Step 5: Calculate grand total of all sales


---
## Problem 5: Series Creation and Manipulation

**Scenario:** Create and work with Pandas Series for daily temperature data.

**Your Task:**
1. Create a Pandas Series called `temperatures` with the following data:
   - Values: [22, 25, 23, 26, 24, 27, 25]
   - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
2. Find the temperature for Wednesday
3. Find days with temperature above 24 degrees
4. Calculate the average temperature for the week
5. Find the day with the highest temperature
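
A minimal sketch of how these steps might look, using the values and index given in the task. The variable names other than `temperatures` are illustrative choices, not required by the activity.

```python
import pandas as pd

# Step 1: a Series with day names as the index labels.
temperatures = pd.Series(
    [22, 25, 23, 26, 24, 27, 25],
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# Step 2: look up a value by its index label.
wednesday = temperatures["Wed"]

# Step 3: a boolean condition keeps only the matching days.
warm_days = temperatures[temperatures > 24]

# Step 4: average over the week.
average = temperatures.mean()

# Step 5: idxmax() returns the label (day) of the largest value,
# whereas max() would return the temperature itself.
hottest_day = temperatures.idxmax()

print(wednesday)
print(warm_days)
print(average)
print(hottest_day)
```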

In [None]:
# Step 1: Create the temperature series


In [None]:
# Step 2: Temperature for Wednesday


In [None]:
# Step 3: Days with temperature above 24


In [None]:
# Step 4: Average temperature


In [None]:
# Step 5: Day with highest temperature


---
## Reflection Questions

Please answer these questions after completing the activity:

1. **What is the difference between a Pandas Series and a DataFrame?**
   
   *Your answer:*

2. **What are the advantages of using Pandas over working with plain Python lists and dictionaries?**
   
   *Your answer:*

3. **Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**
   
   *Your answer:*

4. **What did you find most challenging about working with Pandas in this activity?**
   
   *Your answer:*