# Pandas Power: Unlocking Data Analysis Skills
**A Step-by-Step Guide to Mastering Data with Python and Pandas**

## Learning Objectives
- Use the pandas library to manipulate and analyse data.
- Understand key pandas structures like Series and DataFrames.
- Apply ethical and effective strategies for using AI tools in data analysis.

## Introduction
Pandas makes it easy to work with data in Python. In this notebook, you'll load, clean, and analyse a sample dataset, which is good preparation for real-world business scenarios.


## Using AI Tools Ethically & Effectively
- **Transparency:** Note if AI helped you write or understand your code.
- **Critical Evaluation:** Always double-check AI suggestions before using them.
- **Learning Partner:** Ask AI for help with explanations, not just answers.


## Key Concepts
- **Series**: A one-dimensional labeled array.
- **DataFrame**: A two-dimensional table with labeled rows and columns. Each column is itself a Series.
- **Cleaning Data**: Removing missing values and outliers so they do not distort summary statistics such as the mean.
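The sketch below makes the first two definitions concrete by building a Series and a DataFrame by hand. The product names and numbers are invented for illustration and are not taken from `sales_data.csv`.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels
prices = pd.Series([2.5, 4.0, 3.25], index=['pen', 'notebook', 'stapler'])
print(prices['notebook'])  # look up a value by its label

# A DataFrame: a 2D table; each column is itself a Series
items = pd.DataFrame({
    'product': ['pen', 'notebook', 'stapler'],
    'price': [2.5, 4.0, 3.25],
    'quantity': [10, 3, 2],
})
print(items.shape)           # (rows, columns)
print(type(items['price']))  # selecting one column returns a Series
```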


## Activity 1: Exploring Data with Pandas

You are provided with a dataset named `sales_data.csv`. Let's load it and start exploring.


```python
import pandas as pd

# Load the dataset
df = pd.read_csv('sales_data.csv')

# Show the first few rows
df.head()
```
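Before you compute anything, it helps to check the shape, column types, and missing values of the dataset. This sketch does that on a small in-memory stand-in for `sales_data.csv`. The `date` and `sales` columns match the ones this notebook uses, but the `product` column and all values are hypothetical, and the real file may differ.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales_data.csv; the real file's columns may differ
csv_text = """date,product,sales
2024-01-05,pen,120.0
2024-01-19,notebook,
2024-02-02,pen,95.5
2024-02-15,stapler,3000.0
2024-03-01,notebook,110.0
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # number of rows and columns
print(sample.dtypes)        # data type pandas inferred for each column
print(sample.isna().sum())  # count of missing values per column (the empty field becomes NaN)
sample.info()               # column names, non-null counts and dtypes
```

On the real data, running the same checks on `df` shows you which columns need cleaning before you calculate totals.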

```python
# Total sales
total_sales = df['sales'].sum()
total_sales
```

```python
# Average sales per row (record), ignoring missing values
avg_sales = df['sales'].mean()
avg_sales
```
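By default, both `sum()` and `mean()` skip missing values (`skipna=True`), so the average is taken only over rows that have a sales figure. The sketch below uses a small made-up Series to show how this differs from dividing the total by the number of rows.

```python
import numpy as np
import pandas as pd

# Made-up sales figures with one missing value
sales = pd.Series([100.0, np.nan, 200.0, 300.0])

total = sales.sum()                      # NaN is skipped: 100 + 200 + 300
total_strict = sales.sum(skipna=False)   # NaN if any value is missing
non_missing = sales.count()              # count() ignores NaN; len() does not
rows = len(sales)

mean_skipna = sales.mean()     # total / non_missing
mean_over_rows = total / rows  # treats the missing value as if it were 0

print(total, non_missing, rows)
print(mean_skipna, mean_over_rows)
```

The two averages disagree whenever data is missing. This is one reason to check for missing values before you report a figure.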

## Activity 2: Data Cleanup and Analysis

Now let's clean the dataset by removing rows with missing values and filtering out outliers.


```python
# Remove rows that have a missing value in any column
df_cleaned = df.dropna()

# Describe cleaned data
df_cleaned.describe()
```
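With no arguments, `dropna()` removes a row if *any* of its columns is missing. That can throw away rows whose `sales` value is perfectly usable. When only one column matters for the analysis, the `subset` parameter limits the check to that column. The values below are invented, and `region` is a hypothetical column used only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is missing 'sales', row 2 is missing only 'region'
data = pd.DataFrame({
    'sales': [120.0, np.nan, 95.5, 110.0],
    'region': ['North', 'South', None, 'East'],
})

any_missing_dropped = data.dropna()                    # drops rows 1 and 2
sales_missing_dropped = data.dropna(subset=['sales'])  # drops only row 1

print(len(data), len(any_missing_dropped), len(sales_missing_dropped))
```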

```python
# Detect outliers using the interquartile range (IQR)
Q1 = df_cleaned['sales'].quantile(0.25)
Q3 = df_cleaned['sales'].quantile(0.75)
IQR = Q3 - Q1

# Values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# Keep only rows within the bounds
filtered_df = df_cleaned[(df_cleaned['sales'] >= lower) & (df_cleaned['sales'] <= upper)]
filtered_df.describe()
```
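The IQR rule keeps values between `Q1 - 1.5 * IQR` and `Q3 + 1.5 * IQR`. You can wrap it in a small reusable function so the same filter works on any column. `iqr_filter` is a hypothetical helper written for this sketch and is not part of pandas. It uses `Series.between`, which includes both ends by default and so matches the `>=` and `<=` comparisons above.

```python
import pandas as pd

def iqr_filter(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive at both ends by default, matching >= and <=
    return frame[frame[column].between(lower, upper)]

# Made-up sales where 3000 is far above the rest
demo = pd.DataFrame({'sales': [100.0, 110.0, 95.0, 105.0, 120.0, 3000.0]})
kept = iqr_filter(demo, 'sales')
print(kept['sales'].tolist())
```

Raising `k` (for example to 3) gives a more lenient filter that removes only extreme values.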

## Extension Task: Group and Analyse Sales by Month

Group the data by month and summarise total sales to see which months perform best.


```python
# Assumes the dataset has a 'date' column
df['date'] = pd.to_datetime(df['date'])

# Month number (1-12); the same month in different years is combined
df['Month'] = df['date'].dt.month

monthly_sales = df.groupby('Month')['sales'].sum()
monthly_sales
```
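Grouping by `dt.month` adds January 2023 and January 2024 together. This is fine for seasonal patterns, but it can be misleading if the data covers more than one year. The sketch below uses a small made-up DataFrame to compare that approach with `dt.to_period('M')`, which keeps each year-month separate. It then ranks the months with `sort_values` and picks the best one with `idxmax`.

```python
import pandas as pd

# Made-up sales spanning two years
sales = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-10', '2023-02-14', '2024-01-08', '2024-01-22', '2024-02-03']),
    'sales': [100.0, 80.0, 150.0, 50.0, 90.0],
})

# dt.month: January 2023 and January 2024 fall into the same group (1)
by_month_number = sales.groupby(sales['date'].dt.month)['sales'].sum()

# dt.to_period('M'): each year-month is its own group
by_year_month = sales.groupby(sales['date'].dt.to_period('M'))['sales'].sum()

# Rank months from best to worst and pick the top one
ranked = by_year_month.sort_values(ascending=False)
best_month = ranked.idxmax()
print(ranked)
print('Best month:', best_month)
```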

## Reflection

- What did you learn about cleaning and analysing data?
- How did removing outliers affect the summary statistics?
- If you used AI to help write code, how did you check its accuracy?
