# Data Analysis with Pandas

In this section, you will learn how to work with tabular data using `pandas`:
- **DataFrames**: Creating and manipulating data tables.
- **Filtering & Selection**: Extracting exactly the data you need from large datasets.
- **Statistics**: Calculating summaries such as averages, sums, and distributions.

Pandas is one of the most widely used Python libraries for data analysis. Think of it as "Excel on steroids":
it lets you work efficiently with structured (tabular) data.

## Learning Objectives

- Build `DataFrame` objects from dictionaries or files and inspect their shape and summary stats.
- Select, filter, and transform columns/rows to answer data questions.
- Read and write common formats (CSV, JSON, Excel) with pandas.

## 1. Getting Started

First, we need to import pandas. It is usually imported as `pd`.

In [None]:
import pandas as pd

## 2. Creating a DataFrame

The **DataFrame** is the heart of Pandas.
Think of it like an Excel sheet or a SQL table:
- It has **Rows**, labelled by the **index**, and **Columns**, labelled by column names.
- Each column holds one type of data, but different columns can hold different types (numbers, text, dates).

You can create one from scratch using a Python dictionary, or load one from a file.

In [None]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 22],
    "Department": ["Legal", "IT", "Legal", "HR", "Economics"],
    "Salary": [60000, 70000, 80000, 55000, 50000]
}

df = pd.DataFrame(data)
df
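The dictionary above is column-oriented (one list per column). A minimal sketch of two other common starting points follows, assuming the same column names: a row-oriented list of dictionaries, and reading CSV data. To keep the example self-contained, `io.StringIO` stands in for a real CSV file; the names `records` and `csv_text` are illustrative only.

```python
import io

import pandas as pd

# Row-oriented input: one dictionary per row; the keys become column names
records = [
    {"Name": "Alice", "Age": 25, "Department": "Legal", "Salary": 60000},
    {"Name": "Bob", "Age": 30, "Department": "IT", "Salary": 70000},
]
df_rows = pd.DataFrame(records)

# Loading from a file: io.StringIO wraps a string so read_csv can treat it like a file
csv_text = """Name,Age,Department,Salary
Alice,25,Legal,60000
Bob,30,IT,70000
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_rows)
print(df_csv)
```

With a real file you would pass the path instead, e.g. `pd.read_csv("data.csv")`.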

## 3. Inspecting Data

When you load a new dataset, the first thing you should do is "meet" the data.
- **`.head()`**: Shows the first 5 rows by default. Great for a quick peek.
- **`.info()`**: Shows each column's data type and non-null count, which reveals missing values.
- **`.describe()`**: Shows summary statistics (count, mean, min, max, and more) for numeric columns.
- **`.shape`**: An attribute (no parentheses) that tells you how big the data is, as a `(rows, columns)` tuple.

In [None]:
# Show the first 3 rows
df.head(3)

In [None]:
# Show summary statistics (count, mean, min, max, etc.)
df.describe()
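The cells above cover `.head()` and `.describe()`. A short sketch of the remaining tools, plus the single-column averages, sums, and distributions mentioned in the introduction, is shown below; it rebuilds the same `df` from Section 2 so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 22],
    "Department": ["Legal", "IT", "Legal", "HR", "Economics"],
    "Salary": [60000, 70000, 80000, 55000, 50000],
})

# .shape is an attribute holding a (rows, columns) tuple
print(df.shape)  # (5, 4)

# .info() prints dtypes and non-null counts per column; it returns None
df.info()

# Statistics for a single column
print(df["Salary"].mean())  # 63000.0
print(df["Salary"].sum())   # 315000

# value_counts() shows how often each value occurs (a simple distribution)
print(df["Department"].value_counts())
```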

## 4. Selecting Data

You often only want to look at specific parts of your table.
- **Column Selection**: `df["ColumnName"]` returns just that one column (as a `Series`).
- **Row Selection**:
    - **`.loc[]`**: Select by **Label** (e.g., the row whose index label is `5`).
    - **`.iloc[]`**: Select by **Position** (e.g., `df.iloc[4]` is the 5th row, because positions start at 0).

In [None]:
# Select a single column
print(df["Name"])

# Select multiple columns
print(df[["Name", "Department"]])
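With the default index (0, 1, 2, ...), labels and positions coincide, so `.loc` and `.iloc` look interchangeable. A minimal sketch that makes the difference visible by using `Name` as the index (the variable `by_name` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 22],
    "Department": ["Legal", "IT", "Legal", "HR", "Economics"],
    "Salary": [60000, 70000, 80000, 55000, 50000],
})

# Default RangeIndex: label 0 is also position 0
print(df.loc[0, "Name"])  # Alice
print(df.iloc[0, 0])      # Alice (row 0, column 0)

# Use Name as the index so labels differ from positions
by_name = df.set_index("Name")  # remaining columns: Age, Department, Salary

print(by_name.loc["Charlie", "Salary"])  # 80000 (row label "Charlie")
print(by_name.iloc[2, 2])                # 80000 (3rd row, 3rd column)

# Slicing: .loc includes the end label, .iloc excludes the end position
print(by_name.loc["Bob":"David", "Age"])  # Bob, Charlie, David
print(by_name.iloc[1:3, 0])               # Bob, Charlie
```

The inclusive `.loc` slice versus the exclusive `.iloc` slice is a common source of off-by-one surprises.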

## 5. Filtering Data

Filtering is about asking questions. "Show me only the rows where..."
- `df[df["Age"] > 30]`: "Show me rows where Age is greater than 30."
- `df[df["Department"] == "IT"]`: "Show me rows where Department is IT."

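You can combine conditions with `&` (AND) and `|` (OR). Wrap each condition in parentheses, e.g. `df[(df["Age"] > 30) & (df["Department"] == "Legal")]`, because `&` and `|` bind more tightly than comparisons such as `>` and `==`.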

In [None]:
# Find all employees in the Legal department
legal_team = df[df["Department"] == "Legal"]
legal_team

In [None]:
# Find employees with salary > 60000
high_earners = df[df["Salary"] > 60000]
high_earners
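A sketch of combined conditions on the same data follows. Besides `&` and `|`, it uses `~` (NOT) and `Series.isin()` as a shorthand for several `==` checks joined by `|`, and ends by sorting a filtered result with `sort_values`. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 22],
    "Department": ["Legal", "IT", "Legal", "HR", "Economics"],
    "Salary": [60000, 70000, 80000, 55000, 50000],
})

# AND: each condition in its own parentheses
legal_over_30 = df[(df["Department"] == "Legal") & (df["Age"] > 30)]
print(legal_over_30)  # only Charlie

# OR: employees in IT or HR
it_or_hr = df[(df["Department"] == "IT") | (df["Department"] == "HR")]

# isin() expresses the same OR condition more compactly
it_or_hr_alt = df[df["Department"].isin(["IT", "HR"])]

# NOT: everyone outside Legal
not_legal = df[~(df["Department"] == "Legal")]

# Filter, then sort the result with the highest salary first
top = df[df["Salary"] > 50000].sort_values("Salary", ascending=False)
print(top)
```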

## 6. Reading and Writing Files

Pandas can read and write Excel, CSV, JSON, and many other formats.

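```python
import pandas as pd

# Reading a CSV file
df = pd.read_csv("data.csv")

# Reading an Excel file (.xlsx needs an engine such as openpyxl installed)
df = pd.read_excel("data.xlsx")

# Reading a JSON file
df = pd.read_json("data.json")

# Saving to CSV and Excel; index=False leaves out the row index column
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)
```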
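The file names above are placeholders. As a self-contained sketch, the round trip below writes the Section 2 data to CSV and JSON in a temporary directory and reads it back; the file names `employees.csv` and `employees.json` are hypothetical. Excel is left out because it depends on an optional engine being installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 22],
    "Department": ["Legal", "IT", "Legal", "HR", "Economics"],
    "Salary": [60000, 70000, 80000, 55000, 50000],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "employees.csv")
    json_path = os.path.join(tmp, "employees.json")

    # CSV: index=False so no extra index column is written
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # JSON: "records" writes a list of {column: value} objects, one per row
    df.to_json(json_path, orient="records")
    from_json = pd.read_json(json_path, orient="records")

print(from_csv)
print(from_json)
```

If you omit `index=False` when saving a CSV, reading it back typically produces an extra column named `Unnamed: 0` holding the old index.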

## Summary
- **Pandas** is essential for tabular data.
- **DataFrame** is the main object.
- You can easily **filter**, **sort**, and **analyze** data.
- It reads and writes **CSV**, **JSON**, and **Excel** files (Excel needs an optional engine such as `openpyxl`).