# Lesson 3.1: DataFrames & Series

## What is Pandas?

Pandas is Python's main library for working with tabular data. If NumPy gives you raw arrays, **Pandas is closer to Eloquent**: it gives you named columns, filtering, grouping, and more.

| Concept | Laravel | Pandas |
|---------|---------|--------|
| A table | DB table / Model | **DataFrame** |
| One column | `$users->pluck('name')` | **Series** |
| One row | `$user` (single model) | `df.iloc[0]` |
| All records | `User::all()` | `df` |

In [None]:
import pandas as pd
import numpy as np

## Creating a DataFrame from a Dictionary

This is the most common way: each key becomes a column name, and each list of values becomes that column's data.

In [None]:
# Like creating records in a Laravel seeder
filters_data = {
    'filter_id': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'tds_output': [42, 78, 120, 35, 95],
    'flow_rate': [2.1, 1.5, 0.8, 2.3, 1.2],
    'age_days': [60, 180, 320, 15, 240],
    'status': ['good', 'good', 'needs_repair', 'good', 'degraded']
}

df = pd.DataFrame(filters_data)
df  # In Jupyter, the last expression in a cell is rendered as a nice table
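
The dictionary form relies on every list having the same length, since each list becomes one column. Below is a minimal sketch of that rule; the `good` and `bad` dictionaries are made up for illustration and are not part of the lesson data.

```python
import pandas as pd

# Hypothetical data for illustration only
good = {'filter_id': ['F001', 'F002'], 'tds_output': [42, 78]}
small_df = pd.DataFrame(good)
print(small_df.shape)          # (rows, columns)
print(list(small_df.columns))  # dict keys, in insertion order

# Lists of different lengths cannot be lined up into rows
bad = {'filter_id': ['F001', 'F002'], 'tds_output': [42]}
try:
    pd.DataFrame(bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print(f"ValueError: {exc}")
```

If a seeder-style dictionary fails to load, a length mismatch like this is a likely first thing to check.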

In [None]:
# Key attributes - like checking your database schema
print(f"Shape: {df.shape}")       # (5, 5) = 5 rows, 5 columns
print(f"Columns: {list(df.columns)}")
print(f"\nData types:")
print(df.dtypes)                   # Like checking column types in migration
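
To act on the dtypes rather than just print them, one option is the helpers in `pandas.api.types` together with `select_dtypes`. This is a sketch on a small stand-in frame named `sample` (a hypothetical name, not the lesson's `df`). The exact dtype shown for text columns can differ between pandas versions, so the code only checks whether a column is numeric.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype

# Stand-in frame with one integer, one float, and one text column
sample = pd.DataFrame({
    'tds_output': [42, 78, 120],
    'flow_rate': [2.1, 1.5, 0.8],
    'status': ['good', 'good', 'needs_repair'],
})

# Report each column's dtype and whether pandas treats it as numeric
for column in sample.columns:
    print(column, sample[column].dtype, is_numeric_dtype(sample[column]))

# Keep only the names of the numeric columns
numeric_columns = sample.select_dtypes(include='number').columns.tolist()
print(numeric_columns)
```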

In [None]:
# Quick overview methods
print("=== .info() - like DESCRIBE table in SQL ===")
df.info()

In [None]:
print("=== .describe() - summary statistics for numeric columns ===")
df.describe()
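
By default `.describe()` skips text columns such as `status`. A rough sketch of how to read one value out of the summary, and how to summarize a text column with `value_counts()` instead, using a reduced copy of the filter data named `readings` (a name chosen here for illustration):

```python
import pandas as pd

# Reduced copy of the lesson data: one numeric and one text column
readings = pd.DataFrame({
    'tds_output': [42, 78, 120, 35, 95],
    'status': ['good', 'good', 'needs_repair', 'good', 'degraded'],
})

summary = readings.describe()
print(list(summary.columns))  # only the numeric column appears

# The summary is itself a DataFrame, so .loc[row, column] reads one cell
mean_tds = summary.loc['mean', 'tds_output']
print(f"Mean TDS: {mean_tds:.1f}")

# For a text column, counting occurrences is usually more useful
status_counts = readings['status'].value_counts()
print(status_counts)
```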

In [None]:
# Peeking at data
print("First 3 rows (like ->take(3)):")
print(df.head(3))

print("\nLast 2 rows:")
print(df.tail(2))

print("\nRandom sample:")
print(df.sample(2))
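
`df.sample()` returns different rows on each run. If you want a repeatable sample (for example, in a test), passing `random_state` is one way to get it. A small sketch on a stand-in frame called `df_demo` (a hypothetical name):

```python
import pandas as pd

# Stand-in frame with the same five filter IDs as the lesson
df_demo = pd.DataFrame({
    'filter_id': ['F001', 'F002', 'F003', 'F004', 'F005'],
    'tds_output': [42, 78, 120, 35, 95],
})

# The same seed picks the same rows every time
first = df_demo.sample(2, random_state=42)
second = df_demo.sample(2, random_state=42)
print(first.equals(second))

# head() and tail() keep the original row labels
head_labels = df_demo.head(3).index.tolist()
tail_labels = df_demo.tail(2).index.tolist()
print(head_labels, tail_labels)
```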

## Series - A Single Column

A Series is like `$users->pluck('name')` - one column of data with an index.

In [None]:
# Get a single column → returns a Series
tds = df['tds_output']
print(type(tds))  # pandas.core.series.Series
print(tds)

# A Series supports NumPy-style aggregations and vectorized comparisons
print(f"\nMean TDS: {tds.mean():.1f}")
print(f"Max TDS: {tds.max()}")
print(f"Values above 80: {(tds > 80).sum()}")
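
The expression `(tds > 80).sum()` works because the comparison produces a Series of booleans, and `True` counts as 1 when summed. The sketch below breaks that into steps on a standalone Series built from the same TDS values; `mask` and `high` are names introduced here for illustration.

```python
import pandas as pd

# Same TDS values as the lesson, built directly as a Series
tds = pd.Series([42, 78, 120, 35, 95], name='tds_output')

# Step 1: the comparison is applied element by element
mask = tds > 80
print(mask.tolist())

# Step 2: summing booleans counts the True values
count_high = int(mask.sum())
print(count_high)

# Step 3: the same mask can filter the Series, keeping the original index
high = tds[mask]
print(high)
```

Filtering with a boolean mask plays a role similar to a `where` clause on a collection, and the surviving rows keep their original index labels.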

## Exercise

1. Create a DataFrame representing 5 users (name, email, role, is_active)
2. Check its shape, dtypes, and use describe()
3. Extract just the 'name' column as a Series

In [None]:
# YOUR CODE HERE
# users = pd.DataFrame({...})