# Pandas: Working with Structured Data in Python

This notebook accompanies the Pandas section (placed after NumPy). It introduces the core Pandas data structures and the most common operations with runnable examples.

**Conventions:**
- `import pandas as pd`
- A `Series` is a 1D labeled array
- A `DataFrame` is a 2D labeled table


## 1. Import Pandas

In [1]:
import pandas as pd

## 2. Series

A **Series** is a one-dimensional array with an associated index.

In [2]:
s = pd.Series([10, 20, 30, 40])
s

0    10
1    20
2    30
3    40
dtype: int64

In [3]:
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2

a    10
b    20
c    30
dtype: int64
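
As a small sketch of why the index matters, the example below looks up a value by label and by position, then applies arithmetic to every element at once. The names `prices`, `by_label`, `by_position`, and `doubled` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Illustrative Series with string labels (same data as s2 above)
prices = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = prices["b"]        # label-based lookup
by_position = prices.iloc[1]  # position-based lookup of the same element

# Arithmetic is applied element-wise and the index is preserved
doubled = prices * 2

print(by_label, by_position)
print(doubled)
```

Both lookups return the same element here because label `"b"` sits at position 1.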

## 3. DataFrame

A **DataFrame** is a two-dimensional table consisting of rows and columns.

In [4]:
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
}

df = pd.DataFrame(data)
df

      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
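
Data often arrives row by row rather than column by column. The sketch below, which assumes a hypothetical list of per-row dictionaries named `records`, shows that `pd.DataFrame` accepts that layout as well and produces the same table as the column-oriented dictionary.

```python
import pandas as pd

# Hypothetical row-oriented input: one dict per row
records = [
    {"name": "Alice", "age": 25, "score": 85},
    {"name": "Bob", "age": 30, "score": 90},
    {"name": "Charlie", "age": 35, "score": 95},
]
df_records = pd.DataFrame(records)

# Column-oriented input, as in the cell above
df_columns = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

# equals() compares values, dtypes, and labels
same = df_records.equals(df_columns)
print(same)
```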


## 4. Inspecting a DataFrame

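Before doing analysis, inspect the structure and data types. `head()` and `tail()` return five rows by default, so on this three-row table both return every row.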

In [5]:
df.head()

      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95


In [6]:
df.tail()

      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95


In [7]:
df.columns

Index(['name', 'age', 'score'], dtype='object')

In [8]:
df.shape

(3, 3)

In [9]:
df.dtypes

name     object
age       int64
score     int64
dtype: object
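
The inspection calls above can be narrowed or combined. This is a minimal sketch that passes an explicit row count to `head` and `tail` and uses `info()` for a one-shot report; the names `first_two` and `last_one` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

first_two = df.head(2)  # first two rows
last_one = df.tail(1)   # last row only

print(first_two)
print(last_one)

# info() prints the index range, column dtypes, and non-null counts together
df.info()
```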

## 5. Reading Data from Files (CSV)

If you have a CSV file on disk, you can load it with `pd.read_csv`.

Uncomment the lines below and replace `data.csv` with the path to your own file.

In [10]:
# df_from_csv = pd.read_csv("data.csv")
# df_from_csv.head()
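
To try `pd.read_csv` without a file on disk, one option is to wrap CSV text in `io.StringIO`, which `read_csv` accepts in place of a path. The CSV contents below are made up for illustration and mirror the table used in this notebook.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the contents are illustrative
csv_text = """name,age,score
Alice,25,85
Bob,30,90
Charlie,35,95
"""

df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv.head())
print(df_from_csv.dtypes)  # column types are inferred from the text
```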

## 6. Selecting Columns

Select one column (returns a Series) or multiple columns (returns a DataFrame).

In [11]:
df["age"]

0    25
1    30
2    35
Name: age, dtype: int64

In [12]:
df[["name", "score"]]

      name  score
0    Alice     85
1      Bob     90
2  Charlie     95
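
The return type depends on what goes inside the brackets, not on how many columns come back. The sketch below compares a string key with a one-element list; `one` and `one_df` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

one = df["age"]       # string key: returns a Series
one_df = df[["age"]]  # list key: returns a DataFrame with one column

print(type(one).__name__, type(one_df).__name__)
print(one_df.shape)
```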


## 7. Selecting Rows

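Use `.iloc` for position-based selection and `.loc` for label-based selection. Because `df` has the default integer index, position 0 and label 0 refer to the same row in the examples below.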

In [13]:
df.iloc[0]

name     Alice
age         25
score       85
Name: 0, dtype: object

In [14]:
df.iloc[0:2]

    name  age  score
0  Alice   25     85
1    Bob   30     90


In [15]:
df.loc[0]

name     Alice
age         25
score       85
Name: 0, dtype: object
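
The difference between the two accessors is easier to see once the index is not the default integers. The sketch below assumes the `name` column is used as the index (via `set_index`, a choice made only for this illustration) and also contrasts how the two accessors treat slice endpoints.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

# Use the names as row labels for this illustration
by_name = df.set_index("name")

row_by_position = by_name.iloc[0]    # first row, whatever its label
row_by_label = by_name.loc["Alice"]  # row whose label is "Alice"

# .iloc slices exclude the end position; .loc slices include the end label
pos_slice = by_name.iloc[0:2]
label_slice = by_name.loc["Alice":"Bob"]

print(row_by_label)
print(pos_slice)
print(label_slice)
```

Both slices return the rows for Alice and Bob, even though the positional slice stops before position 2 while the label slice includes `"Bob"`.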

## 8. Filtering Rows by Conditions

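A comparison such as `df["age"] > 30` produces a boolean Series, and indexing with it keeps only the rows where the value is `True`. Combine conditions with `&` (and) or `|` (or), wrapping each condition in parentheses.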

In [16]:
df[df["age"] > 30]

      name  age  score
2  Charlie   35     95


In [17]:
df[(df["age"] > 25) & (df["score"] >= 90)]

      name  age  score
1      Bob   30     90
2  Charlie   35     95
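
A few other mask patterns come up often. The sketch below shows "or" with `|`, negation with `~`, and membership tests with `isin`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

# Either condition may hold
either = df[(df["age"] < 30) | (df["score"] >= 95)]

# ~ inverts a boolean mask
not_bob = df[~(df["name"] == "Bob")]

# isin() tests membership in a list of values
selected = df[df["name"].isin(["Alice", "Bob"])]

print(either)
print(not_bob)
print(selected)
```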


## 9. Basic Statistics and Descriptive Analysis

In [18]:
df.mean(numeric_only=True)

age      30.0
score    90.0
dtype: float64

In [19]:
df.describe()

        age  score
count   3.0    3.0
mean   30.0   90.0
std     5.0    5.0
min    25.0   85.0
25%    27.5   87.5
50%    30.0   90.0
75%    32.5   92.5
max    35.0   95.0


In [20]:
df["score"].mean(), df["score"].max()

(np.float64(90.0), np.int64(95))
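
When only a few statistics are needed, `agg` can compute a chosen set in one call instead of the full `describe()` report. This is a minimal sketch; the selection of `min`, `max`, and `mean` is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

# One row per statistic, one column per numeric column
summary = df[["age", "score"]].agg(["min", "max", "mean"])
print(summary)

# Individual statistics are also available as Series methods
median_score = df["score"].median()
print(median_score)
```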

## 10. Handling Missing Data

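Missing values are common in real datasets. In numeric columns, Pandas represents them as `NaN`, which is a floating-point value; that is why the `score` column below is displayed as floats once it contains a missing entry.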

Here we create a small example with missing values and demonstrate typical operations.

In [21]:
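df_missing = df.copy()
# Cast to float first so the column can hold NaN without an implicit dtype change
df_missing["score"] = df_missing["score"].astype("float64")
df_missing.loc[1, "score"] = None  # stored as NaN
df_missing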

      name  age  score
0    Alice   25   85.0
1      Bob   30    NaN
2  Charlie   35   95.0


In [22]:
df_missing.isna()

    name    age  score
0  False  False  False
1  False  False   True
2  False  False  False


In [23]:
df_missing.dropna()

      name  age  score
0    Alice   25   85.0
2  Charlie   35   95.0


In [24]:
df_missing.fillna(0)

      name  age  score
0    Alice   25   85.0
1      Bob   30    0.0
2  Charlie   35   95.0
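
Filling with 0 is rarely the right choice for a score. As one possible alternative, the sketch below counts missing values per column, fills the gap with the mean of the observed scores, and drops rows based on a single column. Using the mean is an assumption made for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85.0, np.nan, 95.0],
})

# Number of missing values in each column
missing_per_column = df_missing.isna().sum()
print(missing_per_column)

# mean() skips NaN by default, so this is the mean of 85.0 and 95.0
score_mean = df_missing["score"].mean()

# A dict limits the fill to the listed columns
filled = df_missing.fillna({"score": score_mean})
print(filled)

# Drop rows only when a specific column is missing
complete_scores = df_missing.dropna(subset=["score"])
print(complete_scores)
```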


## 11. Modifying Data

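Create a new column by assigning to a new column name, and update selected rows with `.loc[row_mask, column]`. Every score in this dataset is at least 60, so the `.loc` update below matches no rows and leaves the table unchanged.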

In [25]:
df2 = df.copy()
df2["passed"] = df2["score"] >= 60
df2

      name  age  score  passed
0    Alice   25     85    True
1      Bob   30     90    True
2  Charlie   35     95    True


In [26]:
df2.loc[df2["score"] < 60, "passed"] = False
df2

      name  age  score  passed
0    Alice   25     85    True
1      Bob   30     90    True
2  Charlie   35     95    True
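
To see a `.loc` update that actually changes rows, the sketch below uses a hypothetical `grade` column and a cutoff of 90, neither of which is part of the original dataset. It also shows `assign`, which returns a new DataFrame instead of modifying the existing one.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85, 90, 95],
})

df2 = df.copy()

# Start with a default value, then overwrite the rows that match the mask
df2["grade"] = "B"                          # hypothetical column
df2.loc[df2["score"] >= 90, "grade"] = "A"  # hypothetical cutoff
print(df2)

# assign() returns a new DataFrame; df2 itself is left unchanged
df3 = df2.assign(score_pct=df2["score"] / 100)
print(df3)
```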


## 12. Jupyter Tip: Verify Which Python Executable Runs This Notebook

This is helpful for confirming that your notebook is using the intended Conda environment.

In [27]:
import sys
sys.executable

'/Users/mehdimir/miniforge3/envs/jupyterbook/bin/python'
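
Alongside the interpreter path, it can help to record which Python and Pandas versions the environment provides, since behavior differs between releases. A minimal sketch:

```python
import sys

import pandas as pd

python_version = sys.version.split()[0]  # e.g. the "3.x.y" part of sys.version
pandas_version = pd.__version__

print("Executable:", sys.executable)
print("Python:", python_version)
print("pandas:", pandas_version)
```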

## Summary

- Use **Series** for 1D labeled data
- Use **DataFrame** for 2D tabular data
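- Inspect with `head`, `tail`, `columns`, `shape`, `dtypes`
- Select with `[]`, `.iloc`, `.loc`
- Filter with boolean conditions
- Summarize with `mean`, `describe`
- Handle missing values with `isna`, `dropna`, `fillna`
- Add or update columns with assignment and `.loc`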
