# Day 15 — Pandas Basics

---

## Objectives
- Understand what Pandas is and why it’s useful
- Learn about Pandas data structures: Series and DataFrame
- Create Series and DataFrames from lists, dictionaries, and CSV files
- Perform basic operations: indexing, selecting, slicing
- Use simple functions to explore a dataset

---

## 1. What is Pandas?

Pandas is an open-source Python library for data manipulation and analysis.  
It is built on top of NumPy and is designed for tabular data, such as spreadsheets or SQL tables.

---

## 2. Pandas Data Structures

### Series  
A one-dimensional labeled array: each value is paired with an index label.

### DataFrame  
A two-dimensional labeled data structure with rows and columns. Each column is a Series, and all columns share the same row index.

---

# Code cells start here


In [None]:
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

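A Series can also be created from a dictionary, in which case the keys become the index labels. The sketch below uses made-up temperature values (the names `temps` and `temps_s` are illustrative only) to show label-based access and positional slicing.

```python
import pandas as pd

# Hypothetical values, used only for illustration
temps = {'Mon': 21.5, 'Tue': 23.0, 'Wed': 19.5, 'Thu': 22.0}
temps_s = pd.Series(temps, name='temperature_c')

# Dictionary keys become the index labels
print(temps_s.index.tolist())

# Label-based access with .loc, position-based slicing with .iloc
tue = temps_s.loc['Tue']
first_two = temps_s.iloc[0:2]

print(tue)
print(first_two)
```

Using `.loc` and `.iloc` explicitly, rather than plain square brackets, makes it clear whether a label or a position is meant.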

### Pandas DataFrame

A two-dimensional labeled data structure, like a spreadsheet or SQL table, with rows and columns.

In [None]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

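A DataFrame can also be built from a list of row dictionaries, which is a common shape for records coming from JSON. A minimal sketch using the same sample people, written one row at a time (the variable names `rows` and `people` are illustrative):

```python
import pandas as pd

# Same sample records as the dictionary example, one dictionary per row
rows = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NY'},
    {'Name': 'Bob', 'Age': 30, 'City': 'LA'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
]
people = pd.DataFrame(rows)

# The dictionary keys become the column names
print(people.columns.tolist())

# Each column of a DataFrame is a Series
print(type(people['Age']))

# shape is (number of rows, number of columns)
print(people.shape)
```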

## 3. Loading Data from CSV

In [None]:
# Load dataset from CSV file (e.g., employee_data.csv)
df = pd.read_csv('datasets/employee_data.csv')
print(df.head())  # Preview first 5 rows

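If `datasets/employee_data.csv` is not at hand, `pd.read_csv` can be tried on an in-memory file instead. The sketch below assumes the columns `Name`, `Age`, and `City`, which are the columns used later in this notebook; the real file's layout may differ, and the rows here are made up.

```python
import io
import pandas as pd

# Made-up stand-in for employee_data.csv; the real file's columns may differ
csv_text = """Name,Age,City
Alice,25,NY
Bob,30,LA
Charlie,35,Chicago
Dana,41,Boston
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))   # head(n) returns the first n rows
print(sample_df.dtypes)    # Age is parsed as an integer column
```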

## 4. Basic Data Exploration

In [None]:
print(df.shape)        # Dimensions (rows, columns)
print(df.columns)      # Column names
df.info()              # Prints data types and non-null counts itself (returns None, so no print() needed)
print(df.describe())   # Summary statistics of numeric columns
print(df.head())       # First 5 rows
print(df.tail())       # Last 5 rows

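Two further checks that often sit alongside the calls above are counting missing values and counting how often each value appears in a column. A small sketch on made-up data (the column names and the missing age are assumptions for illustration):

```python
import pandas as pd

# Made-up data with one missing age, for illustration only
explore_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, None, 35, 41],
    'City': ['NY', 'LA', 'NY', 'Boston'],
})

# Number of missing values per column
missing = explore_df.isna().sum()
print(missing)

# How often each city appears
city_counts = explore_df['City'].value_counts()
print(city_counts)

# describe() skips missing values when computing statistics
age_stats = explore_df['Age'].describe()
print(age_stats)
```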

## 5. Selecting Columns and Rows

In [None]:
# Select a single column (returns a Series)
ages = df['Age']
print(ages.head())

# Select multiple columns (returns a DataFrame)
subset = df[['Name', 'City']]
print(subset.head())

# Select a row by index label with .loc
print(df.loc[0])  # row labeled 0 (the first row when the default integer index is used)

# Select rows by position with .iloc
print(df.iloc[0:3])  # first 3 rows

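The difference between `.loc` and `.iloc` is easier to see when the row labels are not the default integers. The sketch below uses made-up data with string labels as the index (the `staff` DataFrame is hypothetical) and shows that label slices include the end label, while position slices exclude the end position.

```python
import pandas as pd

# Made-up data with string labels as the row index
staff = pd.DataFrame(
    {'Age': [25, 30, 35, 41], 'City': ['NY', 'LA', 'Chicago', 'Boston']},
    index=['alice', 'bob', 'charlie', 'dana'],
)

# .loc uses labels, and a label slice includes both endpoints
by_label = staff.loc['alice':'charlie']

# .iloc uses integer positions, and the end position is excluded
by_position = staff.iloc[0:2]

# .loc also accepts a column label as a second argument
bob_city = staff.loc['bob', 'City']

print(by_label)
print(by_position)
print(bob_city)
```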

## 6. Basic Filtering

In [None]:
# Filter rows where Age > 30
older_than_30 = df[df['Age'] > 30]
print(older_than_30)

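Filtering works by building a boolean Series and passing it back into the DataFrame. The sketch below, on made-up data, shows the intermediate mask and how to combine conditions; the column names are assumed to match the ones used in this notebook.

```python
import pandas as pd

# Made-up data for illustration
filter_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 41],
    'City': ['NY', 'LA', 'NY', 'Boston'],
})

# A comparison produces a boolean Series, one value per row
mask = filter_df['Age'] > 30
print(mask.tolist())

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_in_ny = filter_df[(filter_df['Age'] > 30) & (filter_df['City'] == 'NY')]
print(older_in_ny)

# isin() tests membership in a list of values
ny_or_la = filter_df[filter_df['City'].isin(['NY', 'LA'])]
print(ny_or_la['Name'].tolist())
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why `&` and `|` are used here.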

## 7. Adding New Columns

In [None]:
# Add a new column based on existing data
df['Age_in_5_years'] = df['Age'] + 5
print(df.head())

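Assigning to `df['new_column']` changes the DataFrame in place. When the original should stay untouched, `assign()` is one alternative, since it returns a new DataFrame. A minimal sketch on made-up data (`ages_df` is a hypothetical name):

```python
import pandas as pd

# Made-up data for illustration
ages_df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# assign() returns a new DataFrame and leaves the original unchanged
with_future_age = ages_df.assign(Age_in_5_years=ages_df['Age'] + 5)

print(ages_df.columns.tolist())
print(with_future_age.columns.tolist())
print(with_future_age)
```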

## 8. Saving DataFrames

In [None]:
# Save modified DataFrame to CSV
df.to_csv('datasets/employee_data_modified.csv', index=False)

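The `index=False` argument keeps the row index out of the saved file. The sketch below shows the difference without writing to disk: when no path is given, `to_csv` returns the CSV text as a string. The data is made up for illustration.

```python
import io
import pandas as pd

# Made-up data for illustration
out_df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_csv returns the CSV text as a string
with_index = out_df.to_csv()
without_index = out_df.to_csv(index=False)

# Compare the header lines: the first one has an extra, unnamed index column
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the text back recovers the same columns and values
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```

Saving with the index and then reloading usually produces an extra column such as `Unnamed: 0`, which is why `index=False` is a common choice for plain integer indexes.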