# 🐼 Pandas - Class 2: DataFrame Basics
Welcome to **Class 2**. Today we’ll learn how to **inspect**, **select**, and **organize** data inside a DataFrame.

## 1 Inspecting Data
Quick ways to understand your dataset:
- `head(n)` → first *n* rows (default 5)
- `tail(n)` → last *n* rows (default 5)
- `info()` → columns, non-null counts, and dtypes
- `shape` → (rows, columns)
- `dtypes` → data type of each column

In [2]:
import pandas as pd

# Create a small DataFrame
a = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95]
}

b = pd.DataFrame(a)
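b            # a notebook cell displays only its last expression
b.head(2)    # first 2 rows
b.tail(2)    # last 2 rows
b.info()     # prints its summary directly
b.shape      # (rows, columns)
b.dtypes     # last expression, so this is the value displayed below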

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   Score   5 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 252.0+ bytes


Name     object
Age       int64
Score     int64
dtype: object
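
Because a notebook cell displays only its last expression, most of the results in the cell above never appear. A minimal sketch of the same inspection with explicit `print` calls follows; the names `students`, `first_two`, and `last_two` are illustrative and not part of the original cell.

```python
import pandas as pd

# Same data as the cell above, under an illustrative name.
students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

first_two = students.head(2)      # rows 0 and 1
last_two = students.tail(2)       # rows 3 and 4
n_rows, n_cols = students.shape   # shape is an attribute, not a method

# print() makes every result visible, not just the last one.
print(first_two)
print(last_two)
print(n_rows, n_cols)
print(students.dtypes)
```

In a notebook, `display(...)` can be used in place of `print(...)` to get the formatted table output.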

## 2 Selecting Columns & Rows
Ways to access parts of a DataFrame:
- Single column → `df['col']`
- Multiple columns → `df[['col1','col2']]`
- Label-based selection → `df.loc[row_label, col_label]`
- Integer-position selection → `df.iloc[row_idx, col_idx]`

Tip: `:` means "all rows/columns" in that dimension.

In [3]:
print(b)
# b['Name']            # single column
# b[['Name', 'Age']]   # multiple columns
b.loc[0:3, ["Name", "Score"]]   # label-based selection; the end label 3 is included

      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95


      Name  Score
0    Alice     88
1      Bob     92
2  Charlie     79
3    David     85
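
The cell above only exercises `.loc`. As a rough sketch of how `.iloc` differs, the example below rebuilds the same DataFrame and compares the two; the variable names (`by_label`, `by_position`, and so on) are illustrative. The main point is that a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, as ordinary Python slicing does.

```python
import pandas as pd

b = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

# .loc works with labels: 0:3 means labels 0 through 3, inclusive (4 rows).
by_label = b.loc[0:3, ["Name", "Score"]]

# .iloc works with positions: 0:3 means positions 0, 1, 2 (3 rows).
# Columns are also given by position: 0 is "Name", 2 is "Score".
by_position = b.iloc[0:3, [0, 2]]

# ":" selects everything along that dimension.
all_ages = b.loc[:, "Age"]

# A single cell: row position 1, column position 0.
single_value = b.iloc[1, 0]

print(by_label)
print(by_position)
print(single_value)
```

With the default `RangeIndex`, labels and positions happen to coincide, which is why the difference in slice endpoints is easy to miss.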


## 3 Index & Columns Overview
- `df.index` → row labels
- `df.columns` → column names
- `df.values` → underlying data as a NumPy array (treat it as read-only; prefer DataFrame operations)

In [4]:
# b.index    # row labels
# b.columns  # column labels
b.values     # underlying data as a NumPy array

array([['Alice', 25, 88],
       ['Bob', 30, 92],
       ['Charlie', 35, 79],
       ['David', 40, 85],
       ['Emma', 22, 95]], dtype=object)
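
The output above has `dtype=object` because the columns mix text and integers. The sketch below shows the two commented-out attributes and a numeric-only conversion; it uses `to_numpy()`, which the pandas documentation recommends over `.values`. The names `row_labels`, `column_names`, and `numeric` are illustrative.

```python
import pandas as pd

b = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

row_labels = list(b.index)       # default RangeIndex: 0 to 4
column_names = list(b.columns)   # column names in order

# Selecting only the numeric columns first gives an integer array
# instead of a mixed object array.
numeric = b[["Age", "Score"]].to_numpy()

print(row_labels)
print(column_names)
print(numeric.shape)
```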

## 4 Renaming Columns & Setting an Index
- Rename columns → `df.rename(columns={'old':'new'}, inplace=False)`
- Set a column as index → `df.set_index('column_name', inplace=False)`
- Reset index → `df.reset_index(inplace=False)`

Tip: without `inplace=True`, these methods return a new DataFrame and leave the original unchanged, so assign the result back to a variable.

In [5]:
c = b.rename(columns={"Score": "Marks"})  # rename "Score" to "Marks"; b itself is unchanged
c


      Name  Age  Marks
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95
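
The cell above covers only `rename`. A minimal sketch of the full rename, set-index, reset-index round trip could look like this; the variable names (`renamed`, `by_name`, `restored`) are illustrative, and each step assigns its result because none of the calls use `inplace=True`.

```python
import pandas as pd

b = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

# rename returns a new DataFrame; b keeps its "Score" column.
renamed = b.rename(columns={"Score": "Marks"})

# "Name" moves out of the columns and becomes the row labels.
by_name = renamed.set_index("Name")

# With names as the index, .loc can look rows up by name.
alice_marks = by_name.loc["Alice", "Marks"]

# reset_index moves "Name" back into a column and restores 0..4 labels.
restored = by_name.reset_index()

print(by_name)
print(alice_marks)
print(restored)
```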


## 5 Basic Attributes Recap
- `df.columns` → names of all columns
- `df.index` → row labels
- `df.values` → NumPy array (data only)

These are handy for quick checks and loops (though vectorized ops are preferred).

In [6]:
# b.columns
# b.index
b.values

array([['Alice', 25, 88],
       ['Bob', 30, 92],
       ['Charlie', 35, 79],
       ['David', 40, 85],
       ['Emma', 22, 95]], dtype=object)
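
To illustrate the remark about loops versus vectorized operations, here is a small sketch. Looping over `b.columns` is fine for a quick check of each column, but arithmetic on the data is better expressed on the whole column at once. The 5-point bonus and the names `looped` and `vectorized` are made up for the example.

```python
import pandas as pd

b = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

# Quick check: loop over the column names and report each dtype.
for col in b.columns:
    print(col, b[col].dtype)

# Element-by-element loop: works, but is slow on large data.
looped = [score + 5 for score in b["Score"]]

# Vectorized: one operation applied to the whole column.
vectorized = b["Score"] + 5

print(looped)
print(vectorized.tolist())
```

Both approaches give the same numbers here; the vectorized form is shorter and scales better as the DataFrame grows.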

## Mini Practice
1. Create a DataFrame with at least 5 rows and 4 columns (mix of numeric + text).
2. Show the first 3 rows, last 2 rows, and the info.
3. Select a sub-DataFrame with only 2 columns.
4. Rename one column and set any column as index, then reset it back.
5. Print `df.shape`, `df.dtypes`, and `df.columns`.

In [7]:
import pandas as pd

# Create a DataFrame with at least 5 rows & 4 columns (mix of numbers + text)
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "City": ["Delhi", "Mumbai", "Pune", "Bangalore", "Chennai"],
    "Score": [85, 91, 78, 88, 95]
}

df = pd.DataFrame(data)

# Show the first 3 rows, last 2 rows, and info
# your code here


# Select a sub-DataFrame with only 2 columns
# your code here

# Rename one column and set any column as index, then reset it back
# your code here

# Print df.shape, df.dtypes, and df.columns
# your code here


---
## ✅ Summary
- You learned to **inspect** a DataFrame (`head`, `tail`, `info`, `shape`, `dtypes`).
- You practiced **selecting** columns/rows using `[]`, `.loc`, and `.iloc`.
- You explored **index/columns** and basic attributes.
- You **renamed** columns and **set/reset** an index.

Next up: **Data Cleaning Essentials** (missing values, types, duplicates).

In [8]:

# Show the first 3 rows (df.tail(2) and df.info() complete this step)
df.head(3)

      Name  Age    City  Score
0    Alice   25   Delhi     85
1      Bob   30  Mumbai     91
2  Charlie   35    Pune     78


In [9]:
# Select a sub-DataFrame with only 2 columns
sub_df = df[["Name", "City"]]
sub_df

      Name       City
0    Alice      Delhi
1      Bob     Mumbai
2  Charlie       Pune
3    David  Bangalore
4     Emma    Chennai
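
The solution cells above stop after step 3. One possible way to finish the remaining practice steps is sketched below; the choice of `"Score"` to rename and `"Name"` as the index is arbitrary, and the intermediate variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "City": ["Delhi", "Mumbai", "Pune", "Bangalore", "Chennai"],
    "Score": [85, 91, 78, 88, 95],
})

# Rest of step 2: last 2 rows and the info summary.
last_two = df.tail(2)
print(last_two)
df.info()

# Step 4: rename one column, set a column as index, then reset it.
renamed = df.rename(columns={"Score": "Marks"})
indexed = renamed.set_index("Name")
reset = indexed.reset_index()
print(reset)

# Step 5: shape, dtypes, and columns of the original DataFrame.
print(df.shape)
print(df.dtypes)
print(df.columns)
```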
