# Pandas DataFrame
A DataFrame is a two-dimensional, labeled table. It behaves like a dictionary of Series: each column is a Series, and all columns share the same row index.

In [1]:
import numpy as np
import pandas as pd

## Creating a DataFrame

### From Python Dictionary

In [2]:
data = {
    "name": ["Ajay", "Khushi", "Rohit"],
    "age": [15, 18, 16],
    "city": ["Delhi", "Mumbai", "Bangalore"],
}
df = pd.DataFrame(data)
df

|   | name | age | city |
|---|------|-----|------|
| 0 | Ajay | 15 | Delhi |
| 1 | Khushi | 18 | Mumbai |
| 2 | Rohit | 16 | Bangalore |
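
The "dictionary of Series" description can be checked directly. The sketch below builds the same kind of table from explicit Series objects and then pulls one column back out; the variable names `names`, `ages`, `people`, and `age_column` are only illustrative.

```python
import pandas as pd

# Two Series that share the same default row labels (0, 1, 2).
names = pd.Series(["Ajay", "Khushi", "Rohit"])
ages = pd.Series([15, 18, 16])

# The dict keys become the column labels.
people = pd.DataFrame({"name": names, "age": ages})

# Selecting a single column gives back a Series.
age_column = people["age"]
print(type(age_column).__name__)  # Series
print(age_column.tolist())        # [15, 18, 16]
```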


### From Python Lists

In [3]:
data = [
    ["Ajay", 15],
    ["Khushi", 18],
    ["Rohit", 16],
]
df = pd.DataFrame(data)
df

|   | 0 | 1 |
|---|---|---|
| 0 | Ajay | 15 |
| 1 | Khushi | 18 |
| 2 | Rohit | 16 |
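
A list of lists carries no column names, so pandas falls back to the integer labels 0 and 1 seen above. One way to avoid that is to pass `columns=` to the constructor. A small sketch, where the names `rows` and `named` are chosen for illustration:

```python
import pandas as pd

rows = [
    ["Ajay", 15],
    ["Khushi", 18],
    ["Rohit", 16],
]

# columns= supplies the labels that the nested lists cannot provide.
named = pd.DataFrame(rows, columns=["name", "age"])
print(named.columns.tolist())  # ['name', 'age']
```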


### From NumPy Array

In [4]:
arr = np.array([[12, 15, 16], [13, 18, 21]])
df1 = pd.DataFrame(arr)
df1

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 12 | 15 | 16 |
| 1 | 13 | 18 | 21 |


### From CSV File 

In [5]:
df = pd.read_csv("3.1_data.csv")
df

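|   | name | city | gender | marks |
|---|------|------|--------|-------|
| 0 | Pooja | Mumbai | Female | 96 |
| 1 | Ankit | Mumbai | Female | 93 |
| 2 | Rahul | Pune | Female | 92 |
| 3 | Rahul | Delhi | Male | 90 |
| 4 | Priya | Mumbai | Female | 85 |
| 5 | Amit | Delhi | Male | 81 |
| 6 | Ankit | Delhi | Male | 78 |
| 7 | Neha | Delhi | Female | 75 |
| 8 | Sneha | Pune | Female | 71 |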


In real projects, DataFrames are usually created by reading
CSV files, Excel files, or database tables, which we will cover next.
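
If the CSV file is not at hand, the same `read_csv` call can be tried on in-memory text. This is a minimal sketch that assumes a two-row extract of the data above; `csv_text` and `scores` are illustrative names, and `io.StringIO` simply stands in for a file on disk.

```python
import io

import pandas as pd

# A tiny CSV held in a string instead of a file.
csv_text = """name,city,marks
Pooja,Mumbai,96
Amit,Delhi,81
"""

# read_csv accepts any file-like object, not only a path.
scores = pd.read_csv(io.StringIO(csv_text))
print(scores.shape)              # (2, 3)
print(scores["marks"].tolist())  # [96, 81]
```

The first line of the text is treated as the header row, which is why the result has two data rows and three named columns.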

## Index and Labels

Every Series and DataFrame has an Index that labels its rows. It helps with:

- Fast lookups
- Aligning data
- Merging & joining
- Time series operations
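
Alignment is the easiest of these to see in a few lines. In the sketch below, two Series hold the same labels in a different order, and the addition matches values by label rather than by position. The names `term1`, `term2`, and `total` and the numbers are made up for illustration.

```python
import pandas as pd

term1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
term2 = pd.Series([1, 2, 3], index=["c", "a", "b"])

# Values are paired by label: a -> 10 + 2, b -> 20 + 3, c -> 30 + 1.
total = term1 + term2
print(total["a"], total["b"], total["c"])  # 12 23 31
```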

In [6]:
df.index # Row labels

RangeIndex(start=0, stop=9, step=1)

In [7]:
df.columns # Column labels

Index(['name', 'city', 'gender', 'marks'], dtype='object')

### Define Row and Column Labels

In [8]:
df1.index = [1, 2]  # Set row labels
df1.columns = ["X", "Y", "Z"]  # Set column labels
df1

|   | X | Y | Z |
|---|---|---|---|
| 1 | 12 | 15 | 16 |
| 2 | 13 | 18 | 21 |
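
The same labels can also be supplied when the DataFrame is created, and `.loc` then looks values up by those labels. A short sketch using the array from earlier; `labeled` is an illustrative name.

```python
import numpy as np
import pandas as pd

arr = np.array([[12, 15, 16], [13, 18, 21]])

# index= and columns= set the labels at construction time.
labeled = pd.DataFrame(arr, index=[1, 2], columns=["X", "Y", "Z"])

# .loc selects by label: row label 2, column label "Y".
print(labeled.loc[2, "Y"])  # 18
```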


## Basic inspection

### First 5 rows

In [9]:
df.head() 

|   | name | city | gender | marks |
|---|------|------|--------|-------|
| 0 | Pooja | Mumbai | Female | 96 |
| 1 | Ankit | Mumbai | Female | 93 |
| 2 | Rahul | Pune | Female | 92 |
| 3 | Rahul | Delhi | Male | 90 |
| 4 | Priya | Mumbai | Female | 85 |


### Last 5 rows

In [10]:
df.tail() 

|   | name | city | gender | marks |
|---|------|------|--------|-------|
| 4 | Priya | Mumbai | Female | 85 |
| 5 | Amit | Delhi | Male | 81 |
| 6 | Ankit | Delhi | Male | 78 |
| 7 | Neha | Delhi | Female | 75 |
| 8 | Sneha | Pune | Female | 71 |
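
Both methods default to 5 rows but accept a row count. A quick sketch on a throwaway DataFrame (`numbers` is an illustrative name):

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(10)})

# Pass n to choose how many rows to show.
print(numbers.head(3)["value"].tolist())  # [0, 1, 2]
print(numbers.tail(2)["value"].tolist())  # [8, 9]
```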


### Column info: types, non-nulls

In [11]:
df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    9 non-null      object
 1   city    9 non-null      object
 2   gender  9 non-null      object
 3   marks   9 non-null      int64 
dtypes: int64(1), object(3)
memory usage: 420.0+ bytes
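
info() prints its report instead of returning it, so when the missing-value counts are needed as data, `isna().sum()` is one option. A minimal sketch with made-up sample data, where the `None` entry exists only to show a missing value:

```python
import pandas as pd

# Hypothetical sample; the missing name is deliberate.
sample = pd.DataFrame({
    "name": ["Pooja", "Amit", None],
    "marks": [96, 81, 75],
})

# Count missing values per column.
missing = sample.isna().sum()
print(missing["name"], missing["marks"])  # 1 0
```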


### Stats for numeric columns
describe() summarizes only the numeric columns by default: count, mean, standard deviation, minimum, quartiles, and maximum.

In [12]:
df.describe() 

|   | marks |
|---|-------|
| count | 9.0 |
| mean | 84.555556 |
| std | 8.790778 |
| min | 71.0 |
| 25% | 78.0 |
| 50% | 85.0 |
| 75% | 92.0 |
| max | 96.0 |
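
To include text columns as well, describe() accepts `include="all"`. The sketch below assumes a small made-up sample (`sample` and `all_stats` are illustrative names); text columns then report the most frequent value (`top`) and how often it occurs (`freq`).

```python
import pandas as pd

# Hypothetical three-row sample with one text and one numeric column.
sample = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Delhi"],
    "marks": [96, 81, 78],
})

# include="all" summarizes every column, not only the numeric ones.
all_stats = sample.describe(include="all")
print(all_stats.loc["top", "city"])    # Delhi
print(all_stats.loc["freq", "city"])   # 2
print(all_stats.loc["mean", "marks"])  # 85.0
```

Statistics that do not apply to a column, such as the mean of `city`, show up as NaN in the combined table.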


### List of column names

In [13]:
df.columns 

Index(['name', 'city', 'gender', 'marks'], dtype='object')

### Shape of DataFrame (rows, columns)

In [14]:
df.shape 

(9, 4)
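
Because shape is an ordinary tuple, it can be unpacked into separate row and column counts. A small sketch on a throwaway DataFrame (names are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is (rows, columns); len() gives the row count alone.
n_rows, n_cols = sample.shape
print(n_rows, n_cols)  # 3 2
print(len(sample))     # 3
```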

## Summary
- Series = 1D array with labels
- DataFrame = 2D table with labeled rows and columns
- Each DataFrame column is a Series
- We can create DataFrames from lists, dicts, arrays, files, web pages, and SQL databases
- Use .head(), .info(), and .describe() to quickly explore any dataset