# Introduction to Pandas DataFrame 

Imagine you're in a classroom. The Pandas DataFrame is like a class register (attendance book). In that register:

- The rows are like students in your class.
- The columns are like the subjects each student has. For example: one column for "Name", another for "Math marks", one for "Science marks", etc.
- The index is like the roll numbers assigned to each student.

So, a Pandas DataFrame is a table that organizes your data in rows and columns, much like the way you maintain school or college records.

### What is a DataFrame?

A DataFrame is basically a 2D (two-dimensional) data structure in Pandas, meaning it has both rows and columns. It's one of the most important parts of Pandas, which is why we focus on it so much.

- Just like your class register has names and marks of students, a DataFrame holds labeled data.
- You can store different types of data in one DataFrame: numbers, strings (like names), dates, etc.

## Creating a Simple DataFrame:

Now, let's say you want to create a DataFrame for a class with details of 3 students, their marks in Math, and the city they are from. In Pandas, you can do this using Python, and it’s as simple as writing the names and marks in the code.

In [2]:
import pandas as pd  # First, import the Pandas library

# Now, let's create a dictionary to store the data like a class register
data = {
    'Name': ['Rahul', 'Priya', 'Siddharth'],  # These are the students' names
    'Math Marks': [85, 92, 78],               # Their respective Math marks
    'City': ['Delhi', 'Mumbai', 'Bangalore']  # The city they belong to
}

# Let's convert this dictionary into a DataFrame (class register form)
df = pd.DataFrame(data)

# Now, display the DataFrame (class register)
print(df)

        Name  Math Marks       City
0      Rahul          85      Delhi
1      Priya          92     Mumbai
2  Siddharth          78  Bangalore


- We created a dictionary first, where the keys ('Name', 'Math Marks', 'City') become the columns of our DataFrame.
- Each value (the list of names, marks, or cities) fills one column from top to bottom, so the items at the same position across the lists together form one row.
- `pd.DataFrame()` is the constructor that takes this data and converts it into a DataFrame structure, just like organizing your school records into a table.
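The register analogy says the index plays the role of roll numbers, but the DataFrame above uses the default index 0, 1, 2. As a small sketch of how roll numbers could be attached, the `index` argument of `pd.DataFrame()` accepts a list of labels. The roll numbers 101 to 103 and the name `df_roll` are made up for this illustration.

```python
import pandas as pd

data = {
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
}

# Hypothetical roll numbers, used as row labels instead of 0, 1, 2
roll_numbers = [101, 102, 103]
df_roll = pd.DataFrame(data, index=roll_numbers)

print(df_roll)
print(df_roll.index.tolist())  # the row labels are now the roll numbers
```

With this register, a student is looked up by roll number, for example `df_roll.loc[102]`.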

# Important DataFrame Characteristics

When you create a DataFrame, it automatically comes with some useful properties that help us understand the data better. Just like how you check:

- How many students are there (the size of the register),
- What subjects are being recorded (the columns),
- What each student scored (the actual data inside).

1. `.shape` – Size of the DataFrame

This tells you how many rows and columns are there in your DataFrame, just like counting the number of students and subjects.

In [3]:
df.shape

(3, 3)
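Since `.shape` is an ordinary tuple of (rows, columns), it can be unpacked into two variables. The sketch below rebuilds the same register so it runs on its own; the names `n_rows` and `n_cols` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Unpack the tuple: first the number of rows, then the number of columns
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# len() on a DataFrame counts rows only
print(len(df))
```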

2. `.columns` – Column Names
This will list all the column names, like asking, "What subjects are recorded in this class register?"

In [4]:
df.columns

Index(['Name', 'Math Marks', 'City'], dtype='object')

3. `.head()` and `.tail()` – Viewing the Data
- `.head()` shows the first 5 rows of the DataFrame by default (useful to check the beginning of the data).
- `.tail()` shows the last 5 rows by default (useful to check the end of the data).

Our register has only 3 students, so `df.head()` returns all of them.

In [5]:
df.head()

        Name  Math Marks       City
0      Rahul          85      Delhi
1      Priya          92     Mumbai
2  Siddharth          78  Bangalore


# Why is this useful?
Pandas DataFrames make it super easy to manage large amounts of data. Whether you're a teacher managing class records, a shopkeeper managing stock details, or an engineer analyzing a large dataset, DataFrames simplify the work by organizing everything in a structured way.

## 1. Accessing Columns
Columns in a DataFrame are like the subjects in your class register. You can access them in multiple ways.

> 1.1 Access a Single Column
You can access a specific column by using its name.

In [6]:
df['Name']

0        Rahul
1        Priya
2    Siddharth
Name: Name, dtype: object

Example: Extract only the students' names (like finding all names in the register).

> 1.2 Access Multiple Columns
If you want to access more than one column, you can pass a list of column names.

In [7]:
df[['Name', 'Math Marks']] 

        Name  Math Marks
0      Rahul          85
1      Priya          92
2  Siddharth          78


Example: Extract names and their marks in Math.
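One detail worth checking for yourself: single brackets return a Series (one column), while double brackets return a DataFrame, even when the list holds only one column name. A minimal sketch, with `names_series` and `names_frame` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

names_series = df['Name']    # single brackets: a Series
names_frame = df[['Name']]   # double brackets: a one-column DataFrame

print(type(names_series).__name__)
print(type(names_frame).__name__)
print(names_frame.shape)     # still two-dimensional: (rows, columns)
```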

## 2. Accessing Rows
Rows are like the students in your class register. There are different methods to get specific rows or ranges of rows.

> 2.1 Access Rows by Index (Using .loc[])
`.loc[]` is used to access rows by label/index.

In [8]:
df.loc[0] 

Name          Rahul
Math Marks       85
City          Delhi
Name: 0, dtype: object

Example: Get the first student’s details.

> 2.2 Access Rows by Position (Using .iloc[])
`.iloc[]` is used to access rows by position.

In [9]:
df.iloc[0] 

Name          Rahul
Math Marks       85
City          Delhi
Name: 0, dtype: object

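Example: Get the student in the 1st position of the register, whatever their roll number (label) is. Here the result matches `.loc[0]` only because the default labels 0, 1, 2 happen to equal the positions.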

> 2.3 Access Multiple Rows (Using `.loc[]`)
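You can use `.loc[]` to get multiple rows at once by providing a range of labels. Unlike ordinary Python slicing, a label slice includes both endpoints, so `0:2` returns the rows labeled 0, 1, and 2.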

In [10]:
df.loc[0:2]

        Name  Math Marks       City
0      Rahul          85      Delhi
1      Priya          92     Mumbai
2  Siddharth          78  Bangalore


Example: Get the details of the first 3 students.

> 2.4 Access Multiple Rows by Position (Using `.iloc[]`)
You can also access multiple rows using `.iloc[]` by providing a range of positions. The end position is excluded, as in ordinary Python slicing, so `0:2` returns positions 0 and 1.

In [11]:
df.iloc[0:2]  

    Name  Math Marks    City
0  Rahul          85   Delhi
1  Priya          92  Mumbai


Example: Get the first two students’ details by their position.
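With the default index, `.loc[]` and `.iloc[]` can look interchangeable. The sketch below tries to make the difference visible in two ways: by comparing the slice lengths, and by using a register whose labels are not 0, 1, 2. The roll numbers 101 to 103 and the name `df_roll` are assumptions made up for this example.

```python
import pandas as pd

data = {
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
}
df = pd.DataFrame(data)

# Same slice text, different meaning
print(len(df.loc[0:2]))   # label slice: includes the row labeled 2
print(len(df.iloc[0:2]))  # position slice: stops before position 2

# Hypothetical roll numbers as labels
df_roll = pd.DataFrame(data, index=[101, 102, 103])

first_by_position = df_roll.iloc[0]  # first row, whatever its label
first_by_label = df_roll.loc[101]    # row whose label is 101

print(first_by_position['Name'])
print(first_by_label['Name'])
# df_roll.loc[0] would raise a KeyError, because no row is labeled 0
```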

## 3. Accessing Specific Elements
> 3.1 Access a Specific Element Using `.at[]`
`.at[]` is used to access a single scalar value by providing label-based indexing.

In [12]:
df.at[0, 'Name'] 

'Rahul'

Example: Get Rahul's name by specifying his row and column.

> 3.2 Access a Specific Element Using `.iat[]`
`.iat[]` is used to access a single scalar value by position-based indexing.

In [13]:
df.iat[0, 1] 

85

Example: Get Rahul’s Math marks by specifying the row and column positions.
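For a single cell, `.at[]` and `.iat[]` return the same value as `.loc[]` and `.iloc[]`. The difference is that `.at[]` and `.iat[]` accept only one row and one column, whereas `.loc[]` and `.iloc[]` also accept slices and lists. A short sketch comparing the four, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Label-based: row label 1, column 'Name'
name_at = df.at[1, 'Name']
name_loc = df.loc[1, 'Name']

# Position-based: row position 1, column position 1
marks_iat = df.iat[1, 1]
marks_iloc = df.iloc[1, 1]

print(name_at, name_loc)
print(marks_iat, marks_iloc)
```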

## 4. Conditional Access
You can access rows and columns based on certain conditions, just like filtering data.

> 4.1 Access Rows Based on a Condition
This is like checking which students have marks greater than 80 in Math.

In [14]:
df[df['Math Marks'] > 80]

    Name  Math Marks    City
0  Rahul          85   Delhi
1  Priya          92  Mumbai


Example: Find out which students scored more than 80 marks.

> 4.2 Access Specific Columns Based on a Condition
You can access only specific columns of the rows that meet a condition.

In [15]:
df.loc[df['Math Marks'] > 80, 'Name'] 

0    Rahul
1    Priya
Name: Name, dtype: object

Example: Get the names of students who scored more than 80.
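Conditions can also be combined. The sketch below shows one way to do it: `&` means "and", `|` means "or", and each condition needs its own parentheses. `.isin()` checks membership in a list. The particular filters and variable names are only examples.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# AND: scored above 80 and lives in Delhi
high_and_delhi = df[(df['Math Marks'] > 80) & (df['City'] == 'Delhi')]

# OR: scored below 80 or lives in Mumbai
low_or_mumbai = df[(df['Math Marks'] < 80) | (df['City'] == 'Mumbai')]

# Membership: city is one of the listed values
delhi_or_mumbai = df[df['City'].isin(['Delhi', 'Mumbai'])]

print(high_and_delhi['Name'].tolist())
print(low_or_mumbai['Name'].tolist())
print(delhi_or_mumbai['Name'].tolist())
```

Python's `and` / `or` keywords do not work here, because they cannot compare a whole column element by element.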

## 5. Accessing Slices
Just like how you might want to check a part of your register, Pandas allows you to slice your data.

> 5.1 Slice DataFrame Rows
Slice based on row positions using .iloc[].

In [16]:
df.iloc[0:2]

    Name  Math Marks    City
0  Rahul          85   Delhi
1  Priya          92  Mumbai


Example: Get a range of students' details.

> 5.2 Slice DataFrame Columns
Slice based on columns.

In [17]:
df.iloc[:, 1:3]

   Math Marks       City
0          85      Delhi
1          92     Mumbai
2          78  Bangalore


Example: Get all rows but only the columns at positions 1 and 2 (the 2nd and 3rd columns); the end position 3 is excluded.

> 5.3 Slice Rows and Columns Together

In [18]:
df.iloc[0:2, 1:3] 

   Math Marks    City
0          85   Delhi
1          92  Mumbai


Example: Get the Math marks and city of the first two students.

## 6. Boolean Indexing
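You can use True/False values to filter data. The comparison `df['Math Marks'] > 80` produces one True/False value per row, and only the rows marked True are kept. This is the same mechanism used in section 4.1.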

> 6.1 Boolean Indexing

In [19]:
df[df['Math Marks'] > 80] 

    Name  Math Marks    City
0  Rahul          85   Delhi
1  Priya          92  Mumbai


Example: Get only the students who scored more than 80.
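To see what is happening inside the brackets, it can help to store the True/False values in a variable first. A minimal sketch, with `mask` as an illustrative name; the `~` operator flips True and False.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# One True/False value per row
mask = df['Math Marks'] > 80
print(mask.tolist())

above_80 = df[mask]       # rows where the mask is True
not_above_80 = df[~mask]  # rows where the mask is False

print(above_80['Name'].tolist())
print(not_above_80['Name'].tolist())
```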

## 7. Accessing Random Samples
Sometimes you want a random sample of rows or columns.

> 7.1 Get a Random Row Using .sample()

In [21]:
df.sample()

        Name  Math Marks       City
2  Siddharth          78  Bangalore


Example: Pick a random student’s record.

> 7.2 Get Multiple Random Rows

In [22]:
df.sample(n=2)

        Name  Math Marks       City
2  Siddharth          78  Bangalore
0      Rahul          85      Delhi


Example: Randomly select 2 students from the class.
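Because `.sample()` is random, the rows shown above will usually differ from run to run. If you need the same draw every time (for example, to share a notebook), the `random_state` parameter fixes the seed. The seed values 42 and 0 below are arbitrary choices for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# The same seed gives the same sample on every call
first_draw = df.sample(n=2, random_state=42)
second_draw = df.sample(n=2, random_state=42)
print(first_draw.equals(second_draw))

# frac=1 samples 100% of the rows, i.e. returns the register in shuffled order
shuffled = df.sample(frac=1, random_state=0)
print(shuffled.index.tolist())
```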

## 8. Accessing Top/Bottom Rows
You can access the first or last few rows, just like looking at the beginning or end of a register.

> 8.1 Access the First Rows Using `.head()`

In [23]:
df.head(2)

    Name  Math Marks    City
0  Rahul          85   Delhi
1  Priya          92  Mumbai


Example: Get the first 2 students in the register.

> 8.2 Access the Last Rows Using `.tail()`

In [24]:
df.tail(2)

        Name  Math Marks       City
1      Priya          92     Mumbai
2  Siddharth          78  Bangalore


Example: Get the last 2 students in the register.

## 9. Access Using `.iloc[]` and `.loc[]`
`.iloc[]` (position-based) and `.loc[]` (label-based) are used for advanced indexing and slicing.
> 9.1 Accessing Rows and Columns by Position with `.iloc[]`

In [25]:
df.iloc[0:2, 0:2] 

    Name  Math Marks
0  Rahul          85
1  Priya          92


Example: Get a portion of the data based on positions.

> 9.2 Accessing Rows and Columns by Label with `.loc[]`

In [26]:
df.loc[0:2, ['Name', 'City']] 

        Name       City
0      Rahul      Delhi
1      Priya     Mumbai
2  Siddharth  Bangalore


Example: Get the first 3 students’ names and cities.

## 10. Accessing DataFrame Attributes
You can access metadata about your DataFrame.

> 10.1 Get Column Names Using .columns

In [27]:
df.columns 

Index(['Name', 'Math Marks', 'City'], dtype='object')

Example: Check which subjects (columns) are being recorded.

> 10.2 Get Row Index Using `.index`

In [28]:
df.index 

RangeIndex(start=0, stop=3, step=1)

Example: Check the roll numbers of students.

## 11. Transposing Data
You can swap rows and columns, just like flipping your register sideways.

> 11.1 Transpose Using .T

In [29]:
df.T

                0       1          2
Name        Rahul   Priya  Siddharth
Math Marks     85      92         78
City        Delhi  Mumbai  Bangalore


Example: Convert rows into columns and vice versa.
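One side effect to be aware of: after transposing this register, each new column holds a mix of text and numbers (one student's name, marks, and city), so Pandas stores every column with the generic `object` type. The sketch below checks this; `transposed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

transposed = df.T

print(transposed.shape)           # rows and columns swap places
print(transposed.index.tolist())  # old column names are now row labels
print(transposed.dtypes)          # every column is now 'object'
```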

## 12. Accessing Data Types
You can check the data type of each column.

> 12.1 Access Data Types Using .dtypes

In [30]:
df.dtypes

Name          object
Math Marks     int64
City          object
dtype: object

Example: Check if columns are integers, strings, etc.
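Data types can also be used to pick columns. As a small sketch, `select_dtypes(include='number')` keeps only the numeric columns, and a single column exposes its type through `.dtype` (singular). The variable name `numeric_cols` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Keep only the numeric columns
numeric_cols = df.select_dtypes(include='number')
print(numeric_cols.columns.tolist())

# Data type of one column
print(df['Math Marks'].dtype)
```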

## 13. Accessing Info About DataFrame
You can get a summary of your DataFrame, like a quick report of your class register.

> 13.1 Get Info Using .info()

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        3 non-null      object
 1   Math Marks  3 non-null      int64 
 2   City        3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes


Example: Get a summary of your student data (how many rows, data types, etc.).

## 14. Accessing Statistical Summary
You can get basic statistics for numerical columns.

> 14.1 Get Statistical Summary Using .describe()

In [32]:
df.describe()

       Math Marks
count         3.0
mean         85.0
std           7.0
min          78.0
25%          81.5
50%          85.0
75%          88.5
max          92.0


Example: Find the average, minimum, and maximum marks of students.
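Each number in the summary can also be computed on its own by calling the matching method on the column. The sketch below reproduces a few of them; `marks` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

marks = df['Math Marks']

print(marks.mean())  # average marks
print(marks.min())   # lowest marks
print(marks.max())   # highest marks
print(marks.std())   # sample standard deviation, as in describe()

# A single entry can also be read out of the summary table by label
summary = df.describe()
print(summary.loc['mean', 'Math Marks'])
```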

## 15. Access DataFrame Memory Usage
Check how much memory your DataFrame is using. The result is given in bytes, with one entry for the index and one for each column.

> 15.1 Memory Usage Using .memory_usage()

In [33]:
df.memory_usage()

Index         132
Name           24
Math Marks     24
City           24
dtype: int64
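For text columns stored as `object`, the default figure counts only the references to the strings, not the strings themselves. Passing `deep=True` asks Pandas to include the actual string contents. A sketch comparing the two follows; the exact byte counts depend on your Pandas and Python versions, so they are not shown here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Priya', 'Siddharth'],
    'Math Marks': [85, 92, 78],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

shallow = df.memory_usage()         # default: shallow measurement
deep = df.memory_usage(deep=True)   # also counts the contents of text columns

print(shallow)
print(deep)
print(deep.sum())  # total bytes for the whole DataFrame
```

The numeric column reports the same size either way, since three 64-bit integers always take 24 bytes.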