
---

# 📊 **Data Selection & Filtering in Pandas (Theory)**

## 🧾 Overview

Most analysis starts by pulling out the rows and columns you actually need. Pandas covers this with bracket indexing, the `.loc[]` and `.iloc[]` indexers, boolean masks, and `.query()`.

---

## 📌 **Column Selection**

* Retrieve one or more columns from a DataFrame.
* Selecting a **single column** returns a **Series**.
* Selecting **multiple columns** returns a **DataFrame**.
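
The difference in return type can be checked directly. This is a minimal sketch; the inline `films` frame is an assumed three-row stand-in for the dataset so the snippet runs on its own.

```python
import pandas as pd

# Stand-in frame (assumption): values copied from the first rows of the dataset.
films = pd.DataFrame({
    "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan"],
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal"],
    "IMDb": [7.2, 6.0, 8.4],
})

single = films["Actor"]              # one label -> Series
multiple = films[["Actor", "IMDb"]]  # list of labels -> DataFrame
one_col_frame = films[["Actor"]]     # a one-item list still gives a DataFrame

print(type(single).__name__)
print(type(multiple).__name__)
print(type(one_col_frame).__name__)
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is a Python list of column names.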

---

## 📌 **Row Selection**

Pandas provides two main indexing methods:

* **`.loc[]`** – Select data using **labels** (row/column names).
* **`.iloc[]`** – Select data by **position** (integer index).

You can use these to:

* Select a **specific row**.
* Retrieve a **particular value** by combining row and column selection.
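* Extract **ranges** of rows and/or specific columns using slicing. A `.loc[]` slice **includes** its end label, while an `.iloc[]` slice **excludes** its end position.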
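
With the default integer index, labels and positions happen to coincide, which can hide the difference between the two indexers. The sketch below assumes a small stand-in frame indexed by film title (the `films` name and the choice of index are illustrative) to make the distinction visible.

```python
import pandas as pd

# Stand-in frame (assumption), indexed by film title so labels != positions.
films = pd.DataFrame(
    {
        "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan", "Ranbir Kapoor"],
        "Year": [2023, 2017, 2016, 2022],
        "IMDb": [7.2, 6.0, 8.4, 5.6],
    },
    index=["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra"],
)

row_by_label = films.loc["Dangal"]     # row whose index label is "Dangal"
row_by_position = films.iloc[2]        # same row, found by position (third row)
value = films.loc["Dangal", "IMDb"]    # single value: row label + column label

# .loc slices include the end label; .iloc slices stop before the end position.
label_slice = films.loc["Pathaan":"Dangal", ["Actor", "Year"]]  # 3 rows
position_slice = films.iloc[0:2, 0:2]                            # 2 rows

print(len(label_slice), len(position_slice))
```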

---

## ⚡ **Accessing Individual Elements Quickly**

For fast access to single elements:

* **`.at[]`** – Reads or writes a single scalar value by row and column **label**.
* **`.iat[]`** – Reads or writes a single scalar value by integer row and column **position**.
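
A short sketch of both accessors, again on an assumed two-row stand-in frame. The rating written at the end is a hypothetical value used only to show assignment.

```python
import pandas as pd

# Stand-in frame (assumption) with the default integer index.
films = pd.DataFrame({
    "Actor": ["Shah Rukh Khan", "Salman Khan"],
    "Film": ["Pathaan", "Tiger Zinda Hai"],
    "IMDb": [7.2, 6.0],
})

by_label = films.at[1, "Film"]   # row label 1, column label "Film"
by_position = films.iat[1, 1]    # second row, second column

# Both accessors also accept assignment to a single cell.
films.at[1, "IMDb"] = 6.1        # hypothetical new value, for illustration only

print(by_label, by_position)
```

Unlike `.loc[]` and `.iloc[]`, these accessors take exactly one row and one column, so they cannot be used for slices.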

---

## 🎯 **Filtering Data with Conditions**

### ✅ **Basic Filtering**

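* Filter rows by placing a conditional expression inside the brackets (e.g., `df[df["IMDb"] > 7]`). The comparison produces a boolean Series, and only rows where it is `True` are kept.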
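
It can help to look at the intermediate boolean mask on its own. A minimal sketch, assuming a four-row stand-in frame named `films`:

```python
import pandas as pd

# Stand-in frame (assumption): a few rows from the dataset.
films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "Brahmastra"],
    "IMDb": [7.2, 6.0, 8.4, 5.6],
})

mask = films["IMDb"] > 7   # boolean Series aligned with the frame's index
high_rated = films[mask]   # keeps only the rows where the mask is True

print(mask.tolist())
print(high_rated["Film"].tolist())
```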

### ✅ **Combining Multiple Conditions**

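* Combine conditions using:

  * `&` (AND)
  * `|` (OR)
  * `~` (NOT)
* Always **enclose each condition in parentheses**: `&`, `|`, and `~` bind more tightly than comparison operators such as `>` and `==`, so unparenthesized conditions are evaluated in the wrong order and usually raise an error.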
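
The three operators can be sketched as follows, on an assumed four-row stand-in frame. The variable names are illustrative.

```python
import pandas as pd

# Stand-in frame (assumption): a few rows from the dataset.
films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "War"],
    "Year": [2023, 2017, 2016, 2019],
    "Genre": ["Action", "Action", "Biography", "Action"],
    "IMDb": [7.2, 6.0, 8.4, 6.5],
})

# AND: both conditions must hold.
recent_action = films[(films["Year"] > 2018) & (films["Genre"] == "Action")]

# OR: at least one condition must hold.
old_or_top = films[(films["Year"] < 2017) | (films["IMDb"] > 7)]

# NOT: invert a condition.
not_action = films[~(films["Genre"] == "Action")]

print(recent_action["Film"].tolist())
print(old_or_top["Film"].tolist())
print(not_action["Film"].tolist())
```

Python's `and`, `or`, and `not` keywords do not work here, because they try to reduce a whole Series to a single truth value.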

---

## 🔍 **Using `.query()` for Filtering**

`.query()` filters rows with a boolean expression written as a string. The syntax resembles a SQL `WHERE` clause and is often easier to read than nested brackets.

### 📝 **Important Notes:**

1. Column names can be used directly as variables.
2. Use **quotes** for string values.
3. Wrap column names with **backticks** if they include spaces or special characters.
4. Use `@` to access external Python variables within the query.
5. Inside queries you can write **`and`**, **`or`**, **`not`** in place of `&`, `|`, `~`; both forms are accepted.
6. Supports **chained comparisons** (e.g., `25 < age <= 40`).
7. Avoid Python **keywords** as column names; if needed, wrap them in backticks.
8. **Case sensitivity** applies to column names and string values.
9. `.query()` returns a **new DataFrame** and leaves the original unchanged unless you pass `inplace=True`.
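
Several of these notes can be sketched together. The frame below is an assumed stand-in that reuses the dataset's column names, and `min_rating` is a hypothetical threshold chosen for illustration.

```python
import pandas as pd

# Stand-in frame (assumption): a few rows from the dataset.
films = pd.DataFrame({
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal", "War"],
    "Year": [2023, 2017, 2016, 2019],
    "Genre": ["Action", "Action", "Biography", "Action"],
    "BoxOffice(INR Crore)": [1050, 565, 2024, 475],
    "IMDb": [7.2, 6.0, 8.4, 6.5],
})

min_rating = 6.4  # hypothetical threshold, referenced below with @

action = films.query("Genre == 'Action'")                  # quoted string value
big = films.query("`BoxOffice(INR Crore)` > 1000")         # backticks for special characters
rated = films.query("IMDb > @min_rating")                  # @ reads a Python variable
mid_years = films.query("2016 < Year <= 2019")             # chained comparison
combined = films.query("Genre == 'Action' and not (Year < 2019)")  # and / not

print(combined["Film"].tolist())
```

Each call returns a new frame, so `films` itself still has all four rows afterwards.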

---

## 🧠 **Summary**

* Use selection methods like `df[col]`, `.loc[]`, `.iloc[]`, `.at[]`, and `.iat[]` to access data efficiently.
* Apply **conditions** directly or use `.query()` for a more expressive approach.
* These techniques underpin most later steps, such as cleaning, grouping, and updating data in Pandas.

---


In [1]:
import pandas as pd
df = pd.read_csv("data2.csv")
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
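9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2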


In [2]:
df["Actor"]


0         Shah Rukh Khan
1            Salman Khan
2             Aamir Khan
3          Ranbir Kapoor
4          Ranveer Singh
5     Ayushmann Khurrana
6          Rajkummar Rao
7         Hrithik Roshan
8           Akshay Kumar
9          Kartik Aaryan
10          Varun Dhawan
11         Vicky Kaushal
Name: Actor, dtype: object

In [3]:
type(df["Actor"])


pandas.core.series.Series

In [4]:
df[["Actor","IMDb"]]

Unnamed: 0,Actor,IMDb
0,Shah Rukh Khan,7.2
1,Salman Khan,6.0
2,Aamir Khan,8.4
3,Ranbir Kapoor,5.6
4,Ranveer Singh,7.0
5,Ayushmann Khurrana,8.3
6,Rajkummar Rao,7.5
7,Hrithik Roshan,6.5
8,Akshay Kumar,7.0
9,Kartik Aaryan,5.9
10,Varun Dhawan,6.1
11,Vicky Kaushal,8.2


In [5]:
df.loc[9]  # Row with index label 9

Actor                       Kartik Aaryan
Film                    Bhool Bhulaiyaa 2
Year                                 2022
Genre                       Horror Comedy
BoxOffice(INR Crore)                  266
IMDb                                  5.9
Name: 9, dtype: object

In [6]:
df.iloc[9]  # Row at integer position 9 (the tenth row)

Actor                       Kartik Aaryan
Film                    Bhool Bhulaiyaa 2
Year                                 2022
Genre                       Horror Comedy
BoxOffice(INR Crore)                  266
IMDb                                  5.9
Name: 9, dtype: object

In [7]:
df.loc[9,"Film"]

'Bhool Bhulaiyaa 2'

In [8]:
df.iloc[9,4]

266

In [9]:
df.loc[0:4,["Year","Film","Actor"]]

Unnamed: 0,Year,Film,Actor
0,2023,Pathaan,Shah Rukh Khan
1,2017,Tiger Zinda Hai,Salman Khan
2,2016,Dangal,Aamir Khan
3,2022,Brahmastra,Ranbir Kapoor
4,2018,Padmaavat,Ranveer Singh


In [10]:
df.iloc[0:4,0:3]

Unnamed: 0,Actor,Film,Year
0,Shah Rukh Khan,Pathaan,2023
1,Salman Khan,Tiger Zinda Hai,2017
2,Aamir Khan,Dangal,2016
3,Ranbir Kapoor,Brahmastra,2022


In [11]:
df.at[0,"Actor"]

'Shah Rukh Khan'

In [12]:
df.iat[0,1]

'Pathaan'

In [13]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
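9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2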


In [14]:
df[df['IMDb']>6]

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [16]:
df[(df['Year']>2019) & (df['Genre']=='Action')]

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2


In [17]:
df[(df['IMDb']>7)|(df['Year']>2018)]

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [19]:
df.query("Year >2021")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [22]:
col = 'IMDb'
df.query(f"{col}>8")


Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [25]:
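# Take an explicit copy so that later assignments to df2 never touch df
# and do not trigger a SettingWithCopyWarning.
df2 = df[['Actor','Film','Year']].copy()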
df2

Unnamed: 0,Actor,Film,Year
0,Shah Rukh Khan,Pathaan,2023
1,Salman Khan,Tiger Zinda Hai,2017
2,Aamir Khan,Dangal,2016
3,Ranbir Kapoor,Brahmastra,2022
4,Ranveer Singh,Padmaavat,2018
5,Ayushmann Khurrana,Andhadhun,2018
6,Rajkummar Rao,Stree,2018
7,Hrithik Roshan,War,2019
8,Akshay Kumar,Good Newwz,2019
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022
10,Varun Dhawan,Badrinath Ki Dulhania,2017
11,Vicky Kaushal,Uri: The Surgical Strike,2019


In [28]:
df2.loc[0,'Actor'] = 'Mudabbir'
df2

Unnamed: 0,Actor,Film,Year
0,Mudabbir,Pathaan,2023
1,Salman Khan,Tiger Zinda Hai,2017
2,Aamir Khan,Dangal,2016
3,Ranbir Kapoor,Brahmastra,2022
4,Ranveer Singh,Padmaavat,2018
5,Ayushmann Khurrana,Andhadhun,2018
6,Rajkummar Rao,Stree,2018
7,Hrithik Roshan,War,2019
8,Akshay Kumar,Good Newwz,2019
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022
10,Varun Dhawan,Badrinath Ki Dulhania,2017
11,Vicky Kaushal,Uri: The Surgical Strike,2019
