In [43]:
# 📊 Data Selection & Filtering in Pandas (Theory)

## Overview

Selecting the appropriate rows and columns is the first step in analyzing any dataset. Pandas provides flexible and powerful methods for this purpose.

---

## 📌 Selecting Columns

* You can retrieve single or multiple columns from a DataFrame.
* Single column selection returns a Series.
* Multiple column selection returns a DataFrame.
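
A minimal sketch of the two cases. The `people` frame and its columns are made up for illustration and are not part of the dataset used in the cells further down.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "age": [28, 35, 42],
    "city": ["Lima", "Oslo", "Pune"],
})

single = people["age"]               # one column label -> Series
multiple = people[["name", "age"]]   # list of labels   -> DataFrame

print(type(single).__name__)
print(type(multiple).__name__)
```

Note the double brackets in the second case: the outer pair is the indexing operator, the inner pair is an ordinary Python list of column names.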

---

## 📌 Selecting Rows

Pandas offers two primary methods:

* `.loc[]`: Label-based indexing.
* `.iloc[]`: Position-based indexing.

You can select:

* A single row.
* A specific value by combining row and column selection.
* A range of rows and/or specific columns via slicing.
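
The difference between labels and positions is easier to see when the index is not a plain 0..n-1 range. The sketch below assumes a hypothetical `people` frame with string row labels.

```python
import pandas as pd

# Hypothetical toy data with string row labels, so labels and positions differ.
people = pd.DataFrame(
    {"age": [28, 35, 42], "city": ["Lima", "Oslo", "Pune"]},
    index=["a", "b", "c"],
)

row_by_label = people.loc["b"]        # single row, selected by label
row_by_position = people.iloc[1]      # the same row, selected by position
value = people.loc["b", "city"]       # single value: row label + column label

# Label slices include the end label; position slices exclude the end position.
label_slice = people.loc["a":"b", ["age"]]
position_slice = people.iloc[0:2, 0:1]
```

Both slices return rows `a` and `b` of the `age` column, even though the `.loc[]` slice ends at `"b"` and the `.iloc[]` slice ends at position `2`.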

---

## ⚡ Fast Element Access

For accessing individual elements efficiently:

* `.at[]`: Label-based fast access.
* `.iat[]`: Position-based fast access.
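
A short sketch of scalar access, again on a hypothetical `people` frame. Both accessors take exactly one row and one column, and both can be used to assign a single value.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
people = pd.DataFrame(
    {"age": [28, 35, 42], "city": ["Lima", "Oslo", "Pune"]},
    index=["a", "b", "c"],
)

age_by_label = people.at["b", "age"]   # row label, column label
age_by_position = people.iat[1, 0]     # row position, column position

# Scalar assignment works the same way.
people.at["c", "age"] = 43
```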

---

## 🎯 Filtering with Conditions

### ✅ Basic Filtering

* You can filter rows based on a condition applied to a column (e.g., all rows where Age > 30).

### ✅ Multiple Conditions

* Combine conditions using:

  * `&` (AND)
  * `|` (OR)
* Always use parentheses around individual conditions.
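
The rules above can be sketched as follows. The frame and the thresholds are hypothetical. The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>` and `==` in Python.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [28, 35, 42, 31],
    "city": ["Lima", "Oslo", "Lima", "Pune"],
})

# Single condition: the comparison yields a boolean Series used as a row mask.
over_30 = people[people["age"] > 30]

# Multiple conditions: each one is a full comparison wrapped in parentheses.
over_30_in_lima = people[(people["age"] > 30) & (people["city"] == "Lima")]
under_30_or_pune = people[(people["age"] < 30) | (people["city"] == "Pune")]
```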

---

## 🔍 Querying with `.query()`

`.query()` filters rows using an expression passed as a string, which often reads more like SQL than a bracketed boolean mask.

### 📝 Key Rules:

1. Column names are treated as variables.
2. Use quotes around string values.
3. Use backticks for column names with spaces or special characters.
4. Use `@` to reference external Python variables inside the query.
5. Logical operators can be written as `and`, `or`, `not`; the symbolic forms `&`, `|`, `~` are also accepted.
6. Supports chained comparisons (e.g., 25 < age <= 40).
7. Avoid Python reserved keywords as column names; backtick quoting may not help with them in every pandas version, so renaming the column is the safer route.
8. Case sensitivity matters for both column names and string values.
9. Returns a new DataFrame by default rather than a view of the original.
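
Several of these rules can be seen together in a small sketch. The frame, the `home city` column, and the `age_limit` variable are hypothetical and chosen only to exercise backticks, `@` references, chained comparisons, and case sensitivity.

```python
import pandas as pd

# Hypothetical toy data; "home city" contains a space on purpose.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "age": [28, 35, 42, 31],
    "home city": ["Lima", "Oslo", "Lima", "Pune"],
})

age_limit = 30  # external Python variable, referenced with @ below

older = people.query("age > @age_limit")
in_lima = people.query("`home city` == 'Lima'")      # backticks + quoted string
mid_range = people.query("25 < age <= 35")           # chained comparison
combined = people.query("age > @age_limit and `home city` != 'Lima'")

# String comparisons are case-sensitive, so this matches no rows.
wrong_case = people.query("`home city` == 'lima'")
```

An empty result from a query like `wrong_case` is usually worth checking against the exact spelling and capitalization of the stored values before assuming there are no matches.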

---

## 🧠 Summary

* Use indexing methods like `df[col]`, `.loc[]`, `.iloc[]`, `.at[]`, and `.iat[]` to access data.
* Filter using logical conditions or `.query()` for cleaner, readable syntax.
* Mastering these methods makes the rest of data manipulation in pandas easier and more efficient.




In [2]:
import pandas as pd 

In [3]:
df = pd.read_csv("train_outliers_preprocessed.csv")

In [4]:
df

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52.0,Female,Female,120.0,125.0,0.0,0.0,152.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0
1,53.0,0,Female,140.0,203.0,0.0,0.0,155.0,0.0,3.1,0.0,0.0,3.0,0.0,0.0
2,70.0,0,Female,145.0,174.0,0.0,1.0,125.0,0.0,2.6,0.0,0.0,3.0,0.0,0.0
3,61.0,0,Female,148.0,203.0,0.0,1.0,161.0,0.0,0.0,2.0,1.0,3.0,0.0,0.0
4,62.0,0,Female,138.0,294.0,0.0,1.0,152.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59.0,1,Male,140.0,204.0,0.0,1.0,152.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
1021,60.0,0,Female,120.0,204.0,0.0,1.0,152.0,0.0,2.8,1.0,1.0,3.0,0.0,0.0
1022,47.0,0,Female,120.0,204.0,0.0,0.0,118.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0
1023,50.0,0,Female,120.0,204.0,0.0,0.0,159.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0


In [5]:
df['sex']

0       Female
1            0
2            0
3            0
4            0
         ...  
1020         1
1021         0
1022         0
1023         0
1024         0
Name: sex, Length: 1025, dtype: object

In [6]:
type(df['sex'])

pandas.core.series.Series

In [7]:
df[['sex','gender']]

Unnamed: 0,sex,gender
0,Female,Female
1,0,Female
2,0,Female
3,0,Female
4,0,Female
...,...,...
1020,1,Male
1021,0,Female
1022,0,Female
1023,0,Female


In [8]:
df.loc[1]

age           53.0
sex              0
gender      Female
cp           140.0
trestbps     203.0
chol           0.0
fbs            0.0
restecg      155.0
thalach        0.0
exang          3.1
oldpeak        0.0
slope          0.0
ca             3.0
thal           0.0
target         0.0
Name: 1, dtype: object

In [10]:
df.loc[1,'gender']

'Female'

In [12]:
df.loc[0:1,["gender","age"]]

Unnamed: 0,gender,age
0,Female,52.0
1,Female,53.0


In [18]:
df.at[0,'age']

52.0

In [19]:
df.iat[0, 3]  # row position 0, column position 3 (the 'cp' column)

120.0

In [24]:
df[df['age']>53]['age']

2       70.0
3       61.0
4       62.0
5       58.0
6       58.0
        ... 
1015    58.0
1016    65.0
1020    59.0
1021    60.0
1024    54.0
Name: age, Length: 679, dtype: float64

In [31]:
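# Caution: df['gender'] has no comparison here, so nothing is filtered by gender; write (df['gender'] == 'Female') to do that.
df[(df['age']>53) & (df['gender'])]['age']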

2       70.0
3       61.0
4       62.0
5       58.0
6       58.0
        ... 
1015    58.0
1016    65.0
1020    59.0
1021    60.0
1024    54.0
Name: age, Length: 679, dtype: float64

In [38]:
df.query("age > 54 and gender == 'Female'")[['age', 'gender']]


Unnamed: 0,age,gender
2,70.0,Female
3,61.0,Female
4,62.0,Female
5,58.0,Female
6,58.0,Female
...,...,...
1007,56.0,Female
1013,58.0,Female
1015,58.0,Female
1016,65.0,Female


In [39]:
col = "age"
df.query(f"{col} > 25")

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52.0,Female,Female,120.0,125.0,0.0,0.0,152.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0
1,53.0,0,Female,140.0,203.0,0.0,0.0,155.0,0.0,3.1,0.0,0.0,3.0,0.0,0.0
2,70.0,0,Female,145.0,174.0,0.0,1.0,125.0,0.0,2.6,0.0,0.0,3.0,0.0,0.0
3,61.0,0,Female,148.0,203.0,0.0,1.0,161.0,0.0,0.0,2.0,1.0,3.0,0.0,0.0
4,62.0,0,Female,138.0,294.0,0.0,1.0,152.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59.0,1,Male,140.0,204.0,0.0,1.0,152.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
1021,60.0,0,Female,120.0,204.0,0.0,1.0,152.0,0.0,2.8,1.0,1.0,3.0,0.0,0.0
1022,47.0,0,Female,120.0,204.0,0.0,0.0,118.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0
1023,50.0,0,Female,120.0,204.0,0.0,0.0,159.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0


In [44]:
df.query("gender == 'male'")

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target


In [47]:
age_limit = 61
df.query("age > @age_limit")

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
2,70.0,0,Female,145.0,174.0,0.0,1.0,125.0,0.0,2.6,0.0,0.0,3.0,0.0,0.0
4,62.0,0,Female,138.0,294.0,0.0,1.0,152.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0
10,71.0,0,Female,112.0,149.0,0.0,1.0,125.0,0.0,1.6,1.0,0.0,2.0,1.0,1.0
21,67.0,0,Female,106.0,223.0,0.0,1.0,142.0,0.0,0.3,2.0,2.0,2.0,1.0,1.0
23,63.0,2,Female,135.0,252.0,0.0,0.0,172.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
989,71.0,1,Male,120.0,302.0,0.0,1.0,152.0,0.0,0.4,2.0,2.0,2.0,1.0,1.0
999,67.0,0,Female,120.0,254.0,0.0,1.0,152.0,0.0,0.2,1.0,2.0,3.0,0.0,0.0
1000,64.0,0,Female,120.0,212.0,0.0,0.0,152.0,0.0,2.0,1.0,2.0,1.0,0.0,0.0
1002,66.0,0,Female,120.0,212.0,0.0,0.0,152.0,0.0,0.1,2.0,1.0,2.0,0.0,0.0


In [51]:
col = "age"
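# Invalid as written: unbalanced ")" and two separate lists; pass one list of column names, e.g. [['age', 'gender']].
# df.query(f"{col} > 25")[['age'],['gender']])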

[['age'], ['gender']]


In [52]:
df.query(f"{col} > 25")[['age', 'gender']]


Unnamed: 0,age,gender
0,52.0,Female
1,53.0,Female
2,70.0,Female
3,61.0,Female
4,62.0,Female
...,...,...
1020,59.0,Male
1021,60.0,Female
1022,47.0,Female
1023,50.0,Female


In [59]:
df.loc[df["age"] > 52, "gender"] = "male"


In [60]:
df

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52.0,Female,Female,120.0,125.0,0.0,0.0,152.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0
1,53.0,0,male,140.0,203.0,0.0,0.0,155.0,0.0,3.1,0.0,0.0,3.0,0.0,0.0
2,70.0,0,male,145.0,174.0,0.0,1.0,125.0,0.0,2.6,0.0,0.0,3.0,0.0,0.0
3,61.0,0,male,148.0,203.0,0.0,1.0,161.0,0.0,0.0,2.0,1.0,3.0,0.0,0.0
4,62.0,0,male,138.0,294.0,0.0,1.0,152.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59.0,1,male,140.0,204.0,0.0,1.0,152.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
1021,60.0,0,male,120.0,204.0,0.0,1.0,152.0,0.0,2.8,1.0,1.0,3.0,0.0,0.0
1022,47.0,0,Female,120.0,204.0,0.0,0.0,118.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0
1023,50.0,0,Female,120.0,204.0,0.0,0.0,159.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0


In [61]:
d2 = df.copy()


In [62]:
d2 = df[df["age"] > 54].copy()


In [64]:
df[['age', 'gender']]


Unnamed: 0,age,gender
0,52.0,Female
1,53.0,male
2,70.0,male
3,61.0,male
4,62.0,male
...,...,...
1020,59.0,male
1021,60.0,male
1022,47.0,Female
1023,50.0,Female


In [65]:
d2 = df[['age', 'gender']].copy()


In [68]:
d2

Unnamed: 0,age,gender
0,52.0,Female
1,53.0,male
2,70.0,male
3,61.0,male
4,62.0,male
...,...,...
1020,59.0,male
1021,60.0,male
1022,47.0,Female
1023,50.0,Female


In [69]:
d2["gender"] = "Male"


In [70]:
d2

Unnamed: 0,age,gender
0,52.0,Male
1,53.0,Male
2,70.0,Male
3,61.0,Male
4,62.0,Male
...,...,...
1020,59.0,Male
1021,60.0,Male
1022,47.0,Male
1023,50.0,Male


In [71]:
df

Unnamed: 0,age,sex,gender,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52.0,Female,Female,120.0,125.0,0.0,0.0,152.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0
1,53.0,0,male,140.0,203.0,0.0,0.0,155.0,0.0,3.1,0.0,0.0,3.0,0.0,0.0
2,70.0,0,male,145.0,174.0,0.0,1.0,125.0,0.0,2.6,0.0,0.0,3.0,0.0,0.0
3,61.0,0,male,148.0,203.0,0.0,1.0,161.0,0.0,0.0,2.0,1.0,3.0,0.0,0.0
4,62.0,0,male,138.0,294.0,0.0,1.0,152.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59.0,1,male,140.0,204.0,0.0,1.0,152.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
1021,60.0,0,male,120.0,204.0,0.0,1.0,152.0,0.0,2.8,1.0,1.0,3.0,0.0,0.0
1022,47.0,0,Female,120.0,204.0,0.0,0.0,118.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0
1023,50.0,0,Female,120.0,204.0,0.0,0.0,159.0,0.0,0.0,2.0,0.0,2.0,1.0,1.0
