# Pandas Data Selection, Indexing, and Assignment

## 1. Selecting Columns

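You can select a column with **bracket** notation, which works for every column name, or with **dot** notation, which works only when the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method.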

**Syntax:**  
df['colname'] # Always works  
df.colname # Only if colname is a valid Python identifier

**Example:**

In [1]:
import seaborn as sns
import pandas as pd
# sns.get_dataset_names()  # prints the list of dataset names you can use

df = sns.load_dataset('titanic')  # load_dataset already returns a pandas DataFrame
# display(df)
df['embark_town']  # df['colname']

0      Southampton
1        Cherbourg
2      Southampton
3      Southampton
4      Southampton
          ...     
886    Southampton
887    Southampton
888    Southampton
889      Cherbourg
890     Queenstown
Name: embark_town, Length: 891, dtype: object

In [2]:
df.sex      #df.colname

0        male
1      female
2      female
3      female
4        male
        ...  
886      male
887    female
888    female
889      male
890      male
Name: sex, Length: 891, dtype: object
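
The difference between the two notations is easiest to see on column names that dot notation cannot reach. The sketch below uses a small hypothetical DataFrame (`toy`, not the Titanic data) whose column names were chosen only to illustrate the pitfalls.

```python
import pandas as pd

# Hypothetical toy frame; column names are chosen to show where dot notation breaks.
toy = pd.DataFrame({
    "sex": ["male", "female"],
    "embark town": ["Southampton", "Cherbourg"],  # contains a space
    "count": [3, 7],                              # clashes with DataFrame.count()
})

# Bracket notation works for every column name.
town = toy["embark town"]
counts = toy["count"]

# Dot notation works when the name is a valid identifier with no clash.
same_as_brackets = toy.sex.equals(toy["sex"])

# toy.count resolves to the DataFrame.count method, not to the "count" column.
dot_count_is_method = callable(toy.count)

# A list of names inside the brackets returns a DataFrame with those columns.
subset = toy[["sex", "count"]]
```

Because of cases like `count`, bracket notation is the safer default in scripts, while dot notation is mostly a convenience for interactive work.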

## 2. Accessing Specific Values

For efficient access to single values, use `.at[]` (label-based) or `.iat[]` (integer position). You can also use `.loc[]` and `.iloc[]` for more general selection.

**Syntax:**  
By label  
value = df.at[row_label, 'colname']  

By integer position  
value = df.iat[row_index, col_index]  


**Example:**

In [3]:
df.at[1, 'sex'] # Value at row label 1, column 'sex'

'female'

In [4]:
df.iat[1, 2] # Value at row position 1, column position 2 (0-based): second row, third column ('sex')

'female'
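
In the Titanic frame the row labels happen to equal the row positions, which hides the difference between `.at[]` and `.iat[]`. The sketch below uses a hypothetical toy frame with string row labels to separate the two, and looks up a column position with `columns.get_loc` instead of hard-coding it.

```python
import pandas as pd

# Hypothetical toy frame with string row labels, so labels and positions differ.
toy = pd.DataFrame(
    {"survived": [0, 1, 1], "pclass": [3, 1, 3], "sex": ["male", "female", "female"]},
    index=["a", "b", "c"],
)

by_label = toy.at["b", "sex"]   # row label "b", column label "sex"
by_position = toy.iat[1, 2]     # second row, third column (0-based positions)

# Look up the integer position of a column rather than counting by hand.
sex_pos = toy.columns.get_loc("sex")
by_lookup = toy.iat[1, sex_pos]

# .at and .iat can also assign a single cell.
toy.at["a", "pclass"] = 2
```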

## 3. Index-Based Selection with `iloc`

Use `iloc` for integer-location based indexing (row, column order).

**Syntax:**  
df.iloc[row_start:row_end, col_start:col_end]

**Examples:**

In [5]:
df.iloc[0] # First row
df.iloc[:, 0] # All rows, first column
df.iloc[:3, 0] # First three rows, first column
df.iloc[1:4, 2:5] # Rows 1-3, columns 2-4
df.iloc[-5:] # Last five rows

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
886,0,2,male,27.0,0,0,13.0,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.45,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0,C,First,man,True,C,Cherbourg,yes,True
890,0,3,male,32.0,0,0,7.75,Q,Third,man,True,,Queenstown,no,True
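
The type of object that `iloc` returns depends on how the row and column selectors are written. A minimal sketch on a hypothetical toy frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with a default 0..3 integer index.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 1],
    "age": [22.0, 38.0, 26.0, 35.0],
})

first_row = toy.iloc[0]            # single integer -> Series indexed by column name
first_row_df = toy.iloc[[0]]       # list with one integer -> one-row DataFrame
cell = toy.iloc[0, 2]              # two integers -> scalar
block = toy.iloc[1:3, 0:2]         # slices -> rows 1-2, columns 0-1 (end excluded)
picked = toy.iloc[[0, 3], [0, 2]]  # lists -> exactly those positions
last_two = toy.iloc[-2:]           # negative positions count from the end
```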


## 4. Label-Based Selection with `loc`

Use `loc` for label-based indexing (row label, column label).

**Syntax:**  
df.loc[row_label, col_label]

**Examples:**

In [6]:
df.loc[10, 'embark_town'] # Value at row label 10, column 'embark_town'
df.loc[5:10, ['sex', 'embark_town']] # Rows 5 to 10, specific columns

Unnamed: 0,sex,embark_town
5,male,Queenstown
6,male,Southampton
7,male,Southampton
8,female,Southampton
9,female,Cherbourg
10,female,Southampton


**Note:**  
- `iloc[0:10]` selects the rows at positions 0–9 (Python-style slice, end excluded).
- `loc[0:10]` selects the rows with labels 0 through 10 (end included).
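
The inclusive and exclusive slicing rules are easier to tell apart when the row labels are not simply 0, 1, 2, and so on. A small sketch, assuming a hypothetical frame whose index labels are 10, 20, 30, 40:

```python
import pandas as pd

# Hypothetical toy frame whose labels differ from its positions.
toy = pd.DataFrame(
    {"sex": ["male", "female", "female", "male"]},
    index=[10, 20, 30, 40],
)

by_position = toy.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = toy.loc[10:30]    # labels 10 through 30; the end label is included
```

Here `by_position` holds the rows labelled 10 and 20, while `by_label` holds the rows labelled 10, 20, and 30.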

## 5. Setting and Resetting Index

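Set a column as the index for easier label-based selection, then reset it if needed. Both methods return a new DataFrame and leave the original unchanged unless the result is assigned back. Calling `reset_index()` on a DataFrame that still has its default index adds the old index as a column named `index`.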

**Syntax:**  
df = df.set_index('colname')  
df = df.reset_index()  

**Example:**

In [7]:
df.set_index('survived')

survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [8]:
df.reset_index()

Unnamed: 0,index,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
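
The round trip between `set_index` and `reset_index` can be sketched on a small hypothetical frame. The `name` column is an assumption for illustration and does not exist in the Titanic data.

```python
import pandas as pd

# Hypothetical toy frame; "name" is an illustrative column, not part of the Titanic data.
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [22.0, 38.0, 26.0],
})

indexed = toy.set_index("name")           # returns a new DataFrame; toy is unchanged
bob_age = indexed.loc["Bob", "age"]       # label-based lookup on the new index

restored = indexed.reset_index()          # "name" becomes a regular column again
dropped = indexed.reset_index(drop=True)  # discard the index instead of keeping it
```

Passing `drop=True` avoids the extra column when the old index carries no useful information.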


## 6. Conditional Selection (Boolean Indexing)

Filter rows based on conditions using boolean masks.

**Syntax:**  
mask = df['colname'] == value  
filtered_df = df.loc[mask]  

**Examples:**

In [9]:
#df.loc[df['sex'] == 'female']
df.loc[df['age'] >= 18]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
885,0,3,female,39.0,0,5,29.1250,Q,Third,woman,False,,Queenstown,no,False
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
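
A mask is an ordinary boolean Series, so it can be inspected and counted before it is used to filter. The sketch below uses a hypothetical toy frame and also shows why rows with a missing age (such as row 888 above) drop out: a comparison against a missing value evaluates to `False`.

```python
import pandas as pd

# Hypothetical toy frame with one missing age.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, float("nan"), 15.0, 40.0],
})

mask = toy["age"] >= 18     # boolean Series aligned with toy's index
adults = toy.loc[mask]      # keeps only the rows where mask is True
n_adults = int(mask.sum())  # True counts as 1, so the sum is the number of matches
```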


## 7. Combining Multiple Conditions

Use `&` for AND, `|` for OR, and `~` for NOT. Always wrap each condition in parentheses.

**Examples:**

In [10]:
#df.loc[(df['sex'] == 'female') & (df['age'] >= 18)] # AND
df.loc[(df['sex'] == 'male') | (df['embark_town'] == 'Cherbourg')] # OR
#df.loc[~(df['embark_town'] == 'Cherbourg')] # NOT

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
5,0,3,male,,0,0,8.4583,Q,Third,man,True,,Queenstown,no,True
6,0,1,male,54.0,0,0,51.8625,S,First,man,True,E,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
883,0,2,male,28.0,0,0,10.5000,S,Second,man,True,,Southampton,no,True
884,0,3,male,25.0,0,0,7.0500,S,Third,man,True,,Southampton,no,True
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
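
The parentheses are needed because `&` and `|` bind more tightly than comparison operators such as `==` and `>=`. One way to sidestep the issue, sketched here on a hypothetical toy frame, is to give each condition a name first and then combine the named masks.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 15.0, 12.0],
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Cherbourg"],
})

# Each mask is computed on its own, so no parentheses are needed later.
is_female = toy["sex"] == "female"
is_adult = toy["age"] >= 18
from_cherbourg = toy["embark_town"] == "Cherbourg"

adult_women = toy.loc[is_female & is_adult]               # AND
male_or_cherbourg = toy.loc[~is_female | from_cherbourg]  # NOT combined with OR
not_cherbourg = toy.loc[~from_cherbourg]                  # NOT
```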


## 8. Using `isin` for Multiple Values

Check if column values are in a list, set, or other iterable.

**Syntax:**  
mask = df['colname'].isin([value1, value2])  
filtered_df = df.loc[mask]  

**Example:**

In [11]:
df.loc[df['embark_town'].isin(['Cherbourg', 'Southampton'])]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
884,0,3,male,25.0,0,0,7.0500,S,Third,man,True,,Southampton,no,True
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
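
Negating an `isin` mask with `~` selects everything that is not in the list, and that includes rows where the value is missing. A minimal sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame; the last row has a missing town.
toy = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Queenstown", None],
})

ports = ["Cherbourg", "Southampton"]
mask = toy["embark_town"].isin(ports)

selected = toy.loc[mask]  # rows whose town is in the list
others = toy.loc[~mask]   # every other row, including the one with a missing town
```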


## 9. Handling Missing Data

Use `isnull()` and `notnull()` to filter missing or non-missing values.

**Examples:**

In [12]:
df.loc[df['age'].isnull()]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
5,0,3,male,,0,0,8.4583,Q,Third,man,True,,Queenstown,no,True
17,1,2,male,,0,0,13.0000,S,Second,man,True,,Southampton,yes,True
19,1,3,female,,0,0,7.2250,C,Third,woman,False,,Cherbourg,yes,True
26,0,3,male,,0,0,7.2250,C,Third,man,True,,Cherbourg,no,True
28,1,3,female,,0,0,7.8792,Q,Third,woman,False,,Queenstown,yes,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
859,0,3,male,,0,0,7.2292,C,Third,man,True,,Cherbourg,no,True
863,0,3,female,,8,2,69.5500,S,Third,woman,False,,Southampton,no,False
868,0,3,male,,0,0,9.5000,S,Third,man,True,,Southampton,no,True
878,0,3,male,,0,0,7.8958,S,Third,man,True,,Southampton,no,True


In [13]:
df.loc[df['age'].notnull()]  # 714 rows here + 177 rows with a missing age above = 891 total

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
885,0,3,female,39.0,0,5,29.1250,Q,Third,woman,False,,Queenstown,no,False
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
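
Because the two masks are exact complements, their counts always add up to the number of rows. The sketch below checks this on a hypothetical toy column and also shows `isna()`, which is an alias of `isnull()`.

```python
import pandas as pd

# Hypothetical toy frame with two missing ages.
toy = pd.DataFrame({"age": [22.0, float("nan"), 26.0, float("nan"), 35.0]})

missing = toy["age"].isnull()
n_missing = int(missing.sum())                # number of missing values
n_present = int(toy["age"].notnull().sum())   # number of non-missing values

# isna() is an alias of isnull() and produces the same mask.
same_mask = toy["age"].isna().equals(missing)
```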


## 10. Assigning Values

Assign values to entire columns or create new columns based on conditions.

**Syntax:**  
df['colname'] = value  
df.loc[condition, 'colname'] = value  

**Examples:**

In [14]:
df['Male_Under18'] = 'default_value'
display(df)
# Delete the column again
df.drop(columns=['Male_Under18'], inplace=True)
display(df)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,Male_Under18
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False,default_value
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,default_value
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True,default_value
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False,default_value
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True,default_value
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True,default_value
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True,default_value
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False,default_value
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True,default_value


Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


#### Delete a single column by name (returns a new DataFrame)
df = df.drop('column_name', axis=1)

#### Delete multiple columns by name
df = df.drop(['col1', 'col2'], axis=1)

#### Delete a column in-place (modifies the original DataFrame)
df.drop('column_name', axis=1, inplace=True)  

Key Points:  
- Use `axis=1` (or `axis='columns'`) to specify columns.  
- By default, `drop()` returns a new DataFrame and leaves the original unchanged unless `inplace=True` is used.  
- You can also use the `columns` parameter instead of `axis`, for example `df.drop(columns=['col1', 'col2'])`.  

In [15]:
df.loc[df['age'] <= 18, 'under_18'] = 'Young'
df.loc[df['under_18'] == 'Young']


Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,under_18
7,0,3,male,2.0,3,1,21.0750,S,Third,child,False,,Southampton,no,False,Young
9,1,2,female,14.0,1,0,30.0708,C,Second,child,False,,Cherbourg,yes,False,Young
10,1,3,female,4.0,1,1,16.7000,S,Third,child,False,G,Southampton,yes,False,Young
14,0,3,female,14.0,0,0,7.8542,S,Third,child,False,,Southampton,no,True,Young
16,0,3,male,2.0,4,1,29.1250,Q,Third,child,False,,Queenstown,no,False,Young
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
852,0,3,female,9.0,1,1,15.2458,C,Third,child,False,,Cherbourg,no,False,Young
853,1,1,female,16.0,0,1,39.4000,S,First,woman,False,D,Southampton,yes,False,Young
855,1,3,female,18.0,0,1,9.3500,S,Third,woman,False,,Southampton,yes,False,Young
869,1,3,male,4.0,1,1,11.1333,S,Third,child,False,,Southampton,yes,False,Young
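
When a conditional assignment creates a new column, only the matching rows receive the value and all other rows are left missing. The sketch below shows this on a hypothetical toy frame and fills the gaps afterwards. The `'Other'` label is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({"age": [22.0, 14.0, float("nan"), 18.0]})

# Creates the column; rows that do not match the condition are left missing.
toy.loc[toy["age"] <= 18, "under_18"] = "Young"
n_unset = int(toy["under_18"].isnull().sum())

# Optionally give the remaining rows an explicit label ("Other" is illustrative).
toy["under_18"] = toy["under_18"].fillna("Other")
```

Note that `<= 18` includes passengers who are exactly 18, as row 855 in the output above shows.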


**Notes:**
- Use `axis=1` or `axis='columns'` to specify that you are dropping columns (not rows).
- The `drop()` method can remove one or multiple columns.
- `del` and `pop()` can only remove one column at a time and always modify the DataFrame in place.
- To keep the original DataFrame object unchanged, do not use `inplace=True`; assign the result to a variable instead (for example, `df = df.drop(...)` rebinds `df` to the new DataFrame).
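
The three ways of removing columns can be compared side by side. A minimal sketch on a hypothetical toy frame with columns `a` to `d`:

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

without_a = toy.drop(columns=["a"])        # new DataFrame; toy keeps all four columns
without_bc = toy.drop(["b", "c"], axis=1)  # same idea with axis=1 and several columns
columns_after_drop = toy.columns.tolist()  # still a, b, c, d

del toy["d"]           # removes "d" in place
popped = toy.pop("c")  # removes "c" in place and returns it as a Series
```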

### Create a deep copy (recommended for most use cases)
df_copy = df.copy()

### Create a shallow copy (shares data with original, use with caution)
df_shallow = df.copy(deep=False)

By default, `deep=True`, so `df.copy()` creates a new DataFrame with its own data and index. Changes to `df_copy` do not affect `df`, and vice versa.

With `deep=False`, the new DataFrame shares its data with the original, so changes to one may show up in the other. When copy-on-write is enabled (opt-in in pandas 2.x and planned as the default from pandas 3.0), even shallow copies are protected against accidental modification. The default deep copy remains the safest choice.
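
A related pitfall is plain assignment, which does not copy anything. The sketch below contrasts it with a deep copy on a hypothetical toy frame. It deliberately leaves out shallow-copy behavior, because that depends on the pandas version and on whether copy-on-write is enabled.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({"age": [22.0, 38.0]})

alias = toy        # plain assignment: a second name for the same object
deep = toy.copy()  # deep copy: independent data

deep.loc[0, "age"] = 99.0  # changes only the copy

alias_is_same_object = alias is toy
original_value = toy.loc[0, "age"]  # still the original value
```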

## An exercise to practice using `.copy()`

### Adding a column to a copy of the `df` DataFrame

In [16]:
df_copy = df.copy()
df_copy.loc[df_copy['age'] > 18, 'over_18'] = 'Adult'

df_copy

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,under_18,over_18
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False,,Adult
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,,Adult
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True,,Adult
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False,,Adult
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True,,Adult
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True,,Adult
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True,,Adult
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False,,
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True,,Adult


### Original DataFrame without the added column over_18

In [17]:
df

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,under_18
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False,
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True,
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False,
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True,
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True,
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False,
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True,
