# Pandas Data Selection, Indexing, and Assignment

## 1. Selecting Columns

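You can select a column with **bracket** notation, which works for every column name, or **dot** notation, which works only when the name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method.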

**Syntax:**  
df['colname'] # Always works  
df.colname # Only if colname is a valid Python identifier

**Example:**

In [None]:
import seaborn as sns
import pandas as pd
# sns.get_dataset_names()  # prints the list of dataset names you can load

df = sns.load_dataset('titanic')  # load_dataset already returns a pandas DataFrame
# display(df)
df['embark_town']  # df['colname']

In [None]:
df.sex      #df.colname
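
To make the difference between the two notations concrete, here is a minimal sketch on a small hand-built frame. The frame `toy` and its column names are hypothetical, chosen only to show a name with a space and a name that clashes with a DataFrame method; they are not part of the Titanic data.

```python
import pandas as pd

# Hypothetical toy frame (not the Titanic data); the column names are chosen
# to show where dot notation stops working.
toy = pd.DataFrame({
    "sex": ["male", "female"],
    "embark town": ["Cherbourg", "Southampton"],  # name contains a space
    "count": [1, 2],                              # name clashes with DataFrame.count()
})

sex_by_dot = toy.sex                  # fine: 'sex' is a plain identifier
town_by_bracket = toy["embark town"]  # only bracket notation can reach this column
count_attr = toy.count                # the bound method DataFrame.count, not the column
count_col = toy["count"]              # the actual column

print(callable(count_attr))  # True: dot access returned the method
print(count_col.tolist())    # [1, 2]
```

Because dot access silently returns the method instead of raising an error, bracket notation is the safer habit whenever column names come from external data.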

## 2. Accessing Specific Values

For efficient access to single values, use `.at[]` (label-based) or `.iat[]` (integer position). You can also use `.loc[]` and `.iloc[]` for more general selection.

**Syntax:**  
By label  
value = df.at[row_label, 'colname']  

By integer position  
value = df.iat[row_index, col_index]  


**Example:**

In [None]:
df.at[1, 'sex'] # Value at row label 1, column 'sex'

In [None]:
df.iat[1, 2] # Value at row position 1, column position 2 (0-based: second row, third column)
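
With the default 0, 1, 2, ... index, labels and positions coincide, which hides the difference between `.at[]` and `.iat[]`. The sketch below uses a hypothetical frame `toy` with a custom index (an assumption for illustration) so the two can be told apart.

```python
import pandas as pd

# Hypothetical frame whose row labels (10, 20, 30) differ from the positions (0, 1, 2).
toy = pd.DataFrame(
    {"sex": ["male", "female", "female"], "age": [22.0, 38.0, 26.0]},
    index=[10, 20, 30],
)

by_label = toy.at[20, "age"]  # row LABEL 20, column label 'age'
by_position = toy.iat[1, 1]   # row POSITION 1, column POSITION 1

print(by_label, by_position)  # 38.0 38.0: both point at the same cell
```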

## 3. Index-Based Selection with `iloc`

Use `iloc` for integer-location based indexing (row, column order).

**Syntax:**  
df.iloc[row_start:row_end, col_start:col_end]

**Examples:**

In [None]:
df.iloc[0] # First row (returned as a Series)
df.iloc[:, 0] # All rows, first column
df.iloc[:3, 0] # First three rows, first column
df.iloc[1:4, 2:5] # Rows 1-3, columns 2-4
df.iloc[-5:] # Last five rows

## 4. Label-Based Selection with `loc`

Use `loc` for label-based indexing (row label, column label).

**Syntax:**  
df.loc[row_label, col_label]

**Examples:**

In [None]:
df.loc[10, 'embark_town'] # Value at row label 10, column 'embark_town'
df.loc[5:10, ['sex', 'embark_town']] # Row labels 5 through 10 (inclusive), specific columns

**Note:**  
- `iloc[0:10]` selects rows 0–9 (Python-style, exclusive).
- `loc[0:10]` selects rows 0–10 (inclusive).
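
The slicing difference in the note can be checked directly. This is a minimal sketch on a hypothetical 20-row frame `toy` with the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..19.
toy = pd.DataFrame({"value": range(20)})

by_position = toy.iloc[0:10]  # positions 0..9, end excluded
by_label = toy.loc[0:10]      # labels 0..10, end included

print(len(by_position), by_position.index[-1])  # 10 9
print(len(by_label), by_label.index[-1])        # 11 10
```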

## 5. Setting and Resetting Index

Set a column as the index for easier selection, then reset if needed.

**Syntax:**  
df = df.set_index('colname')  
df = df.reset_index()  

**Example:**

In [None]:
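df_indexed = df.set_index('survived')  # returns a new DataFrame; df itself is unchanged
df_indexed

In [None]:
df_indexed.reset_index()  # moves 'survived' back into a regular column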
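
Both methods return a new DataFrame rather than changing the original, so the result has to be kept in a variable to be useful. A minimal sketch, using a hypothetical frame `toy` with a unique `name` column (an assumption; a unique column makes label lookups return a single value):

```python
import pandas as pd

# Hypothetical frame; 'name' is unique, so it works well as an index.
toy = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [30, 41, 25]})

indexed = toy.set_index("name")      # new DataFrame indexed by name
bob_age = indexed.loc["Bob", "age"]  # label-based lookup on the new index
restored = indexed.reset_index()     # 'name' becomes a regular column again

print(bob_age)                 # 41
print(list(toy.columns))       # ['name', 'age']: the original is untouched
print(list(restored.columns))  # ['name', 'age']
```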

## 6. Conditional Selection (Boolean Indexing)

Filter rows based on conditions using boolean masks.

**Syntax:**  
mask = df['colname'] == value  
filtered_df = df.loc[mask]  

**Examples:**

In [None]:
#df.loc[df['sex'] == 'female']
df.loc[df['age'] >= 18]

## 7. Combining Multiple Conditions

Use `&` for AND, `|` for OR, and `~` for NOT. Always wrap each condition in parentheses.

**Examples:**

In [None]:
#df.loc[(df['sex'] == 'female') & (df['age'] >= 18)] # AND
df.loc[(df['sex'] == 'male') | (df['embark_town'] == 'Cherbourg')] # OR
#df.loc[~(df['embark_town'] == 'Cherbourg')] # NOT
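
The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `==` and `>=`, so an unparenthesized condition is grouped differently from what was intended. The sketch below applies all three operators to a hypothetical four-row frame `toy` (made-up values, not the Titanic data) so the results are easy to verify by eye.

```python
import pandas as pd

# Hypothetical four-row frame with the same column names as the examples above.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 15.0, 9.0],
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Cherbourg"],
})

adult_women = toy.loc[(toy["sex"] == "female") & (toy["age"] >= 18)]                     # AND
male_or_cherbourg = toy.loc[(toy["sex"] == "male") | (toy["embark_town"] == "Cherbourg")]  # OR
not_cherbourg = toy.loc[~(toy["embark_town"] == "Cherbourg")]                            # NOT

print(adult_women.index.tolist())        # [1]
print(male_or_cherbourg.index.tolist())  # [0, 1, 3]
print(not_cherbourg.index.tolist())      # [0, 2]
```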

## 8. Using `isin` for Multiple Values

Check if column values are in a list, set, or other iterable.

**Syntax:**  
mask = df['colname'].isin([value1, value2])  
filtered_df = df.loc[mask]  

**Example:**

In [None]:
df.loc[df['embark_town'].isin(['Cherbourg', 'Southampton'])]
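
`isin` combines naturally with `~` to select everything outside a set of values. One thing to keep in mind is that a missing value is not in the list, so it ends up on the negated side. A minimal sketch on a hypothetical frame `toy`:

```python
import pandas as pd

# Hypothetical frame; the last row has a missing town.
toy = pd.DataFrame({"embark_town": ["Cherbourg", "Southampton", "Queenstown", None]})

ports = ["Cherbourg", "Southampton"]
in_ports = toy.loc[toy["embark_town"].isin(ports)]
not_in_ports = toy.loc[~toy["embark_town"].isin(ports)]  # includes the missing-value row

print(in_ports.index.tolist())      # [0, 1]
print(not_in_ports.index.tolist())  # [2, 3]
```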

## 9. Handling Missing Data

Use `isnull()` and `notnull()` to filter missing or non-missing values.

**Examples:**

In [None]:
df.loc[df['age'].isnull()]

In [None]:
df.loc[df['age'].notnull()]  # 714 rows with an age; together with the 177 missing rows above, 891 total
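
The two masks are exact complements, so their row counts always add up to the length of the DataFrame. A minimal sketch on a hypothetical five-row frame `toy`:

```python
import pandas as pd

# Hypothetical frame with two missing ages out of five rows.
toy = pd.DataFrame({"age": [22.0, None, 38.0, None, 26.0]})

missing = toy.loc[toy["age"].isnull()]
present = toy.loc[toy["age"].notnull()]

print(len(missing), len(present), len(toy))  # 2 3 5
```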

## 10. Assigning Values

Assign values to entire columns or create new columns based on conditions.

**Syntax:**  
df['colname'] = value  
df.loc[condition, 'colname'] = value  

**Examples:**

In [None]:
df['Male_Under18'] = 'default_value'  # a single scalar fills every row of the new column
display(df)

# Delete the column again
df.drop(columns=['Male_Under18'], inplace=True)
display(df)

#### Delete a single column by name (returns a new DataFrame)
df = df.drop('column_name', axis=1)

#### Delete multiple columns by name
df = df.drop(['col1', 'col2'], axis=1)

#### Delete a column in-place (modifies the original DataFrame)
df.drop('column_name', axis=1, inplace=True)  

Key Points:  
- Use axis=1 (or axis='columns') to specify columns.  
- By default, drop() does not change the original DataFrame unless inplace=True is used.  
- You can also use the `columns` parameter instead of `axis=1`, for example `df.drop(columns=['col1', 'col2'])`.

In [None]:
df.loc[df['age'] < 18, 'under_18'] = 'Young'  # strictly under 18, matching the column name
df.loc[df['under_18'] == 'Young']
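
When `df.loc[condition, 'colname'] = value` creates a new column, only the rows matching the condition receive the value; every other row is left as missing (NaN). The sketch below shows this on a hypothetical frame `toy` with made-up ages.

```python
import pandas as pd

# Hypothetical frame; the last age is missing.
toy = pd.DataFrame({"age": [10.0, 18.0, 40.0, None]})

# Creates the column 'under_18' and fills it only where the condition is True.
toy.loc[toy["age"] < 18, "under_18"] = "Young"

n_young = int((toy["under_18"] == "Young").sum())
n_unset = int(toy["under_18"].isnull().sum())
print(n_young, n_unset)  # 1 3
```

A row with a missing age never satisfies the comparison, so it stays unset as well.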


**Notes:**
- Use `axis=1` or `axis='columns'` to specify that you are dropping columns, not rows.
- The `drop()` method can remove one or multiple columns.
- `del` and `pop()` can only remove one column at a time and always modify the DataFrame in place.
- To keep the original DataFrame unchanged, leave out `inplace=True` and assign the result to a variable: `df = df.drop(...)` rebinds `df` to the new DataFrame, while `df_new = df.drop(...)` keeps both.
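
The three ways of removing columns mentioned in these notes can be compared side by side. A minimal sketch on a hypothetical frame `toy` with columns `a` to `d`:

```python
import pandas as pd

# Hypothetical frame with four columns.
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

without_a = toy.drop(columns=["a"])  # returns a new DataFrame; toy keeps all four columns
popped = toy.pop("b")                # removes 'b' from toy in place and returns it as a Series
del toy["c"]                         # removes 'c' from toy in place, returns nothing

print(list(without_a.columns))  # ['b', 'c', 'd']
print(popped.tolist())          # [3, 4]
print(list(toy.columns))        # ['a', 'd']
```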

### Create a deep copy (recommended for most use cases)
df_copy = df.copy()

### Create a shallow copy (shares data with original, use with caution)
df_shallow = df.copy(deep=False)

By default, `deep=True`, so `df.copy()` creates a new DataFrame with its own data and index. Changes to `df_copy` do not affect `df`, and vice versa.

With `deep=False`, the new DataFrame shares data with the original, so without copy-on-write a change to one can show up in the other. When copy-on-write is enabled (opt-in in pandas 2.x, the default from pandas 3.0), a write to either object triggers a copy first, so a shallow copy can no longer modify the original. The default deep copy remains the safest choice when you need an independent DataFrame.
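
The independence of a deep copy can be verified by changing a value in the copy and reading the same cell from the original. A minimal sketch on a hypothetical two-row frame `toy`; it only exercises the default deep copy, because shallow-copy behavior depends on the pandas version and the copy-on-write setting.

```python
import pandas as pd

# Hypothetical two-row frame.
toy = pd.DataFrame({"age": [10.0, 30.0]})

toy_copy = toy.copy()          # deep copy by default
toy_copy.loc[0, "age"] = 99.0  # modify the copy only

print(toy.loc[0, "age"])       # 10.0: the original is unchanged
print(toy_copy.loc[0, "age"])  # 99.0
```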

## An exercise to practice using .copy

### Adding a column to the copy of the df DataFrame

In [None]:
df_copy = df.copy()
df_copy.loc[df_copy['age'] > 18, 'over_18'] = 'Adult'

df_copy

### Original DataFrame without the added column over_18

In [None]:
df