### Integer Indexing Pitfalls in Pandas

Integer indexing pitfalls arise in **Pandas DataFrames** or **Series** where the index contains **integers**. Pandas can sometimes treat integers as **labels** or **positions** depending on the context, leading to **unexpected behavior**.

---

### **Understanding the Problem**
In Pandas:
- **Labels** are explicit names or identifiers for rows/columns.
- **Positions** refer to the zero-based location of rows/columns.

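When an index consists of integers, the same number could be read either as a label or as a position. Pandas resolves this with fixed rules that depend on the operation (scalar lookup with `[]`, slicing with `[]`, `.loc`, `.iloc`), and those rules are easy to misremember.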
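A small sketch makes the two meanings concrete. The Series and its values are invented for illustration. Its integer labels are deliberately out of order, so label `0` and position `0` point at different rows:

```python
import pandas as pd

# Integer labels that do not match the row positions (illustrative data)
ser = pd.Series(["x", "y", "z"], index=[2, 0, 1])

by_label = ser.loc[0]      # the row whose label is 0 -> "y"
by_position = ser.iloc[0]  # the row at position 0 -> "x"
print(by_label, by_position)  # y x
```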

---

### **Common Pitfalls**

#### **1. Ambiguity with Regular `[]` Indexing**

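With plain `[]`, an integer is looked up as a **label**, not as a **position**. On a Series with an integer index, `ser[i]` returns the value labeled `i`; on a DataFrame, `data[i]` looks for a **column** labeled `i` and never selects a row.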
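The sketch below (made-up data) shows both cases of plain `[]` with an integer: on a Series with an integer index it is a label lookup, and on a DataFrame it is a column lookup, which fails here because no column is labeled `1`:

```python
import pandas as pd

ser = pd.Series(["x", "y", "z"], index=[2, 0, 1])
print(ser[0])  # label 0 -> "y", not the first row "x"

data = pd.DataFrame({"A": [10, 20, 30, 40]})
try:
    data[1]  # [] on a DataFrame looks for a *column* labeled 1
    missing_column = False
except KeyError:
    missing_column = True
print(missing_column)  # True: there is no column labeled 1
```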


#### **2. Slicing Behavior Differences**

When slicing, integers are treated as **labels** with `.loc` and **positions** with `.iloc`.

```python
import pandas as pd

data = pd.DataFrame({"A": [10, 20, 30, 40]})  # default integer index 0..3
result = data.loc[1:3]  # Slicing by labels
print(result)
```
**Output:**
```
    A
1  20
2  30
3  40
```
Here, slicing with `.loc` includes the endpoint (`1:3` includes `3`).

```python
# Using .iloc
result = data.iloc[1:3]  # Slicing by positions
print(result)
```
**Output:**
```
    A
1  20
2  30
```
Here, slicing with `.iloc` excludes the endpoint (`1:3` excludes `3`).

---

#### **3. Adding Columns with Integer Names**
If you use integers as column labels, they are easy to confuse with column positions.

```python
data[1] = [5, 6, 7, 8]  # Adds a column whose label is the integer 1
print(data)
```
**Output:**
```
    A  1
0  10  5
1  20  6
2  30  7
3  40  8
```
Now `data[1]` returns the **column labeled `1`**. It is a label lookup, not a positional lookup of a row or column (before the column was added, the same expression raised a `KeyError`).
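After the integer column is added, plain `[]` and `.iloc` reach completely different things. A short sketch, rebuilding the same `data` so it runs on its own:

```python
import pandas as pd

data = pd.DataFrame({"A": [10, 20, 30, 40]})
data[1] = [5, 6, 7, 8]  # new column whose label is the integer 1

col = data[1]       # the column labeled 1
row = data.iloc[1]  # the row at position 1 (values of A and of column 1)
print(col.tolist())  # [5, 6, 7, 8]
print(row.tolist())  # [20, 6]
```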

---

### **How to Avoid Integer Indexing Pitfalls?**

1. **Use Explicit Indexing with `.loc` and `.iloc`**
   - `.loc` for **label-based indexing**.
   - `.iloc` for **position-based indexing**.
   - This makes your code unambiguous and avoids errors.

2. **Avoid Integer Indexes (if possible)**
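   - If the index contains integers that do not represent row positions, consider resetting it to the default `0..n-1` index (so labels and positions agree) or replacing it with meaningful non-integer labels.
   ```python
   data.reset_index(drop=True, inplace=True)  # Replace the index with 0..n-1 so labels match positions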
   ```

3. **Understand Column vs Row Access**
   - Avoid using `data[1]` directly when the index or column names are integers. Instead, be explicit:
     ```python
     data.loc[:, 1]  # Access the column labeled 1 (an integer label)
     data.iloc[:, 1]  # Access the second column by position
     ```

4. **Debug Using Index Type**
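   - Check whether an index is integer-based. In pandas 2.x, `print(data.index)` shows `RangeIndex(...)` or `Index([...], dtype='int64')` (the `Int64Index` class was removed in pandas 2.0), so testing the dtype directly is more reliable:
     ```python
     print(pd.api.types.is_integer_dtype(data.index))  # True for integer-based indexes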
     ```
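The second and fourth tips can be combined in a quick check. This is a sketch on made-up data; the `r...` string labels are an arbitrary naming choice, and `rename` is just one way to get non-integer labels:

```python
import pandas as pd

# Illustrative data whose integer labels are not row positions
data = pd.DataFrame({"A": [10, 20, 30]}, index=[10, 20, 30])
print(pd.api.types.is_integer_dtype(data.index))  # True

# Option 1: default RangeIndex, so label i and position i are the same row
reset = data.reset_index(drop=True)
print(reset.loc[0, "A"], reset.iloc[0]["A"])  # 10 10

# Option 2: non-integer labels, so an integer can only mean a position
named = data.rename(index=lambda label: f"r{label}")
print(named.loc["r20", "A"], named.iloc[1]["A"])  # 20 20
print(pd.api.types.is_integer_dtype(named.index))  # False
```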

---

### **Summary Table: Common Pitfalls**

| **Operation**                  | **Ambiguity**                                            | **Solution**                      |
|--------------------------------|---------------------------------------------------------|-----------------------------------|
| `data[1]`                      | Is `1` a column label or a row position?                | Be explicit: `.loc[1]` / `.iloc[1]` for rows, `.loc[:, 1]` / `.iloc[:, 1]` for columns. |
| Slicing with `data[1:3]`       | Is it slicing by label or position?                     | Use `.loc[1:3]` or `.iloc[1:3]`.  |
| Integer index in rows/columns  | Are integers labels or positions?                       | Avoid integer-based indexes.      |
| Adding integer-labeled columns | Easy to confuse with column positions (`data[1]` vs `data.iloc[:, 1]`). | Use explicit (string) column names. |


```python
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

ser = pd.Series(np.arange(3.))
print(ser)
```
**Output:**
```
0    0.0
1    1.0
2    2.0
dtype: float64
```

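```python
# ser has an integer index, so -1 is looked up as a label, and there is no label -1
ser[-1]
```
**Output:**
```
KeyError: -1
```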

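With a non-integer index there is no such ambiguity: an integer inside `[]` cannot be a label, so it is treated as a position.

```python
ser2 = pd.Series(np.arange(3.), index=["a", "b", "c"])
ser2[-1]  # last element by position
```
**Output:**
```
np.float64(2.0)
```

In recent pandas versions (2.1 and later), this positional fallback emits a `FutureWarning` because it is deprecated; `ser2.iloc[-1]` is the future-proof spelling.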

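- If an axis index contains integers, selecting with a scalar integer in `[]` is always label-oriented.
- If you use `loc` (for labels) or `iloc` (for positions), you get exactly what you ask for, regardless of the index type.

```python
ser.iloc[-1]  # last element by position, works with any index
```
**Output:**
```
np.float64(2.0)
```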

```python
# On the other hand, an integer slice in [] is position-oriented
ser[:2]
```
**Output:**
```
0    0.0
1    1.0
dtype: float64
```

Because of these pitfalls, it is best to prefer indexing with `loc` and `iloc` to avoid ambiguity.
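The default index `0, 1, 2` hides the difference, because labels and positions coincide. With integer labels in a different order (illustrative data below), the scalar-versus-slice rule becomes visible:

```python
import pandas as pd

ser = pd.Series([10.0, 20.0, 30.0, 40.0], index=[3, 2, 1, 0])

print(ser[0])            # scalar in []: label 0 -> 40.0 (the last row)
print(ser.iloc[0])       # position 0 -> 10.0 (the first row)
print(ser[:2].tolist())  # integer slice in []: positions 0 and 1 -> [10.0, 20.0]
```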

### Pitfalls with Chained Indexing
The `loc` and `iloc` indexing attributes can also be used to modify DataFrame objects in place, but doing so requires some care.

```python
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])
data.loc[:, "one"] = 1  # set every row of column "one"
print(data)
```
**Output:**
```
          one  two  three  four
Ohio        1    1      2     3
Colorado    1    5      6     7
Utah        1    9     10    11
New York    1   13     14    15
```


```python
data.iloc[2] = 5  # set every value in the row at position 2
print(data)
```
**Output:**
```
          one  two  three  four
Ohio        1    1      2     3
Colorado    1    5      6     7
Utah        5    5      5     5
New York    1   13     14    15
```


```python
data.loc[data["four"] > 5] = 3  # set all columns in rows where "four" > 5
print(data)
```
**Output:**
```
          one  two  three  four
Ohio        1    1      2     3
Colorado    3    3      3     3
Utah        5    5      5     5
New York    3    3      3     3
```



```python
data.loc[data.three == 5, "three"] = 6  # row condition and column in one .loc call
print(data)
```
**Output:**
```
          one  two  three  four
Ohio        1    1      2     3
Colorado    3    3      3     3
Utah        5    5      6     5
New York    3    3      3     3
```

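None of the assignments above is chained. The classic gotcha is splitting the row selection and the column assignment into two bracket operations. A sketch on a fresh copy of the same `data`: the chained version modifies a temporary copy and leaves `data` unchanged, while the single `.loc` call works. Pandas warns about the chained form, but the exact warning class depends on the version, so the sketch simply silences it:

```python
import warnings

import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

# Chained: data.loc[mask] builds a new DataFrame (boolean selection
# returns a copy), and ["three"] = 6 is applied to that temporary copy
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the version-dependent warning
    data.loc[data["three"] == 10]["three"] = 6
after_chained = data.loc["Utah", "three"]
print(after_chained)  # 10: the original was not modified

# Not chained: mask and column label in one .loc call, applied to data itself
data.loc[data["three"] == 10, "three"] = 6
after_loc = data.loc["Utah", "three"]
print(after_loc)  # 6
```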

---- 

The **`SettingWithCopyWarning`** is a warning in Pandas that occurs when you attempt to modify a value in a **subset of a DataFrame** but Pandas isn't sure whether you're working on a **copy** of the data or a reference (view) to the original data. This warning is designed to alert you that your changes **might not affect the original DataFrame**.


### Why does this happen?

In Pandas, operations like slicing or filtering a DataFrame can create a **view** or a **copy**:
- A **view** shares data with the original DataFrame, so changes to the view will reflect in the original DataFrame.
- A **copy** is independent, so changes to the copy won't affect the original DataFrame.

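Pandas issues the `SettingWithCopyWarning` when it can't determine whether the subset is a view or a copy. (In pandas 2.x you can opt into Copy-on-Write mode with `pd.options.mode.copy_on_write = True`, which is slated to become the default in pandas 3.0. Under it, every subset behaves like a copy, so modifying a subset never changes the original and this particular warning goes away.)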



```python
# Create a sample DataFrame
data = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(data)

# Filter the DataFrame with a boolean condition
subset = data[data["A"] > 1]
print(subset)
```
**Output:**
```
   A  B
0  1  4
1  2  5
2  3  6
   A  B
1  2  5
2  3  6
```

```python
# Attempt to modify a column of the filtered subset
subset["B"] = 99
```
**Output** (a `SettingWithCopyWarning`):
```
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```
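In this example the warning points at a real problem: boolean filtering returns a copy, so the assignment changes only `subset`. A minimal check, rebuilding the same sample data and silencing the warning so that only the results print:

```python
import warnings

import pandas as pd

data = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
subset = data[data["A"] > 1]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    subset["B"] = 99

print(subset["B"].tolist())  # [99, 99]
print(data["B"].tolist())    # [4, 5, 6]: the original is untouched
```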


### How to Avoid SettingWithCopyWarning

#### 1. Use `.loc` for Explicit Modification
When working with subsets, use `.loc` to ensure you're directly modifying the original DataFrame.

Example:
```python
data.loc[data["A"] > 1, "B"] = 99
```

This ensures that the original `data` is updated without ambiguity.


#### 2. Avoid Chained Indexing
Chained indexing occurs when you access and modify data in multiple steps, such as `data[...]["B"]`. This can cause confusion about whether a view or a copy is being modified.

**Problematic:**
```python
subset = data[data["A"] > 1]
subset["B"] = 99  # Triggers the warning
```

**Solution:**
Combine the operations using `.loc`:
```python
data.loc[data["A"] > 1, "B"] = 99
```



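#### 3. Make the Copy Explicit
Older pandas versions offered an `.is_copy` attribute for this check, but it was deprecated and no longer exists in current pandas, and there is no public replacement. If you actually want an independent subset, request it explicitly with `.copy()`; modifying it then neither warns nor changes `data`:
```python
subset = data[data["A"] > 1].copy()  # explicit, independent copy
subset["B"] = 99  # changes only subset
```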

### Key Takeaways

- **`SettingWithCopyWarning`** means you might be modifying a **copy** of data instead of the original DataFrame.
- Always use **`.loc`** to make modifications to avoid ambiguity.
- Avoid **chained indexing** like `data[...]["B"]`—it’s harder for Pandas to interpret your intent.
- The warning is not an error, but don't ignore it: it signals that an assignment may silently leave the original DataFrame unchanged.

By being explicit with `.loc` or `.iloc`, you can ensure your changes behave as expected!