In [24]:
import pandas as pd
import numpy as np

# Example of a SettingWithCopyWarning

In [25]:
data = {"x": 2**np.arange(5),
        "y": 3**np.arange(5),
        "z": np.array([45, 98, 24, 11, 64])}

index = ["a", "b", "c", "d", "e"]

df = pd.DataFrame(data=data, index=index)
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


![Pandas Dataframe](dataframe.webp)

In [26]:
mask = df['z']<50
df[mask]

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 45 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |


## Modifying a Subset of the Data

In [27]:
# This chained assignment triggers a SettingWithCopyWarning (see the output below)
# and leaves df unchanged: df[mask] returns a copy, and only that copy gets z = 0.
df[mask]["z"] = 0


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


![SettingWithCopyWarning](settingwithcopywarning.webp)

In this case, the proper way to modify df is to use one of the accessors .loc[], .iloc[], .at[], or .iat[]. With an accessor, the statement below does modify df:

In [28]:

# Recreate df so the example starts from the original data
df = pd.DataFrame(data=data, index=index)

df.loc[mask, "z"] = 0
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 0 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 0 |
| d | 8 | 27 | 0 |
| e | 16 | 81 | 64 |
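The paragraph above lists four accessors, while the cell only uses .loc[]. As a minimal sketch, assuming the same data and index as above, the other three can make the same kind of in-place change: .iloc[] and .iat[] take integer positions, .at[] takes a single row and column label. The names rows and col are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = {"x": 2**np.arange(5),
        "y": 3**np.arange(5),
        "z": np.array([45, 98, 24, 11, 64])}
index = ["a", "b", "c", "d", "e"]
df = pd.DataFrame(data=data, index=index)

# .iloc[] works with positions: turn the Boolean mask into row positions
# and look up the position of column "z" (illustrative helper variables)
rows = np.flatnonzero(df["z"] < 50)  # positions of rows a, c and d
col = df.columns.get_loc("z")        # position of column "z"
df.iloc[rows, col] = 0

# .at[] sets a single cell by row and column label
df.at["b", "z"] = -1

# .iat[] sets a single cell by row and column position (row 4 is "e")
df.iat[4, col] = -2

print(df["z"].tolist())  # [0, -1, 0, 0, -2]
```

Each of these is one indexing call on df itself, so none of them is chained indexing.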


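Alternative: change the order of the indexing operations (select the column first, then the rows). This is still chained indexing: it modifies df here because df["z"] returns a Series that shares its data with df, but pandas does not guarantee this, and with Copy-on-Write (the default from pandas 3.0) it no longer modifies df.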

In [29]:
df = pd.DataFrame(data=data, index=index)

# Select the column first, then the rows (still chained indexing)
df["z"][mask] = 0
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 0 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 0 |
| d | 8 | 27 | 0 |
| e | 16 | 81 | 64 |
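The cell above still chains two indexing operations. A sketch of the same column-first idea without chained assignment (an alternative not used elsewhere in this notebook): compute the new column with Series.mask(), which replaces the values where the condition is True, and assign it back to df in a single step.

```python
import numpy as np
import pandas as pd

data = {"x": 2**np.arange(5),
        "y": 3**np.arange(5),
        "z": np.array([45, 98, 24, 11, 64])}
index = ["a", "b", "c", "d", "e"]
df = pd.DataFrame(data=data, index=index)
mask = df["z"] < 50

# Series.mask(cond, other=...) returns a new Series with other where cond is True;
# assigning it with df["z"] = ... is one call on df, so nothing is chained
df["z"] = df["z"].mask(mask, other=0)

print(df["z"].tolist())  # [0, 98, 0, 0, 64]
```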


# Views and Copies in NumPy and Pandas


In [30]:
arr = np.array([1, 2, 4, 8, 16, 32])
arr.flags # Returns the flags for the array

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [31]:
arr[1:4:2].base # .base is the array that owns the memory; for this view, that is arr itself

array([ 1,  2,  4,  8, 16, 32])

Both expressions below return the same values. But there is a difference: the slice returns a view on arr, while indexing with a list of integers returns a copy.

In [32]:
arr[1:4:2] # slicing returns a view (shares memory with arr)
arr[[1,3]] # indexing with a list of integers returns a copy

array([2, 8])

In [33]:
arr[1:4:2].flags.owndata # False! 

False

In [34]:
arr[[1,3]].flags.owndata

True
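The owndata flag is one way to tell the two results apart. np.shares_memory() gives a more direct check; a minimal sketch comparing the slice and the index list from the cells above (the names sliced and indexed are introduced here):

```python
import numpy as np

arr = np.array([1, 2, 4, 8, 16, 32])

sliced = arr[1:4:2]   # slicing: a view on arr
indexed = arr[[1, 3]] # indexing with a list of integers: a copy

print(np.array_equal(sliced, indexed))  # True: same values
print(np.shares_memory(arr, sliced))    # True: the view uses arr's memory
print(np.shares_memory(arr, indexed))   # False: the copy has its own memory
```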

## Views

In [35]:
# Explicitly create a view
view_of_array = arr.view()

In [36]:
view_of_array

array([ 1,  2,  4,  8, 16, 32])

In [37]:
view_of_array.base is arr

True

In [38]:
view_of_array.flags.owndata

False

![view_of_array](view_of_array.webp)

## Copies

In [39]:
copy_of_arr = arr.copy()

![copy_of_array](copy_of_array.webp)
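The practical difference between view_of_array and copy_of_arr shows up when arr changes. A short sketch that repeats both cells and then modifies arr:

```python
import numpy as np

arr = np.array([1, 2, 4, 8, 16, 32])
view_of_array = arr.view()
copy_of_arr = arr.copy()

arr[0] = 99  # modify the original array

print(view_of_array[0])  # 99: the view shares arr's data
print(copy_of_arr[0])    # 1: the copy has its own data
print(copy_of_arr.base is None, copy_of_arr.flags.owndata)  # True True
```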

## Understanding Views and Copies in Pandas

In [46]:
df = pd.DataFrame(data=data, index=index)

In [47]:
view_of_df = df.copy(deep=False)

In [48]:
copy_of_df = df.copy(deep=True)

In [49]:
view_of_df.to_numpy().base is df.to_numpy().base

True

In [50]:
copy_of_df.to_numpy().base is df.to_numpy().base

False

# Indices and Slices in NumPy and Pandas


## Indexing in NumPy: Copies and Views
NumPy has a strict set of rules related to copies and views when indexing arrays. Whether you get views or copies of the original data depends on the approach you use to index your arrays: slicing, integer indexing, or Boolean indexing.



### One-Dimensional Arrays
Slicing is a well-known operation in Python for getting particular data from arrays, lists, or tuples. When you slice a NumPy array, you get a view of the array:



In [51]:
arr = np.array([1, 2, 4, 8, 16, 32])

a = arr[1:3]

In [52]:
a.flags.owndata

False

![one-dimensional-arrays](one-dimensional-arrays.webp)

Though slicing returns a view, there are other cases where creating one array from another actually makes a copy.

Indexing an array with a list of integers returns a copy of the original array. The copy contains the elements from the original array whose indices are present in the list.

In [53]:
c = arr[[1, 3]]
c

c.base is None

c.flags.owndata

True
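The introduction to this section also lists Boolean indexing. Like indexing with a list of integers, indexing with a Boolean mask returns a copy; a short sketch (the names mask and e are introduced here):

```python
import numpy as np

arr = np.array([1, 2, 4, 8, 16, 32])

mask = arr > 4  # Boolean mask: True for 8, 16 and 32
e = arr[mask]   # Boolean indexing returns a copy

e[0] = 0        # changes the copy only
print(e.tolist())                # [0, 16, 32]
print(arr.tolist())              # [1, 2, 4, 8, 16, 32]
print(np.shares_memory(arr, e))  # False
```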

### Chained Indexing in NumPy

In [54]:
arr = np.array([1, 2, 4, 8, 16, 32])
arr[1:4:2][0] = 64
arr



array([ 1, 64,  4,  8, 16, 32])

In [55]:

arr = np.array([1, 2, 4, 8, 16, 32])
arr[[1, 3]][0] = 64
arr

array([ 1,  2,  4,  8, 16, 32])

This example illustrates the difference between copies and views when using chained indexing in NumPy.

In the first case, arr[1:4:2] returns a view that references the data of arr and contains the elements 2 and 8. The statement arr[1:4:2][0] = 64 sets the first of these elements to 64. The change is visible in both arr and the view returned by arr[1:4:2].

In the second case, arr[[1, 3]] returns a copy that also contains the elements 2 and 8. But these aren’t the same elements as in arr. They’re new ones. arr[[1, 3]][0] = 64 modifies the copy returned by arr[[1, 3]] and leaves arr unchanged.
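The way around this in NumPy is to apply the index to arr in a single operation instead of chaining two. A short sketch (the list idx is introduced here for illustration):

```python
import numpy as np

arr = np.array([1, 2, 4, 8, 16, 32])
idx = [1, 3]

# Index arr directly with the wanted position instead of chaining arr[idx][0]
arr[idx[0]] = 64
print(arr.tolist())  # [1, 64, 4, 8, 16, 32]

# An index list on the left-hand side of an assignment also writes into arr
arr[idx] = 0
print(arr.tolist())  # [1, 0, 4, 0, 16, 32]
```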

### Multidimensional Arrays

Referencing multidimensional arrays follows the same principles:

- Slicing arrays returns views.
- Using index and mask arrays returns copies.
- Combining index and mask arrays with slicing is also possible. In such cases, you get copies.

In [56]:
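arr = np.array([[  1,   2,    4,    8],
                [ 16,  32,   64,  128],
                [256, 512, 1024, 2048]])

a = arr[:, 1:3]  # Take columns 1 and 2 by slicing: [[2, 4], [32, 64], [512, 1024]]
print(a.base is arr)  # a is a view on arr

b = arr[:, 1:4:2]  # Take columns 1 and 3 by slicing with a step: [[2, 8], [32, 128], [512, 2048]]
print(b.base is arr)  # b is a view on arr

c = arr[:, [1, 3]]  # Take columns 1 and 3 with a list of indices (same values as b)
print(c.base is arr)  # c is a copy: its .base, if any, is not arr

d = arr[:, [False, True, False, True]]  # Take columns 1 and 3 with a Boolean mask (same values as b)
print(d.base is arr)  # d is a copy

True
True
False
False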

In [57]:
arr[0, 1] = 100
arr

array([[   1,  100,    4,    8],
       [  16,   32,   64,  128],
       [ 256,  512, 1024, 2048]])
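Changing arr[0, 1] also tells the views and copies from the cell before apart: the views see the new value, the copies keep the old one. A sketch that rebuilds the arrays so it runs on its own:

```python
import numpy as np

arr = np.array([[  1,   2,    4,    8],
                [ 16,  32,   64,  128],
                [256, 512, 1024, 2048]])

a = arr[:, 1:3]                         # view (slice)
b = arr[:, 1:4:2]                       # view (slice with step)
c = arr[:, [1, 3]]                      # copy (list of indices)
d = arr[:, [False, True, False, True]]  # copy (Boolean mask)

arr[0, 1] = 100

print(a[0, 0], b[0, 0])  # 100 100: the views see the change
print(c[0, 0], d[0, 0])  # 2 2: the copies keep the old value
```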

## Indexing in Pandas: Copies and Views

In [59]:
df = pd.DataFrame(data=data, index=index)

In [61]:
df["a":"c"]

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |


In [65]:
# .base is the full block of data behind df: all five rows, stored with one row per column (hence transposed).
# So the slice df["a":"c"] is a view on the same underlying data as df.
df["a":"c"].to_numpy().base 

array([[ 1,  2,  4,  8, 16],
       [ 1,  3,  9, 27, 81],
       [45, 98, 24, 11, 64]])

In [63]:
df["a":"c"].to_numpy().base is df.to_numpy().base

True

In [67]:
#On the other hand, accessing the first two columns of df with a list of labels returns a copy:

df = pd.DataFrame(data=data, index=index)
df[["x", "y"]].to_numpy().base

array([[ 1,  2,  4,  8, 16],
       [ 1,  3,  9, 27, 81]])

In [68]:
df[["x", "y"]].to_numpy().base is df.to_numpy().base

False

# Use of Views and Copies in Pandas 

## Chained Indexing and SettingWithCopyWarning

In [69]:
df = pd.DataFrame(data=data, index=index)

In [70]:
mask = df["z"]<50

In [72]:
# This statement triggers a warning: df[mask] returns a copy of df, so ["z"] = 0 modifies that copy, not df.
# If you intend to modify df itself, use another approach; the preferred one is an accessor.
df[mask]["z"] = 0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [75]:
# Using an accessor can still trigger the warning if the indexing is chained:
# df.loc[mask] returns a copy, and ["z"] = 0 then modifies that copy, not df.
df.loc[mask]["z"] = 0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [76]:
# Sometimes pandas does not detect the problem
df = pd.DataFrame(data=data, index=index)
df.loc[["a", "c", "e"]]["z"] = 0  # Assignment fails, no warning
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [77]:
# In the two cases below, the assignment does modify df, but pandas still issues a warning

In [78]:
df = pd.DataFrame(data=data, index=index)
df[:3]["z"] = 0  # Assignment succeeds, with warning

df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 0 |
| b | 2 | 3 | 0 |
| c | 4 | 9 | 0 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [79]:
df = pd.DataFrame(data=data, index=index)
df.loc["a":"c"]["z"] = 0  # Assignment succeeds, with warning
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 0 |
| b | 2 | 3 | 0 |
| c | 4 | 9 | 0 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


The recommended way of performing such operations is to avoid chained indexing. Accessors can be of great help with that:



In [81]:
df = pd.DataFrame(data=data, index=index)
df.loc[mask, "z"] = 0 # Notice the difference with df.loc[mask]["z"]=0
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 0 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 0 |
| d | 8 | 27 | 0 |
| e | 16 | 81 | 64 |


## Impact of Data Types on Views, Copies, and the SettingWithCopyWarning
In Pandas, the difference between creating views and creating copies also depends on the data types used. When deciding if it’s going to return a view or copy, Pandas handles DataFrames that have a single data type differently from ones with multiple types.

In [82]:
df = pd.DataFrame(data=data, index=index)
df.dtypes

x    int64
y    int64
z    int64
dtype: object

All columns have the same data type (int64), and that matters here: because of it, pandas returns a view in the statement below, so the assignment reaches df even though a warning is issued.

In [83]:
df["b":"d"]["z"] = 0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


If the DataFrame contains columns of different types, then you might get a copy instead of a view, in which case the same assignment will fail:



In [84]:
df = pd.DataFrame(data=data, index=index).astype(dtype={"z": float})
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 1 | 45.0 |
| b | 2 | 3 | 98.0 |
| c | 4 | 9 | 24.0 |
| d | 8 | 27 | 11.0 |
| e | 16 | 81 | 64.0 |


In [85]:
df["b":"d"]["z"]=0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
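Whether chained indexing happens to hit a view therefore depends on the dtypes. An accessor avoids the question: a single .loc[] assignment modifies the DataFrame in both cases. A sketch, where df_int and df_mixed are names introduced here:

```python
import numpy as np
import pandas as pd

data = {"x": 2**np.arange(5),
        "y": 3**np.arange(5),
        "z": np.array([45, 98, 24, 11, 64])}
index = ["a", "b", "c", "d", "e"]

df_int = pd.DataFrame(data=data, index=index)                               # all int64
df_mixed = pd.DataFrame(data=data, index=index).astype(dtype={"z": float})  # z is float64

for frame in (df_int, df_mixed):
    frame.loc["b":"d", "z"] = 0  # one indexing call on the DataFrame itself

print(df_int["z"].tolist())    # [45, 0, 0, 0, 64]
print(df_mixed["z"].tolist())  # [45.0, 0.0, 0.0, 0.0, 64.0]
```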


## Hierarchical Indexing and SettingWithCopyWarning

The cell below creates a DataFrame whose columns have two levels (a MultiIndex).

In [87]:
df = pd.DataFrame(
    data={("powers", "x"): 2**np.arange(5),
          ("powers", "y"): 3**np.arange(5),
          ("random", "z"): np.array([45, 98, 24, 11, 64])},
    index=["a", "b", "c", "d", "e"]
)

df

|   | (powers, x) | (powers, y) | (random, z) |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [89]:
df["powers"] # returns the second-level columns (x and y) under the first-level label "powers"

|   | x | y |
|---|---|---|
| a | 1 | 1 |
| b | 2 | 3 |
| c | 4 | 9 |
| d | 8 | 27 |
| e | 16 | 81 |


In [90]:
df["powers", "x"]

a     1
b     2
c     4
d     8
e    16
Name: (powers, x), dtype: int64

In [91]:
df["powers", "x"] = 0 # Changes df: this is a single assignment on df itself, not chained indexing

In [92]:
df

|   | (powers, x) | (powers, y) | (random, z) |
|---|---|---|---|
| a | 0 | 1 | 45 |
| b | 0 | 3 | 98 |
| c | 0 | 9 | 24 |
| d | 0 | 27 | 11 |
| e | 0 | 81 | 64 |


In [93]:
df = pd.DataFrame(
    data={("powers", "x"): 2**np.arange(5),
          ("powers", "y"): 3**np.arange(5),
          ("random", "z"): np.array([45, 98, 24, 11, 64])},
    index=["a", "b", "c", "d", "e"]
)

df

|   | (powers, x) | (powers, y) | (random, z) |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [94]:
df.loc[["a", "b"], "powers"]

|   | x | y |
|---|---|---|
| a | 1 | 1 |
| b | 2 | 3 |


In [95]:
df.loc[["a", "b"], ("powers", "x")] # Get rows "a" and "b", and columns "powers" on first level, and "x" on second

a    1
b    2
Name: (powers, x), dtype: int64

In [97]:
# Using accessors, avoiding chained indexing
df.loc[["a", "b"], ("powers", "x")] = 0 # changes the original dataframe without warning
df

|   | (powers, x) | (powers, y) | (random, z) |
|---|---|---|---|
| a | 0 | 1 | 45 |
| b | 0 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [98]:
df = pd.DataFrame(
    data={("powers", "x"): 2**np.arange(5),
          ("powers", "y"): 3**np.arange(5),
          ("random", "z"): np.array([45, 98, 24, 11, 64])},
    index=["a", "b", "c", "d", "e"]
)

df


|   | (powers, x) | (powers, y) | (random, z) |
|---|---|---|---|
| a | 1 | 1 | 45 |
| b | 2 | 3 | 98 |
| c | 4 | 9 | 24 |
| d | 8 | 27 | 11 |
| e | 16 | 81 | 64 |


In [99]:

df["powers"]



|   | x | y |
|---|---|---|
| a | 1 | 1 |
| b | 2 | 3 |
| c | 4 | 9 |
| d | 8 | 27 |
| e | 16 | 81 |


In [101]:
#Here, df["powers"] returns a DataFrame with the columns x and y. 
#This is just a view that points to the data from df, so the assignment is successful and df is modified. 
#But Pandas still issues a SettingWithCopyWarning.


df["powers"]["x"] = 0 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
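As with the flat DataFrame, the way to avoid the warning is a single accessor call on df. A sketch that replaces df["powers"]["x"] = 0 with .loc[] and a column tuple:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={("powers", "x"): 2**np.arange(5),
          ("powers", "y"): 3**np.arange(5),
          ("random", "z"): np.array([45, 98, 24, 11, 64])},
    index=["a", "b", "c", "d", "e"]
)

# All rows, column "x" under the first-level label "powers", in one call on df
df.loc[:, ("powers", "x")] = 0

print(df[("powers", "x")].tolist())  # [0, 0, 0, 0, 0]
print(df[("powers", "y")].tolist())  # [1, 3, 9, 27, 81]
```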
  


# Change the Default SettingWithCopyWarning Behavior


In [102]:
# These calls all set the same option, so only the last one executed takes effect.
pd.set_option("mode.chained_assignment", "raise")  # raises a SettingWithCopyError
pd.set_option("mode.chained_assignment", "warn")   # issues a SettingWithCopyWarning (the default)
pd.set_option("mode.chained_assignment", None)     # suppresses both the warning and the error