# Deep vs. Shallow Copy of a Pandas DataFrame

### Introduction

While working with Pandas, you may encounter an API call like this: `DataFrame.copy(deep=True)`. With `deep=True` (the default), the call makes a copy of the object's indices and data. But why call `copy()` at all instead of simply assigning the DataFrame to a new variable?

First, we need to discuss a few ways to make copies of a Pandas DataFrame.

1. Declare a new variable
2. Utilize `copy(deep=True)`
3. Utilize `copy(deep=False)`

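The first one does not create a copy at all: the new variable is simply another name for the same object, so it will not be discussed in depth. The second one, using `deep=True`, creates a new object with its own copy of the data and indices; modifications to the new object are not reflected in the original DataFrame. Using `deep=False` creates a new object that shares the original's data, so modifications to either object are reflected in the other. (This describes the behavior without Copy-on-Write; with Copy-on-Write, which is the default starting with pandas 3.0, changes to a shallow copy are no longer reflected in the original.)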
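To make the difference between plain assignment and a deep copy concrete, here is a minimal sketch. The names `alias` and `deep` and the single-column DataFrame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

alias = df        # plain assignment: a second name for the same object
deep = df.copy()  # deep=True is the default: data and index are copied

# Writing through the alias is writing to df itself.
alias.loc[0, "a"] = 100

print(alias is df)        # True: both names refer to one object
print(df.loc[0, "a"])     # 100: the change is visible through df
print(deep.loc[0, "a"])   # 1: the deep copy was taken before the change
```

Because `alias` and `df` are the same object, this holds regardless of the pandas version or the Copy-on-Write setting.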

In [None]:
import pandas as pd

In [None]:
# Create a deep copy (this example uses a Series, but DataFrame.copy() works the same way)
df = pd.Series([1, 2], index=["a", "b"])
deep_df = df.copy(deep=True)  # deep=True is the default, so df.copy() is equivalent


In [None]:
# Create a shallow copy: a new object that does not eagerly copy the data
shallow_df = df.copy(deep=False)
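The cells above create the copies but never modify anything, so the difference stays invisible. The sketch below changes the original after copying and then inspects both copies. The names `s`, `deep`, and `shallow` are illustrative. What the shallow copy shows depends on whether Copy-on-Write is active, so no output is asserted for it here.

```python
import pandas as pd

s = pd.Series([1, 2], index=["a", "b"])
deep = s.copy()               # independent copy of data and index
shallow = s.copy(deep=False)  # new object, data is not copied up front

# Modify the original after both copies were taken.
s.iloc[0] = 3

print(s["a"])        # the original holds the new value
print(deep["a"])     # the deep copy still holds the old value
# Version-dependent: without Copy-on-Write the shallow copy reflects the
# change; with Copy-on-Write (default from pandas 3.0) it does not.
print(shallow["a"])
```

Running this on your own installation is a quick way to check which of the two behaviors your pandas version uses.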

So why would we use this? In what scenario would we choose one over the other? The idea is to take an extra precaution against modifying the original DataFrame: when you want to change a copy without touching the original, make that copy explicitly.

Let's try another example.

In [None]:
df = pd.DataFrame({'string_col': ['aaa', 'bbb', 'ccc'], 'integer_col': [1, 2, 3]})
s1 = df['string_col']
df

In [None]:
# Now try to modify a value in string_col through s1; older pandas versions
# may emit a SettingWithCopyWarning here (a warning, not an error)
s1[0] = 111

In [None]:
df

Now, let's try something else. This time we'll select a row with `loc` and try to modify `string_col` through it.

In [None]:
s2 = df.loc[0]
s2['string_col'] = 222
df

What did we observe here? This time, `s2` was a copy, and modifying it did not affect the values in `df`, whereas the earlier assignment through `s1` may have changed `df` itself, depending on the pandas version. This is why we want to use `copy()` to remove the ambiguity: as a data scientist and a programmer, you should be explicit and intentional when making copies of a dataset that you will use heavily to explore and extract data. During exploratory analysis, it is easy to make unintentional changes to the original DataFrame without realizing it, and those changes can affect the results of your model.
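One way to act on this advice is to call `copy()` whenever you pull out a column or row that you intend to modify. The sketch below reuses the DataFrame from the example above. The names `col` and `row` and the replacement strings are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"string_col": ["aaa", "bbb", "ccc"], "integer_col": [1, 2, 3]})

# Explicit copy of a column: edits stay local to `col`.
col = df["string_col"].copy()
col.iloc[0] = "zzz"

# Explicit copy of a row: edits stay local to `row`.
row = df.loc[0].copy()
row["string_col"] = "yyy"

# The original DataFrame is untouched by either edit.
print(df)
```

With the explicit `copy()` calls, whether the selection was originally a view or a copy no longer matters: `df` keeps its original values either way.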