
- By default, the **dropna** method **does not directly modify** the DataFrame you’re working with. It **creates a new DataFrame** and keeps the **original DataFrame unchanged.**

- Syntactically, this is because the **default** for the inplace parameter is **inplace = False.**

- If you set **inplace = True**, the dropna method will **directly modify your original DataFrame (and won’t produce a new output).** That means dropna will **drop every row that contains a missing value** from your **original dataset**. It will overwrite your data, so be careful with it!
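- As a quick illustration of the two modes, here is a minimal sketch on a small, hypothetical DataFrame named `demo` (not the sales data used below). The row counts show which object each call changes.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the middle row has one missing value.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Default (inplace=False): a new DataFrame is returned, demo is untouched.
cleaned = demo.dropna()
rows_after_default = len(demo)   # demo still has all 3 rows
rows_in_cleaned = len(cleaned)   # the new frame has 2 rows

# inplace=True: demo itself loses the row with the missing value.
demo.dropna(inplace=True)
rows_after_inplace = len(demo)   # demo now has 2 rows

print(rows_after_default, rows_in_cleaned, rows_after_inplace)
```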

In [None]:
import numpy as np
import pandas as pd

In [None]:
sales_data = pd.DataFrame({"name":["William","Emma","Sofia","Markus","Edward","Thomas","Ethan",np.nan,"Arun","Anika","Paulo"],
                           "region":[np.nan,"North","East","South","West","West","South",np.nan,"West","East","South"],
                           "sales":[50000,52000,90000,np.nan,42000,72000,49000,np.nan,67000,65000,67000],
                           "expenses":[42000,43000,np.nan,44000,38000,39000,42000,np.nan,39000,44000,45000]})

print(sales_data)

       name region    sales  expenses
0   William    NaN  50000.0   42000.0
1      Emma  North  52000.0   43000.0
2     Sofia   East  90000.0       NaN
3    Markus  South      NaN   44000.0
4    Edward   West  42000.0   38000.0
5    Thomas   West  72000.0   39000.0
6     Ethan  South  49000.0   42000.0
7       NaN    NaN      NaN       NaN
8      Arun   West  67000.0   39000.0
9     Anika   East  65000.0   44000.0
10    Paulo  South  67000.0   45000.0


- **Rows 0, 2, 3, and 7** all contain **missing values.**

- Rows 2 and 3 each contain only **one missing value**, but in **row 7**, **all** of the values are **missing.**

In [None]:
sales_data.dropna()

      name region    sales  expenses
1     Emma  North  52000.0   43000.0
4   Edward   West  42000.0   38000.0
5   Thomas   West  72000.0   39000.0
6    Ethan  South  49000.0   42000.0
8     Arun   West  67000.0   39000.0
9    Anika   East  65000.0   44000.0
10   Paulo  South  67000.0   45000.0


- Remember when we created our DataFrame, **rows 0, 2, 3, and 7** all contained **missing values.**

- After using **dropna()**, **rows 0, 2, 3, and 7** have all been **removed**.

- It removes rows with missing values (it understands that NaN is a missing value).

- Notice that the code **removed every row** that contained any **missing value.** If **even one** of the values was missing, the **whole row was deleted**. That’s the **default behavior.** By default the how parameter is set to **how = 'any'**, so with this code, if any of the values are missing, the whole row is removed.

- This code did **not directly change** the **sales_data** DataFrame. It only **created a new DataFrame.**
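- To see what the how parameter changes, here is a small sketch that compares the default **how = 'any'** with **how = 'all'**. The three-row `demo` DataFrame is a hypothetical cut-down version of the sales data: one row with a single missing value, one complete row, and one row that is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has one NaN, row 1 is complete, row 2 is all NaN.
demo = pd.DataFrame({
    "name": ["William", "Emma", np.nan],
    "region": [np.nan, "North", np.nan],
    "sales": [50000, 52000, np.nan],
})

# how="any" (the default): drop a row if at least one value is missing.
any_dropped = demo.dropna(how="any")

# how="all": drop a row only if every value in it is missing.
all_dropped = demo.dropna(how="all")

print(list(any_dropped.index))  # [1]
print(list(all_dropped.index))  # [0, 1]
```

- With **how = 'all'**, only the fully empty row is removed, which is the sales-data equivalent of dropping row 7 and keeping rows 0, 2, and 3.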

In [None]:
sales_data

       name region    sales  expenses
0   William    NaN  50000.0   42000.0
1      Emma  North  52000.0   43000.0
2     Sofia   East  90000.0       NaN
3    Markus  South      NaN   44000.0
4    Edward   West  42000.0   38000.0
5    Thomas   West  72000.0   39000.0
6     Ethan  South  49000.0   42000.0
7       NaN    NaN      NaN       NaN
8      Arun   West  67000.0   39000.0
9     Anika   East  65000.0   44000.0
10    Paulo  South  67000.0   45000.0


##### **MODIFY THE DATAFRAME “IN PLACE”**

- Finally, we’re going to modify the DataFrame **“in place”**.

- That means that we’re going to directly delete rows from the input DataFrame.

- I’m going to create a copy of sales_data. In case you want to keep playing with the original sales_data DataFrame, we’ll copy it and use the copy in this example.

In [None]:
sales_data_copy = sales_data.copy()

sales_data_copy.dropna(inplace = True)

- Notice that when you run the code, it **doesn’t send any output to the console.**

- That’s because when you use **inplace = True**, **dropna doesn’t create a new DataFrame.** It directly **modifies** the **original DataFrame.** In this case, dropna directly deleted rows from sales_data_copy.
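- One way to confirm that nothing comes back from the call is to capture its return value. This is a minimal sketch on a hypothetical one-column `demo` DataFrame. With **inplace = True**, dropna returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in the middle row.
demo = pd.DataFrame({"sales": [50000, np.nan, 42000]})

# With inplace=True the method returns None; the change happens on demo itself.
result = demo.dropna(inplace=True)

print(result)     # None
print(len(demo))  # 2
```

- This is also why writing `demo = demo.dropna(inplace=True)` is a mistake: it would replace the DataFrame with None.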

In [None]:
print(sales_data_copy)

      name region    sales  expenses
1     Emma  North  52000.0   43000.0
4   Edward   West  42000.0   38000.0
5   Thomas   West  72000.0   39000.0
6    Ethan  South  49000.0   42000.0
8     Arun   West  67000.0   39000.0
9    Anika   East  65000.0   44000.0
10   Paulo  South  67000.0   45000.0


- Notice that the **rows with missing values** have been **removed** from **sales_data_copy.**

- Just remember, when you use **inplace = True**, dropna deletes every row with a missing value directly from your data.

#### **WHY DIDN’T DROPNA DROP THE VALUES?**


- Remember: by **default**, the dropna method **does not** modify the **original DataFrame.**

- **Dropna** creates a **new DataFrame** as an output.

- If you don’t assign this output to a variable, it **will not be saved**. In an IDE or notebook it is only **sent to the console** (or shown as the cell output).

- If you want to keep the output, you need to **assign it to a variable** (typically with a **new variable name**) like this:

In [None]:
sales_data_noNA = sales_data.dropna()

- In this case, **sales_data_noNA** will be the **new DataFrame without missing values**, and **sales_data will remain unchanged.**

- Alternatively, you could use the inplace parameter and set **inplace = True** to directly **modify the original DataFrame.**

In [None]:
sales_data.dropna(inplace = True)
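- If you are not sure whether the missing values were actually dropped, you can count them before and after. The sketch below uses a hypothetical `demo` DataFrame and reassigns the result to the same name, which is a third option alongside a new variable name and **inplace = True**.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values in different rows.
demo = pd.DataFrame({
    "sales": [50000, np.nan, 42000],
    "expenses": [42000, 43000, np.nan],
})

# The result is discarded here, so demo still contains its missing values.
demo.dropna()
missing_before = int(demo.isna().sum().sum())

# Assigning the result back to the same name keeps the cleaned frame.
demo = demo.dropna()
missing_after = int(demo.isna().sum().sum())

print(missing_before, missing_after)  # 2 0
```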