# Pandas Dropping rows with specified condition

A dataframe can be subset either with vectorized boolean indexing or with the `DataFrame.drop()` function.



## How to drop rows from a dataframe based on a condition applied to a column

In this article, you will learn how to drop rows from a dataframe using:

*  Vectorization (boolean indexing).
*  The pandas `drop()` function.

Let's create a dataframe first. 

In [None]:
# Importing library
import pandas as pd 

data = {
    "Roll No": [1,2,3,4,5,6,7,8,9,10],
    "Name": ['A','B','C','D','E','F','G','H','I','J'],
    "Age": [10,11,10,12,11,10,11,10,10,10],
    "Gender": ['M','F','F','M','M','M','F','F','M','F'],
    "Percentage": [78,83,96,62,89,74,68,80,94,91]
}

# Creating DataFrame
df = pd.DataFrame(data)

print("Shape of Dataframe: {}\n".format(df.shape))
df

Shape of Dataframe: (10, 5)



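| | Roll No | Name | Age | Gender | Percentage |
|---|---|---|---|---|---|
| 0 | 1 | A | 10 | M | 78 |
| 1 | 2 | B | 11 | F | 83 |
| 2 | 3 | C | 10 | F | 96 |
| 3 | 4 | D | 12 | M | 62 |
| 4 | 5 | E | 11 | M | 89 |
| 5 | 6 | F | 10 | M | 74 |
| 6 | 7 | G | 11 | F | 68 |
| 7 | 8 | H | 10 | F | 80 |
| 8 | 9 | I | 10 | M | 94 |
| 9 | 10 | J | 10 | F | 91 |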


## 1. Vectorization to filter out rows from the data

In [None]:
# Selecting Female students with age as 10 having percentage more than 80
df1 = df[(df['Gender'] == 'F') & (df['Percentage'] > 80) & (df['Age'] == 10)]
  
print("Shape of Dataframe: {}\n".format(df1.shape))
df1

Shape of Dataframe: (2, 5)



| | Roll No | Name | Age | Gender | Percentage |
|---|---|---|---|---|---|
| 2 | 3 | C | 10 | F | 96 |
| 9 | 10 | J | 10 | F | 91 |


You can observe that 2 female students (C and J) are aged 10 and scored more than 80%. Every other row has been filtered out.
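The filter above keeps the rows that match the condition. To drop the matching rows instead, the same mask can be negated with `~`. Here is a minimal sketch on a smaller frame; the names `students`, `mask`, `matching`, and `remaining` are illustrative and not part of the original example.

```python
import pandas as pd

# Small illustrative frame (a few rows in the style of the article's data)
students = pd.DataFrame({
    "Name": ["C", "H", "J", "D"],
    "Age": [10, 10, 10, 12],
    "Gender": ["F", "F", "F", "M"],
    "Percentage": [96, 80, 91, 62],
})

# Boolean mask: True for female students aged 10 who scored above 80
mask = (
    (students["Gender"] == "F")
    & (students["Percentage"] > 80)
    & (students["Age"] == 10)
)

matching = students[mask]     # keep only rows where the mask is True
remaining = students[~mask]   # drop those rows, keep everything else

print(matching["Name"].tolist())
print(remaining["Name"].tolist())
```

Every row ends up in exactly one of the two results, so `matching` and `remaining` together account for the whole frame.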

## 2. `DataFrame.drop()` function to drop the rows from the data

Let's look at the syntax of the `df.drop()` function.

### DataFrame.drop()

__DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')__

__Purpose:__ This function removes rows or columns by their labels. To drop rows based on a condition, pass the index labels of the rows that satisfy the condition.

__Parameters:__

 __- labels:__ single label or list. Index or column labels to drop.

 __- axis:__ whether to drop labels from the rows or from the columns. 0 or ‘index’ drops rows. 1 or ‘columns’ drops columns.

 __- index:__ single label or list. Alternative to specifying axis. (`labels, axis = 0` is the same as `index = labels`)

 __- columns:__ single label or list. Alternative to specifying axis. (`labels, axis = 1` is the same as `columns = labels`)

 __- level:__ for a MultiIndex, the level from which the labels will be removed.

 __- inplace:__ If `False`, return a new dataframe. If `True`, modify the dataframe in place and return `None`.

 __- errors:__ `'raise'` or `'ignore'`. With `'ignore'`, labels that do not exist are skipped and only the existing labels are dropped.
 

__Returns:__ A dataframe, or `None` if `inplace = True`.
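The parameters above can be tried out on a tiny frame. This is only a sketch: `demo` and the result names are made up for illustration, and the label `99` is deliberately one that does not exist.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Age": [10, 11, 10], "Percentage": [78, 83, 96]}
)

# Drop rows by index label: these two calls are equivalent
rows_a = demo.drop([0, 2], axis=0)
rows_b = demo.drop(index=[0, 2])

# Drop a column: these two calls are equivalent
cols_a = demo.drop("Age", axis=1)
cols_b = demo.drop(columns="Age")

# errors="ignore" skips the missing label 99 instead of raising a KeyError
safe = demo.drop(index=[1, 99], errors="ignore")

print(rows_a.equals(rows_b))
print(list(cols_a.columns))
print(safe.index.tolist())
```

Since `inplace` is left at its default of `False`, `demo` itself is unchanged after all of these calls.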

In [None]:
# Dropping students who scored 80% or more, which keeps those below 80%
df2 = df.drop(df[df['Percentage'] >= 80].index)

print("Shape of Dataframe: {}\n".format(df2.shape))
df2

Shape of Dataframe: (4, 5)



| | Roll No | Name | Age | Gender | Percentage |
|---|---|---|---|---|---|
| 0 | 1 | A | 10 | M | 78 |
| 3 | 4 | D | 12 | M | 62 |
| 5 | 6 | F | 10 | M | 74 |
| 6 | 7 | G | 11 | F | 68 |


You can observe that 4 students scored below 80%.

This is how subsetting of data takes place using the `df.drop()` function.
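When the index labels are unique, dropping by condition with `drop()` gives the same result as keeping the negated mask. The sketch below checks this on a small frame (all names are illustrative) and also shows a caveat: because `drop()` works on labels, a repeated index label causes every row carrying that label to be removed.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Percentage": [78, 83, 62, 96]}
)
high = scores["Percentage"] >= 80

via_drop = scores.drop(scores[high].index)  # drop rows by their index labels
via_mask = scores[~high]                    # keep rows where the mask is False
print(via_drop.equals(via_mask))

# Caveat: with repeated index labels, drop() removes every row with that label
dup = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Percentage": [78, 83, 62, 96]},
    index=[0, 0, 1, 1],
)
dup_high = dup["Percentage"] >= 80
dup_via_drop = dup.drop(dup[dup_high].index)
dup_via_mask = dup[~dup_high]

print(len(dup_via_drop))
print(len(dup_via_mask))
```

If the index may contain duplicates, the boolean mask is the safer of the two approaches.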

## 3. Practical Tips

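* To change an existing dataframe rather than create a new one, use `df.drop(df[condition].index, inplace = True)` instead of `df1 = df.drop(...)`. `drop()` expects index labels, not a boolean condition, so pass the index of the matching rows. Make sure the condition is correct, because `inplace = True` modifies the current dataframe itself and returns `None`.

* Use `&`, `|` instead of `and`, `or` to combine multiple conditions, and `~` instead of `not` to negate one.

* Use `df.dropna()` to drop rows that contain null (missing) values.

* When subsetting on more than one condition, wrap each condition in `()`, as in `df = df[(Cond1) & (Cond2) & (Cond3)]`, because `&` and `|` bind more tightly than comparison operators such as `==` and `>`.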
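A short sketch of two of these tips, using a made-up frame named `tips`: combining parenthesized conditions with `|`, and the fact that an in-place drop returns `None`.

```python
import pandas as pd

tips = pd.DataFrame(
    {"Gender": ["M", "F", "M", "F"], "Percentage": [95, 70, 60, 92]}
)

# Each condition is wrapped in parentheses before combining with |
either = tips[(tips["Gender"] == "F") | (tips["Percentage"] > 90)]
print(either.index.tolist())

# inplace=True modifies tips itself; the call returns None
result = tips.drop(tips[tips["Percentage"] < 65].index, inplace=True)
print(result)
print(tips.shape)
```

Assigning the result of an in-place call back to the dataframe (`tips = tips.drop(..., inplace=True)`) would therefore replace it with `None`.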


## 4. Test Your Knowledge

__Q1.__ Which method is used to remove missing values from a dataframe?

`A. dropna()`, `B. dropnull()`, `C. drop()`, `D. dropnan()`

__Ans.__ A. `dropna()` method is used to remove the missing values.



__Q2.__ What type of error does the following code raise, and how can you fix it?


In [None]:
df1 = df[(df['Gender'] == 'M') and (df['Percentage'] > 90)]

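__Ans.__ `ValueError`. The `and` operator tries to reduce each whole Series to a single `True` or `False`, which is ambiguous. Use the element-wise `&` instead of `and`.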

In [None]:
df1 = df[(df['Gender'] == 'M') & (df['Percentage'] > 90)]
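The error can be reproduced safely by catching it. This is a sketch on a small stand-in frame named `quiz` (not the dataframe from the article):

```python
import pandas as pd

quiz = pd.DataFrame({"Gender": ["M", "F", "M"], "Percentage": [95, 92, 60]})

try:
    # `and` asks for the truth value of a whole Series, which pandas refuses
    quiz[(quiz["Gender"] == "M") and (quiz["Percentage"] > 90)]
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__)

# The element-wise operator & compares the two masks row by row
fixed = quiz[(quiz["Gender"] == "M") & (quiz["Percentage"] > 90)]
print(fixed.index.tolist())
```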

__Q3.__ What is the output of the following code?

In [None]:
import numpy as np
import pandas as pd 

data = {"Roll No": [1,2,3,4,5], "Marks": [np.nan,85,np.inf,60,90]}
df = pd.DataFrame(data)
df.dropna(inplace = True)

print(df.shape)

(4, 2)
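Only one row is dropped because, by default, `dropna()` treats `NaN` as missing but not `np.inf`. If infinite values should also be removed, one option is to convert them to `NaN` first. This is a sketch; the names `marks`, `only_nan_dropped`, and `cleaned` are illustrative.

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame(
    {"Roll No": [1, 2, 3, 4, 5], "Marks": [np.nan, 85, np.inf, 60, 90]}
)

# Default behaviour: only the NaN row is removed, the inf row stays
only_nan_dropped = marks.dropna()
print(only_nan_dropped.shape)

# Convert +inf and -inf to NaN first, then drop the missing rows
cleaned = marks.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned.shape)
```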
