# Dropping and deleting rows and columns


In [37]:
# importing the libraries
import numpy as np
import pandas as pd

## Using .drop

In [64]:
# Creating a dataframe using dictionary
store_data = pd.DataFrame({'CustomerID': ['CustID00','CustID01','CustID02','CustID03','CustID04']
                           ,'location': ['Chicago', 'Boston', 'Seattle', 'San Francisco', 'Austin']
                           ,'gender': ['M','M','F','M','F']
                           ,'type': ['Electronics','Food&Beverages','Food&Beverages','Medicine','Beauty']
                           ,'quantity':[1,3,4,2,1],'total_bill':[100,75,125,50,80]})
store_data

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
1,CustID01,Boston,M,Food&Beverages,3,75
2,CustID02,Seattle,F,Food&Beverages,4,125
3,CustID03,San Francisco,M,Medicine,2,50
4,CustID04,Austin,F,Beauty,1,80


**Removing a column from a DataFrame**

* The CustomerID column is a unique identifier of each customer. This unique identifier will not help 24/7 Stores in getting useful insights about their customers. So, they have decided to remove this column from the data frame.

In [65]:
store_data.drop('CustomerID',axis=1) # axis = 1 is for columns and axis = 0 is for rows 

Unnamed: 0,location,gender,type,quantity,total_bill
0,Chicago,M,Electronics,1,100
1,Boston,M,Food&Beverages,3,75
2,Seattle,F,Food&Beverages,4,125
3,San Francisco,M,Medicine,2,50
4,Austin,F,Beauty,1,80


* We successfully removed the 'CustomerID' column in the output above. However, `drop` returns a new data frame and leaves the original unchanged, so the change is not permanent. Let's look at store_data again.

In [66]:
store_data

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
1,CustID01,Boston,M,Food&Beverages,3,75
2,CustID02,Seattle,F,Food&Beverages,4,125
3,CustID03,San Francisco,M,Medicine,2,50
4,CustID04,Austin,F,Beauty,1,80


* We see that store_data still has the 'CustomerID' column in it.
* To make the change permanent, we can set the `inplace` parameter to `True` (assigning the result of `drop` back to a variable also works).

In [41]:
store_data.drop('CustomerID',axis=1,inplace=True)
store_data 

Unnamed: 0,location,gender,type,quantity,total_bill
0,Chicago,M,Electronics,1,100
1,Boston,M,Food&Beverages,3,75
2,Seattle,F,Food&Beverages,4,125
3,San Francisco,M,Medicine,2,50
4,Austin,F,Beauty,1,80
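
The difference between a plain `drop`, reassignment, and `inplace=True` can be sketched on a small frame. The frame `df` and its column names below are hypothetical and used only for illustration; they are not part of the store data.

```python
import pandas as pd

# Hypothetical two-column frame used only for this illustration
df = pd.DataFrame({'id': ['a', 'b'], 'value': [1, 2]})

# A plain drop returns a new DataFrame; df itself is unchanged
dropped = df.drop('id', axis=1)
cols_after_plain_drop = list(df.columns)

# Option 1: keep the result by assigning it to a variable
df_reassigned = df.drop('id', axis=1)

# Option 2: modify a frame in place (the call itself returns None)
df_inplace = df.copy()
result = df_inplace.drop('id', axis=1, inplace=True)

print(cols_after_plain_drop)
print(list(df_reassigned.columns))
print(list(df_inplace.columns), result)
```

Because the in-place call returns `None`, writing `df = df.drop(..., inplace=True)` would overwrite the frame with `None`; use one approach or the other, not both.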


In [67]:
# we can also remove multiple columns simultaneously
# it is always a good idea to store the new/updated data frames in new variables to avoid changes to the existing data frame

# creating a copy of the existing data frame
new_store_data = store_data.copy()
store_data

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
1,CustID01,Boston,M,Food&Beverages,3,75
2,CustID02,Seattle,F,Food&Beverages,4,125
3,CustID03,San Francisco,M,Medicine,2,50
4,CustID04,Austin,F,Beauty,1,80


In [68]:
# dropping the location and quantity columns simultaneously
new_store_data.drop(['location','quantity'],axis=1,inplace=True)
new_store_data

Unnamed: 0,CustomerID,gender,type,total_bill
0,CustID00,M,Electronics,100
1,CustID01,M,Food&Beverages,75
2,CustID02,F,Food&Beverages,125
3,CustID03,M,Medicine,50
4,CustID04,F,Beauty,80
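
As a hedged aside, `drop` also accepts a `columns=` keyword, which reads a little more clearly than `axis=1`, and an `errors='ignore'` option for labels that may not exist. The sketch below uses a small hypothetical frame; the `'rating'` label is deliberately absent to show the tolerant behaviour.

```python
import pandas as pd

# Hypothetical frame with a subset of the store columns
df = pd.DataFrame({'location': ['Chicago', 'Boston'],
                   'quantity': [1, 3],
                   'total_bill': [100, 75]})

# columns= is equivalent to passing the labels together with axis=1
by_keyword = df.drop(columns=['location', 'quantity'])
by_axis = df.drop(['location', 'quantity'], axis=1)

# errors='ignore' skips labels that are not present instead of raising KeyError
tolerant = df.drop(columns=['location', 'rating'], errors='ignore')

print(list(by_keyword.columns))
print(list(tolerant.columns))
```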


**Removing rows from a dataframe**

In [69]:
store_data.drop(1,axis=0)

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
2,CustID02,Seattle,F,Food&Beverages,4,125
3,CustID03,San Francisco,M,Medicine,2,50
4,CustID04,Austin,F,Beauty,1,80


* Notice that we used **`axis=0`** to drop a row from a data frame, while we were using **`axis=1`** for dropping a column from the data frame.
* Also, to make the change permanent in the data frame we have to pass the `inplace=True` parameter.
* We also see that the index is no longer consecutive, because the row with index label 1 (the second row) has been removed. So, we will have to reset the index of the data frame. Let's see how this can be done.

In [70]:
# creating a new dataframe
store_data_new  = store_data.drop(1,axis=0)
store_data_new

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
2,CustID02,Seattle,F,Food&Beverages,4,125
3,CustID03,San Francisco,M,Medicine,2,50
4,CustID04,Austin,F,Beauty,1,80


In [71]:
# resetting the index of data frame
store_data_new.reset_index()

Unnamed: 0,index,CustomerID,location,gender,type,quantity,total_bill
0,0,CustID00,Chicago,M,Electronics,1,100
1,2,CustID02,Seattle,F,Food&Beverages,4,125
2,3,CustID03,San Francisco,M,Medicine,2,50
3,4,CustID04,Austin,F,Beauty,1,80


* We see that the index of the data frame has been reset, but the old index has become a column in the data frame. We do not need the old index as a column, so we can simply set the parameter **`drop=True`** in the `reset_index()` method.

In [72]:
# setting inplace = True to make the changes permanent
store_data_new.reset_index(drop=True,inplace=True)
store_data_new

Unnamed: 0,CustomerID,location,gender,type,quantity,total_bill
0,CustID00,Chicago,M,Electronics,1,100
1,CustID02,Seattle,F,Food&Beverages,4,125
2,CustID03,San Francisco,M,Medicine,2,50
3,CustID04,Austin,F,Beauty,1,80


## Using .pop

* `pop()` is a list method that removes the item at the given index and returns the removed item. If no index is specified, it removes and returns the last element. To confirm, we can check the list again after calling it.

In [73]:
# convert the CustomerID column into a Python list

list_CustID = list(store_data_new['CustomerID'])

list_CustID



['CustID00', 'CustID02', 'CustID03', 'CustID04']

In [74]:
# remove and return the third element (index 2)
list_CustID.pop(2) 

'CustID03'
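
To confirm the behaviour described above, here is a small sketch on a fresh list (the name `ids` is hypothetical) that calls `pop` with and without an index and then inspects what is left.

```python
# Fresh list so the example does not depend on earlier cells
ids = ['CustID00', 'CustID02', 'CustID03', 'CustID04']

removed_at_index = ids.pop(1)  # removes and returns the item at position 1
removed_last = ids.pop()       # no index given: removes and returns the last item

print(removed_at_index)
print(removed_last)
print(ids)  # the list itself has been modified
```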