# Slicing and Dicing Dataframes

You have seen how to index dataframes using ```df.iloc``` and ```df.loc```. Now, let's see how to subset dataframes based on certain conditions. In this section, you will learn how to:

* Subset rows based on certain conditions
* Subset rows and columns based on conditions 
* Add and remove rows and columns from dataframes

In [None]:
# loading libraries and reading the data
import numpy as np
import pandas as pd

df = pd.read_csv("../global_sales_data/market_fact.csv")
df.head()

### Subsetting Rows Based on Conditions

Often, you want to select rows and columns that satisfy some given conditions. For example, select all the orders where ```Sales``` > 3000, or all the orders where 2000 < ```Sales``` < 3000 and ```Profit``` > 100.

A convenient way to do these operations is with ```df.loc[]```, which accepts boolean conditions and column labels. ```df.iloc[]``` would require you to remember the integer positions of the columns, which is rarely easy.

Let's see some examples.

In [None]:
# Select all rows where Sales > 3000
# First, we get a boolean Series where True marks the rows having Sales > 3000
df.Sales > 3000

In [None]:
# Then, we pass this boolean Series to df.loc
df.loc[df.Sales > 3000]
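To see the two steps in isolation, here is a minimal sketch on a small, made-up dataframe. The name `toy` and its values are assumptions for illustration only; they are not taken from `market_fact.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real sales file
toy = pd.DataFrame({
    "Cust_id": ["Cust_1", "Cust_2", "Cust_3", "Cust_4"],
    "Sales": [1500.0, 3200.0, 2500.0, 4100.0],
    "Profit": [50.0, 400.0, 150.0, -20.0],
})

# Step 1: the comparison gives a boolean Series aligned to toy's index
mask = toy["Sales"] > 3000

# Step 2: df.loc keeps only the rows where the mask is True
high_sales = toy.loc[mask]

print(mask.tolist())                   # [False, True, False, True]
print(high_sales["Cust_id"].tolist())  # ['Cust_2', 'Cust_4']
```

Storing the mask in a variable is optional, but it makes long conditions easier to read and reuse.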

In [None]:
# An alternative to df.Sales is df['Sales']
# You may want to add the : to indicate that you want all columns
# It is optional, but more explicit
df.loc[df['Sales'] > 3000, :]
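One reason to prefer the bracket form is that attribute access only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute. The sketch below uses a hypothetical dataframe whose column names (`Order Date`, `count`) are chosen only to show the problem.

```python
import pandas as pd

# Hypothetical columns: one name contains a space, the other shadows a method
toy = pd.DataFrame({
    "Order Date": ["2011-01-05", "2011-02-10", "2011-03-15"],
    "count": [3, 7, 1],
})

# Bracket access always returns the column
counts = toy["count"]
dates = toy["Order Date"]

# Attribute access returns the DataFrame.count method, not the column
shadowed = toy.count

print(type(counts).__name__)  # Series
print(callable(shadowed))     # True
```

A name such as `Order Date` cannot be written as an attribute at all, so `toy["Order Date"]` is the only option there.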

In [None]:
# We combine multiple conditions using the & operator
# E.g. all orders having 2000 < Sales < 3000 and Profit > 100
df.loc[(df.Sales > 2000) & (df.Sales < 3000) & (df.Profit > 100) ]
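The same pattern extends to the `|` (or) and `~` (not) operators. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`. The sketch below uses a made-up dataframe `toy`; the `between` call with `inclusive="neither"` assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "Sales": [1500.0, 3200.0, 2500.0, 2900.0],
    "Profit": [50.0, 400.0, 150.0, 80.0],
})

# & means "and": every condition must hold
both = toy.loc[(toy["Sales"] > 2000) & (toy["Sales"] < 3000) & (toy["Profit"] > 100)]

# | means "or": at least one condition must hold
either = toy.loc[(toy["Sales"] > 3000) | (toy["Profit"] < 100)]

# ~ negates a boolean Series, selecting the remaining rows
neither = toy.loc[~((toy["Sales"] > 3000) | (toy["Profit"] < 100))]

# between() expresses 2000 < Sales < 3000 in one call (strict bounds)
in_range = toy.loc[toy["Sales"].between(2000, 3000, inclusive="neither")]

print(both.index.tolist())      # [2]
print(either.index.tolist())    # [0, 1, 3]
print(neither.index.tolist())   # [2]
print(in_range.index.tolist())  # [2, 3]
```

Python's `and`, `or` and `not` keywords do not work here, since they try to reduce a whole Series to a single True or False and raise a `ValueError`.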

In [None]:
# E.g. all orders having 2000 < Sales < 3000 and Profit > 100
# Also, this time, you only need the Cust_id, Sales and Profit columns
df.loc[(df.Sales > 2000) & (df.Sales < 3000) & (df.Profit > 100), ['Cust_id', 'Sales', 'Profit']]

In [None]:
# You can also use the == and != operators
# Note: a notebook cell displays only the result of its last expression
df.loc[(df.Sales == 4233.15)]
df.loc[(df.Sales != 1000)]
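A caution when using `==` on floating-point columns: values that look equal may differ in the last few digits, so an exact comparison can silently return no rows. A minimal sketch, using a made-up dataframe and `np.isclose` as one possible tolerance-based alternative:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first value is 0.1 + 0.2, which is not exactly 0.3
toy = pd.DataFrame({"Sales": [0.1 + 0.2, 4233.15, 1000.0]})

# Exact equality misses the first row
exact = toy.loc[toy["Sales"] == 0.3]

# np.isclose compares within a small tolerance and returns a boolean array
close = toy.loc[np.isclose(toy["Sales"], 0.3)]

print(len(exact))            # 0
print(close.index.tolist())  # [0]
```

Whether a tolerance is appropriate depends on the data; for values read straight from a file and compared to the same literal, `==` usually behaves as expected.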

In [None]:
# You may want to select rows whose column value is in an iterable
# For instance, say a colleague gives you a list of customer_ids from a certain region

customers_in_bangalore = ['Cust_1798', 'Cust_1519', 'Cust_637', 'Cust_851']

# To get all the orders from these customers, use the isin function
# It returns a boolean Series, which you can use to select rows
df.loc[df['Cust_id'].isin(customers_in_bangalore)]
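Because `isin` returns a boolean Series, you can also negate it with `~` to get the orders from everyone else. The sketch below reuses the customer list from above on a small, made-up dataframe; the `toy` rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical orders; only two of these customers are in the list
toy = pd.DataFrame({
    "Cust_id": ["Cust_1798", "Cust_12", "Cust_637", "Cust_99"],
    "Sales": [120.0, 450.0, 980.0, 60.0],
})
customers_in_bangalore = ['Cust_1798', 'Cust_1519', 'Cust_637', 'Cust_851']

# True where Cust_id appears in the list
in_list = toy["Cust_id"].isin(customers_in_bangalore)

from_bangalore = toy.loc[in_list]
from_elsewhere = toy.loc[~in_list]  # ~ flips True and False

print(from_bangalore["Cust_id"].tolist())  # ['Cust_1798', 'Cust_637']
print(from_elsewhere["Cust_id"].tolist())  # ['Cust_12', 'Cust_99']
```

Ids in the list that never appear in the dataframe (here `Cust_1519` and `Cust_851`) are simply ignored.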