# Understanding Filtering

    Filtering lets us narrow a data set down, either to work with a smaller group of rows or to set aside rows
    that do not meet our needs. As you continue working with data, you will find that each tool you pick up
    has its own way of doing it.
    
    We are going to focus on one pandas tool called loc. Loc lets you select the rows of a data set that meet a
    condition you specify. To create a condition, you use operators to compare the values you are interested in.
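Under the hood, a condition is a list with one True or False value per row, and loc keeps the rows marked True. The sketch below shows that idea with a hand-written list on the first three rows of the customer data set; the names `people`, `keep`, and `picked` are made up for illustration.

```python
import pandas as pd

# First three rows of the customer data set (only two columns, for brevity)
people = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete"],
    "age": [20, 25, 15],
})

# One True/False value per row: keep Jess and Pete, skip Tanya
keep = [True, False, True]

# loc returns only the rows marked True, keeping their original row labels
picked = people.loc[keep]
print(picked)
```

In the examples that follow, the True/False list is produced by a comparison instead of being typed by hand.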

# What Operators Can I Use?

    This is a list of operators we will use to go through some filtering examples.
    
    == means Equal to
    != means Not Equal to
    > means Greater Than
    >= means Greater Than or Equal to
    < means Less Than
    <= means Less Than or Equal to
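Each operator compares every value in a column and gives back one True or False per row. A minimal sketch using the ages from the customer data set created below; the variable names are only for illustration.

```python
import pandas as pd

# The age column from the customer data set
age = pd.Series([20, 25, 15, 10, 30, 65, 35, 18, 23])

# Each comparison returns one True/False value per row
equal_20 = age == 20
not_20 = age != 20
over_20 = age > 20
at_least_20 = age >= 20
under_20 = age < 20
at_most_20 = age <= 20

# Adding up True/False values counts how many rows are True
print(over_20.sum())  # 5
```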

# Using Loc to Filter Data

In [1]:
# Import Pandas

import pandas as pd

# Create Customer Data Set

customer=pd.DataFrame({
    'id':[1,2,3,4,5,6,7,8,9],
    'name':['Jess','Tanya','Pete','Mel','Heather','Arlene','Benny','Daniel','Jeremy'],
    'age':[20,25,15,10,30,65,35,18,23],
    'Product_ID':[101,0,106,0,103,104,0,0,107],
    'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
    'City':['Santa Cruz','Sacramento','San Deigo','San Fransico','San Fransico','Sacramento','Los Angeles','Santa Cruz','Sacramento']
})

# View the Data

customer

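|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 0 | 1 | Jess | 20 | 101 | Watch | Santa Cruz |
| 1 | 2 | Tanya | 25 | 0 | NA | Sacramento |
| 2 | 3 | Pete | 15 | 106 | Oil | San Deigo |
| 3 | 4 | Mel | 10 | 0 | NA | San Fransico |
| 4 | 5 | Heather | 30 | 103 | Shoes | San Fransico |
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |
| 6 | 7 | Benny | 35 | 0 | NA | Los Angeles |
| 7 | 8 | Daniel | 18 | 0 | NA | Santa Cruz |
| 8 | 9 | Jeremy | 23 | 107 | Laptop | Sacramento |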


    Our sales team accidentally added some customers to the database incorrectly.
    Instead of leaving the Purchased_Product column blank, they entered the text NA.
    
    We are going to use loc to find the customers that were entered incorrectly.
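Because the sales team typed the letters NA, pandas treats them as ordinary text, not as a missing value. This short check, using the Purchased_Product values from the customer data set, shows why we compare against the text "NA" instead of looking for missing values.

```python
import pandas as pd

# The Purchased_Product values; "NA" here is just a two-letter string
products = pd.Series(["Watch", "NA", "Oil", "NA", "Shoes",
                      "Smartphone", "NA", "NA", "Laptop"])

# isna() only finds real missing values, so it finds none of these
print(products.isna().sum())     # 0

# Comparing against the text "NA" finds all four incorrectly entered rows
print((products == "NA").sum())  # 4
```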


# Using == to Find Rows That Match a Value

    How to do it
    
    The syntax for filtering with loc is:
    
    Data Set.loc[Data Set["Name of the column that has the value you want to look for"] == "Value you are searching for"]
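The part inside the square brackets is the condition. It can help to build it on its own first and look at it before handing it to loc. A minimal sketch on a copy of two columns from the customer data set; `is_na` is a made-up name for the condition.

```python
import pandas as pd

# Copy of the name and Purchased_Product columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Purchased_Product": ["Watch", "NA", "Oil", "NA", "Shoes",
                          "Smartphone", "NA", "NA", "Laptop"],
})

# Step 1: the condition on its own is one True/False value per row
is_na = customer["Purchased_Product"] == "NA"
print(is_na.tolist())

# Step 2: loc keeps only the rows where the condition is True
na = customer.loc[is_na]
print(na)
```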
    
    Let's do an example.

In [2]:
# Create a variable named na that holds all the rows that have NA in the Purchased_Product column

na = customer.loc[customer["Purchased_Product"] == "NA"]

# Display the data

na

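|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 1 | 2 | Tanya | 25 | 0 | NA | Sacramento |
| 3 | 4 | Mel | 10 | 0 | NA | San Fransico |
| 6 | 7 | Benny | 35 | 0 | NA | Los Angeles |
| 7 | 8 | Daniel | 18 | 0 | NA | Santa Cruz |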


# What Do We See?

    The na data set shows only the entries that need to be removed; all other rows are left out. The original
    customer data set is unchanged, because loc returns a new data set.
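A quick way to confirm that filtering does not change the original data is to count the rows in both. A minimal sketch on a copy of two columns from the customer data set:

```python
import pandas as pd

# Copy of the name and Purchased_Product columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Purchased_Product": ["Watch", "NA", "Oil", "NA", "Shoes",
                          "Smartphone", "NA", "NA", "Laptop"],
})

na = customer.loc[customer["Purchased_Product"] == "NA"]

print(len(na))        # 4
print(len(customer))  # 9, nothing was removed from customer
```

To actually drop rows from customer, you would assign a filtered result back to the same name, for example `customer = customer.loc[customer["Purchased_Product"] != "NA"]`.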

# Using != to Find Rows That Do Not Match a Value

    How to do it
    
    The syntax to find all the rows that are not equal to a value looks like this:
    
    Data Set.loc[Data Set["Name of the column that has the value you want to look for"] != "Value you want to exclude"]
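The same pattern works on number columns; compare against the number itself, without quotes. In this data set every NA row also has a Product_ID of 0, so filtering on Product_ID != 0 should return the same five customers. A sketch on a copy of two columns; `has_product` is a made-up name.

```python
import pandas as pd

# Copy of the name and Product_ID columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Product_ID": [101, 0, 106, 0, 103, 104, 0, 0, 107],
})

# Keep every customer whose Product_ID is not 0 (a number, so no quotes)
has_product = customer.loc[customer["Product_ID"] != 0]
print(has_product["name"].tolist())  # ['Jess', 'Pete', 'Heather', 'Arlene', 'Jeremy']
```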
    
    Let's do an example.

In [3]:
# Create a variable named notna that holds all the rows that do not have NA in the Purchased_Product column

notna = customer.loc[customer["Purchased_Product"] != "NA"]

# Display the data

notna

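|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 0 | 1 | Jess | 20 | 101 | Watch | Santa Cruz |
| 2 | 3 | Pete | 15 | 106 | Oil | San Deigo |
| 4 | 5 | Heather | 30 | 103 | Shoes | San Fransico |
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |
| 8 | 9 | Jeremy | 23 | 107 | Laptop | Sacramento |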


# What Do We See?

    All of the customers that had NA under Purchased_Product are left out of notna.
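Filtering with == and with != on the same value splits the data into two groups that do not overlap and together contain every row. A minimal check on a copy of two columns from the customer data set:

```python
import pandas as pd

# Copy of the name and Purchased_Product columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Purchased_Product": ["Watch", "NA", "Oil", "NA", "Shoes",
                          "Smartphone", "NA", "NA", "Laptop"],
})

na = customer.loc[customer["Purchased_Product"] == "NA"]
notna = customer.loc[customer["Purchased_Product"] != "NA"]

# Every row ends up in exactly one of the two results
print(len(na) + len(notna) == len(customer))     # True

# No row label appears in both results
print(na.index.intersection(notna.index).empty)  # True
```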

# For You to Try 1

    Using the examples above, filter the customer data set to show the customers older than 20 (use >).
    The answer is at the end of this notebook.

In [4]:
# Place for you to practice



# How Do We Filter on Multiple Values?

    We now know how to use loc to return all the rows that meet a condition based on one value. What if we
    want to return rows where a column equals any one of a list of values?
    
    Columns in pandas have a method called isin. It checks every value in the column against a list of values,
    which is useful when the value you want can be one of several. For example, how would you get the customers
    from Santa Cruz or Sacramento?

# Using isin

    How to do it
    
    First, create a variable that contains all the values you want to look for. It should look like this:
        
        variable_name = ["value1", "value2", "value3"]
        
    Then use the isin method on the column you want to filter on:
    
    Data Set.loc[Data Set["Name of the column that has the value you want to look for"].isin(variable containing your list)]
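Like ==, isin gives back one True or False per row; a row is True when its value matches any item in the list. A minimal sketch on a short, made-up sample of city values (`city`, `in_cities`, and `matches` are illustrative names):

```python
import pandas as pd

# A short sample of city values (not the full customer data set)
city = pd.Series(["Santa Cruz", "Sacramento", "Los Angeles",
                  "Santa Cruz", "Sacramento"])
cities = ["Sacramento", "Santa Cruz"]

# True where the value equals ANY item in the list
in_cities = city.isin(cities)
print(in_cities.tolist())  # [True, True, False, True, True]

# The True/False values can be passed to loc just like an == condition
matches = city.loc[in_cities]
print(matches.tolist())
```

The order of the values in the list does not matter, but each value must match exactly, including capitalization: "santa cruz" would not match "Santa Cruz".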
    
    Let's use it in an example.

In [5]:
# create a variable called cities which contains two cities Santa Cruz and Sacramento

cities = ["Sacramento", "Santa Cruz"]

# Filter our customer data set looking only for customers from these cities. We will store this in a variable named
# fromcities

fromcities = customer.loc[customer["City"].isin(cities)]

# Show the data

fromcities

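|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 0 | 1 | Jess | 20 | 101 | Watch | Santa Cruz |
| 1 | 2 | Tanya | 25 | 0 | NA | Sacramento |
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |
| 7 | 8 | Daniel | 18 | 0 | NA | Santa Cruz |
| 8 | 9 | Jeremy | 23 | 107 | Laptop | Sacramento |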


# What Do We See?

    Our customer data set has been filtered to just show people from Sacramento or Santa Cruz.

# How Do We Filter Using Multiple Criteria?

    We now know how to filter on one column for either one value or several values. What if we want to filter
    on one column having a value and another column having a value at the same time? For example, how would we
    show the customers from Santa Cruz who bought a watch?

# For You to Try 2

    Using the isin example above, filter the customer data set to show the people who bought a Watch or a Smartphone.
    The answer is at the end of this notebook.

In [6]:
# Place for you to practice



# Using &

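    You can combine conditions with the & operator, which means "and". A row is kept only when every condition
    joined by & is True. The syntax looks like this:
    
    How to do it
    
    Data Set.loc[(Data Set["Name of the first column"] condition 1) & (Data Set["Name of the second column"] condition 2)]
    
    Each condition must be wrapped in parentheses. In Python, & is evaluated before comparison operators such as ==,
    so without the parentheses pandas would try to combine the wrong pieces and raise an error.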
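One way to see what & does is to build each condition separately and then combine them. A minimal sketch on a copy of three columns from the customer data set; `bought_watch`, `from_santa_cruz`, `both`, and `watch_santa_cruz` are made-up names.

```python
import pandas as pd

# Copy of the name, Purchased_Product, and City columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Purchased_Product": ["Watch", "NA", "Oil", "NA", "Shoes",
                          "Smartphone", "NA", "NA", "Laptop"],
    "City": ["Santa Cruz", "Sacramento", "San Deigo", "San Fransico",
             "San Fransico", "Sacramento", "Los Angeles", "Santa Cruz",
             "Sacramento"],
})

# Condition 1: True for every row where the product is a Watch
bought_watch = customer["Purchased_Product"] == "Watch"

# Condition 2: True for every row where the city is Santa Cruz
from_santa_cruz = customer["City"] == "Santa Cruz"

# & compares the two row by row and is True only where both are True
both = bought_watch & from_santa_cruz

watch_santa_cruz = customer.loc[both]
print(watch_santa_cruz)
```

Storing each condition in its own variable also sidesteps the parentheses question, since each comparison is already finished before & combines them.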
    
    
    Let's use it in an example so it makes more sense.

In [7]:
# Create a variable named watchbuyersfromsc that holds all the rows where a customer bought a watch
# and has the city of Santa Cruz
watchbuyersfromsc = customer.loc[(customer["Purchased_Product"] == "Watch") & (customer["City"] == "Santa Cruz")]

# Display the data

watchbuyersfromsc

|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 0 | 1 | Jess | 20 | 101 | Watch | Santa Cruz |


# What Do We See?

    Our query worked and returned one result. Each condition checks every row and gives True or False, and
    the & keeps only the rows where both are True: the customer bought a watch and their city is Santa Cruz.
    You can think of it as filtering for watch buyers first and then filtering that result down to Santa Cruz.
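To check that the two-step way of thinking about & gives the same answer, this sketch filters in two steps and in one step and compares the results. It uses a copy of three columns from the customer data set; `watch_buyers`, `two_step`, and `one_step` are made-up names.

```python
import pandas as pd

# Copy of the name, Purchased_Product, and City columns from the customer data set
customer = pd.DataFrame({
    "name": ["Jess", "Tanya", "Pete", "Mel", "Heather",
             "Arlene", "Benny", "Daniel", "Jeremy"],
    "Purchased_Product": ["Watch", "NA", "Oil", "NA", "Shoes",
                          "Smartphone", "NA", "NA", "Laptop"],
    "City": ["Santa Cruz", "Sacramento", "San Deigo", "San Fransico",
             "San Fransico", "Sacramento", "Los Angeles", "Santa Cruz",
             "Sacramento"],
})

# Two-step version: filter to watch buyers, then filter that result to Santa Cruz
watch_buyers = customer.loc[customer["Purchased_Product"] == "Watch"]
two_step = watch_buyers.loc[watch_buyers["City"] == "Santa Cruz"]

# One-step version: both conditions joined with &
one_step = customer.loc[(customer["Purchased_Product"] == "Watch") &
                        (customer["City"] == "Santa Cruz")]

# equals() checks that the row labels, columns, and values all match
print(two_step.equals(one_step))  # True
```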

# For You to Try 3

    Using the & example above, filter the customer data set to show the people who bought a Smartphone and are also from Sacramento.
    The answer is at the end of this notebook.

In [8]:
# Place for you to practice

# Where Can I Practice?

An exercise using loc can be found in 04-Pandas\Day_2\Activities\01-Ins_LocAndIloc.

I could not find a class example using isin.

# For You to Try 1 Answer

In [9]:
# Creating a variable named over20 that will find all the data for customers whose age is greater than twenty

over20 = customer.loc[customer["age"] > 20]

# Display the data

over20

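|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 1 | 2 | Tanya | 25 | 0 | NA | Sacramento |
| 4 | 5 | Heather | 30 | 103 | Shoes | San Fransico |
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |
| 6 | 7 | Benny | 35 | 0 | NA | Los Angeles |
| 8 | 9 | Jeremy | 23 | 107 | Laptop | Sacramento |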


# For You to Try 2 Answer

In [10]:
# create a variable called products which contains two products Smartphone and Watch

products = ["Smartphone", "Watch"]

# Filter our customer data set looking only for customers who bought these products. We will store this in a variable named
# boughtproducts

boughtproducts = customer.loc[customer["Purchased_Product"].isin(products)]

# Show the data

boughtproducts

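|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 0 | 1 | Jess | 20 | 101 | Watch | Santa Cruz |
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |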


# For You to Try 3 Answer

In [11]:
# Create a variable called sacphonecustomers that holds the rows from the customer data set where Purchased_Product
# is Smartphone and City is Sacramento

sacphonecustomers = customer.loc[(customer["Purchased_Product"] == "Smartphone") & (customer["City"] == "Sacramento")]

# Show the data

sacphonecustomers

|   | id | name | age | Product_ID | Purchased_Product | City |
|---|----|------|-----|------------|-------------------|------|
| 5 | 6 | Arlene | 65 | 104 | Smartphone | Sacramento |
