# Filtering and Finding Data
Once you have a dataset in Pandas, you can quickly find, filter, and summarize information (frequency, distribution, etc.).

### We'll load in the Iris dataset again: 

In [3]:
import pandas as pd

# url to get file from
url = "http://mlr.cs.umass.edu/ml/machine-learning-databases/iris/iris.data"

# read the file into a dataframe; notice you can set the column names here as well
iris = pd.read_csv(url, 
                   header=None, 
                   names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Class'])

In [4]:
iris.head()

Unnamed: 0,Sepal Length,Sepal Width,Petal Length,Petal Width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


## Select Rows Based on a Column Value
If we need to find the average petal width of all Iris-setosa, we first need to subset our dataset to include only the Iris-setosa Iris class. 

We can do this as follows: 
1. Tell Python we want to pull something out of our dataframe using standard Python bracket syntax: `iris[]`
2. Inside of the brackets we tell Python what column we want to search in: `iris.Class`
3. Then we give Python the search parameters: `== "Iris-setosa"`

Putting this all together: 

In [5]:
iris_setosa_df = iris[iris.Class == "Iris-setosa"]

In [10]:
iris_setosa_df["Class"].unique()

array(['Iris-setosa'], dtype=object)
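
The section opened by asking for the average petal width of all Iris-setosa, so here is a minimal sketch of the full path from filter to average. To keep it runnable without a download, it assumes a tiny hand-made dataframe called `sample` (a hypothetical stand-in whose rows are copied from the outputs in this section), not the full `iris` dataframe:

```python
import pandas as pd

# Hypothetical stand-in for `iris`: a few rows copied from this section's outputs
sample = pd.DataFrame({
    "Petal Width": [0.2, 0.4, 1.4, 1.5, 2.5, 1.9],
    "Class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

# Step 1: the comparison produces a boolean Series, one True/False per row
is_setosa = sample["Class"] == "Iris-setosa"

# Step 2: passing that Series inside the brackets keeps only the True rows
setosa_rows = sample[is_setosa]

# Step 3: summarize the filtered rows
setosa_mean_width = setosa_rows["Petal Width"].mean()
print(setosa_mean_width)
```

On the real data, the same idea should work as `iris_setosa_df["Petal Width"].mean()`.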

## Select Rows Based on Two Column Values
What if we need to count all Iris-setosas that have a petal width greater than .3? We can use the `&` (and) operator to chain multiple filters together, but we must place parentheses around each filter, because `&` binds more tightly than comparisons like `==` and `>`: 

In [13]:
iris[(iris.Class == "Iris-setosa") & (iris["Petal Width"] > .3)] # bracket notation handles the space in the column name

Unnamed: 0,Sepal Length,Sepal Width,Petal Length,Petal Width,Class
5,5.4,3.9,1.7,0.4,Iris-setosa
15,5.7,4.4,1.5,0.4,Iris-setosa
16,5.4,3.9,1.3,0.4,Iris-setosa
21,5.1,3.7,1.5,0.4,Iris-setosa
23,5.1,3.3,1.7,0.5,Iris-setosa
26,5.0,3.4,1.6,0.4,Iris-setosa
31,5.4,3.4,1.5,0.4,Iris-setosa
43,5.0,3.5,1.6,0.6,Iris-setosa
44,5.1,3.8,1.9,0.4,Iris-setosa


And how do we count these values? 

In [17]:
len(iris[(iris.Class == "Iris-setosa") & (iris["Petal Width"] > .3)])

9

Yep, that's right, we can just toss it in a length function. 
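
As an alternative sketch, we can also count matches without building the filtered dataframe at all: each `True` in a boolean mask counts as 1, so summing the mask gives the number of matching rows. The `sample` dataframe below is a hypothetical stand-in with a few rows copied from this section's outputs:

```python
import pandas as pd

# Hypothetical stand-in for `iris`: a few rows copied from this section's outputs
sample = pd.DataFrame({
    "Petal Width": [0.2, 0.4, 0.5, 0.6, 1.4],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-setosa",
              "Iris-setosa", "Iris-versicolor"],
})

# Build the combined mask once so it can be reused
mask = (sample["Class"] == "Iris-setosa") & (sample["Petal Width"] > 0.3)

# Option 1: filter first, then take the length (the approach used above)
count_with_len = len(sample[mask])

# Option 2: sum the mask directly, since True counts as 1 and False as 0
count_with_sum = int(mask.sum())

print(count_with_len, count_with_sum)
```

Both options give the same count; storing the mask in a variable also makes a long filter easier to read.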

## Select Rows Where a Value Does Not Equal Something
Remember when we talked about Python operators? We can use those to help us filter Pandas dataframes: 
- ">" 
- ">=" 
- "<"
- "<="
- "=="
- "!="

What if we want to find all rows where the flower class does not equal `Iris-setosa`? 

In [20]:
iris[(iris.Class != "Iris-setosa")].head()

Unnamed: 0,Sepal Length,Sepal Width,Petal Length,Petal Width,Class
50,7.0,3.2,4.7,1.4,Iris-versicolor
51,6.4,3.2,4.5,1.5,Iris-versicolor
52,6.9,3.1,4.9,1.5,Iris-versicolor
53,5.5,2.3,4.0,1.3,Iris-versicolor
54,6.5,2.8,4.6,1.5,Iris-versicolor
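
The other comparison operators work the same way on numeric columns. Here is a small sketch using `>=` and `<`, again on a hypothetical `sample` dataframe built from a few rows shown in this section; the 6.0 cutoff is an arbitrary value chosen for illustration:

```python
import pandas as pd

# Hypothetical stand-in for `iris`: a few rows copied from this section's outputs
sample = pd.DataFrame({
    "Sepal Length": [5.1, 5.4, 7.0, 6.4, 6.3, 5.8],
    "Class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

# Rows with a sepal length of at least 6.0 (cutoff chosen only for illustration)
long_sepal = sample[sample["Sepal Length"] >= 6.0]

# The complementary filter: rows with a sepal length below 6.0
short_sepal = sample[sample["Sepal Length"] < 6.0]

print(len(long_sepal), len(short_sepal))
```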


## Select Rows Based on a List of Values
If we want to find all rows where a column matches any one of several values, we can check whether each row's value "is in" a list of values: 

In [22]:
val_list = ["Iris-versicolor", "Iris-setosa"]
iris.Class.isin(val_list)

0       True
1       True
2       True
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16      True
17      True
18      True
19      True
20      True
21      True
22      True
23      True
24      True
25      True
26      True
27      True
28      True
29      True
       ...  
120    False
121    False
122    False
123    False
124    False
125    False
126    False
127    False
128    False
129    False
130    False
131    False
132    False
133    False
134    False
135    False
136    False
137    False
138    False
139    False
140    False
141    False
142    False
143    False
144    False
145    False
146    False
147    False
148    False
149    False
Name: Class, Length: 150, dtype: bool

Now that we know how to check, we can use this information to return all rows where the condition is true: 

In [23]:
iris[iris.Class.isin(val_list)].head(10)  # show only the first 10 matching rows

Unnamed: 0,Sepal Length,Sepal Width,Petal Length,Petal Width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa
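
To see what `isin` is doing, it can help to compare it with the longhand version: one `==` test per value, joined with `|` (or). The sketch below assumes a hypothetical `sample` dataframe with one row per class and checks that both masks agree:

```python
import pandas as pd

# Hypothetical stand-in for `iris`: one row per class
sample = pd.DataFrame({
    "Petal Width": [0.2, 1.4, 2.5],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

val_list = ["Iris-versicolor", "Iris-setosa"]

# Short form: True wherever the class appears in val_list
mask_isin = sample["Class"].isin(val_list)

# Long form: one equality test per value, combined with | (or)
mask_or = (sample["Class"] == "Iris-versicolor") | (sample["Class"] == "Iris-setosa")

# Both masks select the same rows
same = mask_isin.equals(mask_or)
print(same)
```

The `isin` form tends to stay readable as the list grows, while the `|` chain needs one more parenthesized comparison per value.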


## Find Rows Where a Column Value Is Not in a List
We can use another Python operator, `~` (tilde), to say "do the opposite of this": 

In [26]:
iris[~iris.Class.isin(val_list)].head()

Unnamed: 0,Sepal Length,Sepal Width,Petal Length,Petal Width,Class
100,6.3,3.3,6.0,2.5,Iris-virginica
101,5.8,2.7,5.1,1.9,Iris-virginica
102,7.1,3.0,5.9,2.1,Iris-virginica
103,6.3,2.9,5.6,1.8,Iris-virginica
104,6.5,3.0,5.8,2.2,Iris-virginica


You can mix and match all of the above to filter and subset a dataframe based on multiple conditions. 
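
As a rough sketch of mixing and matching, the example below combines `~`, `isin`, `&`, and `>=` in a single filter. It assumes a hypothetical `sample` dataframe whose rows and index labels are copied from the outputs in this section; the 1.5 cutoff is arbitrary:

```python
import pandas as pd

# Hypothetical stand-in for `iris`, keeping the original row labels as the index
sample = pd.DataFrame(
    {
        "Petal Width": [0.2, 0.4, 1.4, 1.5, 2.5, 1.9],
        "Class": ["Iris-setosa", "Iris-setosa",
                  "Iris-versicolor", "Iris-versicolor",
                  "Iris-virginica", "Iris-virginica"],
    },
    index=[0, 5, 50, 51, 100, 101],
)

excluded = ["Iris-setosa"]

# Keep rows whose class is NOT in the excluded list
# AND whose petal width is at least 1.5 (cutoff chosen only for illustration)
mask = ~sample["Class"].isin(excluded) & (sample["Petal Width"] >= 1.5)
result = sample[mask]

print(result)
```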