### Common Filter Operators

Next, we look at the final topics on our list of filters, namely:

* `IS NULL`/`IS NOT NULL`
* `AND`/`OR`


In [4]:
import pandas as pd
from dfply import *
import seaborn as sns
%matplotlib inline

## Set up

Let's read in the heroes data set, treating `-`, `-99.0`, empty strings, and `None` as missing values.

In [5]:
from more_dfply import fix_names
heroes_raw = pd.read_csv('./data/heroes_information.csv', na_values=['-', '-99.0', '', 'None'])
heroes = (heroes_raw >> fix_names)
heroes.head()

Unnamed: 0,Unnamed_0,name,Gender,Eye_color,Race,Hair_color,Height,Publisher,Skin_color,Alignment,Weight
0,0,A-Bomb,Male,yellow,Human,No Hair,203.0,Marvel Comics,,good,441.0
1,1,Abe Sapien,Male,blue,Icthyo Sapien,No Hair,191.0,Dark Horse Comics,blue,good,65.0
2,2,Abin Sur,Male,blue,Ungaran,No Hair,185.0,DC Comics,red,good,90.0
3,3,Abomination,Male,green,Human / Radiation,No Hair,203.0,Marvel Comics,,bad,441.0
4,4,Abraxas,Male,blue,Cosmic Entity,Black,,Marvel Comics,,bad,


## Checking for `IS NULL` and `IS NOT NULL`

`SQL` has `IS NULL` and `IS NOT NULL`, which are used to check for missing values.

## Using `IS NULL`/`IS NOT NULL` in `pandas` + `dfply`

`pandas` and `pyspark` both provide a column method for this check: `isnull` in `pandas` and `isNull` in `pyspark`.
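As a minimal sketch of the same check without `dfply`, the plain `pandas` version below uses `isnull` and its complement `notnull` as boolean masks. The `toy` frame is an assumption for illustration (three rows copied by hand from the output above), not the full heroes file.

```python
import pandas as pd

# Hypothetical toy frame: three rows copied by hand from heroes.head()
toy = pd.DataFrame({
    "name": ["A-Bomb", "Abe Sapien", "Abin Sur"],
    "Skin_color": [None, "blue", "red"],
})

# IS NULL: keep rows where Skin_color is missing
missing = toy[toy["Skin_color"].isnull()]

# IS NOT NULL: keep rows where Skin_color is present
present = toy[toy["Skin_color"].notnull()]

print(missing["name"].tolist())
print(present["name"].tolist())
```

Negating the mask with `~`, as in `~toy["Skin_color"].isnull()`, selects the same rows as `notnull()`.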

#### `IS NULL`

In [6]:
(heroes
 >> filter_by(X.Skin_color.isnull())
 >> head(2))

Unnamed: 0,Unnamed_0,name,Gender,Eye_color,Race,Hair_color,Height,Publisher,Skin_color,Alignment,Weight
0,0,A-Bomb,Male,yellow,Human,No Hair,203.0,Marvel Comics,,good,441.0
3,3,Abomination,Male,green,Human / Radiation,No Hair,203.0,Marvel Comics,,bad,441.0


#### `IS NOT NULL` 

In [8]:
(heroes
 >> filter_by(~X.Skin_color.isnull())
 >> head(2))

Unnamed: 0,Unnamed_0,name,Gender,Eye_color,Race,Hair_color,Height,Publisher,Skin_color,Alignment,Weight
1,1,Abe Sapien,Male,blue,Icthyo Sapien,No Hair,191.0,Dark Horse Comics,blue,good,65.0
2,2,Abin Sur,Male,blue,Ungaran,No Hair,185.0,DC Comics,red,good,90.0


## Combining expressions with `AND` and `OR`

`SQL` has `AND` and `OR`, which are used to combine Boolean conditions.

## Using `AND`/`OR` in `pandas` and `pyspark`

`pandas` and `pyspark` use `&` and `|` to combine column expressions. Wrap each comparison in parentheses, because `&` and `|` bind more tightly than `==`.
 

In [9]:
(heroes
 >> filter_by((X.Hair_color == 'No Hair') & (X.Eye_color == 'blue'))
 >> head(2))

Unnamed: 0,Unnamed_0,name,Gender,Eye_color,Race,Hair_color,Height,Publisher,Skin_color,Alignment,Weight
1,1,Abe Sapien,Male,blue,Icthyo Sapien,No Hair,191.0,Dark Horse Comics,blue,good,65.0
2,2,Abin Sur,Male,blue,Ungaran,No Hair,185.0,DC Comics,red,good,90.0


In [10]:
(heroes
 >> filter_by((X.Hair_color == 'No Hair') | (X.Eye_color == 'blue'))
 >> head(2))

Unnamed: 0,Unnamed_0,name,Gender,Eye_color,Race,Hair_color,Height,Publisher,Skin_color,Alignment,Weight
0,0,A-Bomb,Male,yellow,Human,No Hair,203.0,Marvel Comics,,good,441.0
1,1,Abe Sapien,Male,blue,Icthyo Sapien,No Hair,191.0,Dark Horse Comics,blue,good,65.0
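The two cells above can also be sketched in plain `pandas`, where each comparison produces a Boolean mask and `&`/`|` combine the masks row by row. The `toy` frame and the mask names `no_hair` and `blue_eyes` are assumptions for illustration, with three rows copied by hand from the heroes data.

```python
import pandas as pd

# Hypothetical toy frame: three rows copied by hand from the heroes data
toy = pd.DataFrame({
    "name": ["A-Bomb", "Abe Sapien", "Abraxas"],
    "Hair_color": ["No Hair", "No Hair", "Black"],
    "Eye_color": ["yellow", "blue", "blue"],
})

# Each comparison yields one True/False value per row
no_hair = toy["Hair_color"] == "No Hair"
blue_eyes = toy["Eye_color"] == "blue"

# AND: both conditions must hold for a row to be kept
both = toy[no_hair & blue_eyes]

# OR: at least one condition must hold
either = toy[no_hair | blue_eyes]

print(both["name"].tolist())
print(either["name"].tolist())
```

Naming the masks first avoids the precedence problem. When the comparisons are written inline, each one needs its own parentheses, as in `(toy["Hair_color"] == "No Hair") & (toy["Eye_color"] == "blue")`.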


#### Important NOTE
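You need to break the Python habit of using `and` and `or` (hard to do). Inside `filter_by`, use `&` and `|` instead; the keywords `and` and `or` raise an error, as the next cell shows.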

In [11]:
(heroes
 >> filter_by((X.Hair_color == 'No Hair') and (X.Eye_color == 'blue'))
 >> head(2))

TypeError: __index__ returned non-int (type Intention)
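The underlying reason is that Python's `and` and `or` cannot be overloaded: they ask for a single truth value from the left operand instead of working row by row. The sketch below shows the same mistake in plain `pandas`, where it surfaces as a `ValueError` and not as the `dfply` error above. The `toy` frame and the mask names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frame: three rows copied by hand from the heroes data
toy = pd.DataFrame({
    "name": ["A-Bomb", "Abe Sapien", "Abraxas"],
    "Hair_color": ["No Hair", "No Hair", "Black"],
    "Eye_color": ["yellow", "blue", "blue"],
})

no_hair = toy["Hair_color"] == "No Hair"
blue_eyes = toy["Eye_color"] == "blue"

# `and` needs one True/False for the whole Series, which pandas refuses to guess
try:
    toy[no_hair and blue_eyes]
    raised = False
except ValueError as err:
    raised = True
    print(type(err).__name__)

# `&` combines the masks element by element, which is what the filter needs
combined = no_hair & blue_eyes
print(combined.tolist())
```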

## <font color="red"> Exercise 3.3.1 - The Super Hero Dating Game - Part 3</font>

Yesterday, you noticed one more singles ad in the local paper, which read:

> W4A (Woman for Androgynous) looking for super hero.  Must be either God/Eternal/Cosmic Entity; or have no body hair.  Bad heroes need not reply.

Write a query in all three frameworks to help find candidates for this personal ad.  You should complete each query with **exactly one `filter_by`/`where`**.

In [None]:
# Your dfply solution here