# Filtering a Query with WHERE
Sometimes you will want to keep only the rows of a query in which one or more columns meet certain criteria. This is done with a WHERE clause, an optional clause of the SELECT statement. It appears after the FROM clause, as in the following statement:

- SELECT column_list FROM table_name WHERE search_condition;

In [1]:
import pandas as pd
import mysql.connector as sql
import os

In [3]:
connection = sql.connect(
    host = os.environ.get('mysql_host'),
    user = os.environ.get('mysql_user'),
    password = os.environ.get('mysql_password')
)

cursor = connection.cursor()


If you do not remember the tables in the world database, you can always list them with the following command.

In [6]:
pd.read_sql_query("""
    SHOW TABLES
    FROM world""",
    connection)

Unnamed: 0,Tables_in_world
0,city
1,country
2,countrylanguage


### 2. Retrieving data with WHERE
Take the city table as an example.

2.1 Check the table columns first.

In [9]:
pd.read_sql_query("""
    DESCRIBE world.city
    """,
    connection)

Unnamed: 0,Field,Type,Null,Key,Default,Extra
0,ID,b'int',NO,PRI,,auto_increment
1,Name,b'char(35)',NO,,b'',
2,CountryCode,b'char(3)',NO,MUL,b'',
3,District,b'char(20)',NO,,b'',
4,Population,b'int',NO,,b'0',


2.2 Check the number of rows.

In [11]:
pd.read_sql_query("""
    SELECT COUNT(*) AS nrow
    FROM world.city""",
    connection)

Unnamed: 0,nrow
0,4079


2.3 Use WHERE to retrieve data

Using WHERE is straightforward for a simple criterion. For example, say we want to see the population of New York City, so we keep only the rows whose Name is 'New York'.

In [14]:
pd.read_sql_query("""
    SELECT *
    FROM world.city
    WHERE Name='New York'
    """,
    connection)

Unnamed: 0,ID,Name,CountryCode,District,Population
0,3793,New York,USA,New York,8008278
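
When the city name comes from a variable rather than being typed into the query, it is safer to pass it as a bound parameter than to paste it into the SQL string. The following is a minimal sketch of that idea, not part of the original notebook. It assumes Python's built-in sqlite3 with a tiny in-memory copy of a few city rows from this notebook, so that it runs without a MySQL server. The helper name `city_by_name` is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for world.city.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (ID INTEGER PRIMARY KEY, Name TEXT, "
    "CountryCode TEXT, District TEXT, Population INTEGER)"
)
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?, ?, ?)",
    [
        (3793, "New York", "USA", "New York", 8008278),
        (456, "London", "GBR", "England", 7285000),
        (1532, "Tokyo", "JPN", "Tokyo-to", 7980230),
    ],
)


def city_by_name(connection, name):
    """Return every row whose Name equals `name` (hypothetical helper)."""
    # sqlite3 uses ? as the placeholder; the value is sent separately
    # from the SQL text, so quotes inside `name` cannot break the query.
    sql = (
        "SELECT ID, Name, CountryCode, District, Population "
        "FROM city WHERE Name = ?"
    )
    return connection.execute(sql, (name,)).fetchall()


rows = city_by_name(conn, "New York")
print(rows)
```

With the MySQL connection used in this notebook the placeholder would be `%s` instead of `?`, and `pd.read_sql_query` accepts the values through its `params` argument.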


2.4 Use AND to further filter data

In [19]:
pd.read_sql_query("""
    SELECT Name
    FROM world.city
    WHERE CountryCode='USA' AND Population > 5000000""",
    connection)

Unnamed: 0,Name
0,New York


2.5 More combinations of filters
We can also use the != or <> operator (the two are equivalent) to exclude rows, for example to count the cities with more than 5 million inhabitants outside the USA.

In [20]:
pd.read_sql_query("""
    SELECT COUNT(*)
    FROM world.city
    WHERE CountryCode!='USA' AND Population > 5000000""",
    connection)

Unnamed: 0,COUNT(*)
0,23
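
To check that != and <> really behave the same way, the two spellings can be run side by side. This is a small sketch under stated assumptions: it uses Python's built-in sqlite3 and three city rows taken from this notebook instead of the full MySQL table, and `count_where` is a hypothetical helper.

```python
import sqlite3

# Assumption: a three-row in-memory table stands in for world.city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("New York", "USA", 8008278),
        ("Shanghai", "CHN", 9696300),
        ("London", "GBR", 7285000),
    ],
)


def count_where(condition):
    """Count rows matching a hard-coded SQL condition (hypothetical helper)."""
    # The condition is a trusted literal written in this script,
    # never user input, so formatting it into the query is acceptable here.
    return conn.execute(f"SELECT COUNT(*) FROM city WHERE {condition}").fetchone()[0]


with_bang = count_where("CountryCode != 'USA' AND Population > 5000000")
with_angle = count_where("CountryCode <> 'USA' AND Population > 5000000")
print(with_bang, with_angle)
```

Both counts come out equal on this sample, since the two operators are just different spellings of "not equal".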


We can also accept several alternatives by combining conditions with OR. However, AND is evaluated before OR, so the OR conditions have to be wrapped in () to act as one condition. Try the query without the parentheses to see the difference!

Selecting US and Chinese cities with more than 5 million inhabitants:

In [27]:
pd.read_sql_query("""
    SELECT *
    FROM world.city
    WHERE Population > 5000000 AND (CountryCode='USA' OR CountryCode='CHN')""",
    connection)

Unnamed: 0,ID,Name,CountryCode,District,Population
0,1890,Shanghai,CHN,Shanghai,9696300
1,1891,Peking,CHN,Peking,7472000
2,1892,Chongqing,CHN,Chongqing,6351600
3,1893,Tianjin,CHN,Tianjin,5286800
4,3793,New York,USA,New York,8008278
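
The effect of the parentheses is easier to see with a stricter population threshold. The sketch below is an illustration rather than part of the original notebook. It assumes Python's built-in sqlite3, a handful of city rows copied from this notebook, and a threshold of 8 million chosen only so that the two queries give different answers.

```python
import sqlite3

# Assumption: a small in-memory table stands in for world.city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Shanghai", "CHN", 9696300),
        ("Peking", "CHN", 7472000),
        ("Chongqing", "CHN", 6351600),
        ("Tianjin", "CHN", 5286800),
        ("New York", "USA", 8008278),
        ("London", "GBR", 7285000),
    ],
)

# Parentheses: the population filter applies to both countries.
with_parens = [row[0] for row in conn.execute(
    "SELECT Name FROM city "
    "WHERE Population > 8000000 AND (CountryCode='USA' OR CountryCode='CHN') "
    "ORDER BY Name"
)]

# No parentheses: read as (Population > 8000000 AND CountryCode='USA')
# OR CountryCode='CHN', so every Chinese city passes regardless of size.
without_parens = [row[0] for row in conn.execute(
    "SELECT Name FROM city "
    "WHERE Population > 8000000 AND CountryCode='USA' OR CountryCode='CHN' "
    "ORDER BY Name"
)]

print(with_parens)     # ['New York', 'Shanghai']
print(without_parens)  # ['Chongqing', 'New York', 'Peking', 'Shanghai', 'Tianjin']
```

Without the parentheses the population filter silently stops applying to the Chinese cities, which is why the grouping matters.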


Or we can simplify the above filter using the IN operator.

In [28]:
pd.read_sql_query("""
    SELECT *
    FROM world.city
    WHERE Population > 5000000 AND CountryCode IN ('USA', 'CHN')""",
    connection)

Unnamed: 0,ID,Name,CountryCode,District,Population
0,1890,Shanghai,CHN,Shanghai,9696300
1,1891,Peking,CHN,Peking,7472000
2,1892,Chongqing,CHN,Chongqing,6351600
3,1893,Tianjin,CHN,Tianjin,5286800
4,3793,New York,USA,New York,8008278
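
If the list of country codes lives in a Python list, one way to build the IN filter is to generate one placeholder per value and bind the list as parameters. This is a sketch under assumptions: it uses Python's built-in sqlite3 with a few city rows from this notebook, and the variable names are illustrative only.

```python
import sqlite3

# Assumption: a small in-memory table stands in for world.city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Shanghai", "CHN", 9696300),
        ("Peking", "CHN", 7472000),
        ("Chongqing", "CHN", 6351600),
        ("Tianjin", "CHN", 5286800),
        ("New York", "USA", 8008278),
        ("London", "GBR", 7285000),
    ],
)

codes = ["USA", "CHN"]

# One "?" per code, giving "IN (?, ?)"; only placeholders are formatted
# into the SQL text, the values themselves are bound separately.
placeholders = ", ".join("?" for _ in codes)
in_sql = (
    "SELECT Name FROM city "
    f"WHERE Population > 5000000 AND CountryCode IN ({placeholders}) "
    "ORDER BY Name"
)
names_in = [row[0] for row in conn.execute(in_sql, codes)]

# The equivalent query written with OR, for comparison.
names_or = [row[0] for row in conn.execute(
    "SELECT Name FROM city "
    "WHERE Population > 5000000 AND (CountryCode='USA' OR CountryCode='CHN') "
    "ORDER BY Name"
)]

print(names_in)
```

Both queries return the same cities. With mysql.connector the placeholder would be `%s` rather than `?`.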


Or the cities with more than 5 million inhabitants that are NOT IN the USA or China:

In [29]:
pd.read_sql_query("""
    SELECT *
    FROM world.city
    WHERE Population > 5000000 AND CountryCode NOT IN ('USA', 'CHN')""",
    connection)

Unnamed: 0,ID,Name,CountryCode,District,Population
0,206,São Paulo,BRA,São Paulo,9968485
1,207,Rio de Janeiro,BRA,Rio de Janeiro,5598953
2,456,London,GBR,England,7285000
3,608,Cairo,EGY,Kairo,6789479
4,939,Jakarta,IDN,Jakarta Raya,9604900
5,1024,Mumbai (Bombay),IND,Maharashtra,10500000
6,1025,Delhi,IND,Delhi,7206704
7,1380,Teheran,IRN,Teheran,6758845
8,1532,Tokyo,JPN,Tokyo-to,7980230
9,2257,Santafé de Bogotá,COL,Santafé de Bogotá,6260862


You can filter with math operators too!
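
As an illustration of math operators inside WHERE, the sketch below filters on a remainder (%) and on a division (/). It is not from the original notebook: it assumes Python's built-in sqlite3 and a few city rows copied from this notebook, and the two conditions are arbitrary examples.

```python
import sqlite3

# Assumption: a small in-memory table stands in for world.city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Shanghai", "CHN", 9696300),
        ("Peking", "CHN", 7472000),
        ("New York", "USA", 8008278),
        ("London", "GBR", 7285000),
        ("Mumbai (Bombay)", "IND", 10500000),
    ],
)

# % is the remainder: keep populations that are a multiple of 1000.
rounded = [row[0] for row in conn.execute(
    "SELECT Name FROM city WHERE Population % 1000 = 0 ORDER BY Name"
)]

# / is division: keep cities with at least 9.5 million inhabitants.
# Dividing by 1000000.0 (a float) avoids integer division in SQLite.
very_large = [row[0] for row in conn.execute(
    "SELECT Name FROM city WHERE Population / 1000000.0 >= 9.5 ORDER BY Name"
)]

print(rounded)     # ['London', 'Mumbai (Bombay)', 'Peking']
print(very_large)  # ['Mumbai (Bombay)', 'Shanghai']
```

The same conditions can be used in the MySQL queries of this notebook, since % and / are also available there.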

### Summary

In the WHERE clause, we can use combinations of NOT, IN, <>, !=, >=, >, <, <=, AND, OR, () and even some math operators (such as %, *, /, +, -) to retrieve the data we want easily and efficiently.

# References
- [Chonghua Yin notebook](https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex05-Filtering%20a%20Query%20with%20WHERE.ipynb)