In [1]:
import sqlite3
import pandas as pd

In [2]:
pd.options.display.max_columns = None
#pd.options.display.max_rows = None

In [3]:
conn = sqlite3.connect('parch-and-posey.db')

In [4]:
cursor = conn.cursor()
cursor.execute('''
select * from sqlite_master where type = "table";
''')
columns = [col[0] for col in cursor.description]
data = cursor.fetchall()
cursor.close()

In [5]:
pd.DataFrame(data, columns=columns)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,web_events,web_events,2,"CREATE TABLE web_events (\tid integer,\taccoun..."
1,table,sales_reps,sales_reps,7,"CREATE TABLE sales_reps (\tid integer,\tname b..."
2,table,region,region,222,"CREATE TABLE region (\tid integer,\tname bpchar)"
3,table,orders,orders,223,"CREATE TABLE orders (\tid integer,\taccount_id..."
4,table,accounts,accounts,583,"CREATE TABLE accounts (\tid integer,\tname bpc..."


# CASE - Expert Tip

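CASE is an expression, so it can appear wherever an expression is allowed (for example in WHERE or ORDER BY), but it is most often used in the SELECT clause, as in the examples here.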

CASE must include the following components: WHEN, THEN, and END. ELSE is optional and catches rows that didn’t meet any of the preceding WHEN conditions; without it, those rows get NULL.

Between WHEN and THEN you can write any conditional expression that you could write in a WHERE clause. This includes combining multiple conditions with AND and OR.

You can include multiple WHEN clauses, followed by an optional ELSE to deal with any conditions they leave unaddressed.
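
As a minimal sketch of these rules, the cell below combines two WHEN clauses, an AND, an OR, and an ELSE. It runs against a hypothetical in-memory table with a handful of made-up rows shaped like web_events, not against parch-and-posey.db, and the names `demo` and `channel_group` are illustrative assumptions.

```python
import sqlite3

# Hypothetical in-memory stand-in for web_events (made-up rows, not the real data).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE web_events (id integer, account_id integer, channel text)')
demo.executemany(
    'INSERT INTO web_events VALUES (?, ?, ?)',
    [(1, 1001, 'direct'), (2, 1001, 'facebook'), (3, 4491, 'twitter'),
     (4, 4491, 'organic'), (5, 4491, 'direct')],
)

# WHEN clauses are checked top to bottom; the first one that is true wins,
# and ELSE catches every row that matched none of them.
rows = demo.execute('''
SELECT id, channel,
       CASE WHEN channel = 'facebook' OR channel = 'twitter' THEN 'social'
            WHEN channel = 'direct' AND account_id = 1001 THEN 'direct (account 1001)'
            ELSE 'other' END AS channel_group
FROM web_events
ORDER BY id;
''').fetchall()
demo.close()

for row in rows:
    print(row)
```

Row 5 is a direct event for a different account, so it fails the AND condition and falls through to ELSE.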

In [6]:
pd.read_sql_query(sql='''
SELECT id, account_id, occurred_at, channel,
CASE WHEN channel = 'facebook' THEN 'yes' END AS is_facebook
FROM web_events;
''', con=conn)

Unnamed: 0,id,account_id,occurred_at,channel,is_facebook
0,1,1001,2015-10-06 17:13:58,direct,
1,2,1001,2015-11-05 03:08:26,direct,
2,3,1001,2015-12-04 03:57:24,direct,
3,4,1001,2016-01-02 00:55:03,direct,
4,5,1001,2016-02-01 19:02:33,direct,
...,...,...,...,...,...
9068,9069,4491,2016-10-04 15:43:29,facebook,yes
9069,9070,4491,2016-10-04 23:42:41,twitter,
9070,9071,4491,2016-11-06 07:23:45,organic,
9071,9072,4491,2016-12-18 03:21:31,organic,


In [7]:
pd.read_sql_query(sql='''
SELECT id, account_id, occurred_at, channel,
CASE WHEN channel = 'facebook' THEN 'yes' ELSE 'no' END AS is_facebook
FROM web_events;
''', con=conn)

Unnamed: 0,id,account_id,occurred_at,channel,is_facebook
0,1,1001,2015-10-06 17:13:58,direct,no
1,2,1001,2015-11-05 03:08:26,direct,no
2,3,1001,2015-12-04 03:57:24,direct,no
3,4,1001,2016-01-02 00:55:03,direct,no
4,5,1001,2016-02-01 19:02:33,direct,no
...,...,...,...,...,...
9068,9069,4491,2016-10-04 15:43:29,facebook,yes
9069,9070,4491,2016-10-04 23:42:41,twitter,no
9070,9071,4491,2016-11-06 07:23:45,organic,no
9071,9072,4491,2016-12-18 03:21:31,organic,no


# Example

A quiz in the previous Basic SQL lesson asked this question:

Create a column that divides the standard_amt_usd by the standard_qty to find the unit price for standard paper for each order. Limit the results to the first 10 orders, and include the id and account_id fields. NOTE - you will be thrown an error with the correct solution to this question. This is for a division by zero. You will learn how to get a solution without an error to this query when you learn about CASE statements in a later section.

Let's see how we can use the CASE statement to get around this error.

In [8]:
pd.read_sql_query(sql='''
SELECT id, account_id, standard_amt_usd/standard_qty AS unit_price
FROM orders
LIMIT 10;
''', con=conn)

Unnamed: 0,id,account_id,unit_price
0,1,1001,4.99
1,2,1001,4.99
2,3,1001,4.99
3,4,1001,4.99
4,5,1001,4.99
5,6,1001,4.99
6,7,1001,4.99
7,8,1001,4.99
8,9,1001,4.99
9,10,1001,4.99


Now, let's use a CASE statement. This way, whenever standard_qty is zero or NULL, we return 0; otherwise we return the unit price.

In [9]:
pd.read_sql_query(sql='''
SELECT account_id, CASE WHEN standard_qty = 0 OR standard_qty IS NULL THEN 0
                        ELSE standard_amt_usd/standard_qty END AS unit_price
FROM orders
LIMIT 10;
''', con=conn)

Unnamed: 0,account_id,unit_price
0,1001,4.99
1,1001,4.99
2,1001,4.99
3,1001,4.99
4,1001,4.99
5,1001,4.99
6,1001,4.99
7,1001,4.99
8,1001,4.99
9,1001,4.99


Now the WHEN branch catches any rows that would cause a division by zero, and the ELSE branch computes the division for all other rows. Notice that we essentially charge all of our accounts 4.99 for standard paper. It makes sense that this doesn't fluctuate, and the result is more accurate than adding 1 to the denominator, the quick fix we might have used in the earlier lesson.
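
The first 10 orders all have a nonzero standard_qty, so the guard never fires in the output above. The sketch below uses a hypothetical in-memory orders table with made-up rows, including one zero and one NULL quantity, to show both branches. One caveat: SQLite evaluates division by zero to NULL instead of raising an error, so the unguarded query here returns NULL for that row, whereas engines such as PostgreSQL raise a division-by-zero error.

```python
import sqlite3

# Hypothetical in-memory stand-in for orders (made-up rows, not the real data).
demo = sqlite3.connect(':memory:')
demo.execute('''
CREATE TABLE orders (id integer, account_id integer,
                     standard_qty integer, standard_amt_usd real)
''')
demo.executemany(
    'INSERT INTO orders VALUES (?, ?, ?, ?)',
    [(1, 1001, 100, 499.0), (2, 1001, 0, 0.0), (3, 1011, None, None)],
)

# Unguarded division: SQLite yields NULL (None in Python) for the zero-quantity row.
plain = demo.execute('''
SELECT id, standard_amt_usd/standard_qty AS unit_price
FROM orders
ORDER BY id;
''').fetchall()

# Guarded division: the WHEN branch returns 0 before the division is attempted.
guarded = demo.execute('''
SELECT id, CASE WHEN standard_qty = 0 OR standard_qty IS NULL THEN 0
                ELSE standard_amt_usd/standard_qty END AS unit_price
FROM orders
ORDER BY id;
''').fetchall()
demo.close()

print(plain)
print(guarded)
```

Whether 0 is the right replacement value is a reporting decision; returning NULL for undefined unit prices is an equally defensible choice.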

In [10]:
pd.read_sql_query(sql='''
SELECT CASE WHEN total > 500 THEN 'Over 500'
            ELSE '500 or under' END AS total_group,
       COUNT(*) AS order_count
FROM orders
GROUP BY 1;
''', con=conn)

Unnamed: 0,total_group,order_count
0,500 or under,3716
1,Over 500,3196


This one is pretty tricky. Try running the query yourself to make sure you understand what is happening. The next concept will give you some practice writing CASE statements on your own. Getting the same information with a WHERE clause would mean retrieving only one of the CASE groups at a time.
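
To make the comparison with WHERE concrete, here is a small sketch on a hypothetical in-memory orders table with five made-up totals. The WHERE approach needs one query per group, while the CASE approach returns every group in a single result.

```python
import sqlite3

# Hypothetical in-memory stand-in for orders (made-up totals, not the real data).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE orders (id integer, total integer)')
demo.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [(1, 100), (2, 500), (3, 501), (4, 900), (5, 20)],
)

# WHERE: one query per group.
over_500 = demo.execute('SELECT COUNT(*) FROM orders WHERE total > 500;').fetchone()[0]
under_500 = demo.execute('SELECT COUNT(*) FROM orders WHERE total <= 500;').fetchone()[0]

# CASE + GROUP BY: both groups from a single query.
grouped = demo.execute('''
SELECT CASE WHEN total > 500 THEN 'Over 500'
            ELSE '500 or under' END AS total_group,
       COUNT(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1;
''').fetchall()
demo.close()

print(over_500, under_500)
print(grouped)
```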

There are some advantages to separating data into separate columns like this, depending on what you want to do, but this level of separation is often easier to do in another programming language than in SQL.
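
As one illustration of doing the grouping outside SQL, the sketch below buckets a list of made-up order totals in plain Python. The list `totals` and the helper `total_group` are hypothetical names for this example; in practice the values would come from a query against orders.

```python
from collections import Counter

# Made-up order totals standing in for the `total` column of orders.
totals = [100, 500, 501, 900, 20]

def total_group(total):
    """Mirror the CASE expression: label a total as over or not over 500."""
    return 'Over 500' if total > 500 else '500 or under'

# Count how many orders fall into each label.
order_counts = Counter(total_group(t) for t in totals)
print(order_counts)
```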