# Case When

### Introduction

In this lesson, we'll see another way to transform data: the `CASE WHEN` expression.  Let's get started.

### Loading our Data

For this lesson, let's use data from Yelp on restaurants in NYC.

In [None]:
import sqlite3
conn = sqlite3.connect('yelp.db')

In [None]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/case-when/main/lunches.csv"
df = pd.read_csv(url, index_col = 0)
df[:1]

Unnamed: 0,Name,Address,City,Category,Rating,URL
0,Rambling House,4292 Katonah Ave,Bronx,Pubs,4.0,http://www.yelp.com/biz/rambling-house-bronx


In [None]:
df.to_sql('restaurants', conn, index = False, if_exists = 'replace')

### Coercing Data

Let's get started by looking at some of our data.

In [None]:
query = """
SELECT *
FROM restaurants LIMIT 5;
"""

pd.read_sql(query, conn)

Unnamed: 0,Name,Address,City,Category,Rating,URL
0,Rambling House,4292 Katonah Ave,Bronx,Pubs,4.0,http://www.yelp.com/biz/rambling-house-bronx
1,Curry Spot,4268 Katonah Ave,Bronx,Indian,4.0,http://www.yelp.com/biz/curry-spot-bronx
2,Eileens Country Kitchen,964 McLean Ave,Yonkers,American (Traditional),3.5,http://www.yelp.com/biz/eileens-country-kitche...
3,Ali's Roti Shop,4220 White Plains Rd,Bronx,Trinidadian,4.0,http://www.yelp.com/biz/alis-roti-shop-bronx
4,HIM Ital Health Food Market,4374b White Plains Rd,Bronx,Health Markets,4.5,http://www.yelp.com/biz/him-ital-health-food-m...


We can see that one of the first categories is `American (Traditional)`.  Let's use a CASE WHEN expression to change `American (Traditional)` to just `American`.

In [None]:
query = """
SELECT 
CASE WHEN category = 'American (Traditional)'
THEN 'American'
ELSE category
END as category
FROM restaurants 
LIMIT 5
"""

pd.read_sql(query, conn)

Unnamed: 0,category
0,Pubs
1,Indian
2,American
3,Trinidadian
4,Health Markets


Our case when statement above looks like the following:

```SQL
CASE WHEN category = 'American (Traditional)'
THEN 'American'
ELSE category
END as category
```

So above, `CASE WHEN ... = ` essentially acts as our `if` statement.  We are saying that when the category is `American (Traditional)`, return `American`; otherwise, keep the original `category` value.

Finally, we close the expression with the `END` keyword and give the resulting column an alias of `category`.  Note that the query only changes the values it returns; the data stored in the `restaurants` table is untouched.
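
To make the `if`/`else` analogy concrete, here is a minimal sketch that runs the same expression against a small in-memory SQLite table and compares it with a plain Python function. The sample rows, the `demo_conn` connection, and the helper name `simplify_category` are made up for illustration; they are not part of the Yelp data.

```python
import sqlite3

def simplify_category(category):
    # Hypothetical helper: a Python equivalent of the CASE WHEN expression.
    if category == 'American (Traditional)':
        return 'American'
    else:
        return category

# Illustrative rows only, not the real Yelp data.
sample = [('Pubs',), ('Indian',), ('American (Traditional)',)]

# A separate in-memory database, so the lesson's yelp.db is not touched.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute("CREATE TABLE restaurants (category TEXT)")
demo_conn.executemany("INSERT INTO restaurants VALUES (?)", sample)

query = """
SELECT
CASE WHEN category = 'American (Traditional)'
THEN 'American'
ELSE category
END as category
FROM restaurants
ORDER BY rowid
"""

sql_result = [row[0] for row in demo_conn.execute(query)]
python_result = [simplify_category(c) for (c,) in sample]

print(sql_result)
print(python_result)
```

Both lists should match, since the SQL expression and the Python function apply the same rule row by row.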

Notice that if we do not provide an ELSE value, SQL returns null for every row that does not match the WHEN condition.

In [None]:
query = """
SELECT 
CASE WHEN category = 'American (Traditional)'
THEN 'American'
END as category
FROM restaurants 
LIMIT 3
"""

pd.read_sql(query, conn)

Unnamed: 0,category
0,
1,
2,American


### Case When in Aggregates

So above, we saw how we can use CASE WHEN to coerce our data.  It is also common to use CASE WHEN inside aggregate functions to perform conditional calculations.

For example, let's say that we want to calculate the number of Chinese restaurants in each city.  We can do this with the following.

In [None]:
query = """
SELECT City, 
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as num_chinese_restaurants 
FROM restaurants 
GROUP BY City 
ORDER BY num_chinese_restaurants DESC
LIMIT 2
"""

pd.read_sql(query, conn)

Unnamed: 0,City,num_chinese_restaurants
0,Staten Island,73
1,Brooklyn,68


Let's focus on our CASE WHEN statement.  

```sql
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as num_chinese_restaurants 
```

The above first maps each row whose category is `'Chinese'` to the number 1, and every other row to a null value, as there is no ELSE clause.  Then, because `COUNT` only counts non-null values, we end up counting only the rows with a matching category.
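
The reliance on null values is easy to get wrong, so here is a minimal sketch comparing three variants on a tiny made-up table (the rows and the `demo_conn` connection are assumptions, not the Yelp data). Adding `ELSE 0` to the `COUNT` version makes every row non-null, so it counts all rows; if an ELSE branch is wanted, `SUM` is the aggregate that pairs with it.

```python
import sqlite3

# A separate in-memory database with illustrative rows, not the Yelp data.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute("CREATE TABLE restaurants (City TEXT, Category TEXT)")
demo_conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [
        ('Bronx', 'Chinese'),
        ('Bronx', 'Pubs'),
        ('Bronx', 'Chinese'),
        ('Bronx', 'Indian'),
    ],
)

query = """
SELECT
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as count_no_else,
COUNT(CASE WHEN Category = 'Chinese' THEN 1 ELSE 0 END) as count_with_else,
SUM(CASE WHEN Category = 'Chinese' THEN 1 ELSE 0 END) as sum_with_else
FROM restaurants
"""

# count_no_else counts only matches; count_with_else counts every row,
# because 0 is not null; sum_with_else adds up the 1s and 0s.
counts = demo_conn.execute(query).fetchone()
print(counts)
```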

Note that we can use this technique to calculate across multiple categories.

In [None]:
query = """
SELECT City, 
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as num_chinese_restaurants,
COUNT(CASE WHEN Category = 'Italian' THEN 1 END) as num_italian_restaurants 
FROM restaurants 
GROUP BY City 
ORDER BY num_chinese_restaurants DESC
LIMIT 2
"""

pd.read_sql(query, conn)

Unnamed: 0,City,num_chinese_restaurants,num_italian_restaurants
0,Staten Island,73,121
1,Brooklyn,68,56


And we can use CASE WHEN for more than just counting.  For example, let's use CASE WHEN to find the average rating of Chinese restaurants in each city.  We can do so with the following:

In [None]:
query = """
SELECT City, 
SUM(CASE WHEN Category = 'Chinese' THEN rating ELSE 0 END)/
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as avg_ratings
FROM restaurants 
GROUP BY City 
ORDER BY avg_ratings DESC
LIMIT 2
"""

pd.read_sql(query, conn)

Unnamed: 0,City,avg_ratings
0,New York,4.230769
1,Woodside,4.0


Take a moment to try to understand the above.  In the first part, we add up the ratings of all Chinese restaurants in each city, and then in the second part we divide by the number of Chinese restaurants in that city.

```sql
SUM(CASE WHEN Category = 'Chinese' THEN rating ELSE 0 END)/
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as avg_ratings
```
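
As a hedged alternative, the same average can usually be written with `AVG` over a CASE WHEN that has no ELSE branch, since `AVG` also skips null values. The sketch below checks both forms against a tiny made-up table (the rows and the `demo_conn` connection are assumptions). A city with no Chinese restaurants gives null in both forms in SQLite, because dividing by zero returns null there. One difference to keep in mind: if the ratings were stored as integers, the division form would perform integer division in SQLite, while `AVG` always returns a floating point value.

```python
import sqlite3

# A separate in-memory database with illustrative rows, not the Yelp data.
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    "CREATE TABLE restaurants (City TEXT, Category TEXT, Rating REAL)"
)
demo_conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [
        ('Bronx', 'Chinese', 4.0),
        ('Bronx', 'Chinese', 3.0),
        ('Bronx', 'Pubs', 5.0),
        ('Yonkers', 'Pubs', 4.5),
    ],
)

query = """
SELECT City,
SUM(CASE WHEN Category = 'Chinese' THEN Rating ELSE 0 END)/
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as avg_by_division,
AVG(CASE WHEN Category = 'Chinese' THEN Rating END) as avg_by_avg
FROM restaurants
GROUP BY City
ORDER BY City
"""

# Each row is (City, avg_by_division, avg_by_avg).
averages = demo_conn.execute(query).fetchall()
print(averages)
```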

### Summary 

In this lesson, we practiced using the CASE WHEN expression.  We saw two use cases for CASE WHEN.  The first is to coerce our data.
```sql
SELECT 
CASE WHEN category = 'American (Traditional)'
THEN 'American'
ELSE category
END as category
```

So the above coerces the `American (Traditional)` values into `American`. 

The second use case is with aggregates, to count or add only when certain conditions are met.  We used this to count the number of rows that met a certain condition.

```sql
COUNT(CASE WHEN Category = 'Chinese' THEN 1 END) as num_chinese_restaurants 
```

So the above sets rows with a category of `Chinese` to `1` and otherwise sets the value to null.  Then `COUNT` only counts the non-null values, thus counting only restaurants with a category of `Chinese`.

### Resources

[Case When dataschool](https://dataschool.com/how-to-teach-people-sql/how-case-when-works/)