Copyright Jana Schaich Borg/Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

# MySQL Exercise 10: Useful Logical Operators



## 1. IF expressions

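IF expressions return one of two results depending on whether the inputs to the expression meet the condition you specify. For example, you could label each user as an "early_user" or a "late_user" depending on whether their first account was created before June 1, 2014, and then use a GROUP BY statement to count the number of unique early or late users: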

```sql 
SELECT IF(cleaned_users.first_account<'2014-06-01','early_user','late_user') AS user_type,
       COUNT(cleaned_users.first_account)
FROM (SELECT user_guid, MIN(created_at) AS first_account 
      FROM users
      GROUP BY user_guid) AS cleaned_users
GROUP BY user_type
```

**Try it yourself:**

In [None]:
%%sql
SELECT IF(cleaned_users.first_account<'2014-06-01','early_user','late_user') AS user_type,
       COUNT(cleaned_users.first_account)
FROM (SELECT user_guid, MIN(created_at) AS first_account 
      FROM users
      GROUP BY user_guid) AS cleaned_users
GROUP BY user_type
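
The three arguments of IF can also be seen in isolation with a query that needs no tables. The sketch below uses made-up date strings (assumptions chosen to fall on either side of the cutoff) rather than the users table:

```sql
-- IF(condition, value_if_true, value_if_false)
SELECT IF('2014-05-15' < '2014-06-01', 'early_user', 'late_user') AS before_cutoff,
       IF('2014-07-20' < '2014-06-01', 'early_user', 'late_user') AS after_cutoff,
       -- A NULL condition is not true, so the third argument is returned
       IF(NULL < '2014-06-01', 'early_user', 'late_user') AS null_date;
```

The last column is worth noting: a missing date would be labeled "late_user" rather than being left out, because IF only distinguishes between "true" and "anything else".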

**Question 1: Write a query that will output distinct user_guids and their associated country of residence from the users table, excluding any user_guids or countries that have NULL values.  You should get 16,261 rows in your result.**

In [None]:
%%sql
SELECT DISTINCT user_guid, country
FROM users
WHERE user_guid is not NULL and country is not NULL;

**Question 2: Use an IF expression and the query you wrote in Question 1 as a subquery to determine the number of unique user_guids who reside in the United States (abbreviated "US") and outside of the US.**

In [None]:
%%sql
SELECT IF(user_country.country ='US', 'United States', 'Outside US') AS country_type, COUNT(user_country.user_guid)
FROM (SELECT DISTINCT user_guid, country
      FROM users
      WHERE user_guid is not NULL and country is not NULL) AS user_country
GROUP BY country_type;

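A single IF expression can only return one of two specified outputs, but IF expressions can be nested to return more than two possible outputs. For example, users could be sorted into three groups: those in the US, those whose country is recorded as "N/A", and those outside the US.

The full query to output the number of unique users in each of the three groups would be: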

```sql 
SELECT IF(cleaned_users.country='US','In US', 
          IF(cleaned_users.country='N/A','Not Applicable','Outside US')) AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user
```

**Try it yourself. You should get 5,642 unique user_guids in the "Not Applicable" category, and 1,263 users in the "Outside US" category.**

In [None]:
%%sql
SELECT IF(cleaned_users.country='US','In US', 
          IF(cleaned_users.country='N/A','Not Applicable','Outside US')) AS US_user, 
       count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user;
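
To trace how the nested IF is evaluated row by row, it may help to run it on a handful of literal values. The sample rows below are invented for illustration (they are not taken from the users table), and the NULL row is included to show what the `WHERE country IS NOT NULL` filter protects against:

```sql
SELECT sample.country,
       IF(sample.country='US', 'In US',
          -- the inner IF is only consulted when the outer condition is not true
          IF(sample.country='N/A', 'Not Applicable', 'Outside US')) AS US_user
FROM (SELECT 'US' AS country
      UNION ALL SELECT 'N/A'
      UNION ALL SELECT 'DE'
      UNION ALL SELECT NULL) AS sample;
```

Comparing NULL with any value is never true, so the NULL row falls through both IF expressions and is labeled "Outside US". Without the filter in the subquery, users with no recorded country would be counted in that group.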

 
## 2. CASE expressions

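CASE expressions are an alternative to nested IF expressions when you want more than two possible outputs. If your conditions need to test or manipulate the values in a column of your data, you would use this syntax: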

<img src="https://duke.box.com/shared/static/bvyvscvvg9d1rjnov340gqyu85mhch9i.jpg" width=600 alt="CASE_Expression" />
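
In case the image does not render, the general shape of this form can be sketched in text. The literal values below are placeholders, not columns from the course database. The WHEN clauses are tested from top to bottom, and the first one that is true supplies the result:

```sql
SELECT CASE WHEN 7 > 10 THEN 'first branch'    -- false, so skipped
            WHEN 7 > 5  THEN 'second branch'   -- first true condition wins
            WHEN 7 > 1  THEN 'third branch'    -- also true, but never reached
            ELSE 'fallback'                    -- used only if nothing above is true
            END AS which_branch;
```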

Using this syntax, our nested IF statement from above could be written as:

```sql
SELECT CASE WHEN cleaned_users.country="US" THEN "In US"
            WHEN cleaned_users.country="N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user
```

**Go ahead and try it:**

In [None]:
%%sql
SELECT CASE WHEN cleaned_users.country="US" THEN "In US"
            WHEN cleaned_users.country="N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
        count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user;


Since our query does not require manipulation of any of the values in the country column, though, we could also take advantage of this syntax, which is slightly more compact:

<img src="https://duke.box.com/shared/static/z9fezozm55wj5pz6slxscouxrcpq7bpz.jpg" width=600 alt="CASE_Value" />

Our query written in this syntax would look like this:

```sql
SELECT CASE cleaned_users.country
            WHEN "US" THEN "In US"
            WHEN "N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user
```

**Try this query as well:**


In [None]:
%%sql
SELECT CASE cleaned_users.country
            WHEN "US" THEN "In US"
            WHEN "N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
        count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user;
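
One difference between the two forms is how they treat NULL. The compact form compares the column to each WHEN value with an equality test, so it cannot detect NULL values. The sketch below, which uses invented sample rows rather than the users table, puts the two forms side by side:

```sql
SELECT sample.country,
       CASE sample.country
            WHEN 'US' THEN 'In US'
            WHEN NULL THEN 'Missing'        -- never matches: NULL = NULL is not true
            ELSE 'Outside US'
            END AS compact_form,
       CASE WHEN sample.country = 'US' THEN 'In US'
            WHEN sample.country IS NULL THEN 'Missing'
            ELSE 'Outside US'
            END AS searched_form
FROM (SELECT 'US' AS country
      UNION ALL SELECT 'DE'
      UNION ALL SELECT NULL) AS sample;
```

For the NULL row, the compact form falls through to "Outside US" while the searched form returns "Missing". If NULL values need their own label, the first syntax with `IS NULL` is the one to use.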


**Question 3: Write a query using a CASE statement that outputs 3 columns: dog_guid, dog_fixed, and a third column that reads "neutered" every time there is a 1 in the "dog_fixed" column of dogs, "not neutered" every time there is a value of 0 in the "dog_fixed" column of dogs, and "NULL" every time there is a value of anything else in the "dog_fixed" column.  Limit your results for troubleshooting purposes.**


In [None]:
%%sql
SELECT dog_guid, dog_fixed,
        CASE dog_fixed
        WHEN '1' THEN 'neutered'
        WHEN '0' THEN 'not neutered'
        ELSE 'NULL' 
        END AS fixed
FROM dogs
LIMIT 50;

You can also use CASE statements to standardize or combine several values into one.  
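
As a sketch of what combining values can look like, the query below maps several spellings onto one label. The spellings are invented for illustration and are not taken from the Dognition tables:

```sql
SELECT sample.raw_country,
       CASE WHEN sample.raw_country IN ('US', 'USA', 'United States') THEN 'US'
            ELSE sample.raw_country          -- leave every other value unchanged
            END AS standardized_country
FROM (SELECT 'US' AS raw_country
      UNION ALL SELECT 'USA'
      UNION ALL SELECT 'United States'
      UNION ALL SELECT 'DE') AS sample;
```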

**Question 4: We learned that NULL values should be treated the same as "0" values in the exclude columns of the dogs and users tables.  Write a query using a CASE statement that outputs 3 columns: dog_guid, exclude, and a third column that reads "exclude" every time there is a 1 in the "exclude" column of dogs and "keep" every time there is any other value in the exclude column. Limit your results for troubleshooting purposes.**



In [None]:
%%sql
SELECT dog_guid, exclude,
    CASE exclude
    WHEN '1' THEN 'exclude'
    ELSE 'keep'
    END AS 'Excluded?'
FROM dogs
LIMIT 50;

**Question 5: Re-write your query from Question 4 using an IF statement instead of a CASE statement.**

In [None]:
%%sql
SELECT dog_guid, exclude, IF(exclude = '1', 'exclude', 'keep') AS 'Excluded?'
FROM dogs;

CASE expressions are also useful for breaking the values in a column up into multiple groups that meet specific criteria or that fall within specific ranges of values.

**Question 6: Write a query that uses a CASE expression to output 3 columns: dog_guid, weight, and a third column that reads...     
"very small" when a dog's weight is 1-10 pounds     
"small" when a dog's weight is greater than 10 pounds to 30 pounds     
"medium" when a dog's weight is greater than 30 pounds to 50 pounds     
"large" when a dog's weight is greater than 50 pounds to 85 pounds     
"very large" when a dog's weight is greater than 85 pounds      
Limit your results for troubleshooting purposes.**

**Remember that when you use AND to define values between two boundaries, you need to include the variable name in all clauses that define the conditions of the values you want to extract.  In other words, you could use this combined clause in your query:     
“WHEN weight>10 AND weight<=30 THEN "small"”     
…but this combined clause would cause an error:     
“WHEN weight>10 AND <=30 THEN "small"”**

In [None]:
%%sql
SELECT dog_guid, weight, 
    CASE 
    WHEN weight <= 10 THEN 'very small'
    WHEN weight > 10 and weight <= 30 THEN 'small'
    WHEN weight > 30 and weight <= 50 THEN 'medium'
    WHEN weight > 50 and weight <= 85 THEN 'large'
    WHEN weight > 85 THEN 'very large'
    END AS size
FROM dogs
LIMIT 50;
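
Because WHEN clauses are tested in order and the first true one wins, the lower bound of each range is already implied by the clauses above it. The sketch below relies on that to shorten the conditions. It runs on a few invented weights rather than the dogs table, and is one possible way to write the ranges, not the only one:

```sql
SELECT sample.weight,
       CASE WHEN sample.weight <= 10 THEN 'very small'
            WHEN sample.weight <= 30 THEN 'small'       -- reached only if weight > 10
            WHEN sample.weight <= 50 THEN 'medium'      -- reached only if weight > 30
            WHEN sample.weight <= 85 THEN 'large'       -- reached only if weight > 50
            WHEN sample.weight > 85  THEN 'very large'
            END AS size                                 -- no ELSE: NULL weights stay NULL
FROM (SELECT 10 AS weight
      UNION ALL SELECT 30
      UNION ALL SELECT 31
      UNION ALL SELECT 120
      UNION ALL SELECT NULL) AS sample;
```

The final clause is written as an explicit condition instead of an ELSE so that a missing weight produces NULL instead of being labeled "very large".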

## 3. Pay attention to the order of operations within logical expressions
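
MySQL evaluates NOT before AND, and AND before OR, unless parentheses say otherwise, so `a OR b AND c` is read as `a OR (b AND c)`. The sketch below shows this with literal truth values (1 for true, 0 for false) rather than with the dogs table:

```sql
SELECT 1 OR 0 AND 0   AS no_parentheses,      -- read as 1 OR (0 AND 0), which is 1
       1 OR (0 AND 0) AS and_grouped_first,   -- same grouping written out, also 1
       (1 OR 0) AND 0 AS or_grouped_first;    -- parentheses force OR first, giving 0
```

The same rule applies to conditions inside a WHEN clause, which is why the next three queries can return different counts even though they use the same three conditions.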


**Question 7: How many distinct dog_guids are found in group 1 using this query?**
    
```sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN breed_group='Sporting' OR breed_group='Herding' AND exclude!='1' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name
```

In [None]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN breed_group='Sporting' OR breed_group='Herding' AND exclude!='1' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name;

**Question 8: How many distinct dog_guids are found in group 1 using this query?**
    
```sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND breed_group='Sporting' OR breed_group='Herding' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name
```


In [None]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND breed_group='Sporting' OR breed_group='Herding' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name;

**Question 9: How many distinct dog_guids are found in group 1 using this query?**

```sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND (breed_group='Sporting' OR breed_group='Herding') THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name
```

In [None]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND (breed_group='Sporting' OR breed_group='Herding') THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name;


**Question 10: For each dog_guid, output its dog_guid, breed_type, number of completed tests, and use an IF statement to include an extra column that reads "Pure_Breed" whenever breed_type equals "Pure Breed" and "Not_Pure_Breed" whenever breed_type equals anything else. LIMIT your output to 50 rows for troubleshooting.  HINT: you will need to use a join to complete this query.**

In [None]:
%%sql
SELECT d.dog_guid, d.breed_type, count(c.created_at) AS Tests_completed, 
    IF(d.breed_type = 'Pure Breed', 'Pure_Breed', 'Not_Pure_Breed') AS Pure_breed
FROM dogs d JOIN complete_tests c
    ON d.dog_guid = c.dog_guid
GROUP BY d.dog_guid, d.breed_type
LIMIT 50;

**Question 11: Write a query that uses a CASE statement to report the number of unique user_guids associated with customers who live in the United States and who are in the following groups of states:**

**Group 1: New York (abbreviated "NY") or New Jersey (abbreviated "NJ")    
Group 2: North Carolina (abbreviated "NC") or South Carolina (abbreviated "SC")    
Group 3: California (abbreviated "CA")    
Group 4: All other states with non-null values**

**You should find 898 unique user_guids in Group 1.**



In [None]:
%%sql
SELECT  
    CASE 
    WHEN (state='NY' OR state='NJ') THEN 'Group 1 - NY,NJ'
    WHEN (state='NC' OR state='SC') THEN 'Group 2 - NC,SC'
    WHEN state='CA' THEN 'Group 3 - CA'
    ELSE 'Group 4 - All other' 
    END AS Location_group,
    count(DISTINCT user_guid)
FROM users
WHERE state is not NULL AND country='US'
GROUP BY Location_group;

**Question 12: Write a query that allows you to determine how many unique dog_guids are associated with dogs who are DNA tested and have either stargazer or socialite personality dimensions.  Your answer should be 70.**

In [None]:
%%sql
SELECT count(DISTINCT dog_guid) AS DogID
FROM dogs
WHERE dna_tested = 1 AND (dimension = 'stargazer' OR dimension = 'socialite');

**Feel free to practice any other queries you like here!**

In [None]:
%%sql
SELECT count(DISTINCT user_guid),
    CASE
    WHEN rating<7 THEN 'Detractor'
    WHEN (rating=7 OR rating=8) THEN 'Passive'
    WHEN rating>8 THEN 'Promoter'
    END AS nps
FROM reviews
WHERE rating is not NULL
GROUP BY nps;

