# Finding our best-performing salespeople and products

## Introduction

**Business Context.** You work for AdventureWorks, a company that sells outdoor sporting equipment. The company has many different locations and has been recording each location's sales of its various products. You, their new data scientist, have been tasked with the question: **"What are our best products and salespeople and how can we use this information to improve our overall performance?"**

You have been given access to the relevant data files with documentation from the IT department. Your job is to extract meaningful insights from these data files to help increase sales. First, you will look at the best products and try to see how different products perform in different categories. Second, you will analyze the best salespeople to see if the commission percentage motivates them to sell more.

**Business Problem.** Your task is to **write queries in SQL to carry out the requested analysis**.

**Analytical Context.** You are given the data as a SQLite database. The company has been pretty vague about how they expect you to extract insights, but you have come up with the following plan of attack:

1. Load the database and ensure you can run basic queries against it
2. Look at how product ratings and total sales are related
3. See how products sell in different subcategories (bikes, helmets, socks, etc.)
4. Calculate which salespeople have performed the best in the past year
5. See if total sales are correlated with commission percentage

Of course, this is only your initial plan. As you explore the database, your strategy will likely change.

## Overview of the data

The data for this case is contained in the [`AdventureWorks.db`](AdventureWorks.db) SQLite database. We will be focusing on the tables that belong to the Sales and Product categories. Complete documentation for the original data (of which you have only a subset) can be found [here](https://dataedo.com/download/AdventureWorks.pdf).

**Product Tables:**
* **Product**: one row per product that the company sells
* **ProductReview**: one row per rating and review left by customers
* **ProductModelProductDescriptionCulture**: a link between products and their longer descriptions also indicating a "culture" - which language and region the product is for
* **ProductDescription**: a longer description of each product, for a specific region
* **ProductCategory**: the broad categories that products fit into
* **ProductSubCategory**: the narrower subcategories that products fit into

**Sales Tables:**
* **SalesPerson**: one row per salesperson, including information on their commission and performance
* **SalesOrderHeader**: one row per sale summarizing the sale
* **SalesOrderDetail**: many rows per sale, detailing each product that forms part of the sale
* **SalesTerritory**: the different territories where products are sold, including performance

**Region Tables:**
* **CountryRegionCurrency**: the currency used by each region
* **CurrencyRate**: the average and closing exchange rates for each currency compared to the USD

Importing the libraries and the `sql` extension:

In [None]:
%load_ext sql
import pandas as pd

Let's now load in the database:

In [None]:
%sql sqlite:///AdventureWorks.db

Remember that in order to run SQL queries from within this notebook, you should write `%%sql` at the start of your code cell, like this:

In [None]:
%%sql
SELECT * FROM product LIMIT 5;

#### To keep in mind

To solve the exercises of this extended case, you will be asked to assign SQL code to Python string variables. Remember that you can use triple quotes to assign multiple-line text blocks to a variable. For example,

~~~python
rating_ranking = """
    SELECT
        product.productid
    FROM
        product
    LIMIT 5
"""
~~~

Note that to verify that your code executes correctly, you must place the SQL code itself, *not* the variable, after the `%%sql` line. This will run:

~~~python
%%sql
SELECT
    product.productid
FROM
    product
LIMIT 5
~~~

While this won't:

~~~python
%%sql
rating_ranking
~~~

Your answer, however, will be graded using the variable you created, so be sure that the code that you successfully tested with `%%sql` is indeed the code that you assigned to the variable before you submit the extended case.

Do not round your results (i.e., leave them with as many decimal digits as they have). Also, be sure to name your columns *exactly* as they are in the sample tables in each exercise. Otherwise, you won't get any points when we run our tests after you've submitted your homework.
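To make the difference between the SQL text and the variable that holds it concrete, here is a minimal sketch in plain Python. It assumes a made-up in-memory `product` table (not the real `AdventureWorks.db`) and a hypothetical variable name `example_query`, and it uses Python's built-in `sqlite3` module instead of the `%%sql` magic.

```python
import sqlite3

# Toy in-memory database; the table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (productid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "Helmet"), (2, "Socks"), (3, "Cap")],
)

# The query lives in a string variable, as the exercises require.
example_query = """
    SELECT
        product.productid
    FROM
        product
    ORDER BY
        product.productid
    LIMIT 2
"""

# Passing the variable to the database driver runs the SQL text it contains.
rows = conn.execute(example_query).fetchall()
print(rows)
```

The testing cells at the end of this notebook do something similar with `pd.read_sql`, which is why the content of the variable is what gets graded.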

## Finding our most popular products

The company would like to know which of their products is the most popular among customers. You figure that the average rating given in reviews is correlated with the number of sales of a particular product (that is, products with higher ratings sell more).

### Exercise 1

Using the `product` and `productreview` tables, `INNER JOIN` them and rank the products according to their average review rating. Save the SQL code in a string variable called `rating_ranking`. Please make *absolutely sure* to name your variable exactly that; otherwise, your answer will not be recorded.

Your output should look like this:

| productid 	| NAME 	| avgrating 	| num_ratings 	|
|-:	|-:	|-:	|-	|
| 709 	| Mountain Bike Socks, M 	| 5.0 	| 1 	|
| ... 	| ... 	| ... 	| ... 	|


In [None]:
# Remember to name your variable rating_ranking
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 2

Much to your disappointment, there are only three products with ratings and only four reviews in total! This is nowhere near enough to perform an analysis of the correlation between reviews and total sales. Since we cannot infer the most popular products from the reviews, we will go with an alternative strategy.

#### 2.1

Get the product model ID and description for each product. Include only descriptions for which `productmodelproductdescriptionculture.cultureid = 'en'`.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>productmodelid</th>      <th>description</th>    </tr>  </thead>  <tbody>    <tr>      <td>1</td>      <td>Light-weight, wind-resistant, packs to fit into a pocket.</td>    </tr>    <tr>      <td>2</td>      <td>Traditional style with a flip-up brim; one-size fits all.</td>    </tr>    <tr>      <td>3</td>      <td>Synthetic palm, flexible knuckles, breathable mesh upper. Worn by the AWC team riders.</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

In [None]:
# Name your variable productmodelid_description
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

#### 2.2

Get the model ID, name, description, and total number of sales for each product and display the 10 top-selling products. You can infer how often products have been sold by looking at the `salesorderdetail` table (each row might indicate more than one sale, so take note of `OrderQty`).

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>productmodelid</th>      <th>description</th>      <th>NAME</th>      <th>total_orders</th>    </tr>  </thead>  <tbody>    <tr>      <td>2</td>      <td>Traditional style with a flip-up brim; one-size fits all.</td>      <td>AWC Logo Cap</td>      <td>8311</td>    </tr>    <tr>      <td>111</td>      <td>AWC logo water bottle - holds 30 oz; leak-proof.</td>      <td>Water Bottle - 30 oz.</td>      <td>6815</td>    </tr>    <tr>      <td>33</td>      <td>Universal fit, well-vented, lightweight , snap-on visor.</td>      <td>Sport-100 Helmet, Blue</td>      <td>6743</td>    </tr>    <tr>      <td>...</td>      <td>...</td>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** Turn the query you wrote in exercise 2.1 into a common table expression (a named, temporary result set) with the `WITH ... AS` syntax. It will give you the English descriptions of the products as a starting point. Then `INNER JOIN` it with all the other relevant tables.
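As an illustration of the `WITH ... AS` pattern from the hint, here is a minimal sketch on a toy database. The tables `book`, `blurb`, and `loan`, their columns, and the variable names are all hypothetical and unrelated to the AdventureWorks schema; the point is only the shape of the query: filter in the named subquery first, then join and aggregate.

```python
import sqlite3

# Toy in-memory database with made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (bookid INTEGER, title TEXT);
    CREATE TABLE blurb (bookid INTEGER, lang TEXT, summary TEXT);
    CREATE TABLE loan (bookid INTEGER, copies INTEGER);
    INSERT INTO book VALUES (1, 'Atlas'), (2, 'Cookbook');
    INSERT INTO blurb VALUES
        (1, 'en', 'Maps of the world'),
        (1, 'fr', 'Cartes du monde'),
        (2, 'en', 'Easy recipes');
    INSERT INTO loan VALUES (1, 2), (1, 3), (2, 4);
""")

cte_query = """
    WITH english_blurb AS
    (
        -- Named subquery: keep only the English summaries
        SELECT bookid, summary
        FROM blurb
        WHERE lang = 'en'
    )
    SELECT
        book.title,
        english_blurb.summary,
        SUM(loan.copies) AS total_copies
    FROM english_blurb
    INNER JOIN book ON book.bookid = english_blurb.bookid
    INNER JOIN loan ON loan.bookid = book.bookid
    GROUP BY book.bookid, book.title, english_blurb.summary
    ORDER BY total_copies DESC
    LIMIT 1
"""

cte_rows = conn.execute(cte_query).fetchall()
print(cte_rows)
```

Summing `copies` rather than counting rows matters here because one `loan` row can stand for several copies.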

In [None]:
# Name your variable description_totalorders
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 3

To get a better sense of the sales, let's look at the correlation between quantity sold and price for each subcategory.

#### 3.1

Write a query that shows how many items were ordered in total for every product in the database. Do not filter by culture.

Your output should look like this:

| productid 	| quantity 	|
|-:	|-:	|
| 707 	| 6266 	|
| 708 	| 6532 	|
| 709 	| 1107 	|
| 710 	| 90 	|
| 711 	| 6743 	|
| 712 	| 8311 	|
| 713 	| 429 	|
| 714 	| 3636 	|
| ... 	| ... 	|

**Hint:** Use the `salesorderdetail` table.

In [None]:
# Name your variable quantities_ordered
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

#### 3.2

Write a query that shows the list price for each product, alongside its category and subcategory. Your output should look like this:

| productid 	| category 	| subcategory 	| listprice 	|
|-:	|-:	|-:	|-:	|
| 680 	| Components 	| Road Frames 	| 1431.5 	|
| 706 	| Components 	| Road Frames 	| 1431.5 	|
| 707 	| Accessories 	| Helmets 	| 34.99 	|
| 708 	| Accessories 	| Helmets 	| 34.99 	|
| 709 	| Clothing 	| Socks 	| 9.5 	|
| 710 	| Clothing 	| Socks 	| 9.5 	|
| 711 	| Accessories 	| Helmets 	| 34.99 	|
| 712 	| Clothing 	| Caps 	| 8.99 	|
| 713 	| Clothing 	| Jerseys 	| 49.99 	|
| 714 	| Clothing 	| Jerseys 	| 49.99 	|
| 715 	| Clothing 	| Jerseys 	| 49.99 	|
| 716 	| Clothing 	| Jerseys 	| 49.99 	|
| 717 	| Components 	| Road Frames 	| 1431.5 	|
| 718 	| Components 	| Road Frames 	| 1431.5 	|
| 719 	| Components 	| Road Frames 	| 1431.5 	|
| ... 	| ... 	| ... 	| ... 	|

**Hint:** You will find the product categories in the `productcategory` table, and the subcategories in the `productsubcategory` table.

In [None]:
# Name your variable products_prices
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

#### 3.3

Merge the queries from exercises 3.1 and 3.2 to obtain a table that shows, for each subcategory, the average list price and the total quantity of products sold. Your output should look like this:

| category 	| subcategory 	| average_price_in_subcategory 	| total_items_sold_in_subcategory 	|
|-:	|-:	|-:	|-:	|
| Accessories 	| Bike Racks 	| 120.0 	| 3166 	|
| Accessories 	| Bike Stands 	| 159.0 	| 249 	|
| Accessories 	| Bottles and Cages 	| 7.989999999999999 	| 10552 	|
| Accessories 	| Cleaners 	| 7.95 	| 3319 	|
| Accessories 	| Fenders 	| 21.98 	| 2121 	|
| Accessories 	| Helmets 	| 34.99 	| 19541 	|
| Accessories 	| Hydration Packs 	| 54.99 	| 2761 	|
| Accessories 	| Locks 	| 25.0 	| 1087 	|
| Accessories 	| Pumps 	| 19.99 	| 1130 	|
| Accessories 	| Tires and Tubes 	| 19.482727272727274 	| 18006 	|
| Bikes 	| Mountain Bikes 	| 1683.3649999999982 	| 28321 	|
| Bikes 	| Road Bikes 	| 1597.45 	| 47196 	|
| Bikes 	| Touring Bikes 	| 1425.2481818181814 	| 14751 	|
| Clothing 	| Bib-Shorts 	| 89.99 	| 3125 	|
| ... 	| ... 	| ... 	| ... 	|

**Hint:** To have two `WITH ... AS` subqueries in the same query, separate them with a comma and don't write `WITH` again. Like this:

~~~sql
WITH first_query_alias AS
(
    SELECT ...
),
second_query_alias AS -- Notice we didn't include a second WITH here
(
    SELECT ...
)
SELECT ...
~~~
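The skeleton above can be filled in as follows. This is only a sketch on a toy database: the tables `visit` and `shop`, their columns, and the variable names are hypothetical and chosen to be unrelated to the exercise.

```python
import sqlite3

# Toy in-memory database with made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (shop_code TEXT, visitors INTEGER);
    CREATE TABLE shop (shop_code TEXT, city TEXT, rent REAL);
    INSERT INTO visit VALUES ('A', 10), ('A', 5), ('B', 7), ('C', 1);
    INSERT INTO shop VALUES
        ('A', 'Lima', 100.0), ('B', 'Lima', 300.0), ('C', 'Quito', 50.0);
""")

two_cte_query = """
    WITH visits_per_shop AS
    (
        SELECT shop_code, SUM(visitors) AS visitors
        FROM visit
        GROUP BY shop_code
    ),
    shop_info AS -- second named subquery: a comma, no second WITH
    (
        SELECT shop_code, city, rent
        FROM shop
    )
    SELECT
        shop_info.city,
        AVG(shop_info.rent) AS average_rent,
        SUM(visits_per_shop.visitors) AS total_visitors
    FROM shop_info
    INNER JOIN visits_per_shop
        ON visits_per_shop.shop_code = shop_info.shop_code
    GROUP BY shop_info.city
    ORDER BY shop_info.city
"""

two_cte_rows = conn.execute(two_cte_query).fetchall()
print(two_cte_rows)
```

Aggregating visits per shop before the join keeps each shop to a single row, so the average rent is not distorted by shops that have many `visit` rows.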

In [None]:
# Name your variable prices_quantities
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

There is a positive correlation between average price and items sold ($\rho=0.68$). This is somewhat unexpected, since common sense tells us that the more expensive an item is, the lower the demand for it. It is possible that we are witnessing an instance of Simpson's Paradox here, in which a trend seen in pooled data weakens or reverses within the individual groups. To check whether that is indeed the case, we could instead compute the correlation coefficient within each subcategory, which might reveal a negative correlation in some of them. We will not do that right now, however, since it would take us too far from our business problem.
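To see how such a reversal can arise, consider the following sketch. The numbers are invented purely for illustration (they are not AdventureWorks data), and the two group names are hypothetical; the Pearson coefficient is computed by hand so that the example depends only on the standard library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up (price, items sold) pairs for two hypothetical groups.
groups = {
    "low_priced_group": [(1, 5), (2, 4), (3, 3)],
    "high_priced_group": [(11, 15), (12, 14), (13, 13)],
}

# Correlation inside each group: price up, sales down.
within = {name: pearson(*zip(*points)) for name, points in groups.items()}

# Correlation after pooling all points together.
pooled_points = [p for points in groups.values() for p in points]
pooled = pearson(*zip(*pooled_points))

print(within)
print(pooled)
```

Within each group the correlation is negative, yet the pooled correlation is strongly positive because the group with higher prices also has higher sales overall.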

## Finding our top salespeople

As mentioned earlier, we want to find our best salespeople and see whether or not we can incentivize them in an appropriate manner. Namely, we want to determine if the commission percentage we give them motivates them to make more and bigger sales.

### Exercise 4

Find the top five performing salespeople by using the `salesytd` (Sales, year-to-date) column.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>businessentityid</th>      <th>salesytd</th>    </tr>  </thead>  <tbody>    <tr>      <td>276</td>      <td>4251368.5497</td>    </tr>    <tr>      <td>289</td>      <td>4116871.2277</td>    </tr>    <tr>      <td>275</td>      <td>3763178.1787</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** We only need to know the `businessentityid` for each salesperson, since it uniquely identifies them. Your query should therefore only have two columns: `businessentityid` and `salesytd`.

In [None]:
# Name your variable salesperson_sales
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 5

The sales numbers from the previous query are hard-coded into the `salesperson` table instead of being calculated dynamically from each sales record. We currently don't know how this number is updated, or much else about it, so it's good to remain skeptical.

Using `salesorderheader`, find the top 5 salespeople who made the most sales *in the most recent year available* (2014). (There is a column called `subtotal` - use that.) Sales that do not have an associated salesperson should be excluded from your calculations and final output. All orders that were made within the 2014 calendar year should be included.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>salespersonid</th>      <th>totalsales</th>    </tr>  </thead>  <tbody>    <tr>      <td>289</td>      <td>1382996.5839</td>    </tr>    <tr>      <td>276</td>      <td>1271088.5216</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** In SQLite you can compare a date column against a date written as a `'YYYY-MM-DD'` string: for example, `WHERE column >= '1970-01-01'` keeps only the rows whose date is equal to or later than Jan 1, 1970. Additionally, when you want to make sure that rows with empty or null values in a column are excluded from a query in SQLite, you have to add a condition like this one to your `WHERE` clause: `my_column IS NOT NULL AND my_column <> ''`. The `<>` operator is the opposite of `=`; that is, it checks that two values are different from each other.
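The two filters from the hint can be combined as in the sketch below. It runs on a made-up `booking` table with hypothetical columns, and it assumes the dates are stored as ISO-style text (`YYYY-MM-DD HH:MM:SS`), which is what makes plain string comparison behave like date comparison.

```python
import sqlite3

# Toy in-memory database; table, columns, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (bookingid INTEGER, agentid TEXT, bookingdate TEXT);
    INSERT INTO booking VALUES
        (1, '7',  '1999-12-31 00:00:00'),
        (2, '7',  '2000-01-01 00:00:00'),
        (3, NULL, '2000-06-15 00:00:00'),
        (4, '',   '2000-07-01 00:00:00'),
        (5, '8',  '2000-12-31 00:00:00'),
        (6, '8',  '2001-01-01 00:00:00');
""")

filter_query = """
    SELECT bookingid
    FROM booking
    WHERE bookingdate >= '2000-01-01'   -- on or after Jan 1, 2000
      AND bookingdate < '2001-01-01'    -- strictly before Jan 1, 2001
      AND agentid IS NOT NULL           -- drop missing agents
      AND agentid <> ''                 -- drop empty-string agents
    ORDER BY bookingid
"""

filtered_rows = conn.execute(filter_query).fetchall()
print(filtered_rows)
```

The upper bound is written as `< '2001-01-01'` rather than `<= '2000-12-31'` because, under the text-storage assumption, a value such as `'2000-12-31 00:00:00'` sorts after the shorter string `'2000-12-31'` and would otherwise be dropped.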

In [None]:
# Name your variable salesperson_totalsales
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

We see right away that there are discrepancies between the two sales totals. For the remainder of this case, use this dynamically-calculated total as the authoritative answer.

### Exercise 6

Looking at the documentation, you will see that `subtotal` in the `salesorderheader` table is calculated from other tables in the database. To validate this figure (instead of trusting it blindly), it could be a good idea to calculate `subtotal` manually. Using the `salesorderdetail` and `salesorderheader` tables, let's calculate the sales for each salesperson for **this past year** (2014) and display results for the top 5 salespeople.

#### 6.1

Write a query that shows, for each `salesorderid` (find this column in the `salesorderdetail` table), the total amount of money paid. Remember to apply `unitpricediscount` to each item's price (`unitpricediscount` is a percentage of the price, not an absolute amount).
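The arithmetic behind a percentage discount can be sketched in a few lines of Python. The order lines below are invented, the helper `line_total` is hypothetical, and the sketch assumes the discount is stored as a fraction (0.25 meaning 25% off); if it were stored on a 0 to 100 scale, it would have to be divided by 100 first.

```python
# Hypothetical order lines: (unit price, quantity, discount as a fraction).
order_lines = [
    (20.0, 2, 0.25),  # 25% off each unit
    (10.0, 3, 0.0),   # no discount
]

def line_total(unit_price, quantity, discount_fraction):
    """Amount paid for one order line after applying the discount."""
    return unit_price * (1 - discount_fraction) * quantity

# The order total is the sum of its discounted line totals.
order_total = sum(line_total(*line) for line in order_lines)
print(order_total)
```

Subtracting the raw discount value from the price (`20.0 - 0.25`) would treat it as an absolute amount and give a different, incorrect total.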

Your output should look like this:

| salesorderid 	| ordertotal 	|
|-:	|-:	|
| 43659 	| 20565.6206 	|
| 43660 	| 1294.2529 	|
| 43661 	| 32726.4786 	|
| 43662 	| 28832.5289 	|
| 43663 	| 419.4589 	|
| 43664 	| 24432.608799999995 	|
| 43665 	| 14352.7713 	|
| 43666 	| 5056.4896 	|
| 43667 	| 6107.081999999999 	|
| 43668 	| 35944.156200000005 	|
| 43669 	| 714.7043 	|
| ... 	| ... 	|

In [None]:
# Name your variable order_ordertotal
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

#### 6.2

Using the previous query as a subquery, find the sales for each salesperson for the year 2014 and display results for the top 5 salespeople.

**Hint:** You can get the `salesorderid` and `salespersonid` pairs from the `salesorderheader` table.

In [None]:
# Name your variable salesperson_ordertotal
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

The results are the same as in Exercise 5. We still prefer this query, though, because it is generated from granular data instead of relying on hard-coded figures.

### Exercise 7

Let's now see whether there is a positive relationship between the total sales of the salespeople and their commission percentages. Join the previous query (remove the `LIMIT` clause) with the `salesperson` table to get a table like this one:

| salespersonid 	| ordertotalsum 	| commissionpct 	|
|-:	|-:	|-:	|
| 274 	| 178584.36250800002 	| 0.0 	|
| 275 	| 1057247.378572 	| 0.012 	|
| 276 	| 1271088.5214610002 	| 0.015 	|
| 277 	| 1040093.406901 	| 0.015 	|
| ... 	| ... 	| ... 	|

**Hint:** Remember that the `businessentityid` column from the `salesperson` table is compatible with the `salespersonid` column in the query of exercise 6 (they both represent the salesperson ID).

In [None]:
# Name your variable salesperson_ordertotal_commission
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

The correlation coefficient between `ordertotalsum` and `commissionpct` is $\rho=0.73$. This suggests that the salespeople who earn a high commission are also those who close the bigger deals.

### Exercise 8

Remember how we mentioned that products were sold in many regions? This is why you had to filter on the `cultureid` value earlier to get the English-language descriptions. To make matters worse, you are told the sales are recorded in *local* currency, so your previous analyses are flawed, and you must convert all amounts to USD if you wish to compare the different salespeople fairly!

Use the `countryregioncurrency` table in combination with the `salesperson` and `salesterritory` ones to figure out the relevant currency code for each of the top salespeople.

Your output should look like this:

| businessentityid 	| currencycode 	|
|-:	|-:	|
| 275 	| USD 	|
| 276 	| USD 	|
| 277 	| USD 	|
| 278 	| CAD 	|
| 279 	| USD 	|
| 280 	| USD 	|
| ... 	| ... 	|

In [None]:
# Name your variable salesperson_currency
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 9

Now that we have the currency codes associated with each salesperson, redo Exercise 7 adding the currency. Order the results by currency (ascending) and total sales (descending) to make it easier to see who the best salespeople are for each currency.

**Hint:** `INNER JOIN` your queries from exercise 7 and exercise 8 (don't forget to use `WITH ... AS`).
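Ordering by one column ascending and another descending, as requested above, uses a comma-separated `ORDER BY` list. Here is a minimal sketch on a made-up `score` table; the table, its columns, and the variable names are hypothetical.

```python
import sqlite3

# Toy in-memory database; table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score (player TEXT, league TEXT, points INTEGER);
    INSERT INTO score VALUES
        ('Ana', 'B', 10), ('Bo', 'A', 5), ('Cy', 'A', 9), ('Di', 'B', 12);
""")

ordering_query = """
    SELECT league, player, points
    FROM score
    ORDER BY league ASC, points DESC  -- group by league, best score first
"""

ordering_rows = conn.execute(ordering_query).fetchall()
print(ordering_rows)
```

Rows are sorted by the first key, and the second key only breaks ties within each value of the first, so the top row of every league block is that league's best player.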

In [None]:
# Name your variable salesperson_ranking_currency
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

## Testing cells

In [None]:
import sqlalchemy
sqlite_engine = sqlalchemy.create_engine("sqlite:///AdventureWorks.db")

In [None]:
# Ex. 1
assert "rating_ranking" in globals(), "Ex. 1 - Remember that your variable's name should be `rating_ranking`!"
rating_ranking_result = pd.read_sql(rating_ranking, con=sqlite_engine)
assert len(rating_ranking_result) > 0, "Ex. 1 - Your code is not producing any output! (i.e., a table with length zero)"
assert set(rating_ranking_result.columns) == {'NAME', 'avgrating', 'num_ratings', 'productid'}, "Ex. 1 - Your query result doesn't have exactly these columns: 'NAME', 'avgrating', 'num_ratings', 'productid'"
print("Exercise 1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 2.1
assert "productmodelid_description" in globals(), "Ex. 2.1 - Remember that your variable's name should be `productmodelid_description`!"
productmodelid_description_result = pd.read_sql(productmodelid_description, con=sqlite_engine)
assert len(productmodelid_description_result) == 127, "Ex. 2.1 - There are 127 product models in the database, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(productmodelid_description_result.columns) == {'description', 'productmodelid'}, "Ex. 2.1 - Your query result doesn't have exactly these columns: 'description', 'productmodelid'"
print("Exercise 2.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 2.2
assert "description_totalorders" in globals(), "Ex. 2.2 - Remember that your variable's name should be `description_totalorders`!"
description_totalorders_result = pd.read_sql(description_totalorders, con=sqlite_engine)
assert len(description_totalorders_result) == 10, "Ex. 2.2 - Remember to use LIMIT 10 in your query! This is a top 10!"
assert set(description_totalorders_result.columns) == {'NAME', 'description', 'productmodelid', 'total_orders'}, "Ex. 2.2 - Your query result doesn't have exactly these columns: 'NAME', 'description', 'productmodelid', 'total_orders'"
print("Exercise 2.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.1
assert "quantities_ordered" in globals(), "Ex. 3.1 - Remember that your variable's name should be `quantities_ordered`!"
quantities_ordered_result = pd.read_sql(quantities_ordered, con=sqlite_engine)
assert len(quantities_ordered_result) == 266, "Ex. 3.1 - There are 266 products in the database that have associated quantities, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise and don't filter by culture!"
assert set(quantities_ordered_result.columns) == {'productid', 'quantity'}, "Ex. 3.1 - Your query result doesn't have exactly these columns: 'productid', 'quantity'"
print("Exercise 3.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.2
assert "products_prices" in globals(), "Ex. 3.2 - Remember that your variable's name should be `products_prices`!"
products_prices_result = pd.read_sql(products_prices, con=sqlite_engine)
assert len(products_prices_result) == 295, "Ex. 3.2 - There are 295 products in the database that have prices, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(products_prices_result.columns) == {'category', 'listprice', 'productid', 'subcategory'}, "Ex. 3.2 - Your query result doesn't have exactly these columns: 'category', 'listprice', 'productid', 'subcategory'"
print("Exercise 3.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.3
assert "prices_quantities" in globals(), "Ex. 3.3 - Remember that your variable's name should be `prices_quantities`!"
prices_quantities_result = pd.read_sql(prices_quantities, con=sqlite_engine)
assert len(prices_quantities_result) == 35, "Ex. 3.3 - There are 35 subcategories in the database, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(prices_quantities_result.columns) == {'average_price_in_subcategory', 'category', 'subcategory', 'total_items_sold_in_subcategory'}, "Ex. 3.3 - Your query result doesn't have exactly these columns: 'average_price_in_subcategory', 'category', 'subcategory', 'total_items_sold_in_subcategory'"
print("Exercise 3.3 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 4
assert "salesperson_sales" in globals(), "Ex. 4 - Remember that your variable's name should be `salesperson_sales`!"
salesperson_sales_result = pd.read_sql(salesperson_sales, con=sqlite_engine)
assert len(salesperson_sales_result) == 5, "Ex. 4 - This is a top 5. Remember to use LIMIT!"
assert set(salesperson_sales_result.columns) == {'businessentityid', 'salesytd'}, "Ex. 4 - Your query result doesn't have exactly these columns: 'businessentityid', 'salesytd'"
print("Exercise 4 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 5
assert "salesperson_totalsales" in globals(), "Ex. 5 - Remember that your variable's name should be `salesperson_totalsales`!"
salesperson_totalsales_result = pd.read_sql(salesperson_totalsales, con=sqlite_engine)
assert len(salesperson_totalsales_result) == 5, "Ex. 5 - This is a top 5. Remember to use LIMIT!"
assert set(salesperson_totalsales_result.columns) == {'salespersonid', 'totalsales'}, "Ex. 5 - Your query result doesn't have exactly these columns: 'salespersonid', 'totalsales'"
print("Exercise 5 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 6.1
assert "order_ordertotal" in globals(), "Ex. 6.1 - Remember that your variable's name should be `order_ordertotal`!"
order_ordertotal_result = pd.read_sql(order_ordertotal, con=sqlite_engine)
assert len(order_ordertotal_result) == 31465, "Ex. 6.1 - There are more than 31,000 orders in the database. Remember to NOT use LIMIT here!"
assert set(order_ordertotal_result.columns) == {'ordertotal', 'salesorderid'}, "Ex. 6.1 - Your query result doesn't have exactly these columns: 'ordertotal', 'salesorderid'"
print("Exercise 6.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 6.2
assert "salesperson_ordertotal" in globals(), "Ex. 6.2 - Remember that your variable's name should be `salesperson_ordertotal`!"
salesperson_ordertotal_result = pd.read_sql(salesperson_ordertotal, con=sqlite_engine)
assert len(salesperson_ordertotal_result) == 5, "Ex. 6.2 - There are too many or too few rows in your result. Remember to use LIMIT here!"
assert set(salesperson_ordertotal_result.columns) == {'ordertotalsum', 'salespersonid'}, "Ex. 6.2 - Your query result doesn't have exactly these columns: 'ordertotalsum', 'salespersonid'"
print("Exercise 6.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 7
assert "salesperson_ordertotal_commission" in globals(), "Ex. 7 - Remember that your variable's name should be `salesperson_ordertotal_commission`!"
salesperson_ordertotal_commission_result = pd.read_sql(salesperson_ordertotal_commission, con=sqlite_engine)
assert len(salesperson_ordertotal_commission_result) == 17, "Ex. 7 - There are too many or too few rows in your result. Remember to NOT use LIMIT here!"
assert set(salesperson_ordertotal_commission_result.columns) == {'commissionpct', 'ordertotalsum', 'salespersonid'}, "Ex. 7 - Your query result doesn't have exactly these columns: 'commissionpct', 'ordertotalsum', 'salespersonid'"
print("Exercise 7 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 8
assert "salesperson_currency" in globals(), "Ex. 8 - Remember that your variable's name should be `salesperson_currency`!"
salesperson_currency_result = pd.read_sql(salesperson_currency, con=sqlite_engine)
assert len(salesperson_currency_result) == 16, "Ex. 8 - There are too many or too few rows in your result. Remember to NOT use LIMIT here!"
assert set(salesperson_currency_result.columns) == {'businessentityid', 'currencycode'}, "Ex. 8 - Your query result doesn't have exactly these columns: 'businessentityid', 'currencycode'"
print("Exercise 8 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 9
assert "salesperson_ranking_currency" in globals(), "Ex. 9 - Remember that your variable's name should be `salesperson_ranking_currency`!"
salesperson_ranking_currency_result = pd.read_sql(salesperson_ranking_currency, con=sqlite_engine)
assert len(salesperson_ranking_currency_result) == 16, "Ex. 9 - There are too many or too few rows in your result. Remember to NOT use LIMIT here!"
assert set(salesperson_ranking_currency_result.columns) == {'commissionpct', 'currencycode', 'ordertotalsum', 'salespersonid'}, "Ex. 9 - Your query result doesn't have exactly these columns: 'commissionpct', 'currencycode', 'ordertotalsum', 'salespersonid'"
print("Exercise 9 looks fine for now. You will get your final grade after we've reviewed your submission.")

## Attribution

"AdventureWorks database", Nov 7, 2017, Microsoft Corporation, [MIT License](https://docs.microsoft.com/en-us/sql/samples/sql-samples-where-are?view=sql-server-ver15), https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/adventure-works