# Discounts Hypothesis Tests
Do discounts have a statistically significant effect on the number of products customers order? If so, at what level(s) of discount?
* On an order as a whole when at least one product has a discount
* For particular products
* ~~For particular products purchased by particular customers with and without a discount~~ (Not enough data)

# Imports and Constants

In [36]:
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import scipy.stats as stats

In [2]:
DB_NAME = 'Northwind_small.sqlite'
RANDOM_STATE = 42

# Connect to Database

In [3]:
conn = sqlite3.connect(DB_NAME)
cur = conn.cursor()

# Do discounts have a significant effect on the number of products customers order when considering orders that have at least one discount on a product?

## Hypothesis
* H0 = A discount **does not** have a significant effect on the number of products in an order
* HA = A discount **does** have a significant effect on the number of products in an order

## Test Type

Conduct a two-tailed, one-sample t-test comparing the mean total quantity of orders with at least one discount (the sample) against the mean total quantity over all orders (treated as the population mean).
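
For reference, the statistic behind a one-sample t-test can be written as below. The symbols are introduced here for illustration and are not names used elsewhere in this notebook: $\bar{x}$ is the mean total quantity of the discounted orders, $\mu_0$ is the mean total quantity over all orders, $s$ is the sample standard deviation of the discounted orders, and $n$ is the number of discounted orders.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
$$

The two-tailed p-value is the probability, under a Student's t distribution with $n - 1$ degrees of freedom, of observing a value at least as extreme as $|t|$ in either direction.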

## Set Significance Level 

In [62]:
alpha = 0.05

## Query database for Order Data

In [4]:
q = """
    SELECT * from OrderDetail;
    """

In [5]:
df = pd.DataFrame(cur.execute(q).fetchall(),
                  columns=[description[0] for description in cur.description])

In [6]:
df.head()

,Id,OrderId,ProductId,UnitPrice,Quantity,Discount
0,10248/11,10248,11,14.0,12,0.0
1,10248/42,10248,42,9.8,10,0.0
2,10248/72,10248,72,34.8,5,0.0
3,10249/14,10249,14,18.6,9,0.0
4,10249/51,10249,51,42.4,40,0.0


## Group Data by OrderId
So that the number of items per order can be calculated
* Quantity will indicate the total number of items in an order
* Discount will be the maximum discount on any item in an order
    * 0.0 will indicate no discounts on the order
    * \> 0.0 will indicate at least 1 item had a discount on the order
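
To make the aggregation concrete, here is a minimal sketch that uses only the five `OrderDetail` rows shown in the preview above. The `preview` frame and the `has_discount` column are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# The five OrderDetail rows from the df.head() preview (illustrative subset only)
preview = pd.DataFrame({
    'OrderId': [10248, 10248, 10248, 10249, 10249],
    'Quantity': [12, 10, 5, 9, 40],
    'Discount': [0.0, 0.0, 0.0, 0.0, 0.0],
})

# One row per order: total items, and the largest discount on any line item
per_order = preview.groupby('OrderId').agg(
    total_qty=('Quantity', 'sum'),
    max_discount=('Discount', 'max'),
)

# Hypothetical helper column: True when at least one item was discounted
per_order['has_discount'] = per_order['max_discount'] > 0.0

print(per_order)
```

Neither of these two orders has a discounted item, so `has_discount` is `False` for both.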

In [29]:
df_groupby_order_id = df[['OrderId', 'Quantity', 'Discount']].groupby('OrderId').agg(total_qty = ('Quantity', 'sum'),
                                                                                     max_discount = ('Discount', 'max'))

In [30]:
df_groupby_order_id.head()

OrderId,total_qty,max_discount
10248,27,0.0
10249,49,0.0
10250,60,0.15
10251,41,0.05
10252,105,0.05


In [31]:
df_groupby_order_id.describe()

,total_qty,max_discount
count,830.0,830.0
mean,61.827711,0.066928
std,50.748158,0.087484
min,1.0,0.0
25%,26.0,0.0
50%,50.0,0.0
75%,81.0,0.15
max,346.0,0.25


## Separate orders with a discount

In [32]:
orders_with_discount_df = df_groupby_order_id[df_groupby_order_id.max_discount > 0.0]

In [33]:
orders_with_discount_df.head()

OrderId,total_qty,max_discount
10250,60,0.15
10251,41,0.05
10252,105,0.05
10254,57,0.15
10258,121,0.2


In [34]:
orders_with_discount_df.describe()

,total_qty,max_discount
count,380.0,380.0
mean,72.944737,0.146184
std,51.403927,0.071582
min,2.0,0.05
25%,37.0,0.1
50%,62.5,0.15
75%,95.0,0.2
max,330.0,0.25


## Conduct T-Test

In [45]:
results = stats.ttest_1samp(orders_with_discount_df['total_qty'], 
                            df_groupby_order_id['total_qty'].mean())

In [46]:
p_value = results[1]

In [47]:
p_value

3.114975153738426e-05

## Results

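Because the p-value (about 3.1e-05) is less than alpha (0.05), the null hypothesis can be rejected in favor of the alternative hypothesis. Orders with at least one discounted product average about 72.9 items, compared with about 61.8 items across all orders, so offering a discount is associated with a different (here, larger) number of items purchased per order.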
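
As a cross-check, the test can be reconstructed from the summary statistics printed by the two `describe()` calls above. This is a sketch under the assumption that those rounded values are accurate enough; the variable names are illustrative and not part of the original notebook.

```python
import math
import scipy.stats as stats

# Values copied from the describe() outputs above (rounded to six decimals)
n = 380                    # orders with at least one discount
sample_mean = 72.944737    # mean total_qty of discounted orders
sample_std = 51.403927     # sample standard deviation (ddof=1) of discounted orders
pop_mean = 61.827711       # mean total_qty over all 830 orders

# One-sample t statistic: (sample mean - reference mean) / standard error
standard_error = sample_std / math.sqrt(n)
t_stat = (sample_mean - pop_mean) / standard_error

# Two-tailed p-value from the Student's t survival function
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(t_stat, p_two_sided)
```

The p-value should land close to the one reported by `stats.ttest_1samp` above, and the positive sign of `t_stat` indicates that discounted orders are larger on average rather than smaller.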

# Do discounts have a significant effect on the amount purchased of a particular product?

## Hypothesis

* H0 = A discount **does not** have a significant effect on the amount purchased of a particular product
* HA = A discount **does** have a significant effect on the amount purchased of a particular product

## Set Significance Level

In [61]:
alpha = 0.05

## Conduct T-Tests

In [51]:
df.ProductId.value_counts()

59    54
31    51
24    51
60    51
56    50
      ..
66     8
48     6
15     6
37     6
9      5
Name: ProductId, Length: 77, dtype: int64

In [52]:
df_59 = df[df.ProductId == 59]

In [53]:
df_59_qty_mean = df_59.Quantity.mean()

In [54]:
results = stats.ttest_1samp(df_59.Quantity, df_59_qty_mean)

In [55]:
results

Ttest_1sampResult(statistic=0.0, pvalue=1.0)

In [57]:
t_test_results = []

for product_id in df.ProductId.unique():
    results = stats.ttest_1samp(df[(df.ProductId == product_id) & (df.Discount > 0.0)]['Quantity'], 
                                df[(df.ProductId == product_id)]['Quantity'].mean())
    t_test_results.append((product_id, results[1]))



In [63]:
sig_results = [result for result in t_test_results if result[1] < alpha]

In [64]:
sig_results

[(57, 0.0216644542070736), (3, 0.0)]

## Results

The null hypothesis can be rejected in favor of the alternative hypothesis only for products 57 and 3.

## Review order details of the products for which discounts had a significant effect

In [65]:
df[df.ProductId == 57]

,Id,OrderId,ProductId,UnitPrice,Quantity,Discount
9,10251/57,10251,57,15.6,15,0.05
35,10260/57,10260,57,15.6,50,0.0
90,10282/57,10282,57,15.6,2,0.0
207,10326/57,10326,57,15.6,16,0.0
285,10355/57,10355,57,15.6,25,0.0
322,10368/57,10368,57,15.6,25,0.0
448,10416/57,10416,57,15.6,20,0.0
503,10438/57,10438,57,15.6,15,0.2
763,10535/57,10535,57,19.5,5,0.1
886,10578/57,10578,57,19.5,6,0.0


In [67]:
df[df.ProductId == 57]['Quantity'].mean()

18.869565217391305

In [66]:
df[df.ProductId == 3]

,Id,OrderId,ProductId,UnitPrice,Quantity,Discount
109,10289/3,10289,3,8.0,30,0.0
419,10405/3,10405,3,8.0,50,0.0
628,10485/3,10485,3,8.0,20,0.1
780,10540/3,10540,3,10.0,60,0.0
909,10591/3,10591,3,10.0,14,0.0
1196,10702/3,10702,3,10.0,6,0.0
1299,10742/3,10742,3,10.0,20,0.0
1362,10764/3,10764,3,10.0,20,0.1
1577,10849/3,10849,3,10.0,49,0.0
1598,10857/3,10857,3,10.0,30,0.0


In [68]:
df[df.ProductId == 3]['Quantity'].mean()

27.333333333333332

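Interestingly, offering a discount on these items resulted in quantities lower than the overall means for those products (about 18.9 for product 57 and about 27.3 for product 3). However, there is so little data on a per-item basis that I don't think the results of this test can truly be considered significant. The p-value of exactly 0.0 for product 3 is a warning sign: in the rows shown above, both discounted orders have a quantity of 20, and a sample with no variance makes the t-statistic degenerate.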
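
One way to keep such thin samples out of the results is to skip products that have too few discounted rows, or whose discounted quantities are all identical. The sketch below runs on made-up data; `per_product_tests`, `MIN_DISCOUNTED_ROWS`, and the `toy` frame are hypothetical and not part of the original analysis, and the threshold of 5 is an arbitrary assumption.

```python
import pandas as pd
import scipy.stats as stats

MIN_DISCOUNTED_ROWS = 5  # hypothetical cutoff; choose to suit the data

# Made-up order lines for two products (not Northwind data)
toy = pd.DataFrame({
    'ProductId': [1] * 8 + [2] * 4,
    'Quantity': [10, 12, 30, 25, 28, 35, 27, 31, 20, 20, 40, 15],
    'Discount': [0.0, 0.0, 0.1, 0.1, 0.05, 0.2, 0.15, 0.1, 0.1, 0.1, 0.0, 0.0],
})

def per_product_tests(frame, min_rows=MIN_DISCOUNTED_ROWS):
    """Run the one-sample t-test per product, skipping degenerate samples."""
    tested, skipped = [], []
    for product_id, group in frame.groupby('ProductId'):
        discounted = group.loc[group.Discount > 0.0, 'Quantity']
        # Too few rows, or zero variance, gives NaN or a meaningless p-value
        if len(discounted) < min_rows or discounted.nunique() < 2:
            skipped.append(product_id)
            continue
        result = stats.ttest_1samp(discounted, group.Quantity.mean())
        tested.append((product_id, result.pvalue))
    return tested, skipped

tested, skipped = per_product_tests(toy)
print(tested, skipped)
```

In this toy data, product 1 has six discounted rows and is tested, while product 2 has only two identical discounted quantities and is skipped.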

# At what discount levels is there a significant effect on the quantity of items that a customer purchases?

## Hypothesis

* H0 = A particular discount **does not** have a significant effect on the quantity of items purchased in an order
* HA = A particular discount **does** have a significant effect on the quantity of items purchased in an order

## Set Significance Level

In [69]:
alpha = 0.05

## Conduct T-Test

In [73]:
discounts = df_groupby_order_id.max_discount.unique()

In [76]:
discounts

array([0.  , 0.15, 0.05, 0.2 , 0.25, 0.1 ])

In [79]:
# Remove 0.0 because that is not a discount (filter by value, not by position)
discounts = discounts[discounts > 0.0]

In [81]:
t_test_results = []
for discount in discounts:
    results = stats.ttest_1samp(df_groupby_order_id[df_groupby_order_id.max_discount == discount]['total_qty'],
                                df_groupby_order_id['total_qty'].mean())
    t_test_results.append((discount, results[1]))

In [82]:
t_test_results

[(0.15, 0.028938655440696546),
 (0.05, 0.11135513584346501),
 (0.2, 0.2342174363274203),
 (0.25, 0.023154173822675702),
 (0.1, 0.05283219160777333)]

In [83]:
sig_results = [result for result in t_test_results if result[1] < alpha]

In [84]:
sig_results

[(0.15, 0.028938655440696546), (0.25, 0.023154173822675702)]

## Results

For discounts of 15% and 25%, the null hypothesis can be rejected in favor of the alternative hypothesis. For discounts of 5%, 10%, and 20%, the null hypothesis cannot be rejected at alpha = 0.05.

## Recommendations

Offer discounts of 15 or 25 percent to increase the total quantity of items per order.