# Question 2

### *Does the contact employee affect the number and/or value of sales? If so, can this be correlated with the reporting structure within the company?*

## Define Hypotheses

To answer this, we first need to define our hypotheses:
<br>
<br>
***Null Hypothesis:*** The employee who took an order makes no difference to the number or value of sales.
<br>
***Alternate Hypothesis:*** Certain employees generate more (or higher-value) sales than others.
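As a rough illustration of how hypotheses of this shape could be tested, the sketch below runs a one-way ANOVA on order values grouped by employee. The choice of test is an assumption (nothing up to this point fixes one), the data are synthetic rather than taken from Northwind, and `alpha = 0.05` and the `employee_a`/`employee_b`/`employee_c` labels are hypothetical.

```python
import numpy as np
import scipy.stats as stats

# Synthetic order values for three hypothetical employees (NOT Northwind data).
rng = np.random.default_rng(seed=0)
order_values = {
    "employee_a": rng.normal(loc=1500, scale=400, size=50),
    "employee_b": rng.normal(loc=1500, scale=400, size=50),
    "employee_c": rng.normal(loc=1800, scale=400, size=50),
}

# One-way ANOVA: the null hypothesis is that all groups share the same mean.
f_stat, p_value = stats.f_oneway(*order_values.values())

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"F = {f_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

A small p-value would only indicate that at least one employee's mean differs; identifying which one would need a follow-up comparison.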


For reference, the relational structure of this dataset is provided below.
<img src='Northwind_ERD.png'>

## Importing Data & Libraries

In [1]:
# For SQL & dataframes
import pandas as pd
import sqlite3

# For math & statistics
import numpy as np
import scipy.stats as stats

# For graphing
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("darkgrid")

In [2]:
conn = sqlite3.connect('Northwind_small.sqlite')
c = conn.cursor()

c.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(c.fetchall())

[('Employee',), ('Category',), ('Customer',), ('Shipper',), ('Supplier',), ('Order',), ('Product',), ('OrderDetail',), ('CustomerCustomerDemo',), ('CustomerDemographic',), ('Region',), ('Territory',), ('EmployeeTerritory',)]
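An alternative to the cursor-based pattern is `pd.read_sql_query`, which returns a DataFrame with column names already set. The sketch below is a possible variant, not the approach used in this notebook: `load_table` is a hypothetical helper, and a small in-memory database stands in for `Northwind_small.sqlite` so the example runs on its own.

```python
import sqlite3
import pandas as pd

def load_table(conn, table, columns="*"):
    """Hypothetical helper: read the chosen columns of one table into a DataFrame."""
    # Double quotes mark an identifier, which matters for reserved words like Order.
    query = f'SELECT {columns} FROM "{table}";'
    return pd.read_sql_query(query, conn)

# Stand-in in-memory database (assumption: same table/column names as Northwind).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute('CREATE TABLE "Order" (Id INTEGER, EmployeeId INTEGER);')
demo_conn.executemany('INSERT INTO "Order" VALUES (?, ?);',
                      [(10248, 5), (10249, 6)])

demo_df = load_table(demo_conn, "Order", "Id, EmployeeId")
print(demo_df)
demo_conn.close()
```

The helper interpolates the table name into the query string, so it should only ever be called with trusted, hard-coded names.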


### Make some dataframes

In [5]:
# For employees
c.execute("""SELECT Id, LastName, FirstName, Title, ReportsTo, 
          BirthDate, City, Region FROM 'Employee';""")
employee_df = pd.DataFrame(c.fetchall())
employee_df.columns = [x[0] for x in c.description]
display(employee_df)
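# info() prints its summary itself and returns None, so the summary appears
# first and this print() then emits the label followed by "None".
print('employee_df\n',employee_df.info())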

| | Id | LastName | FirstName | Title | ReportsTo | BirthDate | City | Region |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Davolio | Nancy | Sales Representative | 2.0 | 1980-12-08 | Seattle | North America |
| 1 | 2 | Fuller | Andrew | Vice President, Sales | NaN | 1984-02-19 | Tacoma | North America |
| 2 | 3 | Leverling | Janet | Sales Representative | 2.0 | 1995-08-30 | Kirkland | North America |
| 3 | 4 | Peacock | Margaret | Sales Representative | 2.0 | 1969-09-19 | Redmond | North America |
| 4 | 5 | Buchanan | Steven | Sales Manager | 2.0 | 1987-03-04 | London | British Isles |
| 5 | 6 | Suyama | Michael | Sales Representative | 5.0 | 1995-07-02 | London | British Isles |
| 6 | 7 | King | Robert | Sales Representative | 5.0 | 1992-05-29 | London | British Isles |
| 7 | 8 | Callahan | Laura | Inside Sales Coordinator | 2.0 | 1990-01-09 | Seattle | North America |
| 8 | 9 | Dodsworth | Anne | Sales Representative | 5.0 | 1998-01-27 | London | British Isles |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 8 columns):
Id           9 non-null int64
LastName     9 non-null object
FirstName    9 non-null object
Title        9 non-null object
ReportsTo    8 non-null float64
BirthDate    9 non-null object
City         9 non-null object
Region       9 non-null object
dtypes: float64(1), int64(1), object(6)
memory usage: 656.0+ bytes
employee_df
 None
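The `ReportsTo` column holds the `Id` of each employee's manager, so the reporting structure can be recovered by joining the employee table to itself. The sketch below is one way to do that, using only the `Id`, `LastName` and `ReportsTo` values shown in the employee output; the names `staff`, `managers`, `hierarchy` and `team_sizes` are illustrative and not part of the notebook.

```python
import pandas as pd

# Id, LastName and ReportsTo copied from the employee output above.
staff = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "LastName": ["Davolio", "Fuller", "Leverling", "Peacock", "Buchanan",
                 "Suyama", "King", "Callahan", "Dodsworth"],
    "ReportsTo": [2, None, 2, 2, 2, 5, 5, 2, 5],
})

# Lookup table of potential managers, keyed to match the ReportsTo column.
managers = (staff[["Id", "LastName"]]
            .rename(columns={"Id": "ReportsTo", "LastName": "Manager"})
            .astype({"ReportsTo": "float64"}))

# Left join keeps every employee; the one without a manager gets NaN.
hierarchy = staff.merge(managers, on="ReportsTo", how="left")
print(hierarchy[["LastName", "Manager"]])

# Number of direct reports per manager.
team_sizes = hierarchy.groupby("Manager")["Id"].count()
print(team_sizes)
```

In this data the hierarchy has two levels: five employees report to Fuller, and three report to Buchanan, who in turn reports to Fuller.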


In [7]:
# For customers
c.execute("""SELECT Id, CompanyName, ContactName, ContactTitle, 
          City, Country, Region FROM 'Customer';""")
customer_df = pd.DataFrame(c.fetchall())
customer_df.columns = [x[0] for x in c.description]
display(customer_df.head(10))
# info() prints directly and returns None, hence the trailing "None" in the output.
print('customer_df\n',customer_df.info())

| | Id | CompanyName | ContactName | ContactTitle | City | Country | Region |
|---|---|---|---|---|---|---|---|
| 0 | ALFKI | Alfreds Futterkiste | Maria Anders | Sales Representative | Berlin | Germany | Western Europe |
| 1 | ANATR | Ana Trujillo Emparedados y helados | Ana Trujillo | Owner | México D.F. | Mexico | Central America |
| 2 | ANTON | Antonio Moreno Taquería | Antonio Moreno | Owner | México D.F. | Mexico | Central America |
| 3 | AROUT | Around the Horn | Thomas Hardy | Sales Representative | London | UK | British Isles |
| 4 | BERGS | Berglunds snabbköp | Christina Berglund | Order Administrator | Luleå | Sweden | Northern Europe |
| 5 | BLAUS | Blauer See Delikatessen | Hanna Moos | Sales Representative | Mannheim | Germany | Western Europe |
| 6 | BLONP | Blondesddsl père et fils | Frédérique Citeaux | Marketing Manager | Strasbourg | France | Western Europe |
| 7 | BOLID | Bólido Comidas preparadas | Martín Sommer | Owner | Madrid | Spain | Southern Europe |
| 8 | BONAP | Bon app | Laurence Lebihan | Owner | Marseille | France | Western Europe |
| 9 | BOTTM | Bottom-Dollar Markets | Elizabeth Lincoln | Accounting Manager | Tsawassen | Canada | North America |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 7 columns):
Id              91 non-null object
CompanyName     91 non-null object
ContactName     91 non-null object
ContactTitle    91 non-null object
City            91 non-null object
Country         91 non-null object
Region          91 non-null object
dtypes: object(7)
memory usage: 5.1+ KB
customer_df
 None


In [14]:
# For Orders

c.execute("""SELECT Id, CustomerId, EmployeeId, OrderDate, RequiredDate, 
          ShippedDate, ShipVia, Freight, ShipCity, ShipCountry, ShipRegion
          FROM 'Order';""")
order_df = pd.DataFrame(c.fetchall())
order_df.columns = [x[0] for x in c.description]
display(order_df.head(10))
# info() prints directly and returns None, hence the trailing "None" in the output.
print('order_df\n',order_df.info())

c.execute("SELECT * FROM 'OrderDetail';")
orderdetail_df = pd.DataFrame(c.fetchall())
orderdetail_df.columns = [x[0] for x in c.description]
display(orderdetail_df.head(10))
# info() prints directly and returns None, hence the trailing "None" in the output.
print('orderdetail_df\n',orderdetail_df.info())

| | Id | CustomerId | EmployeeId | OrderDate | RequiredDate | ShippedDate | ShipVia | Freight | ShipCity | ShipCountry | ShipRegion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10248 | VINET | 5 | 2012-07-04 | 2012-08-01 | 2012-07-16 | 3 | 32.38 | Reims | France | Western Europe |
| 1 | 10249 | TOMSP | 6 | 2012-07-05 | 2012-08-16 | 2012-07-10 | 1 | 11.61 | Münster | Germany | Western Europe |
| 2 | 10250 | HANAR | 4 | 2012-07-08 | 2012-08-05 | 2012-07-12 | 2 | 65.83 | Rio de Janeiro | Brazil | South America |
| 3 | 10251 | VICTE | 3 | 2012-07-08 | 2012-08-05 | 2012-07-15 | 1 | 41.34 | Lyon | France | Western Europe |
| 4 | 10252 | SUPRD | 4 | 2012-07-09 | 2012-08-06 | 2012-07-11 | 2 | 51.3 | Charleroi | Belgium | Western Europe |
| 5 | 10253 | HANAR | 3 | 2012-07-10 | 2012-07-24 | 2012-07-16 | 2 | 58.17 | Rio de Janeiro | Brazil | South America |
| 6 | 10254 | CHOPS | 5 | 2012-07-11 | 2012-08-08 | 2012-07-23 | 2 | 22.98 | Bern | Switzerland | Western Europe |
| 7 | 10255 | RICSU | 9 | 2012-07-12 | 2012-08-09 | 2012-07-15 | 3 | 148.33 | Genève | Switzerland | Western Europe |
| 8 | 10256 | WELLI | 3 | 2012-07-15 | 2012-08-12 | 2012-07-17 | 2 | 13.97 | Resende | Brazil | South America |
| 9 | 10257 | HILAA | 4 | 2012-07-16 | 2012-08-13 | 2012-07-22 | 3 | 81.91 | San Cristóbal | Venezuela | South America |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 830 entries, 0 to 829
Data columns (total 11 columns):
Id              830 non-null int64
CustomerId      830 non-null object
EmployeeId      830 non-null int64
OrderDate       830 non-null object
RequiredDate    830 non-null object
ShippedDate     809 non-null object
ShipVia         830 non-null int64
Freight         830 non-null float64
ShipCity        830 non-null object
ShipCountry     830 non-null object
ShipRegion      830 non-null object
dtypes: float64(1), int64(3), object(7)
memory usage: 71.4+ KB
order_df
 None


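| | Id | OrderId | ProductId | UnitPrice | Quantity | Discount |
|---|---|---|---|---|---|---|
| 0 | 10248/11 | 10248 | 11 | 14.0 | 12 | 0.0 |
| 1 | 10248/42 | 10248 | 42 | 9.8 | 10 | 0.0 |
| 2 | 10248/72 | 10248 | 72 | 34.8 | 5 | 0.0 |
| 3 | 10249/14 | 10249 | 14 | 18.6 | 9 | 0.0 |
| 4 | 10249/51 | 10249 | 51 | 42.4 | 40 | 0.0 |
| 5 | 10250/41 | 10250 | 41 | 7.7 | 10 | 0.0 |
| 6 | 10250/51 | 10250 | 51 | 42.4 | 35 | 0.15 |
| 7 | 10250/65 | 10250 | 65 | 16.8 | 15 | 0.15 |
| 8 | 10251/22 | 10251 | 22 | 16.8 | 6 | 0.05 |
| 9 | 10251/57 | 10251 | 57 | 15.6 | 15 | 0.05 |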


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2155 entries, 0 to 2154
Data columns (total 6 columns):
Id           2155 non-null object
OrderId      2155 non-null int64
ProductId    2155 non-null int64
UnitPrice    2155 non-null float64
Quantity     2155 non-null int64
Discount     2155 non-null float64
dtypes: float64(2), int64(3), object(1)
memory usage: 101.1+ KB
orderdetail_df
 None
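To compare employees on both number and value of sales, the order lines need to be rolled up to one total per order and then linked to `EmployeeId`. The sketch below shows one way this could be done, using only the first three orders from the outputs above. It assumes that `Discount` is a fractional reduction applied to `UnitPrice * Quantity`; the names `LineTotal`, `OrderTotal`, `sales` and `per_employee` are illustrative.

```python
import pandas as pd

# First three orders and their detail lines, copied from the outputs above.
orders = pd.DataFrame({"Id": [10248, 10249, 10250],
                       "EmployeeId": [5, 6, 4]})
details = pd.DataFrame({
    "OrderId":   [10248, 10248, 10248, 10249, 10249, 10250, 10250, 10250],
    "UnitPrice": [14.0, 9.8, 34.8, 18.6, 42.4, 7.7, 42.4, 16.8],
    "Quantity":  [12, 10, 5, 9, 40, 10, 35, 15],
    "Discount":  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.15, 0.15],
})

# Assumption: Discount is the fraction taken off the undiscounted line price.
details["LineTotal"] = (details["UnitPrice"] * details["Quantity"]
                        * (1 - details["Discount"]))

# One total per order, then attach the employee who took the order.
order_totals = (details.groupby("OrderId", as_index=False)["LineTotal"].sum()
                .rename(columns={"LineTotal": "OrderTotal"}))
sales = orders.merge(order_totals, left_on="Id", right_on="OrderId")

# Number of orders and total value per employee.
per_employee = sales.groupby("EmployeeId")["OrderTotal"].agg(["count", "sum"])
print(per_employee)
```

Applied to the full `order_df` and `orderdetail_df`, the same steps would give one row per employee, with `count` as the number of sales and `sum` as their value.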
