# Analyzing Furniture Data

### 1.A Data Dictionary

| Variable                  | Data Type                              | Source       | Mnemonic     |
|---------------------------|----------------------------------------|--------------|--------------|
| Order Number              | Nominal Integer                        | Order Sys    | Onum         |
| Customer ID               | Nominal                                | Customer Sys | CID          | 
| Transaction Date          | MM/DD/YYYY                             | Order Sys    | Tdate        | 
| Product Line ID           | Five rooms of house                    | Product Sys  | Pline        |
| Product Class ID          | Item in line                           | Product Sys  | Pclass       |
| Units Sold                | Number of units per order              | Order Sys    | Usales       |
| Product Returned?         | Yes/No                                 | Order Sys    | Return       |
| Amount Returned           | Number of units                        | Order Sys    | returnAmount |
| Material Cost/Unit        | \$US cost of material                  | Product Sys  | Mcost        |
| List Price                | \$US list                              | Price Sys    | Lprice       |
| Dealer Discount           | \% discount to dealer (decimal)        | Sales Sys    | Ddisc        |
| Competitive Discount      | \% discount for competition (decimal)  | Sales Sys    | Cdisc        |
| Order Size Discount       | \% discount for size (decimal)         | Sales Sys    | Odisc        |
| Customer Pickup Allowance | \% discount for pickup (decimal)       | Sales Sys    | Pdisc        |

### Default Formats and Functions

In [59]:
def import_view(df):
    """
    Display the first 5 rows and data types of a DataFrame. Used to confirm import was correct.
    
    Args:
        df: DataFrame handle.
    """
    display(df.head().style.set_caption('First 5 Rows').
                            set_table_styles(tbl_styles))
    display(df.dtypes.to_frame().rename(columns={0: "dataType"}).
                                 style.set_caption('DataFrame Data Types').
                                       set_table_styles(tbl_styles))

    
def df_size( df ):
    """
    Display DataFrame size as nice output: rows and columns.
    
    Args:
        df: DataFrame handle.
    """
    data = { 'Count': [ df.shape[ 0 ], df.shape[ 1 ] ] }
    idx = [ 'Number of Rows', 'Number of Columns' ]
    display(pd.DataFrame( data, index = idx ).\
               style.set_caption( 'DataFrame Dimensions' ).\
               set_table_styles( tbl_styles ) )
    
    
##
## DataFrame styles
##
tbl_styles = [ {
    'selector': 'caption',
    'props': [
        ('color', 'darkblue'),
        ('font-size', '18px')
    ] } ]

### Import Packages

In [6]:
##
## ===> Data Management <===
##
import numpy as np
import pandas as pd
##
## ===> Visualization <===
##
import seaborn as sns
import matplotlib.pyplot as plt
##
## Set the seaborn grid style.  The dot between the seaborn alias,
## "sns", and the set() function connects or "chains" the alias and the method.
##
## sns.set()
##
## ===> Speciality <===
##
## import sidetable
##
## Set an option for the number of Pandas columns to display.  Eight in this case.
## 
pd.set_option( 'display.max_columns', 8 )
##
## ===> Modeling <===
##
## Import train_test_split package from sklearn
##
from sklearn.model_selection import train_test_split
##
## For modeling, notice the new import command for
## the formula API and the summary option
##
import statsmodels.api as sm
import statsmodels.formula.api as smf 
##
## Import the r2_score function from the sklearn metrics package
##
from sklearn.metrics import r2_score
##
## Import confusion functions for classification
##
from sklearn.metrics import confusion_matrix, classification_report
##
## Import decision tree classifier functions
##
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing
from sklearn.tree import export_graphviz
##
from sklearn.preprocessing import LabelEncoder
##
## Some packages are needed for decision trees:
## Some additional packages are needed to plot a decision tree:
## - graphviz
## - pydotplus
## Both packages may have to be installed before they can be used.  
## Use the operating system to do this.
##
import os
##!{sys.executable} -m pip install graphviz
##!{sys.executable} -m pip install pydotplus
##
## Tell Python where the graphviz package is load; then load it.
##
##os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
##
## Import misc tree packages
##
## from io import StringIO
## from IPython.display import Image  
## import pydotplus
## import graphviz

### 1. Import Data

In [19]:
##
## Set data path
##
path = os.getcwd()+'\\Data\\'

In [60]:
##
## Import orders.csv
## a. parse Tdate as a date
##
file = 'orders.csv'
##
df_orders = pd.read_csv(path + 'orders.csv', parse_dates=True)
import_view(df)

Unnamed: 0,CID,State,ZIP,Region
0,1700,MT,59821,West
1,850,ND,58068,Midwest
2,280,NY,10007,Northeast
3,1574,WY,83120,West
4,110,CO,80403,West


Unnamed: 0,dataType
CID,int64
State,object
ZIP,int64
Region,object
