# Time Series Project (NAME TBD)

---

**Project & Final Report Created By:** Mathias Boissevain & Rachel Robbins-Mayhill, April 22, 2022

---

5 POINTS -> GENERAL STYLE
- Does your notebook have a good title? 
- Does the readme provide a description of the project and instructions on running your code? 
- Are headings used to organize the notebook? 
- Is the text free from grammatical and spelling errors?

5 POINTS -> CODE STYLE
- Is the code well commented / documented? 
- Do functions and variables have descriptive names? 
- Is code broken up into functions / modules appropriately?
- Is the code formatting consistent?

10 POINTS -> ACQUISITION & PREPARATION
- Is data from the relevant tables included? 
- Is the data wrangling easily reproducible? 
- Can I import a function from an acquire/prep/wrangle module and have it give me the prepared data? 
- Are the steps taken for data acquisition and preparation well documented in the report notebook?

```python
# Imports for data manipulation
import pandas as pd
import numpy as np

# Imports for data visualization
import seaborn as sns
from matplotlib import pyplot as plt

# Imports for acquisition
import env
import os
import mathias_wrangle as mw

# Display all rows and columns of DataFrames
pd.options.display.max_rows = None
pd.options.display.max_columns = None

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
```

## PROJECT DESCRIPTION:

Superstore's mission is to be the preferred supplier of workspace solutions, from home office to corporate office; we aspire to be the leading expert in workplace solutions for everyone! For this reason, it is important to know whether we are reaching everyone with our products and services. 
This project will use exploration, modeling, and statistics to identify the best customer segment for Superstore with regard to ________ and will provide recommendations on where to shift our company focus in order to keep our customers happy and loyal while continuing to grow our customer base.

## PROJECT GOAL: 

The goal of this project is to identify which customer segment is the best for Superstore, an office-supply retail store, and to make recommendations on where to shift company focus with regard to customer segments. 

## INITIAL QUESTIONS: 

##### Data-Focused Questions
- Which customer segment contributes the most to profit?
- Which customer segment contributes the most to sales?
- How does the impact of each customer segment change over time?
- What is our total revenue?
- How much revenue does each customer segment generate?
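Most of the segment-level questions above reduce to grouped totals. A minimal pandas sketch, assuming the prepared DataFrame has `segment`, `sales`, and `profit` columns (only the first columns of the prepared data are previewed in Section II, so `profit` is an assumption). `summarize_segments` is a hypothetical helper, not part of the project's wrangle module.

```python
import pandas as pd


def summarize_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Total sales, total profit, profit margin, and revenue share per segment.

    Assumes (hypothetically) columns named `segment`, `sales`, and `profit`.
    """
    summary = df.groupby("segment").agg(
        total_sales=("sales", "sum"),
        total_profit=("profit", "sum"),
        order_lines=("sales", "count"),
    )
    # Profit margin: fraction of each sales dollar kept as profit
    summary["profit_margin"] = summary["total_profit"] / summary["total_sales"]
    # Share of overall revenue contributed by each segment
    summary["sales_share"] = summary["total_sales"] / summary["total_sales"].sum()
    # Most profitable segment first
    return summary.sort_values("total_profit", ascending=False)
```

Calling `summarize_segments(ssdb)` would return one row per segment, and `ssdb["sales"].sum()` gives total revenue.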

##### Overall Project-Focused Questions
- What will the end product look like?
   + A 5-minute presentation to the key stakeholder that identifies the best customer segment and recommends where to shift company focus.
- What format will it be in?
   + A slide deck with an agenda, executive summary, data overview, and recommendations, accompanied by a GitHub repository.
- Who will it be delivered to?
   + Company CEO
- How will it be used?
   + To recommend steps to grow customer segment ________.
- How will I know I'm done?
   + When the impact of each customer segment and a recommendation have been identified and all deliverables are complete.
- What is my MVP?
   + Identify the best customer segment with regard to ONE of the following areas: sales volume, total profit, % profit, or sales growth. 
- How will I know it's good enough?
   + If the exploratory process produces data-backed results identifying the 'best' customer segment, along with an avenue for improving company profits. 

## HYPOTHESIS:
- Which customer segment is the best?
   + H0: The consumer customer segment's profit is <= the profit of all other customer segments.
   + H1: The consumer customer segment's profit is > the profit of all other customer segments.

---

## I. ACQUIRE

The data for this report was acquired by accessing 'superstore_db' from the Codeup SQL database. The following query was used to acquire the data:

    SELECT *
    FROM orders
    JOIN categories
    USING(`Category ID`)
    JOIN customers
    USING(`Customer ID`)
    JOIN products
    USING(`Product ID`)
    JOIN regions
    USING(`Region ID`)

### The Original DataFrame Size: 1734 rows and 22 columns.

The acquisition of this data can be replicated using the following function saved within the wrangle.py file inside the 'mwb-rrm-codeup-time-series-project' repository on GitHub:

- get_superstore(use_cache=True)  

The function takes a boolean, use_cache, that controls whether a locally cached copy of the data may be used. If use_cache is True and the data already exists as a local .csv, the function reads that file; otherwise, it queries the database using the assigned connection URL, writes the result to a new .csv, and returns the Superstore DataFrame.
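The caching logic described above can be sketched as a small generic helper. This is not the project's actual `get_superstore`; `read_with_cache`, `csv_path`, and `fetch` are hypothetical names used only for illustration.

```python
import os
from typing import Callable

import pandas as pd


def read_with_cache(csv_path: str, fetch: Callable[[], pd.DataFrame],
                    use_cache: bool = True) -> pd.DataFrame:
    """Return cached data from csv_path, or call fetch() and cache the result."""
    if use_cache and os.path.exists(csv_path):
        # A cached copy exists and the caller allows using it
        return pd.read_csv(csv_path)
    # Otherwise pull a fresh copy (e.g. from the SQL database)...
    df = fetch()
    # ...and save it without the index so no 'Unnamed: 0' column appears on re-read
    df.to_csv(csv_path, index=False)
    return df
```

Inside a `get_superstore`-style function, `fetch` could be `lambda: pd.read_sql(query, url)`, where `query` is the SQL shown above and `url` is a connection string assumed to be built from the credentials in `env.py`. Writing the CSV with `index=False` keeps pandas from adding an `Unnamed: 0` column when the file is read back, which is one possible source of the `unnamed:_0` column in the prepared data.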

For brevity, the acquisition and preparation functions are called together in Section II. Prepare. 


---

## II. PREPARE

After acquisition, the table was analyzed and cleaned to facilitate exploration, remove redundant columns, and standardize column names and data types. 

The preparation of this data can be replicated using the following function saved within the wrangle.py file inside the 'mwb-rrm-codeup-time-series-project' repository on GitHub. 

- prep_superstore

The function takes in the original superstore dataframe and returns it with the changes noted below.

### Steps Taken to Clean & Prepare Data: 

- Removed unnecessary columns due to data duplication: region_id, product_id, category_id, customer_id
 
- Converted column names to lower case and replaced spaces with underscores for ease of use throughout exploration

- Set date columns to datetime type for use with time series analysis: order_date, ship_date

- Set order_date as index and sort by index for time series analysis

- Set postal code to object type for ease of exploration and potential modeling

- Engineered column for the number of days it takes to ship from the order date for potential exploration

- Confirmed that there were no null values
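The steps above can be sketched in pandas as follows. This is an illustrative version under assumptions, not the project's `prep_superstore`: the raw column names (for example `Order Date`, `Postal Code`, `Region ID`) are assumed from the SQL tables, and `days_to_ship` is a hypothetical name for the engineered column.

```python
import pandas as pd


def prep_superstore_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a raw Superstore DataFrame (a sketch)."""
    df = df.copy()
    # Lower-case column names and replace spaces with underscores
    df.columns = df.columns.str.lower().str.replace(" ", "_")
    # Drop the ID columns duplicated by the SQL joins
    df = df.drop(columns=["region_id", "product_id", "category_id", "customer_id"])
    # Convert date columns to datetime for time series analysis
    df["order_date"] = pd.to_datetime(df["order_date"])
    df["ship_date"] = pd.to_datetime(df["ship_date"])
    # Engineered feature (hypothetical name): whole days from order to shipment
    df["days_to_ship"] = (df["ship_date"] - df["order_date"]).dt.days
    # Postal code as text; going through int drops the '.0' a float would leave,
    # and zfill(5) restores leading zeros (assumes 5-digit US ZIP codes)
    df["postal_code"] = df["postal_code"].astype(int).astype(str).str.zfill(5)
    # Confirm there are no missing values
    if df.isna().any().any():
        raise ValueError("unexpected null values")
    # Use order_date as a sorted DatetimeIndex
    return df.set_index("order_date").sort_index()
```

Casting postal codes through `int` before `str` avoids the trailing `.0` that a direct float-to-string conversion leaves (compare `60540.0` in the preview of the prepared data).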

---

### Results of Data Preparation

```python
# Acquire the superstore_db data with mw.get_superstore() and pass it into
# mw.prep_superstore(), which returns the prepared dataset
ssdb = mw.prep_superstore(mw.get_superstore())
```

```python
# Inspect the first five rows (transposed so each column is listed as a row)
ssdb.head().T
```

| order_date | 2014-01-04 | 2014-01-04 | 2014-01-04 | 2014-01-09 | 2014-01-09 |
|---|---|---|---|---|---|
| unnamed:_0 | 977 | 978 | 979 | 942 | 941 |
| order_id | CA-2014-112326 | CA-2014-112326 | CA-2014-112326 | CA-2014-135405 | CA-2014-135405 |
| ship_date | 2014-01-08 00:00:00 | 2014-01-08 00:00:00 | 2014-01-08 00:00:00 | 2014-01-13 00:00:00 | 2014-01-13 00:00:00 |
| ship_mode | Standard Class | Standard Class | Standard Class | Standard Class | Standard Class |
| segment | Home Office | Home Office | Home Office | Consumer | Consumer |
| country | United States | United States | United States | United States | United States |
| city | Naperville | Naperville | Naperville | Laredo | Laredo |
| state | Illinois | Illinois | Illinois | Texas | Texas |
| postal_code | 60540.0 | 60540.0 | 60540.0 | 78041.0 | 78041.0 |
| sales | 11.784 | 272.736 | 3.54 | 31.2 | 9.344 |


### Prepared DataFrame Size: 1734 rows, 19 columns.
- Dropped 4 columns, added 1.

---

### PREPARE - SPLIT THE DATA
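For time series data, the split is usually made chronologically rather than randomly, so models are always validated on dates after the ones they were trained on. A minimal sketch, assuming `ssdb` has a DatetimeIndex on `order_date`; the 50/30/20 proportions and the `split_time_series` name are assumptions, not project decisions.

```python
import pandas as pd


def split_time_series(df: pd.DataFrame, train_frac: float = 0.5,
                      validate_frac: float = 0.3):
    """Split a DataFrame with a DatetimeIndex into train/validate/test by time.

    Rows are never shuffled, so validate and test always come after train.
    """
    if not df.index.is_monotonic_increasing:
        df = df.sort_index()
    n = len(df)
    # Positional cut points; whatever remains after validate becomes test
    train_end = int(n * train_frac)
    validate_end = train_end + int(n * validate_frac)
    train = df.iloc[:train_end]
    validate = df.iloc[train_end:validate_end]
    test = df.iloc[validate_end:]
    return train, validate, test
```

Because several orders can share a date, a positional split may place rows from the same day in two sets; splitting on date cutoffs (for example by year) avoids this.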


---

## III. EXPLORE

10 POINTS
- Are figures well labeled (title, x + y labels)? 
- Does the type of visualization make sense for the variables being explored? 
- Is color used in an appropriate way? 
- Are takeaways documented? 
- Are questions asked and answered? 
- Are statistical tests used appropriately to back up conclusions?

#### Univariate:

#### Bivariate:

#### Multivariate:

---

### EXPLORE - Questions

#### Question 1:  Which customer segment is the 'best'?

##### Visualization:

##### Stats Testing:

#### Answer: 

---

#### Question 2: How do customer segment profit and sales change over time on average and in total?
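Profit and sales per segment over time can be aggregated by grouping on the segment and on time bins of the DatetimeIndex with `pd.Grouper`. A sketch assuming the prepared `ssdb` columns `segment`, `sales`, and `profit`, with monthly periods (`"MS"`, month start) chosen as an assumption; `segment_trends` is a hypothetical helper.

```python
import pandas as pd


def segment_trends(df: pd.DataFrame, freq: str = "MS") -> pd.DataFrame:
    """Total and average sales and profit per customer segment per period.

    Expects a DatetimeIndex (order_date) plus `segment`, `sales`, `profit` columns.
    """
    df = df.sort_index()
    # Group by segment, then bin the DatetimeIndex into periods of length `freq`
    grouped = df.groupby(["segment", pd.Grouper(freq=freq)])[["sales", "profit"]]
    # "sum" answers the 'in total' question, "mean" the 'on average' question
    return grouped.agg(["sum", "mean"])
```

`segment_trends(ssdb)` would return one row per segment and month, with the total and average of `sales` and `profit` as columns, ready to plot per segment.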

##### Visualization:

##### Stats Testing:

#### Answer: 

---

#### Question 3: Is total sales driven by the average sale amount?

##### Visualization:

##### Stats Testing:

#### Answer: 

---

#### Question 4:

##### Visualization:

##### Stats Testing:

#### Answer: 

---

### EXPLORATION TAKEAWAYS

---

## MODELING (OPTIONAL)

---

## SUMMARY

---

## RECOMMENDATIONS

10 POINTS -> RECOMMENDATIONS
- Are recommendations that answer the original business question present, 
- clearly communicated, 
- backed by data, 
- and supported by visualizations?

---

## NEXT STEPS