![logo.png](logo.png)

# Data Scientist and Sr. Data Scientist: Case Study (Core)

## Overview

In this exercise, you will be given data to create a gross sales forecast and prepare a presentation about your findings. You will have 4 hours to complete the exercise. 90 minutes after you begin, an assessor will check in to see if you have any questions. Your presentation will follow the 4-hour preparation period and will last 30 minutes in total: 15 minutes to present your findings and 15 minutes to answer questions. Please read this guide thoroughly before beginning the exercise.

We understand your desire to demonstrate your ability to use complex data science techniques; however, please be mindful of the 4-hour time constraint! In your presentation, we encourage you to highlight additional techniques and/or areas of exploration you would pursue given more time. Keep in mind that, for us to fully evaluate your performance, we need a completed assignment that thoughtfully communicates the results.

### Background

As a Data Scientist, you are expected to demonstrate expertise in the following areas:

- Successful interpretation of data and feature engineering.

- Data cleansing and outlier control.

- Making data-driven recommendations.

- Communicating your recommendations in a clear and effective manner.

The intent of this case study is to evaluate you on these four key aptitudes. In evaluating your performance, we will be assessing both the quantitative and qualitative elements of your answer. We look forward to reviewing your results.

### Problem Set Up

You have recently been asked to develop a monthly forecasting model for a new product category to be sold within our stores. The available data is modeled after a subset of the operational data that we use at The Home Depot when confronting a problem of this nature. You are allowed and encouraged to approach this problem in whatever way best suits you and highlights your talents (e.g., program or software of choice). Note that the scenario outlined below is fictitious and should NOT be construed as indicating that The Home Depot has any plans to change its business.

### Scenario

Beginning next year, The Home Depot will carry work apparel. Because we have never merchandised products of this type before, it is your responsibility to determine how customer demand will respond to this assortment expansion on a monthly basis. Our merchandising team has provided you with an annual estimate of how much they expect to sell next year. You will need to articulate the likely financial impact on monthly sales of incorporating this new product line.

**To be successful on this assignment, you must:**

- Create a monthly FY 2020 sales forecast for the new products.

- Create a monthly FY 2020 total sales forecast for all product categories, including the new apparel category.
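As one possible starting point for the first deliverable, the sketch below spreads an annual estimate across the twelve fiscal months in proportion to a seasonal profile borrowed from a comparable existing category. This is only an illustration of the arithmetic, not the approach the assessors expect. The function name `allocate_annual_estimate`, the reference profile, and the annual figure are all hypothetical placeholders and do not come from the provided data.

```python
from typing import Dict, List

# Fiscal months in order; per the fiscal calendar, month 1 is February.
FISCAL_MONTHS: List[str] = [
    "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December", "January",
]


def allocate_annual_estimate(
    annual_total: float, reference_monthly_sales: Dict[str, float]
) -> Dict[str, float]:
    """Split an annual total across fiscal months using the reference profile's shares."""
    reference_total = sum(reference_monthly_sales[m] for m in FISCAL_MONTHS)
    if reference_total <= 0:
        raise ValueError("reference profile must have positive total sales")
    return {
        m: annual_total * reference_monthly_sales[m] / reference_total
        for m in FISCAL_MONTHS
    }


# Hypothetical seasonal profile (made-up numbers, NOT from the case data).
reference_profile: Dict[str, float] = {
    "February": 10, "March": 6, "April": 5, "May": 4, "June": 3, "July": 3,
    "August": 4, "September": 7, "October": 11, "November": 15,
    "December": 18, "January": 14,
}
annual_estimate = 1_200_000.0  # hypothetical annual estimate, in dollars

monthly_forecast = allocate_annual_estimate(annual_estimate, reference_profile)
for month in FISCAL_MONTHS:
    print(f"{month:<10} {monthly_forecast[month]:>14,.2f}")
```

By construction the monthly values sum back to the annual total, so the choice of reference profile is the only modeling decision in this sketch and is the part worth justifying in a presentation.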

### Expectations

Successful completion of this exercise will require that you provide the following:

- Any code or spreadsheets used to create your recommendation, including comments explaining what was done.

- Any modified datasets that you produce.

- Any relevant visualizations you created (e.g., charts, graphs, etc.).

- A non-technical presentation of your findings.

All materials you provide should be sufficient to replicate your approach. The data provided to you is sufficient to complete this exercise; no external data should be needed or used.


### Data Model

|**Table**|**Definition**|
| :-: | :-: |
|`HISTORICAL_CUSTOMER_SALES`|Product-level monthly sales (i.e. demand) for existing products carried by The Home Depot as well as new products.|
|`MONTHLY_SALES_ESTIMATE`|Category-level monthly gross sales estimates that have historically been provided for existing categories.|
|`CURRENT_PRODUCT_ATTRIBUTES`|Product-level attributes for the existing products assorted at The Home Depot (e.g. dimensional data, space allocation, price, cost, etc.)|



For each table provided, please consult the following definitions for the columns within each table:

|`HISTORICAL_CUSTOMER_SALES`|  |
| :-: | :-: |
|**Column**|**Definition**|
|`PRODUCT_ID`|Unique identifier for the product.|
|`FISCAL_MONTH_END_DT`|The last day of the fiscal month.|
|`GROSS_SALES_QTY`|The gross sales (in units) recorded for the entirety of the month.|
|`NET_SALES_QTY`|The difference between gross sales and returns, in units.|
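Since `NET_SALES_QTY` is defined as gross sales minus returns, the returned quantity and a return rate can be derived from the two columns. The sketch below is a minimal illustration under that definition. The derived fields `RETURN_QTY`, `RETURN_RATE`, and `SUSPECT_FLG` are hypothetical names, and the sample rows (including the date format) are made up rather than taken from the provided data.

```python
from typing import Any, Dict, List, Optional


def add_return_metrics(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with a derived return quantity and return rate."""
    gross = float(row["GROSS_SALES_QTY"])
    net = float(row["NET_SALES_QTY"])
    return_qty = gross - net  # net = gross - returns, so returns = gross - net
    return_rate: Optional[float] = return_qty / gross if gross > 0 else None
    # Net above gross would imply negative returns, so flag the row for review.
    suspect = net > gross
    return {
        **row,
        "RETURN_QTY": return_qty,
        "RETURN_RATE": return_rate,
        "SUSPECT_FLG": suspect,
    }


# Made-up sample rows for illustration only.
sample_rows: List[Dict[str, Any]] = [
    {"PRODUCT_ID": "P001", "FISCAL_MONTH_END_DT": "2019-03-03",
     "GROSS_SALES_QTY": 120, "NET_SALES_QTY": 111},
    {"PRODUCT_ID": "P002", "FISCAL_MONTH_END_DT": "2019-03-03",
     "GROSS_SALES_QTY": 0, "NET_SALES_QTY": 0},
    {"PRODUCT_ID": "P003", "FISCAL_MONTH_END_DT": "2019-03-03",
     "GROSS_SALES_QTY": 40, "NET_SALES_QTY": 45},
]

enriched = [add_return_metrics(r) for r in sample_rows]
for r in enriched:
    print(r["PRODUCT_ID"], r["RETURN_QTY"], r["RETURN_RATE"], r["SUSPECT_FLG"])
```

Rows flagged as suspect are candidates for the data cleansing step; whether to correct, cap, or drop them is a judgment call to explain in your write-up.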


|`MONTHLY_SALES_ESTIMATE`| |
| :-: | :-: |
|**Column**|**Definition**|
|`PRODUCT_CATEGORY_NAME`|Name of the category of products to which the product belongs.|
|`FISCAL_YEAR`|Fiscal year for the entry.|
|`FISCAL_MONTH_END_DT`|The last day of the fiscal month.|
|`GROSS_SALES_ESTIMATE`|The gross sales estimate, in dollars, for the entirety of the month.|




|`PRODUCT_ATTRIBUTES`| |
| :-: | :-: |
|**Column**|**Definition**|
|`PRODUCT_ID`|Unique identifier for the product.|
|`PRODUCT_CATEGORY_NAME`|Name of the category of products to which the product belongs.|
|`PRODUCT_NAME`|Product name as displayed to the customer.|
|`CURR_RETL_PRICE`|The current price of a single sellable unit of the product at a store.|
|`HEIGHT`|Height of the individual product as it is merchandised on the shelf.|
|`WIDTH`|Width of the individual product as it is merchandised on the shelf.|
|`DEPTH`|Depth of the individual product as it is merchandised on the shelf.|
|`WEIGHT`|Weight of the individual product as it is merchandised on the shelf.|
|`HAZMAT_FLG`|Indicates whether or not the product contains hazardous materials.|
|`MERCURY_FLG`|Indicates whether or not the product contains mercury.|
|`ELECTRIC_FLG`|Indicates whether or not the product is electrically powered.|
|`CHEMICAL_FLG`|Indicates whether or not the product is a chemical.|
|`LIQUID_FLG`|Indicates whether or not the product is a liquid.|
|`AEROSOL_FLG`|Indicates whether or not the product is an aerosol.|
|`PESTICIDE_FLG`|Indicates whether or not the product is a pesticide.|
|`BATTERY_FLG`|Indicates whether or not the product includes a battery.|
|`GAS_FLG`|Indicates whether or not the product is gas powered.|
|`MARKETING_COPY`|Text that appears on the website describing the product.|
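Historical sales are recorded in units, while the sales estimates are in dollars, so comparing the two requires a price. The sketch below shows one way to do that: join sales rows to product attributes on `PRODUCT_ID`, multiply `GROSS_SALES_QTY` by `CURR_RETL_PRICE`, and total the result by category and fiscal month. It assumes the current retail price applies to every historical month, which is a simplification. The function name `gross_sales_dollars_by_category`, the category names, and all sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def gross_sales_dollars_by_category(
    sales_rows: List[Dict[str, Any]], attribute_rows: List[Dict[str, Any]]
) -> Dict[Tuple[str, str], float]:
    """Total gross sales dollars keyed by (category name, fiscal month end date)."""
    # Lookup from product to (category, price), built from the attributes table.
    lookup = {
        a["PRODUCT_ID"]: (a["PRODUCT_CATEGORY_NAME"], float(a["CURR_RETL_PRICE"]))
        for a in attribute_rows
    }
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for s in sales_rows:
        info = lookup.get(s["PRODUCT_ID"])
        if info is None:
            # No attributes for this product; skipped in this sketch.
            continue
        category, price = info
        key = (category, s["FISCAL_MONTH_END_DT"])
        totals[key] += float(s["GROSS_SALES_QTY"]) * price
    return dict(totals)


# Made-up sample data for illustration only.
attributes = [
    {"PRODUCT_ID": "P001", "PRODUCT_CATEGORY_NAME": "Gloves", "CURR_RETL_PRICE": 10.0},
    {"PRODUCT_ID": "P002", "PRODUCT_CATEGORY_NAME": "Gloves", "CURR_RETL_PRICE": 20.0},
    {"PRODUCT_ID": "P003", "PRODUCT_CATEGORY_NAME": "Tools", "CURR_RETL_PRICE": 5.0},
]
sales = [
    {"PRODUCT_ID": "P001", "FISCAL_MONTH_END_DT": "2019-03-03", "GROSS_SALES_QTY": 3},
    {"PRODUCT_ID": "P002", "FISCAL_MONTH_END_DT": "2019-03-03", "GROSS_SALES_QTY": 2},
    {"PRODUCT_ID": "P003", "FISCAL_MONTH_END_DT": "2019-03-03", "GROSS_SALES_QTY": 10},
    {"PRODUCT_ID": "P999", "FISCAL_MONTH_END_DT": "2019-03-03", "GROSS_SALES_QTY": 1},
]

category_dollars = gross_sales_dollars_by_category(sales, attributes)
for (category, month_end), dollars in sorted(category_dollars.items()):
    print(category, month_end, f"{dollars:,.2f}")
```

The sales table covers new products as well as existing ones, while the attributes table describes existing products, so some sales rows may have no match. This sketch silently skips them; in practice it is worth counting and reporting how many rows are dropped.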



|`FISCAL_CALENDAR`|  |
| :-: | :-: |
|**Column**|**Definition**|
|`FISCAL_YEAR`|Fiscal year.|
|`FISCAL_MONTH_NBR`|Month of the fiscal year. 1 = February, 12 = January.|
|`FISCAL_MONTH_BEGIN_DT`|The first day of the fiscal month.|
|`FISCAL_MONTH_END_DT`|The last day of the fiscal month.|
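Fiscal months do not line up with calendar months, and they are not all the same length. The sketch below uses the FY 2020 month-ending dates listed on the answer sheet to compute how many weeks each fiscal month spans, from the gap between consecutive month-end dates. The helper name `weeks_per_fiscal_month` is hypothetical. February is left out because the preceding month-end date is not listed in this document; with the actual data, `FISCAL_MONTH_BEGIN_DT` would cover it.

```python
from datetime import date
from typing import Dict, List, Tuple

# FY 2020 fiscal month-end dates, as listed on the answer sheet.
FY2020_MONTH_ENDS: List[Tuple[str, date]] = [
    ("February", date(2020, 3, 1)),
    ("March", date(2020, 3, 29)),
    ("April", date(2020, 5, 3)),
    ("May", date(2020, 5, 31)),
    ("June", date(2020, 6, 28)),
    ("July", date(2020, 8, 2)),
    ("August", date(2020, 8, 30)),
    ("September", date(2020, 9, 27)),
    ("October", date(2020, 11, 1)),
    ("November", date(2020, 11, 29)),
    ("December", date(2020, 12, 27)),
    ("January", date(2021, 1, 31)),
]


def weeks_per_fiscal_month(month_ends: List[Tuple[str, date]]) -> Dict[str, int]:
    """Weeks in each fiscal month after the first, from consecutive month-end dates."""
    weeks: Dict[str, int] = {}
    for (_, prev_end), (name, end) in zip(month_ends, month_ends[1:]):
        whole_weeks, extra_days = divmod((end - prev_end).days, 7)
        if extra_days:
            raise ValueError(f"{name} does not span a whole number of weeks")
        weeks[name] = whole_weeks
    return weeks


weeks = weeks_per_fiscal_month(FY2020_MONTH_ENDS)
for name, n in weeks.items():
    print(f"{name:<10} {n} weeks")
```

For these dates, April, July, October, and January span five weeks and the other listed months span four. A five-week month can show higher total sales than a four-week month purely because it is longer, so comparing months on a per-week basis may be worth considering.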




![](https://cdn.mathpix.com/cropped/2022_09_15_1be9766c3a4f334ce49fg-5.jpg?height=1296&width=1895&top_left_y=187&top_left_x=104)





## Answer Sheet


This worksheet constitutes the **quantitative** component of your assessment.   

The 2020 Forecast (in Dollars) for **ONLY** the new products:

|**Fiscal Month**|**Month Ending Date**|**Winter Work Jacket**|**Winter Boot**|**Wool Hat**|
| :-: | :-: | :-: | :-: | :-: |
|February|3/1/2020||||
|March|3/29/2020||||
|April|5/3/2020||||
|May|5/31/2020||||
|June|6/28/2020||||
|July|8/2/2020||||
|August|8/30/2020||||
|September|9/27/2020||||
|October|11/1/2020||||
|November|11/29/2020||||
|December|12/27/2020||||
|January|1/31/2021||||

The 2020 Forecast (in Dollars) for **ALL PRODUCTS** including the new products:

|**Fiscal Month**|**Month Ending Date**|**Total Monthly Forecast (in Millions of $)**|
| :-: | :-: | :-: |
|February|3/1/2020||
|March|3/29/2020||
|April|5/3/2020||
|May|5/31/2020||
|June|6/28/2020||
|July|8/2/2020||
|August|8/30/2020||
|September|9/27/2020||
|October|11/1/2020||
|November|11/29/2020||
|December|12/27/2020||
|January|1/31/2021||


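To download the case study files, run the following in a notebook cell:

```
# Clone the branch of the repository that holds this case study's files.
!git clone --branch home_depot_1 https://github.com/interviewquery/takehomes.git
# Move into the case study directory.
%cd takehomes/home_depot_1
# Unzip any archives that are present, then list the directory contents.
!if [[ $(ls *.zip) ]]; then unzip *.zip; fi
!ls
```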