# MarketDial Data Science Challenge - Take Home

## Intro to MarketDial
At MarketDial we build software that enables our customers to make sophisticated multi-million dollar marketing, pricing, staffing, and operational decisions through offline A/B testing. Our customers are leaders in the retail, grocery, c-store, and restaurant spaces. 

While the suite of tools for online A/B testing is very robust, offline (brick and mortar) testing requires different science to run quality tests. This exercise is meant to give you a glimpse into the type of work we do and the problems we tackle, as well as give us insight into your problem-solving ability and technical skills.

## What We're Looking For:
* Assume you're a data scientist on a team at MarketDial; for this assignment your deliverable will be a presentation of your analysis (including methodology, results, and insights) to an audience of data scientists and business team members. 
* We expect this take home challenge to take you approximately 2 hours, though there is no strict time limit; you may complete it whenever you like. You do *not* need to submit anything back to us; instead, you will present your results during your next scheduled interview.
* You may structure your presentation any way you see fit; note that we highly value creative problem solving, clear communication, and concise, simplified code. Understanding the path to your final answers as well as actionable insight from your analysis are at least as important as the results themselves.
* The questions are purposefully agnostic of any specific methodology, techniques, or tools to use; notably, feel free to use any tool(s) (e.g. Python, SQL, R) that you are comfortable with to complete this analysis.


## Additional Guidelines
* Feel free to search the web as you normally would at work; however, please don't plagiarize or, for fairness, receive direct assistance.
* Please do not share this exercise or the data we've provided anywhere publicly accessible.
* If you encounter any bugs, questions, or misunderstandings, use your discretion in making any assumptions needed to complete the exercise, and please make note of them.

## Data

The following data has been provided to you to help inform your answers.

#### store_attributes.csv
* ***store_id*** - store's unique identifier
* ***attribute_id*** - name of the store attribute
* ***attribute_type*** - data type of the attribute's value
* ***attribute_int_val*** - value of the store attribute if attribute_type is integer
* ***attribute_str_val*** - value of the store attribute if attribute_type is string
* ***attribute_float_val*** - value of the store attribute if attribute_type is float
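
Because each row holds a single attribute for a single store, it can help to reshape this file into one record per store before analysis. The following is a minimal sketch using only the Python standard library. It assumes that exactly one of the three value columns is populated per row and that unused columns are empty; the function name `pivot_store_attributes` and the sample rows are hypothetical and are not taken from the provided data.

```python
import csv
import io
from collections import defaultdict


def pivot_store_attributes(rows):
    """Collapse long-format attribute rows into {store_id: {attribute_id: value}}."""
    stores = defaultdict(dict)
    for row in rows:
        # Assumption: only one value column is filled in per row and the
        # others are empty strings. Adjust if the real file differs.
        if row["attribute_int_val"] != "":
            value = int(float(row["attribute_int_val"]))
        elif row["attribute_float_val"] != "":
            value = float(row["attribute_float_val"])
        else:
            value = row["attribute_str_val"]
        stores[row["store_id"]][row["attribute_id"]] = value
    return dict(stores)


# Hypothetical sample rows in the documented column layout (not real data).
sample = io.StringIO(
    "store_id,attribute_id,attribute_type,"
    "attribute_int_val,attribute_str_val,attribute_float_val\n"
    "1,num_registers,integer,4,,\n"
    "1,region,string,,west,\n"
    "2,avg_income,float,,,52000.5\n"
)
wide = pivot_store_attributes(csv.DictReader(sample))
print(wide)
```

To run this against the real file, pass `csv.DictReader(open("store_attributes.csv", newline=""))` instead of the sample.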

#### transactions.csv
* ***date_week*** - start date of the 7-day week the row's cumulative revenue value represents
* ***store_id*** - store's unique identifier
* ***product_id*** - product's unique identifier
* ***currency_code*** - currency that the revenue is measured in
* ***revenue*** - the revenue from sale of a specified product for a specific store in a given week
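
Since revenue is recorded per product, per store, per week, a common first step is to total it across the products you care about. The sketch below is one possible way to do that with the standard library. The function name `weekly_revenue`, the product IDs, and the sample rows are hypothetical, and the ISO date strings are an assumption about the file's date format. The single-currency check is a precaution, not something the data is known to require.

```python
import csv
import io
from collections import defaultdict


def weekly_revenue(rows, product_ids):
    """Sum revenue per (store_id, date_week) over the given product ids."""
    totals = defaultdict(float)
    currencies = set()
    for row in rows:
        if row["product_id"] not in product_ids:
            continue  # skip products outside the set of interest
        currencies.add(row["currency_code"])
        totals[(row["store_id"], row["date_week"])] += float(row["revenue"])
    # Summing across currencies would be meaningless, so fail loudly.
    if len(currencies) > 1:
        raise ValueError(f"mixed currencies found: {sorted(currencies)}")
    return dict(totals)


# Hypothetical sample rows in the documented column layout (not real data).
sample = io.StringIO(
    "date_week,store_id,product_id,currency_code,revenue\n"
    "2016-07-17,1,A,USD,10.0\n"
    "2016-07-17,1,B,USD,5.5\n"
    "2016-07-17,1,C,USD,99.0\n"
    "2016-07-24,2,A,USD,7.25\n"
)
brand_weekly = weekly_revenue(csv.DictReader(sample), product_ids={"A", "B"})
print(brand_weekly)
```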

#### products_of_interest.csv
* contains product IDs of interest for the exercise (more details below)

#### q3_treatment_stores.csv
* contains treatment store IDs of interest for question 3 (more details below)

#### q3_control_stores.csv
* contains control store IDs of interest for question 3 (more details below)

## Challenge

### Premise
Suppose we have a client, Client X, that sells a variety of snacks and beverages at all of its stores. Client X suspects that putting up new displays for Brand Z's candy will lead consumers to purchase more of Brand Z's candy, resulting in higher revenue for candy. If Client X's intuition is correct, these new displays would lead to a multi-million dollar increase in total revenue across all stores; if it is incorrect, Client X will at best have wasted time and money putting up the new signs, and at worst could see revenue from Brand Z's candy (or other candy brands) decrease.

To mitigate this risk, Client X decides to conduct an experiment in a subset of stores first before making the decision to roll out the new signs to all stores. Client X enlists the help of MarketDial to help them devise an experiment to detect whether the new signs will result in a statistically significant increase in revenue from Brand Z's candy.

**Brand Z's candies are products (ids) listed in `products_of_interest.csv`**

*Note that the details we provide in each question should not necessarily inform your answers to the others.*

### Question 1
Outline, **in words**, the steps of an experiment, from start to finish, to detect whether putting up new displays for Brand Z's candy will result in an increase in revenue from Brand Z's candy. Suppose your audience for this outline is the client, who has some general statistical understanding (i.e. you don't need to get into the weeds of any techniques/algorithms), but is looking to understand more thoroughly how this experiment will be set up and analyzed by the help they have enlisted: you, the expert data scientist.

### Question 2
One challenge in offline (brick and mortar) A/B testing compared to online A/B testing is that randomization is not possible at the customer level. Consequently, alternative methods must be used in choosing treatment and control groups. One way to overcome this challenge is to test at the store level and strategically select sets of stores to use as treatment and control groups. 

Utilizing data and your statistical expertise, how could we intelligently improve upon random selection to obtain (1) a set of treatment stores that better represents the complete set of stores to which we want to roll out the changes and (2) a set of control stores that are a better baseline for our experiment? Which exact stores (by id) would you use in the control group and which in the treatment group for this experiment? Please explain how you came to this decision (e.g. methodology, algorithms, assumptions, etc.). Assume the data we've provided contains all stores.

### Question 3
Suppose we decided to conduct this experiment using the following stores:
* **Treatment stores:** 
  * (ids) listed in `q3_treatment_stores.csv`
* **Control stores:** 
  * (ids) listed in `q3_control_stores.csv`

Suppose, also, that we've completed collecting data for the experiment and are using the following time periods:
* **Pre-Test Period:** (X weeks prior to the start of the implementation)
  * 07/17/2016 - 10/15/2016  
* **Implementation Period:** (The period of time needed to actually get the experiment ready; in this case, the time to put the signs up across all stores)
  * 10/16/2016 - 11/12/2016  
* **Test Period:**  (X weeks after the end of the implementation period)
  * 11/13/2016 - 02/12/2017
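
When working with these periods in code, each weekly row needs to be tagged with the period it belongs to. The snippet below is a minimal sketch of one way to do that. It assumes a week is assigned by its start date (the `date_week` column) and that both bounds of each range are inclusive; the names `PERIODS` and `label_period` are hypothetical.

```python
from datetime import date

# Period boundaries copied from the list above; both ends treated as inclusive.
PERIODS = [
    ("pre_test", date(2016, 7, 17), date(2016, 10, 15)),
    ("implementation", date(2016, 10, 16), date(2016, 11, 12)),
    ("test", date(2016, 11, 13), date(2017, 2, 12)),
]


def label_period(week_start):
    """Return the period name containing week_start, or None if outside all periods."""
    for name, start, end in PERIODS:
        if start <= week_start <= end:
            return name
    return None


# Example: label a few week start dates.
for d in (date(2016, 7, 10), date(2016, 7, 17), date(2016, 10, 16), date(2016, 11, 13)):
    print(d.isoformat(), label_period(d))
```

Rows labeled `None` fall outside the experiment window, and rows labeled `implementation` are typically set aside so that they do not blur the pre-test versus test comparison.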

How would you quantify the impact of this test on revenue from Brand Z's candy sales? Should we recommend Client X roll out the new signs to all of their stores? Explain your analysis and reasoning, and include any code used.