## Context

The objective of this test is to perform data analysis in order to complete the following tasks:

1. Determine and return the top 3 retailers (not stores) by revenue for the requested time period.

2. Determine the top product family by sales for the requested time period, and return its daily sales.

## Inputs

Three CSV files are provided as inputs: <i>stores.csv</i>, <i>products.csv</i> and <i>sales_data.csv</i>.

In [4]:
import pandas as pd

In [7]:
stores = pd.read_csv('data/stores.csv', delimiter=',')
stores.head(2)

| Unnamed: 0 | store_id | store_name | retailer_id |
|---|---|---|---|
| 0 | 0 | store_0 | 2 |
| 1 | 1 | store_1 | 1 |


In [8]:
products = pd.read_csv('data/products.csv', delimiter=',')
products.head(2)

| Unnamed: 0 | product_id | product_name | family_id | price |
|---|---|---|---|---|
| 0 | 0 | product_0 | 6 | 109.9 |
| 1 | 1 | product_1 | 8 | 238.7 |


In [9]:
sales_data = pd.read_csv('data/sales_data.csv', delimiter=',')
sales_data.head(2)

| Unnamed: 0 | product_id | store_id | date | sales | revenue |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 2016-01-01 | 106 | 11649.4 |
| 1 | 0 | 0 | 2016-01-02 | 118 | 12968.2 |
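The <i>Unnamed: 0</i> column in each preview is most likely the row index that was saved along with the data. Assuming that is the case, the sketch below shows one way to load a file without that extra column and with <i>date</i> parsed as a real date. It reads an in-memory copy of the two preview rows instead of the actual file, so it runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for data/sales_data.csv, copied from the two preview rows.
# The empty first header field is what pandas displays as "Unnamed: 0".
SALES_CSV = """,product_id,store_id,date,sales,revenue
0,0,0,2016-01-01,106,11649.4
1,0,0,2016-01-02,118,12968.2
"""

# index_col=0 reuses the saved index instead of creating an "Unnamed: 0" column.
# parse_dates converts the date strings to datetime64 values.
sales_data = pd.read_csv(io.StringIO(SALES_CSV), index_col=0, parse_dates=["date"])

print(sales_data.columns.tolist())
print(sales_data.dtypes["date"])
```

With the real files, the same two arguments could be passed along with the file path in place of the `io.StringIO` object.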


## Solution

I implemented the solution in two ways.

The first version builds objects so that the code is easy to read and interpret.

The second version focuses on performance and relies heavily on pandas methods to minimize execution time.

## Code

#### Version 1

This solution is object-oriented and corresponds to the module <i>data_analysis_v1.py</i>.

1 - <b>init()</b>: creates the <i>Product</i>, <i>Store</i> and <i>Sales</i> instances and fills <i>product_list</i>, <i>store_list</i> and <i>sales_list</i>

2 - <b>topRetailersRev()</b>, <b>topFamilySales()</b>: build and display the two dictionaries <i>dict_rev</i> and <i>dict_sales</i>
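The following is a minimal sketch of what the object-oriented approach might look like for the first task. It is not the content of <i>data_analysis_v1.py</i>: the field names follow the CSV columns, while the function name `top_retailers_rev`, the inclusive end date, and the sample values are assumptions made for illustration. A <i>Product</i> class would follow the same pattern.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Store:
    store_id: int
    retailer_id: int


@dataclass
class Sales:
    product_id: int
    store_id: int
    date: date
    sales: int
    revenue: float


def top_retailers_rev(store_list, sales_list, start, end, n=3):
    """Hypothetical helper: revenue per retailer over [start, end], top n."""
    retailer_of = {s.store_id: s.retailer_id for s in store_list}
    dict_rev = {}
    for sale in sales_list:
        # Assumption: both bounds of the period are inclusive.
        if start <= sale.date <= end:
            retailer = retailer_of[sale.store_id]
            dict_rev[retailer] = dict_rev.get(retailer, 0.0) + sale.revenue
    # Sort retailers by descending revenue and keep the first n.
    ranked = sorted(dict_rev.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


# Made-up sample data: stores 0 and 2 belong to retailer 2, store 1 to retailer 1.
store_list = [Store(0, 2), Store(1, 1), Store(2, 2)]
sales_list = [
    Sales(0, 0, date(2016, 1, 1), 1, 100.0),
    Sales(0, 1, date(2016, 1, 1), 2, 250.0),
    Sales(0, 2, date(2016, 1, 2), 1, 100.0),
    Sales(0, 1, date(2016, 2, 1), 5, 900.0),  # outside the requested period
]

dict_rev = top_retailers_rev(store_list, sales_list, date(2016, 1, 1), date(2016, 1, 31))
print(dict_rev)
```

Revenue is summed per retailer rather than per store, which is what the "retailers (not stores)" requirement asks for.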

#### Version 2

This solution is performance-oriented and corresponds to the module <i>data_analysis_v2.py</i>.

1 - Preprocessing: <b>join()</b> & <b>groupBy()</b>

2 - Fills and displays the dictionaries <i>dict_rev_sorted</i> and <i>dict_sales_sorted</i>
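The join-then-aggregate idea can be sketched as follows. This is an illustration under assumptions, not the code of <i>data_analysis_v2.py</i>: it uses `DataFrame.merge()` (which joins on a column) rather than `DataFrame.join()` (which joins on the index by default), the miniature tables are made up, and filtering on the requested period is left out to keep the example short.

```python
import pandas as pd

# Made-up miniature tables with the same key columns as the input files.
stores = pd.DataFrame({"store_id": [0, 1, 2], "retailer_id": [2, 1, 2]})
products = pd.DataFrame({"product_id": [0, 1], "family_id": [6, 8]})
sales_data = pd.DataFrame({
    "product_id": [0, 1, 0, 1],
    "store_id": [0, 1, 2, 1],
    "date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"]),
    "sales": [10, 3, 5, 4],
    "revenue": [100.0, 300.0, 50.0, 400.0],
})

# Attach the retailer of each store and the family of each product to every sale.
joined = (
    sales_data
    .merge(stores, on="store_id", how="left")
    .merge(products, on="product_id", how="left")
)

# Task 1: total revenue per retailer, largest first, at most 3 entries.
dict_rev_sorted = joined.groupby("retailer_id")["revenue"].sum().nlargest(3).to_dict()

# Task 2: family with the most units sold, then its units sold per day.
top_family = joined.groupby("family_id")["sales"].sum().idxmax()
daily_sales = joined[joined["family_id"] == top_family].groupby("date")["sales"].sum()

print(dict_rev_sorted)
print(top_family)
print(daily_sales)
```

Grouping on the joined table lets pandas do the aggregation in vectorized code instead of a Python loop over the sales rows.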

## Running the code

Import the chosen module version in a Python 3.7 environment.

<b>topRetailersRev()</b> and <b>topFamilySales()</b> are the functions to run to perform the two tasks.

They both take three parameters:

- the path to the data, as a string

- the start date, as a <i>datetime.date</i>

- the end date, as a <i>datetime.date</i>
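To illustrate how these three parameters fit together, here is a minimal sketch of a helper with the same shape of signature. The name `load_period`, the assumption that the path is a directory containing <i>sales_data.csv</i>, and the inclusive bounds are all hypothetical. The demo writes made-up rows to a temporary directory so that it runs without the real data.

```python
import datetime
import os
import tempfile

import pandas as pd


def load_period(path, start, end):
    """Hypothetical helper: read sales_data.csv under `path`, keep start <= date <= end."""
    sales = pd.read_csv(os.path.join(path, "sales_data.csv"), parse_dates=["date"])
    # pd.Timestamp accepts datetime.date values; between() is inclusive on both ends.
    mask = sales["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return sales[mask]


# Demo on a throwaway directory with made-up rows.
with tempfile.TemporaryDirectory() as data_dir:
    pd.DataFrame({
        "date": ["2016-01-01", "2016-01-02", "2016-02-01"],
        "revenue": [10.0, 20.0, 30.0],
    }).to_csv(os.path.join(data_dir, "sales_data.csv"), index=False)

    january = load_period(data_dir, datetime.date(2016, 1, 1), datetime.date(2016, 1, 31))

print(january["revenue"].sum())
```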

Below is an example of running the first task with version 2:

<img src="bash_screenshot.png"></img>

## Test module

The test module corresponds to <i>test.py</i>

It runs unit tests on <i>data_analysis_v2.py</i>. The tests build small simulated DataFrames and check the two tasks against them.

<b>Important</b>: for the tests to run properly, you need to create an empty directory and write its path in the variable <i>PATH_TEST</i> in the module, as in the screenshot below:

<img src="path_screenshot.png"></img>

The test module can be run from a bash command line:

<img src="test_screenshot.png"></img>
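The testing approach described in this section, simulated DataFrames written to an empty directory and then checked, can be sketched as follows. This is not the content of <i>test.py</i>: the function `top_retailers` is a hypothetical stand-in for the code under test, and the sketch uses a temporary directory from `tempfile` in place of a hand-created <i>PATH_TEST</i> directory.

```python
import os
import tempfile
import unittest

import pandas as pd


def top_retailers(path, n=3):
    """Hypothetical stand-in for the function under test."""
    sales = pd.read_csv(os.path.join(path, "sales_data.csv"))
    stores = pd.read_csv(os.path.join(path, "stores.csv"))
    joined = sales.merge(stores, on="store_id")
    return joined.groupby("retailer_id")["revenue"].sum().nlargest(n).index.tolist()


class TopRetailersTest(unittest.TestCase):
    def setUp(self):
        # Fresh empty directory per test, removed automatically afterwards.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        # Simulated inputs: retailer 1 earns 7.0 in total, retailer 2 earns 5.0.
        pd.DataFrame({"store_id": [0, 1], "retailer_id": [2, 1]}).to_csv(
            os.path.join(self.tmp.name, "stores.csv"), index=False)
        pd.DataFrame({"store_id": [0, 1, 1], "revenue": [5.0, 4.0, 3.0]}).to_csv(
            os.path.join(self.tmp.name, "sales_data.csv"), index=False)

    def test_ranking(self):
        self.assertEqual(top_retailers(self.tmp.name), [1, 2])


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopRetailersTest)
result = unittest.TextTestRunner().run(suite)
```

Using a temporary directory avoids having to create and configure a directory by hand, at the cost of not being able to inspect the simulated files after the run.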