### Congrats!

Your previous work is proving useful, so your boss has asked you to expand your analysis to the whole chain.

Let's load the new data, `all_stores.csv`, from the `data/` directory.

# Imports

In [11]:
import os
import pandas as pd
import numpy as np
import hashlib # for grading purposes
%matplotlib inline

# Exercise 0 - Read the Data

In [12]:
chain = pd.read_csv(os.path.join('data', 'all_stores.csv'))
chain


,date,store_nbr,customers
0,23-05-2015,6,2053
1,17-02-2014,2,1766
2,26-03-2015,15,1336
3,14-09-2015,35,637
4,14-07-2013,44,4383
...,...,...,...
82673,25-07-2014,9,1761
82674,26-12-2014,27,2075
82675,03-07-2014,47,3784
82676,10-05-2014,13,1392


In [13]:
print('We now have %d data points. Wooooow!' % len(chain))

We now have 82678 data points. Wooooow!


The thing is, we can't just set the index to be the day, as we now have multiple stores on the same day. 

Looks like we have to go into multi-indexing...
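
To see why a date-only index falls short, here is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from `all_stores.csv`). It compares the uniqueness of a date index with that of a `(date, store_nbr)` index.

```python
import pandas as pd

# Hypothetical toy data: two stores reporting on the same two days.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-02']),
    'store_nbr': [1, 2, 1, 2],
    'customers': [100, 150, 110, 160],
})

# Indexing by date alone leaves duplicate index entries.
by_date = toy.set_index('date')
print(by_date.index.is_unique)        # False

# Indexing by (date, store_nbr) gives every row its own key.
by_date_store = toy.set_index(['date', 'store_nbr']).sort_index()
print(by_date_store.index.is_unique)  # True
```

With the two-level index, a single observation can be addressed by its full key, for example `by_date_store.loc[(pd.Timestamp('2015-01-02'), 2), 'customers']`.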

## Exercise 1: Multi-indexing

#### 1.1) Make the date into a datetime, then set the index to `[date, store_nbr]`

In [14]:
chain['date'] = pd.to_datetime(chain['date'], format='%d-%m-%Y')  # dates are stored as day-month-year
chain = chain.set_index(['date', 'store_nbr'])
chain = chain.sort_index()  # a sorted index is needed for reliable slicing

# YOUR CODE HERE
#raise NotImplementedError()

In [15]:
assert hashlib.sha256(str(chain.index.get_level_values(0).dtype).encode()).hexdigest() == \
        '261738f2e43a1c47a16f043b46deb993943d61f4a2bbe5ef4b03c3fb1af362b5', "First level of index should be a datetime!"
assert hashlib.sha256(str(chain.index.get_level_values(0)).encode()).hexdigest() ==  \
        'ea3a4358e60ac9e478fd489b4ea9a3e2ebe0256e823b416125cf544160073b1f', "Is the index sorted? Never forget to sort your time series."
assert hashlib.sha256(str(chain.index.get_level_values(1)).encode()).hexdigest() ==  \
        'c602dc5ca179ea4f0f7d09e14dcd28a5cbdce1571acd6040357a513c0f70d53a', "Check if you selected the correct values for the second level."
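
The explicit `format` matters because the dates are written day first. A small sketch with two date strings in the same layout as the file shows how `'%d-%m-%Y'` reads them; the `raw` series is only an illustration.

```python
import pandas as pd

# Two strings in the same day-month-year layout as the file.
raw = pd.Series(['03-07-2014', '25-07-2014'])

# With an explicit format, '03-07-2014' is read as 3 July, not 7 March.
parsed = pd.to_datetime(raw, format='%d-%m-%Y')
print(parsed.dt.day.tolist())    # [3, 25]
print(parsed.dt.month.tolist())  # [7, 7]
```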

#### 1.2) Customers at store 10, April 2016

What is the maximum daily number of customers at store 10 in April 2016?

In [18]:
idx = pd.IndexSlice  # <---- convention, get ready to copy paste this a lot 

In [17]:
chain

date,store_nbr,customers
2013-01-01,25,770
2013-01-02,1,2111
2013-01-02,2,2358
2013-01-02,3,3487
2013-01-02,4,1922
...,...,...
2017-07-31,50,2593
2017-07-31,51,1572
2017-07-31,52,2206
2017-07-31,53,1065


In [20]:
# hint: the answer should be an integer

max_store10 = chain.loc[idx['April 2016', 10],'customers'].max()
max_store10

# YOUR CODE HERE
#raise NotImplementedError()

1532

In [21]:
expected_hash = 'f76cb816b3f74ecf30d387c64869038ac163fe26f8aabd727c1071dd567fc3d5'
assert hashlib.sha256(str(max_store10).encode()).hexdigest() == expected_hash

print(f"Correct! The maximum daily number of customers in April 2016 was {max_store10}.")

Correct! The maximum daily number of customers in April 2016 was 1532.
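
The selection in 1.2 combines a date filter on the first index level with a store filter on the second. The sketch below does the same on hypothetical toy data, but spells the month out as an explicit date range instead of the partial string `'April 2016'`. This is one possible alternative, not the required solution.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical toy data: stores 10 and 11, one row per store per day.
dates = pd.to_datetime(['2016-03-31', '2016-04-01', '2016-04-02', '2016-05-01'])
toy = pd.DataFrame(
    {'customers': [10, 20, 30, 40, 50, 60, 70, 80]},
    index=pd.MultiIndex.from_product([dates, [10, 11]], names=['date', 'store_nbr']),
).sort_index()  # slicing a MultiIndex needs a sorted index

# First level: every date from 1 to 30 April 2016. Second level: store 10 only.
april_store10 = toy.loc[idx['2016-04-01':'2016-04-30', 10], 'customers']
print(april_store10.max())  # 50
```

Label slices in `.loc` include both endpoints, so 30 April would be part of the selection if it were present.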


#### 1.3) How many new stores opened in 2015, given that no existing stores closed in 2015?

In [None]:
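# hint : Check number of stores at last day of each year.
# One possible approach: count the distinct stores reporting on 31 December.
dates = chain.index.get_level_values('date')
stores = chain.index.get_level_values('store_nbr')
nr_stores_2014 = stores[dates == '2014-12-31'].nunique()
nr_stores_2015 = stores[dates == '2015-12-31'].nunique()
nr_stores_opened_2015 = nr_stores_2015 - nr_stores_2014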

# YOUR CODE HERE
#raise NotImplementedError()

In [None]:
expected_hash = 'ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d'
assert hashlib.sha256(str(nr_stores_opened_2015).encode()).hexdigest() == expected_hash

#### 1.4) Total number of customers per store

Find the total number of customers in 2015 for each store. The result should be a pandas DataFrame whose index is `store_nbr` and whose values are the sum of customers on record for that store in 2015.

In [None]:
# sum_per_store_2015 = 

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert sum_per_store_2015.shape[0] == 53, "There should be 53 stores in sum_per_store_2015's index."
expected_hash_1 = '6e941f790715b03b4634bdc0d844ab7743caaf05d633b71eb6e01f2efa95d777'
expected_hash_2 = 'a2fcf2b0fb1b2b76eac8ed32e88fb3de06d460db4e7c606efe3dc5dc09588a0a'
assert hashlib.sha256(str(sum_per_store_2015.iloc[5, :].values[0]).encode()).hexdigest() == expected_hash_1
assert hashlib.sha256(str(sum_per_store_2015.iloc[35, :].values[0]).encode()).hexdigest() == expected_hash_2

print(f"Good job!! Also, the store with the highest count of customers in 2015 was store {sum_per_store_2015.idxmax().iloc[0]}, with a total of {sum_per_store_2015.max().iloc[0]} customers. Now that's a lot of customers!...")
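
As a hedged illustration of the pattern behind 1.4, the sketch below filters one year and then aggregates over the `store_nbr` level. It runs on hypothetical toy data, and the names `toy` and `toy_sum_2015` are made up for this example. It shows one way to get a per-store total, not necessarily the graded solution.

```python
import pandas as pd

# Hypothetical toy data: two stores, one day in 2014 and two days in 2015.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2014-12-31', '2015-01-01', '2015-06-01']), [1, 2]],
    names=['date', 'store_nbr'],
)
toy = pd.DataFrame({'customers': [5, 7, 10, 20, 30, 40]}, index=index).sort_index()

# Keep only the rows dated 2015, then sum the customers within each store.
in_2015 = toy.index.get_level_values('date').year == 2015
toy_sum_2015 = toy[in_2015].groupby(level='store_nbr').sum()
print(toy_sum_2015)
```

Grouping on the index level keeps the result as a DataFrame indexed by `store_nbr`, which is the shape the exercise asks for.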

---