## Multi-Index
Recall the customer-count data from before:

In [29]:
# run forest run
import pandas as pd

customer_num = pd.Series([100, 80, 60, 200],
                       index=['humus_hakerem', 'falafel_gina', 
                              '24_rupee', 'pizza_munch']) # number of customers per day

hours_open = pd.Series([10, 12, 9, 17],
                      index=['humus_hakerem', 'falafel_gina', 
                             'al_harampa', '24_rupee'])

print('customer_num=\n{}\n'.format(customer_num))
print('hours_open=\n{}\n'.format(hours_open))

print('average number of customers per hour\n{}\n'.format(customer_num/hours_open))

customer_num=
humus_hakerem    100
falafel_gina      80
24_rupee          60
pizza_munch      200
dtype: int64

hours_open=
humus_hakerem    10
falafel_gina     12
al_harampa        9
24_rupee         17
dtype: int64

average number of customers per hour
24_rupee          3.529412
al_harampa             NaN
falafel_gina      6.666667
humus_hakerem    10.000000
pizza_munch            NaN
dtype: float64
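
The NaN entries come from index alignment: pandas matches the two Series by label, and a restaurant that appears in only one of them has nothing to divide by. A minimal sketch of restricting the division to restaurants present in both Series (the names `both` and `per_hour` are illustrative and not part of the original notebook):

```python
import pandas as pd

customer_num = pd.Series([100, 80, 60, 200],
                         index=['humus_hakerem', 'falafel_gina',
                                '24_rupee', 'pizza_munch'])
hours_open = pd.Series([10, 12, 9, 17],
                       index=['humus_hakerem', 'falafel_gina',
                              'al_harampa', '24_rupee'])

# Labels that appear in both Series (hypothetical helper variable)
both = customer_num.index.intersection(hours_open.index)

# Dividing only the shared labels leaves no NaN rows
per_hour = customer_num.loc[both] / hours_open.loc[both]
print(per_hour)
```

Calling `.dropna()` on the original quotient would give the same rows; the intersection just makes the alignment step explicit.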



It turns out this data is from 2017. The new 2018 data, reflecting the ITC fellows' increasing hunger for data and food, is:

In [30]:
# run forest run
customer_num_2018 = pd.Series([120, 90, 62, 180], 
                             index=['humus_hakerem', 'falafel_gina',
                                   '24_rupee', 'pizza_munch'])
customer_num_2018

humus_hakerem    120
falafel_gina      90
24_rupee          62
pizza_munch      180
dtype: int64

It sure would be nice to show them side by side. I have an idea: let's use a data frame.

In [31]:
# run forest run
customer_num_df = pd.DataFrame({'2017':customer_num, '2018':customer_num_2018})
customer_num_df

               2017  2018
humus_hakerem   100   120
falafel_gina     80    90
24_rupee         60    62
pizza_munch     200   180


This is nice. Let's do the same for opening hours.

In [32]:
# run forest run
hours_open_2018 = pd.Series([11, 11, 9, 15],
                            index=['humus_hakerem', 'falafel_gina', 
                                   'al_harampa', '24_rupee'])
hours_open_df = pd.DataFrame({'2017':hours_open, '2018':hours_open_2018})
hours_open_df

               2017  2018
humus_hakerem    10    11
falafel_gina     12    11
al_harampa        9     9
24_rupee         17    15


But how can we represent opening hours and number of customers in the same table? Here's a trick:

In [33]:
# run forest run
customer_num_df.stack()

humus_hakerem  2017    100
               2018    120
falafel_gina   2017     80
               2018     90
24_rupee       2017     60
               2018     62
pizza_munch    2017    200
               2018    180
dtype: int64

What's this now? You've just created your first multi-index: each value is now labelled by a (restaurant, year) pair. See how well it turns out:

In [34]:
# run forest run
res_data = pd.DataFrame({'opening_hours': hours_open_df.stack(), 'customer_num':customer_num_df.stack()})
res_data

                    opening_hours  customer_num
24_rupee      2017           17.0          60.0
              2018           15.0          62.0
al_harampa    2017            9.0           NaN
              2018            9.0           NaN
falafel_gina  2017           12.0          80.0
              2018           11.0          90.0
humus_hakerem 2017           10.0         100.0
              2018           11.0         120.0
pizza_munch   2017            NaN         200.0
              2018            NaN         180.0
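
To see what a stacked index looks like from the inside, here is a small sketch using two of the restaurants above. The variable name `stacked` is illustrative. Selecting by the outer label returns everything under that restaurant, while selecting by a full (restaurant, year) tuple returns a single value.

```python
import pandas as pd

customer_num_df = pd.DataFrame(
    {'2017': [100, 80], '2018': [120, 90]},
    index=['humus_hakerem', 'falafel_gina'])

stacked = customer_num_df.stack()

# The index now has two levels: restaurant (outer) and year (inner)
print(stacked.index.nlevels)

# Outer label only: a Series indexed by year
print(stacked.loc['humus_hakerem'])

# Full (restaurant, year) tuple: a single value
print(stacked.loc[('falafel_gina', '2018')])
```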


We can also reverse this process with `unstack`:

In [35]:
# run forest run
res_data.unstack()  

               opening_hours        customer_num
                        2017  2018          2017   2018
24_rupee                17.0  15.0          60.0   62.0
al_harampa               9.0   9.0           NaN    NaN
falafel_gina            12.0  11.0          80.0   90.0
humus_hakerem           10.0  11.0         100.0  120.0
pizza_munch              NaN   NaN         200.0  180.0
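
By default `unstack` moves the innermost index level into the columns. A short sketch on a cut-down table (the name `res_small` and its values for two restaurants are taken from the data above, but the variable itself is illustrative) showing that a different level can be chosen with the `level` argument:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['falafel_gina', 'humus_hakerem'], ['2017', '2018']])
res_small = pd.DataFrame({'opening_hours': [12, 11, 10, 11],
                          'customer_num': [80, 90, 100, 120]},
                         index=index)

# Default: the inner level (year) becomes an inner column level
wide = res_small.unstack()
print(wide)

# level=0: the outer level (restaurant) moves to the columns instead
by_restaurant = res_small.unstack(level=0)
print(by_restaurant)
```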


We can do a lot of clever stuff with indices. Let's give them names to make this easier.

In [36]:
# run forest run
res_data.index.names = ['restaurant', 'year']
res_data

                    opening_hours  customer_num
restaurant    year
24_rupee      2017           17.0          60.0
              2018           15.0          62.0
al_harampa    2017            9.0           NaN
              2018            9.0           NaN
falafel_gina  2017           12.0          80.0
              2018           11.0          90.0
humus_hakerem 2017           10.0         100.0
              2018           11.0         120.0
pizza_munch   2017            NaN         200.0
              2018            NaN         180.0
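
Once the levels have names, they can be referred to by name instead of by position. A minimal sketch on a cut-down copy of the table (`res_small` is an illustrative name, not a variable from the notebook):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['falafel_gina', 'humus_hakerem'], ['2017', '2018']],
    names=['restaurant', 'year'])
res_small = pd.DataFrame({'opening_hours': [12, 11, 10, 11],
                          'customer_num': [80, 90, 100, 120]},
                         index=index)

# All labels of one level, looked up by name
print(res_small.index.get_level_values('year'))

# Unstack a level by name rather than by position
print(res_small.unstack('restaurant'))

# Cross-section: every restaurant's row for a single year
print(res_small.xs('2018', level='year'))
```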


We can convert an index to a column with `reset_index` and a column to an index with `set_index`. You try it:
1. Turn restaurant and year into columns.
2. From the resulting data frame, convert restaurant and opening hours back to indices.

In [39]:
reset_data = res_data.reset_index()
print(reset_data)
new_data = reset_data.set_index(['restaurant','opening_hours'])
print(new_data)

      restaurant  year  opening_hours  customer_num
0       24_rupee  2017           17.0          60.0
1       24_rupee  2018           15.0          62.0
2     al_harampa  2017            9.0           NaN
3     al_harampa  2018            9.0           NaN
4   falafel_gina  2017           12.0          80.0
5   falafel_gina  2018           11.0          90.0
6  humus_hakerem  2017           10.0         100.0
7  humus_hakerem  2018           11.0         120.0
8    pizza_munch  2017            NaN         200.0
9    pizza_munch  2018            NaN         180.0
                             year  customer_num
restaurant    opening_hours                    
24_rupee      17.0           2017          60.0
              15.0           2018          62.0
al_harampa    9.0            2017           NaN
              9.0            2018           NaN
falafel_gina  12.0           2017          80.0
              11.0           2018          90.0
humus_hakerem 10.0           2017         100.0
              11.0           2018         120.0
pizza_munch   NaN            2017         200.0
              NaN            2018         180.0
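
`reset_index` and `set_index` undo each other when given the same names. A small sketch on a reduced table (`res_small`, `flat` and `restored` are illustrative names), which also shows that `reset_index` can move just one level:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['falafel_gina', 'humus_hakerem'], ['2017', '2018']],
    names=['restaurant', 'year'])
res_small = pd.DataFrame({'customer_num': [80, 90, 100, 120]}, index=index)

flat = res_small.reset_index()                      # levels become columns
restored = flat.set_index(['restaurant', 'year'])   # and back again

print(flat.columns.tolist())
print(restored)

# Move only the 'year' level to a column; 'restaurant' stays as the index
print(res_small.reset_index(level='year'))
```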

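Many pandas operations can work on a single index level, referred to by its name. Older pandas versions accepted a `level` argument directly on reductions such as `mean`; that argument was removed in pandas 2.0, so group by the level first. Take the `mean` of `res_data` grouped by level to derive: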
1. average number of customers per restaurant
2. average number of customers per year

**Note:** most pandas reductions, `mean` included, skip NaN values by default.

In [51]:
print(res_data.groupby('restaurant').mean()['customer_num'])
print(res_data.groupby('year').mean()['customer_num'])

restaurant
24_rupee          61.0
al_harampa         NaN
falafel_gina      85.0
humus_hakerem    110.0
pizza_munch      190.0
Name: customer_num, dtype: float64
year
2017    110.0
2018    113.0
Name: customer_num, dtype: float64


Multi-indexing is great and allows great flexibility in handling and displaying complicated data. It's best to think of adding a multi-index as adding a new dimension (or subdimension) to your data. Slicing and indexing data frames with a multi-index is tricky and should be handled with care. The intricacies of indexing and the multiple ways of creating multi-indexes are outside the scope of this exercise.