It's important to know how to build and work with multi-indexed DataFrames directly.

Like the groupby function, multi-indexing allows data to be grouped and accessed or manipulated by group. 

**Multi-indexing** is the process of indexing a dataset by more than one value. Multi-indexing is like using two bookmarks in a book. Each bookmark is an index, and depending on which index you go to, you'll get different content.

**Multi-indexing** is sometimes referred to as hierarchical indexing, as relationships can exist between indexes. For example, a state can be one index and a city can be another. Because a city belongs to a state, these indexes would be hierarchical.
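
As a minimal sketch of the state and city example, the snippet below builds a two-level index with `set_index`. The DataFrame, its column names, and its values are made up for illustration and are not part of the dataset used later.

```python
import pandas as pd

# Hypothetical data: the states, cities, and store counts are invented
cities = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX"],
    "city": ["Fresno", "Oakland", "Austin", "Dallas"],
    "stores": [2, 4, 3, 5],
})

# Passing a list of columns to set_index builds a two-level (hierarchical) index
cities_idx = cities.set_index(["state", "city"])

# Selecting on the outer level returns every city that belongs to that state
tx = cities_idx.loc["TX"]

# Supplying both levels narrows the selection to one city
oakland_stores = cities_idx.loc[("CA", "Oakland"), "stores"]
```

Because city sits below state in the index, selecting a state returns all of its cities, while a (state, city) pair identifies one row.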

Essentially, multi-indexing makes it easier to store, look up, and assign data by group.

### Import Libraries and Dependencies

In [2]:
import pandas as pd


### Read in CSV as Pandas DataFrame and Set the Index



In [3]:
# Read in data
df = pd.read_csv("twtr_google_finance.csv", parse_dates=True, index_col='Date')
df.head()

Date,Close
2019-04-09,35.139999
2019-04-10,34.75
2019-04-11,34.580002
2019-04-12,34.369999
2019-04-15,34.709999


### Display DataFrame Index

In [4]:
df.index

DatetimeIndex(['2019-04-09', '2019-04-10', '2019-04-11', '2019-04-12',
               '2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18',
               '2019-04-22', '2019-04-23', '2019-04-24', '2019-04-25',
               '2019-04-26', '2019-04-29', '2019-04-30', '2019-05-01',
               '2019-05-02', '2019-05-03', '2019-05-06', '2019-05-07',
               '2019-05-08', '2019-05-09'],
              dtype='datetime64[ns]', name='Date', freq=None)

Multi-indexing is commonly used when working with date data.

When a date column is used as the index, pandas stores it as a DatetimeIndex. A DatetimeIndex exposes year, month, and day attributes, and grouping by these attributes produces a multi-index.
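
To make the attributes concrete, here is a small sketch using three made-up dates rather than the Twitter data. The level names `year`, `month`, and `day` are chosen only for this illustration.

```python
import pandas as pd

# Hypothetical dates, used only to show what the attributes return
idx = pd.DatetimeIndex(["2019-04-09", "2019-04-10", "2019-05-01"], name="Date")

years = idx.year    # one integer per date
months = idx.month
days = idx.day

# The same three arrays can also be combined into a MultiIndex directly
mi = pd.MultiIndex.from_arrays(
    [years, months, days], names=["year", "month", "day"]
)
```

Each attribute returns one value per row, which is why the three of them together can serve as grouping keys or index levels.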

The next cell groups by year, month, and day to build a three-level index.

### Create Multiple Indices by Grouping By DatetimeIndex `year`, `month`, and `day` with `first` Function

In [5]:
# Group by year, month, and day and grab first of each group

# A DatetimeIndex can be split into year, month, and day segments
# through its index.year, index.month, and index.day attributes.
# Passing these to groupby creates one index level per attribute.

df_grp = df.groupby([df.index.year, df.index.month, df.index.day]).first()
df_grp

# The first function returns the first value of each group in a GroupBy object.
# Here, every group at the year, month, and day level holds a single row,
# so first returns the only value in each group.

Date,Date,Date,Close
2019,4,9,35.139999
2019,4,10,34.75
2019,4,11,34.580002
2019,4,12,34.369999
2019,4,15,34.709999
2019,4,16,34.459999
2019,4,17,34.48
2019,4,18,34.400002
2019,4,22,34.389999
2019,4,23,39.77
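
In the output above, all three index levels carry the name Date, because each one is derived from the same index. One possible way to tell the levels apart is `rename_axis`, sketched here on a synthetic stand-in DataFrame. The prices and the level names `year`, `month`, and `day` are assumptions for illustration.

```python
import pandas as pd

# Synthetic stand-in for the notebook's DataFrame; the prices are made up
df_demo = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0]},
    index=pd.DatetimeIndex(["2019-04-09", "2019-04-10", "2019-05-01"], name="Date"),
)

grouped = df_demo.groupby(
    [df_demo.index.year, df_demo.index.month, df_demo.index.day]
).first()

# Every level inherits the name "Date"; rename_axis gives each level its own name
named = grouped.rename_axis(["year", "month", "day"])
```

Distinct level names are optional, but they make it possible to refer to a level by name later on.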


### Create Multiple Indices by Grouping By DatetimeIndex `year` and `month` with `first` Function

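The first and last functions each select one row from every group of a grouped DataFrame. First returns the first value in each group, and last returns the last value in each group. Here, each group is one year and month, so the result has one row per month.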
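
A short sketch with made-up prices shows what first and last return for each year and month group. The values are chosen so the result is easy to check by eye.

```python
import pandas as pd

# Synthetic prices (made up) spanning two months
prices = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.DatetimeIndex(
        ["2019-04-01", "2019-04-02", "2019-04-30", "2019-05-01", "2019-05-02"],
        name="Date",
    ),
)

by_month = prices.groupby([prices.index.year, prices.index.month])

first_rows = by_month.first()  # first non-null Close in each (year, month) group
last_rows = by_month.last()    # last non-null Close in each (year, month) group
```

April contributes 1.0 to `first_rows` and 3.0 to `last_rows`, while May contributes 4.0 and 5.0, so both results have one row per month.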

In [6]:
# Group by year and month and take the first value of each group
df_grp_2 = df.groupby([df.index.year, df.index.month]).first()
df_grp_2

Date,Date,Close
2019,4,35.139999
2019,5,39.290001


### Create Multiple Indices by Grouping By DatetimeIndex `year` and `month` with `last` Function

In [7]:
# Group by year and month and take the last value of each group
df_grp_3 = df.groupby([df.index.year, df.index.month]).last()
df_grp_3

Date,Date,Close
2019,4,39.91
2019,5,38.790001


### Create Multiple Indices by Grouping By DatetimeIndex `year` and `month` with `mean` Function

Because multi-indexing involves grouping data, an aggregation can be applied against the data. A common example is the mean function for calculating average. This is an alternative to using the first and last functions. Because aggregate functions are being used, outputs represent summarized/aggregated records.
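
Several aggregations can also be computed in one pass. The sketch below uses `agg` on made-up prices; it is one possible approach, not the method used in the cells that follow.

```python
import pandas as pd

# Synthetic prices (made up) spanning two months
prices = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0, 6.0]},
    index=pd.DatetimeIndex(
        ["2019-04-01", "2019-04-02", "2019-04-30", "2019-05-01", "2019-05-02"],
        name="Date",
    ),
)

# agg with a list of function names returns one column per aggregation
summary = prices.groupby([prices.index.year, prices.index.month])["Close"].agg(
    ["mean", "min", "max"]
)
```

The result keeps the year and month multi-index on the rows and adds one column for each aggregation.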

In [8]:
# Group by year and month and calculate the average of each group
df_grp_4 = df.groupby([df.index.year, df.index.month]).mean()
df_grp_4

Date,Date,Close
2019,4,36.478666
2019,5,39.465715


### Slice Data for 2019

The loc indexer can be used to slice data from a DataFrame with multiple indexes.

Not every index level has to be passed, but levels must be supplied from the top down, starting with the outermost level (e.g., year).

When a value is passed for every level and the index entries are unique, a single record is returned. When fewer levels are provided, every record that matches the supplied levels is returned.

Essentially, indexes are accessed and used hierarchically (e.g., year > month > day).
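
The top-down behavior can be sketched on a small synthetic frame. The values and the level names `year`, `month`, and `day` are assumptions for illustration. The last line uses `xs`, a separate pandas method that can select on an inner level by name, shown here as an alternative for cases where the outer levels are not known.

```python
import pandas as pd

# Synthetic three-level frame; values and level names are made up
index = pd.MultiIndex.from_tuples(
    [(2019, 4, 9), (2019, 4, 10), (2019, 5, 1), (2019, 5, 2)],
    names=["year", "month", "day"],
)
demo = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0]}, index=index)

year_rows = demo.loc[2019]         # all of 2019; remaining index is (month, day)
month_rows = demo.loc[(2019, 4)]   # April 2019; remaining index is day
one_day = demo.loc[(2019, 4, 10)]  # all levels given: one row, returned as a Series

# xs selects on an inner level without specifying the outer ones
may_rows = demo.xs(5, level="month")
```

Each additional level passed to loc removes that level from the result, until a full key returns a single row.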

In [9]:
df_grp

Date,Date,Date,Close
2019,4,9,35.139999
2019,4,10,34.75
2019,4,11,34.580002
2019,4,12,34.369999
2019,4,15,34.709999
2019,4,16,34.459999
2019,4,17,34.48
2019,4,18,34.400002
2019,4,22,34.389999
2019,4,23,39.77


In [10]:
# Slice data for 2019 from first group
df_slice = df_grp.loc[2019]
df_slice

Date,Date,Close
4,9,35.139999
4,10,34.75
4,11,34.580002
4,12,34.369999
4,15,34.709999
4,16,34.459999
4,17,34.48
4,18,34.400002
4,22,34.389999
4,23,39.77


### Slice Data For All Days in April 2019

In [11]:
# Slice data for April 2019 from first group
df_slice = df_grp.loc[2019,4]
df_slice

Date,Close
9,35.139999
10,34.75
11,34.580002
12,34.369999
15,34.709999
16,34.459999
17,34.48
18,34.400002
22,34.389999
23,39.77


### Slice Data for 04/12/2019

In [12]:
# Slice data for 4/12/2019 from first group
df_slice = df_grp.loc[2019,4,12]
df_slice

Close    34.369999
Name: (2019, 4, 12), dtype: float64