---
title: "Aggregate Data on Multi-Indices"
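description: "Pandas provides built-in aggregation methods such as `mean()`, `sum()`, and `max()`. For hierarchically indexed data, the additional `level` argument names the index level to group by, so the aggregation runs within each value of that level. The argument was deprecated in pandas 1.3 and removed in 2.0; on newer versions use `groupby(level=...)` instead."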
tags: Pandas
URL: https://github.com/jakevdp/PythonDataScienceHandbook/
Licence: MIT for code, Text is copyrighted
Creator: 
Meta: "mean sum max"

---

 <div>
    	<img src="./coco.png" style="float: left;height: 55px">
    	<div style="height: 150px;text-align: center; padding-top:5px">
        <h1>
      	Aggregate Data on Multi-Indices
        </h1>
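        <p>Pandas provides built-in aggregation methods such as <code>mean()</code>, <code>sum()</code>, and <code>max()</code>. For hierarchically indexed data, the additional <code>level</code> argument names the index level to group by, so the aggregation runs within each value of that level. The argument was deprecated in pandas 1.3 and removed in 2.0; on newer versions use <code>groupby(level=...)</code> instead.</p>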
    	</div>
		</div> 

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Create some example data

```python
import numpy as np
import pandas as pd

# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Alice', 'Bob', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# mock some data (unseeded, so the values differ between runs)
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10  # widen the spread of the HR columns
data += 37          # shift everything to a plausible baseline

# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data
```

| year | visit | Alice HR | Alice Temp | Bob HR | Bob Temp | Sue HR | Sue Temp |
|------|-------|----------|------------|--------|----------|--------|----------|
| 2013 | 1     | 52.0     | 35.4       | 33.0   | 36.7     | 29.0   | 35.9     |
| 2013 | 2     | 32.0     | 36.6       | 37.0   | 35.8     | 43.0   | 38.3     |
| 2014 | 1     | 52.0     | 38.1       | 39.0   | 37.6     | 35.0   | 38.1     |
| 2014 | 2     | 33.0     | 37.2       | 39.0   | 38.9     | 35.0   | 38.0     |


## Mean on year rows

```python
# `level` here needs pandas < 2.0; newer versions: health_data.groupby(level='year').mean()
data_mean = health_data.mean(level='year')
data_mean
```

| year | Alice HR | Alice Temp | Bob HR | Bob Temp | Sue HR | Sue Temp |
|------|----------|------------|--------|----------|--------|----------|
| 2013 | 42.0     | 36.0       | 35.0   | 36.25    | 36.0   | 37.1     |
| 2014 | 42.5     | 37.65      | 39.0   | 38.25    | 35.0   | 38.05    |
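
On pandas 2.0 and later, where the aggregation methods no longer accept `level`, the same per-year result can be obtained by grouping on the index level first. The sketch below is a minimal, reproducible version: it hard-codes Alice's four rows from the example output above instead of using random data, and the names `sample`, `yearly_mean`, and `yearly_max` are illustrative rather than part of the original notebook. It also shows that other aggregations such as `max()` follow the same pattern.

```python
import pandas as pd

# Fixed data (Alice's HR and Temp readings from the example output)
# so the result does not depend on a random draw.
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Alice'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
sample = pd.DataFrame(
    [[52.0, 35.4],
     [32.0, 36.6],
     [52.0, 38.1],
     [33.0, 37.2]],
    index=index, columns=columns,
)

# Group the rows by the 'year' index level, then aggregate within each group.
yearly_mean = sample.groupby(level='year').mean()
yearly_max = sample.groupby(level='year').max()

print(yearly_mean)
print(yearly_max)
```

The `visit` level disappears from the result because each group of visits collapses to a single row per year.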


## Mean on type columns

```python
# `level` here needs pandas < 2.0 (the argument was removed in 2.0)
data_mean.mean(axis=1, level='type')
```

| year | HR        | Temp      |
|------|-----------|-----------|
| 2013 | 37.666667 | 36.45     |
| 2014 | 38.833333 | 37.983333 |
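
For column levels, one way to get the same result on pandas 2.0 and later is to transpose, group on the level, aggregate, and transpose back. This is a sketch under that assumption, not the notebook's original method. The frame `yearly` below hard-codes the per-year means from the example output so the numbers are reproducible; `yearly` and `by_type` are illustrative names.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['Alice', 'Bob', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# Per-year means copied from the example output above.
yearly = pd.DataFrame(
    [[42.0, 36.0, 35.0, 36.25, 36.0, 37.1],
     [42.5, 37.65, 39.0, 38.25, 35.0, 38.05]],
    index=pd.Index([2013, 2014], name='year'),
    columns=columns,
)

# Transpose so the column levels become row levels, average across
# subjects within each 'type', then transpose back to year-by-type.
by_type = yearly.T.groupby(level='type').mean().T
print(by_type)
```

Transposing avoids relying on `groupby(axis=1)`, which newer pandas releases have deprecated.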
