# Introduction

Maps allow us to transform data in a DataFrame or Series one value at a time for an entire column. However, often we want to group our data, and then do something specific to the group the data is in. 

As you'll learn, we do this with the `groupby()` operation.  We'll also cover some additional topics, such as more complex ways to index your DataFrames, along with how to sort your data.

# Groupwise analysis

One function we've been using heavily thus far is the `value_counts()` function. We can replicate what `value_counts()` does by doing the following:

In [3]:

import pandas as pd
data = pd.read_csv("Salary_Data.csv", index_col=0)
pd.set_option("display.max_rows", 5)

In [4]:
data.head()

| YearsExperience | Salary | Skill | Age | education | expense | savings |
|---|---|---|---|---|---|---|
| 100.0 | 3900 | c++ | 22 | bachelor | 31200.0 | 7800.0 |
| 110.0 | 390000 | python | 22 | masters | 31200.0 | 7800.0 |
| 2.0 | 37731 | c++ | 24 | bachelor | 30184.8 | 7546.2 |
| 2.0 | 43525 | NaN | 26 | bachelor | 34820.0 | 8705.0 |
| 2.2 | 39891 | c | 23 | masters | 31912.8 | 7978.2 |


In [7]:
data.groupby('education').education.count()

education
bachelor    20
masters     10
Name: education, dtype: int64

`groupby()` created groups of rows that share the same `education` value. Then, for each of these groups, we grabbed the `education` column and counted how many values it contains. `value_counts()` is essentially a shortcut for this `groupby()` operation.
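The equivalence described above can be checked on a small frame. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not the salary data.

```python
import pandas as pd

# Hypothetical stand-in for the salary data; the values are invented.
toy = pd.DataFrame({
    "education": ["bachelor", "masters", "bachelor", "bachelor", "masters"],
    "Age": [22, 22, 24, 26, 23],
})

# Count rows per education level in two ways.
by_group = toy.groupby("education").education.count()
by_counts = toy.education.value_counts()

# Both give the same count for every label.
same = by_group.to_dict() == by_counts.to_dict()
print(same)
```

One difference to keep in mind: `value_counts()` orders its result by frequency, while the `groupby()` result is ordered by the group labels.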

We can use any of the summary functions we've used before with this data. For example, to get the minimum age for each education level, we can do the following:

In [8]:
data.groupby('education').Age.min()

education
bachelor    20
masters     22
Name: Age, dtype: int64

You can think of each group we generate as a slice of our DataFrame containing only the rows whose values match. Each slice is accessible to us directly through the `apply()` method, which lets us manipulate the data in any way we see fit, for example to pick out a value from the first row of each education group.
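A minimal sketch of `apply()` on groups, using a made-up `toy` DataFrame rather than the salary data. Selecting the columns before calling `apply()` is a choice made here so the example does not depend on whether the grouping column is passed to the function, which has varied between pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "education": ["bachelor", "masters", "bachelor", "masters"],
    "Skill": ["c++", "python", "java", "c"],
    "Age": [22, 22, 24, 23],
})

# Each group reaches the lambda as a DataFrame slice;
# return the Skill found in the first row of that slice.
first_skill = (
    toy.groupby("education")[["Skill", "Age"]]
       .apply(lambda df: df.Skill.iloc[0])
)
print(first_skill)
```

The result is a Series indexed by `education`, with one entry per group.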

Another `groupby()` method worth mentioning is `agg()`, which lets you run several different functions on each group in a single call. For example, we can generate a simple statistical summary of the dataset as follows:

In [18]:
data.groupby(['education']).Age.agg([len, min, max])

| education | len | min | max |
|---|---|---|---|
| bachelor | 20 | 20 | 28 |
| masters | 10 | 22 | 29 |
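As a variation, `agg()` also accepts function names as strings, and keyword arguments that name the output columns. The sketch below uses a made-up `toy` DataFrame, and the column names `youngest` and `oldest` are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "education": ["bachelor", "masters", "bachelor", "bachelor", "masters"],
    "Age": [22, 22, 24, 26, 23],
})

# String names select pandas' built-in aggregations.
summary = toy.groupby("education").Age.agg(["count", "min", "max"])

# Keyword arguments choose the names of the resulting columns.
named = toy.groupby("education").Age.agg(youngest="min", oldest="max")
print(summary)
print(named)
```

Note that `"count"` skips missing values, whereas `len` counts every row in the group, so the two can differ when a column contains `NaN`.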


Effective use of `groupby()` will allow you to do lots of really powerful things with your dataset.

# Multi-indexes

In all of the examples we've seen thus far we've been working with DataFrame or Series objects with a single-label index. `groupby()` is slightly different in that, depending on the operation we run, it will sometimes result in what is called a multi-index.

A multi-index differs from a regular index in that it has multiple levels. For example:

In [26]:
edu_skill = data.groupby(['education', 'Skill']).Age.agg([min])
edu_skill

| education | Skill | min |
|---|---|---|
| bachelor | c++ | 20 |
| bachelor | java | 27 |
| bachelor | python | 22 |
| masters | c | 23 |
| masters | python | 22 |


In [27]:
mi = edu_skill.index
type(mi)

pandas.core.indexes.multi.MultiIndex

Multi-indexes have several methods for dealing with their tiered structure which are absent for single-level indexes. Retrieving a single value also requires one label per level, which means two labels in the example above. Dealing with multi-index output is a common "gotcha" for users new to pandas.
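To make the two-level lookup concrete, here is a minimal sketch on a made-up `toy` DataFrame (not the salary data). It shows a full key given as a tuple, a partial key on the outer level, and two of the multi-index accessors.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "education": ["bachelor", "bachelor", "masters", "masters", "masters"],
    "Skill": ["c++", "python", "python", "c", "python"],
    "Age": [22, 24, 22, 23, 25],
})

grouped = toy.groupby(["education", "Skill"]).Age.agg(["min"])

# A full key is a tuple with one label per index level.
one_value = grouped.loc[("masters", "python"), "min"]

# A partial key (outer level only) returns every row under that label,
# indexed by the remaining level.
masters_rows = grouped.loc["masters"]

# Level names and the labels of a single level.
level_names = list(grouped.index.names)
skills = list(grouped.index.get_level_values("Skill"))
print(one_value, level_names, skills)
```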

The use cases for a multi-index are detailed alongside instructions on using them in the [MultiIndex / Advanced Selection](https://pandas.pydata.org/pandas-docs/stable/advanced.html) section of the pandas documentation.

However, in general the multi-index method you will use most often is the one for converting back to a regular index, the `reset_index()` method:

In [28]:
edu_skill.reset_index()

|   | education | Skill | min |
|---|---|---|---|
| 0 | bachelor | c++ | 20 |
| 1 | bachelor | java | 27 |
| 2 | bachelor | python | 22 |
| 3 | masters | c | 23 |
| 4 | masters | python | 22 |
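A short sketch of what `reset_index()` changes, again on a made-up `toy` DataFrame: the index levels become ordinary columns, the index becomes a plain integer range, and the former levels can be used in regular column filters.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "education": ["bachelor", "bachelor", "masters", "masters"],
    "Skill": ["c++", "python", "python", "c"],
    "Age": [22, 24, 22, 23],
})

grouped = toy.groupby(["education", "Skill"]).Age.agg(["min"])
flat = grouped.reset_index()

# The former index levels are now regular columns.
columns = list(flat.columns)

# They can be filtered like any other column.
python_rows = flat[flat.Skill == "python"]
print(columns)
print(python_rows)
```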


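# Sorting

Grouped results come back in index order rather than value order. To order the rows by value instead, we can use the `sort_values()` method. `sort_values()` defaults to an ascending sort, where the lowest values go first. Often, however, we want a descending sort, where the highest values go first. That is done as follows: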