# Count unique values of a column in a group

You can use `groupby()` with multiple columns or `drop_duplicates()` to see every unique combination of several columns. Sometimes, though, you only need the number of unique values of one column within groups defined by another column. To get that, call `DataFrame.groupby()` on the grouping column, select the column to count, and call `nunique()` on the result (this is `SeriesGroupBy.nunique()`).

In [1]:
# Setup

import numpy as np
import pandas as pd

In [2]:
# Create some toy data

# Create a list of four categories named "category 1", etc.
categories = [f"category {i}" for i in range(1, 5)]

# Create a DataFrame where the first column is just the
# categories repeated twice.
n_cycles = 2
df = pd.DataFrame.from_dict({
    "category": np.tile(categories, n_cycles),
    # Populate a second category column with a random selection
    # from the categories. No seed is set, so the values vary between runs.
    "subcategory": np.random.choice(categories, n_cycles * len(categories))
}, dtype="category")


# Sort by categories so it's easier to see the expected result of grouping.
df.sort_values(["category", "subcategory"])

     category subcategory
4  category 1   category 1
0  category 1   category 2
1  category 2   category 1
5  category 2   category 3
2  category 3   category 1
6  category 3   category 3
3  category 4   category 3
7  category 4   category 3
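
The setup cell draws subcategories with an unseeded `np.random.choice`, so rerunning it will generally produce a different table from the one shown here. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator and an arbitrary seed of 0 (the seed and the name `df_seeded` are illustrative, not part of the original notebook):

```python
import numpy as np
import pandas as pd

categories = [f"category {i}" for i in range(1, 5)]
n_cycles = 2

# Seed chosen arbitrarily for illustration; any fixed integer works.
rng = np.random.default_rng(0)

df_seeded = pd.DataFrame({
    "category": np.tile(categories, n_cycles),
    # Same random draw as in the setup cell, but from a seeded generator.
    "subcategory": rng.choice(categories, n_cycles * len(categories)),
}, dtype="category")

print(df_seeded.sort_values(["category", "subcategory"]))
```

With a fixed seed the same rows come back on every run, although they will not match the table above, which came from an unseeded draw.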


In [3]:
# Count number of unique subcategories by category

df.groupby("category")["subcategory"].nunique()

category
category 1    2
category 2    2
category 3    2
category 4    1
Name: subcategory, dtype: int64
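
As a cross-check on this result, the same counts can be reached by the route mentioned in the introduction: drop duplicate (category, subcategory) pairs, then count the rows left in each group. The sketch below hardcodes the rows from the table printed earlier so that it runs deterministically. The names `rows`, `fixed`, `via_nunique`, and `via_dedup` are illustrative, and plain string columns are used instead of the categorical dtype to keep the comparison simple.

```python
import pandas as pd

# Rows copied from the sorted table shown earlier (index labels omitted).
rows = [
    ("category 1", "category 1"),
    ("category 1", "category 2"),
    ("category 2", "category 1"),
    ("category 2", "category 3"),
    ("category 3", "category 1"),
    ("category 3", "category 3"),
    ("category 4", "category 3"),
    ("category 4", "category 3"),
]
fixed = pd.DataFrame(rows, columns=["category", "subcategory"])

# Direct route: number of distinct subcategories per category.
via_nunique = fixed.groupby("category")["subcategory"].nunique()

# Indirect route: keep one row per unique pair, then count rows per category.
via_dedup = fixed.drop_duplicates().groupby("category").size()

print(via_nunique)
print(via_dedup)
```

Both routes should agree for this data. They can differ when the counted column has missing values: `nunique()` ignores them by default (`dropna=True`), while the deduplication route keeps a row with a missing subcategory and counts it.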