# Getting more value from Pandas' value_counts()

![](https://miro.medium.com/max/1280/1*lOht9o73PICksasDplo0Pg.jpeg)

Data exploration is an important part of the machine learning pipeline. Before we decide which models to train, and how many, we need an idea of what our data contains. The pandas library has a number of useful functions for this purpose, and value_counts is one of them. It returns the count of each unique value in a pandas Series, such as a single column of a DataFrame. However, most of the time we end up using value_counts with its default parameters. In this short article, I’ll show you how to get more out of it by changing those defaults.


In [None]:
# Importing necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Reading in the data
train = pd.read_csv('../input/titanic/train.csv')
test = pd.read_csv('../input/titanic/test.csv')

Let's look at the first few rows to get an idea of the dataset.

In [None]:
train.head()

In [None]:
# Calculating the number of null values

train.isnull().sum()

The Age, Cabin and Embarked columns have null values.

# value_counts()

The [value_counts() function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html) returns a Series containing counts of unique values. This means it lets us count how often each unique value occurs in a column of a pandas DataFrame.

### Syntax

`Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)`

### Parameters

![](https://miro.medium.com/max/597/1*j5Gi_-E-b4h6tqtbYsxTrA.png)

Source: [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)

Let's see how we can use it in our analysis.

# 1. value_counts() with default parameters

Let’s call value_counts() on the Sex and Embarked columns of the dataset. Each call returns the number of occurrences of every unique value in that column.

In [None]:
train['Sex'].value_counts()

In [None]:
train['Embarked'].value_counts()

The function returns the count of each unique value in the given column, sorted in descending order and excluding null values. We can quickly see that most people embarked from Southampton, followed by Cherbourg and then Queenstown.
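To make the shape of the result concrete, here is a minimal sketch on a small made-up Series. The `ports` data is a hypothetical stand-in for a column like `Embarked`, not the real Titanic values.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration), including one missing value
ports = pd.Series(["S", "C", "S", "Q", "S", "C", np.nan])

counts = ports.value_counts()
print(counts)

# The unique values become the index and the counts become the values
print(counts.index.tolist())     # ['S', 'C', 'Q']
print(counts.tolist())           # [3, 2, 1]

# The NaN row is left out by default, so the counts cover 6 of the 7 rows
print(counts.sum(), len(ports))  # 6 7
```

Because the result is itself a Series, it can be indexed, sliced, or plotted like any other Series.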

# 2. value_counts() with relative frequencies of the unique values

Sometimes a percentage of the total is a better criterion than the raw count. By setting `normalize=True`, the object returned will contain the relative frequencies of the unique values. `normalize` is set to `False` by default.

In [None]:
train['Embarked'].value_counts(normalize=True)
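The relative frequencies are fractions that sum to 1, so turning them into percentages is a one-line step. The sketch below uses a small made-up Series (`ports` is an assumption, not the Titanic data).

```python
import pandas as pd

# Toy data (an assumption for illustration): 5 x "S", 2 x "C", 1 x "Q"
ports = pd.Series(["S", "S", "S", "C", "C", "Q", "S", "S"])

shares = ports.value_counts(normalize=True)  # fractions of the total
percent = (shares * 100).round(1)            # same information as percentages

print(shares.sum())  # 1.0
print(percent)       # S 62.5, C 25.0, Q 12.5
```

With the default `dropna=True`, the fractions are computed over the non-null values only.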

# 3. value_counts() in ascending order

To sort the results in ascending order instead, simply set the `ascending` parameter to `True`. It is set to `False` by default.

In [None]:
train['Embarked'].value_counts(ascending=True)
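Note that `ascending` orders the result by count, not by label. If the goal is to order by the values themselves, one option is to chain `sort_index()` onto the result. A minimal sketch, assuming a made-up numeric Series named `classes`:

```python
import pandas as pd

# Toy data (an assumption for illustration)
classes = pd.Series([3, 1, 3, 2, 3, 1])

by_count = classes.value_counts(ascending=True)  # least frequent first
by_label = classes.value_counts().sort_index()   # ordered by the value itself

print(by_count.index.tolist())  # [2, 1, 3]
print(by_label.index.tolist())  # [1, 2, 3]
print(by_label.tolist())        # [2, 1, 3]
```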

# 4. value_counts() with NaN values

By default, null values are excluded from the counts. However, this can be changed by setting `dropna=False`.

In [None]:
train['Embarked'].value_counts(dropna=False)

This shows there are 2 null values in the `Embarked` column.
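A quick way to sanity-check this is to compare the result against `isna().sum()`. The sketch below uses made-up data (`ports` is an assumption) and also shows that combining `dropna=False` with `normalize=True` gives the share of missing values.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration): 10 rows, 2 of them missing
ports = pd.Series(["S", "C", "S", np.nan, "S", np.nan, "Q", "C", "S", "C"])

with_nan = ports.value_counts(dropna=False)
n_missing = ports.isna().sum()

# With dropna=False every row is counted, so the total equals the row count
print(with_nan.sum() == len(ports))  # True

# The extra rows compared with the default call are exactly the missing ones
print(with_nan.sum() - ports.value_counts().sum() == n_missing)  # True

# Share of missing values: select the NaN entry of the normalized result
shares = ports.value_counts(normalize=True, dropna=False)
nan_share = shares[shares.index.isna()].iloc[0]
print(nan_share)  # 0.2
```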

# 5. value_counts() with bins

value_counts() can also be used to bin continuous data into discrete intervals with the help of the `bins` parameter. Rather than counting each distinct value, it groups the values into bins and counts how many fall into each one. This option works only with numerical data.

In [None]:
# applying value_counts on a numerical column
train['Fare'].value_counts()

This doesn't convey much, since the call above returns a count for every distinct Fare amount. Instead, let's group the fares into 7 bins.

In [None]:
train['Fare'].value_counts(bins=7)

Binning makes the distribution much easier to read. We can easily see that most of the passengers fall into the lowest bin, meaning they paid no more than about 73.19 for their ticket.
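Binned results are sorted by count like any other, so the intervals can come out in a jumbled order. Chaining `sort_index()` puts them back in numeric order. The sketch below uses a handful of made-up fares (an assumption, not the real column). It also shows `pd.cut` with hand-picked edges as an alternative when equal-width bins are not a good fit; the edges used here are arbitrary.

```python
import pandas as pd

# Toy fares (an assumption for illustration)
fares = pd.Series([7.25, 8.05, 13.0, 26.55, 30.0, 71.28, 151.55, 512.33])

# Four equal-width bins, listed from the lowest interval to the highest
in_order = fares.value_counts(bins=4).sort_index()
print(in_order)           # empty bins are kept with a count of 0
print(in_order.tolist())  # [6, 1, 0, 1]

# Alternative: choose the bin edges yourself with pd.cut, then count
custom = pd.cut(fares, bins=[0, 10, 50, 100, 600]).value_counts().sort_index()
print(custom.tolist())    # [2, 3, 1, 2]
```

Equal-width bins are heavily influenced by outliers: here a single large fare stretches the range so that almost everything lands in the first bin, which is the same pattern seen with the Titanic fares.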

value_counts() is a very useful method that makes it easy to get a quick sense of the data.

## References

[Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)