# Nullable Integer Data Type

With the release of pandas version 0.24 in 2019, the nullable integer data type became available to pandas users. This data type allows missing values to be present in a column of integers. It is completely separate from the normal integer data types, which still exist and remain unable to contain missing values.

Let's begin learning how to use the nullable integer data type by reading in the college dataset, which has several integer columns that contain missing values. We will read in just the first seven columns and set the institution name as the index.

In [None]:
import pandas as pd
college = pd.read_csv('../data/college.csv', usecols=range(7), index_col='instnm')
college.head(3)

## Converting data to the nullable integer data type

By inspection, you should notice that the columns hbcu, menonly, and womenonly display their values with decimals, which usually indicates that they were read in as floats. The column relaffil is the only numeric column that appears to have been read in as an integer. Let's confirm this with the `dtypes` attribute.

In [None]:
college.dtypes
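The float columns are a side effect of how `read_csv` infers data types: a column of whole numbers is read as an integer only when none of its fields are empty. The following is a minimal sketch using a small made-up CSV held in memory. The column names `flag_complete` and `flag_missing` and their values are hypothetical and are not taken from the college dataset.

```python
import io
import pandas as pd

# Hypothetical CSV text: 'flag_complete' has no gaps,
# while 'flag_missing' has one empty field in row B
csv_text = "name,flag_complete,flag_missing\nA,0,1\nB,1,\nC,0,0\n"

demo = pd.read_csv(io.StringIO(csv_text), index_col='name')

# One empty field is enough for the whole column to be read as float64
demo_dtypes = demo.dtypes
print(demo_dtypes)
```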

From the data dictionary, we expect that hbcu, menonly, and womenonly are integer columns containing exactly two values, 0 and 1. Since they were read in as floats, they most likely contain missing values as well. Let's use the `value_counts` method with `dropna=False` to find the unique values and the number of missing values in each of these three columns.

In [None]:
college['hbcu'].value_counts(dropna=False)

In [None]:
college['menonly'].value_counts(dropna=False)

In [None]:
college['womenonly'].value_counts(dropna=False)
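To see what `dropna=False` changes, here is a small sketch on a made-up Series (the values are hypothetical). By default, `value_counts` leaves missing values out of the result, so the counts can add up to less than the length of the Series.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 flags stored as floats, with two missing values
flags = pd.Series([0.0, 1.0, 0.0, np.nan, 0.0, np.nan])

counts_default = flags.value_counts()               # NaN is left out
counts_with_nan = flags.value_counts(dropna=False)  # NaN gets its own row

print(counts_default)
print(counts_with_nan)

# The number of missing values can also be counted directly
n_missing = flags.isna().sum()
```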

### Attempt integer conversion

As previously covered, we can pass a string to the `astype` method to convert a column to a particular data type. Let's attempt to convert the hbcu column to a default 8-bit integer with the string 'int8'.

In [None]:
college['hbcu'].astype('int8')
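If you would rather handle this kind of failure than let it stop the notebook, the error can be caught. The sketch below uses a made-up Series and assumes the error raised is a `ValueError` (or a subclass of it), which is how pandas reports it in the versions we are aware of.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing value
flags = pd.Series([0.0, 1.0, np.nan])

try:
    flags.astype('int8')
    conversion_failed = False
except ValueError as err:
    # Assumption: the failed cast surfaces as a ValueError or a subclass of it
    conversion_failed = True
    print(f"Conversion failed: {err}")
```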

### Convert to nullable integer with 'Int'

The above conversion fails because this type of integer cannot handle missing values. To convert to the nullable integer, use the string 'Int' followed by the bit size. The difference is the capital letter 'I'. For example, 'Int8' refers to an 8-bit nullable integer, while 'int8' refers to a normal 8-bit integer. The same bit sizes (8, 16, 32, 64) are available for nullable integers. Let's properly convert the hbcu column to a nullable integer.

In [None]:
hbcu2 = college['hbcu'].astype('Int8')
hbcu2.head(3)

Let's verify that the missing values are still present in the converted Series.

In [None]:
hbcu2.value_counts(dropna=False)
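The same conversion can be tried on a small made-up Series to see what happens to the missing value. This is a sketch assuming pandas 1.0 or later, where missing values in a nullable integer Series are represented by `pd.NA`.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 flags stored as floats, with one missing value
flags = pd.Series([0.0, 1.0, np.nan, 1.0])

nullable = flags.astype('Int8')
print(nullable)

# The missing value survives the conversion and the dtype is a nullable integer
n_missing = nullable.isna().sum()

# Reductions such as sum skip missing values by default
total = nullable.sum()
```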

## Constructing a Series with nullable integers

Use the `dtype` parameter in the Series constructor to set the data type to a nullable integer. Here, we create a Series of 16-bit nullable integers.

In [None]:
import numpy as np
data = [1, 5, np.nan, 3, np.nan]
pd.Series(data, dtype='Int16')

Without the `dtype` parameter, a Series built from the same data will have a 64-bit float data type.

In [None]:
pd.Series(data)
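The two results can be compared side by side by checking their data types. The sketch below also shows that `None` can stand in for a missing value when a nullable dtype is requested. The variable names are our own.

```python
import numpy as np
import pandas as pd

values = [1, 5, np.nan, 3, np.nan]

as_nullable = pd.Series(values, dtype='Int16')  # nullable 16-bit integers
as_default = pd.Series(values)                  # falls back to 64-bit floats

print(as_nullable.dtype, as_default.dtype)

# None is also accepted as a missing-value marker with a nullable dtype
from_none = pd.Series([1, 5, None], dtype='Int16')
```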