In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


<a id='integer-na'></a>

# Nullable integer data type

>**Note**
>
>`IntegerArray` is currently experimental. Its API or implementation may
>change without warning.

Changed in version 1.0.0: Now uses `pandas.NA` as the missing value rather
than `numpy.nan`.

In the missing data guide, we saw that pandas primarily uses `NaN` to represent
missing data. Because `NaN` is a float, this forces an array of integers with
any missing values to become floating point. In some cases, this may not matter
much. But if your integer column is, say, an identifier, casting to float can
be problematic. Some integers cannot even be represented as floating point
numbers.
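As a minimal sketch of that last point, assuming the usual 64-bit float: the value `2**53 + 1` (chosen here purely for illustration, as is the name `big_id`) is the smallest positive integer that `float64` cannot store exactly. Mixing it with `NaN` in a NumPy array silently changes it, while a nullable integer array keeps it intact.

```python
import numpy as np
import pandas as pd

big_id = 2**53 + 1  # illustrative identifier, not from the original text

# The NaN forces NumPy to build a float64 array, which rounds the integer.
as_float = np.array([big_id, np.nan])
round_tripped = int(as_float[0])

# The nullable integer array stores the value as a true 64-bit integer.
as_nullable = pd.array([big_id, None], dtype="Int64")
preserved = int(as_nullable[0])

print(round_tripped == big_id)  # False: the identifier changed
print(preserved == big_id)      # True: the identifier survived
```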

## Construction

pandas can represent integer data with possibly missing values using
`arrays.IntegerArray`. This is an extension type
implemented within pandas.

In [None]:
arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())
arr

Or use the string alias `"Int64"` (note the capital `"I"`, to differentiate it
from NumPy’s `'int64'` dtype):

In [None]:
pd.array([1, 2, np.nan], dtype="Int64")

All NA-like values are replaced with `pandas.NA`.

In [None]:
pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")

This array can be stored in a `DataFrame` or `Series` like any
NumPy array.

In [None]:
pd.Series(arr)

You can also pass the list-like object to the `Series` constructor
with the dtype.

Currently `pandas.array()` and `pandas.Series()` use different
rules for dtype inference. `pandas.array()` will infer a
nullable-integer dtype:

In [None]:
pd.array([1, None])

In [None]:
pd.array([1, 2])

For backwards compatibility, `Series` infers these as either
integer or float dtype:

In [None]:
pd.Series([1, None])

In [None]:
pd.Series([1, 2])

We recommend explicitly providing the dtype to avoid confusion.

In [None]:
pd.array([1, None], dtype="Int64")

In [None]:
pd.Series([1, None], dtype="Int64")

In the future, we may provide an option for `Series` to infer a
nullable-integer dtype.
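Until then, one way to opt in after construction is `Series.convert_dtypes()`, which is not mentioned above and is assumed here to be available (pandas 1.0 or later). The sketch below shows a `Series` first inferred as float and then converted to a nullable-integer dtype; the variable names are illustrative.

```python
import pandas as pd

# Default inference: the missing value forces a float64 Series.
inferred = pd.Series([1, None])

# convert_dtypes() picks the best dtype that supports pandas.NA.
converted = inferred.convert_dtypes()

print(inferred.dtype)
print(converted.dtype)
```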

## Operations

Operations involving an integer array behave similarly to those on NumPy arrays.
Missing values will be propagated, and the data will be coerced to another
dtype if needed.

In [None]:
s = pd.Series([1, 2, None], dtype="Int64")

# arithmetic
s + 1

In [None]:
# comparison
s == 1

In [None]:
# indexing
s.iloc[1:3]

In [None]:
# operate with other dtypes
s + s.iloc[1:3].astype("Int8")

In [None]:
# coerce when needed
s + 0.01

These dtypes can operate as part of a `DataFrame`.

In [None]:
df = pd.DataFrame({"A": s, "B": [1, 1, 3], "C": list("aab")})
df

In [None]:
df.dtypes

These dtypes can be merged, reshaped, and cast.

In [None]:
pd.concat([df[["A"]], df[["B", "C"]]], axis=1).dtypes
df["A"].astype(float)
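Casting deserves a little care when missing values are present. The following is a sketch, not an exhaustive list of options: casting to float turns `pandas.NA` into `NaN`, while casting to NumPy's `int64` is expected to fail unless the missing values are filled first. The fill value `0` and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Casting to float works: pandas.NA becomes NaN.
as_float = s.astype("float64")

# NumPy's int64 has no slot for a missing value, so this cast fails.
try:
    s.astype("int64")
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# Fill the missing values first, then the cast succeeds.
filled = s.fillna(0).astype("int64")

# to_numpy() lets you choose the dtype and the NA replacement together.
as_numpy = s.to_numpy(dtype="float64", na_value=np.nan)
```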

Reduction and groupby operations such as `sum` work as well.

In [None]:
df.sum()

In [None]:
df.groupby("B").A.sum()

## Scalar NA Value

`arrays.IntegerArray` uses `pandas.NA` as its scalar
missing value. Indexing a single element that’s missing will return
`pandas.NA`:

In [None]:
a = pd.array([1, None], dtype="Int64")
a[1]
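Because comparisons with `pandas.NA` propagate the missing value rather than returning `True` or `False`, a missing scalar is best detected with `pd.isna()` or an identity check. A short sketch, with illustrative variable names:

```python
import pandas as pd

a = pd.array([1, None], dtype="Int64")
missing = a[1]

# pandas.NA is a singleton, so an identity check works.
is_na_singleton = missing is pd.NA

# pd.isna() is the general-purpose way to test a scalar.
detected = pd.isna(missing)

# An equality comparison does not yield a boolean: it yields pandas.NA.
comparison = missing == 1

print(is_na_singleton, detected, comparison)
```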