# Agenda: dtypes

1. Basic dtypes (review)
2. Changing dtypes
3. Limits/issues with changing them
4. `NaN` ("Not a number")
5. Nullable types -- the evolution of Pandas

# Dtypes in Pandas

Each column in a data frame is a series. Each series (whether on its own or inside of a data frame) has a dtype. That determines the type of data that each value in the series contains.

In a traditional Python list, we can have values of any type, in any combination. In a series, all of the values must have the same type. Pandas needs to know what type of data we want, so that it can turn to NumPy (the lower-level layer) and allocate an array of the right size. Moreover, it needs to interpret the bits in memory in the right way.
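Here's a small sketch of that difference. The variable names and values are made up for illustration, and the dtype is requested explicitly so that the result doesn't depend on what Pandas would choose on its own:

```python
import pandas as pd

# A plain Python list can mix types freely
mixed = [10, 'abc', 3.5]
type_names = {type(one_item).__name__ for one_item in mixed}
print(type_names)            # three different types in one list

# A series has one dtype, shared by every value
s = pd.Series([10, 20, 30], dtype='int64')
print(s.dtype)

# Behind the scenes, the values live in a NumPy array
arr = s.to_numpy()
print(arr.dtype)             # the same dtype as the series
print(arr.itemsize)          # bytes used per value
```

Because every value has the same size, NumPy can allocate one block of memory for the whole array.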

If we see `12` and `12.34`, we understand that the first is an integer, and the second is a float. We also think of a float as having "extra stuff beyond the integer." The dtype not only tells Pandas what kinds of data we're going to store, and thus what the limits are on those values, but also how it needs to interpret the bits at the lowest level.
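To see what "interpreting the bits" means, here's a sketch that uses NumPy directly. The `view` method reads the same bytes using a different dtype, without converting anything. This isn't something we normally do in Pandas; it's just a demonstration:

```python
import numpy as np

as_int = np.array([12], dtype='int64')

# Same 8 bytes in memory, but now read as a float64
as_float = as_int.view('float64')

print(as_int[0])      # the integer 12
print(as_float[0])    # a tiny float that has nothing to do with 12.0

# The underlying bytes are identical; only the interpretation differs
print(as_int.tobytes() == as_float.tobytes())
```

The bytes didn't change at all. Only the dtype did, and that was enough to turn `12` into a completely different number.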

Choosing a dtype is thus important for (a) making sure that the values will work, (b) making sure that they'll fit, and (c) making sure that you don't use too much memory.
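As a rough illustration of (b) and (c), we can ask NumPy for the limits of an integer dtype, and ask Pandas how many bytes a series uses for its values. The sample values here are arbitrary:

```python
import numpy as np
import pandas as pd

# The smallest and largest values that fit in each dtype
int8_limits = np.iinfo('int8')
int64_limits = np.iinfo('int64')
print(int8_limits.min, int8_limits.max)
print(int64_limits.min, int64_limits.max)

# The same three values, stored with two different dtypes
s64 = pd.Series([10, 20, 30], dtype='int64')
s8 = pd.Series([10, 20, 30], dtype='int8')

# nbytes counts the bytes used by the values themselves
print(s64.nbytes)    # 3 values * 8 bytes each
print(s8.nbytes)     # 3 values * 1 byte each
```

The smaller dtype uses one-eighth of the memory, but it can only hold values from -128 through 127.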

Normally, when we create a series, Pandas chooses a dtype for us:

- If it sees only integers (decimal digits), then we get a dtype of `int64` -- 64-bit integers, aka 8-byte integers. These are signed, meaning that they can be negative or positive: The values range from -2^63 through 2^63 - 1, so there is one more negative number than there are positive ones.
- If it sees decimal digits and one decimal point, then we get a dtype of `float64` -- 64-bit floats, aka 8-byte floats. These are also signed.
- If it sees other things, then it basically assumes that we have strings. But it doesn't use NumPy's strings, which are awful. Instead, it uses Python strings, and refers to them using a dtype of `object`.
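
We can check these rules by creating a few series without specifying a dtype. This is a minimal sketch with made-up values. Note that the dtype reported for strings depends on your version of Pandas, which is an assumption flagged in the comments:

```python
import pandas as pd

ints = pd.Series([10, 20, 30])
floats = pd.Series([10.5, 20, 30])     # one float makes the whole series float64
strings = pd.Series(['a', 'bc', 'def'])

print(ints.dtype)
print(floats.dtype)

# object in the versions of Pandas described in these notes;
# other versions might report a dedicated string dtype instead
print(strings.dtype)

# Either way, each value we get back is a regular Python string
print(type(strings.iloc[0]))
```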