<img src='images/pandas.png' width='300px' align=left>
<img src='images/gdd-logo.png' width='200px' align='right' style="padding: 15px">

# Frequently asked questions about Pandas

## Can we change the datatype of the columns?


By default, Pandas infers a data type for each column when it loads data, so numeric columns become integers or floats. However, you might want to manually change the types that are used to represent the data.
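To see which types Pandas inferred, check the `dtypes` attribute. Here is a minimal sketch using a small in-memory CSV. The column names and values are made up for illustration:

```python
import io

import pandas as pd

# A tiny made-up CSV held in memory instead of a file on disk
csv_text = "count,price,label\n1,2.5,x\n3,4.0,y\n"

df = pd.read_csv(io.StringIO(csv_text))

# Pandas infers one dtype per column while parsing:
# "count" becomes int64, "price" becomes float64, "label" stays text
print(df.dtypes)
```

Text columns are kept as `object`, or as a dedicated string dtype in recent Pandas versions.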

In [None]:
import pandas as pd

chickweight = (
    pd.read_csv('data/chickweight.csv')
    .rename(columns=str.lower)
)

You can use the `.astype()` method to coerce certain columns to a particular type.

For example:

In [None]:
(
    pd.read_csv('data/chickweight.csv')
    .rename(columns=str.lower)
    .assign(weight = lambda df: df["weight"].astype("float"))
).head()
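`.astype()` also accepts a dictionary that maps column names to target types, which is handy when several columns need converting at once. A minimal sketch on a small DataFrame with made-up values standing in for the chick-weight data:

```python
import pandas as pd

# Hypothetical stand-in for the chick-weight data (values are made up)
df = pd.DataFrame({"weight": [42, 51, 59], "time": [0, 2, 4], "diet": [1, 1, 2]})

# One call converts several columns; columns not in the dict keep their type
converted = df.astype({"weight": "float64", "diet": "category"})

print(converted.dtypes)
```

Like most DataFrame methods, `.astype()` returns a new object and leaves the original `df` unchanged.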

One particularly common use case is to encode categorical values more efficiently than as plain integers.
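Under the hood, a categorical column stores each distinct value once (its categories) and replaces every row with a small integer code that points into them. A minimal sketch with a made-up `diet` column:

```python
import pandas as pd

# Hypothetical column with only a few distinct values, stored as int64
diet = pd.Series([1, 1, 2, 3, 4, 2, 1], dtype="int64")

diet_cat = diet.astype("category")

# The distinct values are stored only once...
print(diet_cat.cat.categories)
# ...and each row holds a compact code (int8 when there are few categories)
print(diet_cat.cat.codes.tolist(), diet_cat.cat.codes.dtype)
```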

The `chickweight` DataFrame has a memory footprint of 22.7 KB.

In [None]:
chickweight.info()

If we encode the `chick` and `diet` columns as categorical variables, we significantly reduce the memory that the DataFrame occupies in RAM.

In [None]:
chickweight_typed = (
    pd.read_csv('data/chickweight.csv')
    .rename(columns=str.lower)
    .assign(
        chick = lambda df: df["chick"].astype("category"),
        diet = lambda df: df["diet"].astype("category"),
    )
)

In [None]:
chickweight_typed.info()

In [None]:
# Memory usage in bytes per column, with and without categorical encoding
(
    pd.DataFrame(
        [chickweight_typed.memory_usage(), chickweight.memory_usage()],
        index=["typed", "all_ints"],
    )
    .transpose()
    .assign(ratio=lambda df: round(df["typed"] / df["all_ints"], 2))
)

In this toy example, the total memory footprint drops from 22.7 KB to 17.4 KB, i.e. to a fraction of

In [None]:
17.4 / 22.7

That is roughly 23% less memory. Savings like this make a noticeable difference when working with larger datasets: more data fits into RAM, and many operations on categorical columns, such as grouping, tend to run faster.
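The savings grow with the number of rows per distinct value. As a minimal sketch (the column, its size and the number of distinct IDs are all made up for illustration), compare a one-million-row `int64` column holding only 50 distinct values with its categorical version:

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000  # hypothetical size, chosen to mimic a "larger dataset"

rng = np.random.default_rng(seed=0)
# Made-up column with only 50 distinct IDs, stored as plain int64
ids = pd.Series(rng.integers(0, 50, size=n_rows), dtype="int64")
ids_cat = ids.astype("category")

# memory_usage(index=False) counts only the column's values, in bytes
int_bytes = ids.memory_usage(index=False)
cat_bytes = ids_cat.memory_usage(index=False)

print(int_bytes, cat_bytes, round(cat_bytes / int_bytes, 3))
```

Because each row now needs a one-byte code instead of an eight-byte integer, the categorical column should end up at roughly one eighth of the original size. For columns of strings, pass `deep=True` to `memory_usage()` to get an accurate count.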