Reading a CSV file with pd.read_csv can take a long time.
Loading the same data from the feather format or with Datatable is much faster.

The feather file used here is the output of the following notebook.  
https://www.kaggle.com/yamsam/riiid-feather-format

I also found the following notebooks very helpful.  
https://www.kaggle.com/rohanrao/tutorial-on-reading-large-datasets  
https://www.kaggle.com/yihdarshieh/riiid-verifying-private-test-dataset-properties  

In [None]:
!pip install ../input/python-datatable/datatable-0.11.0-cp37-cp37m-manylinux2010_x86_64.whl
!mkdir -p ../tmp/

In [None]:
from time import time
from contextlib import contextmanager
import pandas as pd
from tqdm.auto import tqdm
import gc
import pickle
import datatable as dt

gc.enable()

@contextmanager
def timer(name):
    t0 = time()
    yield
    print(f'[{name}] done in {time() - t0:.2f} s')

def sizeof_fmt(num, suffix='B'):
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)
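
Each benchmark in this notebook repeats the load 10 times, so `timer` prints the total for all repetitions. A minimal sketch of a variant that also reports the mean per repetition is shown here; `timer_avg`, `repeats` and `stats` are hypothetical names that do not appear in the original notebook.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timer_avg(name, repeats=1):
    """Print total and mean wall-clock time for a block that runs `repeats` times."""
    result = {}
    t0 = perf_counter()
    try:
        # The caller can read the timings from this dict after the block ends.
        yield result
    finally:
        total = perf_counter() - t0
        result['total'] = total
        result['mean'] = total / repeats
        print(f'[{name}] total {total:.2f} s, mean {total / repeats:.2f} s per run')


# Toy workload standing in for a file read.
with timer_avg('demo', repeats=10) as stats:
    for _ in range(10):
        sum(range(1000))
```

The `repeats` argument must match the loop count by hand; the helper does not count iterations itself.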

# feather

In [None]:
!du -h ../input/riiid-feather-format/train.feather

In [None]:
# Read the feather file 10 times; the timer reports the total for all 10 reads.
with timer('feather'):
    for _ in tqdm(range(10)):
        train_df = pd.read_feather('../input/riiid-feather-format/train.feather')

In [None]:
print(sizeof_fmt(train_df.memory_usage().sum()))
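
Note that `memory_usage()` without arguments counts only the pointers for object columns, not the Python objects they refer to. The sketch below, on a small made-up frame (the column names are illustrative assumptions), shows how `deep=True` changes the figure.

```python
import pandas as pd

# Toy frame; column names are hypothetical and not taken from the dataset.
df = pd.DataFrame({
    'user_id': [1, 2, 3],
    'label': pd.Series(['a', 'bb', 'ccc'], dtype=object),
})

shallow = int(df.memory_usage().sum())        # object column counted as pointers only
deep = int(df.memory_usage(deep=True).sum())  # also counts the string objects themselves
print(shallow, deep)
```

If the frame has no object columns, the two numbers should agree.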

In [None]:
with timer('feather save'):
    train_df.to_feather('../tmp/train.feather')

# pickle

In [None]:
with timer('pickle save'):
    with open('../tmp/train.pickle', 'wb') as f:
        pickle.dump(train_df, f)

In [None]:
!du -h ../tmp/train.pickle

In [None]:
with timer('pickle load'):
    for _ in tqdm(range(10)):
        with open('../tmp/train.pickle', 'rb') as f:
            train_df = pickle.load(f)
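
The cells above call `pickle.dump` with the default protocol. As a hedged aside, the protocol can be chosen explicitly; protocol 4 and later add support for very large objects, which may matter for a DataFrame of this size. The sketch uses a small made-up payload and a temporary path (both assumptions) and does not measure whether the protocol affects timing.

```python
import os
import pickle
import tempfile

# Small stand-in payload; the keys are hypothetical.
payload = {'user_id': list(range(1000)), 'flag': [1, 0] * 500}

path = os.path.join(tempfile.mkdtemp(), 'payload.pickle')

# Write with the newest protocol this Python version supports.
with open(path, 'wb') as f:
    pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.load detects the protocol from the file, so no argument is needed.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == payload)
```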

# Datatable

https://datatable.readthedocs.io/en/latest/index.html

In [None]:
with timer('DataFrame save'):
    dt.Frame(train_df).to_jay("train.jay")

In [None]:
del train_df

In [None]:
with timer('Datatable'):
    for _ in tqdm(range(10)):
        train_dt = dt.fread('train.jay')

In [None]:
with timer('Datatable.Frame save'):
    train_dt.to_jay('train.jay')

In [None]:
type(train_dt)

In [None]:
!du -h train.jay

In [None]:
import sys
print(sizeof_fmt(sys.getsizeof(train_dt)))
del train_dt

In [None]:
with timer('Datatable to pd.DataFrame'):
    for _ in tqdm(range(10)):
        train_df = dt.fread('train.jay').to_pandas()

In [None]:
type(train_df)

In [None]:
train_df.dtypes

# conclusion

Datatable loads the .jay file very quickly. However, converting the Datatable Frame to a pandas.DataFrame is not fast.  
The feather format was able to load in about 2 seconds.  
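
One way to apply these results is to parse the CSV once and reload from a binary file afterwards. The sketch below is an illustration under stated assumptions, not the notebook's method: `load_cached` and the file names are hypothetical, and it uses `to_pickle`/`read_pickle` so that it runs with pandas alone. Swapping in `to_feather`/`read_feather`, as used earlier in the notebook, follows the same pattern.

```python
import os
import tempfile

import pandas as pd


def load_cached(csv_path, cache_path):
    """Return the DataFrame from cache_path, building it from csv_path on first use."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    df = pd.read_csv(csv_path)
    df.to_pickle(cache_path)
    return df


# Build a tiny CSV in a temporary directory to exercise the helper.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, 'toy.csv')
cache_path = os.path.join(workdir, 'toy.pickle')
pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).to_csv(csv_path, index=False)

first = load_cached(csv_path, cache_path)   # parses the CSV and writes the cache
second = load_cached(csv_path, cache_path)  # served from the binary cache
print(first.equals(second))
```

The helper does not detect changes to the CSV, so the cache file has to be deleted by hand when the source data changes.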