**Reference:**

https://www.kaggle.com/pedrocouto39/fast-reading-w-pickle-feather-parquet-jay

In [None]:
# datatable installation with internet
!pip install datatable==0.11.0 > /dev/null

import numpy as np 
import pandas as pd 
import datatable as dt

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
%%time
train = pd.read_csv("../input/g-research-crypto-forecasting/train.csv")

In [None]:
train.info()

# 1. Converting to multiple formats
The formats that will be created are:
1. *Pickle* - Python's native object serialization. It can be slower than the other formats here, but it may still work for our purpose.
2. *Feather* - a fast, lightweight, easy-to-use binary file format for storing data frames.
3. *Parquet* - a column-oriented format; compared to the traditional row-oriented approach, it is more efficient in terms of storage and performance.
4. *Jay* - datatable's own binary format; like Feather, it is fast, lightweight, and easy to use for storing data frames.
5. *HDF5* - a hierarchical binary format; it is written below as well, but its read time is not measured in this notebook.
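
To make the row-oriented versus column-oriented distinction concrete, here is a minimal pure-Python sketch. It is only an illustration of the two layouts, not how Parquet is actually implemented, and the column names (`id`, `price`, `volume`) are made up for the example.

```python
# Toy table with three records; the column names are hypothetical.
rows = [
    {"id": 1, "price": 10.0, "volume": 100},
    {"id": 2, "price": 12.5, "volume": 80},
    {"id": 3, "price": 11.0, "volume": 120},
]

# Row-oriented layout: the values of one record are stored together.
row_layout = [tuple(r.values()) for r in rows]

# Column-oriented layout: the values of one column are stored together.
column_layout = {name: [r[name] for r in rows] for name in rows[0]}

# Averaging one column in the row layout walks over every record...
mean_price_rows = sum(rec[1] for rec in row_layout) / len(row_layout)

# ...while the column layout only touches the single list it needs.
mean_price_cols = sum(column_layout["price"]) / len(column_layout["price"])

print(mean_price_rows == mean_price_cols)
```

Both layouts hold the same data; the difference is which values sit next to each other, which is why column-oriented files tend to suit queries that read only a few columns.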

In [None]:
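# writing dataset as pickle
# note: pandas infers compression from the file extension, and ".gzip" is not
# one it recognizes (".gz" is), so this file is written uncompressed
train.to_pickle("g_research_crypto_forecasting.pkl.gzip")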

# writing dataset as feather
train.to_feather("g_research_crypto_forecasting.feather")

# writing dataset as parquet
train.to_parquet("g_research_crypto_forecasting.parquet")

# writing dataset as jay
dt.Frame(train).to_jay("g_research_crypto_forecasting.jay")

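# writing dataset as hdf5 (requires the PyTables package)
# pass the key by keyword; newer pandas versions deprecate passing it positionally
train.to_hdf("g_research_crypto_forecasting.h5", key="g_research_crypto_forecasting")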
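
Since storage efficiency is one of the points of comparison, it can help to look at the size of each file once it is written. The following is a small sketch using only the standard library; `file_sizes_mb` is a hypothetical helper, not part of pandas or datatable, and it assumes the files were written to the current directory under the names used above.

```python
import os

def file_sizes_mb(paths):
    """Return {path: size in MiB} for every path that exists on disk."""
    sizes = {}
    for path in paths:
        if os.path.exists(path):
            sizes[path] = os.path.getsize(path) / 1024 ** 2
    return sizes

# File names written by the cell above; files that are missing are skipped.
outputs = [
    "g_research_crypto_forecasting.pkl.gzip",
    "g_research_crypto_forecasting.feather",
    "g_research_crypto_forecasting.parquet",
    "g_research_crypto_forecasting.jay",
    "g_research_crypto_forecasting.h5",
]

# Print the files from smallest to largest.
for path, size in sorted(file_sizes_mb(outputs).items(), key=lambda kv: kv[1]):
    print(f"{path}: {size:.1f} MiB")
```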

# 2. Reading and timing
Now let's read each file back and time how long each one takes to load.
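
The cells below use the `%%time` magic, which only works inside a notebook. As a rough alternative for plain scripts, here is a minimal sketch built on `time.perf_counter`; `time_call` is a hypothetical helper, and the `sum` workload is only a stand-in so the example runs on its own.

```python
import time

def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns (result of the last call, best wall-clock time in seconds).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in workload; in this notebook the call could instead be, for example,
# time_call(pd.read_feather, "g_research_crypto_forecasting.feather").
total, seconds = time_call(sum, range(1_000_000))
print(f"sum -> {total} in {seconds:.4f} s")
```

Keep in mind that repeated reads of the same file are usually served from the operating system's cache, so the best-of-N figure reflects a warm read rather than a first read from disk.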

### 2.1 Pickle

In [None]:
%%time
train_pickle = pd.read_pickle("g_research_crypto_forecasting.pkl.gzip")

In [None]:
train_pickle.info()

### 2.2 Feather

In [None]:
%%time
train_feather = pd.read_feather("g_research_crypto_forecasting.feather")

In [None]:
train_feather.info()

### 2.3 Parquet

In [None]:
%%time
train_parquet = pd.read_parquet("g_research_crypto_forecasting.parquet")

In [None]:
train_parquet.info()

### 2.4 Jay

In [None]:
%%time
# note: dt.fread returns a datatable Frame, not a pandas DataFrame
train_jay = dt.fread("g_research_crypto_forecasting.jay")

In [None]:
train_jay.shape
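
The `info()` and `shape` calls above show that each file loads, but not that the data survived the round trip unchanged. Below is a minimal sketch of such a check on a small stand-in frame, using pickle because it needs nothing beyond pandas; the column names and the `toy.pkl` file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frame; the column names are hypothetical.
original = pd.DataFrame({"asset": [0, 1, 2], "close": [1.5, 2.5, float("nan")]})

# Write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    original.to_pickle(path)
    restored = pd.read_pickle(path)

# DataFrame.equals compares shape, dtypes, and values, and treats NaNs in
# the same position as equal.
print(original.equals(restored))
```

In this notebook the same idea would presumably be `train.equals(train_feather)` or `train.equals(train_parquet)`. `train_jay` is a datatable Frame, so it would first need `train_jay.to_pandas()`, and dtypes may differ after passing through datatable, in which case `equals` can return `False` even when the values match.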