# Feather or CSV?
This notebook compares the Feather format with the traditional CSV file on three measures: write time, read time, and file size on disk.

## Setting up the environment
Loading the libraries

In [12]:
import pandas as pd
import numpy as np
import time
import os

In [13]:
# Simulating some data
columns = 5
rows = 1_000_000

np.random.seed(0)
df = pd.DataFrame(np.random.rand(rows, columns),
                  columns=["c1", "c2", "c3", "c4", "c5"])

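## Using Feather
We measure the write and read times and save the Feather file to the current working directory so that we can compute its size later. Note that `time.process_time()` reports the CPU time used by the process rather than elapsed wall-clock time, so time spent waiting on disk I/O is not counted. Also note that `to_feather` requires the `pyarrow` package.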

In [14]:
# Time the write to disk
tic = time.process_time()
df.to_feather('data.feather')
feather_write = time.process_time() - tic

# Time the read back into a DataFrame
tic = time.process_time()
feather = pd.read_feather('data.feather')
feather_read = time.process_time() - tic
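
Because `time.process_time()` counts CPU time only, it can be useful to record wall-clock time alongside it. The following is a minimal sketch, not part of the original notebook: the helper name `timed` is hypothetical, and `time.sleep` stands in for a real read or write call so that the example runs on its own.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds).

    `timed` is a hypothetical helper introduced for illustration only.
    """
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time of the current process
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


# Stand-in workload: sleeping uses wall-clock time but almost no CPU time.
_, wall, cpu = timed(time.sleep, 0.05)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

In this notebook the same helper could wrap the calls above, for example `timed(df.to_feather, 'data.feather')`. A large gap between the two numbers would suggest that the operation spends most of its time waiting on I/O.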

## Using CSV
As with the Feather file, we measure the write and read times and save the CSV file to disk so that we can compute its size later.

In [15]:
# Time the write to disk (the index is left out)
tic = time.process_time()
df.to_csv('data.csv', index=False)
csv_write = time.process_time() - tic

# Time the read back into a DataFrame
tic = time.process_time()
csv = pd.read_csv('data.csv')
csv_read = time.process_time() - tic

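## Computing the file size
To compute the size of each file, we use the helper function below. It divides the size in bytes by 1024 × 1024, so the result is strictly in mebibytes, which we refer to as megabytes throughout.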

In [16]:
def get_file_size_in_megabytes(file_name, decimals=2):
    """Return the size of file_name in megabytes (bytes / 1024**2), rounded."""
    file_stats = os.stat(file_name)
    return round(file_stats.st_size / (1024 * 1024), decimals)

In [17]:
feather_file_size = get_file_size_in_megabytes('data.feather')
csv_file_size = get_file_size_in_megabytes('data.csv')
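
As a sanity check on the sizes, we can estimate how much space the raw values take before any file-format overhead. This is a rough sketch that assumes each value is stored as an 8-byte `float64`, which is what `np.random.rand` produces. The variable names are illustrative only.

```python
# Dimensions of the simulated DataFrame used in this notebook
rows, columns = 1_000_000, 5

# Assumption: each value is a float64, i.e. 8 bytes
bytes_per_value = 8

raw_bytes = rows * columns * bytes_per_value
raw_size_mb = round(raw_bytes / (1024 * 1024), 2)
print(raw_size_mb)  # 38.15
```

A binary file close to this figure is storing the values with little overhead, while a text file needs one character per digit plus separators and is therefore expected to be larger.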

## Comparing the results
Now we can compare the write and read times (in seconds) and the file size (in megabytes) for each format.

In [24]:
# Collect the timings and file sizes in a single table
times = pd.DataFrame({'Write (s)': [round(feather_write, 2), round(csv_write, 2)],
                      'Read (s)': [round(feather_read, 2), round(csv_read, 2)],
                      'Size (MB)': [feather_file_size, csv_file_size]},
                     index=['Feather', 'CSV'])
times

| | Write (s) | Read (s) | Size (MB) |
|---|---|---|---|
| Feather | 0.23 | 0.11 | 38.16 |
| CSV | 8.94 | 0.97 | 92.84 |
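
To make the differences easier to read, the figures in the table above can be expressed as CSV-to-Feather ratios. This is a small sketch that hard-codes the numbers reported in this run. The dictionary layout and key names are illustrative, and the ratios will vary with hardware, library versions, and data.

```python
# Figures copied from the results table of this run
results = {
    "Feather": {"write_s": 0.23, "read_s": 0.11, "size_mb": 38.16},
    "CSV": {"write_s": 8.94, "read_s": 0.97, "size_mb": 92.84},
}

# How many times larger each CSV figure is than the Feather one
ratios = {
    metric: round(results["CSV"][metric] / results["Feather"][metric], 2)
    for metric in results["CSV"]
}
print(ratios)  # {'write_s': 38.87, 'read_s': 8.82, 'size_mb': 2.43}
```

In this single run, writing the CSV took about 39 times as much CPU time as writing the Feather file, reading it took about 9 times as much, and the CSV file was about 2.4 times as large.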
