**Performance test: reading a large CSV file into a dataframe with several Python libraries**

Details of the dataset used for the test:
- Size: <span style="color: red">1.1 GB</span>
- Total rows: <span style="color: red">5,012,957</span>
- Total columns: <span style="color: red">19</span>
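
Figures like these can be checked without loading the whole file into memory. The following is a minimal sketch using only the standard library. `describe_csv` and the tiny sample file are hypothetical names introduced for illustration, and the sketch assumes a UTF-8 file with a single header row.

```python
import csv
import os
import tempfile


def describe_csv(path):
    """Return (size_in_bytes, data_rows, columns) for a CSV file with a header row."""
    size_bytes = os.path.getsize(path)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])        # first record holds the column names
        n_rows = sum(1 for _ in reader)  # stream the remaining records one at a time
    return size_bytes, n_rows, len(header)


# Demonstration on a throwaway three-column file (not the NYPD dataset).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("a,b,c\n1,2,3\n4,5,6\n")
    sample_stats = describe_csv(sample_path)

print(sample_stats)
```

Pointing `describe_csv` at the benchmark file should return values comparable to the list above, with the size in bytes instead of GB.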

## [Pandas](https://pandas.pydata.org/)

In [1]:
import pandas as pd

In [2]:
%%time
df = pd.read_csv("/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv", low_memory=False)

CPU times: user 16.6 s, sys: 5.51 s, total: 22.1 s
Wall time: 3min 42s
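
The two lines printed by `%%time` measure different things: CPU time is the processor time consumed by the notebook process, while wall time is the elapsed real time. Here the total CPU time (22.1 s) is far below the wall time (3min 42s), which suggests the cell spent most of its time waiting, for example on disk I/O, though the notebook does not record the cause. The sketch below reproduces the same two measurements outside IPython with the standard library; `cell_timer` and `timings` are hypothetical names, not part of any library used here.

```python
import time
from contextlib import contextmanager


@contextmanager
def cell_timer(label, results):
    """Record wall-clock and CPU seconds spent inside the with-block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process only
    try:
        yield
    finally:
        results[label] = {
            "wall": time.perf_counter() - wall_start,
            "cpu": time.process_time() - cpu_start,
        }


timings = {}
with cell_timer("sleep", timings):
    time.sleep(0.2)  # waiting consumes wall time but almost no CPU time

print(timings)
```

Because `time.process_time()` covers only the current process, work done in separate worker processes does not show up in the CPU figure, only in the wall time.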


## [Modin](https://modin.readthedocs.io/en/latest/)

In [3]:
import os
# Rebinds the `pd` alias: from here on `pd` refers to Modin, not pandas
import modin.pandas as pd

### Modin Engine = Ray

In [4]:
%%time
os.environ["MODIN_ENGINE"] = "ray"
df = pd.read_csv("/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv")

CPU times: user 9.53 s, sys: 4.85 s, total: 14.4 s
Wall time: 1min 58s


### Modin Engine = Dask

In [5]:
%%time
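# Caution: Modin's documentation warns against changing the engine after the first Modin operation; a fresh kernel per engine is the safer setup
os.environ["MODIN_ENGINE"] = "dask"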
df = pd.read_csv("/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv")

CPU times: user 5.75 s, sys: 3.09 s, total: 8.84 s
Wall time: 44.7 s


## [Vaex](https://vaex.readthedocs.io/en/latest/)

In [6]:
import vaex as vx

In [7]:
%%time
df = vx.open("/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv", low_memory=False)

CPU times: user 32.2 s, sys: 5.57 s, total: 37.8 s
Wall time: 2min 30s


## [Dask](https://dask.org/)

In [8]:
import dask.dataframe as dd

In [9]:
%%time
df = dd.read_csv('/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv')

CPU times: user 69.1 ms, sys: 7.7 ms, total: 76.7 ms
Wall time: 676 ms


## [PySpark](https://spark.apache.org/docs/latest/api/python/index.html)

In [10]:
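from pyspark.sql import SparkSession
# `spark` is predefined only in sessions launched through the `pyspark` shell; otherwise create (or reuse) one
spark = SparkSession.builder.getOrCreate()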

In [11]:
%%time
df = spark.read.csv('/home/juancaio/Downloads/697683_1432303_compressed_large_metro_areas_NYPD_Arrests_Data__Historic_.csv/NYPD_Arrests_Data__Historic_.csv')

CPU times: user 4.64 s, sys: 525 ms, total: 5.17 s
Wall time: 1min 10s
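
To compare the runs side by side, the wall times reported above can be converted to seconds and expressed relative to the pandas baseline. This is a minimal sketch: the durations are copied from the cell outputs, while `WALL_TIMES`, `to_seconds`, and `speedup` are hypothetical names introduced for illustration.

```python
import re

# Wall times copied verbatim from the cell outputs above.
WALL_TIMES = {
    "pandas": "3min 42s",
    "Modin (Ray)": "1min 58s",
    "Modin (Dask)": "44.7 s",
    "Vaex": "2min 30s",
    "Dask": "676 ms",
    "PySpark": "1min 10s",
}

UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(text):
    """Convert an IPython-style duration such as '3min 42s' or '676 ms' to seconds."""
    parts = re.findall(r"([\d.]+)\s*(min|ms|s)", text)
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


seconds = {name: to_seconds(text) for name, text in WALL_TIMES.items()}
baseline = seconds["pandas"]
speedup = {name: baseline / value for name, value in seconds.items()}

# Fastest first, with the ratio relative to pandas.
for name, value in sorted(seconds.items(), key=lambda item: item[1]):
    print(f"{name:<14}{value:>9.3f} s   {speedup[name]:>8.2f}x vs pandas")
```

The Dask row should be read with care: `dd.read_csv` is lazy, so 676 ms most likely reflects building the task graph from a sample of the file and not parsing all rows. Spark DataFrames are also lazily evaluated, so the PySpark figure may not represent a full read either. Each library was timed once, so the ratios are indicative only.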
