### [Big Data Sets in Python using Dask](https://www.analyticsvidhya.com/blog/2018/08/dask-big-datasets-machine_learning-python/?utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer)

In [61]:
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd
import psutil
import os

#### Array

In [3]:
np_X = np.arange(1,1000)

In [5]:
da_X = da.from_array(np_X,chunks=250)

In [12]:
da_X.chunks

((250, 250, 250, 249),)

In [22]:
da_X.sum(keepdims=False).compute()

499500
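
The cells above can be condensed into one self-contained sketch that makes the lazy step explicit. It reuses the same array and chunk size as the notebook. The names `lazy_total` and `total` are introduced here for illustration only.

```python
import numpy as np
import dask.array as da

np_X = np.arange(1, 1000)               # 999 integers: 1..999
da_X = da.from_array(np_X, chunks=250)  # blocks of at most 250 elements

lazy_total = da_X.sum()       # builds a task graph; nothing is computed yet
total = lazy_total.compute()  # sums each chunk, then combines the partial sums

# The chunked result should match the closed form n(n+1)/2 for n = 999.
assert total == 999 * 1000 // 2
print(da_X.chunks, int(total))
```

The last chunk holds 249 elements because 999 is not a multiple of 250. Dask sizes the final block to whatever remains.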

#### Data Frame


#### Get Python's pid

In [33]:
pid = os.getpid()
py = psutil.Process(pid)

In [42]:
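# The first cpu_percent() call only primes psutil's counter (it returns 0.0); the value is not used below.
py_cpu1 = py.cpu_percent()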

#### Using pandas

In [65]:
%time temp_pd = pd.read_csv(r"C:\Home\Work\Data Science\AnalyticsVidhya\Data\BlackFriday\train.csv")
# cpu_times() is cumulative since the process started, so these are running totals,
# not the cost of this read alone.
cpu = py.cpu_times()
py_cpu2_user, py_cpu2_sys = cpu.user, cpu.system
print("User : %4.7f" % (py_cpu2_user))
print("System : %4.7f" % (py_cpu2_sys))
print("Total : %4.7f" % (py_cpu2_sys+py_cpu2_user))


Wall time: 1.53 s
User : 27.5625000
System : 8.9218750
Total : 36.4843750


#### Using Dask

In [66]:
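# dd.read_csv is lazy: it samples the start of the file to infer dtypes and builds a task graph,
# so this wall time does not include reading the full file.
%time temp_dd = dd.read_csv(r"C:\Home\Work\Data Science\AnalyticsVidhya\Data\BlackFriday\train.csv")
# Cumulative process totals again; subtract the previous cell's totals to get this cell's cost.
cpu = py.cpu_times()
py_cpu2_user, py_cpu2_sys = cpu.user, cpu.system
print("User : %4.7f" % (py_cpu2_user))
print("System : %4.7f" % (py_cpu2_sys))
print("Total : %4.7f" % (py_cpu2_sys+py_cpu2_user))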

Wall time: 66 ms
User : 27.6406250
System : 8.9218750
Total : 36.5625000
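
The User and System figures printed above are running totals for the whole Python process, which is why they barely change between the two cells. A minimal sketch of one way to isolate the cost of a single step is to snapshot `cpu_times()` before and after it and take the difference. The helper `cpu_delta`, the `timings` dict, and the busy-loop workload are hypothetical and not part of the original notebook.

```python
import os
import time
from contextlib import contextmanager

import psutil


@contextmanager
def cpu_delta(label, results):
    """Record wall, user and system time spent inside the `with` block."""
    proc = psutil.Process(os.getpid())
    cpu_before = proc.cpu_times()
    wall_before = time.perf_counter()
    try:
        yield
    finally:
        cpu_after = proc.cpu_times()
        results[label] = {
            "wall": time.perf_counter() - wall_before,
            "user": cpu_after.user - cpu_before.user,
            "system": cpu_after.system - cpu_before.system,
        }


timings = {}
with cpu_delta("busy_loop", timings):
    # Stand-in workload (hypothetical); replace with the read_csv call to compare loaders.
    sum(i * i for i in range(200_000))

for name, t in timings.items():
    print("%s  wall: %.4f  user: %.4f  system: %.4f"
          % (name, t["wall"], t["user"], t["system"]))
```

Because `cpu_times()` is reported per process, the delta covers every thread in this process. Work done in separate worker processes would not be counted.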


#### Value counts in pandas

In [71]:
%time temp_pd.Gender.value_counts()

Wall time: 193 ms


M    414259
F    135809
Name: Gender, dtype: int64

#### Value counts in Dask

In [72]:
%time temp_dd.Gender.value_counts().compute()

Wall time: 1.32 s


M    414259
F    135809
Name: Gender, dtype: int64

#### Group by using pandas

In [79]:
%time temp_pd.groupby(by=temp_pd.Gender).Purchase.sum()

Wall time: 128 ms


Gender
F    1186232642
M    3909580100
Name: Purchase, dtype: int64

#### Group by using Dask

In [80]:
%time temp_dd.groupby(by=temp_dd.Gender).Purchase.sum().compute()

Wall time: 2.07 s


Gender
F    1186232642
M    3909580100
Name: Purchase, dtype: int64