### Datatable
Official page: https://datatable.readthedocs.io/en/latest/start/quick-start.html

Introduction/tutorial: https://towardsdatascience.com/an-overview-of-pythons-datatable-package-5d3a97394ee9



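`datatable` is a Python package for working with two-dimensional data frames. It serves a similar purpose to pandas DataFrames but uses its own indexing syntax.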

The documentation also includes a comparison between SQL and `datatable`: https://datatable.readthedocs.io/en/latest/manual/comparison_with_sql.html
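As a rough illustration of that comparison, the sketch below writes a simple SQL aggregate as `datatable` indexing. The toy frame, its column names, and the helper `violations_per_agency` are made up for this example and are not part of the traffic dataset. The filter and the grouping are done in two separate steps to keep each one easy to read.

```python
def violations_per_agency():
    """Count stops without an accident per agency on a small toy frame."""
    # Imported inside the function so the cell can be defined
    # before datatable is installed.
    import datatable as dt
    from datatable import by, count, f

    # Hypothetical data, only loosely modelled on the traffic dataset.
    stops = dt.Frame({
        "agency": ["MCP", "MCP", "RPD", "MCP", "RPD"],
        "accident": ["No", "Yes", "No", "No", "Yes"],
    })

    # SQL: SELECT agency, COUNT(*) AS n
    #      FROM stops WHERE accident = 'No' GROUP BY agency
    no_accident = stops[f.accident == "No", :]          # WHERE
    return no_accident[:, {"n": count()}, by(f.agency)]  # SELECT ... GROUP BY
```

On this toy data, `violations_per_agency().to_dict()` should give `{'agency': ['MCP', 'RPD'], 'n': [2, 1]}`.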


In [3]:
#!pip install datatable --user

`dt.fread`, the file reader used below:

*    Can automatically **detect separators**, headers, column types, quoting rules, etc.
*    Can **read** data **from multiple sources**, including files, URLs, shell commands, raw text, archives and globs.
*    Provides **multi-threaded** file reading for maximum speed.
*    Includes a **progress indicator** when reading large files.
*    Can read both RFC4180-compliant and non-compliant files.
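The separator and header detection can be tried without any file by passing raw text. This is a minimal sketch: the helper name `fread_from_text` and the sample rows are invented for illustration, and it assumes `fread` picks up the semicolon separator on its own, as the feature list suggests.

```python
def fread_from_text():
    """Parse a small in-memory, semicolon-separated table with fread."""
    # Imported inside the function to keep the sketch self-contained.
    import datatable as dt

    # Hypothetical sample: a header row followed by three data rows.
    raw = (
        "id;name;score\n"
        "1;alice;3.5\n"
        "2;bob;4.0\n"
        "3;carol;2.5\n"
    )
    # No sep= or header= is given, so both are left to auto-detection.
    return dt.fread(text=raw)
```

If detection works as expected, `fread_from_text().names` is `('id', 'name', 'score')` and `.shape` is `(3, 3)`.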

In [6]:
import datatable as dt
import pandas as pd
import os

datafile = "/home/course/public/Datasets/Additional/Traffic_Violations.csv"
if not os.path.exists(datafile):
    datafile = "/home/course/Datasets/Additional/Traffic_Violations.csv"

#### Loading time


In [7]:
%%time
dtf = dt.fread(datafile)

CPU times: user 8.88 s, sys: 5.38 s, total: 14.3 s
Wall time: 1.23 s


In [8]:
%%time
df = pd.read_csv(datafile)

CPU times: user 11.2 s, sys: 5.02 s, total: 16.2 s
Wall time: 16.2 s

`fread` finishes in about 1.2 s of wall time against 16.2 s for `pd.read_csv`, roughly 13 times faster. Its total CPU time (14.3 s) is far larger than its wall time, which is consistent with the multi-threaded reading listed above.


#### Data manipulation

In [9]:
dtf[1:10,:4]

,SeqID,Date Of Stop,Time Of Stop,Agency
,▪▪▪▪,▪▪▪▪,▪▪▪▪,▪▪▪▪
0,63f0c83d-9127-4734-8655-84b5c5c542ea,05/21/2019,22:52:00,MCP
1,8ba4a63a-daab-4144-8fb2-73ca2cc9c818,05/21/2019,22:51:00,MCP
2,6008542f-ebd4-4a6c-8aca-8db1ec961484,05/21/2019,22:48:00,MCP
3,41a5c596-a342-4928-8433-85c128fc7990,05/21/2019,22:45:00,MCP
4,e9924f8b-9a1a-4614-9177-d6f3afc0664c,05/21/2019,22:44:00,MCP
5,e9924f8b-9a1a-4614-9177-d6f3afc0664c,05/21/2019,22:44:00,MCP
6,b9cc5ffd-a6e6-487c-8e90-751c233385ab,05/21/2019,22:44:00,MCP
7,2bf1e1d2-a59a-4460-888d-3637faaa4939,05/21/2019,22:44:00,MCP
8,829cd2e0-c909-44e0-b80f-0e3b61b1fd51,05/21/2019,22:41:00,MCP


In [10]:
dtf[:,dtf.names[5:10]].head()

,Description,Location,Latitude,Longitude,Accident
,▪▪▪▪,▪▪▪▪,▪▪▪▪▪▪▪▪,▪▪▪▪▪▪▪▪,▪▪▪▪
0,OPER. MOTOR VEH. WITH OPERATOR NOT RESTRAINED BY S…,12 N. WASHINGTON ST,39.0848,−77.1528,No
1,FAILURE TO DISPLAY REGISTRATION CARD UPON DEMAND B…,12 N. WASHINGTON ST,39.0848,−77.1528,No
2,FAILURE TO ATTACH VEHICLE REGISTRATION PLATES AT F…,RANDOLPH RD/ HUNTERS LN,39.0537,−77.1009,No
3,DRIVING VEH. W/O ADEQUATE REAR REG. PLATE ILLUMINA…,OLD GEORGETOWN RD @ TUCKERMAN LA,39.0245,−77.103,No
4,EXCEEDING THE POSTED SPEED LIMIT OF 25 MPH,OLD GEORGETOWN RD / AUBURN AVE,38.9881,−77.0999,No
5,FAILURE TO ATTACH VEHICLE REGISTRATION PLATES AT F…,1700 UNIVERSITY BLVD W,39.0367,−77.0386,No
6,EXCEEDING THE POSTED SPEED LIMIT OF 35 MPH,1700 UNIVERSITY BLVD W,39.0367,−77.0386,No
7,DRIVER USING HANDS TO USE HANDHELD TELEPHONE WHILE…,GEORGIA AVE / SLIGO AVE,38.9918,−77.0267,No
8,FAILURE OF DR. TO MAKE LANE CH. TO AVAIL. LANE NOT…,MVA/270,39.1497,−77.2155,No
9,DRIVING VEH. W/O ADEQUATE REAR REG. PLATE ILLUMINA…,RANDOLPH RD/ROCKING HORSE RD,39.0538,−77.0975,No


#### Frame proxy f

In [11]:
dtf[:5,dt.f.SeqID]

,SeqID
,▪▪▪▪
0,63f0c83d-9127-4734-8655-84b5c5c542ea
1,63f0c83d-9127-4734-8655-84b5c5c542ea
2,8ba4a63a-daab-4144-8fb2-73ca2cc9c818
3,6008542f-ebd4-4a6c-8aca-8db1ec961484
4,41a5c596-a342-4928-8433-85c128fc7990
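Beyond picking a single column, `f` can also be used to build filter conditions and computed columns. The sketch below assumes a small invented frame; the columns `speed` and `limit` and the helper `fast_stops` are hypothetical and do not exist in the traffic dataset.

```python
def fast_stops():
    """Keep rows more than 8 over the limit and return the excess."""
    # Imported inside the function to keep the sketch self-contained.
    import datatable as dt
    from datatable import f

    # Hypothetical data for illustration only.
    stops = dt.Frame({
        "agency": ["MCP", "MCP", "RPD", "MCP"],
        "speed": [42, 31, 55, 38],
        "limit": [35, 35, 45, 25],
    })

    # f.speed and f.limit are placeholders that are resolved against
    # `stops` when the selection is evaluated.
    over = f.speed - f.limit
    # First slot: row filter. Second slot: named output columns.
    return stops[over > 8, {"agency": f.agency, "over": over}]
```

On this toy data, `fast_stops().to_dict()` should give `{'agency': ['RPD', 'MCP'], 'over': [10, 13]}`.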


A datatable Frame can be converted to a pandas DataFrame or a NumPy array.

In [None]:
numpy_df = dtf.to_numpy()
pandas_df = dtf.to_pandas()
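The conversion also works in the other direction, since a pandas DataFrame can be passed to the `dt.Frame` constructor. A minimal round-trip sketch on a tiny invented frame follows; the helper name `round_trip` is hypothetical, and pandas and NumPy are assumed to be installed alongside `datatable`.

```python
def round_trip():
    """Convert a small Frame to pandas and NumPy, then back to a Frame."""
    # Imported inside the function to keep the sketch self-contained.
    import datatable as dt

    # Two numeric columns, so the NumPy result is a plain 2-D float array.
    frame = dt.Frame({
        "lat": [39.0848, 39.0537],
        "lon": [-77.1528, -77.1009],
    })

    as_pandas = frame.to_pandas()   # pandas.DataFrame with the same columns
    as_numpy = frame.to_numpy()     # numpy.ndarray; column names are dropped
    back = dt.Frame(as_pandas)      # build a Frame again from the DataFrame
    return as_pandas, as_numpy, back
```

Here `as_numpy.shape` is `(2, 2)`, and `back.names` keeps the original column names `('lat', 'lon')`.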