### Data setup
The cell below checks whether you already have the data for this demo and, if you don't, downloads it for you.

Once that is complete, the full path to the data is stored in the variable `data_path` for your convenience.

In [1]:
%%time
import os
import urllib 

# base URL of the raw data directory and the file name
base_url = 'https://github.com/gumdropsteve/silent-disco/raw/master/data/'
fn = 'taxi_sample.csv'
# check if we already have the file
if not os.path.isfile('data/' + fn):
    # we don't, so report that the download is starting
    print(f'Downloading {base_url + fn} to data/{fn}')
    # download the file into data/ (IPython expands {...} in shell commands)
    !wget -P data {base_url + fn}
# we already have the data
else:
    # let us know
    print(f'data/{fn} already downloaded')
    
# identify current working directory
cwd = os.getcwd()
# build the full path to the data file
data_path = os.path.join(cwd, 'data', fn)

Downloading https://github.com/gumdropsteve/silent-disco/raw/master/data/taxi_sample.csv to data/taxi_sample.csv
--2020-01-27 23:02:04--  https://github.com/gumdropsteve/silent-disco/raw/master/data/taxi_sample.csv
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/gumdropsteve/silent-disco/master/data/taxi_sample.csv [following]
--2020-01-27 23:02:04--  https://raw.githubusercontent.com/gumdropsteve/silent-disco/master/data/taxi_sample.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.248.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.248.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1693592 (1.6M) [text/plain]
Saving to: ‘data/taxi_sample.csv’


2020-01-27 23:02:04 (109 MB/s) - ‘data/taxi_sample.csv’ saved [1693592/1693592]

CPU ti
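
The setup cell relies on `!wget`, which only works inside IPython/Jupyter on a machine that has `wget` installed. As a rough alternative, the same check-then-download logic can be sketched with the standard library alone. The helper name `ensure_file` and its parameters are hypothetical and are not part of the original notebook.

```python
import os
import urllib.request


def ensure_file(url, dest_dir, filename):
    """Download `url` to dest_dir/filename unless that file already exists.

    Hypothetical helper for illustration; returns the absolute path to the file.
    """
    dest = os.path.join(dest_dir, filename)
    if os.path.isfile(dest):
        # file is already present, so skip the download
        print(f'{dest} already downloaded')
    else:
        # create the target directory if needed, then fetch the file
        os.makedirs(dest_dir, exist_ok=True)
        print(f'Downloading {url} to {dest}')
        urllib.request.urlretrieve(url, dest)
    return os.path.abspath(dest)
```

With the variables from the cell above, a call such as `data_path = ensure_file(base_url + fn, 'data', fn)` should leave `data_path` pointing at the same file as before, assuming the notebook is run from the same working directory.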

## CONCAT with BlazingSQL

In [2]:
from blazingsql import BlazingContext
# connect to BlazingSQL
bc = BlazingContext()

BlazingContext ready


In [3]:
# create base table
bc.create_table('taxi', data_path, header=0)

<pyblazing.apiv2.context.BlazingTable at 0x7f42d4065898>

In [4]:
# calculate cost per rider, then concatenate payment type with rate code ID and vendor ID with store-and-forward flag
query = '''
        select 
            total_amount / passenger_count AS cost_per_rider, 
            payment_type || '-' || RateCodeID AS payment_id,
            VendorID || store_and_fwd_flag AS vendor_flag
        from 
            taxi
            '''
# run query 
gdf = bc.sql(query)
# how's it look?
gdf.to_pandas().sample(20)

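|      | cost_per_rider | payment_id | vendor_flag |
|------|----------------|------------|-------------|
| 1910 | 7.8            | 2-1        | 1N          |
| 1214 | 12.8           | 2-1        | 1N          |
| 4457 | 11.6           | 1-1        | 2N          |
| 2414 | 7.8            | 1-1        | 1N          |
| 780  | 5.3            | 2-1        | 1N          |
| 1326 | 17.76          | 1-1        | 1N          |
| 5160 | 3.866667       | 1-1        | 2N          |
| 9899 | 7.8            | 1-1        | 1N          |
| 2602 | 10.4           | 1-1        | 2N          |
| 4945 | 1.48           | 1-1        | 2N          |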
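
To see what the `||` concatenation operator does without a GPU, here is a minimal sketch that runs the same query against Python's built-in SQLite engine instead of BlazingSQL. The table schema and the two rows are made up for illustration and are not taken from `taxi_sample.csv`. SQLite's implicit number-to-text conversion may also differ from BlazingSQL's, so treat this only as a way to get a feel for the output shape.

```python
import sqlite3

# in-memory SQLite database standing in for the BlazingSQL context (assumption)
conn = sqlite3.connect(':memory:')
conn.execute('''
    create table taxi (
        VendorID integer,
        store_and_fwd_flag text,
        payment_type integer,
        RateCodeID integer,
        passenger_count integer,
        total_amount real
    )
''')

# hypothetical rows, invented for this example
rows = [
    (1, 'N', 2, 1, 1, 7.8),
    (2, 'N', 1, 1, 2, 20.8),
]
conn.executemany('insert into taxi values (?, ?, ?, ?, ?, ?)', rows)

# same query as in the notebook cell: one division and two concatenations
query = '''
    select
        total_amount / passenger_count as cost_per_rider,
        payment_type || '-' || RateCodeID as payment_id,
        VendorID || store_and_fwd_flag as vendor_flag
    from
        taxi
'''
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Each result row holds the per-rider cost followed by two strings built by `||`, for example a payment type of 2 and a rate code of 1 become `'2-1'`.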
