# Getting Started with BlazingSQL
*By Winston Robson*

In this notebook, we will cover how to read and query CSV files with cuDF and BlazingSQL.

#### BlazingSQL install check
The next cell checks whether BlazingSQL is installed and offers to install it if it is not, so that the rest of the notebook runs as expected.

In [1]:
import sys 
# point import path to notebooks-contrib/utils
sys.path.append('../../../utils/')

from sql_check import bsql_start
# check that BlazingSQL is installed
bsql_start()

"You've got BlazingSQL set up perfectly! Let's get started with SQL in RAPIDS AI!"

#### Download Data
This cell checks whether the data for this demo is already present and downloads it if it is not.

In [2]:
import os

# relative path to data 
file = '../../../data/blazingsql/Music.csv'

# do we have music file?
if not os.path.isfile(file):
    !wget -P ../../../data/blazingsql 'https://s3.amazonaws.com/blazingsql-colab/Music.csv'
else:
    print("You've got the data!")

--2020-01-21 17:08:03--  https://s3.amazonaws.com/blazingsql-colab/Music.csv
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.236.253
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.236.253|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10473 (10K) [text/csv]
Saving to: ‘../../../data/blazingsql/Music.csv’


2020-01-21 17:08:03 (207 MB/s) - ‘../../../data/blazingsql/Music.csv’ saved [10473/10473]
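
The `!wget` line above is IPython shell syntax, so it only works inside a notebook on a machine that has `wget` available. As a hedged alternative, a minimal sketch using only the Python standard library might look like the following. The names `ensure_file` and `fetch` are hypothetical helpers introduced here for illustration; they are not part of BlazingSQL or the notebook utilities.

```python
import os
import urllib.request


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` unless the file already exists.

    Returns True if a download was triggered, False if the file was
    already present. `fetch` is a hypothetical injection point so the
    logic can be exercised without network access.
    """
    if os.path.isfile(path):
        return False
    # make sure the target directory exists before writing into it
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    fetch(url, path)
    return True
```

Under these assumptions, calling `ensure_file(file, 'https://s3.amazonaws.com/blazingsql-colab/Music.csv')` would stand in for the `if`/`else` in the cell above.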



## Create BlazingContext
You can think of the BlazingContext much like a Spark Context: it stores information such as the file systems you have registered and the tables you have created.

In [3]:
from blazingsql import BlazingContext

bc = BlazingContext()

BlazingContext ready


## Read CSV
The CSV file was downloaded in the step above. Now we use cuDF to read it, which gives us a GPU DataFrame (GDF).

To learn more about the GDF and how it enables end-to-end workloads on RAPIDS, check out *[The GPU DataFrame (GDF) and cuDF in RAPIDS AI](https://blog.blazingdb.com/blazingsql-part-1-the-gpu-dataframe-gdf-and-cudf-in-rapids-ai-96ec15102240)*.

In [4]:
import cudf

# cudf (gpu) dataframe from csv 
gdf = cudf.read_csv('../../../data/blazingsql/Music.csv')

# let's see how it looks
gdf.head()

| | ARTIST | RATING | YEAR | LOCATION | FESTIVAL_SET |
|---|---|---|---|---|---|
| 0 | Arcade Fire | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 1 | Justice | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 2 | Florence and The Machine | 10.0 | 2018.0 | Las Vegas | 1.0 |
| 3 | Odesza | 10.0 | 2018.0 | Indio | 1.0 |
| 4 | Bon Iver | 10.0 | 2017.0 | Indio | 1.0 |


## Create a Table
Now we just need to create a BlazingSQL table from the GPU DataFrame so that we can query it with SQL.

In [5]:
# BlazingSQL table from cuDF DataFrame 
bc.create_table('music', gdf)

<pyblazing.apiv2.context.BlazingTable at 0x7f37b3a42710>

## Query a Table
That's it! Now when you write a SQL query, the data is processed on the GPU with BlazingSQL, and the output is a GPU DataFrame (GDF) inside RAPIDS!

In [6]:
# query artist, rating & location for 10 events with a rating of at least 7
result_gdf = bc.sql('SELECT ARTIST, RATING, LOCATION FROM music where RATING >= 7 LIMIT 10')

# display dataframe (type(result_gdf)==cudf.core.dataframe.DataFrame)
result_gdf

| | ARTIST | RATING | LOCATION |
|---|---|---|---|
| 0 | Arcade Fire | 10.0 | Las Vegas |
| 1 | Justice | 10.0 | Las Vegas |
| 2 | Florence and The Machine | 10.0 | Las Vegas |
| 3 | Odesza | 10.0 | Indio |
| 4 | Bon Iver | 10.0 | Indio |
| 5 | LA Philharmonic + Sigur Ros | 10.0 | LA |
| 6 | Sigur Ros | 10.0 | Malmo |
| 7 | Arcade Fire | 10.0 | Indio |
| 8 | Escort | 9.0 | San Francisco |
| 9 | Phoenix | 9.0 | Berkeley |
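
To sanity-check what a query like this should return without a GPU, one option is to run the same SQL on a CPU engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in: it is not BlazingSQL, and SQL dialects can differ between engines. The rows are a handful copied from the outputs shown in this notebook (three columns only), not the full Music.csv, and the name `cpu_result` is introduced here for illustration.

```python
import sqlite3

# A few rows copied from the outputs in this notebook; not the full dataset.
rows = [
    ("Arcade Fire", 10.0, "Las Vegas"),
    ("Justice", 10.0, "Las Vegas"),
    ("Odesza", 10.0, "Indio"),
    ("Goldroom", 8.0, "San Francisco"),
    ("SOHN", 6.0, "San Francisco"),
    ("Zedd", 3.0, "San Francisco"),
]

# In-memory SQLite database standing in for the GPU table (assumption:
# used only to check the filter logic, not performance or exact dialect).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (ARTIST TEXT, RATING REAL, LOCATION TEXT)")
conn.executemany("INSERT INTO music VALUES (?, ?, ?)", rows)

# Same query text as the cell above.
query = "SELECT ARTIST, RATING, LOCATION FROM music WHERE RATING >= 7 LIMIT 10"
cpu_result = conn.execute(query).fetchall()
conn.close()

for row in cpu_result:
    print(row)
```

Only rows with a rating of at least 7 survive the filter, and `LIMIT 10` caps the result at ten rows even when more rows match.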


In [7]:
# define query 
query = '''
        select 
            ARTIST, RATING, 
            LOCATION, FESTIVAL_SET 
        from 
            music 
            where LOCATION = 'San Francisco'
            '''

# pull events in San Francisco, CA
gdf = bc.sql(query)

# sample 7 random rows (to_pandas() first copies the result to host memory)
gdf.to_pandas().sample(7)

| | ARTIST | RATING | LOCATION | FESTIVAL_SET |
|---|---|---|---|---|
| 18 | SOHN | 6.0 | San Francisco | 1.0 |
| 23 | Lorde | 6.0 | San Francisco | 1.0 |
| 3 | Goldroom | 8.0 | San Francisco | 1.0 |
| 47 | Passion Pit | 4.0 | San Francisco | 0.0 |
| 51 | Zedd | 3.0 | San Francisco | 1.0 |
| 25 | Oh Wonder | 6.0 | San Francisco | 1.0 |
| 38 | Atmosphere | 5.0 | San Francisco | 1.0 |


## You're Ready to Rock
And... that's it! You are now live with BlazingSQL.


Check out our [docs](https://docs.blazingdb.com) or [Twitter](https://twitter.com/blazingsql) to get fancy or to learn more about how BlazingSQL works with the rest of [RAPIDS AI](https://rapids.ai/).