For Python programmers, Ibis offers a way to write SQL in Python that allows for unit testing, composability, and abstraction over specific query engines (e.g. BigQuery). You can carry out joins, filters, and other operations on your data in a familiar, pandas-like syntax. Overall, using Ibis simplifies your workflows, makes you more productive, and keeps your code readable.

Let's start by getting Ibis and SQLite installed on your system. In the next several cells, you will notice that the first character is an exclamation point: ***!*** This tells Jupyter that we're running a shell (terminal) command instead of Python code in the notebook.
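
Outside a notebook, the `!` prefix is not valid Python. As a rough sketch of what it does, the standard-library `subprocess` module can run a shell-style command from plain Python. The command used here (asking the interpreter for its version) is only a harmless stand-in, not part of the installation steps:

```python
import subprocess
import sys

# Run an external command from plain Python; `!cmd` in Jupyter does roughly this.
# The command (printing the interpreter version) is only a stand-in.
result = subprocess.run(
    [sys.executable, "--version"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout.strip() or result.stderr.strip())
print("exit code:", result.returncode)
```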

## Install SQLite

In [None]:
! conda install -y -c anaconda sqlite

## Install ibis

In [None]:
! conda install -y -c conda-forge ibis-framework

## Get the data
We're using the `Civic Art Collection` database, which can be downloaded here: https://data.sfgov.org/Culture-and-Recreation/Civic-Art-Collection/r7bn-7v9c using the `Export` service in the top right corner. Please select `CSV` for your export format. When you've completed the download, store the downloaded file in the `data` directory in this folder.

When you're done, you should see the following information when you list this folder:
>`data                       ibis_sqlite_tutorial.ipynb`

and when you list the `data` folder you should see:
>`Civic_Art_Collection.csv`
<br>

## Create the SQLite database

We will use Python to create the target database, as this is not yet supported in Ibis.
The following code will:
 - create the initial SQL collection (database)
 - set SQLite to import the data from a CSV file
 - import the data

Following that, we will check that the correct tables have been imported and describe the tables.

### First, let's create variables for everything we're about to do

In [None]:
import sqlite3
import os
path="data" #where did we store the data
file="Civic_Art_Collection.csv" #what's the name of the file we're importing
mode="csv" # what is the file format
collection_name="civicArt.db" # what are we going to call the database
table_name="civicArtTable" # what are we going to call the table into which we are importing the file
path_separator=os.path.sep # get the path separator (forward or backward slash) for this operating system
conn=None # place holder for the db connection
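
As an aside, the separator does not have to be tracked by hand. A small sketch, reusing the same folder and file names, of how `os.path.join` and `pathlib.Path` build the path with the right separator for the current operating system:

```python
import os
from pathlib import Path

path = "data"
file = "Civic_Art_Collection.csv"

# os.path.join inserts the correct separator for the current operating system
csv_path = os.path.join(path, file)

# pathlib offers the same thing with the / operator
csv_path_obj = Path(path) / file

print(csv_path)
print(csv_path_obj.exists())  # True only once the CSV has been downloaded
```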

### Create the SQL Collection (database)
When SQLite receives a connection request for a database, if a db of that name does not already exist, it will be created on the fly. 

In [None]:
from sqlite3 import Error
try:
    conn = sqlite3.connect(collection_name)
    print(f"Connection successful. Currently using SQLite version: {sqlite3.sqlite_version}")
except Error as e:
    print(f"Connection request failed with following error: \n{e}")
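
To see the create-on-connect behaviour in isolation, here is a minimal sketch that works in a temporary directory so no real database is touched. The file name `demo.db` is made up for the illustration:

```python
import os
import sqlite3
import tempfile

# Work in a throwaway directory so no real database is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.db")  # hypothetical file name
    existed_before = os.path.exists(db_path)

    # Connecting to a path that does not exist yet creates the database
    demo_conn = sqlite3.connect(db_path)
    demo_conn.execute("CREATE TABLE t (x TEXT)")
    demo_conn.commit()
    demo_conn.close()

    exists_after = os.path.exists(db_path)

print(existed_before, exists_after)
```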

### Create the table schema

In [None]:
# get the header information from the csv file. Let's use pandas for that
import pandas as pd
art_df = pd.read_csv(path+path_separator+file)
colNames = list(art_df.columns)
# Most of the columns in this table are strings and we could very simply import the table with everything 
# treated as a string
# The `accession_number` is unique for the table and we could use that as the key.
# first, let's just make everything a string as far as the db is concerned:
for idx, each in enumerate(colNames):
    if each == "accession_number":
        colNames[idx] = "_".join(colNames[idx].split()) + " text PRIMARY KEY, "
    elif idx != len(colNames)-1:
        colNames[idx] = "_".join(colNames[idx].split()) + " text, "
    else:
        colNames[idx] = "_".join(colNames[idx].split()) + " text "
table_schema = "(" + "\n".join(colNames) + ")"
sql_string = "CREATE TABLE IF NOT EXISTS " + table_name + " " + table_schema + ";"
sql_create_civicArt_table = sql_string

try:
    c = conn.cursor()
    c.execute(sql_create_civicArt_table)
except Error as e:
    print(f"Table Creation request failed with following error: \n{e}")
conn.commit()
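
The schema string above special-cases the last column to avoid a trailing comma. One possible alternative, sketched here with a hypothetical helper `build_create_table` and made-up header names, is to collect the column definitions in a list and join them with `", "`. It runs against an in-memory database so nothing on disk is changed:

```python
import sqlite3

def build_create_table(table_name, columns, primary_key=None):
    """Return a CREATE TABLE statement with every column typed as text."""
    definitions = []
    for name in columns:
        clean = "_".join(name.split())  # replace runs of whitespace with "_"
        suffix = " PRIMARY KEY" if clean == primary_key else ""
        definitions.append(f'"{clean}" text{suffix}')
    body = ", ".join(definitions)  # join adds separators only between items
    return f'CREATE TABLE IF NOT EXISTS "{table_name}" ({body});'

# Hypothetical header names, standing in for the real CSV header
sample_columns = ["accession_number", "artist", "display title"]
statement = build_create_table("civicArtTable", sample_columns, "accession_number")
print(statement)

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(statement)
# PRAGMA table_info returns one row per column; index 1 holds the column name
created = [row[1] for row in demo_conn.execute("PRAGMA table_info(civicArtTable)")]
print(created)
demo_conn.close()
```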

### Import the data into the table

In [None]:
# sql doesn't really like spaces in the column names, which we fixed up in the preceding section. 
# let's fix them in the pandas dataframe
newHeaders = list(art_df.columns)

for idx, each in enumerate(newHeaders):
    newHeaders[idx] = "_".join(newHeaders[idx].split())
    
art_df.columns = newHeaders
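# Note: if_exists='replace' drops the table created above and lets pandas recreate it,
# so the schema pandas infers (not the one we defined) ends up in the database.
art_df.to_sql(table_name, conn, if_exists='replace', index=False)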
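
pandas does the heavy lifting in the cell above. For comparison, here is a rough standard-library-only sketch of the same kind of import, using `csv` and `executemany`. The two-row CSV and the table name `demo` are invented for the illustration and stand in for the real file:

```python
import csv
import io
import sqlite3

# Hypothetical two-row CSV, standing in for Civic_Art_Collection.csv
sample_csv = io.StringIO(
    "accession_number,artist,display title\n"
    "A1,Artist One,First Piece\n"
    "A2,Artist Two,Second Piece\n"
)

reader = csv.reader(sample_csv)
headers = ["_".join(h.split()) for h in next(reader)]  # same clean-up as above
rows = list(reader)

demo_conn = sqlite3.connect(":memory:")
column_sql = ", ".join(f'"{h}" text' for h in headers)
demo_conn.execute(f"CREATE TABLE demo ({column_sql})")

# One ? placeholder per column; executemany inserts every row in one call
placeholders = ", ".join("?" for _ in headers)
demo_conn.executemany(f"INSERT INTO demo VALUES ({placeholders})", rows)
demo_conn.commit()

row_count = demo_conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(headers, row_count)
demo_conn.close()
```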

### OK, let's start using Ibis to work with this data
#### Connecting to the database

In [None]:
import ibis
ibis.options.interactive = True
db = ibis.sqlite.connect("civicArt.db")

#### List the tables

In [None]:
sql_tables = db.list_tables()
print(sql_tables)

#### List the columns in the table

In [None]:
# the preceding command returns a list, which may contain more than one table,
# so print each table name and the columns for that table
for each in sql_tables:
    print(f"columns in {each}")
    print(f"{db.table(each).columns}")
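
For reference, the same information can be read straight from SQLite without Ibis. This is a sketch against an in-memory database with a made-up table `demo`, using the `sqlite_master` catalog and `PRAGMA table_info`:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (artist text, display_title text)")  # hypothetical table

# Table names live in the sqlite_master catalog
table_names = [
    row[0]
    for row in demo_conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column; index 1 holds the column name
columns_by_table = {
    name: [row[1] for row in demo_conn.execute(f"PRAGMA table_info({name})")]
    for name in table_names
}
print(columns_by_table)
demo_conn.close()
```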

## Querying 

Anything you can write in a SELECT statement you can write in Ibis. Let's test this out!
I’ll use the following code to find out which artists have art currently displayed in the city and what the titles of their pieces are.

### Selecting columns from a table 

In [None]:
art = db.table(sql_tables[0])
sql_results = art["artist", "display_title"]
sql_results
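
For readers who think in SQL, the column selection above corresponds roughly to a plain `SELECT` with a column list. The sketch below runs that query with the standard `sqlite3` module on an in-memory table. The rows and the extra `medium` column are made up, and the exact SQL that Ibis generates may differ:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE civicArtTable (artist text, display_title text, medium text)"
)
demo_conn.executemany(
    "INSERT INTO civicArtTable VALUES (?, ?, ?)",
    [
        ("Artist One", "First Piece", "bronze"),   # made-up rows
        ("Artist Two", "Second Piece", "steel"),
    ],
)

# Only the two named columns come back; medium is left out
selected = demo_conn.execute(
    "SELECT artist, display_title FROM civicArtTable"
).fetchall()
print(selected)
demo_conn.close()
```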

### Filtering Data

Next, let's pick an artist and figure out exactly where all their art is located. Adriane Colburn's piece, `Geological Ghost`, caught my eye, so let's choose them!

I use the following commands to do this:

In [None]:
adrianes_art = art.filter(art["artist"] == 'Colburn, Adriane')
adrianes_art["display_title", "street_address_or_intersection"]


So it turns out Adriane has two pieces on display, one at `4 Guy Place` and the other at `Dagget Street & 16th Street`. This is great, we already have some places we can add to our tourist itinerary!  
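
In SQL terms, the filter above corresponds roughly to a `WHERE` clause. Here is a sketch with the standard `sqlite3` module on an in-memory table. The artists, titles, and addresses are placeholders, not rows from the real dataset:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE civicArtTable "
    "(artist text, display_title text, street_address_or_intersection text)"
)
demo_conn.executemany(
    "INSERT INTO civicArtTable VALUES (?, ?, ?)",
    [
        ("Artist One", "First Piece", "1 Example Street"),   # made-up rows
        ("Artist One", "Second Piece", "2 Example Street"),
        ("Artist Two", "Third Piece", "3 Example Street"),
    ],
)

# The ? placeholder keeps the artist name out of the SQL string itself
matches = demo_conn.execute(
    "SELECT display_title, street_address_or_intersection "
    "FROM civicArtTable WHERE artist = ?",
    ("Artist One",),
).fetchall()
print(matches)
demo_conn.close()
```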

### Groupby

I don’t usually stay more than one or two days in a city after a conference, so it might be nice to know which locations have the most art on display.
We use `groupby` and `sort_by` to get the 10 locations in San Francisco with the most art!

In [None]:
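art_loc = art.groupby("street_address_or_intersection").count('display_title')
# sort descending so the locations with the most art come first, then keep the top 10
# (this avoids a hard-coded row slice, which breaks as soon as the data changes)
most_art = art_loc.sort_by(ibis.desc('display_title'))
most_art.limit(10)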


### Great, the first stop would be 1001 Potrero Avenue, which has 59 titles on display!
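
The grouping and sorting above map roughly onto `GROUP BY`, `ORDER BY ... DESC`, and `LIMIT` in SQL. Here is a sketch with the standard `sqlite3` module on a few made-up rows. The alias `n_titles` is an assumption for the illustration, and the SQL that Ibis actually emits may look different:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE civicArtTable (display_title text, street_address_or_intersection text)"
)
demo_conn.executemany(
    "INSERT INTO civicArtTable VALUES (?, ?)",
    [
        ("Piece A", "1 Example Street"),   # made-up rows
        ("Piece B", "1 Example Street"),
        ("Piece C", "2 Example Street"),
    ],
)

# Count rows per location, largest count first, and keep at most 10 locations
top_locations = demo_conn.execute(
    "SELECT street_address_or_intersection, COUNT(*) AS n_titles "
    "FROM civicArtTable "
    "GROUP BY street_address_or_intersection "
    "ORDER BY n_titles DESC "
    "LIMIT 10"
).fetchall()
print(top_locations)
demo_conn.close()
```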