# Forma AI SQL Take Home Assignment

The purpose of this notebook is to answer the questions laid out by Forma AI for the data analyst role.

As part of the assignment, Forma AI has provided a SQLite3 database and 5 questions. Each question is restated in this notebook, and more information can be found in the README file.

## The database

The SQLite3 database contains 3 tables.

The first table, `transactions`, contains details about each product that a customer has purchased. A transaction is identified by a `trans_id` and can contain multiple products. Here are the columns in the `transactions` table:
   * `trans_id`: the transaction id
   * `cust_id`: the customer id
   * `prod_id`: the product id
   * `item_qty`: the quantity of the product that is being purchased
   * `item_price`: the per unit price of the product (NOTE: the total revenue
     for a product is `item_qty * item_price`)
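
As a quick illustration of the revenue note above, here is a minimal sketch that sums `item_qty * item_price` per transaction. It runs against an in-memory stand-in table with made-up rows, not against `sample.db`; the connection name `demo` and the sample values are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the real database; the rows below are made up.
demo = sqlite3.connect(':memory:')
demo.execute('''
CREATE TABLE transactions (
    trans_id INTEGER, cust_id INTEGER, prod_id INTEGER,
    item_qty INTEGER, item_price REAL)
''')
demo.executemany(
    'INSERT INTO transactions VALUES (?, ?, ?, ?, ?)',
    [(1, 100, 10, 2, 5.0),   # transaction 1, first product
     (1, 100, 11, 1, 3.5),   # transaction 1, second product
     (2, 200, 10, 3, 5.0)],  # transaction 2, single product
)

# Revenue per transaction: item_qty * item_price summed over its products.
revenue_query = '''
SELECT trans_id,
       SUM(item_qty * item_price) AS revenue
FROM transactions
GROUP BY trans_id
ORDER BY trans_id;
'''
revenue_rows = demo.execute(revenue_query).fetchall()
print(revenue_rows)
```

Because a transaction can span several rows, the revenue has to be aggregated with `GROUP BY trans_id` rather than read off a single row.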

The second table, `products`, contains details about each product. The columns in the `products` table are:
   * `prod_id`: the product id (same meaning as in `transactions`)
   * `prod_name`: the product name
   * `brand`: the brand of the product
   * `category`: the category of the product
   
Finally, the third table, `segments`, contains the history of the market segments each customer has belonged to. Segmentation is calculated periodically for current customers, and the results are appended to this table. The most recent segment for a customer is marked with a `Y` in the `active_flag` column. The columns in this table are:
   * `cust_id`: the customer id (same meaning as in `transactions`)
   * `seg_name`: the segment of this customer
   * `update_dt`: the date when this segment was updated
   * `active_flag`: whether or not this segment is the active segment for this customer
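
Since the table keeps the full history, it can be worth checking that every customer has exactly one active row before relying on the flag. The sketch below shows one way to do that on an in-memory stand-in; the connection name `demo`, the segment names, and the rows are made up, and the stand-in table keeps only the columns the check needs.

```python
import sqlite3

# In-memory stand-in with only the columns this check needs; rows are made up.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE segments (cust_id INTEGER, seg_name TEXT, active_flag TEXT)')
demo.executemany(
    'INSERT INTO segments VALUES (?, ?, ?)',
    [(100, 'A', 'N'),
     (100, 'B', 'Y'),   # customer 100: one active row, as expected
     (200, 'A', 'Y'),   # customer 200: one active row, as expected
     (300, 'C', 'N')],  # customer 300: no active row at all
)

# In SQLite a comparison evaluates to 0 or 1, so SUM counts the active rows.
# Customers whose count is not exactly 1 are returned for inspection.
check_query = '''
SELECT cust_id,
       SUM(active_flag = 'Y') AS active_rows
FROM segments
GROUP BY cust_id
HAVING SUM(active_flag = 'Y') <> 1
ORDER BY cust_id;
'''
suspicious_customers = demo.execute(check_query).fetchall()
print(suspicious_customers)
```

An empty result would mean the flag can be trusted as a one-row-per-customer filter.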


## Reading the database

Before we can begin answering the questions, let us import the `sqlite3` library and connect to the database.

In [15]:
import sqlite3

conn = sqlite3.connect('sample.db')

sample_query = 'SELECT * FROM transactions;'
sample_pull = conn.execute(sample_query).fetchmany(5)
print(sample_pull)


[(1, '2016-01-02 10:06:00', 9085146, 223029, 1, 42.99), (2, '2016-01-02 10:30:00', 1215814, 252270, 1, 103.95), (2, '2016-01-02 10:30:00', 1215814, 260383, 1, 74.99), (4, '2016-01-02 11:33:00', 18511160, 269119, 1, 51.99), (4, '2016-01-02 11:33:00', 18511160, 411162, 1, 59.99)]


We see that our connection to the database is now established. We are ready to start answering the questions given to us for the assessment.
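
The raw tuples returned by `fetchmany` carry no column names, which makes them hard to read. As a hedged sketch, the snippet below lists the tables through `sqlite_master` and pairs each value with its column name via `cursor.description`. It uses an in-memory stand-in for the `products` table; the connection name `demo` and the single row are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the products table; the row is made up.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE products (prod_id INTEGER, prod_name TEXT, brand TEXT, category TEXT)')
demo.execute("INSERT INTO products VALUES (10, 'Widget', 'Acme', 'Tools')")

# sqlite_master holds one row per schema object; keep only the tables.
table_names = [row[0] for row in demo.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)

# cursor.description has one entry per result column; the name is the first field.
cursor = demo.execute('SELECT * FROM products')
column_names = [col[0] for col in cursor.description]
labelled_rows = [dict(zip(column_names, row)) for row in cursor.fetchmany(5)]
print(labelled_rows)
```

The same two steps should work on the real connection by swapping `demo` for `conn`.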

## Q1

**Find the current active segment for each customer sorted by the segment
   update date.  The output should contain three columns: `cust_id`,
   `seg_name`, `updated_at`**
   
To build this table, we use the query:
"SELECT cust_id, 
        seg_name, 
        update_dt AS updated_at 
 FROM segments 
 WHERE active_flag = 'Y'
 ORDER BY updated_at DESC"
 


In [16]:
# Pull the data using the sqlite3 library

query_1 = '''
SELECT cust_id,
       seg_name,
       update_dt AS updated_at
FROM segments
WHERE active_flag = 'Y'
ORDER BY updated_at DESC;
'''

question_1 = conn.execute(query_1).fetchall()
print(question_1[:3])

OperationalError: no such column: update_dt

When we tried to run the query we got an error saying there was no such column `update_dt`. Let's query the column headers of the `segments` table to dive deeper.

In [17]:
print(conn.execute('PRAGMA table_info(segments)').fetchall())

[(0, 'cust_id', 'INTEGER', 0, None, 0), (1, 'seg_name', 'TEXT', 0, None, 0), (2, 'update_at', 'TIMESTAMP', 0, None, 0), (3, 'active_flag', 'TEXT', 0, None, 0)]
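
Each tuple returned by `PRAGMA table_info` has the form `(cid, name, type, notnull, dflt_value, pk)`, so the column names sit at index 1, and the date column is actually called `update_at`. The sketch below pulls out the names and reruns the Q1 query with that column. It uses an in-memory stand-in whose schema mirrors the output above; the connection name `demo`, the segment names, and the rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in mirroring the schema reported by PRAGMA table_info; rows are made up.
demo = sqlite3.connect(':memory:')
demo.execute('''
CREATE TABLE segments (
    cust_id INTEGER, seg_name TEXT, update_at TIMESTAMP, active_flag TEXT)
''')
demo.executemany('INSERT INTO segments VALUES (?, ?, ?, ?)', [
    (100, 'A', '2016-01-01 00:00:00', 'N'),
    (100, 'B', '2016-03-01 00:00:00', 'Y'),
    (200, 'A', '2016-02-01 00:00:00', 'Y'),
])

# Index 1 of each table_info row is the column name.
segment_columns = [row[1] for row in demo.execute('PRAGMA table_info(segments)')]
print(segment_columns)

# Same query as before, but reading from update_at and quoting the 'Y' literal.
fixed_query = '''
SELECT cust_id,
       seg_name,
       update_at AS updated_at
FROM segments
WHERE active_flag = 'Y'
ORDER BY updated_at DESC;
'''
active_segments = demo.execute(fixed_query).fetchall()
print(active_segments)
```

Checking the column list programmatically before writing a query avoids relying on column names quoted from documentation that may be out of date.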
