# Part 2: Data Modeling

## Assuming we are starting with the data set from Part 1 as our raw data table, how would you model this data in a data warehouse for analytical purposes? What tables would you create? What kinds of questions do you imagine business users would want to ask of this data, and how would they express them in your data model? Please use whatever tools you are comfortable with to answer this question and whatever flavor of SQL you are most familiar with. A github repo or gist is preferred.

### Tables

__Table 1: Requests__
 - Request
 - Request_created_at

__Table 2: Analyst Quality__
 - Event_occurred_at (Another option is Request, but I am not sure whether an Analyst can do another job for a request after their Quality_score has changed. If the source can be connected to the quality score, then this column might not be needed and a smaller table could be stored.)
 - Analyst
 - Quality_score (Quality_score_sourcing & Quality_score_writing are equivalent in all observations)
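
The claim that the two quality columns are always equal can be checked with a single query before one of them is dropped. The sketch below is only an illustration: it builds a small in-memory stand-in for the `Assignments` table (the three analysts and scores are taken from the sample output in this notebook, and the writing scores are assumed equal, as stated above), then counts rows where the columns differ. Against the real database, the same `SELECT` would run on the existing connection.

```python
import sqlite3

# In-memory stand-in for the raw Assignments table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Assignments (
           Analyst TEXT,
           Quality_score_sourcing REAL,
           Quality_score_writing REAL
       )"""
)
conn.executemany(
    "INSERT INTO Assignments VALUES (?, ?, ?)",
    [
        ("9fcbc63ff4c8bea5cea4efad782c87cf", 5.0, 5.0),
        ("85c7b78e76b5232cd38014ea4cdc8f56", 4.35, 4.35),
        ("0e9802516f8a79dd0d45211dd4ee74af", 4.5, 4.5),
    ],
)

# IS NOT is SQLite's NULL-safe inequality, so a NULL on one side only
# is also counted as a mismatch.
mismatches = conn.execute(
    """SELECT COUNT(*)
       FROM Assignments
       WHERE Quality_score_sourcing IS NOT Quality_score_writing"""
).fetchone()[0]

print(mismatches)
```

A count of zero supports keeping a single `Quality_score` column; any other value would mean both columns need to be modeled.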
 
__Table 3: Jobs Available__
 - Event_occurred_at
 - Total_jobs_available
 - Analysts_available
 - Analysts_occupied
 - Job_type
 - Jobs_available
 
  For this table I would convert the following columns from wide format to long format:
     - Review_jobs_available
     - Vetting_jobs_available
     - Planning_jobs_available
     - Editing_jobs_available
     - Sourcing_jobs_available
     - Writing_jobs_available
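
The wide-to-long conversion can be sketched in plain Python to show the intended shape of the result. The names `JOB_TYPE_COLUMNS` and `wide_to_long` are hypothetical helpers, not part of the notebook. The sample row uses the values shown for the 2017-06-21 20:15:42 event in the Jobs_available output, and the mapping includes `Planning_jobs_available`, which the `CREATE TABLE` statement also unpivots.

```python
# Hypothetical mapping from the long-format Job_type value to its wide column.
JOB_TYPE_COLUMNS = {
    "Review": "Review_jobs_available",
    "Vetting": "Vetting_jobs_available",
    "Planning": "Planning_jobs_available",
    "Editing": "Editing_jobs_available",
    "Sourcing": "Sourcing_jobs_available",
    "Writing": "Writing_jobs_available",
}

# Columns that are repeated unchanged on every long-format row.
SHARED_COLUMNS = (
    "Event_occurred_at",
    "Total_jobs_available",
    "Analysts_available",
    "Analysts_occupied",
)


def wide_to_long(row):
    """Turn one wide event row into one row per job type."""
    shared = {name: row[name] for name in SHARED_COLUMNS}
    return [
        {**shared, "Job_type": job_type, "Jobs_available": row[column]}
        for job_type, column in JOB_TYPE_COLUMNS.items()
    ]


wide_row = {
    "Event_occurred_at": "2017-06-21 20:15:42",
    "Total_jobs_available": 14,
    "Analysts_available": 2,
    "Analysts_occupied": 20,
    "Review_jobs_available": 11,
    "Vetting_jobs_available": 2,
    "Planning_jobs_available": 0,
    "Editing_jobs_available": 0,
    "Sourcing_jobs_available": 1,
    "Writing_jobs_available": 0,
}

long_rows = wide_to_long(wide_row)
for r in long_rows:
    print(r["Job_type"], r["Jobs_available"])
```

In long format, a question such as "which job type has the largest backlog at a given time" becomes a filter and a sort on `Job_type` and `Jobs_available` instead of a comparison across six columns.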
 
__Table 4: Job__
 - Event_occurred_at
 - Job
 - Analyst
 - Action
 - Request
 - Wait_time_min
 - Waiting_for
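
To illustrate how the Requests and Job tables relate, here is a minimal sketch that declares them with explicit keys and joins them. The `PRIMARY KEY` and `REFERENCES` constraints are assumptions of this sketch (the `CREATE TABLE ... AS SELECT` statements in the notebook do not define any keys), and the single row in each table is copied from the sample output. The query measures minutes from request creation to an accepted job event, which is not necessarily the same as request completion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed keys: one row per request, referenced by job events.
    CREATE TABLE Requests (
        Request TEXT PRIMARY KEY,
        Request_created_at TEXT NOT NULL
    );
    CREATE TABLE Job (
        Event_occurred_at TEXT NOT NULL,
        Job TEXT,
        Analyst TEXT,
        Action TEXT,
        Request TEXT REFERENCES Requests (Request),
        Wait_time_min INTEGER
    );
    INSERT INTO Requests VALUES
        ('594bec5c95e2ce005840c23a', '2017-06-22 12:12:12');
    INSERT INTO Job VALUES
        ('2017-06-22 19:59:06', 'review', '9fcbc63ff4c8bea5cea4efad782c87cf',
         'Accepted Job', '594bec5c95e2ce005840c23a', 1);
    """
)

# julianday() returns fractional days, so multiplying by 1440 gives minutes.
row = conn.execute(
    """
    SELECT j.Request,
           j.Job,
           CAST(ROUND((julianday(j.Event_occurred_at)
                       - julianday(r.Request_created_at)) * 1440) AS INTEGER)
               AS minutes_since_request
    FROM Job AS j
    JOIN Requests AS r ON r.Request = j.Request
    WHERE j.Action = 'Accepted Job'
    """
).fetchone()

print(row)
```

Keeping `Request_created_at` only in Requests means each job event stores just the request identifier, and time-based questions are answered with a join like the one above.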
 

### Modeling the Data 
Reference for the modeling approach: Hadley Wickham, "Tidy Data" (https://vita.had.co.nz/papers/tidy-data.pdf)

In [1]:
# Import necessary libraries

import sqlite3
import pandas as pd
pd.options.display.max_rows = 999

In [2]:
# Connect to database
conn = sqlite3.connect('database.db')
c = conn.cursor()

In [7]:
# Create Requests Table

c.execute("""CREATE TABLE Requests AS 
                SELECT DISTINCT Request, Request_created_at
                FROM Assignments;""")

<sqlite3.Cursor at 0x11ae6e650>

In [8]:
# Query Requests Table

c.execute("""SELECT *
             FROM Requests;""")
data = pd.DataFrame(c.fetchall())
data.columns = [x[0] for x in c.description]
data

Unnamed: 0,Request,Request_created_at
0,594bec5c95e2ce005840c23a,2017-06-22 12:12:12
1,594bec83fd2cf400280aa965,2017-06-22 12:12:51
2,594c1f5cd7e68f0028c9062c,2017-06-22 15:49:48
3,594c1e983b593b00281250ba,2017-06-22 15:46:32
4,594bec5b95e2ce005840c232,2017-06-22 12:12:11
5,594c41ab1e24bf00350a1f11,2017-06-22 18:16:11
6,594c151273dd9c002873cd98,2017-06-22 15:05:54
7,594c41482c137900353e0fea,2017-06-22 18:14:32
8,594c0b883b593b002812506e,2017-06-22 14:25:12
9,594c3f0c73dd9c002873ce44,2017-06-22 18:05:00


In [10]:
# Create Analyst_Quality Table

c.execute("""CREATE TABLE Analyst_Quality AS 
                SELECT Event_occurred_at, Analyst, Quality_score_sourcing AS Quality_score
                FROM Assignments;""")

<sqlite3.Cursor at 0x11ae6e650>

In [11]:
# Query Analyst_Quality Table

c.execute("""SELECT *
             FROM Analyst_Quality;""")
data = pd.DataFrame(c.fetchall())
data.columns = [x[0] for x in c.description]
data

Unnamed: 0,Event_occurred_at,Analyst,Quality_score
0,2017-06-22 19:59:06,9fcbc63ff4c8bea5cea4efad782c87cf,5.0
1,2017-06-22 19:59:02,9fcbc63ff4c8bea5cea4efad782c87cf,5.0
2,2017-06-22 19:51:30,85c7b78e76b5232cd38014ea4cdc8f56,4.35
3,2017-06-22 19:51:01,0e9802516f8a79dd0d45211dd4ee74af,4.5
4,2017-06-22 19:50:58,85c7b78e76b5232cd38014ea4cdc8f56,4.35
5,2017-06-22 19:50:58,0e9802516f8a79dd0d45211dd4ee74af,4.5
6,2017-06-22 19:50:58,85c7b78e76b5232cd38014ea4cdc8f56,4.35
7,2017-06-22 19:50:23,85c7b78e76b5232cd38014ea4cdc8f56,4.35
8,2017-06-22 19:47:03,85c7b78e76b5232cd38014ea4cdc8f56,4.35
9,2017-06-22 19:47:03,85c7b78e76b5232cd38014ea4cdc8f56,4.35


In [33]:
# Create Jobs_available Table

c.execute("""CREATE TABLE Jobs_available AS 
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Review' AS Job_type, Review_jobs_available AS Jobs_available FROM Assignments UNION
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Vetting' AS Job_type, Vetting_jobs_available AS Jobs_available FROM Assignments UNION
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Planning' AS Job_type, Planning_jobs_available AS Jobs_available FROM Assignments UNION
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Editing' AS Job_type, Editing_jobs_available AS Jobs_available FROM Assignments UNION
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Sourcing' AS Job_type, Sourcing_jobs_available AS Jobs_available FROM Assignments UNION
                SELECT Event_occurred_at, Total_jobs_available, Analysts_available, Analysts_occupied, 'Writing' AS Job_type, Writing_jobs_available AS Jobs_available FROM Assignments;""")

<sqlite3.Cursor at 0x11ae6e650>

In [34]:
# Query Jobs_available Table

c.execute("""SELECT *
             FROM Jobs_available;""")
data = pd.DataFrame(c.fetchall())
data.columns = [x[0] for x in c.description]
data

Unnamed: 0,Event_occurred_at,Total_jobs_available,Analysts_available,Analysts_occupied,Job_type,Jobs_available
0,2017-06-21 20:15:42,14,2,20,Editing,0
1,2017-06-21 20:15:42,14,2,20,Planning,0
2,2017-06-21 20:15:42,14,2,20,Review,11
3,2017-06-21 20:15:42,14,2,20,Sourcing,1
4,2017-06-21 20:15:42,14,2,20,Vetting,2
5,2017-06-21 20:15:42,14,2,20,Writing,0
6,2017-06-21 20:15:45,13,1,20,Editing,0
7,2017-06-21 20:15:45,13,1,20,Planning,0
8,2017-06-21 20:15:45,13,1,20,Review,10
9,2017-06-21 20:15:45,13,1,20,Sourcing,1


In [35]:
# Create Job Table

c.execute("""CREATE TABLE Job AS 
                SELECT Event_occurred_at, Job, Analyst, Action, Request, Wait_time_min, Waiting_for
                FROM Assignments;""")

<sqlite3.Cursor at 0x11ae6e650>

In [36]:
# Query Job Table

c.execute("""SELECT *
             FROM Job;""")
data = pd.DataFrame(c.fetchall())
data.columns = [x[0] for x in c.description]
data

Unnamed: 0,Event_occurred_at,Job,Analyst,Action,Request,Wait_time_min,Waiting_for
0,2017-06-22 19:59:06,review,9fcbc63ff4c8bea5cea4efad782c87cf,Accepted Job,594bec5c95e2ce005840c23a,1,review
1,2017-06-22 19:59:02,review,9fcbc63ff4c8bea5cea4efad782c87cf,Assigned Job,594bec5c95e2ce005840c23a,1,review
2,2017-06-22 19:51:30,writing,85c7b78e76b5232cd38014ea4cdc8f56,Declined Job,594bec83fd2cf400280aa965,9,"sourcing, writing"
3,2017-06-22 19:51:01,sourcing,0e9802516f8a79dd0d45211dd4ee74af,Accepted Job,594c1f5cd7e68f0028c9062c,1,"sourcing, writing"
4,2017-06-22 19:50:58,writing,85c7b78e76b5232cd38014ea4cdc8f56,Assigned Job,594bec83fd2cf400280aa965,8,"sourcing, writing"
5,2017-06-22 19:50:58,sourcing,0e9802516f8a79dd0d45211dd4ee74af,Assigned Job,594c1f5cd7e68f0028c9062c,1,"sourcing, writing"
6,2017-06-22 19:50:58,writing,85c7b78e76b5232cd38014ea4cdc8f56,Declined Job,594bec83fd2cf400280aa965,8,"sourcing, writing"
7,2017-06-22 19:50:23,writing,85c7b78e76b5232cd38014ea4cdc8f56,Assigned Job,594bec83fd2cf400280aa965,8,"sourcing, writing"
8,2017-06-22 19:47:03,sourcing,85c7b78e76b5232cd38014ea4cdc8f56,Accepted Job,594c1e983b593b00281250ba,4,"sourcing, writing"
9,2017-06-22 19:47:03,sourcing,85c7b78e76b5232cd38014ea4cdc8f56,Accepted Job,594c1e983b593b00281250ba,4,"sourcing, writing"


In [32]:
# Utility cell: uncomment to drop Jobs_available before re-creating it
#c.execute('DROP TABLE Jobs_available;')

<sqlite3.Cursor at 0x11ae6e650>

### Business Questions

#### How long does it take for a request to be completed?