# Designing a Data Warehouse

We saw earlier how cumbersome the original data model is to query. We ran some simple analyses to get information by customer, by city and month, and so on, and we saw how long the queries became and how many joins they needed in order to get the data.

Below is a simple data model with dimension tables and a fact table.

<img src = "img/dvd rental dw.png" />


### Designing Dimensions and Facts 

In [5]:
%load_ext sql
import pandas as pd



The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [7]:
DB_ENDPOINT = "127.0.0.1"
DB = 'dvdrental'
DB_USER = 'manny'
DB_PASSWORD = 'Yankees1'
DB_PORT = '5433'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)

# this initiates the connection to Postgres
%sql $conn_string

postgresql://manny:Yankees1@127.0.0.1:5433/dvdrental
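
The connection string above is built with plain string formatting, which works for these credentials. As a hedged sketch, the same URL could be assembled by a small helper that percent-encodes the user name and password, so that characters such as `@` or `/` in a password do not break the URL. The helper name `build_conn_string` and the environment variable names are assumptions made for illustration; they are not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, db):
    """Return a postgresql:// URL with the user and password percent-encoded."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db
    )


# Hypothetical environment variable names; the defaults mirror the cell above,
# except the password, which is left empty here rather than hard-coded.
conn_string = build_conn_string(
    os.environ.get("DB_USER", "manny"),
    os.environ.get("DB_PASSWORD", ""),
    os.environ.get("DB_ENDPOINT", "127.0.0.1"),
    os.environ.get("DB_PORT", "5433"),
    os.environ.get("DB", "dvdrental"),
)
```

The resulting `conn_string` can be passed to `%sql $conn_string` exactly as before.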


### Creating Dimension Tables

In [8]:
%%sql 

create table dimDate ( 
    date_key int primary key, 
    date date, 
    year smallint,
    quarter smallint,
    month smallint, 
    day smallint,
    week smallint,
    is_weekend boolean

)

* postgresql://manny:***@127.0.0.1:5433/dvdrental
Done.


[]

Below we can check whether the table was created and inspect the data type of each column by querying information_schema.columns.

In [10]:
%%sql 

select column_name, data_type
from information_schema.columns
where table_name = 'dimdate' 

* postgresql://manny:***@127.0.0.1:5433/dvdrental
8 rows affected.


column_name,data_type
date_key,integer
date,date
year,smallint
quarter,smallint
month,smallint
day,smallint
week,smallint
is_weekend,boolean
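
To make the meaning of the dimDate columns concrete, here is a minimal sketch of how one row could be derived from a calendar date in Python. It assumes that `date_key` is an integer of the form YYYYMMDD, that `week` is the ISO week number, and that the weekend is Saturday and Sunday. None of these conventions is stated in the table definition, and the function name `dim_date_row` is hypothetical.

```python
from datetime import date


def dim_date_row(d):
    """Build one dimDate row (as a dict) from a datetime.date."""
    return {
        "date_key": int(d.strftime("%Y%m%d")),   # assumed YYYYMMDD integer key
        "date": d,
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,       # months 1-3 -> 1, 4-6 -> 2, ...
        "month": d.month,
        "day": d.day,
        "week": d.isocalendar()[1],              # ISO week number (assumption)
        "is_weekend": d.isoweekday() in (6, 7),  # Saturday or Sunday
    }


row = dim_date_row(date(2005, 5, 28))
print(row)
```

Because every attribute is computed once when the dimension is loaded, later queries can filter or group on `year`, `quarter`, or `is_weekend` directly instead of recomputing them from a timestamp.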


### The dimCustomer table

In [13]:
%%sql 

create table dimCustomer (
    customer_key serial primary key, 
    customer_id smallint not null, 
    first_name varchar(45) not null, 
    last_name varchar(45) not null, 
    email varchar(50), 
    address varchar(50) not null, 
    address2 varchar(50), 
    district varchar(50) not null, 
    city varchar(50) not null, 
    country varchar(50) not null, 
    postalcode varchar(10), 
    phone varchar(20) not null, 
    active smallint not null, 
    create_date timestamp not null,
    start_date date not null, 
    end_date date not null
);

* postgresql://manny:***@127.0.0.1:5433/dvdrental
Done.


[]
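
The notebook does not explain why dimCustomer has both `customer_key` and `customer_id`, or what `start_date` and `end_date` are for. One plausible reading, offered here as an assumption, is that `customer_key` is a surrogate key, `customer_id` is the key from the source system, and the two dates bound the period during which a row version is valid. The sketch below illustrates that reading with an in-memory SQLite database (not Postgres), a cut-down table, and made-up rows. The helper `customer_key_on` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table dimCustomer (
        customer_key integer primary key autoincrement, -- surrogate key
        customer_id  integer not null,                  -- key from the source system
        city         text not null,
        start_date   text not null,                     -- ISO dates stored as text
        end_date     text not null
    )
""")

# Made-up data: the same customer_id appears twice, once per address period.
conn.executemany(
    "insert into dimCustomer (customer_id, city, start_date, end_date) "
    "values (?, ?, ?, ?)",
    [
        (1, "City A", "2005-01-01", "2005-06-30"),
        (1, "City B", "2005-07-01", "9999-12-31"),
    ],
)


def customer_key_on(conn, customer_id, day):
    """Return the surrogate key of the row version valid on `day`, or None."""
    found = conn.execute(
        "select customer_key from dimCustomer "
        "where customer_id = ? and ? between start_date and end_date",
        (customer_id, day),
    ).fetchone()
    return found[0] if found else None


print(customer_key_on(conn, 1, "2005-03-15"))
print(customer_key_on(conn, 1, "2005-08-01"))
```

Under this reading, a fact row stores the `customer_key` of the version that was valid when the sale happened, so later address changes do not rewrite history.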

### The dimMovie Table

In [14]:
%%sql 

create table dimMovie ( 
    movie_key serial primary key, 
    film_id smallint not null, 
    title varchar(255), 
    description text, 
    release_year year, 
    language varchar(20) not null, 
    original_language varchar(20), 
    rental_duration smallint not null, 
    length smallint not null, 
    rating varchar(5) not null, 
    special_features varchar(60) not null
);

* postgresql://manny:***@127.0.0.1:5433/dvdrental
Done.


[]

### The dimStore table 

In [17]:
%%sql 

create table dimStore(
    store_key serial primary key, 
    store_id smallint not null, 
    address varchar(50) not null, 
    address2 varchar(50), 
    district varchar(50) not null, 
    city varchar(50) not null, 
    country varchar(50) not null, 
    postal_code varchar(20), 
    manager_first_name varchar(45) not null, 
    manager_last_name varchar(45) not null, 
    start_date date not null, 
    end_date date not null
);

* postgresql://manny:***@127.0.0.1:5433/dvdrental
Done.


[]

### The factSales table 

In [18]:
%%sql 

CREATE TABLE factSales
(
  sales_key        SERIAL PRIMARY KEY,
  date_key         INT NOT NULL REFERENCES dimDate(date_key),
  customer_key     INT NOT NULL REFERENCES dimCustomer(customer_key),
  movie_key        INT NOT NULL REFERENCES dimMovie(movie_key),
  store_key        INT NOT NULL REFERENCES dimStore(store_key),
  sales_amount     decimal(5,2) NOT NULL
);

* postgresql://manny:***@127.0.0.1:5433/dvdrental
Done.


[]
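
With the fact table referencing each dimension directly, an analysis such as revenue by city and month needs only one join per dimension. The following is a minimal sketch of that idea, using an in-memory SQLite database instead of Postgres, cut-down versions of the tables above, and made-up rows. It illustrates the query shape only, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dimDate (date_key integer primary key, year integer, month integer);
    create table dimCustomer (customer_key integer primary key, city text not null);
    create table factSales (
        sales_key    integer primary key,
        date_key     integer not null references dimDate(date_key),
        customer_key integer not null references dimCustomer(customer_key),
        sales_amount numeric not null
    );

    -- Made-up sample rows
    insert into dimDate values (20050528, 2005, 5), (20050601, 2005, 6);
    insert into dimCustomer values (1, 'City A'), (2, 'City B');
    insert into factSales (date_key, customer_key, sales_amount) values
        (20050528, 1, 2.99),
        (20050528, 2, 4.99),
        (20050601, 1, 0.99),
        (20050601, 1, 3.99);
""")

# Revenue by city and month: one join per dimension used in the query.
revenue = conn.execute("""
    select c.city, d.month, round(sum(f.sales_amount), 2) as revenue
    from factSales f
    join dimDate d     on d.date_key = f.date_key
    join dimCustomer c on c.customer_key = f.customer_key
    group by c.city, d.month
    order by c.city, d.month
""").fetchall()

for city, month, amount in revenue:
    print(city, month, amount)
```

The same query shape should carry over to the Postgres tables created above once they are populated, with dimMovie and dimStore joined in the same way when needed.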