# Order data 

A database for this exercise has been created already, called `ex_superstore_normalize`.

The data for this exercise has been taken from Tableau's [2015 Superstore dataset](https://www.superdatascience.com/tableau/), with each sheet converted into CSVs. Unlike other exercises, you don't need the database for this exercise. You can choose to either make a new database directly from the CSVs, or import data from the large table into the smaller (normalized) tables.

 
If you think you have found an error in the questions below, please open a Github Issue.

## Exercise

There is only one exercise here: take the data in the `orders` table, and normalize it. That is, create other tables so that you are not repeating information. For example, you probably want to have customer data in a `customer` table, where each customer has a customer id as a primary key. The order table should contain the customer id, and no other information about the customer.

There are several different approaches to this problem, so no solution is provided. Here are some things to think about:
1. Can your proposed solution deal with multiple addresses per customer?
2. Can you tell which address each customer's order went to?
3. When you have made all your tables, can you write a SELECT statement with JOINs that recreates the original table?

## Preview of the `orders` table:

```sql
SELECT * FROM orders LIMIT 4;
```

<table border="1">
  <tr>
    <th align="center">row_id</th>
    <th align="center">order_priority</th>
    <th align="center">discount</th>
    <th align="center">unit_price</th>
    <th align="center">shipping_cost</th>
    <th align="center">customer_id</th>
    <th align="center">customer_name</th>
    <th align="center">ship_mode</th>
    <th align="center">customer_segment</th>
    <th align="center">product_category</th>
    <th align="center">product_subcategory</th>
    <th align="center">product_container</th>
    <th align="center">product_name</th>
    <th align="center">product_base_margin</th>
    <th align="center">country</th>
    <th align="center">region</th>
    <th align="center">state</th>
    <th align="center">city</th>
    <th align="center">postal_code</th>
    <th align="center">order_date</th>
    <th align="center">ship_date</th>
    <th align="center">profit</th>
    <th align="center">quantity_ordered_new</th>
    <th align="center">sales</th>
    <th align="center">order_id</th>
  </tr>
  <tr valign="top">
    <td align="right">20847</td>
    <td align="left">High</td>
    <td align="right">0.01</td>
    <td align="right">2.84</td>
    <td align="right">0.93</td>
    <td align="right">3</td>
    <td align="left">Bonnie Potter</td>
    <td align="left">Express Air</td>
    <td align="left">Corporate</td>
    <td align="left">Office Supplies</td>
    <td align="left">Pens &amp; Art Supplies</td>
    <td align="left">Wrap Bag</td>
    <td align="left">SANFORD Liquid Accent™ Tank-Style Highlighters</td>
    <td align="right">0.54</td>
    <td align="left">United States</td>
    <td align="left">West</td>
    <td align="left">Washington</td>
    <td align="left">Anacortes</td>
    <td align="right">98221</td>
    <td align="left">2015-01-07 00:00:00</td>
    <td align="left">2015-01-08 00:00:00</td>
    <td align="right">4.56</td>
    <td align="right">4</td>
    <td align="right">13.01</td>
    <td align="right">88522</td>
  </tr>
  <tr valign="top">
    <td align="right">20228</td>
    <td align="left">Not Specified</td>
    <td align="right">0.02</td>
    <td align="right">500.98</td>
    <td align="right">26</td>
    <td align="right">5</td>
    <td align="left">Ronnie Proctor</td>
    <td align="left">Delivery Truck</td>
    <td align="left">Home Office</td>
    <td align="left">Furniture</td>
    <td align="left">Chairs &amp; Chairmats</td>
    <td align="left">Jumbo Drum</td>
    <td align="left">Global Troy™ Executive Leather Low-Back Tilter</td>
    <td align="right">0.6</td>
    <td align="left">United States</td>
    <td align="left">West</td>
    <td align="left">California</td>
    <td align="left">San Gabriel</td>
    <td align="right">91776</td>
    <td align="left">2015-06-13 00:00:00</td>
    <td align="left">2015-06-15 00:00:00</td>
    <td align="right">4390.37</td>
    <td align="right">12</td>
    <td align="right">6362.85</td>
    <td align="right">90193</td>
  </tr>
  <tr valign="top">
    <td align="right">21776</td>
    <td align="left">Critical</td>
    <td align="right">0.06</td>
    <td align="right">9.48</td>
    <td align="right">7.29</td>
    <td align="right">11</td>
    <td align="left">Marcus Dunlap</td>
    <td align="left">Regular Air</td>
    <td align="left">Home Office</td>
    <td align="left">Furniture</td>
    <td align="left">Office Furnishings</td>
    <td align="left">Small Pack</td>
    <td align="left">DAX Two-Tone Rosewood/Black Document Frame, Desktop, 5 x 7</td>
    <td align="right">0.45</td>
    <td align="left">United States</td>
    <td align="left">East</td>
    <td align="left">New Jersey</td>
    <td align="left">Roselle</td>
    <td align="right">7203</td>
    <td align="left">2015-02-15 00:00:00</td>
    <td align="left">2015-02-17 00:00:00</td>
    <td align="right">-53.8096</td>
    <td align="right">22</td>
    <td align="right">211.15</td>
    <td align="right">90192</td>
  </tr>
  <tr valign="top">
    <td align="right">24844</td>
    <td align="left">Medium</td>
    <td align="right">0.09</td>
    <td align="right">78.69</td>
    <td align="right">19.99</td>
    <td align="right">14</td>
    <td align="left">Gwendolyn F Tyson</td>
    <td align="left">Regular Air</td>
    <td align="left">Small Business</td>
    <td align="left">Furniture</td>
    <td align="left">Office Furnishings</td>
    <td align="left">Small Box</td>
    <td align="left">Howard Miller 12-3/4 Diameter Accuwave DS ™ Wall Clock</td>
    <td align="right">0.43</td>
    <td align="left">United States</td>
    <td align="left">Central</td>
    <td align="left">Minnesota</td>
    <td align="left">Prior Lake</td>
    <td align="right">55372</td>
    <td align="left">2015-05-12 00:00:00</td>
    <td align="left">2015-05-14 00:00:00</td>
    <td align="right">803.471</td>
    <td align="right">16</td>
    <td align="right">1164.45</td>
    <td align="right">86838</td>
  </tr>
</table>

# My approach

I decided that I was going for "bug for bug compatibility" with the given dataset. For example, since each customer_id in the 
given database has only one location, I did not try
to support multiple addresses per customer. My goal 
is simply for the
joined normalized database to match the original orders database, with no loss of data and no "feature creep."

I found myself perplexed by the money columns:
prices, discounts, base_margin, and shipping costs were not 
one-to-one with products or customers.  Perhaps there are 
sales or price adjustments that just change with the calendar,
segment, quantity puchased, sales negotiations, etc.
At any rate, no information was given about the 
underlying business rules 
and my attempts at figuring it out
were inconclusive, so I put all of dollar amount fields in the
order_item table. With better documentation of the 
underlying business
rules, these might have been normalized differently. 

In [1]:
%load_ext sql
%sql postgres://localhost/ex_superstore_normalize

'Connected: @ex_superstore_normalize'

In [2]:
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
import pandas as pd
import numpy as np

In [3]:
from sqlalchemy import MetaData
from sqlalchemy_schemadisplay import create_schema_graph

## Product attribute types

Create enumerated types for product container, category, and subcategory

In [4]:
def make_enum( column, enum_name ):
    """
        Input: name of column, name of enumerated type to create
        Effects: selects distinct 'column' from orders
                creates enumerated type 'enum_name' with those values
    """
    items = %sql SELECT DISTINCT $column FROM orders;
    # convert results of query into a string with parens on the ends,
    # commas between the values, and quotes around each string
    ilist = str(tuple(items[column].values))
    %sql CREATE TYPE $enum_name AS ENUM $ilist ; 

In [5]:
make_enum( 'product_container', 'product_container_t')
make_enum( 'product_category', 'product_category_t')
make_enum( 'product_subcategory', 'product_subcategory_t')

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


In [6]:
## Product table

In [7]:
%%sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_container product_container_t,
    product_category product_category_t,
    product_subcategory product_subcategory_t,
    product_name TEXT
);

INSERT INTO 
    products (
         product_name, 
         product_container, 
         product_category,
         product_subcategory
    )
SELECT DISTINCT 
        product_name,
        product_container::product_container_t ,
        product_category::product_category_t,
        product_subcategory::product_subcategory_t
FROM
        orders 
;       

 * postgres://localhost/ex_superstore_normalize


## Shipping addresses

- Regions : another enumerated type
- States : I'm going to use the common two-letter state abbreviations as a state id

In [8]:
make_enum( 'region', 'region_t' )

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


In [9]:
%%sql
CREATE TABLE state (
    state_id VARCHAR(2) PRIMARY KEY,
    state_name VARCHAR,
    region region_t
);

 * postgres://localhost/ex_superstore_normalize


In [10]:
# dictionary downloaded from 
# http://code.activestate.com/recipes/577305-python-dictionary-of-us-states-and-territories/
abbr_to_state = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'
}

In [11]:
%%capture
for key in abbr_to_state.keys():
    %sql insert into state (state_id, state_name) values ('{key}','{abbr_to_state[key]}');

In [12]:
%%sql
WITH state_region_pairs AS (
    SELECT DISTINCT
        state state_name, region::region_t
    FROM
        orders
)
UPDATE 
    state
SET 
    region = state_region_pairs.region
FROM
    state_region_pairs
WHERE
    state.state_name = state_region_pairs.state_name
;

 * postgres://localhost/ex_superstore_normalize


I could guess a region for states that we don't have orders from, but I 
will just take the simple route and delete those rows.

In [13]:
%%sql
SELECT 
    * 
FROM 
    state
WHERE 
    region is NULL;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,state_id,state_name,region
0,AK,Alaska,
1,AS,American Samoa,
2,GU,Guam,
3,HI,Hawaii,
4,MP,Northern Mariana Islands,
5,PR,Puerto Rico,
6,VI,Virgin Islands,


In [14]:
%%sql
DELETE FROM 
    state 
WHERE 
    region is NULL;

 * postgres://localhost/ex_superstore_normalize


And just to make sure I didn't miss any non-US orders...

In [15]:
%%sql
SELECT 
    * 
FROM 
    orders 
WHERE 
    country != 'United States'

 * postgres://localhost/ex_superstore_normalize


We are now ready to create and fill the customer name/address table. (There are no
street addresses in the given data, just region/state/city/zip.)

In [16]:
%%sql
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR,
    state_id VARCHAR(2),
    city VARCHAR,
    postal_code INT,
    
    FOREIGN KEY (state_id)
        REFERENCES state(state_id)
)

 * postgres://localhost/ex_superstore_normalize


In [17]:
%%sql
INSERT INTO 
    customer (
        customer_id,
        customer_name,
        state_id,
        city,
        postal_code
    )
SELECT DISTINCT 
    customer_id,
    customer_name,
    state_id,
    city,
    postal_code
FROM
        orders
    JOIN 
        state 
    ON 
        orders.state = state.state_name
    
;

 * postgres://localhost/ex_superstore_normalize


Now that we've gotten the old data ported over, we want to set up the database to 
create new customer_ids that do not conflict with the inherited data.

In [18]:
def make_a_default_counter(table, column ):
    max_id = %sql SELECT MAX($column) max_used from $table
    starting_value = max_id.loc[0,'max_used'] + 1
    seq_name = 'seq_' + column
    quoted_name = "'" + seq_name + "'"
    %sql CREATE SEQUENCE $seq_name START $starting_value; 
    %sql ALTER TABLE $table ALTER COLUMN $column SET DEFAULT NEXTVAL($quoted_name);

In [19]:
make_a_default_counter('customer', 'customer_id')

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


## Orders and order items

See the EDA.ipynb for queries that demonstrate the following (non-intuitive) properties
of the order database:

- The entire order has one order date, but items can have separate ship dates, ship modes, and priority
- The same order_id can have multiple customer_id's associated with it 
- Two different order_id's can re-use the same row_id for order items
- Customers can have two different 'customer_segment' values in two different orders, so it's not really 
a per-customer attribute

In [20]:
make_enum( 'customer_segment', 'segment_t')
make_enum( 'order_priority', 'priority_code_t')
make_enum( 'ship_mode', 'ship_mode_t')

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


The name "orders" is being used by the old database, so
named the normalized version orders_n. (Probably would be 
better to copy from one database to another.) 

In [21]:
%%sql
CREATE TABLE orders_n (
    order_id INTEGER,
    customer_id INTEGER,
    segment segment_t,
    order_date TIMESTAMP DEFAULT 'today'::TIMESTAMP,
    
    /* same order_id is re-used across customer ids 
       in the legacy data ??!! 
    */
    PRIMARY KEY (order_id, customer_id),
    FOREIGN KEY (customer_id) REFERENCES
        customer (customer_id)
);

 * postgres://localhost/ex_superstore_normalize


In [22]:
%%sql
INSERT INTO orders_n ( 
    order_id, 
    customer_id, 
    segment, 
    order_date 
)
SELECT DISTINCT
    order_id, 
    customer_id, 
    customer_segment::segment_t, 
    order_date
FROM 
    orders
;

 * postgres://localhost/ex_superstore_normalize


In [23]:
make_a_default_counter('orders_n', 'order_id')

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


In [24]:
%%sql
CREATE TABLE order_item (
    row_id INTEGER,
    order_id INTEGER,
    customer_id INTEGER,
    product_id INTEGER,
    discount REAL,
    unit_price REAL,
    base_margin REAL,
    ship_mode ship_mode_t,
    shipping_cost REAL,
    ship_date TIMESTAMP,
    quantity_ordered_new INTEGER,
    profit REAL,
    sales REAL,
    priority priority_code_t DEFAULT 'Not Specified'::priority_code_t,
    
    /* the row_id in the original orders table is not a 
       unique identifier, so this is a two-part key */
    PRIMARY KEY (row_id, order_id),
    FOREIGN KEY (order_id, customer_id) 
        REFERENCES orders_n (order_id, customer_id),
    FOREIGN KEY (product_id)
        REFERENCES products (product_id)
);

 * postgres://localhost/ex_superstore_normalize


In [25]:
%%sql
INSERT INTO order_item (
    row_id, 
    order_id, 
    customer_id, 
    product_id, 
    discount, 
    unit_price,
    base_margin, 
    ship_mode, 
    ship_date,
    shipping_cost, 
    quantity_ordered_new, 
    profit, 
    sales,
    priority)
SELECT
    row_id, 
    order_id, 
    customer_id, 
    product_id, 
    discount, 
    unit_price, 
    product_base_margin,
    ship_mode::ship_mode_t, 
    ship_date,
    shipping_cost, 
    quantity_ordered_new, 
    profit, 
    sales,
    order_priority::priority_code_t
FROM
        orders
    JOIN
        products
    ON 
        products.product_name =  orders.product_name
;

 * postgres://localhost/ex_superstore_normalize


In [26]:
# sanity check
%sql select count(*) from orders;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,count
0,1952


In [27]:
make_a_default_counter('order_item', 'row_id')

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


## Testing

To test:
1. Createa a view that mimics the original data table
2. Compare the joined data to the original data and check for errors

In [28]:
%%sql
CREATE VIEW 
    rejoined AS
SELECT 
    i.row_id,
    i.priority order_priority,
    i.discount,
    i.unit_price,
    i.shipping_cost,
    cu.customer_id,
    cu.customer_name,
    i.ship_mode,
    o.segment customer_segment,
    p.product_category,
    p.product_subcategory,
    p.product_container,
    p.product_name,
    i.base_margin product_base_margin,
    'United States' country,
    st.region,
    st.state_name state,
    cu.city,
    cu.postal_code,
    o.order_date,
    i.ship_date,
    i.profit,
    i.quantity_ordered_new,
    i.sales,
    i.order_id
FROM
    order_item i
    JOIN 
        orders_n o
    ON
        o.order_id = i.order_id
        and
        o.customer_id = i.customer_id
    JOIN
        products p
    ON
        i.product_id = p.product_id
    JOIN
        customer cu
    ON
        cu.customer_id = o.customer_id
    JOIN
        state st
    ON
        cu.state_id = st.state_id
;

 * postgres://localhost/ex_superstore_normalize


In [29]:
original = %sql SELECT * FROM orders ORDER BY customer_id, row_id;
normalized = %sql SELECT * FROM rejoined ORDER BY customer_id, row_id;

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


In [30]:
assert(original.shape == normalized.shape)

In [31]:
# see EDA notebook - this field has NULLs which 
# pandas won't ever compare as equal, even to another NULL
original.product_base_margin = original.product_base_margin.fillna(0)
normalized.product_base_margin = normalized.product_base_margin.fillna(0)

In [32]:
compare = (original == normalized) 
assert(compare.all(axis=0).all())
    

In [33]:
%%sql
/* drop the old tables before creating the image of the new schema */
DROP TABLE orders;
DROP TABLE returns;

 * postgres://localhost/ex_superstore_normalize


# Create schema diagram of final result 

In [34]:
connection = "postgres://localhost/ex_superstore_normalize"
graph = create_schema_graph(metadata=MetaData(connection), 
                            show_datatypes=True, # show datatypes
                            show_indexes=True, # show index (in ourcase unique)
                            rankdir='LR', # left to right alignment
                            concentrate=False)
graph.write_svg('images/database_schema_diagram.svg', f='svg:cairo', prog='dot')

![image](images/database_schema_diagram.svg 'schema')

# Additional testing 
Make sure the new schema isn't too complex to use. May want to create some views with additional joins, 
for example, if these basic operations are difficult. I also went back and did things like creating a 
default order date ('today') for the orders table due to the testing below.

In [35]:
## Look up a customer and get their orders

In [36]:
# grab a random customer (we're still in autopandas mode!)
customer_id = %sql SELECT customer_id FROM customer ORDER BY RANDOM() LIMIT 1;
customer_id = customer_id.customer_id.iloc[0]
customer_id

 * postgres://localhost/ex_superstore_normalize


507

In [37]:
query = f"""
SELECT 
    o.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date,
    p.product_name,
    i.unit_price,
    i.quantity_ordered_new,
    i.ship_date
FROM 
        orders_n o
    JOIN
        order_item  i
    ON
        o.order_id = o.order_id
    JOIN
        products p
    ON
        p.product_id = i.product_id
    JOIN
        customer c
    ON
        o.customer_id = c.customer_id
WHERE
    o.customer_id = {customer_id}
ORDER BY
    order_id
LIMIT 
    10
;
"""
%sql $query

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,customer_id,customer_name,order_id,order_date,product_name,unit_price,quantity_ordered_new,ship_date
0,507,Carol Saunders,87357,2015-04-18,Xerox 1920,5.98,14,2015-02-13
1,507,Carol Saunders,87357,2015-04-18,5185,115.99,11,2015-04-13
2,507,Carol Saunders,87357,2015-04-18,5185,115.99,20,2015-05-04
3,507,Carol Saunders,87357,2015-04-18,2300 Heavy-Duty Transfer File Systems by Perma,24.98,23,2015-05-01
4,507,Carol Saunders,87357,2015-04-18,EcoTones® Memo Sheets,4.0,12,2015-01-09
5,507,Carol Saunders,87357,2015-04-18,EcoTones® Memo Sheets,4.0,14,2015-05-04
6,507,Carol Saunders,87357,2015-04-18,EcoTones® Memo Sheets,4.0,19,2015-04-06
7,507,Carol Saunders,87357,2015-04-18,EcoTones® Memo Sheets,4.0,5,2015-04-06
8,507,Carol Saunders,87357,2015-04-18,Surelock™ Post Binders,30.56,17,2015-05-27
9,507,Carol Saunders,87357,2015-04-18,Surelock™ Post Binders,30.56,12,2015-02-07


## How to create a new customer

In [38]:
%%sql 
INSERT INTO 
    customer (customer_name, state_id, city, postal_code )
VALUES  
    ('New Customer', 'WA', 'AnyTown', 12345 )
RETURNING 
    customer_id
;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,customer_id
0,3404


In [39]:
customer_id = _.loc[0,'customer_id']

## How to create a new order

The order_id automatically populates. Order date will be "today".

In [40]:
query = f"""
INSERT INTO
    orders_n (customer_id, segment)
VALUES
    ( {customer_id}, 'Small Business'::segment_t)
RETURNING
    order_id
"""
order_id = %sql $query
# get the order number out of the returned 
# one-row, one-column table
order_id = order_id.order_id.iloc[0]
order_id

 * postgres://localhost/ex_superstore_normalize


91587

In [41]:
order_id

91587

Just select three random products for our new order

In [42]:
%%sql
SELECT 
    product_id
FROM
    products
ORDER BY 
    RANDOM()
LIMIT
    3;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,product_id
0,580
1,628
2,768


Add items to the sale (using random numbers here)

In [43]:
random_products = _.product_id.values

# create the sale
for i,product_id in enumerate(random_products):
    quantity = i+1
    query = f"""
INSERT INTO
    order_item (order_id, 
                customer_id, 
                product_id, 
                discount, 
                unit_price, 
                base_margin, 
                quantity_ordered_new, 
                sales)
VALUES (
    {order_id},
    {customer_id},
    {product_id},
    .01,
    12.34,
    11.12,
    {quantity},
    12.34 * {quantity}
);
"""
    %sql $query

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


See what we have so far

In [44]:
# create string versions of numbers to do string substitution into query below
s_order_id = str(order_id)
s_customer_id = str(customer_id)
s_order_id, s_customer_id

('91587', '3404')

In [45]:
%%sql
SELECT 
    * 
FROM
        orders_n
    JOIN
        order_item
    ON
        orders_n.order_id = order_item.order_id
WHERE
        orders_n.customer_id = :s_customer_id
    AND
        orders_n.order_id = :s_order_id
;  

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,order_id,customer_id,segment,order_date,row_id,order_id.1,customer_id.1,product_id,discount,unit_price,base_margin,ship_mode,shipping_cost,ship_date,quantity_ordered_new,profit,sales,priority
0,91587,3404,Small Business,2019-05-09,26390,91587,3404,580,0.01,12.34,11.12,,,,1,,12.34,Not Specified
1,91587,3404,Small Business,2019-05-09,26391,91587,3404,628,0.01,12.34,11.12,,,,2,,24.68,Not Specified
2,91587,3404,Small Business,2019-05-09,26392,91587,3404,768,0.01,12.34,11.12,,,,3,,37.02,Not Specified


Ship the items

In [46]:
%%sql
UPDATE 
    order_item
SET ( ship_mode, 
      shipping_cost, 
      ship_date, 
      profit 
    )
=
    ( 'Delivery Truck'::ship_mode_t,
    6,
    'TODAY'::TIMESTAMP,
    sales - 6 
    )
WHERE
        customer_id = :s_customer_id
    AND
        order_id = :s_order_id
;

 * postgres://localhost/ex_superstore_normalize


In [47]:
%%sql
SELECT 
    * 
FROM
        orders_n
    JOIN
        order_item
    ON
        orders_n.order_id = order_item.order_id
WHERE
        orders_n.customer_id = :s_customer_id
    AND
        orders_n.order_id = :s_order_id
;  

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,order_id,customer_id,segment,order_date,row_id,order_id.1,customer_id.1,product_id,discount,unit_price,base_margin,ship_mode,shipping_cost,ship_date,quantity_ordered_new,profit,sales,priority
0,91587,3404,Small Business,2019-05-09,26390,91587,3404,580,0.01,12.34,11.12,Delivery Truck,6.0,2019-05-09,1,6.34,12.34,Not Specified
1,91587,3404,Small Business,2019-05-09,26391,91587,3404,628,0.01,12.34,11.12,Delivery Truck,6.0,2019-05-09,2,18.68,24.68,Not Specified
2,91587,3404,Small Business,2019-05-09,26392,91587,3404,768,0.01,12.34,11.12,Delivery Truck,6.0,2019-05-09,3,31.02,37.02,Not Specified


## How to create a new product?  

In [48]:
%%sql
INSERT INTO products (product_container, product_category, product_subcategory, product_name )
VALUES ( 'Wrap Bag'::product_container_t,
       'Office Supplies'::product_category_t,
       'Pens & Art Supplies'::product_subcategory_t,
       'A Fancy New Glitter Pen')
RETURNING product_id
;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,product_id
0,914


## What if it has a new product category and subcategory? 

More and more companies are declaring their offices 'dog friendly.' This represents a new marketing opportunity 
for our hypothetical office supply company. 

Note: ipython-sql runs a transaction by default, and you cannot update a type as part of a transaction, hence
the extra COMMITs. In production, you'd run either sql or straight python over ipython and ipython-sql, but 
for a code portfolio, I think ipython-sql is much more readable than psycopg2 and a straight SQL script wouldn't
easily let you see inputs and outputs in one jupyter notebook. 

In [49]:
%sql COMMIT; ALTER TYPE product_category_t ADD VALUE 'Pet-Friendly Office';
%sql COMMIT; ALTER TYPE product_subcategory_t ADD VALUE 'Feeding';

 * postgres://localhost/ex_superstore_normalize
 * postgres://localhost/ex_superstore_normalize


In [50]:
%%sql
INSERT INTO products (product_container, product_category, product_subcategory, product_name )
VALUES ( 'Small Box'::product_container_t,
       'Pet-Friendly Office'::product_category_t,
       'Feeding'::product_subcategory_t,
       'Stainless Steel Water Bowl'
       )
RETURNING product_id
;

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,product_id
0,915


In [51]:
%%sql
SELECT 
    *
FROM
    products
WHERE
    product_id IN (914, 915)

 * postgres://localhost/ex_superstore_normalize


Unnamed: 0,product_id,product_container,product_category,product_subcategory,product_name
0,914,Wrap Bag,Office Supplies,Pens & Art Supplies,A Fancy New Glitter Pen
1,915,Small Box,Pet-Friendly Office,Feeding,Stainless Steel Water Bowl
