## Using Python to Integrate MongoDB Data into an ETL Process
Modern Data Warehousing and Analytics solutions frequently use languages like Python or Scala to extract data from numerous sources, including relational database management systems, NoSQL database systems, real-time streaming endpoints, and Data Lakes. These languages can then be used to perform many types of transformation before loading the data into a variety of destinations, including file systems and data warehouses, where it can be consumed by data scientists or business analysts.

In this lab you will build upon the **Northwind_DW2** dimensional database from Lab 3; however, you will be integrating new data sourced from an instance of MongoDB. The new data concerns new business processes: inventory and purchasing. You will continue to interact with both the source systems (MongoDB and MySQL) and the destination system (the Northwind_DW2 data warehouse) from a remote client running Python (Jupyter Notebooks).

Just as in Lab 3, you will fetch data into Pandas DataFrames, perform all the necessary transformations in memory on the client, and then push each newly transformed DataFrame to the RDBMS data warehouse using the Pandas **DataFrame.to_sql()** function, which creates the table and fills it with data in a single operation.
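The extract, transform, and load round trip described above can be sketched without any servers. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the MySQL data warehouse and a hypothetical list of dictionaries in place of the documents a PyMongo `find()` cursor would return. `DataFrame.to_sql()` creates and fills the table in one call here, just as it does against MySQL through SQLAlchemy.

```python
import sqlite3

import pandas as pd

# Hypothetical documents, shaped like those a PyMongo find() cursor yields.
documents = [
    {"_id": "a1", "orderNumber": 10100, "status": "Shipped"},
    {"_id": "a2", "orderNumber": 10101, "status": "Shipped"},
]

# Extract: build a DataFrame and discard MongoDB's internal _id field.
df = pd.DataFrame(documents).drop(columns=["_id"])

# Transform: add a 1-based surrogate key as the first column.
df.insert(0, "orderKey", range(1, len(df) + 1))

# Load: create and populate the table in one operation, then read it back.
conn = sqlite3.connect(":memory:")
df.to_sql("dim_orders", conn, index=False, if_exists="replace")
round_trip = pd.read_sql("SELECT * FROM dim_orders", conn)
conn.close()

print(round_trip)
```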

### Prerequisites:
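This notebook uses the PyMongo database connectivity library to connect to MongoDB databases; therefore, you must first install that library into your Python environment by executing the following command in a Terminal window. The MySQL connections, made through SQLAlchemy's `mysql+pymysql` dialect, also require the PyMySQL driver, and the notebook imports `certifi`; install those as well if they are not already present.

- `python -m pip install "pymongo[srv]"`
- `python -m pip install pymysql certifi`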

#### Import the Necessary Libraries

In [1]:
import os
import json
import numpy
import datetime
import certifi
import pandas as pd

import pymongo
import sqlalchemy
from sqlalchemy import create_engine

In [2]:
print(f"Running SQL Alchemy Version: {sqlalchemy.__version__}")
print(f"Running PyMongo Version: {pymongo.__version__}")

Running SQL Alchemy Version: 1.4.39
Running PyMongo Version: 4.6.2


#### Declare & Assign Connection Variables for the MongoDB Server, the MySQL Server & Databases with which You'll be Working 

In [3]:
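mysql_uid = "root"
mysql_pwd = "Passw0rd123"
mysql_hostname = "localhost"

# Read the Atlas credentials from environment variables so that the secret is neither
# hard-coded in, nor printed from, the notebook. (Only the 'local' server is used below.)
atlas_cluster_name = "Cluster0"
atlas_user_name = os.environ.get("ATLAS_USER", "bzt4em")
atlas_password = os.environ.get("ATLAS_PASSWORD", "")

# NOTE: Atlas SRV host names normally include a project-specific subdomain
# (e.g., <cluster>.<id>.mongodb.net); adjust the host below to match your cluster.
conn_str = {"local" : "mongodb://localhost:27017/",
    "atlas" : f"mongodb+srv://{atlas_user_name}:{atlas_password}@{atlas_cluster_name}.mongodb.net"
}

src_dbname = "classic_purchasing"
dst_dbname = "classic_dw"

print(f"Local Connection String: {conn_str['local']}")
print(f"Atlas Connection String: mongodb+srv://{atlas_user_name}:****@{atlas_cluster_name}.mongodb.net")

Local Connection String: mongodb://localhost:27017/
Atlas Connection String: mongodb+srv://bzt4em:****@Cluster0.mongodb.net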
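Credentials are interpolated directly into these connection strings, so a password containing characters such as `@`, `:` or `/` would be misread as part of the URI syntax. Both the PyMongo and SQLAlchemy documentation recommend percent-encoding such values with `urllib.parse.quote_plus()`. The helper `build_mongo_uri` below is a hypothetical sketch, and the user, password, and host are illustrative values only.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with percent-encoded credentials."""
    # quote_plus() encodes '@', ':' and '/' (among others), which would
    # otherwise be read as URI delimiters.
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"


# Hypothetical user, password and host, for illustration only.
uri = build_mongo_uri("etl_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(uri)  # mongodb+srv://etl_user:p%40ss%3Aword%2F1@cluster0.example.mongodb.net
```

The same encoding applies to the `mysql+pymysql://` strings built in the helper functions below.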


#### Define Functions for Getting Data From and Setting Data Into Databases

In [4]:
def get_sql_dataframe(user_id, pwd, host_name, db_name, sql_query):
    # Create a connection to the MySQL database (requires the PyMySQL driver).
    conn_str = f"mysql+pymysql://{user_id}:{pwd}@{host_name}/{db_name}"
    sqlEngine = create_engine(conn_str, pool_recycle=3600)
    
    # Invoke pd.read_sql() to query the database and fill a Pandas DataFrame;
    # the context manager closes the connection even if the query fails.
    with sqlEngine.connect() as conn:
        dframe = pd.read_sql(sql_query, conn)
    sqlEngine.dispose()
    
    return dframe


def get_mongo_dataframe(connect_str, db_name, collection, query):
    # Create a connection to MongoDB; the context manager closes it afterwards.
    with pymongo.MongoClient(connect_str) as client:
        db = client[db_name]
        # Query MongoDB, excluding the internal _id field with a projection,
        # and build a DataFrame from the list of returned documents.
        dframe = pd.DataFrame(list(db[collection].find(query, {"_id": 0})))
    return dframe


def set_dataframe(user_id, pwd, host_name, db_name, df, table_name, pk_column, db_operation):
    # Create a connection to the MySQL database.
    conn_str = f"mysql+pymysql://{user_id}:{pwd}@{host_name}/{db_name}"
    sqlEngine = create_engine(conn_str, pool_recycle=3600)
    
    # Invoke the DataFrame.to_sql() function to either create (replace), or append to, a table.
    with sqlEngine.connect() as connection:
        if db_operation == "insert":
            df.to_sql(table_name, con=connection, index=False, if_exists='replace')
            # Wrap raw SQL in text(); SQLAlchemy 2.0 no longer accepts plain strings here.
            connection.execute(sqlalchemy.text(f"ALTER TABLE {table_name} ADD PRIMARY KEY ({pk_column});"))
        elif db_operation == "update":
            df.to_sql(table_name, con=connection, index=False, if_exists='append')
        else:
            raise ValueError(f"db_operation must be 'insert' or 'update', not {db_operation!r}")
    sqlEngine.dispose()
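The two `db_operation` values map onto `to_sql()`'s `if_exists` argument: "insert" uses `'replace'` (drop and recreate the table), while "update" uses `'append'`, so running an "update" twice adds the same rows twice (against MySQL, the primary key added by "insert" should then reject the duplicates). The sketch below assumes an in-memory SQLite database in place of MySQL and uses hypothetical rows; it omits the `ALTER TABLE ... ADD PRIMARY KEY` step because SQLite does not support adding a primary key that way.

```python
import sqlite3

import pandas as pd

# SQLite (in memory) stands in for MySQL here; the rows are hypothetical.
df = pd.DataFrame({"customerKey": [1, 2], "customerName": ["Atelier", "Signal"]})
conn = sqlite3.connect(":memory:")


def row_count(table_name: str) -> int:
    """Count the rows currently stored in table_name."""
    return conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]


# "insert" in set_dataframe(): drop and recreate the table.
df.to_sql("dim_customers", conn, index=False, if_exists="replace")
after_insert = row_count("dim_customers")

# "update" in set_dataframe(): add the rows to whatever is already there.
df.to_sql("dim_customers", conn, index=False, if_exists="append")
after_update = row_count("dim_customers")

# Running "insert" again restores a clean table.
df.to_sql("dim_customers", conn, index=False, if_exists="replace")
after_reinsert = row_count("dim_customers")
conn.close()

print(after_insert, after_update, after_reinsert)  # 2 4 2
```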

#### Populate MongoDB with Source Data
You only need to run this cell once; however, the operation is *idempotent*. In other words, it can be run multiple times without changing the end result, because each collection is dropped before it is reloaded.
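The drop-then-insert pattern is what makes the load idempotent, and `insert_many()` expects a non-empty list of documents (PyMongo raises a `TypeError` for an empty list), so each JSON file is assumed to hold a top-level array of objects. This is a minimal sketch: `parse_documents` and `reload_collection` are hypothetical helpers, and a plain dictionary stands in for the MongoDB database purely for illustration.

```python
import json


def parse_documents(json_text: str) -> list:
    """Parse JSON text into a non-empty list of documents for insert_many()."""
    data = json.loads(json_text)
    docs = [data] if isinstance(data, dict) else data
    if not isinstance(docs, list) or not docs or not all(isinstance(d, dict) for d in docs):
        raise ValueError("expected a JSON object or a non-empty array of JSON objects")
    return docs


def reload_collection(db: dict, name: str, docs: list) -> None:
    """Drop, then insert, so repeated runs leave the same end state."""
    db.pop(name, None)      # plays the role of db.drop_collection(name)
    db[name] = list(docs)   # plays the role of db[name].insert_many(docs)


# A plain dict stands in for a MongoDB database (illustration only).
fake_db = {}
docs = parse_documents('[{"orderNumber": 10100}, {"orderNumber": 10101}]')
reload_collection(fake_db, "orders", docs)
reload_collection(fake_db, "orders", docs)  # a second run changes nothing
print(len(fake_db["orders"]))  # 2
```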

In [5]:
client = pymongo.MongoClient(conn_str["local"])
db = client[src_dbname]

# Get the path of the current working directory for this notebook, and then append the 'project_data' directory.
data_dir = os.path.join(os.getcwd(), 'project_data')

json_files = {"order_details" : 'classic_order_details.json', 
              "orders": 'classic_orders.json'
             }

# Drop each collection before reloading it so that re-running this cell yields the same result.
for collection_name, file_name in json_files.items():
    db.drop_collection(collection_name)
    json_file = os.path.join(data_dir, file_name)
    with open(json_file, 'r') as openfile:
        json_object = json.load(openfile)
    db[collection_name].insert_many(json_object)

client.close()

In [6]:
df_customers = pd.read_csv(os.path.join(data_dir, 'customers.csv'))
print(df_customers.head(2))
# read_csv() already returns a DataFrame, so no further conversion is needed.

   customerNumber        customerName contactLastName contactFirstName  \
0             103   Atelier graphique         Schmitt          Carine    
1             112  Signal Gift Stores            King             Jean   

        phone     addressLine1 addressLine2       city state postalCode  \
0  40.32.2555   54, rue Royale          NaN     Nantes   NaN      44000   
1  7025551838  8489 Strong St.          NaN  Las Vegas    NV      83030   

  country  salesRepEmployeeNumber  creditLimit  
0  France                  1370.0      21000.0  
1     USA                  1166.0      71800.0  


### 1.0. Create and Populate the New Dimension Tables
#### 1.1. Extract Data from the Source MongoDB Collections Into DataFrames

In [7]:
query = {} # Select all elements (columns), and all documents (rows).
collection = "order_details"

df_order_details = get_mongo_dataframe(conn_str['local'], src_dbname, collection, query)
df_order_details.head(2)

|   | orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber |
|---|---|---|---|---|---|
| 0 | 10100 | S18_1749 | 30 | 136.0 | 3 |
| 1 | 10100 | S18_2248 | 50 | 55.09 | 2 |


In [8]:
query = {} # Select all elements (columns), and all documents (rows).
collection = "orders"

df_orders = get_mongo_dataframe(conn_str['local'], src_dbname, collection, query)  # Specify 'atlas', or 'local'
df_orders.head(2)

|   | orderNumber | orderDate | requiredDate | shippedDate | status | comments | customerNumber |
|---|---|---|---|---|---|---|---|
| 0 | 10100 | 2003-01-06 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 |
| 1 | 10101 | 2003-01-09 | 2003-01-18 | 2003-01-11 | Shipped | Check on availability. | 128 |


#### 1.2. Look Up the Order Date Keys from the Date Dimension Table.
Here we see an example of a dimension cross-referencing another dimension: the Date dimension. The Date dimension is a classic example of a **Role-Playing dimension**. Dates in and of themselves are universal; however, when applied to specific events they take on the identity of those events. Here, the *dim_date* table takes on the identity of an order date to supply the *orderDateKey* values to the *dim_orders* table; in Section 2.2 it plays the roles of required date and shipped date for the *fact_orders* table.
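Because the same date dimension is renamed once per role, the three date lookups in this notebook follow one pattern. The helper below is a hypothetical refactoring of that pattern, not a function the lab defines, and the tiny date dimension and order row are illustrative data.

```python
import datetime

import pandas as pd


def lookup_date_key(df: pd.DataFrame, dim_date: pd.DataFrame,
                    date_col: str, key_col: str) -> pd.DataFrame:
    """Replace df[date_col] with the matching surrogate key from dim_date."""
    # Rename the generic dimension columns to the role being played.
    role = dim_date.rename(columns={"date_key": key_col, "full_date": date_col})
    out = df.copy()
    # Convert to datetime.date so the values compare equal to full_date.
    out[date_col] = pd.to_datetime(out[date_col]).dt.date
    out = out.merge(role, on=date_col, how="left")
    return out.drop(columns=[date_col])


# Hypothetical two-row date dimension and one order row.
dim_date = pd.DataFrame({
    "date_key": [20030106, 20030113],
    "full_date": [datetime.date(2003, 1, 6), datetime.date(2003, 1, 13)],
})
orders = pd.DataFrame({"orderNumber": [10100],
                       "orderDate": ["2003-01-06"],
                       "requiredDate": ["2003-01-13"]})

# One dimension, two roles.
orders = lookup_date_key(orders, dim_date, "orderDate", "orderDateKey")
orders = lookup_date_key(orders, dim_date, "requiredDate", "requiredDateKey")
print(orders)
```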

##### 1.2.1. Get the Data from the Date Dimension Table.
First, fetch the Surrogate Primary Key (date_key) and the Business Key (full_date) from the Date Dimension table using the **get_sql_dataframe()** function. Be certain to cast the **full_date** column to the **datetime64[ns]** data type using the **.astype()** function that is native to Pandas DataFrame columns. Also, extract the **date** portion using the **.dt.date** attribute of the **datetime64[ns]** datatype.
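The cast matters because a merge only matches values that compare equal, and a date string such as `"2000-01-01"` never equals a `datetime.date`. This sketch, using a hypothetical one-row date dimension, shows the mismatch and how `.astype('datetime64[ns]').dt.date` resolves it.

```python
import datetime

import pandas as pd

# Hypothetical one-row date dimension, with full_date held as datetime.date.
dim_date = pd.DataFrame({"date_key": [20000101],
                         "full_date": [datetime.date(2000, 1, 1)]})

raw = pd.Series(["2000-01-01"])  # a date as it might arrive from a source system
print(raw[0] == dim_date.loc[0, "full_date"])       # False: str vs datetime.date

as_dates = raw.astype("datetime64[ns]").dt.date
print(as_dates[0] == dim_date.loc[0, "full_date"])  # True

# With matching types, the lookup merge finds the surrogate key.
facts = pd.DataFrame({"full_date": as_dates})
matched = facts.merge(dim_date, on="full_date", how="left")
print(matched)
```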

In [9]:
sql_dim_date = "SELECT date_key, full_date FROM classic_dw.dim_date;"
df_dim_date = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_dim_date)
df_dim_date.full_date = df_dim_date.full_date.astype('datetime64[ns]').dt.date
df_dim_date.head(2)

|   | date_key | full_date |
|---|---|---|
| 0 | 20000101 | 2000-01-01 |
| 1 | 20000102 | 2000-01-02 |


In [10]:
sql_dim_products = "SELECT * FROM classic_dw.dim_products;"
df_dim_products = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_dim_products)
df_dim_products.head(2)

|   | productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.7 |
| 1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.3 |


##### 1.2.2. Look Up the Surrogate Primary Key (date_key) that Corresponds to the orderDate Column

In [11]:
df_dim_orderDate = df_dim_date.rename(columns={"date_key" : "orderDateKey", "full_date" : "orderDate"})
df_orders.orderDate = df_orders.orderDate.astype('datetime64[ns]').dt.date
df_orders = pd.merge(df_orders, df_dim_orderDate, on='orderDate', how='left')
df_orders.drop(['orderDate'], axis=1, inplace=True)
df_orders.head(2)

|   | orderNumber | requiredDate | shippedDate | status | comments | customerNumber | orderDateKey |
|---|---|---|---|---|---|---|---|
| 0 | 10100 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 | 20030106 |
| 1 | 10101 | 2003-01-18 | 2003-01-11 | Shipped | Check on availability. | 128 | 20030109 |


#### 1.3. Perform Any Necessary Transformations to the DataFrames

In [12]:
# Insert a new column, with an ever-incrementing numeric value, to serve as the surrogate primary key.
df_orders.insert(0, "orderKey", range(1, df_orders.shape[0]+1))
df_orders.head(2)

|   | orderKey | orderNumber | requiredDate | shippedDate | status | comments | customerNumber | orderDateKey |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 | 20030106 |
| 1 | 2 | 10101 | 2003-01-18 | 2003-01-11 | Shipped | Check on availability. | 128 | 20030109 |


In [13]:
df_order_details.insert(0, "orderDetailsKey", range(1, df_order_details.shape[0]+1))
df_order_details.head(2)

|   | orderDetailsKey | orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber |
|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | S18_1749 | 30 | 136.0 | 3 |
| 1 | 2 | 10100 | S18_2248 | 50 | 55.09 | 2 |


In [14]:
# Insert a new column, with an ever-incrementing numeric value, to serve as the surrogate primary key.
# The customerNumber column remains in the table as the business key for lookup operations.
df_customers.insert(0, "customerKey", range(1, df_customers.shape[0] + 1))
df_customers.head(2)

|   | customerKey | customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | country | salesRepEmployeeNumber | creditLimit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 103 | Atelier graphique | Schmitt | Carine | 40.32.2555 | 54, rue Royale |  | Nantes |  | 44000 | France | 1370.0 | 21000.0 |
| 1 | 2 | 112 | Signal Gift Stores | King | Jean | 7025551838 | 8489 Strong St. |  | Las Vegas | NV | 83030 | USA | 1166.0 | 71800.0 |
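One practical wrinkle with the surrogate-key cells above: `DataFrame.insert()` raises a `ValueError` when the column already exists, so re-running any of them in the same session fails. A minimal sketch of a re-runnable alternative follows; `with_surrogate_key` is a hypothetical helper and the customer numbers are illustrative.

```python
import pandas as pd


def with_surrogate_key(df: pd.DataFrame, key_name: str) -> pd.DataFrame:
    """Return a copy of df whose first column is a 1-based surrogate key."""
    # Drop any key left over from a previous run before inserting a fresh one.
    out = df.drop(columns=[key_name], errors="ignore").copy()
    out.insert(0, key_name, range(1, len(out) + 1))
    return out


customers = pd.DataFrame({"customerNumber": [103, 112]})
customers = with_surrogate_key(customers, "customerKey")
customers = with_surrogate_key(customers, "customerKey")  # re-running is safe
print(customers)
```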


#### 1.4. Load the Transformed DataFrames into the New Data Warehouse by Creating New Tables

Here we will call our **set_dataframe( )** function to create each dimension table. Its parameters are, in order: the usual connection information (user_id, password, MySQL server name, and database), the *pandas DataFrame* we crafted to define & populate the table, the *table_name* to assign to the table, the *name* of the column we wish to designate as the *primary key*, and finally, the database operation. An "insert" operation creates (or replaces) the table and adds the primary key, whereas an "update" operation appends the DataFrame's rows to the existing table.

In [15]:
dataframe = df_orders
table_name = 'dim_orders'
primary_key = 'orderKey'
db_operation = "insert"

set_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, dataframe, table_name, primary_key, db_operation)

In [16]:
dataframe = df_order_details
table_name = 'dim_order_details'
primary_key = 'orderDetailsKey'
db_operation = "insert"

set_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, dataframe, table_name, primary_key, db_operation)

In [17]:
# Upload the customers DataFrame to create the new "dim_customers" dimension table.
dataframe = df_customers
table_name = 'dim_customers'
primary_key = 'customerKey'
db_operation = "insert"

set_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, dataframe, table_name, primary_key, db_operation)

#### 1.5. Validate that the New Dimension Tables were Created.

In [18]:
sql_orders = "SELECT * FROM classic_dw.dim_orders;"
df_dim_orders = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_orders)
df_dim_orders.head(2)

|   | orderKey | orderNumber | requiredDate | shippedDate | status | comments | customerNumber | orderDateKey |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 | 20030106 |
| 1 | 2 | 10101 | 2003-01-18 | 2003-01-11 | Shipped | Check on availability. | 128 | 20030109 |


In [19]:
sql_order_details = "SELECT * FROM classic_dw.dim_order_details;"
df_dim_order_details = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_order_details)
df_dim_order_details.head(2)

|   | orderDetailsKey | orderNumber | productCode | quantityOrdered | priceEach | orderLineNumber |
|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | S18_1749 | 30 | 136.0 | 3 |
| 1 | 2 | 10100 | S18_2248 | 50 | 55.09 | 2 |


In [20]:
sql_customers = "SELECT * FROM classic_dw.dim_customers;"
df_dim_customers = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_customers)
df_dim_customers.head(2)

|   | customerKey | customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | country | salesRepEmployeeNumber | creditLimit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 103 | Atelier graphique | Schmitt | Carine | 40.32.2555 | 54, rue Royale |  | Nantes |  | 44000 | France | 1370.0 | 21000.0 |
| 1 | 2 | 112 | Signal Gift Stores | King | Jean | 7025551838 | 8489 Strong St. |  | Las Vegas | NV | 83030 | USA | 1166.0 | 71800.0 |


### 2.0. Create and Populate the New Fact Tables
#### 2.1. Extract Data from the Source MongoDB Collections Into DataFrames

In [21]:
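# The orders fact table reuses df_orders and df_order_details, already extracted in Section 1.1.
# TODO (optional): Extract data for a new "Inventory Transactions" Fact Table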
#query = {} # Select all elements (columns), and all documents (rows).

#collection = "inventory_transactions"

#df_fact_inventory = get_mongo_dataframe(conn_str['local'], src_dbname, collection, query)
#df_fact_inventory.head(2)

#### 2.2. Look Up the Date Keys from the Date Dimension Table.
**2.2.1.** First, merge the orders with their order details so that each row represents a single order line. Then, for each date-typed column in the **orders** fact table, look up the corresponding Surrogate Primary Key column. Be certain to cast the date-typed column to the **datetime64[ns]** data type using the **.astype()** function that is native to Pandas DataFrame columns. Also, extract the **date** portion using the **.dt.date** attribute of the **datetime64[ns]** data type.

In [22]:
df_fact_orders = pd.merge(df_orders, df_order_details, on='orderNumber', how='right')
df_fact_orders.head(2)

|   | orderKey | orderNumber | requiredDate | shippedDate | status | comments | customerNumber | orderDateKey | orderDetailsKey | productCode | quantityOrdered | priceEach | orderLineNumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 | 20030106 | 1 | S18_1749 | 30 | 136.0 | 3 |
| 1 | 1 | 10100 | 2003-01-13 | 2003-01-10 | Shipped |  | 363 | 20030106 | 2 | S18_2248 | 50 | 55.09 | 2 |
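The fact table's grain is one row per order line, which is why the merge above uses `how='right'`: every order-detail row is kept, and an order header is repeated for each of its lines. The sketch below uses hypothetical data in which one detail line (order 10102) has no matching header; `indicator=True` labels such rows `right_only`, and `validate="one_to_many"` asserts that `orderNumber` is unique on the orders side.

```python
import pandas as pd

# Hypothetical orders and order lines; order 10102 has no header row.
orders = pd.DataFrame({"orderNumber": [10100, 10101],
                       "status": ["Shipped", "Shipped"]})
details = pd.DataFrame({"orderNumber": [10100, 10100, 10102],
                        "productCode": ["S18_1749", "S18_2248", "S10_1678"]})

line_items = pd.merge(orders, details, on="orderNumber", how="right",
                      validate="one_to_many", indicator=True)
print(line_items)
```

Order 10101, which has no lines, drops out, while the orphaned line for 10102 survives with a missing `status`.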


In [23]:
# Look up the Surrogate Primary Key (date_key) that corresponds to the "requiredDate" column.
df_dim_required_date = df_dim_date.rename(columns={"date_key" : "requiredDateKey", "full_date" : "requiredDate"})
df_fact_orders.requiredDate = df_fact_orders.requiredDate.astype('datetime64[ns]').dt.date
df_fact_orders = pd.merge(df_fact_orders, df_dim_required_date, on='requiredDate', how='left')
df_fact_orders.drop(['requiredDate'], axis=1, inplace=True)
df_fact_orders.head(2)

|   | orderKey | orderNumber | shippedDate | status | comments | customerNumber | orderDateKey | orderDetailsKey | productCode | quantityOrdered | priceEach | orderLineNumber | requiredDateKey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | 2003-01-10 | Shipped |  | 363 | 20030106 | 1 | S18_1749 | 30 | 136.0 | 3 | 20030113 |
| 1 | 1 | 10100 | 2003-01-10 | Shipped |  | 363 | 20030106 | 2 | S18_2248 | 50 | 55.09 | 2 | 20030113 |


In [24]:
# Look up the Surrogate Primary Key (date_key) that corresponds to the "shippedDate" column.
df_dim_shippedDate = df_dim_date.rename(columns={"date_key" : "shippedDateKey", "full_date" : "shippedDate"})
df_fact_orders.shippedDate = df_fact_orders.shippedDate.astype('datetime64[ns]').dt.date
df_fact_orders = pd.merge(df_fact_orders, df_dim_shippedDate, on='shippedDate', how='left')
df_fact_orders.drop(['shippedDate'], axis=1, inplace=True)
df_fact_orders.head(2)

|   | orderKey | orderNumber | status | comments | customerNumber | orderDateKey | orderDetailsKey | productCode | quantityOrdered | priceEach | orderLineNumber | requiredDateKey | shippedDateKey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped |  | 363 | 20030106 | 1 | S18_1749 | 30 | 136.0 | 3 | 20030113 | 20030110.0 |
| 1 | 1 | 10100 | Shipped |  | 363 | 20030106 | 2 | S18_2248 | 50 | 55.09 | 2 | 20030113 | 20030110.0 |
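Note that `shippedDateKey` now prints as `20030110.0`. This is consistent with some orders having no shipped date (presumably orders not yet shipped): the left merge fills their keys with `NaN`, and NumPy integers cannot hold `NaN`, so pandas promotes the whole column to `float64`. A minimal sketch of one remedy, pandas' nullable `Int64` dtype, follows; the values are illustrative.

```python
import pandas as pd

# A missing shippedDate left-joins to NaN, so the key column becomes float64.
shipped_date_key = pd.Series([20030110.0, None])
print(shipped_date_key.dtype)  # float64

# Cast to pandas' nullable integer dtype: whole numbers again, with <NA> for gaps.
shipped_date_key = shipped_date_key.astype("Int64")
print(shipped_date_key.dtype)  # Int64
```

Applied here, the equivalent step would be `df_fact_orders["shippedDateKey"] = df_fact_orders["shippedDateKey"].astype("Int64")`, which should let `to_sql()` write an integer column with `NULL` for the missing keys.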


In [25]:
df_fact_orders['orderTotalPrice'] = df_fact_orders['quantityOrdered'] * df_fact_orders['priceEach']
df_fact_orders.head(2)

|   | orderKey | orderNumber | status | comments | customerNumber | orderDateKey | orderDetailsKey | productCode | quantityOrdered | priceEach | orderLineNumber | requiredDateKey | shippedDateKey | orderTotalPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped |  | 363 | 20030106 | 1 | S18_1749 | 30 | 136.0 | 3 | 20030113 | 20030110.0 | 4080.0 |
| 1 | 1 | 10100 | Shipped |  | 363 | 20030106 | 2 | S18_2248 | 50 | 55.09 | 2 | 20030113 | 20030110.0 | 2754.5 |


#### 2.3. Look Up the Primary Keys from the Dimension Tables
**Foreign key relationships** must be established between each newly-crafted **Fact table** and each related **Dimension table**.

##### 2.3.1. First, fetch the Surrogate Primary Key and the Business Key from each Dimension table.

In [26]:
# Modify 'df_fact_orders' by merging it with the customer dimension on the 'customerNumber' business key,
# drop the 'customerNumber' column, and display the first 2 rows of the DataFrame to validate the work.
sql_dim_customers = "SELECT customerKey, customerName, customerNumber FROM classic_dw.dim_customers;"
df_dim_customers_selected_columns = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_dim_customers)

df_fact_orders = pd.merge(df_fact_orders, df_dim_customers_selected_columns, on='customerNumber', how='inner')
df_fact_orders.drop(['customerNumber'], axis=1, inplace=True)
df_fact_orders.head(2)

|   | orderKey | orderNumber | status | comments | orderDateKey | orderDetailsKey | productCode | quantityOrdered | priceEach | orderLineNumber | requiredDateKey | shippedDateKey | orderTotalPrice | customerKey | customerName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped |  | 20030106 | 1 | S18_1749 | 30 | 136.0 | 3 | 20030113 | 20030110.0 | 4080.0 | 86 | Online Diecast Creations Co. |
| 1 | 1 | 10100 | Shipped |  | 20030106 | 2 | S18_2248 | 50 | 55.09 | 2 | 20030113 | 20030110.0 | 2754.5 | 86 | Online Diecast Creations Co. |
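An inner join silently discards any fact row whose business key has no row in the dimension. Before relying on it, the orphans can be surfaced with a left merge and `indicator=True`, while `validate="many_to_one"` confirms the dimension's business key is unique (so the merge cannot duplicate fact rows). The customer numbers below are hypothetical; 999 deliberately has no dimension row.

```python
import pandas as pd

# Hypothetical fact rows and customer dimension; 999 has no dimension row.
facts = pd.DataFrame({"customerNumber": [363, 128, 999]})
dim_customers = pd.DataFrame({"customerKey": [1, 2],
                              "customerNumber": [363, 128]})

check = facts.merge(dim_customers, on="customerNumber", how="left",
                    indicator=True, validate="many_to_one")
orphans = check.loc[check["_merge"] == "left_only", "customerNumber"]
print(orphans.tolist())  # [999]
```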


In [27]:
sql_dim_products = "SELECT productCode, productName FROM classic_dw.dim_products;"
df_dim_products_selected_columns = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_dim_products)
df_dim_products_selected_columns.head(2)

|   | productCode | productName |
|---|---|---|
| 0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper |
| 1 | S10_1949 | 1952 Alpine Renault 1300 |


In [28]:
df_fact_orders = pd.merge(df_fact_orders, df_dim_products_selected_columns, on='productCode')
df_fact_orders.drop('productCode', axis=1, inplace=True)
df_fact_orders.head(2)

|   | orderKey | orderNumber | status | comments | orderDateKey | orderDetailsKey | quantityOrdered | priceEach | orderLineNumber | requiredDateKey | shippedDateKey | orderTotalPrice | customerKey | customerName | productName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped |  | 20030106 | 1 | 30 | 136.0 | 3 | 20030113 | 20030110.0 | 4080.0 | 86 | Online Diecast Creations Co. | 1917 Grand Touring Sedan |
| 1 | 74 | 10173 | Shipped | Cautious optimism. We have happy customers her... | 20031105 | 661 | 24 | 168.3 | 13 | 20031115 | 20031109.0 | 4039.2 | 57 | Rovelli Gifts | 1917 Grand Touring Sedan |


In [29]:
df_fact_orders.insert(0, "factOrdersKey", range(1, df_fact_orders.shape[0]+1))
df_fact_orders.head(2)

|   | factOrdersKey | orderKey | orderNumber | status | comments | orderDateKey | orderDetailsKey | quantityOrdered | priceEach | orderLineNumber | requiredDateKey | shippedDateKey | orderTotalPrice | customerKey | customerName | productName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 10100 | Shipped |  | 20030106 | 1 | 30 | 136.0 | 3 | 20030113 | 20030110.0 | 4080.0 | 86 | Online Diecast Creations Co. | 1917 Grand Touring Sedan |
| 1 | 2 | 74 | 10173 | Shipped | Cautious optimism. We have happy customers her... | 20031105 | 661 | 24 | 168.3 | 13 | 20031115 | 20031109.0 | 4039.2 | 57 | Rovelli Gifts | 1917 Grand Touring Sedan |
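The product lookup merge regrouped the rows by product, which is why `factOrdersKey` 2 belongs to order 10173 rather than to the second line of order 10100. If keys should follow order and line sequence instead, one option is to sort before assigning them, while `orderLineNumber` is still present. A minimal sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows in the order a merge on productCode might leave them.
facts = pd.DataFrame({"orderNumber": [10173, 10100, 10100],
                      "orderLineNumber": [13, 2, 3]})

# Sort into business order, renumber the index, then assign the surrogate key.
facts = facts.sort_values(["orderNumber", "orderLineNumber"]).reset_index(drop=True)
facts.insert(0, "factOrdersKey", range(1, len(facts) + 1))
print(facts)
```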


#### 2.4. Perform Any Necessary Transformations to the DataFrames

In [30]:
df_fact_orders.drop(['comments', 'orderLineNumber'], axis=1, inplace=True)
df_fact_orders.head(2)

|   | factOrdersKey | orderKey | orderNumber | status | orderDateKey | orderDetailsKey | quantityOrdered | priceEach | requiredDateKey | shippedDateKey | orderTotalPrice | customerKey | customerName | productName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 10100 | Shipped | 20030106 | 1 | 30 | 136.0 | 20030113 | 20030110.0 | 4080.0 | 86 | Online Diecast Creations Co. | 1917 Grand Touring Sedan |
| 1 | 2 | 74 | 10173 | Shipped | 20031105 | 661 | 24 | 168.3 | 20031115 | 20031109.0 | 4039.2 | 57 | Rovelli Gifts | 1917 Grand Touring Sedan |


In [31]:
# Rename columns
column_name_map = {"status" : "orderStatus",
                   "priceEach" : "unitPrice",
                  }

df_fact_orders.rename(columns=column_name_map, inplace=True)

# Reorder the Columns
ordered_columns = ['factOrdersKey', 'orderNumber','orderStatus'
                   ,'customerName','productName'
                   ,'unitPrice','quantityOrdered','orderTotalPrice', 
                  'orderKey', 'orderDateKey', 'orderDetailsKey', 'requiredDateKey', 'shippedDateKey', 
                  'customerKey']

df_fact_orders = df_fact_orders[ordered_columns]
df_fact_orders.head(2)

|   | factOrdersKey | orderNumber | orderStatus | customerName | productName | unitPrice | quantityOrdered | orderTotalPrice | orderKey | orderDateKey | orderDetailsKey | requiredDateKey | shippedDateKey | customerKey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped | Online Diecast Creations Co. | 1917 Grand Touring Sedan | 136.0 | 30 | 4080.0 | 1 | 20030106 | 1 | 20030113 | 20030110.0 | 86 |
| 1 | 2 | 10173 | Shipped | Rovelli Gifts | 1917 Grand Touring Sedan | 168.3 | 24 | 4039.2 | 74 | 20031105 | 661 | 20031115 | 20031109.0 | 57 |


#### 2.5. Load the Newly Transformed MongoDB Data into the classic_dw Data Warehouse

In [32]:
dataframe = df_fact_orders
table_name = 'fact_orders'
primary_key = 'factOrdersKey'
db_operation = "insert"

set_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, dataframe, table_name, primary_key, db_operation)

#### 2.6. Validate that the New Fact Tables were Created

In [33]:
sql_fact_orders = "SELECT * FROM classic_dw.fact_orders;"
df_fact_orders_validate = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_fact_orders)
df_fact_orders_validate.head(2)

|   | factOrdersKey | orderNumber | orderStatus | customerName | productName | unitPrice | quantityOrdered | orderTotalPrice | orderKey | orderDateKey | orderDetailsKey | requiredDateKey | shippedDateKey | customerKey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10100 | Shipped | Online Diecast Creations Co. | 1917 Grand Touring Sedan | 136.0 | 30 | 4080.0 | 1 | 20030106 | 1 | 20030113 | 20030110.0 | 86 |
| 1 | 2 | 10173 | Shipped | Rovelli Gifts | 1917 Grand Touring Sedan | 168.3 | 24 | 4039.2 | 74 | 20031105 | 661 | 20031115 | 20031109.0 | 57 |


### 3.0. Demonstrate that the New Data Warehouse Exists and Contains the Correct Data
To demonstrate the viability of your solution, author a SQL SELECT statement that returns:
- Each Company’s Name
- The total amount of the purchase order detail quantity associated with each company
- The total amount of the purchase order detail unit cost associated with each company

**NOTE:** *Remember that a string typed variable whose value is contained by triple-quotes (""" ... """) can preserve multi-line formatting, and that a string variable has an intrinsic **.format()** function that accepts ordered parameters that will replace tokens (e.g., {0}) in the formatted string.*  
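As a brief illustration of that note, the sketch below fills the `{0}` token twice with the same database name, so the schema is written once in Python and reused throughout the query. Literal braces in such a template would need to be doubled (`{{` and `}}`), and `.format()` should only be used for trusted identifiers such as schema names, never for user-supplied values.

```python
dst_dbname = "classic_dw"

# Triple quotes preserve the multi-line layout; {0} is replaced by the
# first positional argument passed to .format().
sql_template = """
    SELECT customers.customerName AS `Customer Name`,
        SUM(orders.quantityOrdered) AS `Total Quantity`
    FROM `{0}`.`fact_orders` AS orders
    INNER JOIN `{0}`.`dim_customers` AS customers
    ON orders.customerKey = customers.customerKey
    GROUP BY customers.customerName;
"""
sql = sql_template.format(dst_dbname)
print(sql)
```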

In [41]:
sql_test = """
    SELECT * FROM `{0}`.`dim_customers`;
""".format(dst_dbname)

df_test = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_test)
df_test.head()

|   | customerKey | customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | country | salesRepEmployeeNumber | creditLimit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 103 | Atelier graphique | Schmitt | Carine | 40.32.2555 | 54, rue Royale |  | Nantes |  | 44000 | France | 1370.0 | 21000.0 |
| 1 | 2 | 112 | Signal Gift Stores | King | Jean | 7025551838 | 8489 Strong St. |  | Las Vegas | NV | 83030 | USA | 1166.0 | 71800.0 |
| 2 | 3 | 114 | Australian Collectors, Co. | Ferguson | Peter | 03 9520 4555 | 636 St Kilda Road | Level 3 | Melbourne | Victoria | 3004 | Australia | 1611.0 | 117300.0 |
| 3 | 4 | 119 | La Rochelle Gifts | Labrune | Janine | 40.67.8555 | 67, rue des Cinquante Otages |  | Nantes |  | 44000 | France | 1370.0 | 118200.0 |
| 4 | 5 | 121 | Baane Mini Imports | Bergulfsen | Jonas | 07-98 9555 | Erling Skakkes gate 78 |  | Stavern |  | 4110 | Norway | 1504.0 | 81700.0 |


In [36]:
sql_test = """
    SELECT customers.`customerName` AS `Customer Name`,
        SUM(orders.`quantityOrdered`) AS `Total Quantity`,
        SUM(orders.`unitPrice`) AS `Total Unit Price`
    FROM `{0}`.`fact_orders` AS orders
    INNER JOIN `{0}`.`dim_customers` AS customers
    ON orders.customerKey = customers.customerKey
    GROUP BY customers.`customerName`
    ORDER BY `Total Unit Price` DESC;
""".format(dst_dbname)

df_test = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_test)
df_test.head()

In [34]:
sql_purchase_orders = """
    SELECT s.company AS 'company', 
        SUM(po_detail_quantity) AS 'Total Order Detail Quantity', 
        SUM(po_detail_unit_cost) AS 'Total Detail Unit Cost'
    FROM northwind_dw2.fact_purchase_orders AS po
    INNER JOIN northwind_dw2.dim_suppliers AS s
    ON po.supplier_key = s.supplier_key
    GROUP BY company
    ORDER BY SUM(po_detail_quantity) DESC;

"""

In [None]:
df_fact_purchase_orders = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_purchase_orders)
df_fact_purchase_orders

### 3.1. Extra Credit: Author a Query that Returns the Inventory Transaction Data
- Each Product Category
- Each TransactionType
- The total number of transactions associated with each Product Category and Transaction Type
- The total amount (quantity) associated with each Product Category and Transaction Type
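Once the joined rows are in a DataFrame, the same summary can be cross-checked in pandas. This is a minimal sketch with hypothetical category and transaction-type values: a `groupby()` on both columns with named aggregation, where `"count"` (like SQL `COUNT(column)`) counts non-null values and `"sum"` totals the quantity, sorted descending by total quantity as in the query.

```python
import pandas as pd

# Hypothetical stand-in for the joined inventory-transaction rows.
rows = pd.DataFrame({
    "Category": ["Beverages", "Beverages", "Beverages", "Condiments"],
    "Transaction Type": ["Purchased", "Purchased", "Sold", "Purchased"],
    "quantity": [10, 5, 3, 7],
})

summary = (rows.groupby(["Category", "Transaction Type"])
               .agg(**{"Number of Transactions": ("quantity", "count"),
                       "Total Quantity": ("quantity", "sum")})
               .reset_index()
               .sort_values("Total Quantity", ascending=False))
print(summary)
```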

In [None]:
sql_inventory_transactions = """
    SELECT p.category AS 'Category', 
        it.inventory_transaction_type AS 'Transaction Type',
        COUNT(it.inventory_transaction_type) AS 'Number of Transactions',
        SUM(it.quantity) AS 'Total Quantity'
    FROM northwind_dw2.fact_inventory_transactions AS it
    INNER JOIN northwind_dw2.dim_products AS p
    ON p.product_key = it.product_key
    GROUP BY p.category, 
        it.inventory_transaction_type
    ORDER BY SUM(it.quantity) DESC;
"""

In [None]:
df_fact_inventory_transactions = get_sql_dataframe(mysql_uid, mysql_pwd, mysql_hostname, dst_dbname, sql_inventory_transactions)
df_fact_inventory_transactions