# Source Database and CDC

## Table of Contents:

1. [Overview](#Overview)
2. [Aurora MySQL as Source Database](#Aurora-MySQL-as-Source-Database)
3. [Accessing EMR Web UIs](#Accessing-EMR-Web-UIs)
4. [AWS DMS Full Load](#AWS-DMS-Full-Load)
5. [Simulate Random Updates](#Simulate-Random-Updates)

### Overview

We will use this notebook to verify that our Aurora MySQL database is up and running. This database will serve as the source of transactions, and we will execute random updates from this notebook to simulate incoming transactions.

![Architecture](resources/lab_architecture.png)

### Aurora MySQL as Source Database

Let's first test connectivity to our database by opening a connection. We will then define helper functions and use them to run some SQL statements:

In [1]:
import random
import time

import MySQLdb

host = 'hudidb.cbpvchb38o9m.us-east-1.rds.amazonaws.com'
user = 'master'
password = 'S3cretPwd99'
port = 3306
db = 'salesdb'

conn = MySQLdb.Connection(
    host=host,
    user=user,
    passwd=password,
    port=port,
    db=db
)

In [2]:
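def execute_sql(sql):
    """Run a query and print each row of the result set."""
    conn.query(sql)
    result = conn.store_result()
    for i in range(result.num_rows()):
        print(result.fetch_row())


def execute_dml(sql):
    """Run a DML statement, print the number of affected rows, and commit."""
    conn.query(sql)
    rowcount = conn.affected_rows()
    print("Rows updated: %d" % rowcount)
    conn.commit()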

In [3]:
execute_sql("show tables")

(('CUSTOMER',),)
(('CUSTOMER_SITE',),)
(('PRODUCT',),)
(('PRODUCT_CATEGORY',),)
(('SALES_ORDER',),)
(('SALES_ORDER_ALL',),)
(('SALES_ORDER_DETAIL',),)
(('SALES_ORDER_DETAIL_DS',),)
(('SALES_ORDER_V',),)
(('SUPPLIER',),)


This is a generic sales OLTP schema. Of the tables above, SALES_ORDER_DETAIL is the one for which we will simulate updates.

Let's execute one update on a random order in the SALES_ORDER_DETAIL table.

In [4]:
# Example of how to update a random order.
order_id=random.randint(1,29000)
print ("Original Values: ")
execute_sql("SELECT ORDER_ID, LINE_NUMBER, QUANTITY FROM SALES_ORDER_DETAIL WHERE ORDER_ID = %d"%order_id)
execute_dml("UPDATE SALES_ORDER_DETAIL set QUANTITY = QUANTITY + 1 WHERE ORDER_ID = %d"%order_id)
print ("Updated Values: ")
execute_sql("SELECT ORDER_ID, LINE_NUMBER, QUANTITY FROM SALES_ORDER_DETAIL WHERE ORDER_ID = %d"%order_id)

Original Values: 
((3037, 1, 99),)
((3037, 1, 93),)
((3037, 1, 68),)
((3037, 1, 58),)
Rows updated: 4
Updated Values: 
((3037, 1, 100),)
((3037, 1, 94),)
((3037, 1, 69),)
((3037, 1, 59),)
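
The read, update, read pattern above can also be written against the generic Python DB-API cursor interface with bound parameters instead of string formatting. The following is a minimal sketch, not the notebook's own helpers: `fetch_all`, `run_dml`, and `build_demo_db` are hypothetical names, the sample rows are made up, and the built-in `sqlite3` module stands in for Aurora MySQL so the example runs without a database. Note that `sqlite3` uses `?` placeholders, whereas `MySQLdb` cursors use `%s`.

```python
import random
import sqlite3


def fetch_all(conn, sql, params=()):
    """Run a query through a DB-API cursor and return all rows as a list."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


def run_dml(conn, sql, params=()):
    """Run a DML statement, commit, and return the number of affected rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        conn.commit()
        return cur.rowcount
    finally:
        cur.close()


def build_demo_db():
    """Create an in-memory stand-in for SALES_ORDER_DETAIL (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_ORDER_DETAIL "
        "(ORDER_ID INTEGER, LINE_NUMBER INTEGER, QUANTITY INTEGER)"
    )
    conn.executemany(
        "INSERT INTO SALES_ORDER_DETAIL VALUES (?, ?, ?)",
        [(1, 1, 10), (1, 2, 20), (2, 1, 5)],
    )
    conn.commit()
    return conn


SELECT_SQL = (
    "SELECT ORDER_ID, LINE_NUMBER, QUANTITY FROM SALES_ORDER_DETAIL "
    "WHERE ORDER_ID = ? ORDER BY LINE_NUMBER"
)
UPDATE_SQL = (
    "UPDATE SALES_ORDER_DETAIL SET QUANTITY = QUANTITY + 1 WHERE ORDER_ID = ?"
)

demo = build_demo_db()
order_id = random.choice([1, 2])  # the demo table only holds orders 1 and 2
before = fetch_all(demo, SELECT_SQL, (order_id,))
updated = run_dml(demo, UPDATE_SQL, (order_id,))
after = fetch_all(demo, SELECT_SQL, (order_id,))
print("Original Values:", before)
print("Rows updated:", updated)
print("Updated Values:", after)
```

Returning rows and row counts, rather than printing them inside the helper, makes the result easier to check programmatically; the printing stays in the calling cell.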


### Accessing EMR Web UIs

- Set up the 'FoxyProxy Standard' extension in Chrome:

    - Install the 'FoxyProxy Standard' extension in Google Chrome (recommended).
    - Download this file locally: [Foxy Proxy Config](resources/fox-proxy-config.xml)
    - Go to More Tools -> Extensions in Chrome (or open chrome://extensions).
    - Make sure the 'FoxyProxy Standard' extension is enabled.
    - Click on 'Details' for the extension and then on 'Extension options'.
    - On the Import/Export page, click Choose File, browse to the configuration file you downloaded above, select it, and choose Open.
    - Choose Replace when prompted to overwrite the existing settings.
    - For Proxy mode, choose 'Use proxies based on their predefined patterns and priorities.'
    
    
- Opening an SSH tunnel to the EMR Spark cluster:

    - Open a terminal and run the `ssh` command shown below to start an SSH tunnel.
    - Choose 'yes' when asked whether the host should be added to the trusted hosts.
    - The command does not return, which indicates that the tunnel is open.
    - Now you should be able to open [http://ec2-54-158-247-127.compute-1.amazonaws.com:8088](http://ec2-54-158-247-127.compute-1.amazonaws.com:8088) in Chrome.
    
    - **For Windows Users only**, please follow instructions in the links below:
        - Converting Your Private Key Using PuTTYgen:
[https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html)

        - Use PuTTY to start an SSH tunnel to the EMR master node:
[https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel.html#emr-ssh-tunnel-win](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel.html#emr-ssh-tunnel-win)

     
```bash
$> ssh -i ee-default-keypair.pem  hadoop@ec2-54-158-247-127.compute-1.amazonaws.com -ND 8157
```
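
Because the `ssh` command stays silent while the tunnel is up, it can help to confirm from Python that something is listening on the local forwarding port. The sketch below is only a rough check under stated assumptions: `is_port_open` is a hypothetical helper, and it assumes the tunnel was started on this machine with the port 8157 from the command above. It verifies only that the port accepts TCP connections, not that the proxy reaches the cluster.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 8157 is the dynamic forwarding port passed to `ssh -ND` above.
print("Tunnel port 8157 open:", is_port_open("localhost", 8157))
```

If this prints `False`, the `ssh` command has most likely exited or was started with a different port.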

    

### AWS DMS Full Load

In this step we will execute a full load of data from this database to S3 using AWS DMS:

- Navigate to the DMS Console by clicking on Services -> DMS. 
- In the left-hand panel of the DMS Console, locate the menu item Conversion & migration -> Database migration tasks.
- Select the single replication task listed and click Actions -> Restart/Resume to start it.
- You can monitor the progress of this task by clicking on the task link. 
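
The console steps above could also be scripted. The following is a sketch under assumptions, not part of the lab itself: `start_single_replication_task` and `task_status` are hypothetical helpers, the client is assumed to be a boto3 DMS client with credentials and region already configured, and `start-replication` is assumed to be the appropriate start type for a task that has not run yet.

```python
def start_single_replication_task(dms_client):
    """Start the only DMS replication task in the account and return its ARN."""
    tasks = dms_client.describe_replication_tasks()["ReplicationTasks"]
    if len(tasks) != 1:
        raise RuntimeError(
            "expected exactly one replication task, found %d" % len(tasks)
        )
    arn = tasks[0]["ReplicationTaskArn"]
    # Assumption: the task has never run, so a fresh start is requested.
    dms_client.start_replication_task(
        ReplicationTaskArn=arn,
        StartReplicationTaskType="start-replication",
    )
    return arn


def task_status(dms_client, arn):
    """Return the current status string of the task with the given ARN."""
    resp = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [arn]}]
    )
    return resp["ReplicationTasks"][0]["Status"]


# Usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   dms = boto3.client("dms")
#   arn = start_single_replication_task(dms)
#   print(task_status(dms, arn))
```

Passing the client in as an argument keeps the helpers free of any hard-coded region or credentials.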

### Simulate Random Updates

Let's perform some random updates to our data. We will use the helper function below to perform these updates.

In [5]:
def perform_random_updates():
    order_id=random.randint(1,29000)
    execute_dml("UPDATE SALES_ORDER_DETAIL set QUANTITY = QUANTITY + 1 WHERE ORDER_ID = %d"%order_id)

In [6]:
while (True):
    perform_random_updates()

Rows updated: 3
Rows updated: 1
Rows updated: 8
Rows updated: 2
Rows updated: 2
Rows updated: 7
Rows updated: 5
Rows updated: 4
Rows updated: 4
Rows updated: 1
Rows updated: 3
Rows updated: 1
Rows updated: 3
Rows updated: 4
Rows updated: 1
Rows updated: 6
Rows updated: 4
Rows updated: 2
Rows updated: 6
Rows updated: 2
Rows updated: 9
Rows updated: 3
Rows updated: 2
Rows updated: 2
Rows updated: 2
Rows updated: 5
Rows updated: 1
Rows updated: 4
Rows updated: 5
Rows updated: 5
Rows updated: 7
Rows updated: 1
Rows updated: 3
Rows updated: 2
Rows updated: 4
Rows updated: 4
Rows updated: 6
Rows updated: 2
Rows updated: 2
Rows updated: 7
Rows updated: 3
Rows updated: 1
Rows updated: 5
Rows updated: 1
Rows updated: 4
Rows updated: 4
Rows updated: 6
Rows updated: 4
Rows updated: 5
Rows updated: 4
Rows updated: 2
Rows updated: 6
Rows updated: 8
Rows updated: 6
Rows updated: 3
Rows updated: 2
Rows updated: 7
Rows updated: 4
Rows updated: 1
Rows updated: 6
Rows updated: 5
Rows updated: 4
Rows upd

KeyboardInterrupt: 

<div class="alert alert-block alert-danger">
<b>Note:</b> Please stop the execution of the above cell after some time. We will switch to the other notebooks for the remainder of the lab. But please keep this notebook open in a tab as we will come back to it to simulate more updates later.
</div>
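
If you would rather not interrupt the cell by hand, a bounded variant of the loop is one option. This is a minimal sketch: `run_random_updates` and its parameters are hypothetical, the pause between updates is an arbitrary choice, and the upper bound of 29000 is taken from the `random.randint(1,29000)` call used earlier. The update itself is passed in as a function so the loop can be tried without a database.

```python
import random
import time


def run_random_updates(update_fn, max_updates=100, pause_seconds=0.5,
                       max_order_id=29000):
    """Call update_fn with a random ORDER_ID up to max_updates times.

    Returns the number of updates completed; stops early on Ctrl-C
    or a notebook kernel interrupt.
    """
    done = 0
    try:
        for _ in range(max_updates):
            update_fn(random.randint(1, max_order_id))
            done += 1
            time.sleep(pause_seconds)
    except KeyboardInterrupt:
        pass
    return done


# Stand-in update function that only records the chosen order IDs.
chosen = []
count = run_random_updates(chosen.append, max_updates=3, pause_seconds=0)
print("Updates performed:", count)
```

In the notebook, `update_fn` could be a small wrapper that builds the `UPDATE SALES_ORDER_DETAIL` statement for the given order ID and passes it to `execute_dml`.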