## Setup

In [27]:
import boto3
import json

def get_secret(secret_name, region_name="us-east-1"):
    """Fetch a secret from AWS Secrets Manager and return its JSON payload as a dict."""
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    # The secret is stored as a JSON string of key/value pairs
    return json.loads(response['SecretString'])

creds = get_secret("wysde")
USERNAME = creds["RDS_MYSQL_USERNAME"]
PASSWORD = creds["RDS_MYSQL_PASSWORD"]
HOST = creds["RDS_MYSQL_HOST"]
DATABASE = 'sparsh'
conn_str = 'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(USERNAME, PASSWORD, HOST, DATABASE)

%config SqlMagic.autopandas=True
%config SqlMagic.displaycon=False
%config SqlMagic.feedback=False
%config SqlMagic.displaylimit=5
%reload_ext sql
%sql {conn_str}
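
The connection string in the setup cell interpolates the credentials directly into a URL. If a password contains characters such as `@` or `/`, the URL can be parsed incorrectly. A minimal sketch of one way to guard against that, using `urllib.parse.quote_plus` from the standard library. The helper name `build_conn_str` and the sample credentials are hypothetical and not part of the original setup:

```python
from urllib.parse import quote_plus


def build_conn_str(username, password, host, database):
    """Build a SQLAlchemy-style MySQL URL with percent-encoded credentials."""
    return "mysql+mysqlconnector://{0}:{1}@{2}/{3}".format(
        quote_plus(username), quote_plus(password), host, database
    )


# Hypothetical credentials, for illustration only.
example = build_conn_str("admin", "p@ss/word", "db.example.com:3306", "sparsh")
print(example)
```

With credentials that contain no special characters, the result is the same string that the `.format(...)` call in the setup cell produces.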

Let's create a table named `orders` with three columns: `orderId`, `orderValue`, and `orderTimestamp`.

In [5]:
%%sql

CREATE TABLE orders (
    orderId INT NOT NULL AUTO_INCREMENT,
    orderValue FLOAT NOT NULL,
    orderTimestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (orderId)
)

- The `orderId` column is an integer and cannot be NULL. The first record gets the value 1, and each new record automatically gets the previous value plus 1. `orderId` is also the primary key, so it uniquely identifies each row.
- The `orderValue` column is a float and cannot be NULL. It accepts both integer and decimal values.
- The `orderTimestamp` column is of the TIMESTAMP data type. By default, each record gets the time at which it was inserted into `orders`. When a record is updated, `orderTimestamp` changes to the time of the update.

Let’s insert some sample values into the `orders` table.

In [6]:
%%sql

INSERT INTO orders (orderValue) values(900.5);
INSERT INTO orders (orderValue) values(500);
INSERT INTO orders (orderValue) values(200.5);
INSERT INTO orders (orderValue) values(450.5);
INSERT INTO orders (orderValue) values(700);

SELECT * FROM orders

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36


Let’s create a table named `latest_timestamp`. It stores the most recent `orderTimestamp` value in the `orders` table, which serves as the cutoff for the next incremental run.

In [9]:
%%sql

CREATE TABLE latest_timestamp (
    orderTimestamp TIMESTAMP
)

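Create a table `orders2`, the destination table that will hold both the initial batch load and the incremental loads. `CREATE TABLE ... AS SELECT` copies the column definitions of `orders` but not its primary key, and `TRUNCATE` then empties the new table.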

In [12]:
%%sql

CREATE TABLE orders2 AS
SELECT *
FROM orders;
TRUNCATE orders2;

1. Run the initial (batch) load: copy every row from `orders` into `orders2`. The cutoff `'1900-01-01 00:00:00'` is earlier than any order, so all rows qualify.

2. Record the latest `orderTimestamp` from `orders` in the `latest_timestamp` table, so that the next run picks up only newer rows.

In [13]:
%%sql

INSERT INTO orders2
SELECT *
FROM orders
WHERE orderTimestamp > '1900-01-01 00:00:00';

SELECT *
FROM orders2;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36


In [15]:
%%sql

TRUNCATE TABLE latest_timestamp;

INSERT INTO latest_timestamp
SELECT MAX(orderTimestamp) orderTimestamp
FROM orders;

SELECT *
FROM latest_timestamp;

Unnamed: 0,orderTimestamp
0,2023-03-16 12:27:36


## Incremental - INSERT

Let’s add new records to the orders table

In [16]:
%%sql

INSERT INTO orders (orderValue) values(1000.5);
INSERT INTO orders (orderValue) values(1500);

SELECT * FROM orders;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36
5,6,1000.5,2023-03-16 15:01:16
6,7,1500.0,2023-03-16 15:01:19


We inserted two new records with orderId 6 and orderId 7.

Now let's run the incremental logic

In [21]:
%%sql

INSERT INTO orders2
SELECT *
FROM orders
WHERE orderTimestamp > (SELECT * FROM latest_timestamp);

TRUNCATE TABLE latest_timestamp;

INSERT INTO latest_timestamp
SELECT MAX(orderTimestamp) orderTimestamp
FROM orders;

Let’s check to see what’s in the `orders2` table

In [22]:
%%sql

SELECT * FROM orders2;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36
5,6,1000.5,2023-03-16 15:01:16
6,7,1500.0,2023-03-16 15:01:19


As shown above, the destination `orders2` table now has records that exactly match the updated `orders` table in the source system.

In incremental data ingestion, new records (created with `INSERT` statements) are captured during the incremental run and appended to the specified table in the destination system.
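
The append-only pattern described above can be sketched in a self-contained way. This is a minimal illustration under stated assumptions, not the notebook's actual setup: an in-memory SQLite database stands in for MySQL, timestamps are stored as ISO-formatted text, the cutoff table is seeded with `'1900-01-01 00:00:00'` instead of using a literal in the first load, and the function name `incremental_load` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderId INTEGER PRIMARY KEY, orderValue REAL, orderTimestamp TEXT);
    CREATE TABLE orders2 (orderId INTEGER, orderValue REAL, orderTimestamp TEXT);
    CREATE TABLE latest_timestamp (orderTimestamp TEXT);
    INSERT INTO latest_timestamp VALUES ('1900-01-01 00:00:00');
""")


def incremental_load(conn):
    """Append rows newer than the stored cutoff, then advance the cutoff."""
    conn.execute("""
        INSERT INTO orders2
        SELECT * FROM orders
        WHERE orderTimestamp > (SELECT orderTimestamp FROM latest_timestamp)
    """)
    conn.execute("DELETE FROM latest_timestamp")
    conn.execute("""
        INSERT INTO latest_timestamp
        SELECT COALESCE(MAX(orderTimestamp), '1900-01-01 00:00:00') FROM orders
    """)
    conn.commit()


# First run: behaves like the initial batch load because the cutoff is very old.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 900.5, "2023-03-16 12:27:26"),
    (2, 500.0, "2023-03-16 12:27:28"),
    (3, 200.5, "2023-03-16 12:27:31"),
    (4, 450.5, "2023-03-16 12:27:33"),
    (5, 700.0, "2023-03-16 12:27:36"),
])
incremental_load(conn)

# Second run: only the two newly inserted rows pass the cutoff filter.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (6, 1000.5, "2023-03-16 15:01:16"),
    (7, 1500.0, "2023-03-16 15:01:19"),
])
incremental_load(conn)

loaded_ids = [row[0] for row in conn.execute("SELECT orderId FROM orders2 ORDER BY orderId")]
watermark = conn.execute("SELECT orderTimestamp FROM latest_timestamp").fetchone()[0]
print(loaded_ids, watermark)
```

Each orderId ends up in `orders2` exactly once because the cutoff is advanced at the end of every run.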

## Incremental - UPDATE

Let’s see what happens when existing records in the `orders` table are updated.

Let’s update two existing records in the `orders` table.

In [24]:
%%sql

UPDATE orders
SET orderValue = 350
WHERE orderId = 1;

UPDATE orders
SET orderValue = 1000
WHERE orderId = 5;

SELECT *
FROM orders;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,350.0,2023-03-16 15:10:17
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,1000.0,2023-03-16 15:10:20
5,6,1000.5,2023-03-16 15:01:16
6,7,1500.0,2023-03-16 15:01:19


Now let's run the incremental logic

In [28]:
%%sql

INSERT INTO orders2
SELECT *
FROM orders
WHERE orderTimestamp > (SELECT * FROM latest_timestamp);

SELECT * FROM orders2;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36
5,6,1000.5,2023-03-16 15:01:16
6,7,1500.0,2023-03-16 15:01:19
7,1,350.0,2023-03-16 15:10:17
8,5,1000.0,2023-03-16 15:10:20


We now have duplicates for orderId 1 and orderId 5, the rows that were updated in the `orders` table. The updated versions were appended as new rows, while the old versions stayed in `orders2`.

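To fix this, the incremental logic must replace rows that already exist in `orders2` instead of appending them again. One way is to delete the changed orderIds from `orders2` before inserting their current versions, and then advance `latest_timestamp`.

In [None]:
%%sql

-- Remove the stale versions of every row that changed since the last run
DELETE FROM orders2
WHERE orderId IN (
    SELECT orderId
    FROM orders
    WHERE orderTimestamp > (SELECT * FROM latest_timestamp)
);

-- Insert the current versions of the new and updated rows
INSERT INTO orders2
SELECT *
FROM orders
WHERE orderTimestamp > (SELECT * FROM latest_timestamp);

-- Advance the cutoff to the newest timestamp in the source table
TRUNCATE TABLE latest_timestamp;

INSERT INTO latest_timestamp
SELECT MAX(orderTimestamp) orderTimestamp
FROM orders;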
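
One way to avoid such duplicates is to delete the stale versions of changed rows before inserting the current ones. The sketch below tries this idea against an in-memory SQLite database, which is an assumption made so that the example runs without a MySQL server. The function name `upsert_load` is hypothetical, and the rows mirror the state of `orders` and `orders2` just after the two updates.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderId INTEGER, orderValue REAL, orderTimestamp TEXT);
    CREATE TABLE orders2 (orderId INTEGER, orderValue REAL, orderTimestamp TEXT);
    CREATE TABLE latest_timestamp (orderTimestamp TEXT);
    INSERT INTO latest_timestamp VALUES ('2023-03-16 15:01:19');
""")

# Destination still holds the old versions of orderId 1 and 5.
conn.executemany("INSERT INTO orders2 VALUES (?, ?, ?)", [
    (1, 900.5, "2023-03-16 12:27:26"),
    (2, 500.0, "2023-03-16 12:27:28"),
    (3, 200.5, "2023-03-16 12:27:31"),
    (4, 450.5, "2023-03-16 12:27:33"),
    (5, 700.0, "2023-03-16 12:27:36"),
    (6, 1000.5, "2023-03-16 15:01:16"),
    (7, 1500.0, "2023-03-16 15:01:19"),
])
# Source already reflects the two updates.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 350.0, "2023-03-16 15:10:17"),
    (2, 500.0, "2023-03-16 12:27:28"),
    (3, 200.5, "2023-03-16 12:27:31"),
    (4, 450.5, "2023-03-16 12:27:33"),
    (5, 1000.0, "2023-03-16 15:10:20"),
    (6, 1000.5, "2023-03-16 15:01:16"),
    (7, 1500.0, "2023-03-16 15:01:19"),
])

CHANGED = "orderTimestamp > (SELECT orderTimestamp FROM latest_timestamp)"


def upsert_load(conn):
    """Replace changed rows in orders2, then advance the cutoff."""
    # Delete stale versions first so each orderId appears only once.
    conn.execute(
        f"DELETE FROM orders2 WHERE orderId IN (SELECT orderId FROM orders WHERE {CHANGED})"
    )
    conn.execute(f"INSERT INTO orders2 SELECT * FROM orders WHERE {CHANGED}")
    conn.execute("DELETE FROM latest_timestamp")
    conn.execute("INSERT INTO latest_timestamp SELECT MAX(orderTimestamp) FROM orders")
    conn.commit()


upsert_load(conn)
rows = conn.execute("SELECT orderId, orderValue FROM orders2 ORDER BY orderId").fetchall()
print(rows)
```

Because the delete is keyed on orderId, it also clears duplicates left behind by an earlier append-only run for the same changed rows.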

## Incremental - DELETE

Let’s see what happens when an existing record in the `orders` table is deleted.

Let’s delete a record in the `orders` table.

In [25]:
%%sql

DELETE FROM orders WHERE orderId = 3;

SELECT * FROM orders;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,350.0,2023-03-16 15:10:17
1,2,500.0,2023-03-16 12:27:28
2,4,450.5,2023-03-16 12:27:33
3,5,1000.0,2023-03-16 15:10:20
4,6,1000.5,2023-03-16 15:01:16
5,7,1500.0,2023-03-16 15:01:19


In [29]:
%%sql

INSERT INTO orders2
SELECT *
FROM orders
WHERE orderTimestamp > (SELECT * FROM latest_timestamp);

SELECT * FROM orders2;

Unnamed: 0,orderId,orderValue,orderTimestamp
0,1,900.5,2023-03-16 12:27:26
1,2,500.0,2023-03-16 12:27:28
2,3,200.5,2023-03-16 12:27:31
3,4,450.5,2023-03-16 12:27:33
4,5,700.0,2023-03-16 12:27:36
5,6,1000.5,2023-03-16 15:01:16
6,7,1500.0,2023-03-16 15:01:19
7,1,350.0,2023-03-16 15:10:17
8,5,1000.0,2023-03-16 15:10:20
9,1,350.0,2023-03-16 15:10:17
