<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/notes.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Set Up CDC Replication from MongoDB® to SingleStore using SQL commands</h1>
    </div>
</div>

<table style="border: 0; border-spacing: 0; width: 100%; background-color: #03010D"><tr>
    <td style="padding: 0; margin: 0; background-color: #03010D; width: 33%; text-align: center"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-vertical.png" style="height: 200px;"/></td>
    <td style="padding: 0; margin: 0; width: 66%; background-color: #03010D; text-align: right"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/atlas-and-kai/images/cdc-in-header.png" style="height: 250px"/></td>
</tr></table>

## When should you use SingleStore's native replication from MongoDB®?

SingleStore's native data replication lets you perform a one-time snapshot or continuous change data capture (CDC) from MongoDB® to SingleStoreDB. This provides a quick and easy way to replicate data and power analytics on MongoDB® data.

## What you will learn in this notebook:

How to replicate MongoDB® collections to SingleStore:
1. Directly, without transformations
2. By flattening required fields into columns of a table
3. By normalizing a collection into multiple tables


## 1. Replicate directly without transformations

To replicate only the collections you need, list them with `"collection.include.list": "<Collection list>"` when you create the link. The parameter takes a comma-separated list of regular expressions that match collection names in `databaseName.collectionName` format.
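To illustrate how such a list selects collections, here is a plain-Python sketch (not SingleStore's implementation). It assumes, as the Debezium MongoDB connector does for its property of the same name, that each expression must match the whole `databaseName.collectionName` string; SingleStore's exact matching rules may differ. The helper `included_collections` and the `sample_analytics.accounts` entry are hypothetical.

```python
import re


def included_collections(include_list: str, collections: list[str]) -> list[str]:
    """Return the collections whose full name matches any pattern in include_list.

    Assumption: patterns are comma-separated regular expressions that must
    match the entire 'databaseName.collectionName' string (anchored match).
    """
    patterns = [re.compile(p.strip()) for p in include_list.split(",") if p.strip()]
    return [c for c in collections if any(p.fullmatch(c) for p in patterns)]


available = [
    "sample_analytics.customers",
    "sample_analytics.accounts",
    "sample_airbnb.listingsAndReviews",
]
print(included_collections("sample_analytics.customers", available))
# ['sample_analytics.customers']
print(included_collections("sample_airbnb.*", available))
# ['sample_airbnb.listingsAndReviews']
```

Because `.` is itself a regex wildcard, `sample_airbnb.*` (used later in this notebook) matches every collection in the `sample_airbnb` database.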

In [238]:
%%sql
DROP DATABASE IF EXISTS sample_analytics;
CREATE DATABASE sample_analytics;

In [239]:
%%sql
USE sample_analytics;
CREATE LINK cdclink as MONGODB
CONFIG '{"mongodb.hosts":"ac-t7n47to-shard-00-00.tfutgo0.mongodb.net:27017,ac-t7n47to-shard-00-01.tfutgo0.mongodb.net:27017,ac-t7n47to-shard-00-02.tfutgo0.mongodb.net:27017",
"collection.include.list": "sample_analytics.customers",
"mongodb.ssl.enabled":"true",
"mongodb.authsource":"admin",
"mongodb.members.auto.discover": "true"    
    }'
CREDENTIALS '{
    "mongodb.user":"mongo_sample_reader",
    "mongodb.password":"SingleStoreRocks27017"
    }'

Check that the link was created:

In [242]:
%%sql
SHOW LINKS on sample_analytics;

Link,Type,Description
cdclink,MONGODB,


The following step automatically creates the required tables and pipelines in SingleStoreDB for every collection configured for replication.

In [243]:
%%sql
USE sample_analytics;
CREATE TABLES AS INFER PIPELINE AS LOAD DATA LINK cdclink '*' FORMAT AVRO;

Start the pipelines to begin replicating the data:

In [250]:
%%sql
USE sample_analytics;
START ALL PIPELINES;

In [252]:
%%sql
USE sample_analytics;
show tables

Tables_in_sample_analytics
customers


The `customers` collection from MongoDB® is replicated into SingleStore in the default format: an `_id` column and a `_more` column, both BSON, which is compatible with the Kai API.
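As the output above shows, `_id` holds the document's identifier and `_more` holds every other field. A minimal plain-Python sketch of that split, with dicts standing in for BSON values; the helper name `to_default_layout` is hypothetical and this is not how SingleStore implements it.

```python
def to_default_layout(document: dict) -> dict:
    """Split a MongoDB document into the default `_id` / `_more` columns.

    Sketch only: plain Python dicts stand in for BSON values.
    """
    more = {key: value for key, value in document.items() if key != "_id"}
    return {"_id": document["_id"], "_more": more}


# Abbreviated version of the first customer shown in the output above.
doc = {
    "_id": {"$oid": "5ca4bbcea2dd94ee58162c3f"},
    "username": "christophersnyder",
    "name": "Destiny Miller",
    "email": "aliciagilbert@yahoo.com",
}
row = to_default_layout(doc)
print(row["_id"])            # {'$oid': '5ca4bbcea2dd94ee58162c3f'}
print(sorted(row["_more"]))  # ['email', 'name', 'username']
```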

In [254]:
%%sql
USE sample_analytics;
select (_id :> JSON),(_more :> JSON) from customers limit 2;

(_id :> JSON),(_more :> JSON)
{'$oid': '5ca4bbcea2dd94ee58162c3f'},"{'accounts': [341830, 412203, 240787, 493235, 485840, 167440], 'address': '9808 Miller Mountain Suite 716\nDanielberg, MD 93803', 'birthdate': {'$date': '1994-07-26T20:23:37.000Z'}, 'email': 'aliciagilbert@yahoo.com', 'name': 'Destiny Miller', 'tier_and_details': {'c389367845644ecb9587a9a28aa60f67': {'active': True, 'benefits': ['sports tickets', 'concierge services'], 'id': 'c389367845644ecb9587a9a28aa60f67', 'tier': 'Bronze'}, 'cf563e54d52c475b9c0e786e281ba57d': {'active': True, 'benefits': ['airline lounge access'], 'id': 'cf563e54d52c475b9c0e786e281ba57d', 'tier': 'Bronze'}}, 'username': 'christophersnyder'}"
{'$oid': '5ca4bbcea2dd94ee58162bd7'},"{'accounts': [816225, 501213, 960469, 950785, 344107], 'address': '73451 Thomas Flat Apt. 779\nWest Davidport, WY 28035', 'birthdate': {'$date': '1970-08-23T02:37:09.000Z'}, 'email': 'jimeneztracey@gmail.com', 'name': 'Jeffrey Reeves', 'tier_and_details': {'b1b5212381ec4d46b86507456c3085b8': {'active': True, 'benefits': ['airline lounge access'], 'id': 'b1b5212381ec4d46b86507456c3085b8', 'tier': 'Platinum'}, 'b3d61f13292a492885f233b8c8ef3415': {'active': True, 'benefits': ['concert tickets', 'financial planning assistance'], 'id': 'b3d61f13292a492885f233b8c8ef3415', 'tier': 'Bronze'}, 'e56b345495034d0684f2b940783fbccc': {'active': True, 'benefits': ['airline lounge access', 'travel insurance'], 'id': 'e56b345495034d0684f2b940783fbccc', 'tier': 'Silver'}}, 'username': 'robin78'}"



## 2. Flatten required fields from a document into columns
CDC replication also lets you define your own table structure in SingleStore as you bring in data from MongoDB collections. In the following examples, data from MongoDB collections is transformed as it is loaded into SingleStoreDB.

Fields such as `username`, `name`, and `email` are flattened into columns of the table, and the rest of the document is stored in the `_more` column.
The following commands create the table, stored procedure, and pipeline required for the data replication.
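The per-document transformation done by the stored procedure in this section (one `BSON_EXTRACT_STRING` per flattened field, then `BSON_EXCLUDE_MASK` for the remainder) can be sketched in plain Python. This is an illustration under assumptions, not SingleStore code: dicts stand in for BSON, the helper `flatten` is hypothetical, and a missing field is assumed to become NULL (`None` here).

```python
FLATTENED_FIELDS = ("username", "name", "email")


def flatten(id_value, more: dict) -> dict:
    """Build one customers_flattened row from a change's _id and _more.

    Each flattened field is pulled out (like BSON_EXTRACT_STRING) and the
    remaining fields stay in _more (like BSON_EXCLUDE_MASK).
    Assumption: a missing field yields NULL, represented here as None.
    """
    row = {"_id": id_value}
    for field in FLATTENED_FIELDS:
        row[field] = more.get(field)
    excluded = {"_id", *FLATTENED_FIELDS}
    row["_more"] = {k: v for k, v in more.items() if k not in excluded}
    return row


row = flatten(
    {"$oid": "5ca4bbcea2dd94ee58162bd7"},
    {"username": "robin78", "name": "Jeffrey Reeves",
     "email": "jimeneztracey@gmail.com", "accounts": [816225, 501213]},
)
print(row["username"], row["_more"])
# robin78 {'accounts': [816225, 501213]}
```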

In [255]:
%%sql
CREATE TABLE `sample_analytics`.`customers_flattened` (
  `_id` bson NOT NULL,
  `username` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `name` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `email` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `_more` bson NOT NULL COMMENT 'KAI_MORE' ,
  `$_id` as BSON_NORMALIZE_NO_ARRAY(`_id`) PERSISTED longblob COMMENT 'KAI_AUTO' ,
  SHARD KEY `__SHARDKEY` (`$_id`),
  UNIQUE KEY `__PRIMARY` (`$_id`) USING HASH,
  SORT KEY `__UNORDERED` ()
) 

In [256]:
%%sql
CREATE OR REPLACE PROCEDURE `sample_analytics`.`customers_apply_changes`(changes query(`__operation` int(11) NOT NULL, `_id` longblob NOT NULL, `_more` longblob NOT NULL))
RETURNS void AS
DECLARE rowsDeleted INT;
BEGIN
-- Inserts and updates (__operation != 1): upsert the flattened row
REPLACE INTO `sample_analytics`.`customers_flattened`
SELECT `_id`:>BSON AS `_id`, BSON_EXTRACT_STRING(`_more`,'username') AS `username`,
BSON_EXTRACT_STRING(`_more`,'name') AS `name`, BSON_EXTRACT_STRING(`_more`,'email') AS `email`,
BSON_EXCLUDE_MASK(`_more`,'{"_id": 1,"username": 1,"name": 1,"email": 1}') AS `_more`
FROM changes WHERE __operation != 1;
-- Deletes (__operation = 1): remove rows whose normalized _id matches
SELECT count(*) INTO rowsDeleted FROM changes WHERE changes.__operation = 1;
IF rowsDeleted > 0 THEN
DELETE dest FROM `sample_analytics`.`customers_flattened` AS dest INNER JOIN changes ON dest.`$_id` = BSON_NORMALIZE_NO_ARRAY(changes.`_id`)
WHERE changes.__operation = 1;
END IF;
END;
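The procedure above treats `__operation = 1` as a delete and every other value as an insert or update. The in-memory sketch below mirrors that control flow with a dict keyed by `_id` (string ids stand in for the normalized `$_id` key). Following the procedure, all upserts run before all deletes, so if one batch both writes and deletes the same `_id`, the row ends up deleted. The helper names, the string ids, and the use of `0` as a non-delete code are hypothetical; the actual insert and update codes are not shown in this notebook.

```python
DELETE_OP = 1  # the procedure treats __operation = 1 as a delete


def apply_changes(table: dict, changes: list, transform) -> None:
    """In-memory sketch of customers_apply_changes.

    `table` maps _id -> row. Every non-delete change is upserted (like
    REPLACE INTO), then every delete removes its _id (like DELETE ... JOIN).
    """
    for change in changes:
        if change["__operation"] != DELETE_OP:
            table[change["_id"]] = transform(change["_id"], change["_more"])
    for change in changes:
        if change["__operation"] == DELETE_OP:
            table.pop(change["_id"], None)


def keep_username(id_value, more):
    # Hypothetical transform: keep only the flattened username.
    return {"_id": id_value, "username": more.get("username")}


table = {}
apply_changes(table, [
    {"__operation": 0, "_id": "a", "_more": {"username": "robin78"}},
    {"__operation": 0, "_id": "b", "_more": {"username": "dpitts"}},
], keep_username)
apply_changes(table, [{"__operation": 1, "_id": "a", "_more": {}}], keep_username)
print(table)  # {'b': {'_id': 'b', 'username': 'dpitts'}}
```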

In [257]:
%%sql

CREATE AGGREGATOR PIPELINE `sample_analytics`.`customers_apply_changes`
AS LOAD DATA LINK cdclink 'sample_analytics.customers'
BATCH_INTERVAL 2500
MAX_PARTITIONS_PER_BATCH 1
DISABLE OFFSETS METADATA GC
REPLACE
KEY(`_id`)
INTO PROCEDURE `sample_analytics`.`customers_apply_changes`
FORMAT AVRO
(
    __operation <- `__operation`,
    _id <- `payload`::`_id`,
    _more <- `payload`::`_more`
)
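The `FORMAT AVRO` clause above maps fields of each incoming record to the procedure's `changes` columns: `__operation` comes from the top level, while `_id` and `_more` come from inside `payload`. Below is a plain-Python sketch of that mapping, assuming each Avro record decodes to a nested dictionary. The record contents and helper names are hypothetical, since the notebook does not show the actual Avro schema.

```python
# Column mapping from the pipeline's FORMAT AVRO clause:
# procedure argument -> path inside the decoded Avro record.
MAPPING = {
    "__operation": ("__operation",),
    "_id": ("payload", "_id"),
    "_more": ("payload", "_more"),
}


def extract(record: dict, path: tuple):
    """Follow a path such as `payload`::`_id` through nested dicts."""
    value = record
    for key in path:
        value = value[key]
    return value


def to_changes(records: list) -> list:
    """Turn decoded records into the rows the procedure receives as `changes`."""
    return [{column: extract(r, path) for column, path in MAPPING.items()} for r in records]


# Hypothetical decoded record.
record = {"__operation": 0, "payload": {"_id": "id-1", "_more": {"username": "robin78"}}}
print(to_changes([record]))
# [{'__operation': 0, '_id': 'id-1', '_more': {'username': 'robin78'}}]
```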

In [264]:
%%sql
USE sample_analytics;
START ALL PIPELINES;

In [265]:
%%sql
USE sample_analytics;
show tables;

Tables_in_sample_analytics
customers
customers_flattened


In [267]:
%%sql
Select _id :> JSON,username, name, email, _more :> JSON from sample_analytics.customers_flattened limit 10;

_id :> JSON,username,name,email,_more :> JSON
{'$oid': '5ca4bbcea2dd94ee58162c3f'},christophersnyder,Destiny Miller,aliciagilbert@yahoo.com,"{'accounts': [341830, 412203, 240787, 493235, 485840, 167440], 'address': '9808 Miller Mountain Suite 716\nDanielberg, MD 93803', 'birthdate': {'$date': '1994-07-26T20:23:37.000Z'}, 'tier_and_details': {'c389367845644ecb9587a9a28aa60f67': {'active': True, 'benefits': ['sports tickets', 'concierge services'], 'id': 'c389367845644ecb9587a9a28aa60f67', 'tier': 'Bronze'}, 'cf563e54d52c475b9c0e786e281ba57d': {'active': True, 'benefits': ['airline lounge access'], 'id': 'cf563e54d52c475b9c0e786e281ba57d', 'tier': 'Bronze'}}}"
{'$oid': '5ca4bbcea2dd94ee58162bd7'},robin78,Jeffrey Reeves,jimeneztracey@gmail.com,"{'accounts': [816225, 501213, 960469, 950785, 344107], 'address': '73451 Thomas Flat Apt. 779\nWest Davidport, WY 28035', 'birthdate': {'$date': '1970-08-23T02:37:09.000Z'}, 'tier_and_details': {'b1b5212381ec4d46b86507456c3085b8': {'active': True, 'benefits': ['airline lounge access'], 'id': 'b1b5212381ec4d46b86507456c3085b8', 'tier': 'Platinum'}, 'b3d61f13292a492885f233b8c8ef3415': {'active': True, 'benefits': ['concert tickets', 'financial planning assistance'], 'id': 'b3d61f13292a492885f233b8c8ef3415', 'tier': 'Bronze'}, 'e56b345495034d0684f2b940783fbccc': {'active': True, 'benefits': ['airline lounge access', 'travel insurance'], 'id': 'e56b345495034d0684f2b940783fbccc', 'tier': 'Silver'}}}"
{'$oid': '5ca4bbcea2dd94ee58162b94'},karenfarrell,Charles Flores,rebecca51@hotmail.com,"{'accounts': [558623, 262488], 'address': '63432 Morton Mills\nAlexischester, MA 22487', 'birthdate': {'$date': '1972-02-02T09:42:02.000Z'}, 'tier_and_details': {'153755224a544887b611dc046ae8a701': {'active': True, 'benefits': ['concert tickets', 'airline lounge access'], 'id': '153755224a544887b611dc046ae8a701', 'tier': 'Bronze'}, '7842213621354f1e95d98486987c4e67': {'active': True, 'benefits': ['dedicated account representative', '24 hour dedicated line'], 'id': '7842213621354f1e95d98486987c4e67', 'tier': 'Bronze'}, 'b244bc087bba4d4db16826ab3ddabe4d': {'active': True, 'benefits': ['dedicated account representative'], 'id': 'b244bc087bba4d4db16826ab3ddabe4d', 'tier': 'Bronze'}}}"
{'$oid': '5ca4bbcea2dd94ee58162c15'},kathyjones,Heather Wilkins,paulfrazier@yahoo.com,"{'accounts': [134905, 734321, 731178, 981821, 229899], 'address': '789 Angela Mission Apt. 351\nPort Joseph, MA 25115', 'birthdate': {'$date': '1989-03-02T08:42:32.000Z'}, 'tier_and_details': {'6d691fffdf814363aaa0a746645b9681': {'active': True, 'benefits': ['financial planning assistance'], 'id': '6d691fffdf814363aaa0a746645b9681', 'tier': 'Gold'}}}"
{'$oid': '5ca4bbcea2dd94ee58162b98'},dpitts,Nicholas Brown,steven83@hotmail.com,"{'accounts': [532811, 701602], 'address': '547 Nunez Crossing\nPort Williamchester, CT 81646', 'birthdate': {'$date': {'$numberLong': '-82717769000'}}, 'tier_and_details': {}}"
{'$oid': '5ca4bbcea2dd94ee58162bfa'},brenda56,Austin Johnson,mcguirejennifer@yahoo.com,"{'accounts': [248380, 244782], 'address': '165 Brittany Green\nNorth Eric, MN 84627', 'birthdate': {'$date': '1971-06-12T23:52:56.000Z'}, 'tier_and_details': {'0f4bd4b6eb8e46a7a8db87a39f7bca8a': {'active': True, 'benefits': ['concert tickets', 'financial planning assistance'], 'id': '0f4bd4b6eb8e46a7a8db87a39f7bca8a', 'tier': 'Silver'}, 'bc2e9881858a4b21bf201d64b0142072': {'active': True, 'benefits': ['concert tickets', 'airline lounge access'], 'id': 'bc2e9881858a4b21bf201d64b0142072', 'tier': 'Silver'}, 'da555c3a4a15430e8b5377e949c84799': {'active': True, 'benefits': ['24 hour dedicated line', 'sports tickets'], 'id': 'da555c3a4a15430e8b5377e949c84799', 'tier': 'Gold'}}}"
{'$oid': '5ca4bbcea2dd94ee58162b5f'},madeline96,Phillip Molina,steven93@gmail.com,"{'accounts': [924182, 700899, 226865, 604215, 300405, 980056], 'address': '0101 Brown Grove Apt. 002\nCastilloville, MN 23427', 'birthdate': {'$date': '1976-04-03T23:19:19.000Z'}, 'tier_and_details': {}}"
{'$oid': '5ca4bbcea2dd94ee58162bca'},roconnor,Robert Obrien,sfreeman@gmail.com,"{'accounts': [497929], 'address': '9800 Camacho Lane\nSouth Benjaminburgh, WV 39265', 'birthdate': {'$date': '1989-05-13T18:18:25.000Z'}, 'tier_and_details': {}}"
{'$oid': '5ca4bbcea2dd94ee58162c51'},jacksoncolleen,Susan Davis,mmurray@hotmail.com,"{'accounts': [657218, 517824, 880595, 278669, 380304, 688134], 'address': '335 Lewis Land\nLake Johnburgh, RI 57620', 'birthdate': {'$date': '1994-04-25T14:59:48.000Z'}, 'tier_and_details': {}}"
{'$oid': '5ca4bbcea2dd94ee58162b00'},marccolon,Sarah Lowery,gomeztonya@hotmail.com,"{'accounts': [85228, 404845, 155327], 'address': '6599 Martin Roads Apt. 624\nKatherinestad, SD 60790', 'birthdate': {'$date': '1983-04-27T14:48:21.000Z'}, 'tier_and_details': {'35f3364970214bb786b79b592b4d39b8': {'active': True, 'benefits': ['dedicated account representative', 'travel insurance'], 'id': '35f3364970214bb786b79b592b4d39b8', 'tier': 'Bronze'}}}"


## 3. Normalize a collection into multiple tables
In the following example, the MongoDB collection `sample_airbnb.listingsAndReviews` is normalized into two tables in SingleStore: `listings` and `Reviews`.

In [88]:
%%sql
DROP DATABASE IF EXISTS sample_airbnb;

In [89]:
%%sql
CREATE DATABASE sample_airbnb;

In [90]:
%%sql
use sample_airbnb;
CREATE LINK source_listingsAndReviews as MONGODB
CONFIG '{"mongodb.hosts":"ac-t7n47to-shard-00-00.tfutgo0.mongodb.net:27017,ac-t7n47to-shard-00-01.tfutgo0.mongodb.net:27017,ac-t7n47to-shard-00-02.tfutgo0.mongodb.net:27017",
"collection.include.list": "sample_airbnb.*",
"mongodb.ssl.enabled":"true",
"mongodb.authsource":"admin",
"mongodb.members.auto.discover": "true"    
    }'
CREDENTIALS '{
    "mongodb.user":"mongo_sample_reader",
    "mongodb.password":"SingleStoreRocks27017"
    }'

In [91]:
%%sql
show LINKS on sample_airbnb;

Link,Type,Description
source_listingsAndReviews,MONGODB,


In [92]:
%%sql
CREATE TABLE `sample_airbnb`.`listings` (
  `_id` BSON NOT NULL,
  `name` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `access` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `accommodates` int(11) DEFAULT NULL,
  `_more` BSON NOT NULL,
  `$_id` as BSON_NORMALIZE_NO_ARRAY(`_id`) PERSISTED longblob,
  SHARD KEY `__SHARDKEY` (`$_id`),
  UNIQUE KEY `__PRIMARY` (`$_id`) USING HASH,
  SORT KEY `__UNORDERED` ()
) 

In [93]:
%%sql
CREATE TABLE `sample_airbnb`.`Reviews` (
  `listingid` BSON NOT NULL,
  `review_scores_accuracy` int(11) DEFAULT NULL,
  `review_scores_cleanliness` int(11) DEFAULT NULL,
  `review_scores_rating` int(11) DEFAULT NULL,
  `$listingid` as BSON_NORMALIZE_NO_ARRAY(`listingid`) PERSISTED longblob,
  SHARD KEY `__SHARDKEY` (`$listingid`),
  UNIQUE KEY `__PRIMARY` (`$listingid`) USING HASH,
  SORT KEY `__UNORDERED` ()
) 

In [95]:
%%sql
CREATE OR REPLACE PROCEDURE `sample_airbnb`.`listingsAndReviews_apply_changes`(changes query(`__operation` int(11) NOT NULL, `_id` longblob NOT NULL, `_more` longblob NOT NULL))
RETURNS void AS
DECLARE rowsDeleted INT;
BEGIN
-- Inserts and updates: upsert the listing fields; review_scores is excluded from _more
REPLACE INTO `sample_airbnb`.`listings` SELECT `_id`:>BSON AS `_id`, BSON_EXTRACT_STRING(`_more`,'name') AS `name`, BSON_EXTRACT_STRING(`_more`,'access') AS `access`,
BSON_EXTRACT_BIGINT(`_more`,'accommodates') AS `accommodates`, BSON_EXCLUDE_MASK(`_more`,'{"_id": 1,"name": 1,"review_scores": 1,"access" : 1, "accommodates" : 1}') AS `_more`
FROM changes WHERE __operation != 1;

-- Inserts and updates: upsert the nested review_scores fields, keyed by the listing's _id
REPLACE INTO `sample_airbnb`.`Reviews` SELECT `_id`:>BSON AS `listingid`, BSON_EXTRACT_BIGINT(`_more`,'review_scores','review_scores_accuracy') AS `review_scores_accuracy`,
BSON_EXTRACT_BIGINT(`_more`,'review_scores','review_scores_cleanliness') AS `review_scores_cleanliness`, BSON_EXTRACT_BIGINT(`_more`,'review_scores','review_scores_rating') AS `review_scores_rating`
FROM changes WHERE __operation != 1;

-- Deletes (__operation = 1): remove the listing and its review row
SELECT count(*) INTO rowsDeleted FROM changes WHERE changes.__operation = 1;
IF rowsDeleted > 0 THEN
DELETE dest FROM `sample_airbnb`.`listings` AS dest INNER JOIN changes ON dest.`$_id` = BSON_NORMALIZE_NO_ARRAY(changes.`_id`) WHERE changes.__operation = 1;
DELETE dest FROM `sample_airbnb`.`Reviews` AS dest INNER JOIN changes ON dest.`$listingid` = BSON_NORMALIZE_NO_ARRAY(changes.`_id`) WHERE changes.__operation = 1;
END IF;
END;
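The procedure above writes each non-delete change to two tables that share the document's `_id` (as `_id` in `listings` and `listingid` in `Reviews`). Because the key is shared, a delete can remove the matching row from both tables. Below is a plain-Python sketch of the split, with dicts standing in for BSON. The helper `split_listing` and the extra `beds` field are illustrative, and a missing field is assumed to become NULL (`None`).

```python
LISTING_FIELDS = ("name", "access", "accommodates")
REVIEW_FIELDS = ("review_scores_accuracy", "review_scores_cleanliness", "review_scores_rating")


def split_listing(id_value, more: dict) -> tuple:
    """Split one listingsAndReviews change into a `listings` row and a `Reviews` row."""
    listing = {"_id": id_value}
    for field in LISTING_FIELDS:
        listing[field] = more.get(field)  # assumption: missing field -> NULL (None)
    # review_scores moves to Reviews, so it is excluded from listings._more
    excluded = {"_id", "review_scores", *LISTING_FIELDS}
    listing["_more"] = {k: v for k, v in more.items() if k not in excluded}

    scores = more.get("review_scores", {})
    review = {"listingid": id_value}  # same key as the listing row
    for field in REVIEW_FIELDS:
        review[field] = scores.get(field)
    return listing, review


# Illustrative document modeled on listing 10527243 from the output below.
listing, review = split_listing("10527243", {
    "name": "Tropical Jungle Oasis",
    "accommodates": 4,
    "review_scores": {"review_scores_accuracy": 10,
                      "review_scores_cleanliness": 10,
                      "review_scores_rating": 96},
    "beds": 2,  # illustrative extra field that stays in _more
})
print(listing)
# {'_id': '10527243', 'name': 'Tropical Jungle Oasis', 'access': None, 'accommodates': 4, '_more': {'beds': 2}}
print(review)
# {'listingid': '10527243', 'review_scores_accuracy': 10, 'review_scores_cleanliness': 10, 'review_scores_rating': 96}
```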

In [96]:
%%sql
CREATE AGGREGATOR PIPELINE `sample_airbnb`.`listingsAndReviews`
AS LOAD DATA LINK source_listingsAndReviews 'sample_airbnb.listingsAndReviews'
BATCH_INTERVAL 2500
MAX_PARTITIONS_PER_BATCH 1
DISABLE OFFSETS METADATA GC
REPLACE
KEY(`_id`)
INTO PROCEDURE `sample_airbnb`.`listingsAndReviews_apply_changes`
FORMAT AVRO
(
    __operation <- `__operation`,
    _id <- `payload`::`_id`,
    _more <- `payload`::`_more`
)

In [97]:
%%sql
Use sample_airbnb;
START ALL PIPELINES;

In [98]:
%%sql
using sample_airbnb show tables;

Tables_in_sample_airbnb
Reviews
listings


In [100]:
%%sql
Select _id:>JSON ,name, access, accommodates from sample_airbnb.listings limit 10;

_id:>JSON,name,access,accommodates
10084023,City center private room with bed,"Living Room , Kitchen and Toilet, All cooking equipment can be used too",1
10527243,Tropical Jungle Oasis,,4
10992286,Holoholo Inn: Rain Forest (Priv-2),Guest access to: Common Room with TV; Full Kitchen; Free WiFi.,2
1176693,BEST REVIEWS*BEST MALLS*SAFE STAY*DIMSUM*CWB*MTR,"The entire apartment with all amenities is accessible for my guests. All closets are empty but full of hangers, you will find toiletries and hair dryers in the bathrooms. the kitchen is fully equipped with pots, pans, china, glasses and cutlery. Basics like tea, coffee, water, milk, and capsules for the Nespresso Machine are provided to make sure your arrival will be relaxed. Detergents for washing machine and dish washer are provided for a stay of 3 nights. My stand-by guest manager will happily provide more supplies on request for small additional costs.",8
12552675,The Porto Concierge - White Martin,Guests have access to all the amenities of the apartment.,3
11007058,Alcam Colón 42 Apartment,,4
12183522,so homey and spacious and comfy room!,"Easy access to public transport, main road is just across to where building is located. Car park is also available, on hourly to day rate.",3
11567997,The Local Nook // Cozy Kona Hale,"The main entrance is to the right through the white gate in front of your parking stall. We ask that guest please enter this way to give our parents and yourselves some extra privacy, though there is another entry door to the left in front of the main house entrance.",2
13488308,Spacious & Cosy 1BR apt in Gramercy with balcony!,"Guests are provided complete privacy for the duration of their stay, however, in order to ensure a superior guest experience, I am available to address your needs 24 hours a day 7 days a week. If you have a question or concern, I encourage you to contact me by message. I am dedicated to making your stay the best possible, and I am happy to address your needs, whatever they may be.",2
10359729,,,4


In [101]:
%%sql
Select listingid:>JSON, review_scores_accuracy,review_scores_cleanliness, review_scores_rating from sample_airbnb.Reviews limit 10;

listingid:>JSON,review_scores_accuracy,review_scores_cleanliness,review_scores_rating
10084023,10,8,92
10527243,10,10,96
11567997,10,10,96
10992286,10,10,98
12552675,9,9,90
12183522,10,10,99
1176693,9,9,83
13488308,10,9,97
11007058,6,8,80
17001994,10,9,96
