PadoGrid | Catalogs | Manual | FAQ | Releases | Templates | Pods | Kubernetes | Docker | Apps | Quick Start
This bundle integrates Geode with Debezium for ingesting initial data and CDC records from MySQL into a Geode cluster via a Kafka sink connector included in the padogrid distribution. It supports inserts, updates, and deletes.
install_bundle -download bundle-geode-1-docker-debezium_kafka
❗ If you are running this bundle on WSL, make sure your workspace is on a shared folder; otherwise, the Docker volume this bundle creates will not be visible outside of WSL.
This use case ingests data changes made in the MySQL database into a Geode cluster via two Kafka connectors: the Debezium MySQL source connector and the geode-addon Debezium sink connector.
- Docker
- Maven 3.x
- Geode 1.x
- jq
The demo follows the Debezium Tutorial steps shown in the link below. All the commands provided in the tutorial are wrapped in the scripts found in the bin_sh directory. We'll use these scripts to simplify the demo.
✏️ This bundle builds the demo environment based on the Geode versions in your workspace. Make sure your workspace has been configured with the desired versions before building the demo environment.
First, change your cluster context to a Geode cluster. This is required in order to configure the Geode Docker containers.
# Create a Geode cluster if it does not already exist.
create_cluster -product geode
# Switch context
switch_cluster mygeode
Now, build the demo by running the build_app command as shown below. This command compiles and packages the PdxSerializable data (domain) classes found in the src source directory. It also copies the Geode and geode-addon-core jar files to the Docker container's mounted volume in the padogrid directory so that the Geode Debezium Kafka connector can include them in its class path.
cd_docker debezium_kafka/bin_sh
./build_app
Upon successful build, the padogrid directory should have jar files similar to the following:
cd_docker debezium_kafka
tree padogrid
padogrid/
├── etc
│   ├── cache.xml
│   ├── client-cache.xml
│   └── client-gemfire.properties
├── lib
│   ├── ...
│   ├── geode-addon-core-1.0.0.jar
│   ├── ...
│   ├── padogrid-common-1.0.0.jar
│   ├── padogrid-tools-1.0.0.jar
│   └── ...
├── log
└── plugins
    ├── geode-addon-core-1.0.0-tests.jar
    └── geode-addon-debezium_kafka-1.0.0.jar
The create_docker command creates a Docker Compose environment for running PadoGrid-supported products such as Geode. See the usage by specifying the -? option.
create_docker -?
Let's create a Geode cluster to run on Docker containers.
# First, create 'my_network' which we will use for all containers
docker network create my_network
# Create a Geode cluster that joins the network we just created
create_docker -product geode -cluster geode -network my_network
cd_docker geode
The Kafka Connect container listens on Kafka streams for database updates and converts them to Geode objects before updating the Geode cluster. Take a look at the client-cache.xml file, which is loaded by the Kafka Connect container to connect to the Geode cluster. As you can see below, the locator host is set to geode-locator-1, which is the host name of the locator set by Docker Compose.
cd_docker debezium_kafka
cat padogrid/etc/client-cache.xml
Output:
<client-cache ...>
...
<pool name="serverPool">
<locator host="geode-locator-1" port="10334" />
</pool>
...
</client-cache>
Copy the cache.xml and demo data jar files from the debezium_kafka directory to the Geode Docker cluster directory as follows.
cd_docker geode
# Copy cache.xml
cp $PADOGRID_WORKSPACE/docker/debezium_kafka/padogrid/etc/cache.xml padogrid/etc/
# Copy geode-addon-debezium_kafka-1.0.0.jar
cp $PADOGRID_WORKSPACE/docker/debezium_kafka/padogrid/plugins/geode-addon-debezium_kafka-1.0.0.jar padogrid/plugins/
The cache.xml file defines the inventory regions that we will need for our demo, and the geode-addon-debezium_kafka-1.0.0.jar file is needed for query services.
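For reference, a nested region declaration in cache.xml that yields the /inventory/customers path might look roughly like the following. This is a minimal sketch; the actual region names, data policies, and attributes used by the bundle may differ.

```xml
<cache xmlns="http://geode.apache.org/schema/cache"
       version="1.0">
  <!-- Parent region "inventory" with a nested "customers" region,
       addressable as /inventory/customers -->
  <region name="inventory" refid="REPLICATE">
    <region name="customers" refid="PARTITION" />
  </region>
</cache>
```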
This demo runs numerous Docker containers. We'll first start the Geode cluster containers and then proceed with the Debezium containers. By default, the provided scripts run the containers in the foreground so that you can view the log events. You will need to launch a total of eight (8) terminals. A screen splitter such as Windows Terminal will make this easier.
You can also run some of the scripts in the background by including the '-d' option. These scripts are mentioned below.
Start the geode cluster containers from a new terminal.
cd_docker geode
docker compose up
Launch six (6) terminals and run each script from its own terminal as shown below. Each script must be run from its own terminal as it will block and display log messages.
cd_docker debezium_kafka/bin_sh
# 1. Start Zookeeper (include '-d' to run it in the background)
./start_zookeeper
# 2. Start Kafka (include '-d' to run it in the background)
./start_kafka
# 3. Start MySQL database (include '-d' to run it in the background)
./start_mysql
# 4. Start Kafka Connect (include '-d' to run it in the background)
./start_kafka_connect
# 5. Start topic watcher
./watch_topic_customers
# 6. Start MySQL CLI
./start_mysql_cli
There are two (2) Kafka connectors that we must register: the MySQL connector provided by Debezium and the Geode connector included in the PadoGrid distribution. From a new terminal, run the following.
cd_docker debezium_kafka/bin_sh
./register_mysql_connector
./register_debezium_geode_connector
# Check status
curl -Ss -H "Accept:application/json" localhost:8083/ | jq
# List registered connectors
curl -Ss -H "Accept:application/json" localhost:8083/connectors/ | jq
The last command should display the two connectors that we just registered.
[
"debezium-geode-sink",
"inventory-connector"
]
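Kafka Connect's REST API also reports per-connector status at /connectors/&lt;name&gt;/status. The sketch below parses a sample status payload with standard shell tools; the JSON shown is illustrative, not live output from this demo.

```shell
# Sample of the JSON shape returned by:
#   curl -Ss localhost:8083/connectors/debezium-geode-sink/status
# (illustrative payload, not captured from a running demo)
status_json='{"name":"debezium-geode-sink","connector":{"state":"RUNNING","worker_id":"172.17.0.7:8083"},"tasks":[{"id":0,"state":"RUNNING"}]}'

# Pull out the connector state; "RUNNING" means the sink is healthy
state=$(printf '%s' "$status_json" \
  | sed -n 's/.*"connector":{"state":"\([A-Z]*\)".*/\1/p')
echo "connector state: $state"
```

In a live environment you would feed the actual curl response in instead of the sample string; jq, already a prerequisite of this bundle, can do the same extraction with `jq -r .connector.state`.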
Using the MySQL CLI, you can change table contents. The changes you make will be captured in the form of change events by the Debezium source connector. The Geode sink connector in turn receives the change events, transforms them into data objects, and updates (or deletes) entries in the corresponding region, i.e., /inventory/customers.
use inventory;
SELECT * FROM customers;
UPDATE customers SET first_name='Anne Marie' WHERE id=1004;
-- Delete - First, delete the row in addresses that references the foreign key
DELETE FROM addresses WHERE customer_id=1004;
DELETE FROM customers WHERE id=1004;
INSERT INTO customers VALUES (default, "Sarah", "Thompson", "kitt@acme.com");
INSERT INTO customers VALUES (default, "Kenneth", "Anderson", "kander@acme.com");
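Each of the statements above surfaces on the Kafka topic as a Debezium change-event envelope carrying before/after row images and an op code (c = create, u = update, d = delete), which is what the sink connector keys its put/remove behavior off. The sketch below inspects abridged sample envelopes; the payloads are illustrative, not captured from the demo.

```shell
# Abridged Debezium envelopes for the UPDATE and DELETE statements above
# (illustrative values; real events also carry "source" and "ts_ms" fields)
update_event='{"before":{"id":1004,"first_name":"Anne"},"after":{"id":1004,"first_name":"Anne Marie"},"op":"u"}'
delete_event='{"before":{"id":1004,"first_name":"Anne Marie"},"after":null,"op":"d"}'

# Extract the op code the sink dispatches on: u -> put, d -> remove
ops=""
for event in "$update_event" "$delete_event"; do
  op=$(printf '%s' "$event" | sed -n 's/.*"op":"\([a-z]\)".*/\1/p')
  ops="$ops$op"
done
echo "ops: $ops"
```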
To view the region contents, run the read_cache command as follows:
cd_docker debezium_kafka/bin_sh
./read_cache /inventory/customers
Output:
Region Values [/inventory/customers]:
[email=annek@noanswer.org, firstName=Anne Marie, id=1004, lastName=Kretchmar]
[email=gbailey@foobar.com, firstName=George, id=1002, lastName=Bailey]
[email=sally.thomas@acme.com, firstName=Sally, id=1001, lastName=Thomas]
[email=ed@walker.com, firstName=Edward, id=1003, lastName=Walker]
You can also use gfsh to browse region contents.
docker container exec -it geode-locator-1 gfsh
Connect to the geode cluster and execute the query as shown below.
gfsh>connect --locator=localhost[10334]
gfsh>query --query="select * from /inventory/customers order by id"
Output:
Result : true
Limit : 100
Rows : 4
id | firstName | lastName | email
---- | ------------ | ----------- | -----------------------
1001 | "Sally" | "Thomas" | "sally.thomas@acme.com"
1002 | "George" | "Bailey" | "gbailey@foobar.com"
1003 | "Edward" | "Walker" | "ed@walker.com"
1004 | "Anne Marie" | "Kretchmar" | "annek@noanswer.org"
Pulse URL: http://localhost:7070/pulse
# Shutdown Debezium containers
cd_docker debezium_kafka/bin_sh
./cleanup
# Shutdown Geode containers
cd_docker geode
docker compose down
# Remove network
docker network rm my_network
# Prune all stopped containers
docker container prune