## Debezium Demo

### Step 1

Run the following on the `postgres` database:
```sql
select * from pg_stat_replication;
```

The result is empty, which means there is no active replication right now.

Next, run:
```sql
create table public.customers (
    id int,
    name text
);

alter table public.customers replica identity full;

insert into public.customers (id, name)
values
    (1, 'ilya'), (2, 'kate'), (3, 'ivan')
;
```

These rows are our initial data.

### Step 2

Configure Debezium.

In [1]:
!rm connector.json

rm: connector.json: No such file or directory


In [2]:
import json

config = {
  "name": "pg-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname" : "postgres",
    "database.server.name": "pg-dev",
    
    "table.include.list": "public.(.*)",
    "heartbeat.interval.ms": "5000",
    "slot.name": "dbname_debezium",
    "publication.name": "dbname_publication",
    
    "transforms": "AddPrefix",
    "transforms.AddPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.AddPrefix.regex": "pg-dev.public.(.*)",
    "transforms.AddPrefix.replacement": "data.cdc.dbname",
      
    "topic.creation.default.cleanup.policy": "delete",
    "topic.creation.default.partitions": "1",
    "topic.creation.default.replication.factor": "1",
    "topic.creation.default.retention.ms": "604800000",
    "topic.creation.enable": "true",
    "topic.prefix": "postgres"
  }
}

with open('connector.json', 'w') as f:
    json.dump(config, f)

In [3]:
!curl -X POST --location "http://localhost:8083/connectors" -H "Content-Type: application/json" -H "Accept: application/json" -d @connector.json

{"name":"pg-connector","config":{"connector.class":"io.debezium.connector.postgresql.PostgresConnector","plugin.name":"pgoutput","database.hostname":"postgres","database.port":"5432","database.user":"postgres","database.password":"postgres","database.dbname":"postgres","database.server.name":"pg-dev","table.include.list":"public.(.*)","heartbeat.interval.ms":"5000","slot.name":"dbname_debezium","publication.name":"dbname_publication","transforms":"AddPrefix","transforms.AddPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter","transforms.AddPrefix.regex":"pg-dev.public.(.*)","transforms.AddPrefix.replacement":"data.cdc.dbname","topic.creation.default.cleanup.policy":"delete","topic.creation.default.partitions":"1","topic.creation.default.replication.factor":"1","topic.creation.default.retention.ms":"604800000","topic.creation.enable":"true","topic.prefix":"postgres","name":"pg-connector"},"tasks":[],"type":"source"}

The connector has been created.<br>
To double-check, open the UI at http://localhost:8080.
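
The same check can be done without the UI. The sketch below is a minimal example that assumes the Kafka Connect REST API from the cell above is reachable at `http://localhost:8083` and uses its `GET /connectors/<name>/status` endpoint. The helper names `fetch_connector_status` and `is_running` are illustrative and not part of the original notebook.

```python
import json
from urllib.request import urlopen

CONNECT_URL = "http://localhost:8083"  # Kafka Connect REST address used above


def fetch_connector_status(name, base_url=CONNECT_URL):
    """Return the parsed JSON of GET /connectors/<name>/status."""
    with urlopen(f"{base_url}/connectors/{name}/status") as response:
        return json.load(response)


def is_running(status):
    """True if the connector and all of its listed tasks report RUNNING."""
    states = [status["connector"]["state"]]
    states += [task["state"] for task in status.get("tasks", [])]
    return all(state == "RUNNING" for state in states)


# Usage (needs the running Kafka Connect instance):
# print(is_running(fetch_connector_status("pg-connector")))
```

Right after creation the task list may still be empty (the response above shows `"tasks":[]`), so it can be worth repeating the check a few seconds later.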

### Step 3

Check that the topic was created in Kafka.

In [4]:
!rm clusters.json

In [5]:
# We need the cluster_id

!curl http://localhost:8082/v3/clusters -o clusters.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1001    0  1001    0     0   1284      0 --:--:-- --:--:-- --:--:--  1296


In [6]:
with open('clusters.json', 'r') as f:
    clusters = json.load(f)
clusters

{'kind': 'KafkaClusterList',
 'metadata': {'self': 'http://rest-proxy:8082/v3/clusters', 'next': None},
 'data': [{'kind': 'KafkaCluster',
   'metadata': {'self': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g',
    'resource_name': 'crn:///kafka=TCJFECM_Stio2wfa268A5g'},
   'cluster_id': 'TCJFECM_Stio2wfa268A5g',
   'controller': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/brokers/1'},
   'acls': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/acls'},
   'brokers': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/brokers'},
   'broker_configs': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/broker-configs'},
   'consumer_groups': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/consumer-groups'},
   'topics': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics'},
   'partition_reassignments': {'related': 'http://rest-proxy:8082/v3/

Note the cluster_id: `TCJFECM_Stio2wfa268A5g`.
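
Instead of copying the value by hand, the id can be read from the parsed response. This is a small sketch that assumes the response has the `data[0].cluster_id` shape shown above. `first_cluster_id` is a hypothetical helper, and `example` is a trimmed copy of the real response.

```python
def first_cluster_id(clusters):
    """Return the cluster_id of the first cluster in a /v3/clusters response."""
    data = clusters.get("data", [])
    if not data:
        raise ValueError("the response contains no clusters")
    return data[0]["cluster_id"]


# Trimmed copy of the response shown above
example = {
    "kind": "KafkaClusterList",
    "data": [{"kind": "KafkaCluster", "cluster_id": "TCJFECM_Stio2wfa268A5g"}],
}

cluster_id = first_cluster_id(example)
topics_url = f"http://localhost:8082/v3/clusters/{cluster_id}/topics"
print(topics_url)
```

In the notebook itself the call would be `first_cluster_id(clusters)`, using the `clusters` dict loaded from `clusters.json`.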

In [7]:
!rm topics.json

In [8]:
!curl http://localhost:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics -o topics.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4682    0  4682    0     0  25658      0 --:--:-- --:--:-- --:--:-- 26303


In [9]:
with open('topics.json', 'r') as f:
    topics = json.load(f)
topics

{'kind': 'KafkaTopicList',
 'metadata': {'self': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics',
  'next': None},
 'data': [{'kind': 'KafkaTopic',
   'metadata': {'self': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics/__debezium-heartbeat.postgres',
    'resource_name': 'crn:///kafka=TCJFECM_Stio2wfa268A5g/topic=__debezium-heartbeat.postgres'},
   'cluster_id': 'TCJFECM_Stio2wfa268A5g',
   'topic_name': '__debezium-heartbeat.postgres',
   'is_internal': False,
   'replication_factor': 1,
   'partitions_count': 1,
   'partitions': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics/__debezium-heartbeat.postgres/partitions'},
   'configs': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics/__debezium-heartbeat.postgres/configs'},
   'partition_reassignments': {'related': 'http://rest-proxy:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics/__debezium-heartbeat.postgres/partitions/-/reassignment'},
   

In [10]:
for i in topics['data']:
    print(i['topic_name'])

__debezium-heartbeat.postgres
_schemas
connect_configs
connect_offsets
connect_statuses
postgres.public.customers
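
Note that the topic is named `postgres.public.customers` and not `data.cdc.dbname`, so the `AddPrefix` transform does not seem to have renamed anything. A likely explanation is that topic names start with the `topic.prefix` value (`postgres`), while the transform regex expects `pg-dev`. The sketch below imitates the routing rule with Python's `re.fullmatch`, on the assumption that the router renames a topic only when the whole name matches. `route` is an illustrative helper, not Kafka Connect code.

```python
import re

REGEX = "pg-dev.public.(.*)"     # transforms.AddPrefix.regex from the config
REPLACEMENT = "data.cdc.dbname"  # transforms.AddPrefix.replacement


def route(topic, regex=REGEX, replacement=REPLACEMENT):
    """Rename the topic only if the whole name matches the regex."""
    if re.fullmatch(regex, topic) is None:
        return topic  # no match: the name is left unchanged
    return re.sub(regex, replacement, topic, count=1)


print(route("postgres.public.customers"))  # prefix is "postgres": no match
print(route("pg-dev.public.customers"))    # this name would be renamed
```

If the regex did match, every table would be routed to the single topic `data.cdc.dbname`, because the replacement does not reference the captured table name.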


### Step 3 (continued)

During the initial snapshot, a topic was already created for the table that already contained rows.<br>
Let's look at what was written to it.

In [25]:
import json
from kafka import KafkaConsumer

KAFKA_HOST = "localhost"
KAFKA_CONSUMER_GROUP = "backend"

def read_single_message(topic=None):
    """Read the next unread message from `topic` and return its decoded value."""
    assert topic is not None, 'You must specify topic name'

    consumer = KafkaConsumer(
        bootstrap_servers=f"{KAFKA_HOST}:9092",
        # Messages without a value (for example tombstones) are passed through as None
        value_deserializer=lambda v: v if v is None else json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        # A fixed group_id lets successive calls walk through the topic,
        # as the cells below show
        group_id=KAFKA_CONSUMER_GROUP
    )
    consumer.subscribe(topics=[topic])

    try:
        for message in consumer:
            return message.value
    except Exception:
        print("Closing consumer due to error\n")
        raise
    finally:
        # Runs on both success and failure, so the consumer is closed exactly once
        print("Closing consumer due to finish\n")
        consumer.close()
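
`read_single_message` opens a new consumer for every message and waits indefinitely when the topic has nothing unread. The sketch below shows one possible alternative: reading up to `limit` messages from a single consumer. It accepts any iterable whose items have a `.value` attribute, so it runs here without a broker. `read_messages` and `FakeRecord` are illustrative names that do not appear in the original notebook.

```python
from collections import namedtuple
from itertools import islice


def read_messages(consumer, limit):
    """Return the values of at most `limit` messages from an iterable consumer."""
    return [message.value for message in islice(consumer, limit)]


# Stand-in for Kafka records so the sketch runs without a broker
FakeRecord = namedtuple("FakeRecord", "value")
records = [
    FakeRecord({"payload": {"op": "c"}}),
    FakeRecord(None),
    FakeRecord({"payload": {"op": "d"}}),
]

print(read_messages(records, 2))
```

With kafka-python, the `consumer` argument would be a `KafkaConsumer`. Passing `consumer_timeout_ms` when constructing it makes iteration stop after that many milliseconds without new messages instead of blocking.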

In [15]:
value = read_single_message('postgres.public.customers')
value

Closing consumer due to finish



{'schema': {'type': 'struct',
  'fields': [{'type': 'struct',
    'fields': [{'type': 'int32', 'optional': True, 'field': 'id'},
     {'type': 'string', 'optional': True, 'field': 'name'}],
    'optional': True,
    'name': 'postgres.public.customers.Value',
    'field': 'before'},
   {'type': 'struct',
    'fields': [{'type': 'int32', 'optional': True, 'field': 'id'},
     {'type': 'string', 'optional': True, 'field': 'name'}],
    'optional': True,
    'name': 'postgres.public.customers.Value',
    'field': 'after'},
   {'type': 'struct',
    'fields': [{'type': 'string', 'optional': False, 'field': 'version'},
     {'type': 'string', 'optional': False, 'field': 'connector'},
     {'type': 'string', 'optional': False, 'field': 'name'},
     {'type': 'int64', 'optional': False, 'field': 'ts_ms'},
     {'type': 'string',
      'optional': True,
      'name': 'io.debezium.data.Enum',
      'version': 1,
      'parameters': {'allowed': 'true,last,false,incremental'},
      'default': 'fa

What we got:
1) The message carries its own schema
2) Since all rows are new, `before` contains `None`
3) Meanwhile, `after` contains the row values

Now let's see what happens on insert.
Run the following script on the database:
```sql
insert into public.customers (id, name)
values
    (4, 'david')
;
```

In [16]:
# Skip the two remaining messages for the initial rows
for i in range(2):
    read_single_message('postgres.public.customers')

Closing consumer due to finish

Closing consumer due to finish



In [18]:
value = read_single_message('postgres.public.customers')
value['payload']

{'before': None,
 'after': {'id': 4, 'name': 'david'},
 'source': {'version': '2.2.0.Alpha3',
  'connector': 'postgresql',
  'name': 'postgres',
  'ts_ms': 1697708537634,
  'snapshot': 'false',
  'db': 'postgres',
  'sequence': '[null,"22361320"]',
  'schema': 'public',
  'table': 'customers',
  'txId': 748,
  'lsn': 22361320,
  'xmin': None},
 'op': 'c',
 'ts_ms': 1697708538002,
 'transaction': None}

We get the same picture: `before` is `None` and `after` holds the inserted row.<br>
Now let's look at an update:
```sql
update public.customers
set name = 'diana'
where id = 4;
```

In [19]:
value = read_single_message('postgres.public.customers')
value['payload']

Closing consumer due to finish



{'before': {'id': 4, 'name': 'david'},
 'after': {'id': 4, 'name': 'diana'},
 'source': {'version': '2.2.0.Alpha3',
  'connector': 'postgresql',
  'name': 'postgres',
  'ts_ms': 1697708613049,
  'snapshot': 'false',
  'db': 'postgres',
  'sequence': '["22361640","22361696"]',
  'schema': 'public',
  'table': 'customers',
  'txId': 749,
  'lsn': 22361696,
  'xmin': None},
 'op': 'u',
 'ts_ms': 1697708613431,
 'transaction': None}

What we have:
1) `before` contains the old row values (available in full because the table uses `replica identity full`)
2) `after` contains the new row values
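
Since an update event carries both versions of the row, the changed columns can be derived by comparing them. This is a minimal sketch that assumes the payload shape shown above. `changed_columns` is a hypothetical helper and `update_event` is a trimmed copy of the real event.

```python
def changed_columns(payload):
    """Map each changed column to an (old, new) pair for an update event."""
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    return {
        column: (before.get(column), after.get(column))
        for column in sorted(set(before) | set(after))
        if before.get(column) != after.get(column)
    }


# Trimmed copy of the update event above
update_event = {
    "before": {"id": 4, "name": "david"},
    "after": {"id": 4, "name": "diana"},
    "op": "u",
}

print(changed_columns(update_event))
```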

Let's see what happens on delete:
```sql
delete
from public.customers
where id = 4;
```

In [26]:
value = read_single_message('postgres.public.customers')
value['payload']

Closing consumer due to finish



{'before': {'id': 4, 'name': 'diana'},
 'after': None,
 'source': {'version': '2.2.0.Alpha3',
  'connector': 'postgresql',
  'name': 'postgres',
  'ts_ms': 1697708712610,
  'snapshot': 'false',
  'db': 'postgres',
  'sequence': '["22361840","22362128"]',
  'schema': 'public',
  'table': 'customers',
  'txId': 750,
  'lsn': 22362128,
  'xmin': None},
 'op': 'd',
 'ts_ms': 1697708712810,
 'transaction': None}

What we have:
1) `before` contains the old row values
2) `after` contains `None`
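
The three event types seen so far are enough to keep a copy of the table in sync. The sketch below applies deserialized messages to an in-memory dict keyed by `id`, dispatching on the `op` code (`c` create, `u` update, `d` delete, plus `r` for snapshot reads). It also skips messages whose value is `None`, which the deserializer above allows for. `apply_change` is an illustrative helper and `events` contains trimmed copies of the events above.

```python
def apply_change(table, value, key="id"):
    """Apply one deserialized Debezium message to `table`, a dict keyed by `key`."""
    if value is None:  # message without a value: nothing to apply
        return table
    payload = value["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":  # delete: only `before` is populated
        table.pop(payload["before"][key], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")
    return table


# Trimmed copies of the three events above, followed by an empty message
events = [
    {"payload": {"before": None, "after": {"id": 4, "name": "david"}, "op": "c"}},
    {"payload": {"before": {"id": 4, "name": "david"},
                 "after": {"id": 4, "name": "diana"}, "op": "u"}},
    {"payload": {"before": {"id": 4, "name": "diana"}, "after": None, "op": "d"}},
    None,
]

table = {}
for event in events:
    apply_change(table, event)
    print(table)
```

Keying by `id` is an assumption made for the example, since the demo table has no primary key.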

<b>IMPORTANT:</b> Debezium [does not handle](https://github.com/MaterializeInc/materialize/issues/6596#issuecomment-963479859) `truncate` in this setup: truncating a table produces no delete events for its rows.

### Step 4

Let's see what happens when a new table is created:
```sql
create table public.stores (
    id int,
    city text
);
ALTER TABLE public.stores REPLICA IDENTITY FULL;
```

In [30]:
!rm topics.json

In [33]:
!curl http://localhost:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics -o topics.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4682    0  4682    0     0   203k      0 --:--:-- --:--:-- --:--:--  240k


In [34]:
with open('topics.json', 'r') as f:
    topics = json.load(f)

for i in topics['data']:
    print(i['topic_name'])

__debezium-heartbeat.postgres
_schemas
connect_configs
connect_offsets
connect_statuses
postgres.public.customers


The topic does not exist yet.<br>
That is because the table is empty: a topic appears only once the first change event for the table is written.<br>
Let's add some rows:
```sql
insert into public.stores (id, city)
values
    (1, 'Moscow'),
    (2, 'London'),
    (3, 'Milano')
;
```

In [36]:
!rm topics.json
!curl http://localhost:8082/v3/clusters/TCJFECM_Stio2wfa268A5g/topics -o topics.json
        
with open('topics.json', 'r') as f:
    topics = json.load(f)

for i in topics['data']:
    print(i['topic_name'])

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5464    0  5464    0     0   172k      0 --:--:-- --:--:-- --:--:--  222k
__debezium-heartbeat.postgres
_schemas
connect_configs
connect_offsets
connect_statuses
postgres.public.customers
postgres.public.stores


Now the topic exists and we can read from it.
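
The `rm` / `curl` / `json.load` sequence is repeated several times above. It could be wrapped in a few helpers, sketched here on the assumption that the REST Proxy is reachable at `http://localhost:8082` as in the cells above. The function names are illustrative, and the two lists are the topic names printed before and after the insert.

```python
import json
from urllib.request import urlopen

REST_PROXY_URL = "http://localhost:8082"  # REST Proxy address used above


def topic_names(topics):
    """Extract topic names from a parsed KafkaTopicList response."""
    return [item["topic_name"] for item in topics["data"]]


def fetch_topic_names(cluster_id, base_url=REST_PROXY_URL):
    """Return the topic names from GET /v3/clusters/<cluster_id>/topics."""
    with urlopen(f"{base_url}/v3/clusters/{cluster_id}/topics") as response:
        return topic_names(json.load(response))


def new_topics(before, after):
    """Return the names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


# Topic lists printed before and after inserting rows into public.stores
before = ["__debezium-heartbeat.postgres", "_schemas", "connect_configs",
          "connect_offsets", "connect_statuses", "postgres.public.customers"]
after = before + ["postgres.public.stores"]

print(new_topics(before, after))
```

Against the live cluster, `fetch_topic_names("TCJFECM_Stio2wfa268A5g")` would replace the hard-coded lists.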

In [39]:
for i in range(3):
    value = read_single_message('postgres.public.stores')
    print(value['payload'], end='\n---\n')

Closing consumer due to finish

{'before': None, 'after': {'id': 1, 'city': 'Moscow'}, 'source': {'version': '2.2.0.Alpha3', 'connector': 'postgresql', 'name': 'postgres', 'ts_ms': 1697709276568, 'snapshot': 'false', 'db': 'postgres', 'sequence': '["22362848","22481768"]', 'schema': 'public', 'table': 'stores', 'txId': 756, 'lsn': 22481768, 'xmin': None}, 'op': 'c', 'ts_ms': 1697709276645, 'transaction': None}
---
Closing consumer due to finish

{'before': None, 'after': {'id': 2, 'city': 'London'}, 'source': {'version': '2.2.0.Alpha3', 'connector': 'postgresql', 'name': 'postgres', 'ts_ms': 1697709276568, 'snapshot': 'false', 'db': 'postgres', 'sequence': '["22362848","22481840"]', 'schema': 'public', 'table': 'stores', 'txId': 756, 'lsn': 22481840, 'xmin': None}, 'op': 'c', 'ts_ms': 1697709276646, 'transaction': None}
---
Closing consumer due to finish

{'before': None, 'after': {'id': 3, 'city': 'Milano'}, 'source': {'version': '2.2.0.Alpha3', 'connector': 'postgresql', 'name': 'pos