notifier-service

The system process customer orders (multiple orders per customer), and each order has a current state that is changed in time by the progress of handling particular orders. Particular customers can register their webhook callbacks (notification handlers) to receive order state-changed notifications (one per customer). Order state-changed events come for processing via Kafka messages; one message contains one event.

Requirements

Order state-changed notifications within one order should be delivered in the same order as they were triggered (req1: ordering).
Potential performance degradation of one or more notification handlers must not affect the performance of the rest notification handlers (req2: unaffected performance).
Order state-changed notifications that failed to deliver must be attempted to re-deliver with respect to the retry policy (req3: retried delivery).
Horizontal scalability should mitigate (reduce) delivery latency for order state-changed notifications (req4: horizontal scalability).

Test Scenario for 'unaffected performance' requirement

Modes description:

Nightmare mode, which provides about 33% of probability of delay or exception (500 status code) will be returned.
Hell mode, which always returns an exception (500 status code).

Start configuration:

two notification handlers with nightmare mode, a possible delay of 500-1000 ms, and status code 500, with a probability about 33% of one of the assaults will be executed.
one notification handler will be looped by two modes, starting with a nightmare - which will last 1 minute, switch to hell - which will last 3 minutes, and return to the nightmare again, so these modes will change each other and again and again.
one notification handler without any modes, so it will work without expected delays or exceptions.

To be sure that external customer issues are not affecting the performance, we will start a few instances (described in [Start configuration section](#Start configuration)), and the additional catcher, which will start with nightmare but changed to hell, won't handle any requests, during this time other catchers will continue to work, and won't be affected. After the period when hell will be finished, everything should be backed to a normal state on catcher, others should continue to work as before. and we should have the same performance as before the whole experiment.

psql (cli client) on MAC

brew install libpq
brew link --force libpq

PGPASSWORD=password psql -U user -h localhost local

Kafka

http://localhost:8080

Prometheus

Check registered targets: http://localhost:9090/targets?search=&scrapePool

Grafana

http://localhost:3000

Thoughts

Kafka only provides a total order over messages within a partition, not between different partitions in a topic.
In order to decrease load on DB in kafka consumer (ingestor), better to use idempotent producers, otherwise there are will be attempts to ingest duplicated data that cause extra DB load.
Why consumer is implemented as it is: since it is a very sensitive part from the sake of performance, it MUST be implemented in the most efficient way to minimize any impact. And that means use batch processing for DB ingestion as awell as for Kafka message commits.

There is some materials about efficient batch inserts for PostgresSQL:
- How do I batch sql statements with package database/sql
- Choosing Between the pgx and database/sql Interfaces: The pgx interface is faster. Many PostgreSQL specific features such as LISTEN / NOTIFY and COPY are not available through the database/sql interface.
- Bulk INSERT in Postgres in GO using pgx
How does the customer simulator(catcher) choose what customer id to handle (self-registration): Here is a very hard requirement: the only one simulator must serve particular customer, that means that if there are more simulators are up and running then customers, some simulators should be in idle state (very similar to Kafka consumers, in case more consumer than partitions exists).

Implementation details: There are should be a record in customers table for each customer that is/was serving by a simulator.

Registration logic should take the first customer id that has unprocessed ( within the processing window) events, and cover the following cases:
- case1: insert customer id record if correspondent customer has never been served (customer table does not contain record for particular customer id);
- case2: update existing record if customer id was served before, but on whatever reason previously registered simulator does not handle events within processing window;
- case3: handle collisions to avoid situation when newly and previously registered agents will serve the same customer id.
Agent registration should be implemented as upsert query, that covers case1 and case2, but case3 should be covered separately. PostgresSQL supports kind of DO UPDATE SET, but usage of that syntax causes LOST UPDATE.
The better solution is Writeable CTE. In accordance with case1, the correspondent customer record may not exist, what makes it impossible to use Row Level Lock FOR UPDATE. As a result WCTE registration query can utilize 'SKIPP LOCKED' and it will fail with "duplicate key value violates unique constraint 'customers_pkey'" for all simulators except the first one that the same time will be trying to register themselves as servants for the same customer id, and those simulators will need to re-try registration till query will be succeeded with zero affected records (no suitable customer ids to serve) or with one affected record (successful registration).

TBD case3
??

TODO

Handle re-try messages in consumer, so far the entire batch of messages just ignored:
Ensure that consumer handles re-balancing properly
Optimize Queries to achieve better performance

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
cmd		cmd
config		config
docker		docker
pkg		pkg
.dockerignore		.dockerignore
.env		.env
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
Notifier Service.pdf		Notifier Service.pdf
README.md		README.md
docker-compose-kafka.yml		docker-compose-kafka.yml
docker-compose-monitor.yml		docker-compose-monitor.yml
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
queries.sql		queries.sql
run.it		run.it

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

notifier-service

Requirements

Test Scenario for 'unaffected performance' requirement

Modes description:

Start configuration:

psql (cli client) on MAC

Kafka

Prometheus

Grafana

Thoughts

TODO

Materials

About

Releases

Packages

Languages

shchiv/notifier-service

Folders and files

Latest commit

History

Repository files navigation

notifier-service

Requirements

Test Scenario for 'unaffected performance' requirement

Modes description:

Start configuration:

psql (cli client) on MAC

Kafka

Prometheus

Grafana

Thoughts

TODO

Materials

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages