An implementation using Kafka as a Service
The ai-kafka library performs the following actions:
- monitors website availability over the network,
- produces metrics about the website availability,
- persists the events passing through an Aiven Kafka instance into an Aiven PostgreSQL database.
To do this, it implements a Kafka producer that periodically checks the target websites and sends the check results to a Kafka topic, and a Kafka consumer that stores the data in an Aiven PostgreSQL database. In this local setup, both components run on the same machine.
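The producer side of this flow can be sketched as a small loop. This is only an illustration under stated assumptions, not the code shipped in ai-kafka: `produce_checks`, `encode_result`, `decode_result`, and the JSON message encoding are hypothetical, and the `check` and `send` callables stand in for the real website check and for a Kafka producer's send call.

```python
import json
import time


def encode_result(result):
    """Serialize one check result to bytes (JSON encoding is an assumption)."""
    return json.dumps(result).encode("utf-8")


def decode_result(raw):
    """Inverse of encode_result, as the consumer side would use it."""
    return json.loads(raw.decode("utf-8"))


def produce_checks(targets, check, send, interval_seconds=60.0,
                   rounds=None, sleep=time.sleep):
    """Check every target, pass each encoded result to send, then wait.

    rounds=None runs forever; a number limits the loop, which is handy
    for trying the sketch out.
    """
    completed = 0
    while rounds is None or completed < rounds:
        for url in targets:
            send(encode_result(check(url)))
        completed += 1
        # Only sleep when another round will follow.
        if rounds is None or completed < rounds:
            sleep(interval_seconds)
    return completed
```

With kafka-python, `send` could be something like `lambda payload: producer.send(topic, payload)`, where `topic` is whatever topic name the config file defines.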
The website checker should perform the checks periodically and collect the HTTP response time and the returned error code. Optionally, it should also check the returned page contents for a regexp pattern that is expected to be found on the page.
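A single check along these lines might look like the sketch below. It uses only the Python standard library (`urllib`) rather than a third-party HTTP library so that it stays dependency-free; the function names and the keys of the result dictionary are assumptions made for illustration, not the names used in ai-kafka.

```python
import re
import time
import urllib.error
import urllib.request


def evaluate_response(url, status_code, elapsed_seconds, body, pattern=None):
    """Build one check result; pattern_found stays None when no pattern is given."""
    found = None
    if pattern is not None:
        found = re.search(pattern, body) is not None
    return {
        "url": url,
        "status_code": status_code,
        "response_time": round(elapsed_seconds, 3),
        "pattern_found": found,
    }


def check_website(url, pattern=None, timeout=10.0):
    """Fetch url once and measure how long the request takes."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still count as a completed check.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except OSError:
        # Network-level failure (URLError is an OSError): no HTTP status.
        status = None
        body = ""
    elapsed = time.monotonic() - start
    return evaluate_response(url, status, elapsed, body, pattern)
```

Keeping `evaluate_response` separate from the network call makes the pattern matching easy to test without a live website.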
For the database writer, we expect to see a solution that records the check results into one or more database tables and can handle a reasonable number of checks performed over a longer period of time.
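One possible shape for such a writer is sketched below. The table name `web_metrics` comes from this README; the column names, the index, and the helper functions are assumptions for illustration only. The `conn` argument is expected to be a DB-API connection such as the one returned by `psycopg2.connect`.

```python
# Assumed schema: only the table name web_metrics is taken from the README.
CREATE_STATEMENTS = (
    """
    CREATE TABLE IF NOT EXISTS web_metrics (
        id            BIGSERIAL PRIMARY KEY,
        url           TEXT NOT NULL,
        status_code   INTEGER,
        response_time DOUBLE PRECISION,
        pattern_found BOOLEAN,
        checked_at    TIMESTAMPTZ NOT NULL DEFAULT now()
    )
    """,
    # An index on the timestamp keeps time-range queries and cleanup cheap.
    """
    CREATE INDEX IF NOT EXISTS web_metrics_checked_at_idx
        ON web_metrics (checked_at)
    """,
)

INSERT_SQL = """
INSERT INTO web_metrics (url, status_code, response_time, pattern_found)
VALUES (%s, %s, %s, %s)
"""


def result_to_row(result):
    """Map one check result dictionary to the column order of INSERT_SQL."""
    return (
        result["url"],
        result.get("status_code"),
        result.get("response_time"),
        result.get("pattern_found"),
    )


def store_results(conn, results):
    """Insert a batch of check results and commit once for the whole batch."""
    rows = [result_to_row(r) for r in results]
    with conn.cursor() as cur:
        for statement in CREATE_STATEMENTS:
            cur.execute(statement)
        cur.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

Writing results in batches rather than one commit per message is one way to cope with a larger number of checks over time.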
- sudo apt-get install postgresql
- sudo apt-get install libpq-dev
- sudo pip3 install psycopg2
- sudo pip3 install kafka-python
- sudo pip3 install requests
$ git clone https://github.com/mmnelemane/ai-kafka
$ cd ai-kafka
$ sudo python3 setup.py install
A binary `aikafka` is installed in `/usr/local/bin/` on the host.
- Ensure that Aiven Kafka and Aiven PostgreSQL services are running.
- Download and store the certificates for Aiven Kafka and PostgreSQL services. The files are expected to be stored in the following directory structure:

      certs/
        kafka/
          ca.pem
          service.key
          service.cert
        pgsql/
          ca.pem
- Update the config file with the details of the Aiven services. A sample `ai-kafka.conf.sample` is included in the package; refer to it for help with filling in the config.
- Write an input file in the format of `weburls.json`, listing all the URLs and a searchable text.
- Start the aikafka application as:
$ aikafka --configfile <configfile_name> --inputfile <inputfile_name>
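To illustrate how the certificate layout from the steps above could be handed to kafka-python, here is a minimal sketch. The helper names, the default `certs` directory argument, and the example host in the usage note are hypothetical; the keyword arguments (`security_protocol`, `ssl_cafile`, `ssl_certfile`, `ssl_keyfile`) are the SSL options accepted by kafka-python's `KafkaProducer` and `KafkaConsumer`. How ai-kafka itself reads these values from its config file may differ.

```python
from pathlib import Path


def kafka_ssl_settings(bootstrap_servers, cert_dir="certs"):
    """Build kafka-python SSL keyword arguments from the certs/kafka/ layout."""
    kafka_dir = Path(cert_dir) / "kafka"
    return {
        "bootstrap_servers": bootstrap_servers,
        "security_protocol": "SSL",
        "ssl_cafile": str(kafka_dir / "ca.pem"),
        "ssl_certfile": str(kafka_dir / "service.cert"),
        "ssl_keyfile": str(kafka_dir / "service.key"),
    }


def make_producer(bootstrap_servers, cert_dir="certs"):
    """Create a producer; kafka is imported lazily so the helper above works without it."""
    from kafka import KafkaProducer
    return KafkaProducer(**kafka_ssl_settings(bootstrap_servers, cert_dir))


def make_consumer(topic, bootstrap_servers, cert_dir="certs"):
    """Create a consumer subscribed to topic, using the same SSL settings."""
    from kafka import KafkaConsumer
    return KafkaConsumer(topic, **kafka_ssl_settings(bootstrap_servers, cert_dir))
```

For example, `make_producer("kafka.example.invalid:12345")` would look for the three files under `certs/kafka/` relative to the current working directory; the host and port here are placeholders for the values shown in the Aiven console.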
- To check if the configuration has been read properly
$ aikafka --configfile <configfile_name> --inputfile <inputfile_name> --printconfig
- To print help text for the application
$ aikafka --help
- Shortcuts for options
"--configfile" == "-c"
"--inputfile" == "-i"
"--printconfig" == "-p"
"--help" == "-h"
- The recorded website information can be obtained by logging into the pgsql database `defaultdb`. The entries are recorded in the `web_metrics` table, which can be fetched with: `SELECT * from web_metrics;`
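The same query can also be run from Python. The sketch below is an illustration, not part of ai-kafka: `fetch_web_metrics` is a hypothetical helper that accepts any DB-API connection and returns each row as a dictionary keyed by column name.

```python
def fetch_web_metrics(conn):
    """Return all rows of web_metrics as a list of dictionaries."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM web_metrics")
        # cursor.description holds one entry per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

With psycopg2, the connection could be opened with `psycopg2.connect(host=..., port=..., dbname="defaultdb", user=..., password=..., sslmode="verify-ca", sslrootcert="certs/pgsql/ca.pem")`, filling in the host, port, and credentials of the Aiven PostgreSQL service.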
- Create an `events` table that records changes in `web_metrics`. The entries in the `events` table could be written by a trigger on the `web_metrics` table.
- A cleaner way for users to fetch database tables.
- A complete Debian or RPM package (.deb or .rpm) to install on several platforms.
- The ability to run aikafka as a systemd service daemon.
- An `aikafkactl` API that can interact with the systemd daemon to provide functionality for the user.
- A way to clear old entries (e.g. older than a few days) to ensure scalability.
- An improved logging mechanism with multithreading to help troubleshooting.
- Complete and improve tests.
- Basic parts of the producer and consumer code were taken from:
- Basic parts of the PostgreSQL client were taken from:
- Several Stack Overflow answers and Python blogs were used to learn about specific usage syntax.