Full stack with lots of features written in bash
This is still in testing! I am not a professional coder, so expect some typos and inefficient approaches, and please don't blame me :)
Features to be added
- Alpine Linux support
- ES security option (optional)
- Logstash (optional)
- ES Fleet Server + Agent (optional)
- Beats (optional)
It is currently tested on Debian/Ubuntu and CentOS/RHEL. Alpine Linux support will be added too.
The main goal is to create a backend data lake for various purposes, for people who struggle with complex choices and tunings or do not want to involve themselves in them. There is also optional content that can be enabled for the Vectra Stream platform. I have also added an automatic sizing calculation for Vectra Stream output, so during planning the script calculates the required storage space and warns you if something is wrong.
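To give a rough idea of how such a sizing check can work (the variable names, the average event size, and the Docker data path below are my own illustrative assumptions, not the script's actual values), the calculation boils down to a few lines of bash arithmetic:

```bash
# Hypothetical sizing sketch: estimate the storage needed for a Vectra Stream feed.
EPS=5000                 # expected events per second (illustrative)
AVG_EVENT_BYTES=800      # assumed average event size in bytes
RETENTION_DAYS=30        # how long the data must be kept

# bytes per day = EPS * seconds per day * average event size
BYTES_PER_DAY=$((EPS * 86400 * AVG_EVENT_BYTES))
REQUIRED_GB=$((BYTES_PER_DAY * RETENTION_DAYS / 1024 / 1024 / 1024))

# Compare against the free space on the Docker data mount (path is an assumption)
AVAILABLE_GB=$(df -BG --output=avail /var/lib/docker | tail -1 | tr -dc '0-9')

echo "Estimated need: ${REQUIRED_GB} GB, available: ${AVAILABLE_GB} GB"
if [ "$AVAILABLE_GB" -lt "$REQUIRED_GB" ]; then
  echo "WARNING: not enough storage for the planned retention." >&2
fi
```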
I tried to make Elasticsearch cluster deployment flexible and simple, therefore I decided to use a Docker environment with some additional flavors. The challenges during a standard ES cluster deployment are listed below; a sketch of the typical host-level tunings follows the list.
- System tunings
- Memory assignments
- Storage issues
- Planning
- Performance
- Security
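To make the "system tunings" and "memory assignments" items concrete, these are the commonly documented host-level settings a standard ES-on-Docker deployment needs; the script applies its own equivalents, so the snippet below is only illustrative:

```bash
# Commonly documented host tunings for Elasticsearch under Docker (illustrative, the script handles these)
sysctl -w vm.max_map_count=262144                                       # minimum required by Elasticsearch
echo 'vm.max_map_count=262144' >> /etc/sysctl.d/99-elasticsearch.conf   # make it persistent

# Memory locking and heap sizing are usually passed per container, e.g.:
# docker run ... --ulimit memlock=-1:-1 -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" ...
```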
Several components are used in this stack:
- Fluent-Bit (as ETL)
- HAProxy (load balancing and reverse proxy)
- Dozzle (container monitoring)
- Portainer (container management)
- Kibana
- Elasticsearch Nodes
Because this is a closed environment, I did not enable security on the ES side, so everything runs over HTTP. HAProxy is used as a reverse proxy, and I have implemented HTTP basic authentication with a self-signed certificate. It is possible to create a CSR, sign it with your local CA, and import it back into the stack host, or you can put your own PEM certificate directly into it. You can also add your own authentication mechanism (mTLS, OAuth2, etc.) to haproxy.cfg, since every component's config file is placed in the same folder for easier management. You can even edit the files directly from within this script, and keep in mind that every component's config file can be modified through the menus at runtime.
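As a rough sketch of what the certificate and basic-auth part looks like (file names, paths, and the userlist name below are illustrative, not necessarily what the script generates):

```bash
# Illustrative only: create a self-signed certificate and a minimal basic-auth frontend for HAProxy.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout stack.key -out stack.crt -subj "/CN=stack.local"
cat stack.crt stack.key > haproxy.pem        # HAProxy expects certificate and key in a single PEM

cat > haproxy-auth-snippet.cfg <<'EOF'
userlist stack_users
    user admin insecure-password Elastic123!

frontend https_in
    bind *:443 ssl crt /usr/local/etc/haproxy/haproxy.pem
    acl auth_ok http_auth(stack_users)
    http-request auth realm Stack if !auth_ok
    default_backend kibana
EOF
```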
When you open one of the URLs below, it will ask for a username/password. Default credentials:
username: admin
password: Elastic123!
HAProxy Stats (realtime metrics)
https://<HOST IP/FQDN>/stats
Dozzle (container monitoring)
https://<HOST IP/FQDN>/dozzle
Portainer
https://<HOST IP/FQDN>:1443
Kibana
https://<HOST IP/FQDN>
RAW TCP Listener (can be changed)
<HOST IP/FQDN>:9009
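A quick way to check that the stack is answering once it is up (the -k flag is needed because of the self-signed certificate; the credentials are the defaults listed above, and the JSON payload for the TCP test is just an assumption about the expected format):

```bash
# Reverse-proxied endpoints behind HAProxy (self-signed cert, hence -k)
curl -k -u admin:Elastic123! https://<HOST IP/FQDN>/stats        # HAProxy stats page
curl -k -u admin:Elastic123! https://<HOST IP/FQDN>/api/status   # Kibana status API

# Send a test event to the raw TCP listener (assuming newline-delimited JSON is expected)
echo '{"message":"hello from nc","source":"test"}' | nc <HOST IP/FQDN> 9009
```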
The directories below were selected for testing. The script lets you choose each storage destination individually, so you can easily separate the data tiers onto different OS mounts (local, SMB, NFS, etc.).
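For example, if you want the warm tier on an NFS share, you can mount it beforehand and simply pick that path in the storage menu (the share name and mountpoint below are made up):

```bash
# Illustrative: put the warm data tier on an NFS mount, then select /mnt/es-warm in the storage menu
mkdir -p /mnt/es-warm
mount -t nfs nfs-server.local:/exports/es-warm /mnt/es-warm
echo 'nfs-server.local:/exports/es-warm /mnt/es-warm nfs defaults,_netdev 0 0' >> /etc/fstab
```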
Every config file is accessible from a single directory, so if you want to change or add something, you only need to edit the file and restart the related container. After a config file modification, the script will by default ask whether you want to restart the related container.
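If you prefer to change a config by hand instead of going through the menus, the manual equivalent is simply (the directory and container names below are examples, not necessarily the ones the script creates):

```bash
# Edit a component's config and bounce only that container (path and name are illustrative)
vi /path/to/stack-configs/fluent-bit.conf
docker restart fluent-bit
```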
I preferred Fluent-Bit over Logstash because it is very lightweight and high-performance, with a very small memory footprint. For my needs Fluent-Bit is more than enough, but feel free to implement your own Logstash instance.
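To give an idea of how small the Fluent-Bit side is, a minimal pipeline that listens on the raw TCP port and ships into Elasticsearch through HAProxy could look roughly like this (ports, host name, and index name are assumptions, not the generated config):

```bash
cat > fluent-bit.conf <<'EOF'
# Minimal illustrative pipeline: raw TCP in, Elasticsearch out (not the script's generated config)
[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tcp
    Listen       0.0.0.0
    Port         9009
    Format       json

[OUTPUT]
    Name         es
    Match        *
    Host         haproxy
    Port         9200
    Index        raw-tcp
    Suppress_Type_Name On
EOF
```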
As you can see from the topology below, HAProxy also provides load balancing across the ES hot nodes. I did not use ingest nodes because I do not need pipeline operations. The HAProxy config file is generated automatically according to the number of hot nodes.
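Conceptually, the generated backend looks something like this (node names and the health check are illustrative; the real file is built by the script from your node count):

```bash
cat > haproxy-es-backend-snippet.cfg <<'EOF'
# Illustrative backend: round-robin across the ES hot nodes
backend es_hot_nodes
    balance roundrobin
    option httpchk GET /_cluster/health
    server es-hot-1 es-hot-1:9200 check
    server es-hot-2 es-hot-2:9200 check
    server es-hot-3 es-hot-3:9200 check
EOF
```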
Keep in mind that every config file is generated dynamically according to your choices prior to deployment; these choices are made through the menus.
When you run the script, everything is quite straightforward and you can navigate through the menus on your own. You may want to read the warnings in some menus and act accordingly. Until you choose the deployment option in the Elasticsearch menu, you can still change your deployment plan; once you do, the script deploys everything automatically.
As I mentioned, this is a simple script (although more than 3K lines), and its main purpose is to make life easier. I believe that with no or only a few minor changes you can use it for your production environments too. Docker is a very solid environment, and if you have a sufficient host with enough CPU, memory, and disk I/O, it will handle large data volumes and high EPS rates.