
Reduce network bandwidth with log variables #95025

Closed
nathanrstacey opened this issue Apr 4, 2023 · 3 comments
Labels
:Distributed/CRUD (a catch-all label for issues around indexing, updating, and getting a doc by id; not search), >enhancement, feedback_needed, Team:Distributed (meta label for the distributed team)

Comments

@nathanrstacey

nathanrstacey commented Apr 4, 2023

Description

Issue to resolve:
Cloud ingestion is expensive! I think we can reduce our ESS ingest costs significantly with a little work; my first test reduced log ingest by 80%.
This would also help reduce traffic in buildings/datacenters where many logs are created and there is a single pipe out of the building.

How:
Most logs consist of the same data; only a few variables change between logs: timestamp, CPU, packets sent, alert name, and so on. All the other data in the log is always the same.

What if we sent only those variables across the network? This could be done in many ways; one method is outlined below, with a rough code sketch after the list. I am not claiming it is the best method, only showing what I am thinking.

  1. Let Elasticsearch auto-define templates of logs that it sees over and over again
  2. When Elasticsearch sees a new template, have it define the template and give the template an ID
  3. Have Elasticsearch send this template to the Agent/Beat sending the logs
  4. When a new log is generated that matches a template, only send over the variables with the template ID
  5. When Elasticsearch sees this new variable-log, it enters the variables into the template to recreate the true raw log. The document will look exactly as it should for Elasticsearch to process it; the only difference is that only the necessary data was sent across the network.
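
To make the idea concrete, here is a minimal sketch of the shipper-side half of steps 3 and 4, under the assumption that a template marks its variable fields with a placeholder and that Elasticsearch has already assigned the template an ID. Nothing here is an existing Agent/Beat or Elasticsearch API; the names (`TEMPLATE`, `extract_variables`, `_template_id`) are purely illustrative.

```python
import json

# Hypothetical template that Elasticsearch assigned ID 17 in step 2.
# "<VAR>" marks the fields that change per event; everything else is
# boilerplate shared by every log of this shape.
TEMPLATE_ID = 17
TEMPLATE = {
    "event": {"module": "auditd", "category": "process", "action": "<VAR>"},
    "host": {"name": "web-01", "os": {"family": "redhat"}},
    "@timestamp": "<VAR>",
    "process": {"pid": "<VAR>", "name": "<VAR>"},
}

def extract_variables(template, doc, path=()):
    """Walk the template and collect only the values marked as variable."""
    variables = {}
    for key, tmpl_val in template.items():
        if isinstance(tmpl_val, dict):
            variables.update(extract_variables(tmpl_val, doc[key], path + (key,)))
        elif tmpl_val == "<VAR>":
            variables[".".join(path + (key,))] = doc[key]
    return variables

# A full event as the Beat would normally ship it.
raw_doc = {
    "event": {"module": "auditd", "category": "process", "action": "executed"},
    "host": {"name": "web-01", "os": {"family": "redhat"}},
    "@timestamp": "2023-04-04T17:22:05.000Z",
    "process": {"pid": 4312, "name": "curl"},
}

# What actually crosses the network: the template ID plus the variables only.
wire_payload = {"_template_id": TEMPLATE_ID, **extract_variables(TEMPLATE, raw_doc)}
print(json.dumps(wire_payload))
```

In a real implementation the template learning in steps 1 and 2 would be the hard part (deciding which fields are stable, versioning templates, handling mismatches); this sketch only shows where the wire-format savings come from.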

Attached is an example:
The "Raw JSON" is 5000B and the "Variable JSON" is 612 Bytes or 87% smaller
The PDF is the "Raw JSON" file with the variable lines highlighted to show what was and was not part of the Variable JSON.
The Variable JSON starts with a line identifying the template it uses; everything else in the JSON is variables. I imagine I missed a few lines required to really make this work, and the process above may be faulty, but the general idea is here: send only the changing variables of a log across the network, not the whole log. A sketch of the reconstruction step follows the attachments.

rawauditbeat.pdf

variablesjson.txt
rawauditbeat.txt
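
As a companion to the sketch above, this is roughly what step 5 could look like on the receiving side: look up the stored template by ID and splice the variables back in, so the indexed document is identical to the raw JSON that was never sent. Again, `TEMPLATES`, `_template_id`, and the dotted-field convention are assumptions carried over from the previous sketch, not an existing Elasticsearch feature.

```python
import copy
import json

# Hypothetical registry of templates learned in steps 1-2, keyed by ID.
# "<VAR>" marks the slots the wire payload must fill in.
TEMPLATES = {
    17: {
        "event": {"module": "auditd", "category": "process", "action": "<VAR>"},
        "host": {"name": "web-01", "os": {"family": "redhat"}},
        "@timestamp": "<VAR>",
        "process": {"pid": "<VAR>", "name": "<VAR>"},
    }
}

def reconstruct(wire_payload):
    """Expand a compact variable-log back into the full raw document."""
    payload = dict(wire_payload)
    doc = copy.deepcopy(TEMPLATES[payload.pop("_template_id")])
    for dotted_key, value in payload.items():
        node = doc
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node[part]
        node[leaf] = value
    return doc

# The small payload that crossed the network (compare the ~612-byte Variable JSON)...
wire_payload = {
    "_template_id": 17,
    "event.action": "executed",
    "@timestamp": "2023-04-04T17:22:05.000Z",
    "process.pid": 4312,
    "process.name": "curl",
}

# ...expands back into the full document Elasticsearch would otherwise have received.
print(json.dumps(reconstruct(wire_payload), indent=2))
```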

@nathanrstacey added the >enhancement and needs:triage (requires assignment of a team area label) labels on Apr 4, 2023
@DaveCTurner added the :Distributed/CRUD label and removed the needs:triage label on Apr 5, 2023
@elasticsearchmachine added the Team:Distributed label on Apr 5, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

DaveCTurner commented Apr 5, 2023

In principle I think this could work but it sounds awfully complex to me. Moreover I would expect that Content-Encoding: gzip would achieve roughly the same effect, and that works today.

Relates #94319.

(edit: I said Transfer-Encoding but I meant Content-Encoding)
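
For reference, request compression needs no new machinery on the shipper side beyond setting a header, assuming the cluster accepts compressed requests (the `http.compression` setting). Below is a minimal sketch with a placeholder endpoint and credentials; highly repetitive JSON of the kind described in this issue typically compresses by a similar factor to the template approach.

```python
import gzip
import json

import requests  # any HTTP client that lets you set headers works the same way

ES_URL = "https://localhost:9200"  # placeholder endpoint
docs = [{"@timestamp": "2023-04-04T17:22:05.000Z", "event": {"action": "executed"}}]

# Standard _bulk body: an action line plus a source line per document, newline-terminated.
bulk_body = "".join(
    json.dumps({"index": {"_index": "logs-audit"}}) + "\n" + json.dumps(doc) + "\n"
    for doc in docs
)

# Compress the body and declare the encoding; only the compressed bytes cross the network.
resp = requests.post(
    f"{ES_URL}/_bulk",
    data=gzip.compress(bulk_body.encode("utf-8")),
    headers={
        "Content-Type": "application/x-ndjson",
        "Content-Encoding": "gzip",
    },
    auth=("elastic", "<password>"),  # placeholder credentials
)
print(resp.status_code)
```

The official clients expose the same behavior as a client option (the Python client's `http_compress=True`, for example), so most of the bandwidth win is available without any custom protocol.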

@DaveCTurner
Contributor

Closing this as there was no further feedback.
