Although the ELS API currently allows a count of items to be specified along with a timestamp start and end range, it does not return any header indicating how many log items in total fall within the given time range. Because of this, a large number of logs may be downloaded, which can become quite heavy for in-memory processing.
As a solution, the logs should initially be saved to a gzip file and then read from that file in smaller chunks.
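A minimal sketch of what this could look like, assuming the ELS download hands back an io.Reader of newline-delimited log entries; the package name, helper names, file path, and chunk size below are all illustrative, not the actual implementation:

```go
package elslogs

import (
	"bufio"
	"compress/gzip"
	"io"
	"os"
)

// saveToGzip streams the downloaded log body to a gzip file on disk
// so the full payload never has to be held in memory.
func saveToGzip(body io.Reader, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gw := gzip.NewWriter(f)
	defer gw.Close()

	_, err = io.Copy(gw, body)
	return err
}

// readInChunks re-opens the gzip file and hands log lines to the callback
// in fixed-size batches instead of loading everything at once.
func readInChunks(path string, chunkSize int, handle func([]string) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gr.Close()

	scanner := bufio.NewScanner(gr)
	chunk := make([]string, 0, chunkSize)
	for scanner.Scan() {
		chunk = append(chunk, scanner.Text())
		if len(chunk) == chunkSize {
			if err := handle(chunk); err != nil {
				return err
			}
			// Allocate a fresh slice so the handler can keep its batch.
			chunk = make([]string, 0, chunkSize)
		}
	}
	if len(chunk) > 0 {
		if err := handle(chunk); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```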
The previously proposed solution was applied, but it still wasn't effective enough in terms of total download/processing time.
Description of the Issue:
Current log file download times can exceed 5 minutes, and processing can take up to 10 minutes.
Each ticker iteration currently defaults to 30 minutes, but unfortunately the ticker for the next iteration doesn't start until the current processing is completed, so in this case a cycle takes (30 + 5 + 10) = 45 minutes.
Adding up this extra time over the course of a day, within 24 hours you can easily end up with a 4 to 5 hour processing delay instead of a more reasonable 30 minutes. These delays will only grow as traffic increases and the log files get larger.
Proposed Solution:
Once the beat's ticker period has elapsed, two functions would be called asynchronously to perform the following:
Download ELS Log File: This function creates X goroutines (count yet to be determined, maybe a pool), each downloading a log file part (sequentially/in parallel) and placing it on the log_files_ready channel once completed (a rough sketch of the full pipeline follows this list).
If 2 minute segments, then 15 files total for 30 minutes
If 5 minute segments, then 6 files total for 30 minutes
Process/Publish Individual Log Entries: Have another function (also via a goroutine) process these files asynchronously from the log_files_ready channel as they become ready.
X goroutines (count yet to be determined, maybe a pool) are created so that each can open a log file and then send off its processed events via PublishEvent.
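A rough sketch of how the two halves could be wired together, assuming hypothetical downloadSegment and processFile helpers standing in for the real ELS download and PublishEvent logic; the worker count, segment size, and names are placeholders, not a final design:

```go
package elslogs

import (
	"sync"
	"time"
)

// runPipeline splits the tick window into fixed-size segments, downloads each
// segment concurrently, and processes the resulting files as they become ready.
func runPipeline(start, end time.Time, segment time.Duration, workers int,
	downloadSegment func(from, to time.Time) (string, error),
	processFile func(path string) error,
) error {
	// Build the list of segment windows (e.g. 15 x 2-minute or 6 x 5-minute).
	type window struct{ from, to time.Time }
	var windows []window
	for t := start; t.Before(end); t = t.Add(segment) {
		to := t.Add(segment)
		if to.After(end) {
			to = end
		}
		windows = append(windows, window{t, to})
	}

	segments := make(chan window)
	logFilesReady := make(chan string)
	errs := make(chan error, len(windows))

	// Downloader pool: each goroutine pulls a window, downloads its log file,
	// and places the file path on the log_files_ready channel.
	var dlWG sync.WaitGroup
	for i := 0; i < workers; i++ {
		dlWG.Add(1)
		go func() {
			defer dlWG.Done()
			for w := range segments {
				path, err := downloadSegment(w.from, w.to)
				if err != nil {
					errs <- err
					continue
				}
				logFilesReady <- path
			}
		}()
	}

	// Processor pool: consume files as soon as they are ready and publish events.
	var procWG sync.WaitGroup
	for i := 0; i < workers; i++ {
		procWG.Add(1)
		go func() {
			defer procWG.Done()
			for path := range logFilesReady {
				if err := processFile(path); err != nil {
					errs <- err
				}
			}
		}()
	}

	// Feed the segment windows, then close channels in dependency order.
	for _, w := range windows {
		segments <- w
	}
	close(segments)
	dlWG.Wait()
	close(logFilesReady)
	procWG.Wait()
	close(errs)

	// Return the first error encountered, if any.
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```

The key point is that processing starts as soon as the first segment file lands on the channel, rather than waiting for the entire 30-minute window to finish downloading.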
This may not be the absolute best solution, but it should be more effective than the current one. If more optimizations are needed later, I'll deal with them then.