
Does not support large amounts of streams? #9

Closed
mfonsen opened this issue May 22, 2015 · 4 comments
@mfonsen

mfonsen commented May 22, 2015

Hi,

Awslogs seems like a very promising tool. It's the first I've found that handles throttling.

I tried awslogs LOGGROUP ALL --watch with a log group that contains thousands of streams. In this case awslogs does not seem to return any results. It appears that awslogs tries to fetch all streams, which takes a very long time. AWS Lambda is a service that floods log groups with streams.

The CloudWatch Logs API supports an OrderBy parameter [1]. This would allow fetching only recently updated streams. Unfortunately Boto does not allow using this parameter.

Would you have any suggestions on how to fine tune log stream fetching for this use case?

-mfonsen
[1] http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_DescribeLogStreams.html
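A hedged sketch of what the OrderBy approach could look like with boto3, which (unlike the old boto library) does accept `orderBy` and `descending` for `describe_log_streams`. The function name, log group, and timestamps below are placeholders, not actual awslogs code:

```python
def recent_stream_names(client, group, start_ms, max_streams=50):
    """Yield names of streams whose last event is at or after start_ms.

    Sorting by LastEventTime descending returns the most recently
    active streams first, so pagination can stop as soon as a stream
    falls before the requested window.
    """
    paginator = client.get_paginator("describe_log_streams")
    pages = paginator.paginate(
        logGroupName=group,
        orderBy="LastEventTime",  # sort by activity instead of name
        descending=True,          # newest streams first
    )
    found = 0
    for page in pages:
        for stream in page["logStreams"]:
            if stream.get("lastEventTimestamp", 0) < start_ms:
                return  # all remaining streams are older; stop early
            yield stream["logStreamName"]
            found += 1
            if found >= max_streams:
                return

if __name__ == "__main__":
    import boto3  # imported inside the guard so the helper stays dependency-free
    logs = boto3.client("logs")
    for name in recent_stream_names(logs, "/aws/lambda/my-fn", 1432252800000):
        print(name)
```

Because the results arrive newest-first, the generator never has to walk the thousands of idle Lambda streams at the tail of the listing.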

@jorgebastida
Owner

When you call describe_log_streams from boto you get some useful information back:

{'firstEventTimestamp': 1428133161913,
 'lastEventTimestamp': 1428219579747,
 'creationTime': 1428133168061,
 'uploadSequenceToken': 'XXX',
 'logStreamName': 'NAME',
 'lastIngestionTime': 1428219581611,
 'arn': 'XXX',
 'storedBytes': 2382365}

We could easily make AWSLogs.get_streams stop returning, as "possible candidate streams", any streams that don't fall in the start/end window, using firstEventTimestamp and lastIngestionTime.

Regardless of how many streams you have, I don't think listing them should be a problem. As long as we remove (and don't try to retrieve logs from) the ones we know contain no interesting information, it should be "fine".
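As a minimal sketch of that filtering idea (the function name and sample data are illustrative, not the actual awslogs code), a stream stays a candidate only if its [firstEventTimestamp, lastIngestionTime] interval overlaps the requested window:

```python
def overlaps_window(stream, start_ms, end_ms):
    """Return True if the stream could contain events in [start_ms, end_ms]."""
    first = stream.get("firstEventTimestamp", 0)
    last = stream.get("lastIngestionTime", 0)
    return first <= end_ms and last >= start_ms

streams = [
    {"logStreamName": "stale", "firstEventTimestamp": 10, "lastIngestionTime": 20},
    {"logStreamName": "active", "firstEventTimestamp": 90, "lastIngestionTime": 130},
]
# Only "active" overlaps the 100..200 window, so only it gets queried for logs.
candidates = [s for s in streams if overlaps_window(s, 100, 200)]
```

Listing all streams is still O(number of streams), but the expensive per-stream log retrieval only happens for the overlapping ones.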

@jorgebastida
Owner

Take a look at: https://github.com/jorgebastida/awslogs/compare/feature/epic-streams It did the trick for me with a group of around 300 streams (I get that 300 is not thousands!). Not all of them had logs in the date range I was querying, so the number of streams awslogs needed to query for logs was smaller.

Completely useless benchmark:

  • master: 320 streams -> 33s
  • patch: 320 streams -> 41 streams -> 2s to get all logs

I get that this is not a solution per se, but it will help avoid wasting time querying streams with no useful information.

OT question: Is it the case that all of those streams have useful information in the date range?

@mfonsen
Author

mfonsen commented Jun 11, 2015

Hi,

thank you for your response. I'm sorry for the delay on my behalf.

I tried the upgraded version via pip install. Unfortunately the results are the same: the process gets stuck trying to retrieve all the streams.

All of the updated streams have useful information in the date range. There are at least thousands of streams that don't have updates. This would make the OrderBy parameter very handy.

Inspired by your work, I wrote a proof of concept in Node.js using OrderBy. It seems to work; let's see if I have time to polish it. However, this week Amazon made it possible to subscribe to logs through Kinesis, so that might be the best solution.

-mfonsen

@jorgebastida
Owner

I've just pushed a new version (0.1.0) to PyPI which should fix this issue. It basically uses a new API available in boto3 which merges streams on their end.

https://pypi.python.org/pypi/awslogs/0.1.0
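For reference, the server-side merging described here corresponds to the FilterLogEvents API, which interleaves events from every stream in a group so the client never has to enumerate streams itself. A hedged sketch with boto3 (the helper name, group name, and time range are placeholders, not the actual awslogs implementation):

```python
def iter_group_events(client, group, start_ms, end_ms):
    """Yield (timestamp, message) for every event in the group's window,
    merged across all streams by the service itself."""
    paginator = client.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=group,
        startTime=start_ms,
        endTime=end_ms,
    ):
        for event in page["events"]:
            yield event["timestamp"], event["message"]

if __name__ == "__main__":
    import boto3  # kept inside the guard so the helper is usable offline in tests
    logs = boto3.client("logs")
    for ts, msg in iter_group_events(logs, "/aws/lambda/my-fn", 0, 2**40):
        print(ts, msg)
```

With this approach the thousands-of-streams problem disappears entirely, since stream enumeration happens inside CloudWatch rather than in the client.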
