s3consumer.py - use boto3, add boto3 collection helpers
This extends S3Cursor with boto3 collection helpers.
s3cursor.filter_collection(coll) - returns a new collection filtered with the cursor's saved marker
s3cursor.persist_progress(coll) - iterates the collection, updating the saved marker after each item
s3cursor.each(coll) - combines the two helpers above to iterate a collection
Here is a sample vanilla consumer:
    for obj in S3Cursor('MyName').each(bucket_objects):
        my_handler(obj)
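The three helpers might fit together roughly as below. This is a minimal sketch under stated assumptions, not the code from this PR. MarkerCursor is a hypothetical stand-in for S3Cursor that keeps its marker in memory rather than wherever S3Cursor persists it. FakeCollection is a hypothetical offline stand-in for a boto3 bucket.objects collection. The real collection's filter() accepts a Marker keyword, and S3 starts listing after that key.

```python
from collections import namedtuple

# Hypothetical stand-in for an S3 ObjectSummary: only the key matters here.
Obj = namedtuple("Obj", "key")


class FakeCollection:
    """Hypothetical offline stand-in for bucket.objects (keys in sorted order)."""

    def __init__(self, keys, marker=None):
        self.keys = sorted(keys)
        self.marker = marker

    def filter(self, Marker=None):
        # Like S3 ListObjects, only list keys strictly after Marker.
        return FakeCollection(self.keys, Marker)

    def __iter__(self):
        for key in self.keys:
            if self.marker is None or key > self.marker:
                yield Obj(key)


class MarkerCursor:
    """Hypothetical sketch of the S3Cursor helpers; the marker lives in memory."""

    def __init__(self, name, marker=None):
        self.name = name
        self.marker = marker  # key of the last object fully processed

    def filter_collection(self, coll):
        # Return a new collection that resumes after the saved marker.
        if self.marker is None:
            return coll
        return coll.filter(Marker=self.marker)

    def persist_progress(self, coll):
        # Hand out each object; record its key only when the caller asks for the next one.
        for obj in coll:
            yield obj
            self.marker = obj.key

    def each(self, coll):
        return self.persist_progress(self.filter_collection(coll))


cursor = MarkerCursor("MyName")
first_run = [obj.key for obj in cursor.each(FakeCollection(["a", "b", "c"]))]
second_run = [obj.key for obj in cursor.each(FakeCollection(["a", "b", "c", "d"]))]
print(first_run, second_run, cursor.marker)  # ['a', 'b', 'c'] ['d'] d
```

Because the marker is saved only when the consumer asks for the next object, a handler that raises leaves the marker on the previous key. That object is then retried on the next run, which gives at-least-once processing. Breaking out of the loop early has the same effect on the last object handled.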
The motivation behind this helper is to leverage boto3 collections,
which support chaining (for example, bucket.objects.filter(Prefix=...)).
Because the consumer builds the collection itself, it controls the S3
connection details. This is also an experiment in an alternate API for
consumers.
Also, boto3 already lazy-loads S3 connections, so there is no longer any
need to lazy-load connections and buckets ourselves.
I confirmed that this works with the following script:
    #!/usr/bin/env python
    import boto3

    # None of these calls touch the network; boto3 defers requests until the collection is used.
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('internal_analytics_test')
    collection = bucket.objects.filter(Prefix='MUSKRAT')
Running the above with my network cable unplugged works fine -- boto3
doesn't make any connections until you actually list bucket contents or
fetch an object.
More details on accessing buckets with boto3:
https://boto3.readthedocs.io/en/latest/guide/migrations3.html#accessing-a-bucket
add s3collection_marker_each() helper for s3 consumers #3
sirsgriffin (Contributor), Sep 21, 2016:
What is the behavior if the obj/key is not a message? If the key points to something that acts as a namespace (something like a 'subdirectory'; not sure what to call it in S3), what happens?
ender672 (Author, Member), Sep 21, 2016:
I just created a test for this. Just to be sure that I have it right:
Given that you are monitoring the prefix "FOO/BAR/".
And the bucket is empty.
When you add object A with key "FOO/BAR/2016-09-21T13:53:23.594894"
And you add object B with key "FOO/BAR/BAZ/2016-09-21T13:54:37.164853"
And the muskrat consumer is invoked
Then we should process object A
And we should not process object B
Is this the right test? If so, the implementation in this PR fails -- it processes both A and B.
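A consumer could satisfy this test on the client side by skipping any key that has another "/" after the monitored prefix. This is a minimal sketch that assumes keys are plain strings. is_direct_child is a hypothetical helper and not necessarily how 698660e fixes it:

```python
def is_direct_child(key, prefix):
    """Return True if key sits directly under prefix, with no deeper '/' level."""
    if not key.startswith(prefix):
        return False
    remainder = key[len(prefix):]
    # An empty remainder is the prefix itself; a '/' means a nested "subdirectory".
    return remainder != "" and "/" not in remainder


prefix = "FOO/BAR/"
object_a = "FOO/BAR/2016-09-21T13:53:23.594894"
object_b = "FOO/BAR/BAZ/2016-09-21T13:54:37.164853"
to_process = [key for key in (object_a, object_b) if is_direct_child(key, prefix)]
print(to_process)  # ['FOO/BAR/2016-09-21T13:53:23.594894']
```

This check assumes the monitored prefix ends with "/". With a bare "FOO/BAR" prefix, the remainder of object A starts with "/", so A would be rejected as well.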
ender672 (Author, Member), Sep 21, 2016:
This is fixed in 698660e
ender672 (Author, Member), Sep 21, 2016:
Is satisfying this test the behavior that we want?
When a consumer is monitoring "FOO/BAR", do we want to ignore "FOO/BAR/BAZ/2016-09-21T13:54:37.164853"?
ender672 (Author, Member), Sep 21, 2016:
One more thing to point out: this PR doesn't pass the Delimiter parameter when listing S3 entries, so the collection shouldn't yield any prefix (pseudo-directory) entries.
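For comparison, S3 can do the same filtering server-side. When ListObjects is called with Delimiter='/', any key that contains the delimiter after the prefix is rolled up into a CommonPrefixes entry instead of being returned in Contents. The boto3 bucket.objects collection yields only the Contents entries. So bucket.objects.filter(Prefix='FOO/BAR/', Delimiter='/') should yield object A but not object B. The function below is only an offline simulation of that grouping rule, not boto3 itself. Its name is hypothetical, and it ignores pagination, MaxKeys and Marker:

```python
def simulate_list_objects(keys, prefix="", delimiter=None):
    """Mimic how S3 ListObjects splits keys into Contents and CommonPrefixes."""
    contents, common_prefixes = [], []
    for key in sorted(keys):  # code-point order, matching S3's UTF-8 binary key order
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter and delimiter in remainder:
            # Roll the key up into prefix + everything through the first delimiter.
            rolled_up = prefix + remainder.split(delimiter, 1)[0] + delimiter
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
        else:
            contents.append(key)
    return {"Contents": contents, "CommonPrefixes": common_prefixes}


object_a = "FOO/BAR/2016-09-21T13:53:23.594894"
object_b = "FOO/BAR/BAZ/2016-09-21T13:54:37.164853"
flat = simulate_list_objects([object_a, object_b], prefix="FOO/BAR/")
grouped = simulate_list_objects([object_a, object_b], prefix="FOO/BAR/", delimiter="/")
print(flat["Contents"])           # both keys, A then B
print(grouped["Contents"])        # ['FOO/BAR/2016-09-21T13:53:23.594894']
print(grouped["CommonPrefixes"])  # ['FOO/BAR/BAZ/']
```

A Delimiter listing hides nested objects completely. It only fits if consumers should never see keys below the monitored level.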
ender672 (Author, Member), Sep 21, 2016:
The test I mentioned is here:
ender672/muskrat@cca4b32