This extends S3Cursor with boto3 collection helpers.
s3cursor.filter_collection(coll) - returns a new collection with the saved marker applied
s3cursor.persist_progress(coll) - updates the marker after each iteration
s3cursor.each(coll) - combines the two helpers above to iterate a collection
Here is a sample vanilla consumer:
for obj in S3Cursor('MyName').each(bucket_objects):
    my_handler(obj)
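For illustration, here is a minimal sketch of how the three helpers could fit together. It is not the actual S3Cursor code: S3CursorSketch, FakeCollection, FakeObject and the in-memory marker store are all hypothetical stand-ins, used so the sketch runs without AWS access. The only boto3 behavior it assumes is that a collection's filter() returns a new collection and that listing with a Marker yields the keys that sort after it.

```python
class FakeObject:
    """Hypothetical stand-in for an S3 object summary; only .key is modeled."""

    def __init__(self, key):
        self.key = key


class FakeCollection:
    """Hypothetical stand-in for a boto3 collection such as bucket.objects."""

    def __init__(self, keys, marker=None):
        self._keys = sorted(keys)
        self._marker = marker

    def filter(self, Marker=None, **kwargs):
        # Return a new collection instead of mutating this one.
        return FakeCollection(self._keys, marker=Marker)

    def __iter__(self):
        for key in self._keys:
            # Only keys that sort strictly after the marker are listed.
            if self._marker is None or key > self._marker:
                yield FakeObject(key)


class S3CursorSketch:
    """Sketch of the helper API; markers live in memory (an assumption)."""

    _markers = {}

    def __init__(self, name):
        self.name = name

    def filter_collection(self, coll):
        # Resume after the saved marker, if this cursor has one.
        marker = self._markers.get(self.name)
        return coll.filter(Marker=marker) if marker else coll

    def persist_progress(self, coll):
        for obj in coll:
            yield obj
            # Runs only when the consumer asks for the next object,
            # i.e. after its loop body finished without raising.
            self._markers[self.name] = obj.key

    def each(self, coll):
        return self.persist_progress(self.filter_collection(coll))


if __name__ == "__main__":
    bucket_objects = FakeCollection(["a", "b", "c"])
    for obj in S3CursorSketch("MyName").each(bucket_objects):
        print(obj.key)
```

Because the marker is written after the yield, an exception in the consumer leaves the marker at the last object that was fully handled, so the next run retries the failed object.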
The motivation behind this helper is to leverage boto3 collections,
which allow chaining.
This lets consumers control the S3 connection details themselves. It is
also an experiment in creating an alternate API for consumers.
Also, boto3 already lazy-loads S3 connections, so we no longer need to
lazy-load connections and buckets ourselves.
I confirmed that this works with the following script:
#!/usr/bin/env python
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('internal_analytics_test')
collection = bucket.objects.filter(Prefix='MUSKRAT')
Running the above with my network cable unplugged works fine -- boto3
doesn't make any connections until you actually list bucket contents or
fetch an object.
More details on boto3 buckets here:
https://boto3.readthedocs.io/en/latest/guide/migrations3.html#accessing-a-bucket