add s3collection_marker_each() helper for s3 consumers #3

Merged
merged 2 commits into from Sep 26, 2016

Commits on Sep 19, 2016

  1. s3consumer.py - use boto3, add boto3 collection helpers

    This extends S3Cursor with boto3 collection helpers.
    
    s3cursor.filter_collection(coll) - returns a new collection that starts at the saved marker
    s3cursor.persist_progress(coll) - updates the marker after each iteration
    s3cursor.each(coll) - combines the two helpers above to iterate a collection
    
    Here is a sample vanilla consumer:
    
      for obj in S3Cursor('MyName').each(bucket_objects):
          my_handler(obj)
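As a rough illustration of how the three helpers could fit together, here is a minimal sketch. It is not the code from s3consumer.py: `SketchCursor`, `FakeCollection`, `Obj`, and the dict used to hold the marker are all hypothetical stand-ins, chosen so the example runs without boto3 or network access. The only boto3 behavior it assumes is that a collection's `filter()` returns a new collection and that S3 lists keys in lexicographic order after the given `Marker`.

```python
from collections import namedtuple

# Hypothetical stand-in for a boto3 ObjectSummary; only .key is modeled.
Obj = namedtuple('Obj', 'key')


class FakeCollection:
    """In-memory stand-in for a boto3 collection such as bucket.objects."""

    def __init__(self, keys, marker=''):
        self._keys = sorted(keys)
        self._marker = marker

    def filter(self, Marker=''):
        # Like boto3, filter() returns a new collection instead of mutating.
        return FakeCollection(self._keys, Marker)

    def __iter__(self):
        # S3 returns keys in lexicographic order, strictly after the marker.
        return (Obj(k) for k in self._keys if k > self._marker)


class SketchCursor:
    """Hypothetical cursor; marker storage is assumed to be a plain dict."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def filter_collection(self, coll):
        # Resume from the last persisted key, or from the start if none.
        return coll.filter(Marker=self.store.get(self.name, ''))

    def persist_progress(self, coll):
        for obj in coll:
            yield obj
            # Runs only when the consumer asks for the next item, i.e.
            # after the handler for this object returned without raising.
            self.store[self.name] = obj.key

    def each(self, coll):
        return self.persist_progress(self.filter_collection(coll))


store = {}
objects = FakeCollection(['MUSKRAT/a', 'MUSKRAT/b', 'MUSKRAT/c'])
first_run = [o.key for o in SketchCursor('MyName', store).each(objects)]
second_run = [o.key for o in SketchCursor('MyName', store).each(objects)]
```

Because the marker is written only after control returns to the generator, a handler that raises leaves the marker at the previous object, so the failed object is seen again on the next run.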
    
    The motivation behind this helper is to leverage boto3 collections,
    which allow chaining.
    
    This makes it possible to write consumers that control the s3 connection
    details. It is also an experiment in creating an alternate API for
    consumers.
    
    Also, boto3 already lazy-loads s3 connections, so there is no longer a
    need to lazy-load connections & buckets.
    
    I confirmed that this works with the following script:
    
      #!/usr/bin/env python
    
      import boto3
    
      s3 = boto3.resource('s3')
      bucket = s3.Bucket('internal_analytics_test')
      collection = bucket.objects.filter(Prefix='MUSKRAT')
    
    Running the above with my network cable unplugged works fine -- boto3
    doesn't make any connections until you actually list bucket contents or
    fetch an object.
    
    More details on boto3 buckets are here:
    
    https://boto3.readthedocs.io/en/latest/guide/migrations3.html#accessing-a-bucket
    ender672 committed Sep 19, 2016

Commits on Sep 21, 2016

  1. s3consumer.py - tell boto3 to use a delimiter

    This maintains the existing behavior and discourages the use of
    top-level prefixes, which would not work anyway because they would
    break timestamp ordering.
    
    test_s3consumer.py - add test for muskrat entry w/ extra levels
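To illustrate why the delimiter matters, the sketch below imitates how an S3 listing treats `Prefix` and `Delimiter`. It is a pure-Python simulation, not a boto3 call: `list_keys` is a hypothetical helper, and the key layout (a timestamp directly under the `MUSKRAT/` prefix) is an assumption made for the example. With boto3 the equivalent request would presumably be `bucket.objects.filter(Prefix='MUSKRAT/', Delimiter='/')`.

```python
def list_keys(keys, prefix='', delimiter=''):
    """Imitate an S3 listing; returns (contents, common_prefixes)."""
    contents, common_prefixes = [], []
    for key in sorted(keys):  # S3 lists keys in lexicographic order
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Keys with a further delimiter are rolled up, not returned.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Assumed key layout: entries are named by timestamp under the prefix.
keys = [
    'MUSKRAT/2016-09-19T10:00:00',
    'MUSKRAT/2016-09-21T08:00:00',
    'MUSKRAT/extra/2016-09-20T09:00:00',  # entry with an extra level
]

flat, _ = list_keys(keys, prefix='MUSKRAT/')
delimited, rolled_up = list_keys(keys, prefix='MUSKRAT/', delimiter='/')
```

Without the delimiter, the Sep 20 entry under `extra/` sorts after the Sep 21 entry, so a marker-based consumer would no longer see entries in timestamp order. With the delimiter, that entry is rolled up into a common prefix and left out of the listing.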
    ender672 committed Sep 21, 2016