## Get the number of S3 objects

Let us go through the details of how to get the number of objects in an S3 bucket. We will see how **Marker** is used to paginate `list_objects` output using boto3.

* One way to get S3 object metadata from a given bucket is to use `list_objects`.
* However, `list_objects` returns metadata for at most 1000 objects per call.
* We need to paginate using `Marker`, which tells S3 to start listing after the given key, and iterate until we get details about all the objects.

Here are the steps we can follow to get the number of S3 objects within a bucket.
* Create an S3 client with the appropriate profile.
* Invoke `list_objects` repeatedly, passing the last key of the previous response as `Marker`, until you get details about all the objects.
* Get the number of elements in `Contents` and add it to the object count. We can break the loop when the `Contents` list has fewer elements than `MaxKeys` (1000 by default) or when `Contents` does not exist in the response.

In [1]:
import boto3

In [2]:
import os
os.environ.setdefault('AWS_PROFILE', 'itvgenlogs')

'itvgenlogs'

In [3]:
s3_client = boto3.client('s3')

In [4]:
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs',
    Prefix='logs/year'
)

In [5]:
s3_objects.keys()

dict_keys(['ResponseMetadata', 'IsTruncated', 'Marker', 'Contents', 'Name', 'Prefix', 'MaxKeys', 'EncodingType'])

In [6]:
s3_objects['Marker']

''

In [7]:
s3_objects['MaxKeys']

1000

In [8]:
len(s3_objects['Contents'])

1000

In [9]:
s3_objects['Contents'][-1]['Key']

'logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-21-26-01-e132440e-9f75-4c02-a94d-8c6d09f2c087'

In [10]:
marker = s3_objects['Contents'][-1]['Key']

In [11]:
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs',
    Prefix='logs/year',
    Marker=marker
)

In [12]:
s3_objects['Marker']

'logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-21-26-01-e132440e-9f75-4c02-a94d-8c6d09f2c087'

In [14]:
len(s3_objects['Contents'])

380
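
The response keys listed earlier also include `IsTruncated`, which S3 sets to `True` while more objects remain to be listed. As a minimal sketch (the helper name `next_marker` and the sample dictionaries are hypothetical, not part of boto3), the marker for the next call could be derived as follows. It assumes the documented S3 behavior that `NextMarker` is only returned when a `Delimiter` is passed, so the last key of the page is used otherwise.

```python
def next_marker(response):
    """Return the Marker for the next list_objects call, or None when done."""
    # IsTruncated is False once the listing is complete
    if not response.get('IsTruncated'):
        return None
    # NextMarker is only present when a Delimiter was passed;
    # otherwise fall back to the last key of the current page
    if 'NextMarker' in response:
        return response['NextMarker']
    return response['Contents'][-1]['Key']


# Hand-written sample responses for illustration (not real S3 output)
truncated_page = {
    'IsTruncated': True,
    'Contents': [{'Key': 'logs/a'}, {'Key': 'logs/b'}],
}
last_page = {'IsTruncated': False, 'Contents': [{'Key': 'logs/c'}]}

print(next_marker(truncated_page))
print(next_marker(last_page))
```

With a real response, `next_marker(s3_objects)` would return `None` for the second call above if its 380 objects are the last ones under the prefix.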

In [21]:
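marker = ''  # an empty marker starts from the first key
object_count = 0
while True:
    # Fetch the next page of up to 200 objects listed after `marker`
    s3_objects = s3_client.list_objects(
        Bucket='itv-genlogs',
        Prefix='logs/year',
        Marker=marker,
        MaxKeys=200
    ).get('Contents')
    # `Contents` is missing once there are no more objects to list
    if not s3_objects:
        break
    object_count += len(s3_objects)
    # The last key of this page is the marker for the next request
    marker = s3_objects[-1]['Key']
    print(marker)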

logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-03-50-42-3dee3808-1a7c-4d75-b435-b97d97c04d17
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-08-13-00-2ac79807-47d1-4d43-8929-9cd492e1be0d
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-12-40-24-359c1ed3-ed6e-4465-a5ac-cbbe04ea6fb6
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-16-57-37-37df71b3-9571-46b5-afa3-723d42323fd4
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-21-26-01-e132440e-9f75-4c02-a94d-8c6d09f2c087
logs/year=2021/month=01/day=21/gen_logs_s3-3-2021-01-21-01-52-25-8585ddba-c5e1-4b0d-8491-7c730dd09f6c
logs/year=2021/month=01/day=21/gen_logs_s3-3-2021-01-21-06-15-22-4d987e84-d7c2-45a4-abf1-0c0b3d646e1c


In [22]:
object_count

1380
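
The same loop can be wrapped in a reusable function. This is only a sketch: the function name `count_s3_objects` and its parameters are hypothetical, and it expects to be given a client that exposes `list_objects` in the way the boto3 S3 client does.

```python
def count_s3_objects(s3_client, bucket, prefix='', page_size=1000):
    """Count the objects under `prefix` by paginating with Marker."""
    marker = ''  # an empty marker starts from the first key
    object_count = 0
    while True:
        contents = s3_client.list_objects(
            Bucket=bucket,
            Prefix=prefix,
            Marker=marker,
            MaxKeys=page_size
        ).get('Contents')
        # `Contents` is missing once there are no more objects to list
        if not contents:
            break
        object_count += len(contents)
        # The last key of this page is the marker for the next request
        marker = contents[-1]['Key']
    return object_count


# Usage with the client created earlier (requires AWS credentials):
# count_s3_objects(s3_client, 'itv-genlogs', 'logs/year')
```

Called with the bucket and prefix used above, it should return the same count as the loop, provided the bucket contents have not changed in the meantime.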