## Get Size of S3 Objects

Let us go through how to get the size of S3 objects using `MaxKeys` and `Marker`. We will build on top of the code for getting the count of S3 objects.

* Here is the code used to get the count of objects in S3.

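```python
import boto3

# Assumes credentials for the bucket are available (e.g. via AWS_PROFILE)
s3_client = boto3.client('s3')

marker = ''
object_count = 0
while True:
    # Fetch up to 200 keys that sort after `marker`
    s3_objects = s3_client.list_objects(
        Bucket='itv-genlogs',
        Prefix='logs/year',
        Marker=marker,
        MaxKeys=200
    ).get('Contents')
    # An empty page means every object under the prefix has been listed
    if not s3_objects:
        break
    object_count += len(s3_objects)
    # The last key of this page is the starting point for the next page
    marker = s3_objects[-1]['Key']
    print(marker)

print(object_count)
```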

* Create the client with the appropriate profile.
* Invoke `list_objects` in pages using `MaxKeys` and `Marker`.
* Each entry in the `Contents` of the `list_objects` response contains `Size` along with `Key` and other details.
* Add up the `Size` of each entry to get the total size of the objects under the prefix. The size of each entry is in bytes, so you might want to convert the total to megabytes.
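
As a variation on the steps above, boto3 also offers paginators that handle the page tokens for you. The sketch below is not the approach used in this section: it assumes the `list_objects_v2` paginator and sums `Size` across every page. The names `get_prefix_size` and `bytes_to_megabytes` are hypothetical, and the client is passed in so the function itself does not need to import boto3.

```python
def get_prefix_size(s3_client, bucket, prefix):
    """Return the total size in bytes of all objects under `prefix`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    total_bytes = 0
    # Each page is one ListObjectsV2 response; a page with no matches has no 'Contents'
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for s3_object in page.get('Contents', []):
            total_bytes += s3_object['Size']
    return total_bytes


def bytes_to_megabytes(num_bytes):
    """Convert bytes to megabytes, treating 1 MB as 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)
```

With the client from this section, `get_prefix_size(s3_client, 'itv-genlogs', 'logs/year')` should return the same byte total as the `Marker`-based loop, assuming no objects are added under the prefix between the two runs.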

```python
import boto3
```

```python
import os
os.environ.setdefault('AWS_PROFILE', 'itvgenlogs')
```

```
'itvgenlogs'
```

```python
s3_client = boto3.client('s3')
```

```python
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs',
    Prefix='logs/year'
)
```

```python
s3_objects.keys()
```

```
dict_keys(['ResponseMetadata', 'IsTruncated', 'Marker', 'Contents', 'Name', 'Prefix', 'MaxKeys', 'EncodingType'])
```

```python
s3_objects['Contents'][0]
```

```
{'Key': 'logs/year=2021/month=01/day=19/gen_logs_s3-3-2021-01-19-23-25-20-5e0bdd17-4852-4923-8bda-907badd4f180',
 'LastModified': datetime.datetime(2021, 1, 19, 23, 26, 22, tzinfo=tzutc()),
 'ETag': '"63414c2398f48cd7c5affe0ae3af2132"',
 'Size': 24460,
 'StorageClass': 'STANDARD'}
```

```python
s3_objects['Contents'][0]['Size']
```

```
24460
```

```python
objects_size = 0.0

# Add up the size (in bytes) of each object in this single response
for s3_object in s3_objects['Contents']:
    objects_size += s3_object['Size']
```

```python
objects_size
```

```
15745760.0
```
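
A single `list_objects` call returns at most 1,000 keys, so the 15,745,760 bytes above may cover only the first page (the paginated total later in this section is larger). A minimal sketch, assuming you want to stop on the response's `IsTruncated` flag rather than waiting for an empty page, which saves one request. The generator name `iter_s3_objects` is hypothetical; like this section, it uses the last key of each page as the next `Marker`.

```python
def iter_s3_objects(s3_client, bucket, prefix, page_size=200):
    """Yield every object entry under `prefix`, one list_objects page at a time."""
    marker = ''
    while True:
        response = s3_client.list_objects(
            Bucket=bucket,
            Prefix=prefix,
            Marker=marker,
            MaxKeys=page_size
        )
        contents = response.get('Contents', [])
        yield from contents
        # IsTruncated is False once the last page has been returned
        if not response.get('IsTruncated') or not contents:
            break
        # The next page starts after the last key of this page
        marker = contents[-1]['Key']
```

For example, `sum(o['Size'] for o in iter_s3_objects(s3_client, 'itv-genlogs', 'logs/year'))` would total the sizes across all pages.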

```shell
pip install hurry.filesize
```

```python
from hurry.filesize import size
size(objects_size)
```

```
'15M'
```
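
If installing `hurry.filesize` is not an option, a small standard-library helper can produce a similar human-readable size. This `human_size` function is hypothetical, not part of `hurry.filesize`. Like the `'15M'` and `'20M'` outputs in this section, it treats 1 KB as 1024 bytes, but it keeps one decimal place instead of dropping the fraction.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 bytes), one decimal place."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if num_bytes < 1024:
            return f'{num_bytes:.1f} {unit}'
        num_bytes /= 1024
    return f'{num_bytes:.1f} PB'
```

For the totals in this section, `human_size(15745760.0)` returns `'15.0 MB'` and `human_size(21736595.0)` returns `'20.7 MB'`.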

```python
marker = ''
objects_size = 0.0
while True:
    # Fetch the next page of up to 200 objects after `marker`
    s3_objects = s3_client.list_objects(
        Bucket='itv-genlogs',
        Prefix='logs/year',
        Marker=marker,
        MaxKeys=200
    ).get('Contents')
    # Stop once a page comes back empty
    if not s3_objects:
        break
    # Add the size (in bytes) of every object on this page
    for s3_object in s3_objects:
        objects_size += s3_object['Size']
    marker = s3_objects[-1]['Key']
    print(marker)
```

```
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-03-50-42-3dee3808-1a7c-4d75-b435-b97d97c04d17
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-08-13-00-2ac79807-47d1-4d43-8929-9cd492e1be0d
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-12-40-24-359c1ed3-ed6e-4465-a5ac-cbbe04ea6fb6
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-16-57-37-37df71b3-9571-46b5-afa3-723d42323fd4
logs/year=2021/month=01/day=20/gen_logs_s3-3-2021-01-20-21-26-01-e132440e-9f75-4c02-a94d-8c6d09f2c087
logs/year=2021/month=01/day=21/gen_logs_s3-3-2021-01-21-01-52-25-8585ddba-c5e1-4b0d-8491-7c730dd09f6c
logs/year=2021/month=01/day=21/gen_logs_s3-3-2021-01-21-06-15-22-4d987e84-d7c2-45a4-abf1-0c0b3d646e1c
```

```python
objects_size
```

```
21736595.0
```

```python
size(objects_size)
```

```
'20M'
```