## Reading Content from an S3 Object

Let us understand how we can read the content of an S3 object (file) using Python and boto3.
* Create an S3 client using the appropriate profile.
* Get an object key. We can use `list_objects` to list the keys; each call returns at most 1000 of them.
* Pick one of the keys and pass it to `get_object` along with the bucket name.
* The response contains a `Body` of type `StreamingBody`. Calling `read()` on it returns bytes, which we can decode to a string.
* We can then process the data using string manipulation functions as per our requirements.

In [1]:
import boto3

In [2]:
import os
os.environ.setdefault('AWS_PROFILE', 'itvgenlogs')

'itvgenlogs'

In [3]:
s3_client = boto3.client('s3')

In [None]:
s3_client.list_objects?

In [6]:
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs',
    Prefix='logs/year'
)

In [None]:
s3_objects

In [None]:
s3_objects['Contents']

In [9]:
len(s3_objects['Contents'])

1000
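
The count of 1000 above is the per-call limit, so it is not necessarily the total number of objects under the prefix. A minimal sketch of listing every key, assuming boto3's paginator for `list_objects_v2` (the newer variant of the listing call); `iter_object_keys` is a hypothetical helper name, not part of boto3:

```python
def iter_object_keys(s3_client, bucket, prefix=''):
    """Yield every object key under prefix, one page (up to 1000 keys) at a time."""
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches the prefix
        for s3_object in page.get('Contents', []):
            yield s3_object['Key']


# Usage with the client created earlier (requires AWS credentials):
# all_keys = list(iter_object_keys(s3_client, 'itv-genlogs', 'logs/year'))
# len(all_keys)
```

Because the helper is a generator, keys are fetched page by page only as they are consumed.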

In [10]:
s3_objects['Contents'][0]

{'Key': 'logs/year=2021/month=01/day=19/gen_logs_s3-3-2021-01-19-23-25-20-5e0bdd17-4852-4923-8bda-907badd4f180',
 'LastModified': datetime.datetime(2021, 1, 19, 23, 26, 22, tzinfo=tzutc()),
 'ETag': '"63414c2398f48cd7c5affe0ae3af2132"',
 'Size': 24460,
 'StorageClass': 'STANDARD'}

In [11]:
s3_objects['Contents'][0]['Key']

'logs/year=2021/month=01/day=19/gen_logs_s3-3-2021-01-19-23-25-20-5e0bdd17-4852-4923-8bda-907badd4f180'

In [12]:
s3_object_key = s3_objects['Contents'][0]['Key']

In [None]:
s3_client.get_object?

In [22]:
s3_object = s3_client.get_object(
    Bucket='itv-genlogs',
    Key=s3_object_key
)

In [16]:
type(s3_object)

dict

In [17]:
s3_object

{'ResponseMetadata': {'RequestId': '18DF74A6099C14E8',
  'HostId': 'Zc7O64jogDyT5G2UpD03lHMitBPl+f5+jYjTHLhw1XF253zz0BNK4ZpjA2pxALbcd+EZDMx5s3g=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Zc7O64jogDyT5G2UpD03lHMitBPl+f5+jYjTHLhw1XF253zz0BNK4ZpjA2pxALbcd+EZDMx5s3g=',
   'x-amz-request-id': '18DF74A6099C14E8',
   'date': 'Wed, 20 Jan 2021 23:28:49 GMT',
   'last-modified': 'Tue, 19 Jan 2021 23:26:22 GMT',
   'etag': '"63414c2398f48cd7c5affe0ae3af2132"',
   'accept-ranges': 'bytes',
   'content-type': 'application/octet-stream',
   'content-length': '24460',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'AcceptRanges': 'bytes',
 'LastModified': datetime.datetime(2021, 1, 19, 23, 26, 22, tzinfo=tzutc()),
 'ContentLength': 24460,
 'ETag': '"63414c2398f48cd7c5affe0ae3af2132"',
 'ContentType': 'application/octet-stream',
 'Metadata': {},
 'Body': <botocore.response.StreamingBody at 0x117d5a690>}
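
The response dictionary carries the object's metadata alongside `Body`, so the size and content type can be checked before the stream is read. A small sketch using only keys visible in the output above; `describe_response` and the `max_bytes` threshold are hypothetical, chosen purely for illustration:

```python
def describe_response(response, max_bytes=1024 * 1024):
    """Summarize a get_object response without touching its Body stream."""
    size = response['ContentLength']
    return {
        'size_bytes': size,
        'content_type': response['ContentType'],
        # True when the object is small enough to read() in one go
        'small_enough': size <= max_bytes,
    }


# Usage with the response fetched above:
# describe_response(s3_object)
```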

In [18]:
s3_object['Body']

<botocore.response.StreamingBody at 0x117d5a690>

In [None]:
help(s3_object['Body'])

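In [None]:
# read() returns the object's content as bytes and consumes the stream
s3_object['Body'].read()

In [None]:
# The stream can be read only once, so run get_object again before this cell
s3_object['Body'].read().decode('utf-8')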

In [24]:
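import os

import boto3

# Use the named AWS profile unless AWS_PROFILE is already set
os.environ.setdefault('AWS_PROFILE', 'itvgenlogs')

s3_client = boto3.client('s3')

# List up to 1000 objects whose keys start with the prefix
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs',
    Prefix='logs/year'
)

# Fetch the first object and decode its byte stream into a string
s3_object_key = s3_objects['Contents'][0]['Key']
s3_object = s3_client.get_object(
    Bucket='itv-genlogs',
    Key=s3_object_key
)

file_contents = s3_object['Body'].read().decode('utf-8')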
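
`read()` loads the whole object into memory, which is fine for a log file of about 24 KB. For larger objects, one possible alternative is to decode line by line with `StreamingBody.iter_lines()`. A minimal sketch; `iter_text_lines` is a hypothetical helper, and it assumes the object holds UTF-8 text:

```python
def iter_text_lines(body, encoding='utf-8'):
    """Yield decoded lines from a StreamingBody without reading it all at once."""
    for raw_line in body.iter_lines():
        # iter_lines() yields bytes with the line ending already stripped
        yield raw_line.decode(encoding)


# Usage (needs a fresh get_object response, since a Body can be read only once):
# s3_object = s3_client.get_object(Bucket='itv-genlogs', Key=s3_object_key)
# for line in iter_text_lines(s3_object['Body']):
#     print(line)
```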

In [25]:
file_records = file_contents.splitlines()

In [26]:
file_records[:3]

['22.160.191.22 - - [19/Jan/2021:18:24:20 -0800] "GET /departments HTTP/1.1" 200 1338 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"',
 '148.185.154.242 - - [19/Jan/2021:18:24:21 -0800] "GET /checkout HTTP/1.1" 200 326 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0"',
 '160.130.14.64 - - [19/Jan/2021:18:24:22 -0800] "GET /support HTTP/1.1" 200 749 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"']
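
Each record follows the common web server access log layout, so the fields can be pulled out with a regular expression. A sketch of one way to do it; `LOG_PATTERN` and `parse_record` are hypothetical names, the pattern is inferred only from the three records shown above, and the sample line is the first record with its user agent shortened:

```python
import re

# Captures the fields up to the response size; referrer and user agent are ignored
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_record(record):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(record)
    if match is None:
        return None
    fields = match.groupdict()
    fields['status'] = int(fields['status'])
    return fields


sample = ('22.160.191.22 - - [19/Jan/2021:18:24:20 -0800] '
          '"GET /departments HTTP/1.1" 200 1338 "-" "Mozilla/5.0"')
parsed = parse_record(sample)
print(parsed['path'], parsed['status'])  # prints: /departments 200
```

The same function can be applied to every record, for example `[parse_record(record) for record in file_records]`, with `None` marking any line that does not fit the pattern.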