List the existing shards in the Kinesis stream: https://docs.aws.amazon.com/cli/latest/reference/kinesis/list-shards.html. The output shows each shard's ID and starting sequence number, both of which are needed to generate a shard iterator. The shard iterator is in turn a required parameter for the Kinesis get-records action.

In [34]:
%%bash

aws kinesis list-shards \
--stream-name kinesis-twitter-stream                    

{
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "HashKeyRange": {
                "StartingHashKey": "0",
                "EndingHashKey": "85070591730234615865843651857942052863"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628804485637199052498835133288337056183523870576214018"
            }
        },
        {
            "ShardId": "shardId-000000000001",
            "HashKeyRange": {
                "StartingHashKey": "85070591730234615865843651857942052864",
                "EndingHashKey": "170141183460469231731687303715884105727"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628804485659499797697365756429872774456172232082194450"
            }
        },
        {
            "ShardId": "shardId-000000000002",
            "HashKeyRange": {
                "StartingHashKey": "170141183460469231731687303715884105728",
                "Endi



Get a shard iterator using the get-shard-iterator action: https://docs.aws.amazon.com/cli/latest/reference/kinesis/get-shard-iterator.html

**Note that a shard iterator expires 5 minutes after it is returned!**

A shard iterator specifies the shard position from which to start reading data records sequentially. The position is specified using the sequence number of a data record in a shard. We pass that sequence number to the command together with the shard ID; both can be obtained from the output of the list-shards cell above.

The --shard-iterator-type can also be set to LATEST to start reading just after the most recent record in the shard, so that only records added afterwards are returned.
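
The shard ID and starting sequence number passed to get-shard-iterator can be pulled out of the list-shards response programmatically rather than copied by hand. A minimal sketch, assuming the JSON response has already been parsed into a Python dict; the helper name `shard_start_positions` and the trimmed `sample` dict are illustrative and not part of the original notebook:

```python
def shard_start_positions(list_shards_response):
    """Map each ShardId to its StartingSequenceNumber (kept as a string)."""
    return {
        shard["ShardId"]: shard["SequenceNumberRange"]["StartingSequenceNumber"]
        for shard in list_shards_response["Shards"]
    }


# Trimmed copy of the first two shards from the list-shards output above.
sample = {
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628804485637199052498835133288337056183523870576214018"
            },
        },
        {
            "ShardId": "shardId-000000000001",
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628804485659499797697365756429872774456172232082194450"
            },
        },
    ]
}

positions = shard_start_positions(sample)
print(positions["shardId-000000000001"])
```
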

In [56]:
%%bash

 
aws kinesis get-shard-iterator \
--stream-name kinesis-twitter-stream \
--shard-iterator-type AT_SEQUENCE_NUMBER  --starting-sequence-number 49628804485659499797697365756429872774456172232082194450 \
--shard-id shardId-000000000001


# Alternative: generate the shard iterator with LATEST instead. We can then stream new records
# and use get-records in the next cell to retrieve only the records streamed after this point.

# aws kinesis get-shard-iterator \
# --stream-name kinesis-twitter-stream \
# --shard-iterator-type LATEST \
# --shard-id shardId-000000000001

{
    "ShardIterator": "AAAAAAAAAAFjXi7SwE13ybBWJPMLPegv//1TdQa8Ee0k75WIDHiijKNoKtnuO63PgStj/M8tsMJe/0otC8bzP1h0Z97/nKkzdHYbwEMPzcYALfJ2n5qVz7FrnZ3ZO5Dk6bmF/FBKtlXdwa4+EaamHtf2RmG5fIkTvjUuk8eguv0fyOPddCjAD+a4mFOnskjlhaNINZhVAvrPH33kjxpZMO0VoL/Mkms847SDTWy0B4zZd5DP64ykneCLuQ3wK/APMw/8IGvDOIw="
}
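
The rule that a sequence number is only needed for some iterator types can be made explicit in code. The sketch below builds the request parameters for a GetShardIterator call. The helper `shard_iterator_args` is hypothetical, and the sketch deliberately leaves out the AT_TIMESTAMP type, which additionally needs a timestamp:

```python
# Iterator types that must be accompanied by a starting sequence number.
NEEDS_SEQUENCE_NUMBER = {"AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"}
# Iterator types that need no extra positional argument.
NO_EXTRA_ARGUMENT = {"TRIM_HORIZON", "LATEST"}


def shard_iterator_args(stream_name, shard_id, iterator_type, sequence_number=None):
    """Build GetShardIterator request parameters (hypothetical helper)."""
    if iterator_type not in NEEDS_SEQUENCE_NUMBER | NO_EXTRA_ARGUMENT:
        raise ValueError(f"iterator type not supported by this sketch: {iterator_type}")
    args = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,
    }
    if iterator_type in NEEDS_SEQUENCE_NUMBER:
        if sequence_number is None:
            raise ValueError(f"{iterator_type} requires a starting sequence number")
        args["StartingSequenceNumber"] = sequence_number
    return args


latest_args = shard_iterator_args(
    "kinesis-twitter-stream", "shardId-000000000001", "LATEST"
)
print(latest_args)
```

Assuming boto3 is installed and credentials are configured, a dict like this could be passed to a Kinesis client as `client.get_shard_iterator(**latest_args)`.
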


We can get records from the iterator using the aws kinesis get-records action: https://docs.aws.amazon.com/cli/latest/reference/kinesis/get-records.html. The iterator value output by the cell above is passed as the --shard-iterator parameter. The command reads records sequentially, in sequence-number order, starting from the iterator position, and the JSON response is written to a file in the datasets folder.

In [57]:
%%bash 

# Empty the JSON file if it already holds data (the command below appends, which would otherwise corrupt the JSON)
SHARD_RECORDS_PATH="../datasets/outputs/kinesis/sample_shard_records.json"
> "$SHARD_RECORDS_PATH"

aws kinesis get-records \
--shard-iterator "AAAAAAAAAAFjXi7SwE13ybBWJPMLPegv//1TdQa8Ee0k75WIDHiijKNoKtnuO63PgStj/M8tsMJe/0otC8bzP1h0Z97/nKkzdHYbwEMPzcYALfJ2n5qVz7FrnZ3ZO5Dk6bmF/FBKtlXdwa4+EaamHtf2RmG5fIkTvjUuk8eguv0fyOPddCjAD+a4mFOnskjlhaNINZhVAvrPH33kjxpZMO0VoL/Mkms847SDTWy0B4zZd5DP64ykneCLuQ3wK/APMw/8IGvDOIw=" \
>> "$SHARD_RECORDS_PATH"
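
A single get-records call returns one batch of records together with a NextShardIterator value that points just past that batch. To keep reading, that value is fed into the next call. The sketch below shows the loop in Python, assuming a client object that exposes a boto3-style `get_records(ShardIterator=..., Limit=...)` method. `read_shard` and `FakeKinesisClient` are illustrative names, and the fake client exists only so the example runs without AWS access:

```python
def read_shard(client, shard_iterator, max_calls=5, limit=100):
    """Collect records by following NextShardIterator for up to max_calls requests."""
    records = []
    for _ in range(max_calls):
        response = client.get_records(ShardIterator=shard_iterator, Limit=limit)
        records.extend(response["Records"])
        shard_iterator = response.get("NextShardIterator")
        if shard_iterator is None:  # the shard is closed and fully read
            break
    return records


class FakeKinesisClient:
    """Stand-in for a real client; maps an iterator token to (records, next token)."""

    def __init__(self, pages):
        self._pages = pages

    def get_records(self, ShardIterator, Limit):
        records, next_token = self._pages[ShardIterator]
        return {"Records": records[:Limit], "NextShardIterator": next_token}


fake = FakeKinesisClient({
    "it-0": ([{"SequenceNumber": "1"}, {"SequenceNumber": "2"}], "it-1"),
    "it-1": ([{"SequenceNumber": "3"}], None),
})
collected = read_shard(fake, "it-0")
print([record["SequenceNumber"] for record in collected])
```

For an open shard NextShardIterator is never null, so `max_calls` is what bounds the loop here.
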

Read the shard records JSON into a Python dict, then inspect the records.

In [58]:

import json

SHARD_RECORDS_PATH="../datasets/outputs/kinesis/sample_shard_records.json"
with open(SHARD_RECORDS_PATH) as f:
    shard_data = json.load(f)



In [73]:
random_record = shard_data['Records'][-5]['Data']
print(random_record)

eyJjcmVhdGVkX2F0IjogIkZyaSBBcHIgMjIgMDA6NDg6MjQgKzAwMDAgMjAyMiIsICJoYW5kbGUiOiAiS1RheWl0aSIsICJ0ZXh0IjogIkhleSBUaGVyZSEgSGVyZSdzIGEgTmV3IFBvc3QgT246IFRoZSBUaHVuZGVyJ3MgT2tsYWhvbWEgU3Bpcml0IHwgIGh0dHBzOi8vdC5jby8wTEtiQ0VGMGVPLi4uLkNoZWNrIGl0IG91dC4gaHR0cHM6Ly90LmNvL1VGekhPVE40OXMgICNOQkEgI05ITCAgI05GTCBodHRwczovL3QuY28vZUJZUEdva1YySyIsICJmYXZvdXJpdGVfY291bnQiOiAwLCAicmV0d2VldF9jb3VudCI6IDAsICJyZXR3ZWV0ZWQiOiBmYWxzZSwgImZvbGxvd2Vyc19jb3VudCI6IDIsICJmcmllbmRzX2NvdW50IjogMjgsICJsb2NhdGlvbiI6IG51bGwsICJsYW5nIjogbnVsbH0=


In [74]:
import base64
json.loads(base64.b64decode(random_record).decode('utf-8'))

{'created_at': 'Fri Apr 22 00:48:24 +0000 2022',
 'handle': 'KTayiti',
 'text': "Hey There! Here's a New Post On: The Thunder's Oklahoma Spirit |  https://t.co/0LKbCEF0eO....Check it out. https://t.co/UFzHOTN49s  #NBA #NHL  #NFL https://t.co/eBYPGokV2K",
 'favourite_count': 0,
 'retweet_count': 0,
 'retweeted': False,
 'followers_count': 2,
 'friends_count': 28,
 'location': None,
 'lang': None}
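
The same base64 and JSON decoding can be applied to every record in the response rather than to one at a time. A minimal sketch, assuming the input has the shape of the get-records CLI output, where each record's Data field is a base64 string. `decode_records` and the locally built `example` payload are illustrative:

```python
import base64
import json


def decode_records(get_records_response):
    """Decode the base64 'Data' field of every record into a Python dict."""
    return [
        json.loads(base64.b64decode(record["Data"]).decode("utf-8"))
        for record in get_records_response["Records"]
    ]


# Illustrative payload built locally; real input would be the shard_data dict loaded earlier.
payload = json.dumps({"handle": "KTayiti", "retweet_count": 0}).encode("utf-8")
example = {"Records": [{"Data": base64.b64encode(payload).decode("ascii")}]}

tweets = decode_records(example)
print(tweets[0]["handle"])
```
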