List the existing shards in the Kinesis stream with the list-shards action (https://docs.aws.amazon.com/cli/latest/reference/kinesis/list-shards.html). The output shows each shard's ID and starting sequence number. Both are needed to generate a shard iterator, which in turn is a required parameter of the kinesis get-records action.

In [12]:
%%bash

aws kinesis list-shards \
--stream-name kinesis-twitter-stream                    

{
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "HashKeyRange": {
                "StartingHashKey": "0",
                "EndingHashKey": "85070591730234615865843651857942052863"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628678101426762247045407541007456333915361328442114050"
            }
        },
        {
            "ShardId": "shardId-000000000001",
            "HashKeyRange": {
                "StartingHashKey": "85070591730234615865843651857942052864",
                "EndingHashKey": "170141183460469231731687303715884105727"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628678101449062992243938164148992052188009689948094482"
            }
        },
        {
            "ShardId": "shardId-000000000002",
            "HashKeyRange": {
                "StartingHashKey": "170141183460469231731687303715884105728",
                "Endi

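As a rough illustration of how the two values needed in the next step could be pulled out of this output programmatically, the Python sketch below parses a list-shards response and collects each shard's ID and starting sequence number. The `list_shards_output` string is a hand-copied excerpt of the first two shards shown above, and `shard_start_positions` is a hypothetical helper name, not part of the AWS CLI or SDK.

```python
import json

# Excerpt of the list-shards output above (first two shards only).
list_shards_output = """
{
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "HashKeyRange": {
                "StartingHashKey": "0",
                "EndingHashKey": "85070591730234615865843651857942052863"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628678101426762247045407541007456333915361328442114050"
            }
        },
        {
            "ShardId": "shardId-000000000001",
            "HashKeyRange": {
                "StartingHashKey": "85070591730234615865843651857942052864",
                "EndingHashKey": "170141183460469231731687303715884105727"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49628678101449062992243938164148992052188009689948094482"
            }
        }
    ]
}
"""


def shard_start_positions(response):
    """Map each ShardId to the StartingSequenceNumber of that shard."""
    return {
        shard["ShardId"]: shard["SequenceNumberRange"]["StartingSequenceNumber"]
        for shard in response["Shards"]
    }


positions = shard_start_positions(json.loads(list_shards_output))
for shard_id, sequence_number in positions.items():
    print(shard_id, sequence_number)
```

Sequence numbers are kept as strings here because that is how they appear in the JSON and how the CLI expects them back.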


Get a shard iterator using the get-shard-iterator action: https://docs.aws.amazon.com/cli/latest/reference/kinesis/get-shard-iterator.html
Note that the iterator expires 5 minutes after it is returned.

A shard iterator specifies the position in the shard from which to start reading data records sequentially. With the AT_SEQUENCE_NUMBER iterator type used here, that position is given as the sequence number of a data record in the shard. We pass the sequence number and the shard ID as arguments to the command; both can be taken from the list-shards output in the cell above.

The --shard-iterator-type can also be set to LATEST to start reading just after the most recent record in the shard, so that only records added afterwards are returned.

In [61]:
%%bash

 
aws kinesis get-shard-iterator \
--stream-name kinesis-twitter-stream \
--shard-iterator-type AT_SEQUENCE_NUMBER \
--starting-sequence-number 49628678101449062992243938164148992052188009689948094482 \
--shard-id shardId-000000000001


# Alternative: generate the shard iterator with LATEST (commented out below).
# After streaming new records, get-records in the next cell then returns only those new records.

# aws kinesis get-shard-iterator \
# --stream-name kinesis-twitter-stream \
# --shard-iterator-type LATEST \
# --shard-id shardId-000000000001

{
    "ShardIterator": "AAAAAAAAAAH9LVFqeiNJkoe5M+D4J3W018CHWD2W6+/FC9Mf3vdBGncpW8jmNOJA1kb0zA9z/6ouFWbKd703J64Ir160jWrE4jK2fbsbZXUYCynrUhiBT6GzV4/hPyC/dcqEW8oasCVUTsbKq/uWZs/IiuvflX+bHg1GiHGfuNtHq8hx3XJx6WuCacGbN8rHXuB1WZTuqBHicptQT7RKmd27SQU3cB6HxKOQj/kI0VJgytQ/76p2gTxsSpjGX5ps0oS1yFBhjh8="
}
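
The same request could be issued from Python instead of the CLI. The sketch below only builds the keyword arguments for a boto3-style Kinesis `get_shard_iterator` call, assuming boto3 is the SDK in use. `shard_iterator_request` is a hypothetical helper, and the actual call is left in a comment because it needs AWS credentials and the live stream.

```python
def shard_iterator_request(stream_name, shard_id, sequence_number=None):
    """Build keyword arguments for get_shard_iterator.

    With a sequence number, reading starts at that record (AT_SEQUENCE_NUMBER).
    Without one, reading starts just after the most recent record (LATEST).
    """
    request = {"StreamName": stream_name, "ShardId": shard_id}
    if sequence_number is None:
        request["ShardIteratorType"] = "LATEST"
    else:
        request["ShardIteratorType"] = "AT_SEQUENCE_NUMBER"
        request["StartingSequenceNumber"] = sequence_number
    return request


at_sequence = shard_iterator_request(
    "kinesis-twitter-stream",
    "shardId-000000000001",
    "49628678101449062992243938164148992052188009689948094482",
)
latest = shard_iterator_request("kinesis-twitter-stream", "shardId-000000000001")
print(at_sequence)
print(latest)

# Assumed usage with boto3 (not run here; requires credentials and the stream):
# import boto3
# client = boto3.client("kinesis")
# shard_iterator = client.get_shard_iterator(**at_sequence)["ShardIterator"]
```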


We can read records through the iterator with the aws kinesis get-records action (https://docs.aws.amazon.com/cli/latest/reference/kinesis/get-records.html). The iterator value output by the cell above is passed as the --shard-iterator parameter. The command returns records in sequence-number order, starting from the iterator position, and we write the JSON output to a file in the datasets folder.

In [71]:
%%bash 

# Empty the output file first: the command below appends, and appending to old data would produce invalid JSON
SHARD_RECORDS_PATH=../datasets/outputs/sample_shard_records.json
> "$SHARD_RECORDS_PATH"

aws kinesis get-records \
--shard-iterator "AAAAAAAAAAH9LVFqeiNJkoe5M+D4J3W018CHWD2W6+/FC9Mf3vdBGncpW8jmNOJA1kb0zA9z/6ouFWbKd703J64Ir160jWrE4jK2fbsbZXUYCynrUhiBT6GzV4/hPyC/dcqEW8oasCVUTsbKq/uWZs/IiuvflX+bHg1GiHGfuNtHq8hx3XJx6WuCacGbN8rHXuB1WZTuqBHicptQT7RKmd27SQU3cB6HxKOQj/kI0VJgytQ/76p2gTxsSpjGX5ps0oS1yFBhjh8=" \
>> $SHARD_RECORDS_PATH
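
A single get-records call returns one batch of records together with a NextShardIterator value that points just past the last record returned. As a sketch of how a shard could be read batch by batch, the function below follows NextShardIterator for a bounded number of calls. It assumes a client object exposing a boto3-style `get_records(ShardIterator=...)` method. Both `read_batches` and the `FakeKinesisClient` stand-in (used so the example runs without AWS access) are hypothetical names.

```python
def read_batches(client, shard_iterator, max_calls=5):
    """Collect records by following NextShardIterator for at most max_calls calls.

    The bound matters: on an open shard the service keeps handing back a new
    iterator even when a batch is empty, so an unbounded loop would not stop.
    """
    records = []
    for _ in range(max_calls):
        if shard_iterator is None:
            break
        response = client.get_records(ShardIterator=shard_iterator)
        records.extend(response["Records"])
        shard_iterator = response.get("NextShardIterator")
    return records


class FakeKinesisClient:
    """In-memory stand-in for illustration: serves records in small batches.

    The iterator here is just a list index encoded as a string, and the
    NextShardIterator key is omitted once the records run out.
    """

    def __init__(self, records, batch_size=2):
        self._records = records
        self._batch_size = batch_size

    def get_records(self, ShardIterator):
        start = int(ShardIterator)
        end = start + self._batch_size
        response = {"Records": self._records[start:end]}
        if end < len(self._records):
            response["NextShardIterator"] = str(end)
        return response


fake = FakeKinesisClient([{"SequenceNumber": str(n)} for n in range(5)])
all_records = read_batches(fake, "0")
print([record["SequenceNumber"] for record in all_records])
```

With a real boto3 client the Data field of each record arrives as raw bytes rather than the base64 text that the CLI prints, so the decoding step would differ.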

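Read the shard records JSON file into a Python dict and inspect the records. The Data field of each record is base64-encoded in the CLI output, so it has to be decoded before it can be parsed as JSON.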

In [72]:

import json


with open("../datasets/outputs/sample_shard_records.json") as f:
    shard_data = json.load(f)



In [75]:
first_record = shard_data['Records'][0]['Data']
print(first_record)

eyJkYXkiOiAxOCwgIm1vbnRoIjogNCwgInllYXIiOiAyMDIyLCAidGltZSI6ICIwMTowNTowMyIsICJoYW5kbGUiOiAiYm5kd2R0aCIsICJ0ZXh0IjogIkB2aXJnaW5tZWRpYSBFeGNlbGxlbnQ6IG15IGJyb2FkYmFuZCAjYmFuZHdpZHRoIGluICNOb3J0aGFudHMgaXMgNjEwTWJwcyBkb3duIGFuZCA0ME1icHMgdXAhICNNNjAwRmlicmUgI3N1cGVyZmFzdCBodHRwczovL3QuY28vV0k4WDRqZnZLQyIsICJmYXZvdXJpdGVfY291bnQiOiA0LCAicmV0d2VldF9jb3VudCI6IDAsICJyZXR3ZWV0ZWQiOiBmYWxzZSwgImZvbGxvd2Vyc19jb3VudCI6IDEyLCAiZnJpZW5kc19jb3VudCI6IDQsICJsb2NhdGlvbiI6ICJOb3J0aGFtcHRvbnNoaXJlLCBFbmdsYW5kIiwgImxhbmciOiBudWxsfQ==


In [76]:
import base64
json.loads(base64.b64decode(first_record).decode('utf-8'))

{'day': 18,
 'month': 4,
 'year': 2022,
 'time': '01:05:03',
 'handle': 'bndwdth',
 'text': '@virginmedia Excellent: my broadband #bandwidth in #Northants is 610Mbps down and 40Mbps up! #M600Fibre #superfast https://t.co/WI8X4jfvKC',
 'favourite_count': 4,
 'retweet_count': 0,
 'retweeted': False,
 'followers_count': 12,
 'friends_count': 4,
 'location': 'Northamptonshire, England',
 'lang': None}
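
The same decoding step can be applied to every record rather than just the first. A minimal sketch, assuming every record's Data field holds base64-encoded UTF-8 JSON as in the record above. `decode_records` is a hypothetical helper, and `sample_shard_data` is a small made-up stand-in for the file contents, with field names borrowed from the decoded record.

```python
import base64
import json


def decode_records(shard_data):
    """Decode the base64 Data field of every record and parse it as JSON."""
    return [
        json.loads(base64.b64decode(record["Data"]).decode("utf-8"))
        for record in shard_data["Records"]
    ]


def encode_payload(payload):
    """Encode a dict the way it appears in the CLI output (base64 text)."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


# Made-up stand-in for the contents of sample_shard_records.json.
sample_shard_data = {
    "Records": [
        {"Data": encode_payload({"handle": "bndwdth", "retweet_count": 0})},
        {"Data": encode_payload({"handle": "example_user", "retweet_count": 3})},
    ]
}

decoded = decode_records(sample_shard_data)
for tweet in decoded:
    print(tweet["handle"], tweet["retweet_count"])
```

In the notebook itself, `decode_records(shard_data)` would be called on the dict loaded from the JSON file.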