# Amazon Kinesis Data Streams

Amazon Kinesis Data Streams ingests large volumes of data in real time, stores the data durably, and makes it available for consumption. The unit of data stored by Kinesis Data Streams is a data record. A data stream represents a group of data records, and the records in a stream are distributed across shards.

A shard is a sequence of data records in a stream. When you create a stream, you specify the number of shards for the stream. The total capacity of a stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a stream as needed. Keep in mind that you are charged on a per-shard basis, so adding shards raises cost along with capacity.
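To make "the sum of the capacities of its shards" concrete, the sketch below multiplies per-shard limits by the shard count. The limits used here (1 MB/s or 1,000 records/s for writes, and 2 MB/s for reads, per shard) are the commonly documented values for provisioned streams and are an assumption of this example; check the current Kinesis quotas before relying on them. The function name `stream_capacity` is hypothetical.

```python
# Per-shard limits for provisioned streams (assumed; verify against current AWS quotas).
WRITE_MB_PER_SEC_PER_SHARD = 1
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_MB_PER_SEC_PER_SHARD = 2


def stream_capacity(shard_count):
    """Return the aggregate capacity of a stream with `shard_count` shards."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return {
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SEC_PER_SHARD,
        "write_records_per_sec": shard_count * WRITE_RECORDS_PER_SEC_PER_SHARD,
        "read_mb_per_sec": shard_count * READ_MB_PER_SEC_PER_SHARD,
    }


# Capacity of a two-shard stream under the assumed limits
print(stream_capacity(2))
```

Under these assumptions, doubling the shard count doubles both the write and read throughput, and also doubles the per-shard portion of the bill.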

Producers continually push data to Kinesis Data Streams, and consumers process the data in real time. Consumers (such as a custom application running on Amazon EC2 or an Amazon Kinesis Data Firehose delivery stream) can store their results using an AWS service such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.
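As a rough illustration of the producer side, the sketch below packages a few dictionaries into the entry format that the boto3 `put_records` call expects (`Data` as bytes plus a `PartitionKey`) and sends them in one batch. The helper names `build_records` and `send_reviews`, and the record fields `review_id` and `review_body`, are hypothetical and chosen only for this example. The client is passed in as an argument so the helper can be used with the `kinesis` client created in this notebook.

```python
import json


def build_records(reviews, partition_key_field="review_id"):
    """Convert dicts into the entry format expected by Kinesis `put_records`."""
    return [
        {
            # Kinesis stores the payload as opaque bytes
            "Data": json.dumps(review).encode("utf-8"),
            # The partition key determines which shard receives the record
            "PartitionKey": str(review[partition_key_field]),
        }
        for review in reviews
    ]


def send_reviews(kinesis_client, stream_name, reviews):
    """Send one batch of reviews and return the number of records that failed."""
    response = kinesis_client.put_records(
        StreamName=stream_name,
        Records=build_records(reviews),
    )
    # `put_records` can partially fail, so the caller should check this count
    return response.get("FailedRecordCount", 0)
```

Once the stream is active, a call such as `send_reviews(kinesis, stream_name, [{"review_id": "R1", "review_body": "Great product"}])` would push a single record. A production producer would also retry the failed entries and keep each batch within the per-request limits.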

<img src="img/kinesis_data_stream_docs.png" width="90%" align="left">

In [1]:
import boto3
import sagemaker
import pandas as pd
import json

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)
kinesis = boto3.Session().client(service_name='kinesis', region_name=region)
sts = boto3.Session().client(service_name='sts', region_name=region)

# Create a Kinesis Data Stream

In [2]:
%store -r stream_name

In [3]:
try:
    stream_name
except NameError:
    print('+++++++++++++++++++++++++++++++')
    print('[ERROR] Please run all previous notebooks in this section before you continue.')
    print('+++++++++++++++++++++++++++++++')

In [4]:
print(stream_name)

dsoaws-kinesis-data-stream


In [5]:
shard_count = 2
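
To see what the shard count means for individual records, the sketch below estimates which shard a partition key would land in. Kinesis maps each partition key to a 128-bit hash key using MD5, and each shard owns a contiguous range of that key space. The example assumes the key space is split evenly across shards, which matches the `HashKeyRange` of the first shard in the `describe_stream` output in this notebook but is not guaranteed after resharding. The function name `shard_index_for_key` and the sample keys are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_index_for_key(partition_key, shard_count):
    """Estimate the 0-based shard index for a partition key, assuming an even split."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder of the key space
    return min(hash_key // range_size, shard_count - 1)


for key in ["customer-1", "customer-2", "customer-3"]:
    print(key, "->", shard_index_for_key(key, shard_count=2))
```

Because the mapping depends only on the partition key, records that share a key always go to the same shard, which preserves their relative order but can create a hot shard if one key dominates the traffic.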

In [6]:
from botocore.exceptions import ClientError

try: 
    response = kinesis.create_stream(
        StreamName=stream_name, 
        ShardCount=shard_count
    )
    print('Data Stream {} successfully created.'.format(stream_name))
    print(json.dumps(response, indent=4, sort_keys=True, default=str))
    
except ClientError as e:
    if e.response['Error']['Code'] == 'ResourceInUseException':
        print('Data Stream {} already exists.'.format(stream_name))
    else:
        # Re-raise so later cells do not run against a stream that was never created
        print('Unexpected error: {}'.format(e))
        raise

Data Stream dsoaws-kinesis-data-stream successfully created.
{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "content-length": "0",
            "content-type": "application/x-amz-json-1.1",
            "date": "Sat, 26 Sep 2020 20:51:14 GMT",
            "x-amz-id-2": "B8xm/QIXjNiaoLFKcAZoIJvfQA0YB52BzSouvPD6M/dccIDS5b1inX1SdaWr8ZnLqKaFLdI2zSrPpEMQC3HroFeV/f02xPbhNjUQdeoGVJs=",
            "x-amzn-requestid": "fe89d536-3843-e7bd-a960-7934c5453f7c"
        },
        "HTTPStatusCode": 200,
        "RequestId": "fe89d536-3843-e7bd-a960-7934c5453f7c",
        "RetryAttempts": 0
    }
}


In [7]:
import time

status = ''
while status != 'ACTIVE':
    # Poll the stream description until creation has finished
    r = kinesis.describe_stream(StreamName=stream_name)
    description = r.get('StreamDescription')
    status = description.get('StreamStatus')
    if status != 'ACTIVE':
        time.sleep(5)
    
print('Stream {} is active'.format(stream_name))

Stream dsoaws-kinesis-data-stream is active
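
The polling loop above runs until the stream reports `ACTIVE`, so it would spin forever if creation stalled. A minimal sketch of the same idea with a timeout is shown below. The function name `wait_until_active` and its default values are assumptions made for this example, not part of boto3.

```python
import time


def wait_until_active(kinesis_client, stream_name, poll_seconds=5, timeout_seconds=300):
    """Poll `describe_stream` until the stream is ACTIVE or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = kinesis_client.describe_stream(StreamName=stream_name)
        description = response["StreamDescription"]
        status = description["StreamStatus"]
        if status == "ACTIVE":
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Stream {} is still {} after {} seconds".format(
                    stream_name, status, timeout_seconds
                )
            )
        time.sleep(poll_seconds)
```

Calling `wait_until_active(kinesis, stream_name)` returns the stream description once the stream is ready. boto3 also provides a built-in waiter, `kinesis.get_waiter('stream_exists')`, which may be a simpler choice if you do not need custom handling.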


## _This may take a minute.  Please be patient._

In [8]:
stream_response = kinesis.describe_stream(
    StreamName=stream_name
)

print(json.dumps(stream_response, indent=4, sort_keys=True, default=str))

{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "content-length": "865",
            "content-type": "application/x-amz-json-1.1",
            "date": "Sat, 26 Sep 2020 20:51:24 GMT",
            "x-amz-id-2": "aWFO4dWyTL90TrnTWFO+zRK+H1LHvaniyizaSCxqUz7E0zNfPIp1fVlqNQ5j4EyXjJ9HrhAoIUNgpKsvGJRv1lzrd6gpAE+Z45Qy/YveuE8=",
            "x-amzn-requestid": "e98bc2d7-6610-8350-be62-6edb9b165b91"
        },
        "HTTPStatusCode": 200,
        "RequestId": "e98bc2d7-6610-8350-be62-6edb9b165b91",
        "RetryAttempts": 0
    },
    "StreamDescription": {
        "EncryptionType": "NONE",
        "EnhancedMonitoring": [
            {
                "ShardLevelMetrics": []
            }
        ],
        "HasMoreShards": false,
        "RetentionPeriodHours": 24,
        "Shards": [
            {
                "HashKeyRange": {
                    "EndingHashKey": "170141183460469231731687303715884105727",
                    "StartingHashKey": "0"
                },
   

In [9]:
stream_arn = stream_response['StreamDescription']['StreamARN']
print(stream_arn)

arn:aws:kinesis:us-west-2:085964654406:stream/dsoaws-kinesis-data-stream


In [10]:
%store stream_arn

Stored 'stream_arn' (str)


# Review Kinesis Data Stream

In [11]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/kinesis/home?region={}#/streams/details/{}/details">Kinesis Data Stream</a></b>'.format(region, stream_name)))


# Store Variables for the Next Notebooks

In [12]:
%store

Stored variables and their in-db values:
auto_ml_job_name                                      -> 'automl-dm-26-16-00-25'
autopilot_endpoint_name                               -> 'automl-dm-ep-26-16-21-49'
autopilot_train_s3_uri                                -> 's3://sagemaker-us-west-2-085964654406/data/amazon
balance_dataset                                       -> True
experiment_name                                       -> 'Amazon-Customer-Reviews-BERT-Experiment-160114585
firehose_arn                                          -> 'arn:aws:firehose:us-west-2:085964654406:deliverys
firehose_name                                         -> 'dsoaws-kinesis-data-firehose'
iam_kinesis_role_name                                 -> 'DSOAWS_Kinesis'
iam_kinesis_role_passed                               -> True
iam_lambda_role_name                                  -> 'DSOAWS_Lambda'
iam_lambda_role_passed                                -> True
iam_role_kinesis_arn                             

In [13]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();