
Real-Time Clickstream Anomaly Detection Kinesis Analytics

This lab is provided as part of AWS Innovate Data Edition and has been adapted from an AWS Workshop.

ℹ️ You will run this lab in your own AWS account, and running it will incur some costs. Please follow the directions at the end of the lab to remove resources and avoid future charges.


Overview

This lab helps you analyze streaming data using Amazon Kinesis Data Analytics Studio so you can get timely insights and react quickly to new information from your business and your applications.

Streaming data usually must be processed sequentially and incrementally, on a record-by-record basis or over sliding time windows, and can be used for a variety of analytics, including correlations, aggregations, filtering, and sampling.

Duration - Approximately 2 hours

Create Kinesis Data Stream

  1. Navigate to the Amazon Kinesis console.
  2. Choose Create data stream.
  3. For Data stream name, enter my-input-stream.
  4. For Capacity mode, select On-demand, then choose Create data stream. (Alternatively, create the stream programmatically, as in the sketch below.)
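If you prefer to script this step, the sketch below creates the same stream with boto3. It assumes your AWS credentials are already configured and uses us-east-1, the region used later in this lab.

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# On-demand capacity mode, matching step 4 above.
kinesis.create_stream(
    StreamName="my-input-stream",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

# Block until the stream is ACTIVE before sending any data to it.
kinesis.get_waiter("stream_exists").wait(StreamName="my-input-stream")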

Create Kinesis Data Generator

  1. Click here to open the CloudFormation stack creation screen. Kinesis Data Generator uses Amazon Cognito on the backend to authenticate logins and authorize permission to send records; creating this CloudFormation stack creates the necessary Cognito resources.
  2. In "Step 1: Specify template", confirm that the Amazon S3 URL of the template source has already been entered, then click [Next] without making any changes.
  3. In "Step 2: Specify stack details", enter appropriate values for "Username" and "Password" for the Kinesis Data Generator. The username and password specified here will be used to log in to the Kinesis Data Generator later. Once you have entered them, click [Next].
  4. In "Step 3: Configure stack options", click [Next] without making any changes.
  5. In "Step 4: Review", check the "I acknowledge that AWS CloudFormation might create IAM resources with custom names" box at the bottom of the screen, then click the [Create stack] button to start stack creation.
  6. Wait a few minutes until the stack status changes to CREATE_COMPLETE. (You can also poll for this from a script, as in the sketch below.)
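A minimal boto3 sketch for waiting on the stack and printing its outputs; the stack name shown is hypothetical, so substitute the name you gave your stack.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")
stack_name = "Kinesis-Data-Generator-Cognito-User"  # hypothetical; use your stack's name

# Block until the stack reaches CREATE_COMPLETE (raises if creation fails).
cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

# Print the outputs, including the KinesisDataGeneratorUrl used in the next section.
stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=", output["OutputValue"])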

Sending Data from Kinesis Data Generator

  1. Choose the [Outputs] tab of the CloudFormation stack you created. You can open the Kinesis Data Generator settings screen by clicking the URL shown for "KinesisDataGeneratorUrl".
  2. Enter the username and password you created in the step above into "Username" and "Password" at the top right of the screen, then log in.
  3. In this step you configure the actual log transfer settings. For "Region", choose [us-east-1] (N. Virginia), and for Stream/delivery stream choose the my-input-stream stream you created earlier.
  4. Enter 5 for Records per second (the number of log records generated per second). This means 5 records are created each second, so 300 records are generated per minute and sent to the Kinesis data stream.
  5. In "Record template" below, copy and paste the following code into the Template 1 field. This specifies the format of the records sent from clients; it automatically generates dummy sensor data to send to the stream.
{
    "sensor_id": {{random.number(150)}},
    "current_temperature": {{random.number(
        {
            "min":10,
            "max":150
        }
    )}},
    "status": "{{random.arrayElement(
        ["OK","FAIL","WARN"]
    )}}",
    "event_time": "{{date.now("YYYY-MM-DDTHH:mm:ss.SSS")}}"
}
  6. Finally, click the [Send data] button to start sending data. Data continues to be sent to the Kinesis data stream until you click [Stop Sending Data to Kinesis] in the pop-up menu or close the browser tab.
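If you'd rather not use the Kinesis Data Generator, the sketch below is a rough boto3 equivalent of the template above: it emits records with the same fields and value ranges at roughly 5 records per second. It is an illustration only, not part of the lab.

import json
import random
import time
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

while True:
    # Mirror the Kinesis Data Generator template fields and value ranges.
    record = {
        "sensor_id": random.randint(0, 150),
        "current_temperature": random.randint(10, 150),
        "status": random.choice(["OK", "FAIL", "WARN"]),
        "event_time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3],
    }
    kinesis.put_record(
        StreamName="my-input-stream",
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=str(record["sensor_id"]),
    )
    time.sleep(0.2)  # ~5 records per second, matching step 4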

Set Up Kinesis Data Analytics Studio Notebook

  1. From the Kinesis console, select the my-input-stream Kinesis data stream and choose Process data in real time from the Process drop-down. This configures the stream as a source for the notebook.
  2. Choose Apache Flink – Studio notebook and click Create.
  3. Enter my-notebook as the name and add a description for the notebook, then choose to create an AWS Glue database.
  4. In the AWS Glue console, create an empty database named my_database.
  5. Navigate back to the Kinesis Data Analytics Studio console, refresh the list, and select the new database. Then choose Create Studio notebook.
  6. Now that the notebook has been created, choose Run. (To confirm its status from a script instead, see the sketch below.)
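A small boto3 sketch for checking that the notebook application has reached the RUNNING state, assuming it is named my-notebook as above:

import boto3

kda = boto3.client("kinesisanalyticsv2", region_name="us-east-1")
detail = kda.describe_application(ApplicationName="my-notebook")["ApplicationDetail"]
print(detail["ApplicationStatus"])  # expect RUNNING once startup completes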

Analyze Streaming Data

  1. When the notebook is running, choose Open in Apache Zeppelin to access the notebook, where you can write code in SQL, Python, or Scala to interact with the streaming data and get insights in real time.

  2. Choose Import Note, upload the following notebook, and name it Sensors.
  3. Open the imported note.
  4. Follow the steps in the notebook to perform the streaming data analysis. (The sketches after this list illustrate the kind of paragraphs involved.)
  5. If you don't see results in the queries, stop the Kinesis Data Generator and start sending data again.
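For orientation, the two sketches below illustrate the kind of paragraphs such a notebook contains. They are hedged examples, not the contents of the Sensors notebook itself: the table name sensor_data is an assumption, and st_env is the StreamTableEnvironment that Studio notebooks pre-create in %flink.pyflink paragraphs. The first registers a table over my-input-stream using the Flink Kinesis connector, with columns matching the Kinesis Data Generator template:

%flink.pyflink
# Sketch: register a table over my-input-stream so it can be queried with SQL.
st_env.execute_sql("""
    CREATE TABLE sensor_data (
        sensor_id INT,
        current_temperature INT,
        status VARCHAR(4),
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'my-input-stream',
        'aws.region' = 'us-east-1',
        'scan.stream.initpos' = 'LATEST',
        'format' = 'json',
        'json.timestamp-format.standard' = 'ISO-8601'
    )
""")

The second counts statuses per sensor over one-minute tumbling event-time windows; a sudden rise in FAIL counts for a sensor is the kind of anomaly this lab looks for:

%flink.pyflink
# Sketch: count status values per sensor in one-minute tumbling windows.
result = st_env.sql_query("""
    SELECT
        sensor_id,
        status,
        COUNT(*) AS status_count,
        TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end
    FROM sensor_data
    GROUP BY sensor_id, status, TUMBLE(event_time, INTERVAL '1' MINUTE)
""")
# z.show renders a continuously updating result table in Zeppelin.
z.show(result, stream_type="update")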

Cleanup

Follow the steps below to clean up your account and prevent any additional charges:

  1. Navigate to Kinesis Data Analytics Notebooks, select my-notebook, and click Delete.

  2. Navigate to the Kinesis Data Streams console, select my-input-stream, and click Delete.

  3. Navigate to CloudFormation, find the stack that was deployed in the Create Kinesis Data Generator step, select the stack, and delete it.

  4. Navigate to the AWS Glue Databases console and delete my_database. (Steps 2 through 4 can also be scripted, as in the sketch below.)
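A hedged boto3 sketch covering steps 2 through 4; the CloudFormation stack name is hypothetical, and the Studio notebook itself is easiest to delete from the console.

import boto3

region = "us-east-1"

# Step 2: delete the Kinesis data stream.
boto3.client("kinesis", region_name=region).delete_stream(
    StreamName="my-input-stream", EnforceConsumerDeletion=True
)

# Step 3: delete the Kinesis Data Generator CloudFormation stack.
boto3.client("cloudformation", region_name=region).delete_stack(
    StackName="Kinesis-Data-Generator-Cognito-User"  # hypothetical; use your stack's name
)

# Step 4: delete the Glue database created for the notebook.
boto3.client("glue", region_name=region).delete_database(Name="my_database")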

Conclusion

Throughout the lab, you've learned how to use Kinesis Data Analytics Studio to analyze streaming data.

Streaming ingest and stream processing is one of the scenarios covered in the Well-Architected Framework Data Analytics Lens.

We highly recommend a deep dive into the Well-Architected Data Analytics Lens to understand the trade-offs of the decisions made while building analytics systems and workloads on AWS.

