## Rockset Pairs with DynamoDB for Complex Analytics

#### [Workshop](https://rockset.awsworkshop.io/)

While DynamoDB is great for real-time transactions, it can be paired with Rockset for analytical workloads such as complex aggregations and JOINs. Rockset is a real-time analytics database that can ingest data with a freshness of 1 to 2 seconds and execute heavy analytical SQL queries that JOIN, aggregate, and search in milliseconds. When data is ingested from DynamoDB, it’s indexed via Rockset’s Converged Index™, so queries over terabytes of deeply nested data can return in under a second. The Converged Index™ indexes every field in a document in three ways: a row index, a columnar index, and an inverted index. Rockset also supports real-time updates, inserts, and deletes.
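To make the three index types concrete, here is a toy sketch in plain Python. It is not Rockset's implementation; the function name `build_indexes` and the sample documents are made up for illustration. It only shows the idea that the same documents can be stored three ways, each suited to a different access pattern.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build toy row, columnar, and inverted indexes over a list of flat dicts."""
    row_index = {}                     # doc id -> whole document (point lookups)
    column_index = defaultdict(dict)   # field -> {doc id: value} (scans, aggregations)
    inverted_index = defaultdict(set)  # (field, value) -> doc ids (selective filters)
    for doc in docs:
        doc_id = doc["id"]
        row_index[doc_id] = doc
        for field, value in doc.items():
            column_index[field][doc_id] = value
            inverted_index[(field, value)].add(doc_id)
    return row_index, column_index, inverted_index

# Hypothetical sample documents
docs = [
    {"id": "1001", "Author": "F. Scott Fitzgerald", "Length": 180},
    {"id": "1002", "Author": "Harper Lee", "Length": 281},
    {"id": "1003", "Author": "George Orwell", "Length": 328},
]
rows, columns, inverted = build_indexes(docs)

total_length = sum(columns["Length"].values())      # aggregation reads one column
orwell_ids = inverted[("Author", "George Orwell")]  # filter reads one posting list
orwell_docs = [rows[i] for i in orwell_ids]         # fetch full rows by id
```

A query planner can pick whichever structure is cheapest for a given query: the columnar view for aggregations, the inverted view for selective filters, and the row view for fetching whole documents.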

The architecture diagram below shows the sources you can integrate with Rockset to run queries that JOIN, search, and aggregate. The results of those queries can power real-time applications such as leaderboards, dashboards, and personalization within seconds:

![image](https://rockset.awsworkshop.io/images/Picture1.png)

### Build Real-time Dashboards

We’ll simulate real-time transactional data and store it in DynamoDB. Our goal is to analyze that data in Rockset with SQL and then build a real-time dashboard with Grafana. Here’s what our architecture will look like:

![image2](https://rockset.awsworkshop.io/images/Picture2.png)

In [43]:
import boto3
import pandas as pd
import json
import random
from botocore.exceptions import ClientError
from spdynamodb import DynamoTable

s3 = boto3.resource('s3')
sts = boto3.client('sts')

In [None]:
bucket_name = 'rockset-integration-'+str(random.randint(100000,999999))

In [3]:
# Create an S3 bucket
try:
    s3.create_bucket(Bucket=bucket_name)
    print(f"Bucket created successfully: {bucket_name}")
except ClientError as e:
    print(f"Error creating bucket: {e}")

Bucket created successfully: rockset-integration7391
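One caveat: calling `create_bucket` with only a bucket name works when the session's region is `us-east-1`. In other regions S3 expects a `CreateBucketConfiguration` with a `LocationConstraint`. A minimal sketch of handling both cases follows; the helper name `create_bucket_kwargs` is hypothetical and not part of boto3.

```python
def create_bucket_kwargs(bucket_name, region):
    """Return keyword arguments for s3.create_bucket that suit the given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Usage with the objects defined in the cells above
# (assumes a default region is configured for the session):
# region = boto3.session.Session().region_name
# s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```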


In [15]:
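dt = DynamoTable()
table_name = 'RocksetTable'
try:
    # Reuse the table if it already exists
    dt.select_table(table_name)
    print(dt)
except Exception:
    # Otherwise create it with a string partition key named 'id'
    dt.create_table(
        table_name=table_name,
        partition_key='id',
        partition_key_type='S',
    )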

- Table name: RocksetTable            
- Table arn: arn:aws:dynamodb:us-east-1:572722647136:table/RocksetTable            
- Table creation: 2023-12-22 08:15:39            
- [{'AttributeName': 'id', 'KeyType': 'HASH'}]            
- [{'AttributeName': 'id', 'AttributeType': 'S'}]            
- Point-in-time recovery status: DISABLED  |  Delete protection: False


In [14]:
df = pd.DataFrame({
    'id': ["1001","1002","1003","1004","1005"],
    'Title': ['The Great Gatsby','To Kill a Mockingbird','1984', 'Pride and Prejudice', 'The Odyssey'],
    'Author': ['F. Scott Fitzgerald','Harper Lee','George Orwell', 'Jane Austen', 'Homer'],
    'Length': [180, 281, 328, 226, 374],
    'Published': [1925, 1960, 1949, 1813, 1922],
    'Publisher': ['Charles Scribner\'s Sons','J.B. Lippincott & Co.','Secker & Warburg', 'Penguin','Scribner']
})
dt.batch_pandas(df)
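`batch_pandas` comes from the `spdynamodb` package. For readers without that package, a comparable load can be sketched with boto3's `batch_writer`, assuming each DataFrame row is meant to become one item. The helper name `write_rows` is hypothetical.

```python
def write_rows(table, rows):
    """Write a list of dicts to a DynamoDB table using boto3's batch writer."""
    with table.batch_writer() as batch:  # buffers items and sends them in batches
        for row in rows:
            batch.put_item(Item=row)
    return len(rows)

# Usage (assumes the table from the previous cell exists):
# table = boto3.resource("dynamodb").Table(table_name)
# write_rows(table, [{"id": "1001", "Title": "The Great Gatsby"}])
```

Note that the boto3 resource API expects `decimal.Decimal` rather than `float` for non-integer numbers, so float columns would need converting first.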

### Configure an AWS IAM Policy to Set Up the Integration with [Rockset](https://console.rockset.com/)

In [18]:
# Create a policy
iam = boto3.client('iam')
policy_name = 'RocksetPolicy'

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetShardIterator",
                "dynamodb:Scan",
                "dynamodb:DescribeStream",
                "dynamodb:DescribeExport",
                "dynamodb:GetRecords",
                "dynamodb:DescribeTable",
                "dynamodb:DescribeContinuousBackups",
                "dynamodb:ExportTableToPointInTime",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateContinuousBackups",
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:dynamodb:*:*:table/{table_name}",
                f"arn:aws:dynamodb:*:*:table/{table_name}/stream/*",
                f"arn:aws:dynamodb:*:*:table/{table_name}/export/*",
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*"
            ]
        }
    ]
}

try:
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy)
    )
    print(f"Policy created successfully: {policy_name}")
except ClientError as e:
    print(f"Error creating policy: {e}")  

Policy created successfully: RocksetPolicy


In [19]:
# Create a role that another AWS account (Rockset's) can assume
role_name = 'RocksetRole'
role_description = 'Allows Rockset to access resources in your AWS account'
rockset_account_id = '318212636800'
rockset_external_id = '4e0e65d62c5b511909afddbe6c904cfffd5097ab6a4266e548620dac4255b889'

try:
    response = iam.create_role(
        RoleName=role_name,
        Description=role_description,
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
              {
                "Effect": "Allow",
                "Principal": {
                  "AWS": f"arn:aws:iam::{rockset_account_id}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                  "StringEquals": {
                    "sts:ExternalId": rockset_external_id
                  }
                }
              }
            ]
          })
    )
    print(f"Role created successfully: {role_name}")

except ClientError as e:
    print(f"Error creating role: {e}")

Role created successfully: RocksetRole


In [22]:
# Get the AWS account ID
account_id = sts.get_caller_identity()['Account']

# Attach the policy to the role
try:
    response = iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=f"arn:aws:iam::{account_id}:policy/{policy_name}"
    )
    print(f"Policy attached successfully: {policy_name} to {role_name}")

except ClientError as e:
    print(f"Error attaching policy: {e}")

Policy attached successfully: RocksetPolicy to RocksetRole


In [26]:
# Get the role ARN
role_arn = iam.get_role(RoleName=role_name)['Role']['Arn']
print(f"Role ARN:\n {role_arn}")

Role ARN:
 arn:aws:iam::572722647136:role/RocksetRole
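For context, the trust policy on this role follows the standard cross-account pattern: a principal in the trusted account calls `sts:AssumeRole` with the role ARN and the external ID, and receives temporary credentials. The sketch below shows that call from the caller's side. The function name and session name are hypothetical, and how Rockset performs this internally is not documented here. Running it from your own account would be denied, because the trust policy names only Rockset's account.

```python
def assume_role_with_external_id(sts_client, role_arn, external_id,
                                 session_name="rockset-demo"):
    """Call sts:AssumeRole as a trusted third party would, passing the external ID."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        ExternalId=external_id,  # must match the sts:ExternalId condition on the role
    )
    # Temporary credentials: AccessKeyId, SecretAccessKey, SessionToken, Expiration
    return response["Credentials"]
```

The external ID guards against the confused-deputy problem: even if someone else learns the role ARN, the call fails unless the matching external ID is supplied.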


### S3 Integration

Use the role created in the previous section (its ARN is printed above) when setting up the S3 integration.

In [48]:
# Upload a local file (IBM.csv in the current directory) to the S3 bucket
file_name = 'IBM.csv'
file_path = './'
s3.Bucket(bucket_name).upload_file(file_path + file_name, file_name)

In [50]:
file_path = f"s3://{bucket_name}/{file_name}"
print(f"File path:\n {file_path}")

File path:
 s3://rockset-integration7391/IBM.csv
