As part of this topic we will get an overview of DynamoDB. We will create a table and load data into it.

## Create DynamoDB Table

Here are the steps to create a DynamoDB table.
* Sign in to the AWS Web Console and open DynamoDB.
* Go to **Tables** and click on **Create table**.
* Enter the following details.
  * Table name - emails
  * Primary key (Hash) - email_id
  * We can also create a composite primary key, made up of a partition (hash) key and a sort (range) key.
* For demo purposes, we will upload data from a Google Sheet into the DynamoDB table.
* Here is the table structure we are going to use. DynamoDB tables do not have a predefined schema beyond the key attributes, so we will not specify the other columns while creating the table.
  * Email Id (Primary Key)
  * First Name
  * Last Name
  * Forms Filled (list)
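The console steps above can also be scripted. The sketch below builds the arguments for the low-level boto3 client's `create_table` call and then waits with the `table_exists` waiter, which polls `DescribeTable` until the table is `ACTIVE`. It assumes string key attributes and on-demand (`PAY_PER_REQUEST`) billing, matching the `posts` table created with the CLI in this section. The sort key name `form_name` used in the usage note is hypothetical and only illustrates a composite key.

```python
def build_create_table_params(table_name, hash_key, sort_key=None):
    """Build keyword arguments for the DynamoDB client's create_table call.

    Only the key attributes are declared; all other attributes are schemaless.
    Key attributes are assumed to be strings ("S").
    """
    attribute_definitions = [{"AttributeName": hash_key, "AttributeType": "S"}]
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if sort_key is not None:
        # Composite primary key: partition (HASH) key plus sort (RANGE) key
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "TableName": table_name,
        "AttributeDefinitions": attribute_definitions,
        "KeySchema": key_schema,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity, as in the CLI example
    }


def create_table(params):
    """Create the table and block until the table_exists waiter succeeds."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("dynamodb")
    client.create_table(**params)
    client.get_waiter("table_exists").wait(TableName=params["TableName"])
```

For example, `create_table(build_create_table_params("emails", "email_id"))` would create the `emails` table described above, while passing `sort_key="form_name"` would give it a composite key instead.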

In [49]:
%%sh

aws dynamodb list-tables

{
    "TableNames": [
        "emails",
        "posts"
    ]
}


In [None]:
%%sh

aws dynamodb delete-table \
    --table-name posts

In [3]:
%%sh

aws dynamodb create-table \
    --table-name posts \
    --attribute-definitions AttributeName=content_url,AttributeType=S \
    --key-schema AttributeName=content_url,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "content_url",
                "AttributeType": "S"
            }
        ],
        "TableName": "posts",
        "KeySchema": [
            {
                "AttributeName": "content_url",
                "KeyType": "HASH"
            }
        ],
        "TableStatus": "CREATING",
        "CreationDateTime": "2021-01-02T19:43:44.332000+05:30",
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:us-east-1:582845781536:table/posts",
        "TableId": "3a7ce881-9898-479b-b613-843229628007",
        "BillingModeSummary": {
            "BillingMode": "PAY_PER_REQUEST"
        }
    }
}


In [51]:
%%sh

aws dynamodb \
    describe-table \
    --table-name posts

{
    "Table": {
        "AttributeDefinitions": [
            {
                "AttributeName": "content_url",
                "AttributeType": "S"
            }
        ],
        "TableName": "posts",
        "KeySchema": [
            {
                "AttributeName": "content_url",
                "KeyType": "HASH"
            }
        ],
        "TableStatus": "ACTIVE",
        "CreationDateTime": "2021-01-02T19:43:44.332000+05:30",
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:us-east-1:582845781536:table/posts",
        "TableId": "3a7ce881-9898-479b-b613-843229628007",
        "BillingModeSummary": {
            "BillingMode": "PAY_PER_REQUEST",
            "LastUpdateToPayPerRequestDateTime": "2021-01-02T19:43:44.332000+05:30"
        }
    }
}
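The two outputs above show that a new table starts in `CREATING` and only later reports `ACTIVE`, so a script that creates a table should wait before reading or writing. A minimal polling sketch follows; `describe_table` is any callable with the same signature as the boto3 client's `describe_table` (for example `boto3.client("dynamodb").describe_table`), and the poll interval and attempt limit are arbitrary assumptions. boto3's built-in `table_exists` waiter does the same job.

```python
import time


def wait_until_active(describe_table, table_name, poll_seconds=2.0,
                      max_attempts=30, sleep=time.sleep):
    """Poll describe_table until TableStatus is ACTIVE.

    Returns the final "Table" description, or raises TimeoutError.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for _ in range(max_attempts):
        table = describe_table(TableName=table_name)["Table"]
        if table["TableStatus"] == "ACTIVE":
            return table
        sleep(poll_seconds)  # still CREATING (or UPDATING); try again later
    raise TimeoutError(f"{table_name} was not ACTIVE after {max_attempts} attempts")
```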


## Inserts using Web Console and CLI

Here are the steps you can follow to insert items manually using the AWS Web Console.
* Go to the DynamoDB dashboard.
* Click on the table.
* Go to the **Items** tab.
* Click on **Create item**.
* Choose the **Tree** (default) or **Text** view.
* Choose **Text** to enter the item as JSON directly.
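Entering items one at a time in the console does not scale to the Google Sheet mentioned earlier. One option, assuming the sheet is exported to CSV with the hypothetical headers `email_id,first_name,last_name,forms_filled` and with the forms in a cell separated by `;`, is to load it through the `Table` resource's `batch_writer`, which buffers `put_item` calls into `BatchWriteItem` requests and resends unprocessed items. The CSV layout and helper names here are illustrative assumptions, not part of the original demo.

```python
import csv
import io


def rows_to_items(csv_text):
    """Turn CSV text into items for the emails table.

    Assumes hypothetical headers email_id, first_name, last_name, forms_filled,
    where forms_filled holds ';'-separated form names.
    """
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        forms = [f.strip() for f in row["forms_filled"].split(";") if f.strip()]
        items.append({
            "email_id": row["email_id"],
            "first_name": row["first_name"],
            "last_name": row["last_name"],
            "forms_filled": forms,  # stored as a DynamoDB list (L)
        })
    return items


def load_items(table, items):
    """Write all items using the table resource's batch_writer."""
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    return len(items)
```

With a real table this would be called as `load_items(boto3.resource("dynamodb").Table("emails"), rows_to_items(csv_text))`.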

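Here is an example of an insert using the CLI command `aws dynamodb put-item`. The item is written in DynamoDB JSON, where every value is wrapped in a type descriptor such as `S` (string), `N` (number, passed as a string) or `M` (map), and saved as `post_item.json` so it can be passed with `--item file://post_item.json`. We can also handle updates (conditional puts) with the CLI, but it is a bit cumbersome.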
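To see how plain values map onto DynamoDB JSON, here is a simplified, hand-rolled converter. It is not boto3's own `TypeSerializer` (which the `Table` resource uses internally); it covers strings, numbers, booleans, null, maps and lists, and leaves out sets and binary. Like boto3, it rejects `float` so that numbers stay exact, and it writes numbers as strings under `N`, as in `post_item.json`. The function names are hypothetical.

```python
import json
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in a DynamoDB type descriptor."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings in DynamoDB JSON
    if isinstance(value, float):
        raise TypeError("use Decimal instead of float for DynamoDB numbers")
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_dynamodb_item(item):
    """Convert a plain dict into the format expected by put-item --item."""
    return {name: to_attribute_value(value) for name, value in item.items()}


def write_item_file(item, path):
    """Save an item as DynamoDB JSON, e.g. for --item file://post_item.json."""
    with open(path, "w") as f:
        json.dump(to_dynamodb_item(item), f, indent=4)
```

For instance, `write_item_file({"content_url": "https://example.com", "post_details": {"post_id": 10}}, "post_item.json")` would produce a file in the same shape as the JSON cell below.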

In [10]:
{
    "content_url": {
        "S": "https://example.com"
    },
    "content_title": {
        "S": "Some Title"
    },
    "contents": {
        "S": "<h1>Hello World</h1>"
    },
    "post_details": {
        "M": {
            "post_id": {
                "N": "10"
            }, 
            "post_link": {
                "S": "https://post.com/some-title"
            }
        }
    }
}

{'content_url': {'S': 'https://example.com'},
 'content_title': {'S': 'Some Title'},
 'contents': {'S': '<h1>Hello World</h1>'},
 'post_details': {'M': {'post_id': {'N': '10'},
   'post_link': {'S': 'https://post.com/some-title'}}}}

In [None]:
%%sh

aws dynamodb put-item \
              --table-name posts \
              --item file://post_item.json \
              --return-consumed-capacity TOTAL \
              --return-item-collection-metrics SIZE

In [11]:
%%sh

aws dynamodb get-item \
    --table-name posts \
    --key '{"content_url": {"S": "https://example.com"}}'

{
    "Item": {
        "contents": {
            "S": "<h1>Hello World</h1>"
        },
        "content_title": {
            "S": "Some Title"
        },
        "post_details": {
            "M": {
                "post_id": {
                    "N": "10"
                },
                "post_link": {
                    "S": "https://post.com/some-title"
                }
            }
        },
        "content_url": {
            "S": "https://example.com"
        }
    }
}
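The `get-item` output above is still DynamoDB JSON. A minimal sketch of unwrapping it into plain Python values follows, assuming the output text has been captured as a string. Numbers are returned as `Decimal`, which is also what the boto3 `Table` resource returns (see `Decimal('10')` in the outputs further down). This is a simplified, hand-rolled decoder with hypothetical names, not boto3's `TypeDeserializer`, and it ignores sets and binary.

```python
import json
from decimal import Decimal


def from_attribute_value(av):
    """Unwrap one DynamoDB type descriptor, e.g. {"S": "x"} -> "x"."""
    [(type_code, value)] = av.items()  # exactly one descriptor per value
    if type_code in ("S", "BOOL"):
        return value
    if type_code == "N":
        return Decimal(value)  # exact, unlike float
    if type_code == "NULL":
        return None
    if type_code == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    if type_code == "L":
        return [from_attribute_value(v) for v in value]
    raise ValueError(f"unsupported type descriptor: {type_code}")


def from_dynamodb_item(item):
    return {name: from_attribute_value(av) for name, av in item.items()}


def parse_get_item_output(text):
    """Parse get-item output text; returns None when there is no "Item"."""
    data = json.loads(text) if text.strip() else {}
    item = data.get("Item")
    return None if item is None else from_dynamodb_item(item)
```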


## CRUD Operations - Prerequisites

Let us understand the prerequisites to perform CRUD operations on DynamoDB tables.
* We need to ensure the boto3 library is installed - `pip install boto3`.
* We need to import boto3 and then follow these steps to get a table object to work with.
  * Create a boto3 resource for DynamoDB. Let's name it **dynamodb**.
  * Once the resource is created, create a table object by invoking `Table` with the table name.
  * Using the table object, we can perform CRUD operations: create (insert), read (get or query), update and delete.

In [18]:
!pip install boto3

Collecting boto3
  Downloading boto3-1.16.40-py2.py3-none-any.whl (130 kB)
Collecting botocore<1.20.0,>=1.19.40
  Downloading botocore-1.19.40-py2.py3-none-any.whl (7.1 MB)
Collecting s3transfer<0.4.0,>=0.3.0
  Using cached s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Installing collected packages: jmespath, botocore, s3transfer, boto3
Successfully installed boto3-1.16.40 botocore-1.19.40 jmespath-0.10.0 s3transfer-0.3.3


In [12]:
import boto3

In [13]:
dynamodb = boto3.resource('dynamodb')

In [14]:
type(dynamodb)

boto3.resources.factory.dynamodb.ServiceResource

In [15]:
posts = dynamodb.Table('posts')

In [16]:
type(posts)

boto3.resources.factory.dynamodb.Table

## DynamoDB put using Python

Let us insert items into a DynamoDB table using Python. We need to use `put_item` on the table object.
* We need to import boto3 and then follow these steps to insert an item into the DynamoDB table.
  * Create a boto3 resource for DynamoDB. Let's name it **dynamodb**.
  * Using the resource **dynamodb**, create a table object by invoking `Table` - `posts = dynamodb.Table('posts')`.
  * Create a Python dict for the item. Unlike the CLI, the `Table` resource accepts plain Python values (strings, `int` or `Decimal` numbers, dicts, lists) instead of DynamoDB JSON with type descriptors.
  * Invoke `put_item` on `posts` to insert the item.
* We can read the entire table using `scan` to validate whether the item has been inserted.
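`put_item` replaces any existing item with the same key. To insert only when the key is new (the conditional put that is cumbersome with the CLI), a `ConditionExpression` can be added. A minimal sketch, assuming a `Table` resource such as `posts`; the helper name is hypothetical. An attribute-name placeholder (`#k`) is used so the key name never clashes with a DynamoDB reserved word, and the failure is detected through the error code that botocore's `ClientError` carries in its `response` attribute.

```python
def put_if_absent(table, item, key_name):
    """Insert item only if no item with the same key already exists.

    Returns True when the item was written and False when the condition
    failed because the key exists. Any other error is re-raised.
    """
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(#k)",
            ExpressionAttributeNames={"#k": key_name},
        )
        return True
    except Exception as exc:
        # botocore's ClientError exposes the service error code in exc.response
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
```

Against the table in this section, `put_if_absent(posts, post_item, "content_url")` would return `False` on a second run instead of silently overwriting the item.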

In [17]:
import boto3

In [18]:
dynamodb = boto3.resource('dynamodb')

In [19]:
posts = dynamodb.Table('posts')

In [None]:
# posts.put_item?

In [None]:
{
    "content_url": {
        "S": "https://example.com"
    },
    "content_title": {
        "S": "Some Title"
    },
    "contents": {
        "S": "<h1>Hello World</h1>"
    },
    "post_details": {
        "M": {
            "post_id": {
                "N": "10"
            }, 
            "post_link": {
                "S": "https://post.com/some-title"
            }
        }
    }
}

In [21]:
post_item = {
    'content_url': 'https://example.com/index.html',
    'content_title': 'Index Title',
    'contents': '<h1>Another Example</h1>',
    'post_details': {"post_id": "11", "post_link": "https://post.com/index-title"}
}

In [22]:
response = posts.put_item(Item=post_item)

In [23]:
response

{'ResponseMetadata': {'RequestId': '85SVO1C9IDBD0D7TM5ES61LBSFVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Sat, 02 Jan 2021 16:26:07 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': '85SVO1C9IDBD0D7TM5ES61LBSFVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

In [24]:
posts_all = posts.scan()

In [26]:
posts_all

{'Items': [{'contents': '<h1>Another Example</h1>',
   'content_title': 'Index Title',
   'post_details': {'post_id': '11',
    'post_link': 'https://post.com/index-title'},
   'content_url': 'https://example.com/index.html'},
  {'contents': '<h1>Hello World</h1>',
   'content_title': 'Some Title',
   'post_details': {'post_id': Decimal('10'),
    'post_link': 'https://post.com/some-title'},
   'content_url': 'https://example.com'}],
 'Count': 2,
 'ScannedCount': 2,
 'ResponseMetadata': {'RequestId': 'M9QS453C85AH815S7RV631UT8FVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Sat, 02 Jan 2021 16:26:23 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '477',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'M9QS453C85AH815S7RV631UT8FVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '963829506'},
  'RetryAttempts': 0}}

In [28]:
posts_all['Count']

2
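A single `scan` call returns at most 1 MB of data, and the response includes `LastEvaluatedKey` when more items remain, so `Count` above only covers the first page on larger tables. A minimal pagination sketch, assuming `scan` is a bound method such as `posts.scan`; the helper name is hypothetical.

```python
def scan_all(scan, **kwargs):
    """Collect every item from a paginated Scan.

    `scan` is a callable such as posts.scan; extra keyword arguments
    (for example a FilterExpression) are passed through on every call.
    """
    items = []
    while True:
        page = scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last key seen
```

For example, `len(scan_all(posts.scan))` counts items across all pages, at the cost of reading the whole table.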

## DynamoDB get using Python

Let us read a single item from a DynamoDB table using Python. We need to use `get_item` on the table object.
* We need to import boto3 and then follow these steps to get an item from the DynamoDB table.
  * Create a boto3 resource for DynamoDB. Let's name it **dynamodb**.
  * Using the resource **dynamodb**, create a table object by invoking `Table` - `posts = dynamodb.Table('posts')`.
  * Invoke `get_item` on `posts` to get the item. We need to pass the full primary key as a dict using the `Key` parameter.
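When no item matches the key, `get_item` still succeeds but its response has no `Item` entry, so `response['Item']` would raise `KeyError`. A small wrapper, assuming the `posts` table with `content_url` as its only key attribute (the helper name is hypothetical), returns `None` instead. `ConsistentRead=True` can be passed to request a strongly consistent read.

```python
def get_post(table, content_url, consistent=False):
    """Return the item for content_url, or None when it does not exist."""
    response = table.get_item(
        Key={"content_url": content_url},  # the full primary key
        ConsistentRead=consistent,
    )
    return response.get("Item")  # "Item" is absent when nothing matches
```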

In [29]:
import boto3

In [30]:
dynamodb = boto3.resource('dynamodb')

In [31]:
posts = dynamodb.Table('posts')

In [None]:
# posts.get_item?

In [37]:
response = posts.get_item(Key={'content_url': 'https://example.com'})

In [38]:
response

{'Item': {'contents': '<h1>Hello World</h1>',
  'content_title': 'Some Title',
  'post_details': {'post_id': Decimal('10'),
   'post_link': 'https://post.com/some-title'},
  'content_url': 'https://example.com'},
 'ResponseMetadata': {'RequestId': 'N6ATKLIULVK0FRI7OPTJJJSLI7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Sat, 02 Jan 2021 16:28:11 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '219',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'N6ATKLIULVK0FRI7OPTJJJSLI7VV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '1775742202'},
  'RetryAttempts': 0}}

In [34]:
type(response)

dict

In [39]:
response['Item']

{'contents': '<h1>Hello World</h1>',
 'content_title': 'Some Title',
 'post_details': {'post_id': Decimal('10'),
  'post_link': 'https://post.com/some-title'},
 'content_url': 'https://example.com'}
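Numbers in the returned item come back as `Decimal` (see `post_id` above), and `json.dumps` raises `TypeError` on `Decimal` by default. One way to serialize such items, sketched below with a hypothetical helper, is a `default` hook that turns whole numbers into `int` and other values into `float`; the `float` conversion can lose precision for very large or very precise numbers.

```python
import json
from decimal import Decimal


def decimal_default(obj):
    """json.dumps fallback for the Decimal values DynamoDB returns."""
    if isinstance(obj, Decimal):
        # Keep whole numbers as int; everything else becomes float
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


item = {"post_details": {"post_id": Decimal("10"), "post_link": "https://post.com/some-title"}}
print(json.dumps(item, default=decimal_default))
# prints {"post_details": {"post_id": 10, "post_link": "https://post.com/some-title"}}
```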

In [40]:
posts.scan()['Count']

2

## DynamoDB delete using Python

Let us delete items from a DynamoDB table using Python. We need to use `delete_item` on the table object.
* We need to import boto3 and then follow these steps to delete an item from the DynamoDB table.
  * Create a boto3 resource for DynamoDB. Let's name it **dynamodb**.
  * Using the resource **dynamodb**, create a table object by invoking `Table` - `posts = dynamodb.Table('posts')`.
  * Invoke `delete_item` on `posts` to delete the item. We need to pass the full primary key as a dict using the `Key` parameter.
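`delete_item` succeeds even when no item matches the key, so the plain response shown below does not tell us whether anything was removed. Passing `ReturnValues="ALL_OLD"` makes DynamoDB return the deleted item under `Attributes`. A minimal sketch, assuming the `posts` table keyed on `content_url`; the helper name is hypothetical.

```python
def delete_post(table, content_url):
    """Delete the item for content_url; return the removed item or None."""
    response = table.delete_item(
        Key={"content_url": content_url},
        ReturnValues="ALL_OLD",  # ask for the item as it was before deletion
    )
    return response.get("Attributes")  # absent when no item matched
```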

In [41]:
import boto3

In [42]:
dynamodb = boto3.resource('dynamodb')

In [43]:
posts = dynamodb.Table('posts')

In [None]:
posts.delete_item?

In [45]:
response = posts.delete_item(Key={'content_url': 'https://example.com'})

In [46]:
response

{'ResponseMetadata': {'RequestId': '4GS52MEP8713VSHBEJB5F7J0JJVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Sat, 02 Jan 2021 16:29:21 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': '4GS52MEP8713VSHBEJB5F7J0JJVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

In [48]:
posts.scan()['Count']

1