## Today's Lab Agenda: 

1. cd to labs/week_5
2. connect to AWS using terminal 
3. set up DynamoDB
4. create Lambda Function to solve a problem
5. hints for Assignment 5 

### Connect to AWS: 

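1. Start the AWS lab.  
2. In the terminal, change to the AWS configuration directory:  
cd ~/.aws/  
3. Overwrite the credentials file with your lab credentials:  
cat > credentials  (or use nano if you like)  
4. Paste the credentials, then press Ctrl+D to save and exit.  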

Boto3 should now be able to authenticate using these credentials.
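
If Boto3 still complains about credentials, a quick local check of the file can help. The sketch below assumes the file uses the standard INI layout with a `[default]` profile and that the lab issues temporary credentials that include a session token; the helper name `missing_credential_keys` and the key list are illustrative, not part of the lab materials.

```python
import configparser
from pathlib import Path

# Keys a temporary lab session is assumed to provide (an assumption, not from the lab notes)
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def missing_credential_keys(text, profile="default"):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(profile, key)]


# Check the real file only if it exists on this machine
credentials_path = Path.home() / ".aws" / "credentials"
if credentials_path.exists():
    print("Missing keys:", missing_credential_keys(credentials_path.read_text()))
```

An empty list means all three keys are present; it does not prove that the credentials are still valid, since lab sessions expire.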

### Set up DynamoDB

- DynamoDB is a fully managed NoSQL database service provided by AWS  
  
Let's create a DynamoDB table to collect and stream Twitter (now X) data in our database. 

- create a DynamoDB table
- put_item()
- get_item()
- update_item()
- delete_item()

- To locate an item, each item needs a primary key, which is either (1) a partition key alone or (2) a partition key plus a sort key.
We'll use the Twitter 'username' as our primary key here, since it is unique to each user and makes a good input for DynamoDB's hash function (you can also specify a sort key if you would like, though).

We'll also set our Read and Write Capacity down to the minimum for this demo, but you can [scale this up](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) if you need more throughput for your application (just be careful, as increasing your Read/Write Capacity too far will rapidly deplete your AWS credits). 

In [4]:
import boto3

# Instantiate clients
aws_lambda = boto3.client('lambda')
iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='LabRole')

In [5]:
dynamodb = boto3.resource('dynamodb')
table = dynamodb.create_table(
    TableName='twitter',
    KeySchema=[
        {
            'AttributeName': 'username',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'username',
            'AttributeType': 'S'
        }
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 1,
        'WriteCapacityUnits': 1
    }
)

# Wait until AWS confirms that table exists before moving on
table.meta.client.get_waiter('table_exists').wait(TableName='twitter')

# get data about table (should currently be no items in table)
print(table.item_count)
print(table.creation_date_time)

0
2025-04-24 18:19:38.217000-05:00


OK, so we currently have an empty DynamoDB table. Let's actually put some items into our table:

In [6]:
table.put_item(
   Item={
        'username': 'macs30113',
        'num_followers': 100,
        'num_tweets': 5
    }
)

table.put_item(
   Item={
        'username': 'jon_c',
        'num_followers': 10,
        'num_tweets': 0
    }
)

{'ResponseMetadata': {'RequestId': 'E8L6J1CI38M6QHPN8QJQVIFENNVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Thu, 24 Apr 2025 23:19:58 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'E8L6J1CI38M6QHPN8QJQVIFENNVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}

We can then easily get items from our table using the `get_item` method and providing our key:

In [None]:
# try to retrieve an item you just added

{'num_tweets': Decimal('5'), 'num_followers': Decimal('100'), 'username': 'macs30113'}
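
One possible way to fill in the retrieval cell, as a minimal sketch: the helper name `fetch_user` is hypothetical, and it takes the `table` object created earlier as an argument so that it can also be tried against a stand-in object without touching AWS.

```python
def fetch_user(table, username):
    """Return the item stored under `username`, or None if no such item exists."""
    response = table.get_item(Key={'username': username})
    # DynamoDB leaves the 'Item' key out of the response when nothing matches
    return response.get('Item')


# Usage with the table created above:
# print(fetch_user(table, 'macs30113'))
```

Using `response.get('Item')` rather than `response['Item']` avoids a `KeyError` when the key is not in the table.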


We can also update existing items using the `update_item` method:

In [None]:
# update a value in the table

{'ResponseMetadata': {'RequestId': 'MI4AJJ2G7OMN7NTTKST9KVBF5BVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Thu, 24 Apr 2025 23:19:58 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'MI4AJJ2G7OMN7NTTKST9KVBF5BVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '2745614147'},
  'RetryAttempts': 0}}
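
The update cell could be filled in along these lines. This is a sketch rather than the official solution: the helper name `add_tweets` is hypothetical, and it increments `num_tweets` in place with an update expression instead of overwriting it with a fixed value.

```python
def add_tweets(table, username, count=1):
    """Increment num_tweets for `username` and return the attributes that changed."""
    response = table.update_item(
        Key={'username': username},
        # Arithmetic in a SET clause updates the stored number server-side
        UpdateExpression='SET num_tweets = num_tweets + :inc',
        ExpressionAttributeValues={':inc': count},
        # Ask DynamoDB to send back the new values of the updated attributes
        ReturnValues='UPDATED_NEW',
    )
    return response.get('Attributes', {})


# Usage with the table created above:
# add_tweets(table, 'macs30113')  # num_tweets goes from 5 to 6
```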

Then, if we take a look again at this item, we'll see that it's been updated (note, though, that DynamoDB tables are [*eventually consistent* unless we specify otherwise](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html), so this might not always return the expected result immediately):

In [None]:
# check the value of the item you just updated
response = table.get_item(
    Key={
        'username': 'your_key'  # replace with the key of the item you updated
    }
)
item = response['Item']
print(item) # num_tweets should now be 6

{'num_followers': Decimal('100'), 'num_tweets': Decimal('6'), 'username': 'macs30113'}
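
If you need to see an update immediately, you can request a strongly consistent read. A minimal sketch follows; the helper name `fetch_user_consistent` is hypothetical, while `ConsistentRead` is a standard `get_item` parameter.

```python
def fetch_user_consistent(table, username):
    """Read an item with strong consistency, so it reflects all earlier successful writes."""
    response = table.get_item(
        Key={'username': username},
        ConsistentRead=True,  # default is False (eventually consistent)
    )
    return response.get('Item')


# Usage with the table created above:
# print(fetch_user_consistent(table, 'macs30113'))
```

Strongly consistent reads consume twice the read capacity of eventually consistent ones, so they are best reserved for cases where stale data would actually cause a problem.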


### Create Lambda Function for this case: 

Suppose we want to gather data, perform pre-processing steps, and then enter the results into our database -- all in the cloud. To do this, we can use `boto3` to access our DynamoDB database from within other AWS resources (such as Lambda or EC2). For instance, let's create a Lambda function that will process some data (a username, as well as raw follower and tweet data) and enter the results of this processing into our database without ever leaving the AWS cloud (see zipped Lambda deployment package in this directory):

In [None]:
# create Lambda client
aws_lambda = boto3.client('lambda')

# Access our class IAM role, which allows Lambda
# to interact with other AWS resources
iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='LabRole')

# Open our Zipped directory
with open('write_to_dynamodb.zip', 'rb') as f:
    lambda_zip = f.read()

try:
    # If function hasn't yet been created, create it
    response = aws_lambda.create_function(
        FunctionName='your_own_name', # whatever you like as long as you use the same name later 
        Runtime='python3.9',
        Role=role['Role']['Arn'],
        Handler='lambda_function.lambda_handler', # Name of the file (lambda_function) and function(lambda_handler)
        Code=dict(ZipFile=lambda_zip),
        Timeout=3
    )
except aws_lambda.exceptions.ResourceConflictException:
    # If function already exists, update it based on zip
    # file contents
    response = aws_lambda.update_function_code(
        FunctionName='your_own_name',
        ZipFile=lambda_zip
    )
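
The contents of `write_to_dynamodb.zip` are not shown in these notes. As a rough idea of what a `lambda_function.py` for this task could look like, here is a sketch under stated assumptions: the event carries `username`, `followers`, and `tweets` fields, the target table is the `twitter` table created above, and the `summarize` helper and the optional `table` parameter are hypothetical additions that make the handler easy to try locally.

```python
def summarize(event):
    """Reduce the raw event to the item layout used by the 'twitter' table."""
    return {
        'username': event['username'],
        'num_followers': len(event['followers']),
        'num_tweets': len(event['tweets']),
    }


def lambda_handler(event, context, table=None):
    # `table` is a hypothetical extra parameter for local testing;
    # Lambda itself only passes `event` and `context`.
    if table is None:
        import boto3  # available by default in the Lambda Python runtime
        table = boto3.resource('dynamodb').Table('twitter')
    item = summarize(event)
    table.put_item(Item=item)
    return item
```

Keeping the counting logic in a plain function separates the pre-processing step from the database write, so each can be checked on its own.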

In [None]:
# try to use your lambda function to dump to the table
import json

user1 = {
    "username": "jake_1",
    "followers": ["sally", "jim", "jane"],
    "tweets": ["this is fun!", "Let's tweet some more."]
}
## invoke the lambda function


Response: {'StatusCode': 200}
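
A sketch of how the invocation step might be written, assuming a synchronous call: the helper name `invoke_with_payload` is hypothetical, and the function name you pass in must match the one used in `create_function`.

```python
import json


def invoke_with_payload(lambda_client, function_name, payload):
    """Invoke a Lambda function synchronously and return the HTTP status code."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType='RequestResponse',  # wait for the function to finish
        Payload=json.dumps(payload).encode('utf-8'),
    )
    return response['StatusCode']


# Usage with the client and test event defined above:
# status = invoke_with_payload(aws_lambda, 'your_own_name', user1)
# print("Response:", {'StatusCode': status})
```

A status code of 200 only confirms that the invocation request went through; if the function itself raised an error, the response additionally contains a `FunctionError` field.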


In [None]:
# Retrieve item by primary key
response = table.get_item(
    Key={'username': 'jake_1'}
)
item = response['Item']
print(item) # num_followers should be 3 and num_tweets should be 2

{'num_tweets': Decimal('2'), 'num_followers': Decimal('3'), 'username': 'jake_1'}


### Hints for Assignment 5
- The workflow is to set up a Lambda function that processes survey submissions by storing the raw JSON payload in an S3 data lake and updating participant records in DynamoDB. 
- This data pipeline automatically stores and organizes user-submitted survey data. The raw inputs go to S3 for long-term storage and auditing, while structured summaries are stored in DynamoDB for fast querying and analysis.
- This structure enables efficient data management and real-time feedback without managing infrastructure.
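
The two-step flow described in these hints can be sketched roughly as follows. Everything specific here is an assumption for illustration only: the field names `user_id` and `timestamp`, the `raw/...` key layout, and the `last_key` and `num_submissions` attributes are hypothetical, and the actual survey schema is whatever the assignment defines.

```python
import json


def store_submission(s3_client, table, bucket, submission):
    """Write the raw payload to S3, then update the participant's summary row."""
    # Hypothetical field names; adapt them to the assignment's schema
    user_id = submission['user_id']
    timestamp = submission['timestamp']

    # Step 1: keep the untouched JSON in S3 for long-term storage and auditing
    key = f"raw/{user_id}/{timestamp}.json"
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(submission))

    # Step 2: update a small structured record in DynamoDB for fast lookups
    table.update_item(
        Key={'user_id': user_id},
        UpdateExpression='SET last_key = :k ADD num_submissions :one',
        ExpressionAttributeValues={':k': key, ':one': 1},
    )
    return key
```

Passing the S3 client and the table in as arguments keeps the function independent of any particular bucket or table name.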


In [None]:
## set up S3 bucket

s3 = boto3.client('s3') 
## pay attention here: the put_object(Bucket=..., Key=..., Body=...) call used below
## is a client method; the resource interface has its own API (e.g. Bucket.put_object)

bucket = 'your-bucket-name' # replace with your bucket name 
s3.create_bucket(Bucket=bucket)
print("S3 Bucket created")

In [14]:
# Create tweet data
tweet_data = {
    "username": "joe_bloggs",
    "followers": ["lily", "jim", "susan", "bob"],
    "tweets": ["hello"]
}

key = "tweets/joe_bloggs.json" # S3 key (path) for the object
s3.put_object(Body=json.dumps(tweet_data), Bucket=bucket, Key=key)

{'ResponseMetadata': {'RequestId': 'EMYY92CCQSJB52EF',
  'HostId': 'Et1AItBR23BkVWbrfEhebsCsUOJfxO5HS8pPMrL5t/U4UVK8cA6pUcukWFkehgkVplb3g7XvCV8=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Et1AItBR23BkVWbrfEhebsCsUOJfxO5HS8pPMrL5t/U4UVK8cA6pUcukWFkehgkVplb3g7XvCV8=',
   'x-amz-request-id': 'EMYY92CCQSJB52EF',
   'date': 'Thu, 24 Apr 2025 23:20:01 GMT',
   'x-amz-server-side-encryption': 'AES256',
   'etag': '"944697dde17e63dab300164aa70eb2a9"',
   'x-amz-checksum-crc64nvme': 'PXpC8YsKK6M=',
   'x-amz-checksum-type': 'FULL_OBJECT',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ETag': '"944697dde17e63dab300164aa70eb2a9"',
 'ServerSideEncryption': 'AES256'}

In [22]:
# The resource interface offers a convenient iterator over the objects in a bucket
s3_resource = boto3.resource('s3')
print("Objects in bucket:")
for obj in s3_resource.Bucket(bucket).objects.all():  # same bucket name as above
    print(" -", obj.key)

Objects in bucket:
 - tweets/joe_bloggs.json



### Finally, you should make sure to delete your table (if you no longer plan to use it), so that you do not incur further charges while it is running:

In [16]:
table.delete()

{'TableDescription': {'TableName': 'twitter',
  'TableStatus': 'DELETING',
  'ProvisionedThroughput': {'NumberOfDecreasesToday': 0,
   'ReadCapacityUnits': 1,
   'WriteCapacityUnits': 1},
  'TableSizeBytes': 0,
  'ItemCount': 0,
  'TableArn': 'arn:aws:dynamodb:us-east-1:211125736120:table/twitter',
  'TableId': '90623826-7072-478f-af83-de9a089acecb',
  'DeletionProtectionEnabled': False},
 'ResponseMetadata': {'RequestId': 'I63VI3MOHL8UU8CJ8EGHG1BG1NVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Thu, 24 Apr 2025 23:20:01 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '350',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'I63VI3MOHL8UU8CJ8EGHG1BG1NVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '3834480598'},
  'RetryAttempts': 0}}

In [17]:
# Delete Lambda if it still exists:
try:
    aws_lambda.delete_function(FunctionName='your_own_name') # same name used in create_function
    print("Lambda Function Deleted")
except aws_lambda.exceptions.ResourceNotFoundException:
    print("AWS Lambda Function Already Deleted")

Lambda Function Deleted


In [None]:
s3 = boto3.client('s3')
bucket = "your-bucket-name" # replace with your bucket name
# S3 -- note you only need to delete objects, not entire bucket in your final assignment
try:
    response = s3.list_objects(Bucket=bucket)
    if 'Contents' in response:
        for item in response['Contents']:
            s3.delete_object(Bucket=bucket, Key=item['Key'])
    s3.delete_bucket(Bucket=bucket)
    print("S3 Bucket Deleted")
except s3.exceptions.NoSuchBucket:
    print("S3 Bucket Already Deleted")

S3 Bucket Deleted
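
A single `list_objects` call returns at most 1,000 keys, so the cleanup cell above is only guaranteed to empty small buckets. For larger ones, a paginator is one option. This is a sketch with a hypothetical helper name, `empty_bucket`:

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in `bucket`, page by page; return how many were deleted."""
    deleted = 0
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        # Empty buckets produce a page without a 'Contents' key
        for obj in page.get('Contents', []):
            s3_client.delete_object(Bucket=bucket, Key=obj['Key'])
            deleted += 1
    return deleted


# Usage with an S3 client:
# empty_bucket(boto3.client('s3'), 'your-bucket-name')
```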
