# Your first boto3 client
Sam wants to cast off the shackles of only being able to use her computer for storage and compute. She is learning how to use the awesome power of the cloud to create data pipelines and automatically generate reports.

Before she can do all that, she needs to create her first boto3 client and check out what buckets already exist in S3.

Her AWS access key ID and secret access key are stored in AWS_KEY_ID and AWS_SECRET, respectively.

In this exercise, you will help Sam by creating your first boto3 client to AWS!

In [1]:
import boto3

# Generate the boto3 client for interacting with S3
# (AWS_KEY_ID and AWS_SECRET are assumed to be defined already)
s3 = boto3.client('s3', region_name='us-east-1',
                  # Set up AWS credentials
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

# List the buckets
buckets = s3.list_buckets()

# Print the buckets
print(buckets)
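
The exercise assumes AWS_KEY_ID and AWS_SECRET already exist as Python variables. Outside the course environment, one common option is to read them from environment variables instead of writing them into the script. The sketch below is only an illustration: client_kwargs is a hypothetical helper, and the environment variable names are assumptions chosen to match the exercise, not names that boto3 looks up on its own.

```python
import os


def client_kwargs(env=os.environ, region='us-east-1'):
    """Build keyword arguments for boto3.client from environment variables.

    AWS_KEY_ID and AWS_SECRET are assumed names that mirror the exercise.
    """
    missing = [name for name in ('AWS_KEY_ID', 'AWS_SECRET') if name not in env]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {
        'region_name': region,
        'aws_access_key_id': env['AWS_KEY_ID'],
        'aws_secret_access_key': env['AWS_SECRET'],
    }


# Usage (requires boto3 and real credentials):
# import boto3
# s3 = boto3.client('s3', **client_kwargs())
```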

# Multiple clients
Sam knows that she will often have to work with more than one service at once. She wants to practice creating two separate clients for two different services in boto3.

When she builds her workflows, she will have multiple AWS services interact with each other through a script that runs on her computer.

Her AWS access key ID and secret access key are stored in AWS_KEY_ID and AWS_SECRET, respectively.

You will help Sam initialize a boto3 client for S3, and another client for SNS.

She will use the S3 client to list the buckets in S3. She will use the SNS client to list topics she can publish to (you will learn about SNS topics in Chapter 3).

In [2]:
import boto3

# Generate the boto3 clients for interacting with S3 and SNS
# (AWS_KEY_ID and AWS_SECRET are assumed to be defined already)
s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

sns = boto3.client('sns', region_name='us-east-1',
                   # Set up AWS credentials
                   aws_access_key_id=AWS_KEY_ID,
                   aws_secret_access_key=AWS_SECRET)

# List S3 buckets and SNS topics
buckets = s3.list_buckets()
topics = sns.list_topics()

# Print out the list of SNS topics
print(topics)
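
Both clients take the same region and credentials, so the shared arguments can be written once. A minimal sketch, assuming a hypothetical helper named make_clients; the factory argument stands in for boto3.client so the helper itself does not need boto3 to be imported.

```python
def make_clients(factory, services, **shared_kwargs):
    """Create one client per service name, passing the same keyword arguments to each."""
    return {name: factory(name, **shared_kwargs) for name in services}


# Usage (requires boto3 and the credentials from the exercise):
# import boto3
# clients = make_clients(boto3.client, ['s3', 'sns'],
#                        region_name='us-east-1',
#                        aws_access_key_id=AWS_KEY_ID,
#                        aws_secret_access_key=AWS_SECRET)
# buckets = clients['s3'].list_buckets()
# topics = clients['sns'].list_topics()
```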

# Creating a bucket
Sam is dipping her toes in the water, getting ready to build her first pipeline.

Get It Done is the app the City released for residents to report problems. There are lots of problems to report, and lots of data gets generated.

She will be picking up daily reports generated by Get It Done and placing them in the 'gim-staging' bucket. Then, she will clean the data and place the new dataset in the 'gim-processed' bucket.

She also wants to create a 'gim-test' bucket to experiment with.

Help Sam take the first step to her pipeline dreams. Help her create her first bucket, 'gim-staging'!

In [3]:
import boto3

# Create boto3 client to S3
s3 = boto3.client('s3', region_name='us-east-1', 
                         aws_access_key_id=AWS_KEY_ID, 
                         aws_secret_access_key=AWS_SECRET)

# Create the buckets
response_staging = s3.create_bucket(Bucket='gim-staging')
response_processed = s3.create_bucket(Bucket='gim-processed')
response_test = s3.create_bucket(Bucket='gim-test')

# Print out the response
print(response_staging)

One small line of code, one giant step for your cloud knowledge. You just created your first buckets in the cloud! Make it rain!

# Listing buckets
Sam has successfully created the buckets for her pipeline. Data engineers often build checks into a pipeline to make sure the previous operation succeeded. Sam wants to add a check that confirms her buckets actually got created.

She also wants to practice listing buckets. Listing buckets will let her perform operations on multiple buckets using a for loop.

She has already created the boto3 client for S3, and assigned it to the s3 variable.

Help Sam get a list of all the buckets in her S3 account and print their names!



In [4]:
# Get the list_buckets response
response = s3.list_buckets()

# Iterate over Buckets from .list_buckets() response
for bucket in response['Buckets']:
    # Print the Name for each bucket
    print(bucket['Name'])

You have just listed your buckets with Python. This is key to performing operations on multiple buckets!
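
The kind of check described in this exercise can be sketched as a small function that compares the bucket names Sam expects against a list_buckets response. The name missing_buckets is a hypothetical helper, and the sample response is hand-written in the shape that list_buckets returns, so the example runs without AWS access.

```python
def missing_buckets(response, expected):
    """Return the expected bucket names that are absent from a list_buckets response."""
    existing = {bucket['Name'] for bucket in response.get('Buckets', [])}
    return [name for name in expected if name not in existing]


# Hand-written stand-in for the dictionary returned by s3.list_buckets()
sample_response = {'Buckets': [{'Name': 'gim-staging'}, {'Name': 'gim-processed'}]}

print(missing_buckets(sample_response, ['gim-staging', 'gim-processed', 'gim-test']))
# prints ['gim-test']
```

In a real pipeline, the sample response would be replaced by s3.list_buckets(), and a non-empty result could stop the run before later steps depend on a bucket that does not exist.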

# Deleting a bucket
Sam is feeling more and more confident in her AWS and S3 skills. After playing around for a bit, she decides that the gim-test bucket no longer fits her pipeline and wants to delete it. It's starting to feel like dead weight, and Sam doesn't want it littering her beautiful bucket list.

She has already created the boto3 client for S3, and assigned it to the s3 variable.

Help Sam do some clean up, and delete the gim-test bucket.

In [5]:
# Delete the gim-test bucket
s3.delete_bucket(Bucket='gim-test')

# Get the list_buckets response
response = s3.list_buckets()

# Print each bucket's name
for bucket in response['Buckets']:
    print(bucket['Name'])


# Deleting multiple buckets
The Get It Done app used to be called Get It Made. Sam always thought it was a terrible name, but it got stuck in her head nonetheless.

When she was making the pipeline buckets, she used the gim- abbreviation for the old name. She decides to switch her abbreviation to gid- to accurately reflect the app's real (and better) name.

She has already set up the boto3 S3 client and assigned it to the s3 variable.

Help Sam delete all the buckets in her account that start with the gim- prefix. Then, help her make a 'gid-staging' and a 'gid-processed' bucket.

In [6]:
# Get the list_buckets response
response = s3.list_buckets()

# Delete all the buckets that start with 'gim-', then create replacements
for bucket in response['Buckets']:
    if bucket['Name'].startswith('gim-'):
        s3.delete_bucket(Bucket=bucket['Name'])
    
s3.create_bucket(Bucket='gid-staging')
s3.create_bucket(Bucket='gid-processed')
  
# Print bucket listing after deletion
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Excellent! You just ran your first operation across multiple buckets. You can combine loops and client calls to perform operations on many buckets at once. Just be careful not to delete something you need.
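
One way to be careful is to collect the matching bucket names first and review them before making any delete call. The sketch below uses a hypothetical helper, buckets_with_prefix, and a hand-written response; the bucket name 'logs-gim-archive' is invented to show how a prefix match differs from a substring match.

```python
def buckets_with_prefix(response, prefix):
    """Return the bucket names in a list_buckets response that start with prefix."""
    return [bucket['Name'] for bucket in response.get('Buckets', [])
            if bucket['Name'].startswith(prefix)]


# Hand-written stand-in for s3.list_buckets(); 'logs-gim-archive' is made up
sample_response = {'Buckets': [{'Name': 'gim-staging'},
                               {'Name': 'gim-processed'},
                               {'Name': 'logs-gim-archive'}]}

to_delete = buckets_with_prefix(sample_response, 'gim-')
print(to_delete)
# prints ['gim-staging', 'gim-processed']
```

A substring test such as 'gim' in name would also select 'logs-gim-archive', which is why the prefix test is the safer choice when the goal is to remove only buckets that start with gim-.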

# Putting files in the cloud
Now that Sam knows how to create buckets, she is ready to automate a tedious part of her job. Right now, she has to download the latest files from the City of San Diego Open Data Portal, aggregate them, and share them with management.

Sharing an analysis with others is a common, yet tedious data science task. Automating these steps will allow Sam to focus on cooler projects, while keeping her management happy.

In the last lesson, Sam created the gid-staging bucket. She has already downloaded the files from the URLs, analyzed them, and written the results to final_report.csv.

She has also already initialized the boto3 S3 client and assigned it to the s3 variable.

Help Sam upload final_report.csv to the gid-staging bucket!

In [7]:
# Upload final_report.csv to gid-staging
s3.upload_file(Bucket='gid-staging',
               # Set filename and key
               Filename='final_report.csv',
               Key='2019/final_report_01_01.csv')

# Get object metadata and print it
response = s3.head_object(Bucket='gid-staging',
                          Key='2019/final_report_01_01.csv')

# Print the size of the uploaded object
print(response['ContentLength'])

Excellent! You have successfully uploaded your first file to S3! This is a big day - your first cloud file!
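
Since the upload is meant to be automated, the key can be generated from the report date rather than typed by hand. This is a sketch under an assumption: the exercise does not say what the two numbers in 2019/final_report_01_01.csv mean, so the hypothetical helper report_key treats them as month and day.

```python
from datetime import date


def report_key(day, name='final_report', ext='csv'):
    """Build a key of the form 'YYYY/name_MM_DD.ext' (month/day order is assumed)."""
    return f'{day.year}/{name}_{day.month:02d}_{day.day:02d}.{ext}'


print(report_key(date(2019, 1, 1)))
# prints 2019/final_report_01_01.csv
```

Keeping the year as the first path segment means a Prefix such as '2019/' selects a whole year of reports when listing objects.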

# Spring cleaning
Sam's pipeline has been running for a long time now. Since the beginning of 2018, her automated system has been diligently uploading her report to the gid-staging bucket.

In City governments, record retention is a huge issue, and many government officials prefer not to keep records in existence past the mandated retention dates.

Now that time has passed, the City Council has asked Sam to clean out old CSV files from previous years that are past the retention period. 2018 is safe to delete.

Sam has initialized the client and assigned it to the s3 variable. Help her clean out all records for 2018 from S3!

In [8]:
# List only objects that start with '2018/final_'
response = s3.list_objects(Bucket='gid-staging', 
                           Prefix='2018/final_')

# Iterate over the objects
if 'Contents' in response:
    for obj in response['Contents']:
        # Delete the object
        s3.delete_object(Bucket='gid-staging', Key=obj['Key'])

# Print the remaining objects in the bucket
response = s3.list_objects(Bucket='gid-staging')

# 'Contents' is absent when the bucket is empty, so default to an empty list
for obj in response.get('Contents', []):
    print(obj['Key'])

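
Deleting one object per call is fine for a year of daily reports, but larger cleanups can use the S3 client's delete_objects call, which accepts up to 1,000 keys per request. The sketch below only builds the request payloads; delete_batches is a hypothetical helper, and the commented usage assumes the s3 client and gid-staging bucket from the exercise.

```python
def delete_batches(keys, batch_size=1000):
    """Yield Delete payloads for s3.delete_objects, with at most batch_size keys each."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {'Objects': [{'Key': key} for key in chunk], 'Quiet': True}


# Usage (assumes the s3 client and a listing response from the exercise):
# keys = [obj['Key'] for obj in response.get('Contents', [])]
# for payload in delete_batches(keys):
#     s3.delete_objects(Bucket='gid-staging', Delete=payload)
```

Note that a single list_objects call also returns at most 1,000 keys, so a prefix with more objects than that would need repeated listing calls before every key is collected.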