## Interacting with Common AWS services using Python 3 ##

**This notebook covers how to:**
1. Create buckets in S3
2. List buckets in S3
3. Connect to existing buckets in S3 and read the files in them
4. Download files from S3 to a local computer
5. Copy files from one bucket to another
6. Delete S3 buckets
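
Each operation in the list above corresponds to a call on the boto3 S3 client. The following is a minimal sketch, not a definitive implementation: the helper names (`make_bucket`, `fetch_to_disk`, `copy_between_buckets`, `empty_and_delete_bucket`) are hypothetical, and each helper assumes it receives an object that behaves like `boto3.client('s3')`.

```python
def make_bucket(client, name, region):
    """Create a bucket in the given region (helper name is illustrative)."""
    if region == "us-east-1":
        # us-east-1 is the default region and must not be sent as a LocationConstraint.
        return client.create_bucket(Bucket=name)
    return client.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )


def fetch_to_disk(client, bucket, key, local_path):
    """Download one object to a local file path."""
    client.download_file(bucket, key, local_path)


def copy_between_buckets(client, src_bucket, key, dest_bucket, dest_key=None):
    """Copy one object to another bucket, keeping the key unless dest_key is given."""
    return client.copy_object(
        Bucket=dest_bucket,
        Key=dest_key or key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )


def empty_and_delete_bucket(client, bucket):
    """Delete every current object in a bucket, then the bucket itself.

    S3 refuses to delete a non-empty bucket. This sketch does not remove
    old object versions, so it assumes versioning is not enabled.
    """
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_objects_v2(**kwargs)
        keys = [{"Key": item["Key"]} for item in page.get("Contents", [])]
        if keys:
            client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return client.delete_bucket(Bucket=bucket)
```

With credentials configured, these would be called as, for example, `make_bucket(boto3.client('s3'), 'some-unique-name', 'us-west-1')`. Bucket names are globally unique, so any name used here is a placeholder.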

In [1]:
import boto3
import pandas as pd
from io import StringIO
import csv

### Connecting to S3 ###

In [2]:
s3_resource = boto3.resource('s3')

In [3]:
#list available buckets
for bucket in s3_resource.buckets.all():
    print(bucket.name)

aws-emr-resources-910991713532-us-west-1
aws-logs-910991713532-us-west-1
dataeng-capstone-1
faraz-bucket-a-20200712
faraz-test-bucket-20200712
fk-new-bucket-20200711
sparkify-fk
sparkify-fk3
sparkify-fk4
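
The same listing is available through the lower-level client interface, which returns plain dictionaries rather than resource objects. A minimal sketch, assuming a helper named `bucket_names` (hypothetical, not part of the notebook) that accepts any object exposing boto3's `list_buckets()` call:

```python
def bucket_names(client):
    """Return the names of all buckets visible to the given S3 client.

    `client` is expected to behave like boto3.client('s3'); the helper
    name is an illustrative assumption.
    """
    response = client.list_buckets()
    # Each entry in response['Buckets'] is a dict that includes a 'Name' key.
    return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

With credentials configured, `bucket_names(boto3.client('s3'))` should return the same names that the resource loop prints.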


In [4]:
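%%time
#df = pd.read_csv('s3://dataeng-capstone-1/h1b_disclosure_data_2017_2018.dat',sep="|")
from io import BytesIO

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='dataeng-capstone-1', Key='2017_NAICS_Descriptions.xlsx')
# Wrap the downloaded bytes in a file-like object, which read_excel accepts across pandas versions
df = pd.read_excel(BytesIO(obj['Body'].read()))
# Keep only the NAICS code and its description
df = df[['Code','Title']]
df.head()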

CPU times: user 228 ms, sys: 11 ms, total: 239 ms
Wall time: 1.16 s


Unnamed: 0,Code,Title
0,11,"Agriculture, Forestry, Fishing and HuntingT"
1,111,Crop ProductionT
2,1111,Oilseed and Grain FarmingT
3,11111,Soybean FarmingT
4,111110,Soybean Farming


In [5]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [6]:
df.head()

Unnamed: 0,Code,Title
0,11,"Agriculture, Forestry, Fishing and HuntingT"
1,111,Crop ProductionT
2,1111,Oilseed and Grain FarmingT
3,11111,Soybean FarmingT
4,111110,Soybean Farming


In [7]:
%%time
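# Serialize the DataFrame to pipe-delimited text in memory, without touching local disk
csv_buffer = StringIO()
df.to_csv(csv_buffer, sep="|",index=False)
# Upload the buffer contents as a new object under the clean/ prefix
s3_resource.Object('dataeng-capstone-1', 'clean/naics_codes.dat').put(Body=csv_buffer.getvalue())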

CPU times: user 44.2 ms, sys: 4.06 ms, total: 48.3 ms
Wall time: 1.06 s


{'ResponseMetadata': {'RequestId': '77DBEE4E42AF61FE',
  'HostId': 'm7UsZo6TEwR/1EyFVUx28XW3mesgGTVUl7klv2ygURh8GeeMnunJAm2NPAGCM3p/nDcHfPoJL9M=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'm7UsZo6TEwR/1EyFVUx28XW3mesgGTVUl7klv2ygURh8GeeMnunJAm2NPAGCM3p/nDcHfPoJL9M=',
   'x-amz-request-id': '77DBEE4E42AF61FE',
   'date': 'Tue, 20 Oct 2020 22:00:18 GMT',
   'etag': '"4abf7358057752217e710b469165cb71"',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 1},
 'ETag': '"4abf7358057752217e710b469165cb71"'}

In [8]:
%%time
# Read the cleaned, pipe-delimited file back from S3 to confirm the upload
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='dataeng-capstone-1', Key='clean/naics_codes.dat')
df = pd.read_csv(obj['Body'],sep="|")
df.head()

CPU times: user 45.7 ms, sys: 5.05 ms, total: 50.7 ms
Wall time: 369 ms


Unnamed: 0,Code,Title
0,11,"Agriculture, Forestry, Fishing and HuntingT"
1,111,Crop ProductionT
2,1111,Oilseed and Grain FarmingT
3,11111,Soybean FarmingT
4,111110,Soybean Farming
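
The write-then-read round trip above can also be sketched without pandas, using the standard-library `csv` module and an in-memory buffer. This is an illustration under stated assumptions: the helper names (`rows_to_delimited`, `upload_rows`, `download_rows`) are hypothetical, and `client` is assumed to behave like `boto3.client('s3')`.

```python
import csv
from io import StringIO


def rows_to_delimited(header, rows, sep="|"):
    """Serialize a header and rows to delimited text held in memory."""
    buffer = StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def upload_rows(client, bucket, key, header, rows, sep="|"):
    """Upload the serialized rows as a single S3 object."""
    body = rows_to_delimited(header, rows, sep).encode("utf-8")
    return client.put_object(Bucket=bucket, Key=key, Body=body)


def download_rows(client, bucket, key, sep="|"):
    """Fetch an object and parse it back into a list of rows (header first)."""
    obj = client.get_object(Bucket=bucket, Key=key)
    # obj['Body'] is a stream; read it fully and decode before parsing.
    text = obj["Body"].read().decode("utf-8")
    return list(csv.reader(StringIO(text), delimiter=sep))
```

A pipe delimiter suits this data because titles such as "Agriculture, Forestry, Fishing and Hunting" contain commas. Values come back as strings, so numeric columns such as `Code` need converting after the read.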


In [9]:
df.head(20)

Unnamed: 0,Code,Title
0,11,"Agriculture, Forestry, Fishing and HuntingT"
1,111,Crop ProductionT
2,1111,Oilseed and Grain FarmingT
3,11111,Soybean FarmingT
4,111110,Soybean Farming
5,11112,Oilseed (except Soybean) FarmingT
6,111120,Oilseed (except Soybean) Farming
7,11113,Dry Pea and Bean FarmingT
8,111130,Dry Pea and Bean Farming
9,11114,Wheat FarmingT
