# Introduction
<hr style = "border:2px solid black" ></hr>


**What?** How to interact with an AWS S3 bucket



# What is AWS S3 and boto3
<hr style = "border:2px solid black" ></hr>


- S3 stands for Simple Storage Service. Amazon S3 is a key-value, object-based storage service.
- `boto3` is the AWS SDK for Python. In this tutorial it is used to interact (create/upload/download and more) with an AWS S3 bucket.



# Import modules
<hr style = "border:2px solid black" ></hr>


- To install `boto3` and `s3fs` simply use:
    - `pip install boto3`
    - `pip install s3fs`



In [1]:
import boto3
import pandas as pd
import s3fs
import os

# Create a bucket via AWS console
<hr style = "border:2px solid black" ></hr>


- You can create a bucket in two ways:
    - Log into your AWS console and create an S3 bucket from there. This is what you are expected to do in this tutorial.
    - Create a bucket via boto3. This is not done here. Please see this other [notebook](https://github.com/kyaiooiayk/MLOps-Machine-Learning-Operations/blob/master/tutorials/AWS/AWS_Lambda/Tutorial_%231/tutorial_1_files/Step_2_Deploying%20k-NN%20model%20on%20AWS.ipynb) to see how to do it.
- Information such as `aws_access_key_id` and `aws_secret_access_key` can be obtained from the Identity and Access Management (IAM) section of your AWS console. If you have done it before, you have probably saved this information locally in a .csv file.
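
For the second option, creating the bucket from code could look roughly like the sketch below. `create_bucket` and its `CreateBucketConfiguration` argument are part of boto3; the helper names `bucket_creation_kwargs` and `create_bucket`, and the bucket name `my-example-bucket` used in the note afterwards, are made up for illustration. One detail worth knowing is that the location constraint is omitted for `us-east-1` and supplied for other regions.

```python
def bucket_creation_kwargs(bucket_name, region):
    """Build the keyword arguments for boto3's create_bucket call.

    Hypothetical helper, not part of boto3 itself.
    """
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(s3, bucket_name, region):
    """Create a bucket through an existing boto3 S3 resource `s3`."""
    return s3.create_bucket(**bucket_creation_kwargs(bucket_name, region))
```

Once the `s3` resource from the next section exists, `create_bucket(s3, 'my-example-bucket', 'eu-west-2')` would issue the request. Bucket names are globally unique, so the call fails if the name is already taken.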
    


# Connect to the S3 bucket
<hr style = "border:2px solid black" ></hr>

In [2]:
s3 = boto3.resource(
    service_name='s3',
    region_name='eu-west-2',
    aws_access_key_id='',
    aws_secret_access_key=''
)


- Alternatively, you can add your variables to your current environment once, which then simplifies your S3 connection via boto3.
- Please keep in mind that neither `aws_access_key_id` nor `aws_secret_access_key` should be shared! They are reported here just to remind you what they look like. The user associated with them no longer exists.



In [3]:
os.environ["AWS_DEFAULT_REGION"] = 'eu-west-2'
os.environ["AWS_ACCESS_KEY_ID"] = ''
os.environ["AWS_SECRET_ACCESS_KEY"] = ''

s3 = boto3.resource(
    service_name='s3'
)
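
To avoid pasting keys into the notebook at all, the credentials could instead be read from the .csv file saved from IAM. This is a sketch under assumptions: the column names `Access key ID` and `Secret access key` are what the IAM download typically contains, so check them against your own file, and `load_credentials` is a hypothetical helper name.

```python
import csv


def load_credentials(csv_path):
    """Read the first row of an IAM access-key CSV into boto3-style keyword arguments.

    Assumes columns named "Access key ID" and "Secret access key".
    """
    # utf-8-sig drops a byte-order mark if the file starts with one
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        row = next(csv.DictReader(handle))
    return {
        "aws_access_key_id": row["Access key ID"].strip(),
        "aws_secret_access_key": row["Secret access key"].strip(),
    }
```

With a file called, say, `accessKeys.csv` (a placeholder name), the connection would become `s3 = boto3.resource(service_name='s3', region_name='eu-west-2', **load_credentials('accessKeys.csv'))`.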

# Inspect what buckets are available
<hr style = "border:2px solid black" ></hr>


- The buckets shown in the printout below are those available to the specific user whose `aws_access_key_id` and `aws_secret_access_key` have been provided.
    


In [4]:
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

aws-s3-26
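
Before uploading, it can be handy to confirm in code that the target bucket is among those listed. A minimal sketch that reuses the same `s3.buckets.all()` iteration; `bucket_is_visible` is a hypothetical helper, and it only reports that the bucket is listed for these credentials, not that you may write to it.

```python
def bucket_is_visible(s3, bucket_name):
    """Return True if `bucket_name` is among the buckets listed for this S3 resource."""
    return any(bucket.name == bucket_name for bucket in s3.buckets.all())
```

Given the printout above, `bucket_is_visible(s3, 'aws-s3-26')` should evaluate to `True`.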


# Create a synthetic dataframe
<hr style = "border:2px solid black" ></hr>

In [5]:
# Make dataframes
foo = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
bar = pd.DataFrame({'x': [10, 20, 30], 'y': ['aa', 'bb', 'cc']})

# Save locally to csv
foo.to_csv('foo.csv')
bar.to_csv('bar.csv')

In [6]:
foo.head()

Unnamed: 0,x,y
0,1,a
1,2,b
2,3,c


In [7]:
bar.head()

Unnamed: 0,x,y
0,10,aa
1,20,bb
2,30,cc


In [8]:
# Check the files were dumped
!ls

How to interacte to an AWS S3 bucket.ipynb
bar.csv
bar1.csv
foo.csv
foo1.csv


# Upload files to S3 bucket
<hr style = "border:2px solid black" ></hr>


- `aws-s3-26` is our bucket.
- `foo.csv` and `bar.csv` are the files we want to upload to the S3 bucket.



In [9]:
s3.Bucket('aws-s3-26').upload_file(Filename='foo.csv', Key='foo.csv')
s3.Bucket('aws-s3-26').upload_file(Filename='bar.csv', Key='bar.csv')
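
With more than a couple of files, the two calls above can be folded into a loop. The sketch below assumes `bucket` is a boto3 `Bucket` such as `s3.Bucket('aws-s3-26')`; `upload_files` and its `prefix` argument are hypothetical names introduced here.

```python
from pathlib import Path


def upload_files(bucket, filenames, prefix=""):
    """Upload each local file to `bucket`, using the file's base name as the key.

    `prefix` is optional and is prepended to every key (for example "data/").
    Returns the list of keys that were written.
    """
    keys = []
    for filename in filenames:
        key = f"{prefix}{Path(filename).name}"
        bucket.upload_file(Filename=str(filename), Key=key)
        keys.append(key)
    return keys
```

For this tutorial the equivalent call would be `upload_files(s3.Bucket('aws-s3-26'), ['foo.csv', 'bar.csv'])`.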

In [10]:
# Verify the files were successfully uploaded
for obj in s3.Bucket('aws-s3-26').objects.all():
    print(obj)

s3.ObjectSummary(bucket_name='aws-s3-26', key='bar.csv')
s3.ObjectSummary(bucket_name='aws-s3-26', key='foo.csv')
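
When a bucket holds many objects, listing only the keys under a given prefix is often more useful than printing every `ObjectSummary`. A small sketch, assuming `bucket` is a boto3 `Bucket`; `objects.filter(Prefix=...)` is a boto3 call, while `list_keys` is a hypothetical helper.

```python
def list_keys(bucket, prefix=""):
    """Return the keys in `bucket` that start with `prefix` (all keys if it is empty)."""
    if prefix:
        objects = bucket.objects.filter(Prefix=prefix)
    else:
        objects = bucket.objects.all()
    return [obj.key for obj in objects]
```

For example, `list_keys(s3.Bucket('aws-s3-26'), 'foo')` would return only the keys beginning with `foo`.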


# Retrieve file from S3 bucket
<hr style = "border:2px solid black" ></hr>

In [11]:
obj = s3.Bucket('aws-s3-26').Object('foo.csv').get()
foo = pd.read_csv(obj['Body'], index_col=0)

In [12]:
obj = s3.Bucket('aws-s3-26').Object('bar.csv').get()
bar = pd.read_csv(obj['Body'], index_col=0)
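
The `s3fs` package imported at the top is not needed for the `get()` approach. As an alternative sketch, pandas can read an `s3://` URL directly when `s3fs` is installed, and should pick up the credentials from the environment variables set earlier. `s3_url` and `read_csv_from_s3` are hypothetical helper names.

```python
def s3_url(bucket_name, key):
    """Build the s3:// URL that pandas (through s3fs) understands."""
    return f"s3://{bucket_name}/{key}"


def read_csv_from_s3(bucket_name, key, **read_csv_kwargs):
    """Read a CSV object straight into a DataFrame; requires pandas and s3fs."""
    # Imported lazily so that building the URL alone needs no third-party package
    import pandas as pd

    return pd.read_csv(s3_url(bucket_name, key), **read_csv_kwargs)
```

The call `read_csv_from_s3('aws-s3-26', 'foo.csv', index_col=0)` would then be a one-line counterpart to the two-step `get()` and `read_csv` cells.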

In [13]:
foo.head()

Unnamed: 0,x,y
0,1,a
1,2,b
2,3,c


In [14]:
bar.head()

Unnamed: 0,x,y
0,10,aa
1,20,bb
2,30,cc


# Download from S3 and dump locally
<hr style = "border:2px solid black" ></hr>

In [15]:
s3.Bucket('aws-s3-26').download_file(Key='foo.csv', Filename='foo1.csv')
s3.Bucket('aws-s3-26').download_file(Key='bar.csv', Filename='bar1.csv')
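
To pull down everything in the bucket rather than naming each key, the same `download_file` call can be driven by `objects.all()`. This is a sketch, assuming `bucket` is a boto3 `Bucket`; `download_all` and `target_dir` are hypothetical names.

```python
from pathlib import Path


def download_all(bucket, target_dir):
    """Download every object in `bucket` into `target_dir`, mirroring keys as paths."""
    target_dir = Path(target_dir)
    local_paths = []
    for obj in bucket.objects.all():
        if obj.key.endswith("/"):
            # Keys ending in "/" are folder placeholders, not files
            continue
        local_path = target_dir / obj.key
        # Create intermediate directories for keys such as "data/bar.csv"
        local_path.parent.mkdir(parents=True, exist_ok=True)
        bucket.download_file(Key=obj.key, Filename=str(local_path))
        local_paths.append(local_path)
    return local_paths
```

Calling `download_all(s3.Bucket('aws-s3-26'), 'downloads')` would place `foo.csv` and `bar.csv` in a local `downloads` folder, leaving the original local files untouched.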

In [16]:
pd.read_csv('foo1.csv', index_col=0)

Unnamed: 0,x,y
0,1,a
1,2,b
2,3,c


In [17]:
pd.read_csv('bar1.csv', index_col=0)

Unnamed: 0,x,y
0,10,aa
1,20,bb
2,30,cc


# References
<hr style = "border:2px solid black" ></hr>


- [YouTube video | Tutorial 1- Cloud Computing-AWS-Introduction To S3(Simple Storage Services)](https://www.youtube.com/watch?v=G3adspFQ59I)
- [YouTube video | Tutorial 3- Deployment Of ML Models In AWS EC2 Instance](https://www.youtube.com/watch?v=JKlOlDFwsao)
- [GitHub code](https://github.com/krishnaik06/AWS/blob/main/boto3%20read%20S3.ipynb)

