<a href="https://colab.research.google.com/github/futureCodersSE/python-cyber/blob/main/Cloud/Reading_from_and_saving_to_S3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Connecting to AWS Simple Storage Service (S3)
---
To connect to S3, read files from it and write files to it, you will need to:

*   get an access key and a secret access key (which you should keep hidden)
*   install the Python library `boto3`
*   write a function to get an S3 connection
*   read or write a file as you need to

### Use this code cell to install boto3 for use in this worksheet
---
Run this command in a code cell (the `!` tells Colab to run it as a shell command rather than as Python):
`!pip install boto3`

***Note 1***: Colab does not keep installed libraries between sessions, so you will need to install `boto3` again each time you come back to the worksheet. If someone else copies your worksheet, they will need to install it too.

***Note 2***: within a single session you only need to install it once, so put the install command in its own cell that you run once at the start of the session.
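If you would rather skip the install when it isn't needed, a cell can first check whether `boto3` can already be imported. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical, and you would still run `!pip install boto3` whenever it returns `False`.

```python
import importlib.util

def is_installed(package_name):
  # find_spec returns None when Python cannot locate a top-level module with this name
  return importlib.util.find_spec(package_name) is not None

# Only install when the library is missing from this session
if is_installed("boto3"):
  print("boto3 is already installed in this session")
else:
  print("boto3 is missing: run !pip install boto3")
```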

```python
!pip install boto3
```

### Save your access key, secret key and bucket name in environment variables
---

During testing, you will save the keys in environment variables here in the worksheet. When you create a function in AWS Lambda, you will set the environment variables in the Lambda function's configuration instead.

1.  In the AWS console, go to the IAM service. Follow the instructions here to create your access keys: https://docs.google.com/document/d/1_FhKLVLSaBdck1e-Pm4mlUQkIj9BGwPfuquMPHpXdls/edit?usp=sharing
2.  Download the keys as a CSV file so that you have a permanent record to copy and paste from. Keep it on your own device, or in your own cloud storage if you can't store it on the device. You will use these keys to connect to S3.

3.  Create a bucket (a container in S3 that holds your files, similar to a top-level folder). Follow the instructions here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html (the page describes many settings you can change; leave everything as the default for this exercise).

4.  Use the code cell below to enter the access key ID, the secret access key and the bucket name. The cell saves them in the environment variables `ACCESS_KEY`, `SECRET_ACCESS_KEY` and `BUCKET_NAME`, which are only available in this worksheet's session while you are using it.

***Note 3***: you will need to enter these values each time you come back to the worksheet. If someone else copies your worksheet, they will not be able to upload to your S3 bucket, because they won't know the keys or the bucket name (which you also save in an environment variable).

**Environment variables**  
When it is important not to disclose the value of certain data, you can 'decouple' it from the code by storing it in environment variables instead. In Python, you set a variable with `os.environ['NAME'] = value` and read its value with `os.environ.get('NAME')`.

The values are held in the environment of the running process rather than written into the notebook's code, so they are not visible to anyone reading (or copying) the Python code.

Here, you will save the environment variables in the Colab runtime's operating system.
On a local device, you can set them in your operating system or in a virtual environment that you create.
On AWS Lambda, the function's configuration has a section for creating environment variables, which your code then reads in the same way.
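The short sketch below shows this behaviour with a made-up variable name (`DEMO_BUCKET_NAME` is hypothetical and only used for the demonstration): values are stored as strings, `os.environ.get()` returns `None` for a name that was never set, and the values belong to the running process, so a child process started from it can read them too.

```python
import os
import subprocess
import sys

# Hypothetical variable name used only for this demonstration
os.environ['DEMO_BUCKET_NAME'] = 'my-example-bucket'

# os.environ.get() returns the stored string, or None if the name is not set
print(os.environ.get('DEMO_BUCKET_NAME'))
print(os.environ.get('DEMO_NOT_SET'))

# Values must be strings: assigning a number raises TypeError
try:
  os.environ['DEMO_PORT'] = 8080
except TypeError as error:
  print("TypeError:", error)

# A child process started from this one inherits the variable
result = subprocess.run(
  [sys.executable, "-c", "import os; print(os.environ['DEMO_BUCKET_NAME'])"],
  capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```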

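```python
import os
from getpass import getpass
from IPython.display import clear_output

def set_environment_variable_values():
  # input() echoes what you type; getpass() hides it, which suits the secret key
  access_key = input("Please enter the AWS access key: ")
  secret_access_key = getpass("Please enter the AWS secret access key: ")
  bucket_name = input("Please enter the name of the bucket in S3: ")

  # store the values in this session's environment, not in the code
  os.environ['ACCESS_KEY'] = access_key
  os.environ['SECRET_ACCESS_KEY'] = secret_access_key
  os.environ['BUCKET_NAME'] = bucket_name

  # remove the prompts (and anything typed into them) from the cell output
  clear_output()

set_environment_variable_values()
```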
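Because `clear_output()` hides what you typed, it can be hard to tell whether the cell above worked. One option is a quick check like this sketch; the helper names `missing_environment_variables` and `masked` are hypothetical, and the variable names match the ones set above. It only ever prints the last few characters of a key.

```python
import os

def missing_environment_variables(names=('ACCESS_KEY', 'SECRET_ACCESS_KEY', 'BUCKET_NAME')):
  # a name counts as missing if it was never set or was set to an empty string
  return [name for name in names if not os.environ.get(name)]

def masked(value, visible=4):
  # show only the last few characters so a key is never printed in full
  if not value:
    return '(not set)'
  return '*' * max(len(value) - visible, 0) + value[-visible:]

missing = missing_environment_variables()
if missing:
  print("Not set yet:", ", ".join(missing))
else:
  print("Access key:", masked(os.environ['ACCESS_KEY']))
```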


### Create a connection to the S3 bucket
---

In order to work with files in the bucket you will need a 'client'. This is the object that does the fetching and storing for you. The code below sets up the client, and the output shows that a client has been created.

```python
import os
import boto3

def get_S3_client():
  # create an S3 client using the keys saved in the environment variables
  client = boto3.client(
    "s3",
    aws_access_key_id=os.environ.get('ACCESS_KEY'),
    aws_secret_access_key=os.environ.get('SECRET_ACCESS_KEY')
  )
  return client

s3_client = get_S3_client()
print(s3_client)
```
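Creating the client does not contact AWS, so a mistyped key or bucket name only shows up when the first request is made. A minimal sketch for checking early, assuming the `s3_client` from the cell above: it calls the S3 `head_bucket` operation and reports an error code on failure. The helper name `check_bucket_access` is hypothetical, and catching the broad `Exception` is a deliberate simplification so the same check also reports problems such as missing credentials.

```python
def check_bucket_access(client, bucket_name):
  """Return (True, None) if the bucket can be reached, else (False, error_code)."""
  try:
    # head_bucket only checks that the bucket exists and that you may access it
    client.head_bucket(Bucket=bucket_name)
    return True, None
  except Exception as error:
    # botocore's ClientError keeps its details in error.response; for head_bucket
    # the code is usually the HTTP status, e.g. '403' (forbidden) or '404' (not found).
    # Other errors fall back to the exception's class name.
    details = getattr(error, 'response', None) or {}
    code = details.get('Error', {}).get('Code', type(error).__name__)
    return False, code
```

With the client above, `check_bucket_access(s3_client, os.environ.get('BUCKET_NAME'))` should return `(True, None)` when both the keys and the bucket name are correct.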

### Opening a file from S3
---

First, upload a sample file to your bucket through the AWS console:

*  Download the file (population.csv) from here:  https://drive.google.com/file/d/1Mj2f56YrgWL6eYUF9zOf0Pph8NhJAIe0/view?usp=sharing

*  Open the AWS dashboard and select S3 as the service.  

*  Find your bucket and click on its link to open it.  

*  Click **Upload**, select the file and upload it.

Now that the file is in the bucket, use the code below to open it.

```python
import io
import os
import pandas as pd

def get_file(filename):
  # get the file object from the bucket; filename is the object's key in S3
  file_object = s3_client.get_object(Bucket=os.environ.get('BUCKET_NAME'), Key=filename)

  # read the raw bytes from the response body and decode them as UTF-8 text,
  # then read the csv text into a table (DataFrame) using the pandas read_csv function
  data_text = file_object['Body'].read().decode("utf-8")
  return pd.read_csv(io.StringIO(data_text))

# the key must match the file's name exactly as it appears in your bucket
data = get_file('population.csv')
print(data)
```
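If `get_object` fails with a `NoSuchKey` error, the key you passed does not match any object in the bucket; keys are case-sensitive and must match exactly (for example `population.csv` versus `populations.csv`). A sketch for seeing which keys the bucket actually holds, assuming the `s3_client` above, uses the S3 `list_objects_v2` operation; the helper names `list_keys` and `suggest_keys` are hypothetical, and a single call returns at most 1,000 keys, which is plenty for this exercise.

```python
import difflib

def list_keys(client, bucket_name):
  # list_objects_v2 returns at most 1,000 objects per call; the 'Contents'
  # entry is absent when the bucket is empty
  response = client.list_objects_v2(Bucket=bucket_name)
  return [item['Key'] for item in response.get('Contents', [])]

def suggest_keys(client, bucket_name, wanted_key):
  # suggest existing keys that look similar to a key that was not found
  return difflib.get_close_matches(wanted_key, list_keys(client, bucket_name), n=3)
```

For example, `print(suggest_keys(s3_client, os.environ.get('BUCKET_NAME'), 'populations.csv'))` would list `population.csv` if that is the name stored in your bucket.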

### Upload a new file into your bucket
---

For this exercise you are going to make a new data file. The pandas `DataFrame.to_csv` method writes the table as CSV text into an in-memory `io.StringIO` buffer, and that text is then encoded as bytes and stored on S3 with `put_object`.
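To see the idea of building a file in memory without pandas, here is a standard-library sketch using `csv.writer` and `io.StringIO`. The helper name `rows_to_csv_bytes` and the example rows are hypothetical (not taken from the population data), and the `csv` module ends each row with `\r\n` by default, so the line endings may differ from what pandas writes.

```python
import csv
import io

def rows_to_csv_bytes(header, rows):
  # write the header and rows as CSV text into an in-memory buffer
  buffer = io.StringIO()
  writer = csv.writer(buffer)
  writer.writerow(header)
  writer.writerows(rows)
  # encode the text as UTF-8 bytes, the form that is stored on S3
  return buffer.getvalue().encode('utf-8')

# Hypothetical example data
body = rows_to_csv_bytes(['country', 'population'], [['Example', 1000]])
print(body)
```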

```python
import io
import os

def save_a_copy(filedata, filename):
  # first copy the data into a new DataFrame (print it so that you know it has been done)
  new_data = filedata.copy()
  print(new_data)

  # make an in-memory text buffer, then write the data into it in csv format
  file_object = io.StringIO()
  new_data.to_csv(file_object, index=False)

  # upload the csv text (encoded as UTF-8 bytes) to the bucket under the given filename
  response = s3_client.put_object(
    Bucket=os.environ.get('BUCKET_NAME'),
    Body=file_object.getvalue().encode("utf-8"),
    Key=filename
  )
  return response

response = save_a_copy(data, 'populations_copy.csv')
print(response)
```
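`put_object` returns a dictionary rather than a simple success flag. A minimal sketch for turning it into a yes/no answer, assuming the `ResponseMetadata` entry that boto3 includes in its responses; the helper name `upload_succeeded` is hypothetical. In practice boto3 raises an exception (usually `botocore.exceptions.ClientError`) when an upload fails, so this check mainly confirms that you are looking at a real response and not `None`.

```python
def upload_succeeded(response):
  # boto3 responses include ResponseMetadata with the HTTP status code;
  # a missing response (None) counts as a failure
  metadata = (response or {}).get('ResponseMetadata', {})
  return metadata.get('HTTPStatusCode') == 200

# Example with a hand-made dictionary shaped like a boto3 response
print(upload_succeeded({'ResponseMetadata': {'HTTPStatusCode': 200}}))
```

After running the upload cell, `upload_succeeded(response)` should be `True`, and the new file should appear in the bucket in the AWS console.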