# Day 2 - Working with S3

In this notebook we will use part of the S3 API to interact with [Minio](https://minio.io), a full-fledged *object storage* service that combines two protocols:
 * `s3://`, providing a multi-user service with per-user authentication for uploading and downloading files;
 * `http://`, providing a public service with per-operation authentication for uploading and downloading files.

The `s3` protocol was first released by Amazon Web Services (AWS) in 2006 and has since become a **de facto standard** for *object storage*, implemented by many services besides AWS.

It provides a simple web interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

The `s3` protocol is a popular choice for storing and managing large amounts of unstructured data such as images, videos and log files. 
It offers a range of storage classes designed for different use cases, from data that requires frequent access to cold storage for archiving data at the lowest cost.

Minio, and in general object storage, is organized in ***buckets***. 
A bucket is a logical container for stored objects. Rather than a hierarchy of files inside folders, it is a flat structure that stores objects together with their metadata.
Buckets are used to organize and manage objects in *object storage* systems.
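
To make the "flat structure" concrete, here is a minimal sketch in pure Python (no connection to Minio is needed). It emulates how a listing with a prefix and a delimiter makes a flat set of keys look like folders. The function `list_level` and the keys `logs/2023/10/24.log` and `readme.md` are hypothetical, introduced only for illustration.

```python
# Hypothetical object keys: "folders" are only a naming convention inside the key.
keys = [
    "test/test.txt",
    "test/test2.txt",
    "logs/2023/10/24.log",
    "readme.md",
]


def list_level(keys, prefix="", delimiter="/"):
    """Emulate a delimiter-based listing over a flat set of keys.

    Returns (objects, common_prefixes) found directly "under" the given prefix.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is reported as a pseudo-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


print(list_level(keys))            # top level
print(list_level(keys, "test/"))   # inside the "test/" prefix
```

The bucket itself only stores the full key strings; the folder-like view is computed at listing time.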

Buckets can be created as needed and associated with policies that determine which actions users can perform on a bucket and on all the objects in it.
Examples of policies include replication to other storage services (for disaster recovery) and lifecycle policies.

In this notebook we will focus on the basics of S3, including bucket policies and metadata. 


## Accessing Minio console

Go to `https://console.131.154.99.220.myip.cloud.infn.it/` and log in with the username and password printed by the following cell:

In [2]:
import hashlib
import os
#username = os.environ['JUPYTERHUB_USER']
username='lia-2elavezzi'
hash_object = hashlib.md5(f'{username}'.encode())
password = hash_object.hexdigest()
print(f"Username: {username}\npassword: {password}")


Username: lia-2elavezzi
password: 957ba464403d7d86aa1e6d6b7c289da7


## Accessing *Minio* via `s3` in Python with the boto3 library

The `boto3` library enables more complex authorization patterns and makes it possible to develop applications that are independent of the object storage provider. In other words, if you develop your application with `boto3`, you can migrate transparently from a self-hosted Minio server to an AWS object storage solution. Create the S3 client by running the following cell.


In [3]:
import boto3
import json

s3client = boto3.client('s3',
    aws_access_key_id=username,
    aws_secret_access_key=password,
    endpoint_url="https://minio.131.154.99.220.myip.cloud.infn.it",
    region_name='default',)
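
One possible way to keep an application independent of the provider is to read the connection settings from the environment instead of hard-coding them. The sketch below only builds the keyword arguments for `boto3.client`; the variable names `S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_REGION` and `S3_ENDPOINT_URL` and the helper `s3_client_kwargs` are assumptions made for this example, not conventions of boto3 or Minio.

```python
import os


def s3_client_kwargs(env=os.environ):
    """Build keyword arguments for boto3.client('s3', ...) from environment variables.

    S3_ACCESS_KEY, S3_SECRET_KEY, S3_REGION and S3_ENDPOINT_URL are
    hypothetical variable names chosen for this sketch.
    """
    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY"],
        "aws_secret_access_key": env["S3_SECRET_KEY"],
        "region_name": env.get("S3_REGION", "default"),
    }
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Only needed for self-hosted services such as Minio; when it is
        # omitted, boto3 falls back to the AWS endpoints.
        kwargs["endpoint_url"] = endpoint
    return kwargs


# Dry run with a plain dict standing in for the real environment.
minio_env = {
    "S3_ACCESS_KEY": "my-user",
    "S3_SECRET_KEY": "my-secret",
    "S3_ENDPOINT_URL": "https://minio.131.154.99.220.myip.cloud.infn.it",
}
print(s3_client_kwargs(minio_env))
# s3client = boto3.client("s3", **s3_client_kwargs())
```

With this layout, switching from Minio to AWS would mean changing the environment (dropping the endpoint and setting a real AWS region) rather than the code.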

Then you can list the buckets:

In [5]:
resp = s3client.list_buckets()
print(resp)

{'ResponseMetadata': {'RequestId': '179101F034B11CFE', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 09:44:25 GMT', 'content-type': 'application/xml', 'content-length': '9259', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '179101F034B11CFE', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'Buckets': [{'Name': 'acostantini', 'CreationDate': datetime.datetime(2023, 10, 23, 9, 8, 55, 341000, tzinfo=tzlocal())}, {'Name': 'adavanzo', 'CreationDate': datetime.datetime(2023, 10, 23, 13, 39, 10, 95000, tzinfo=tzlocal())}, {'Name': 'alba-2d2d2d2d2d2d2d2d2d2d2d2egonzalvez', 'CreationDate': datetime.datetime(2023, 10, 23, 14, 3, 21, 261000, tzi

Create your own bucket (if you are allowed!)

In [6]:
bucket_name = 'lia_bucket1'
s3bucket = s3client.create_bucket(Bucket=bucket_name)
resp = s3client.list_buckets()
print(resp)
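# This does not work because we can only write to the bucket named after our username; we do not have write permission on other buckets.
# Note that the error reported below is InvalidBucketName: S3 bucket names may not contain underscores.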

ClientError: An error occurred (InvalidBucketName) when calling the CreateBucket operation: The specified bucket is not valid.
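
The `InvalidBucketName` error is consistent with the S3 naming rules, which do not allow underscores. As a rough pre-check before calling `create_bucket`, the sketch below tests a simplified subset of those rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no consecutive dots). The helper `is_valid_bucket_name` is hypothetical, and a name that passes it can still be rejected by the server, for example for lack of permissions.

```python
import re

# Simplified subset of the S3 bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots and hyphens only, and the name must
# start and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes the simplified naming check."""
    if ".." in name:
        return False
    return _BUCKET_RE.fullmatch(name) is not None


for candidate in ["lia_bucket1", "lia-bucket1", "lia-2elavezzi"]:
    print(candidate, is_valid_bucket_name(candidate))
```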

Print only the Bucket name(s)

In [7]:
resp = s3client.list_buckets()
for bucket in resp['Buckets']:
    print(bucket['Name'])

acostantini
adavanzo
alba-2d2d2d2d2d2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2d2egonzalvez
alba-2d2d2d2d2egonzalvez
alba-2d2d2d2egonzalvez
alba-2d2d2egonzalvez
alba-2d2egonzalvez
alba-2egonzalvez
anderlinil
andreaadelfio
andreaespis
annacalanca
atroja
augustotortora
bianco95
cmarcon-2d2d2d2d2d2d5fsosc
cmarcon-2d2d2d2d2d5fsosc
cmarcon-2d2d2d2d5fsosc
cmarcon-2d2d2d5fsosc
cmarcon-2d2d5fsosc
cmarcon-2d5fsosc
cmarcon-5fsosc
dciangot
dpelosi
dranieri
fdelcorso
federicocorchia
flizzi
giacomo-2d2d2d2d2ecoran
giacomo-2d2d2d2ecoran
giacomo-2d2d2ecoran
giacomo-2d2ecoran
giacomo-2ecoran
gianlucasabella
giorgiodho
gmalatesta
gvino
lia-2d2d2elavezzi
lia-2d2elavezzi
lia-2elavezzi
lmagenta
ltabarroni
lucamancini
marialisa-2d2d2d2d2d2d2d2d2ebrozzetti
marialisa-2d2d2d2d2d2d2d2ebrozzetti
marialisa-2d2d2d2d2d2d2ebrozzetti
marialisa-2d2d2d2d2d2ebro

Retrieve the policy for the specified bucket (check the MINIO console)

In [19]:
bucket_name = 'lia-2elavezzi'
resp = s3client.get_bucket_policy(Bucket=bucket_name,)
print(resp)
print(resp['Policy'])

{'ResponseMetadata': {'RequestId': '17910318AA25D7E8', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:05:38 GMT', 'content-type': 'application/json', 'content-length': '168', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '17910318AA25D7E8', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'Policy': '{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm","Effect":"Allow","Principal":{"AWS":["*"]},"Action":["s3:ListBucket"],"Resource":["arn:aws:s3:::lia-2elavezzi"]}]}'}
{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm","Effect":"Allow","Principal":{"AWS":["*"]},"Action":["s3:ListBucket"],"Resource":["arn:aws:s3:::lia-2elavezzi"]}]}
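
The policy comes back as a JSON string, so it can be decoded with the standard `json` module to inspect its statements. The sketch below reuses the policy string printed above; the helper names `summarize_policy` and `_as_list` are introduced only for this example.

```python
import json

# Policy string as returned in resp['Policy'] above.
policy_str = (
    '{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm","Effect":"Allow",'
    '"Principal":{"AWS":["*"]},"Action":["s3:ListBucket"],'
    '"Resource":["arn:aws:s3:::lia-2elavezzi"]}]}'
)


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def summarize_policy(policy_json):
    """Return one (effect, principals, actions, resources) tuple per statement."""
    policy = json.loads(policy_json)
    summary = []
    for stmt in policy["Statement"]:
        principal = stmt["Principal"]
        # The principal may be the bare string "*" or a dict such as {"AWS": ["*"]}.
        if isinstance(principal, dict):
            principal = principal["AWS"]
        summary.append((
            stmt["Effect"],
            _as_list(principal),
            _as_list(stmt["Action"]),
            _as_list(stmt["Resource"]),
        ))
    return summary


for effect, principals, actions, resources in summarize_policy(policy_str):
    print(effect, principals, actions, "on", resources)
```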


Create your own bucket policy

In [20]:
bucket_name = 'lia-2elavezzi'
bucket_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPerm',
        'Effect': 'Allow',
        'Principal': '*',
        'Action': ['s3:ListBucket'],
        'Resource': f'arn:aws:s3:::{bucket_name}'
    }]
}

# Convert the policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new policy
s3client.put_bucket_policy(Bucket=bucket_name, Policy=bucket_policy)
resp = s3client.get_bucket_policy(Bucket=bucket_name,)
print(resp)


{'ResponseMetadata': {'RequestId': '17910319E3EE9076', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:05:44 GMT', 'content-type': 'application/json', 'content-length': '168', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '17910319E3EE9076', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'Policy': '{"Version":"2012-10-17","Statement":[{"Sid":"AddPerm","Effect":"Allow","Principal":{"AWS":["*"]},"Action":["s3:ListBucket"],"Resource":["arn:aws:s3:::lia-2elavezzi"]}]}'}
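
As a possible extension, a policy can hold several statements. The sketch below builds a policy that also lets anyone download objects with `s3:GetObject`. Bucket-level actions target the bucket ARN, while object-level actions target the keys under it (`.../*`). The helper `public_read_policy` is hypothetical, and whether this server lets you apply such a policy is an assumption. Keep in mind that it makes the objects publicly readable.

```python
import json


def public_read_policy(bucket_name):
    """Build a policy allowing anyone to list the bucket and download its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:ListBucket"],
                # Bucket-level actions apply to the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                # Object-level actions apply to the keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy_json = json.dumps(public_read_policy("lia-2elavezzi"))
print(policy_json)
# s3client.put_bucket_policy(Bucket="lia-2elavezzi", Policy=policy_json)
```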


Upload an object (first upload or create a couple of text files, such as test.txt and test2.txt)

In [21]:
bucket_name = 'lia-2elavezzi'
upload = s3client.upload_file('test2.txt', bucket_name, 'test/test2.txt')
resp = s3client.list_objects(Bucket=bucket_name)
print(resp)

{'ResponseMetadata': {'RequestId': '1791031B48B76EC8', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:05:50 GMT', 'content-type': 'application/xml', 'content-length': '920', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '1791031B48B76EC8', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Marker': '', 'Contents': [{'Key': 'test/test.txt', 'LastModified': datetime.datetime(2023, 10, 24, 9, 59, 34, 256000, tzinfo=tzlocal()), 'ETag': '"9b1529ddfd06b2046b2615f58ad2829f"', 'Size': 6, 'StorageClass': 'STANDARD', 'Owner': {'DisplayName': 'minio', 'ID': '02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4'}
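
When no content type is given, `upload_file` stores the object with a generic type (`binary/octet-stream`). A minimal sketch of how one might guess the type from the file name and pass it, together with optional user metadata, through the `ExtraArgs` parameter of `upload_file`. The helper `upload_extra_args` is hypothetical.

```python
import mimetypes


def upload_extra_args(filename, metadata=None):
    """Build an ExtraArgs dict for upload_file with a guessed Content-Type."""
    content_type, _ = mimetypes.guess_type(filename)
    # Fall back to the generic type when the extension is unknown.
    extra = {"ContentType": content_type or "binary/octet-stream"}
    if metadata:
        extra["Metadata"] = dict(metadata)
    return extra


extra = upload_extra_args("test2.txt", {"WhatIsIt": "this_is_my_file"})
print(extra)
# s3client.upload_file("test2.txt", bucket_name, "test/test2.txt", ExtraArgs=extra)
```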

List the objects in a bucket

In [22]:
bucket_name = 'lia-2elavezzi'
resp = s3client.list_objects(Bucket=bucket_name)
for obj in resp['Contents']:
    print(obj['Key'])

test/test.txt
test/test2.txt
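
A single listing call returns a limited number of keys (1000 by default), so larger buckets have to be read page by page. The sketch below follows the continuation tokens of `list_objects_v2`. To allow a dry run without a server, it includes `FakeS3Client`, a hypothetical in-memory stand-in that returns two keys per page; with the real `s3client` only `iter_keys` would be needed. The key `test/test3.txt` is invented for the demonstration.

```python
def iter_keys(s3client, bucket, prefix=""):
    """Yield every key in `bucket`, following continuation tokens page by page."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        resp = s3client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page (or the whole bucket) is empty.
        for obj in resp.get("Contents", []):
            yield obj["Key"]
        if not resp.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]


class FakeS3Client:
    """In-memory stand-in (hypothetical) returning two keys per page."""

    def __init__(self, keys, page_size=2):
        self.keys, self.page_size = sorted(keys), page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = matching[start:start + self.page_size]
        resp = {
            "Contents": [{"Key": k} for k in page],
            "IsTruncated": start + self.page_size < len(matching),
        }
        if resp["IsTruncated"]:
            resp["NextContinuationToken"] = str(start + self.page_size)
        return resp


fake = FakeS3Client(["test/test.txt", "test/test2.txt", "test/test3.txt"])
print(list(iter_keys(fake, "lia-2elavezzi", prefix="test/")))
```

boto3 also offers `s3client.get_paginator('list_objects_v2')`, which wraps the same loop.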


List the metadata of each object

In [23]:
bucket_name = 'lia-2elavezzi'
resp = s3client.list_objects(Bucket=bucket_name)
# print(resp)
for obj in resp['Contents']:
    print(obj['Key'])
    # head_object returns the object's metadata without downloading its content
    metadata = s3client.head_object(Bucket=bucket_name, Key=obj['Key'])
    print(metadata)

test/test.txt
{'ResponseMetadata': {'RequestId': '1791031F90E40AFD', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:06:08 GMT', 'content-type': 'binary/octet-stream', 'content-length': '6', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'etag': '"9b1529ddfd06b2046b2615f58ad2829f"', 'last-modified': 'Tue, 24 Oct 2023 09:59:34 GMT', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '1791031F90E40AFD', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 10, 24, 9, 59, 34, tzinfo=tzutc()), 'ContentLength': 6, 'ETag': '"9b1529ddfd06b2046b2615f58ad2829f"', 'ContentType': 'binary/octet-stream', 'Metadata': {}}
test/test2.txt
{'Re

Add personalized metadata

In [28]:
bucket_name = 'lia-2elavezzi'
resp = s3client.list_objects(Bucket=bucket_name)
for obj in resp['Contents']:
    print(obj['Key'])
    metadata = s3client.head_object(Bucket=bucket_name, Key=obj['Key'])
    print(metadata)
    # Start from the existing user metadata so that it is preserved
    new_meta = metadata['Metadata']
    new_meta['WhatIsIt2'] = 'this_is_my_file2'
    # Metadata cannot be edited in place: copy the object onto itself and replace its metadata
    s3client.copy_object(
        Bucket=bucket_name,
        Key=obj['Key'],
        CopySource=bucket_name + '/' + obj['Key'],
        Metadata=new_meta,
        MetadataDirective='REPLACE',
    )
# Check the result on the last object processed by the loop
metadata = s3client.head_object(Bucket=bucket_name, Key=obj['Key'])
print(metadata)

test/test.txt
{'ResponseMetadata': {'RequestId': '1791037817B9C4E1', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:12:28 GMT', 'content-type': 'binary/octet-stream', 'content-length': '6', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'etag': '"9b1529ddfd06b2046b2615f58ad2829f"', 'last-modified': 'Tue, 24 Oct 2023 10:09:30 GMT', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '1791037817B9C4E1', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block', 'x-amz-meta-whatisit': 'this_is_my_file'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2023, 10, 24, 10, 9, 30, tzinfo=tzutc()), 'ContentLength': 6, 'ETag': '"9b1529ddfd06b2046b2615f58ad2829f"', 'ContentType': 'binary/octet-st
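
In the response above, the user metadata travels as an HTTP header named `x-amz-meta-whatisit`, with the key in lowercase. Since `MetadataDirective='REPLACE'` overwrites all user metadata, new entries should be merged into the existing ones first. A minimal sketch of that merge, assuming keys are compared case-insensitively; the helpers `merge_user_metadata` and `as_headers` are hypothetical.

```python
def merge_user_metadata(existing, updates):
    """Merge user metadata dicts, comparing keys case-insensitively."""
    merged = {key.lower(): value for key, value in existing.items()}
    merged.update({key.lower(): value for key, value in updates.items()})
    return merged


def as_headers(metadata):
    """Show the HTTP header names under which user metadata travels."""
    return {f"x-amz-meta-{key.lower()}": value for key, value in metadata.items()}


current = {"whatisit": "this_is_my_file"}  # as seen in the headers above
new_meta = merge_user_metadata(current, {"WhatIsIt2": "this_is_my_file2"})
print(new_meta)
print(as_headers(new_meta))
```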

Delete all the objects in a bucket

In [29]:
bucket_name = 'lia-2elavezzi'
resp = s3client.list_objects(Bucket=bucket_name)
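# 'Contents' is missing from the response when the bucket is already empty
for obj in resp.get('Contents', []):
    print(obj['Key'])
    s3client.delete_object(Bucket=bucket_name, Key=obj['Key'])
# List again to confirm that the bucket is now empty
resp = s3client.list_objects(Bucket=bucket_name)
print(resp)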

test/test.txt
test/test2.txt
{'ResponseMetadata': {'RequestId': '179103AFC3633ED6', 'HostId': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 24 Oct 2023 10:16:27 GMT', 'content-type': 'application/xml', 'content-length': '271', 'connection': 'keep-alive', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'vary': 'Origin, Accept-Encoding', 'x-amz-id-2': 'dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8', 'x-amz-request-id': '179103AFC3633ED6', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Marker': '', 'Name': 'lia-2elavezzi', 'Prefix': '', 'MaxKeys': 1000, 'EncodingType': 'url'}
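
Deleting objects one at a time costs one request per key. The S3 API also provides `delete_objects`, which accepts up to 1000 keys per call. The sketch below only prepares the `Delete` payloads in batches; the helper `delete_batches` is hypothetical, and the actual call is left commented out because it needs a live client.

```python
def delete_batches(keys, batch_size=1000):
    """Split keys into `Delete` payloads for delete_objects (at most 1000 keys each)."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # Quiet mode asks the server to report only the keys that failed.
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


keys = ["test/test.txt", "test/test2.txt"]
for payload in delete_batches(keys):
    print(payload)
    # s3client.delete_objects(Bucket=bucket_name, Delete=payload)
```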
