# Using objects from your S3 buckets in OVHcloud AI Tools

This tutorial shows how to manage and use S3 buckets from the OVHcloud AI Tools (AI Notebooks, AI Training and AI Deploy) in Python, with the `boto3` library. You will learn how to list and create buckets, list the objects they contain, read the content of those objects and download them.

## Requirements

Before starting this tutorial, you must follow the [Data - S3 compliance with AI Tools documentation](https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-s3-compliance?id=kb_article_view&sysparm_article=KB0058011), in particular the following steps:

- Create an S3 user
- Check that this user has the ***ObjectStore operator*** and ***AI Training Operator*** rights
- Create a datastore with this user

## Code

The different steps are as follows:
- Set up the environment
- Set your S3 datastore
- List all S3 buckets in your S3 datastore
- Create a new bucket
- List all objects of a specific bucket
- Read content from objects
- Download object from S3 bucket

### Set up the environment

Let's install the libraries we will need, then import them:

```python
# In a notebook cell, the "!" prefix runs a shell command
!pip install pandas boto3
```

```python
import boto3
import json
import os
import pandas as pd
from pathlib import Path
```

### Set your S3 datastore

To interact with an S3 bucket, we need to initialize an S3 client and configure it with our user credentials (`s3_access_key` and `s3_secret_key`), the endpoint URL (`s3_endpoint`) and the region (`s3_region`).

***Make sure to replace these values with your own.***

```python
# Credentials of your S3 user
s3_access_key = "MY_KEY"
s3_secret_key = "MY_SECRET"

# Endpoint and region of your datastore (GRA in this example)
s3_endpoint = "https://s3.gra.io.cloud.ovh.net/"
s3_region = "gra"

s3_client = boto3.client(
    "s3",
    aws_access_key_id=s3_access_key,
    aws_secret_access_key=s3_secret_key,
    endpoint_url=s3_endpoint,
    region_name=s3_region,
)
```
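Hard-coding keys in a notebook makes them easy to leak, for example when the notebook is shared. Below is a minimal sketch of an alternative, assuming you export your credentials as environment variables named `S3_ACCESS_KEY`, `S3_SECRET_KEY` and, optionally, `S3_ENDPOINT` and `S3_REGION`. These names and the helper `s3_client_kwargs` are hypothetical, not an OVHcloud convention. The function only builds the keyword arguments, so the client creation itself stays the same as above.

```python
import os
from typing import Mapping


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build boto3.client("s3", ...) keyword arguments from environment variables.

    The variable names are hypothetical; adapt them to your setup.
    Endpoint and region fall back to the GRA values used in this tutorial.
    """
    required = ("S3_ACCESS_KEY", "S3_SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["S3_ACCESS_KEY"],
        "aws_secret_access_key": env["S3_SECRET_KEY"],
        "endpoint_url": env.get("S3_ENDPOINT", "https://s3.gra.io.cloud.ovh.net/"),
        "region_name": env.get("S3_REGION", "gra"),
    }


# Usage (requires boto3 and the variables above to be set):
# s3_client = boto3.client("s3", **s3_client_kwargs())
```

Keeping the lookup in a separate function also makes it easy to fail early with a clear message when a variable is missing, instead of getting an authentication error on the first request.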

Once the S3 client is initialized, we can use it to send requests to the S3-compatible Object Storage service: list and create buckets, list objects, and read or download their content.

### List all S3 buckets in your S3 datastore

```python
response = s3_client.list_buckets()

print(f"Existing buckets:\n{16 * '-'}")
for bucket in response['Buckets']:
    print(bucket['Name'])
```

```text
Existing buckets:
----------------
my-bucket-s3-n2
my-bucket-s3-n3
my-bucket-s3.0
```


### Create a new bucket

Let's create a new bucket named `bucketname`:

```python
response = s3_client.create_bucket(
    Bucket='bucketname',
    CreateBucketConfiguration={
        'LocationConstraint': s3_region,
    },
)
```
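Running the cell above a second time typically raises an error, because the bucket already exists. Below is a minimal sketch that makes this step safe to re-run by checking the bucket list first. `bucket_names` and `create_bucket_if_missing` are hypothetical helpers, and the check is a convenience rather than a guarantee: `list_buckets` only shows your own buckets, so a name already taken elsewhere, or created between the two calls, can still make `create_bucket` fail.

```python
def bucket_names(client) -> set:
    """Return the names of all buckets returned by client.list_buckets()."""
    return {bucket["Name"] for bucket in client.list_buckets()["Buckets"]}


def create_bucket_if_missing(client, name: str, region: str) -> bool:
    """Create bucket `name` unless it is already listed.

    Returns True if the bucket was created, False if it already existed.
    """
    if name in bucket_names(client):
        return False
    client.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    return True


# Usage with the client configured earlier:
# created = create_bucket_if_missing(s3_client, "bucketname", s3_region)
```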

List the buckets again to check that `bucketname` has been created:

```python
response = s3_client.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])
```

```text
bucketname
my-bucket-s3-n2
my-bucket-s3-n3
my-bucket-s3.0
```


*Keep in mind that a bucket name must be between 3 and 63 characters long, can contain only lowercase letters, numbers, dots (.) and hyphens (-), and must start and end with a lowercase letter or a number (a to z and 0 to 9).*
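If you generate bucket names programmatically, a local check catches invalid names before any request is sent. Below is a minimal sketch that encodes only the rules listed above; `is_valid_bucket_name` is a hypothetical helper. AWS documents further constraints (for example, names must not be formatted like IP addresses), and the service may enforce rules this regular expression does not check.

```python
import re

# 3 to 63 characters in total; lowercase letters, digits, dots and hyphens only;
# the first and last characters must be a lowercase letter or a digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check `name` against the naming rules listed above (and only those)."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


# is_valid_bucket_name("bucketname")      -> True
# is_valid_bucket_name("my-bucket-s3.0")  -> True
# is_valid_bucket_name("BUCKETNAME")      -> False (uppercase letters)
# is_valid_bucket_name("ab")              -> False (too short)
```

Using `fullmatch` instead of `match` ensures the whole string is checked, so a valid prefix followed by forbidden characters is rejected.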

### List all objects of a specific bucket

```python
# Specify the bucket to work with (replace with the name of one of your buckets)
s3_bucket = "your-bucket-name"
```

```python
response = s3_client.list_objects_v2(Bucket=s3_bucket)

# 'Contents' is absent from the response when the bucket holds no objects
objects = response.get('Contents', [])
if objects:
    for obj in objects:
        print(obj['Key'])
else:
    print("The bucket is empty.")
```

```text
audio_file.mp3
creds.json
graph.png
requirements.txt
```
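A single `list_objects_v2` call returns at most 1,000 keys on AWS S3 (and on compatible services that follow the same API) and sets `IsTruncated` when more results are available, so the cell above only shows the first page of a large bucket. Below is a minimal sketch that walks every page with the boto3 paginator and can optionally narrow the listing with the `Prefix` parameter; `list_all_keys` is a hypothetical helper name.

```python
from typing import Iterator, Optional


def list_all_keys(client, bucket: str, prefix: Optional[str] = None) -> Iterator[dict]:
    """Yield every object entry in `bucket`, following pagination.

    Each yielded dict has the same fields as the entries of
    response['Contents'] (Key, LastModified, ETag, Size, StorageClass, ...).
    """
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**params):
        # A page has no 'Contents' key when nothing matches (e.g. empty bucket).
        yield from page.get("Contents", [])


# Usage with the client configured earlier:
# objects = list(list_all_keys(s3_client, s3_bucket))
# df = pd.DataFrame(objects)
```

Because the function is a generator, pages are only requested as you iterate, which keeps memory usage low when you filter or stop early.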


Each entry of `response['Contents']` holds more than the object key: it also includes the last modification date, the ETag, the size in bytes and the storage class. We can display them all in a DataFrame:

```python
df = pd.DataFrame(objects)
display(df)  # display() is available in Jupyter; use print(df) elsewhere
```

| | Key | LastModified | ETag | Size | StorageClass |
|---|---|---|---|---|---|
| 0 | audio_file.mp3 | 2023-08-21 15:10:15+00:00 | "418b1289a3efd2a601c1ab08341669f3" | 10238080 | STANDARD |
| 1 | creds.json | 2023-08-21 15:10:13+00:00 | "dbcc4da589c34842250cb8f68acdfd51" | 40 | STANDARD |
| 2 | graph.png | 2023-08-21 15:10:13+00:00 | "2a6ff4af303fed461ca7704c485536ce" | 28249 | STANDARD |
| 3 | requirements.txt | 2023-08-21 15:10:15+00:00 | "c4b3351ae370ca6d30ea4d962ce81c8d" | 59 | STANDARD |


### Read content from objects

You can read the content of your objects directly in memory, without saving them to disk first. Let's start with a simple text file:

```python
# TXT example
object_key = 'requirements.txt'
response = s3_client.get_object(Bucket=s3_bucket, Key=object_key)
object_content = response['Body'].read().decode('utf-8')

print(object_content)
```

```text
kaggle
matplotlib==3.7.2
torchvision==0.15.2
Pillow==10.0.0
```


The same approach works for any file type: only the parsing step changes with the file format. Here is another example with a JSON file:

```python
# JSON example
object_key = 'creds.json'
response = s3_client.get_object(Bucket=s3_bucket, Key=object_key)
object_content = response['Body'].read().decode('utf-8')
json_data = json.loads(object_content)

print(json_data)
```

```text
{'username': 'ovhcloud', 'key': 'adminkey'}
```
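Since only the parsing step differs between the two examples above, it can be wrapped in a single function. Below is a minimal sketch, assuming the format can be inferred from the file extension (a naming convention, not something S3 enforces); `read_object` is a hypothetical helper that returns parsed data for `.json`, a string for a few text extensions, and raw bytes for everything else.

```python
import json
from pathlib import PurePosixPath
from typing import Any


def read_object(client, bucket: str, key: str, encoding: str = "utf-8") -> Any:
    """Read object `key` into memory and parse it according to its extension."""
    response = client.get_object(Bucket=bucket, Key=key)
    data = response["Body"].read()  # the whole object, as bytes
    suffix = PurePosixPath(key).suffix.lower()
    if suffix == ".json":
        return json.loads(data.decode(encoding))
    if suffix in {".txt", ".csv", ".md"}:
        return data.decode(encoding)
    # Binary formats (images, audio, ...) are returned unchanged.
    return data


# Usage with the client configured earlier:
# requirements = read_object(s3_client, s3_bucket, "requirements.txt")
# creds = read_object(s3_client, s3_bucket, "creds.json")
# png_bytes = read_object(s3_client, s3_bucket, "graph.png")
```

`read()` loads the entire object into memory, which is fine for small files like these; for large files, downloading them to disk (as in the next section) is usually the safer choice.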


### Download object from S3 bucket

You can download any object from your S3 bucket into your environment. Here is how to download the `requirements.txt` object and save it locally as `local-object.txt`:

```python
object_filename = 'requirements.txt'
local_filename = 'local-object.txt'

s3_client.download_file(s3_bucket, object_filename, local_filename)
```

Since `local_filename` is a relative path, the file is saved in the notebook's current working directory, so you should see it at the root of your notebook environment.
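To fetch several objects at once, for example a whole dataset folder, you can combine listing and `download_file`. Below is a minimal sketch using `pathlib`; `download_prefix` is a hypothetical helper, and it assumes keys use `/` as a folder separator, which is a common convention rather than an S3 rule. It also refuses keys that would resolve outside the destination directory (such as keys containing `..`).

```python
from pathlib import Path


def download_prefix(client, bucket: str, prefix: str, dest_dir: str) -> list:
    """Download every object whose key starts with `prefix` into `dest_dir`.

    The key structure is kept: 'data/train/a.csv' is written to
    dest_dir/data/train/a.csv. Returns the list of local paths written.
    """
    root = Path(dest_dir).resolve()
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix
    written = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**params):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            local_path = (root / key).resolve()
            # Never write outside dest_dir, whatever the key looks like.
            if root not in local_path.parents:
                raise ValueError(f"Refusing to write outside {root}: {key!r}")
            local_path.parent.mkdir(parents=True, exist_ok=True)
            client.download_file(bucket, key, str(local_path))
            written.append(local_path)
    return written


# Usage with the client configured earlier ("s3-data" is an example folder name):
# paths = download_prefix(s3_client, s3_bucket, "", "s3-data")
```

Objects are downloaded one after the other; for many small files this is simple and predictable, at the cost of speed.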

### Conclusion

We hope this example has helped you manipulate the objects in your S3 buckets directly from the OVHcloud AI Tools products.

The operations presented here are only a subset of what the S3 API offers. For the full list of available methods, see the boto3 S3 client documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html