# Upload directories to S3

The boto3 library has no built-in call for uploading an entire directory, so I wrote my own code that walks the directory tree with pathlib and uploads each file individually with boto3.

S3 is a key-value store with a flat structure; technically it has no folders, although the console and the APIs support the concept. That's why one works with key prefixes in object names, like `abc/xys/uvw/123.jpg`.
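
To make the flat key idea concrete, here is a minimal sketch of how a relative local path could be turned into an object key. The helper name `to_s3_key` and its `prefix` parameter are my own for illustration and are not part of boto3; the sketch assumes relative paths only.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_s3_key(local_path, prefix=""):
    """Turn a relative local path into an S3 object key (hypothetical helper)."""
    # PureWindowsPath accepts both "\\" and "/" as separators, so it parses
    # relative paths written in either style on any operating system.
    parts = PureWindowsPath(local_path).parts
    # An S3 key is a plain string; "/" is only a naming convention that the
    # console and the list APIs treat as a folder delimiter.
    return PurePosixPath(prefix, *parts).as_posix()


print(to_s3_key(r"data\train\test.txt", prefix="extra/extra"))
# extra/extra/data/train/test.txt
```

The "folders" `extra/extra/data/train/` never have to be created; they exist only as the leading part of the key.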


_Alternatively, I could try one of these options:_
- `s3put`, a command line utility shipped with the legacy boto library, which handles such operations
- The AWS CLI, which can upload entire directories or even [sync](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html) an S3 bucket with a local directory, or vice versa
- `s3fs`, a Python filesystem library built on top of botocore that enables filesystem-like operations on S3
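
For the AWS CLI option, a rough sketch of calling `aws s3 sync` from Python could look like this. The helper names and the bucket name `my-example-bucket` are made up for illustration; the sketch assumes the AWS CLI is installed, on `PATH`, and configured with credentials.

```python
import subprocess


def build_sync_command(local_dir, bucket, prefix=""):
    """Build the argument list for `aws s3 sync` (hypothetical helper)."""
    # Drop stray slashes so that an empty prefix yields "s3://bucket".
    target = f"s3://{bucket}/{prefix.strip('/')}".rstrip("/")
    return ["aws", "s3", "sync", str(local_dir), target]


def sync_directory(local_dir, bucket, prefix=""):
    # Runs the CLI; raises CalledProcessError if the command fails.
    subprocess.run(build_sync_command(local_dir, bucket, prefix), check=True)


print(build_sync_command("data", "my-example-bucket", "extra/extra"))
# ['aws', 's3', 'sync', 'data', 's3://my-example-bucket/extra/extra']
```

Note that `sync` copies the contents of `data` directly under the target prefix, so `data/train/test.txt` would end up at `extra/extra/train/test.txt`, without the `data/` part.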

Helpful resources:
- [boto3 documentation on s3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html)
- [S3 user guide on: How do I use folders in S3?](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html)

In [3]:
import sys
from pathlib import Path

import boto3
import pandas as pd

In [4]:
print(sys.executable)
print(sys.version)

C:\Users\r2d4\miniconda3\envs\pytorch\python.exe
3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]


In [5]:
# Instantiate the s3 client
s3 = boto3.client("s3")

# Check my existing buckets
my_buckets = s3.list_buckets()
for bucket in my_buckets["Buckets"]:
    print(bucket["Name"])

clone-a-consultant-20-09
elasticbeanstalk-us-east-2-873674308518


In [6]:
# Create a new bucket for the project if it does not exist yet
new_bucket_name = "clone-a-consultant-20-09"
if new_bucket_name not in [bucket["Name"] for bucket in my_buckets["Buckets"]]:
    response = s3.create_bucket(
        Bucket=new_bucket_name,
        CreateBucketConfiguration={
            "LocationConstraint": "eu-west-1",
        },
    )
    # The response key is capitalized: "Location", not "location"
    print(response["Location"])
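
One detail worth knowing about the call above: `us-east-1` is the default region and must not be passed as a `LocationConstraint`, while every other region has to be named explicitly. A small sketch of a helper that handles both cases (the name `create_bucket_kwargs` and the bucket name are hypothetical):

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for `create_bucket` (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default and is rejected as a LocationConstraint;
    # all other regions need the constraint spelled out.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("my-example-bucket", "eu-west-1"))
# {'Bucket': 'my-example-bucket', 'CreateBucketConfiguration': {'LocationConstraint': 'eu-west-1'}}
```

The result can be unpacked into the client call, for example `s3.create_bucket(**create_bucket_kwargs(name, region))`.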

In [7]:
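from botocore.exceptions import ClientError


def create_file_generator(path):
    # Yield files only; rglob("*.*") would skip files without an extension
    # and could return directories whose names contain a dot
    return (p for p in Path(path).rglob("*") if p.is_file())


def upload_file_to_s3(local_path, client, bucket, s3_dir=None):
    # Build the object key with forward slashes, whatever the local OS uses
    if s3_dir:
        s3_path = (Path(s3_dir) / local_path).as_posix()
    else:
        s3_path = Path(local_path).as_posix()
    # Check if the file already exists: if yes skip, if no upload
    try:
        client.head_object(Bucket=bucket, Key=s3_path)
        print(f"File found in s3 bucket! Skipping {s3_path}")
    except ClientError as e:
        # A missing key is reported as 404; re-raise anything else
        if e.response["Error"]["Code"] != "404":
            raise
        # Dry run: uncomment the next two lines to actually upload
#         print(f"Uploading {s3_path} ...")
#         client.upload_file(str(local_path), bucket, s3_path)
        print(local_path)
        print(f"{s3_path}\n")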

In [8]:
gener = create_file_generator(r"data")

for local_path in gener:
    upload_file_to_s3(local_path, s3, new_bucket_name, s3_dir="extra/extra")

data\train\test.txt
extra/extra/data/train/test.txt

data\train2\test.txt
extra/extra/data/train2/test.txt

data\train3\test.txt
extra/extra/data/train3/test.txt
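
Calling `head_object` once per file costs one request for every file. As a possible variation, the existing keys can be listed once up front with the `list_objects_v2` paginator and compared locally. This is only a sketch: the function names are my own, `local_dir` is assumed to be a relative path, and the client is passed in so nothing here depends on a particular session.

```python
from pathlib import Path


def list_existing_keys(client, bucket, prefix=""):
    """Collect all keys under `prefix` with one paginated listing."""
    keys = set()
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def upload_directory(local_dir, client, bucket, s3_dir=""):
    """Upload every file below `local_dir`, skipping keys that already exist."""
    existing = list_existing_keys(client, bucket, prefix=s3_dir)
    uploaded = []
    for local_path in sorted(Path(local_dir).rglob("*")):
        if not local_path.is_file():
            continue
        # Same key layout as above: <s3_dir>/<relative local path>
        key = (Path(s3_dir) / local_path).as_posix()
        if key in existing:
            continue
        client.upload_file(str(local_path), bucket, key)
        uploaded.append(key)
    return uploaded


# Usage (performs real uploads, so it is left commented out):
# upload_directory("data", boto3.client("s3"), "my-example-bucket", s3_dir="extra/extra")
```

The trade-off is that the listing is a snapshot: an object created by someone else after the listing would be overwritten rather than skipped.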



In [None]:
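# From the boto3 docs; the imports below are needed to run it on its own
import logging

from botocore.exceptions import ClientError

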
def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True