# S3 Bucket Organizer
#### This code is meant to organize the S3 bucket into folders by ASIN.
#### Much of the code follows this guide: https://realpython.com/python-boto3-aws-s3/
#### Other resources: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
#### https://aws.amazon.com/sdk-for-python/

In [50]:
import boto3
import numpy as np
from botocore.exceptions import ClientError
import time

In [5]:
s3 = boto3.resource('s3')

In [10]:
BUCKET_NAME = "bucknellrobotics"
IMG_BUCKET = s3.Bucket(name=BUCKET_NAME)

#### This uses boto3's high-level resource interface (boto3.resource), not the low-level client interface (boto3.client).

In [8]:
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

s3.Bucket(name='bucknellrobotics')


In [17]:
# Iterating over the bucket's objects should be the key to traversing the images
# (only the first key is printed here). Next is how to sort into folders.
for obj in IMG_BUCKET.objects.all():
    print(obj.key)
    break

10_09_2019/0008242682_645712f5-c155-4ef5-a3e8-089bd44c0381.JPG


## Folders in S3

#### If you want to translate a directory structure on a disk to Amazon S3, you need to use the file path as the object key for the file. For instance, if you want to obtain the path /bar/foo/baz.jpg, you need to store ‘baz.jpg’ with an object key of ‘bar/foo/baz.jpg’. (Source: https://www.ludofischer.com/blog/how-to-upload-directory-amazon-s3/)
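#### In other words, when assigning images to "folders" we will need to go through each image, parse the ASIN out of its key, and conceptually say
#### obj.key = "10_09_2019/ASIN#/original_filename"  (actual key assignment is more complicated: an existing object's key cannot be changed in place, so the object has to be copied to the new key and the original deleted)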
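#### A minimal sketch of the re-keying step described above. It assumes the ASIN is the part of the filename before the first underscore (as in the sample key printed earlier); that rule is a guess from one example, not something confirmed for the whole bucket. asin_from_key, foldered_key, and move_object are hypothetical helper names. The "move" is a copy to the new key followed by a delete of the old one, using the resource interface's Object.copy_from and Object.delete.

```python
import posixpath


def asin_from_key(key):
    """Return the ASIN, ASSUMED to be the filename text before the first underscore."""
    filename = posixpath.basename(key)
    asin, sep, _rest = filename.partition("_")
    if not sep or not asin:
        raise ValueError(f"no ASIN prefix found in key: {key!r}")
    return asin


def foldered_key(key):
    """Build the new key with an ASIN 'folder' between the prefix and the filename."""
    prefix, filename = posixpath.split(key)
    asin = asin_from_key(key)
    # Leave the key alone if it already sits under its ASIN prefix.
    if posixpath.basename(prefix) == asin:
        return key
    return posixpath.join(prefix, asin, filename)


def move_object(s3_resource, bucket_name, old_key, new_key):
    """'Rename' an object: copy it to new_key, then delete old_key."""
    s3_resource.Object(bucket_name, new_key).copy_from(
        CopySource={"Bucket": bucket_name, "Key": old_key}
    )
    s3_resource.Object(bucket_name, old_key).delete()
```

#### Usage would look roughly like move_object(s3, BUCKET_NAME, obj.key, foldered_key(obj.key)) inside the loop over IMG_BUCKET.objects.all(). It is worth trying this on a handful of keys first, since the delete is not reversible unless bucket versioning is on.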

#### This also suggests that it is unnecessary to have a list of unique ASINs.

#### This approach may work for the initial build, but adding images to folders after the initial build will take somewhat more work.
#### This may also MAJORLY mess up what we were considering for the CNNs. We were assuming we could access the bucket more efficiently by only loading select folders, but the website above seems to suggest that folders don't actually exist. Correction: a "folder" is just a key prefix shared by several objects. One can be created by hand in the S3 console, but that only adds an empty placeholder object whose key ends in "/".
#### Here is an approach for limiting the number of files we're viewing. Indexing objects.all() with a wildcard comparison is not valid boto3, and it would still list every object first. The collection's filter(Prefix=...) method hands the prefix to S3, so only the matching objects are listed:
#### for obj in IMG_BUCKET.objects.filter(Prefix="10_09_2019/ASIN/"):
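#### A minimal sketch of listing one "folder" by prefix, assuming keys follow the "date/ASIN/filename" layout discussed above. keys_under_prefix is a hypothetical helper name, not part of boto3; the only boto3 call it relies on is the bucket collection's objects.filter(Prefix=...). It also skips any key ending in "/", on the assumption that those are empty folder placeholders made in the console.

```python
def keys_under_prefix(bucket, prefix):
    """Yield keys of objects in `bucket` whose key starts with `prefix`.

    `bucket` is expected to be a boto3 Bucket resource, e.g. IMG_BUCKET.
    """
    for obj in bucket.objects.filter(Prefix=prefix):
        # Skip zero-byte "folder" placeholder objects (ASSUMED to end in "/").
        if obj.key.endswith("/"):
            continue
        yield obj.key
```

#### For example, list(keys_under_prefix(IMG_BUCKET, "10_09_2019/0008242682/")) would return only the keys for that one ASIN, without iterating over the rest of the bucket on our side.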