# Migrate jpg files from Phase 1 to shared S3 bucket for SFI ALD project

We migrated code created during the Phase 1 exploration from https://github.com/sreece101/CIFF-ALD to the collaboration repository at https://github.com/s22s/sfi-asset-level-data/tree/master.

The JPEG files are large, so we chose to keep them out of the Git repository and store them in a shared S3 bucket instead. This notebook uploads the JPEGs to that bucket.

In [None]:
# Install the AWS CLI if it is not already installed
! pip install awscli

In [None]:
# Import python packages
import os
import re

In [None]:
# List the objects already in the bucket
! aws s3 ls s3://sfi-shared-assets

We can also list the bucket contents with the [`boto3` Python library](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html), the official AWS SDK for Python.

In [None]:
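import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('sfi-shared-assets')

# List the first 10 objects and print an s3:// URI for image files
for o in bucket.objects.limit(10):
    print(o)
    # endswith() with a tuple handles extensions of different lengths
    if o.key.lower().endswith(('.jp2', '.tif', '.jpg')):
        print(f's3://{o.bucket_name}/{o.key}')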
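Image extensions differ in length (`.tif`, `.jp2`, `.jpeg`), so comparing a fixed-length slice of the key can silently miss files. The following is a minimal sketch of a more robust filter using `str.endswith` with a tuple of suffixes. The helper names `is_image_key` and `to_s3_uri`, the suffix list, and the sample keys are hypothetical and not part of the original notebook.

```python
# Hypothetical helpers (not from the original notebook) for filtering
# S3 keys by file extension and building s3:// URIs.

# Assumed set of image extensions; adjust to what the bucket actually holds.
IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.jp2', '.tif', '.tiff')


def is_image_key(key, suffixes=IMAGE_SUFFIXES):
    """Return True if the key ends with one of the suffixes (case-insensitive)."""
    return key.lower().endswith(suffixes)


def to_s3_uri(bucket_name, key):
    """Build an s3://bucket/key URI from a bucket name and an object key."""
    return f's3://{bucket_name}/{key}'


if __name__ == '__main__':
    # Hypothetical sample keys for illustration only
    keys = [
        'cement-microloc-phase1/images/data/CHN_0001.jpg',
        'scenes/B04.TIF',
        'README.txt',
    ]
    for k in keys:
        if is_image_key(k):
            print(to_s3_uri('sfi-shared-assets', k))
```

Inside the listing loop, the same check would read `if is_image_key(o.key): print(to_s3_uri(o.bucket_name, o.key))`.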

## Upload the JPEGs to the bucket

*Note: For this step, I first uploaded the JPEGs from https://github.com/sreece101/CIFF-ALD into EarthAI Notebook. I have since deleted them from local storage, but I am keeping this code as an example.*

In [None]:
# List the files in the local "images" directory
images_dir = 'cement-microloc-phase1/images/data'
images_list = os.listdir(images_dir)

# Upload image JPEGs whose filenames contain "CHN"; the S3 key mirrors the local path
for img in images_list:
    if re.search("CHN", img):
        bucket.upload_file(f'{images_dir}/{img}', f'{images_dir}/{img}')

In [None]:
# List the files in the local "masks" directory
masks_dir = 'cement-microloc-phase1/masks/data'
masks_list = os.listdir(masks_dir)

# Upload mask JPEGs whose filenames contain "CHN"; the S3 key mirrors the local path
for img in masks_list:
    if re.search("CHN", img):
        bucket.upload_file(f'{masks_dir}/{img}', f'{masks_dir}/{img}')

In [None]:
# List the files in the local "test" directory
test_dir = 'cement-microloc-phase1/test'
test_list = os.listdir(test_dir)

# Upload test JPEGs whose filenames contain "CHN"; the S3 key mirrors the local path
for img in test_list:
    if re.search("CHN", img):
        bucket.upload_file(f'{test_dir}/{img}', f'{test_dir}/{img}')
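The three upload cells repeat the same loop for different directories. As a minimal sketch, the loop could be factored into one helper. The name `upload_matching` and its parameters are hypothetical; it assumes `bucket` is any object with an `upload_file(Filename, Key)` method, such as the boto3 `s3.Bucket` resource created earlier.

```python
import os
import re


def upload_matching(bucket, local_dir, pattern='CHN', key_prefix=None):
    """Upload files in local_dir whose names match `pattern` (via re.search).

    Hypothetical helper: `bucket` is any object with an
    upload_file(Filename, Key) method, e.g. a boto3 s3.Bucket resource.
    Keys mirror the local directory unless key_prefix is given.
    Returns the list of S3 keys that were uploaded.
    """
    if key_prefix is None:
        key_prefix = local_dir
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        # Skip subdirectories and names that do not match the pattern
        if not os.path.isfile(path) or not re.search(pattern, name):
            continue
        # S3 keys always use forward slashes, regardless of the local OS
        key = f"{key_prefix.rstrip('/')}/{name}"
        bucket.upload_file(path, key)
        uploaded.append(key)
    return uploaded
```

Calling `upload_matching(bucket, 'cement-microloc-phase1/images/data')` would reproduce the images cell; the same call with the masks and test directories covers the other two. Returning the uploaded keys makes it easy to print or compare counts afterwards.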

In [None]:
# Count the objects under each prefix on S3 as a sanity check
prefixes = {
    'image': 'cement-microloc-phase1/images/data/',
    'mask': 'cement-microloc-phase1/masks/data/',
    'test': 'cement-microloc-phase1/test/',
}
for name, prefix in prefixes.items():
    # filter(Prefix=...) limits the listing to keys starting with the prefix,
    # so unrelated keys that merely contain "test" or "images" are not counted
    count = sum(1 for _ in bucket.objects.filter(Prefix=prefix))
    print(f'Total number of {name} chips: {count}')
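Counting objects confirms the totals but not which files, if any, failed to upload. While the local copies still exist, a sketch like the following could list the missing ones by comparing filenames. The helper `find_missing` is hypothetical, and it assumes the keys mirror the local directory as in the upload cells above.

```python
import re


def find_missing(local_names, s3_keys, key_prefix, pattern='CHN'):
    """Return local filenames matching `pattern` with no object under key_prefix.

    Hypothetical helper: compares names only, not file contents or sizes.
    """
    prefix = key_prefix.rstrip('/') + '/'
    # Strip the prefix from keys that live under it to recover the filenames
    uploaded = {k[len(prefix):] for k in s3_keys if k.startswith(prefix)}
    # Only files that pass the same filter as the upload loop are expected
    expected = {n for n in local_names if re.search(pattern, n)}
    return sorted(expected - uploaded)
```

For example, `find_missing(os.listdir(images_dir), [o.key for o in bucket.objects.filter(Prefix=images_dir + '/')], images_dir)` would return an empty list if every matching image reached the bucket.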