# Listing Objects in S3 Using `boto3`

In [None]:
import boto3
import requests
from getpass import getpass

## Enter Earthdata Login Credentials

In [None]:
user = getpass(prompt='Enter your NASA Earthdata Login Username: ')
password = getpass(prompt='Enter your NASA Earthdata Login Password: ')

## Get Earthdata Cloud Temporary Credentials

In [None]:
url = 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'
url = requests.get(url, allow_redirects=False).headers['Location']
creds = requests.get(url, auth=(user, password)).json()
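The credentials returned here are temporary, so it can help to check whether they are still usable before making further requests. The following is a minimal sketch, assuming the response includes an `expiration` field holding a timestamp with a UTC offset (an assumption about the endpoint's response, not something shown in this notebook). `is_expired` is a hypothetical helper, not part of `boto3` or `requests`.

```python
from datetime import datetime, timezone


def is_expired(creds, now=None):
    """Return True if the temporary credentials have passed their expiration.

    Assumes creds['expiration'] looks like '2021-01-27 00:50:09+00:00'.
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromisoformat(creds['expiration'])
    return now >= expires


# Illustrative value only, not a real credential response
example = {'expiration': '2021-01-27 00:50:09+00:00'}
print(is_expired(example, now=datetime(2021, 1, 27, 0, 0, tzinfo=timezone.utc)))
```

If the check returns `True`, re-running the credentials cell above should fetch a fresh set.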

## Create a `boto3` Session

We will use a `session` to store our S3 credentials and other configuration options. The `session` is then used to create a `boto3` client, which acts as our interface to AWS services, for example to download files or list objects in a specified S3 bucket.

**NOTE:** It is important to specify the `Prefix` and `Delimiter` parameters. The `list_objects_v2` calls below will fail without them.

In [None]:
session = boto3.Session(aws_access_key_id=creds['accessKeyId'], 
                        aws_secret_access_key=creds['secretAccessKey'], 
                        aws_session_token=creds['sessionToken'], 
                        region_name='us-west-2')
client = session.client('s3')
bucket = 'lp-prod-protected'
prefix = ''
delimiter = '/'
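To see how `Prefix` and `Delimiter` shape a listing, the sketch below imitates the grouping locally on a handful of made-up keys. `group_keys` is a hypothetical helper, not a `boto3` call, and it only approximates S3's behavior: keys that start with the prefix are rolled up to the first delimiter after the prefix and reported as common prefixes, while keys with no further delimiter are reported as contents.

```python
def group_keys(keys, prefix='', delimiter='/'):
    """Approximate how S3 splits keys into CommonPrefixes and Contents."""
    common_prefixes, contents = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Without a delimiter, nothing is rolled up
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            contents.append(key)
        else:
            rolled_up = prefix + rest[:cut + len(delimiter)]
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
    return common_prefixes, contents


# Made-up keys for illustration; they do not come from the real bucket
keys = [
    'COLLECTION_A.001/granule_1/data.h5',
    'COLLECTION_A.001/granule_1/data.xml',
    'COLLECTION_A.001/granule_2/data.h5',
    'COLLECTION_B.002/granule_9/data.h5',
    'readme.txt',
]

print(group_keys(keys, prefix=''))
print(group_keys(keys, prefix='COLLECTION_A.001/'))
```

An empty prefix with `/` as the delimiter surfaces only the top-level "directories", which is why the notebook starts with `prefix = ''`.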

Now we can list all of the collections within the `lp-prod-protected` bucket.

In [None]:
bucket_list = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter=delimiter)
bucket_list

`bucket_list` is a dictionary; the available collections are listed under the `CommonPrefixes` key.

In [None]:
bucket_list.keys()

We can use the `CommonPrefixes` key to pull all of the collections into a list.

In [None]:
# Collect the common prefixes (directories) found, if any
dir_list = []
if 'CommonPrefixes' not in bucket_list:
    print('No directories found')
else:
    for dir_name in bucket_list['CommonPrefixes']:
        dir_list.append(dir_name['Prefix'])
        print(dir_name['Prefix'])

print('Dir count = ', len(dir_list))

To see what is contained within each collection, we'll update the `Prefix` option to include a collection name.

In [None]:
prefix = "ECO_L2_LSTE.002/"

In [None]:
col_prefix = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter=delimiter)
#col_prefix

In [None]:
dir_list = []
for dir_name in col_prefix['CommonPrefixes']:
    dir_list.append(dir_name['Prefix'])

You'll notice that the list of prefixes (or granules) is quite long. The `list_objects_v2` method returns at most 1,000 results per request, and collections often include well over 1,000 granules. We can set up some code to 'page' through the entire collection and add the remaining granules to `dir_list`.

In [None]:
# A truncated response (more than 1,000 results) includes a continuation token.
# Keep requesting pages until no token is returned.
while 'NextContinuationToken' in col_prefix:
    continuation = col_prefix['NextContinuationToken']
    col_prefix = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter=delimiter,
                                        ContinuationToken=continuation)

    # Add this page's directories to the list
    if 'CommonPrefixes' not in col_prefix:
        print('No directories found')
    else:
        for dir_name in col_prefix['CommonPrefixes']:
            dir_list.append(dir_name['Prefix'])
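`boto3` also provides paginators that follow continuation tokens automatically. The sketch below is an alternative to the manual loop, not the approach used in this notebook. `list_common_prefixes` is a hypothetical helper name; it expects an S3 client such as the `client` created earlier.

```python
def list_common_prefixes(client, bucket, prefix='', delimiter='/'):
    """Collect every common prefix under `prefix`, across all result pages."""
    paginator = client.get_paginator('list_objects_v2')
    prefixes = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # A page has no 'CommonPrefixes' key when it holds no sub-prefixes
        for entry in page.get('CommonPrefixes', []):
            prefixes.append(entry['Prefix'])
    return prefixes
```

Calling `list_common_prefixes(client, bucket, prefix, delimiter)` with the variables defined above should produce the same entries as the loop, starting from the first page rather than the second.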

We now have a list of prefixes (granule paths) that we can use to find files.

In [None]:
len(dir_list)

In [None]:
dir_list[:10]    # Print the first 10

We can find the files by updating the `prefix` again. This time we'll use the path from our `dir_list` to list the files associated with the first item in our list.

In [None]:
prefix = dir_list[0]

In [None]:
files = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter=delimiter)

In [None]:
files

There are many files associated with this granule. Now we can get the `Key` of a data asset in S3.

In [None]:
[f['Key'] for f in files['Contents'] if f['Key'].endswith('.h5')]
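Other tools often expect a full `s3://` URI rather than a bare key. The following is a minimal sketch that builds those URIs from a `list_objects_v2` response. `s3_uris` is a hypothetical helper, and the sample response and bucket name below are made up for illustration.

```python
def s3_uris(response, bucket, suffix='.h5'):
    """Build s3:// URIs for keys in a list_objects_v2 response ending in `suffix`."""
    return [
        f"s3://{bucket}/{obj['Key']}"
        for obj in response.get('Contents', [])
        if obj['Key'].endswith(suffix)
    ]


# Made-up response with the same shape as `files`
sample = {'Contents': [{'Key': 'COLLECTION_A.001/granule_1/data.h5'},
                       {'Key': 'COLLECTION_A.001/granule_1/data.xml'}]}
print(s3_uris(sample, 'example-bucket'))
```

Using `response.get('Contents', [])` keeps the helper from raising a `KeyError` when a prefix matches no objects.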