In [1]:
import itertools
from odc.aws.s3 import S3Fetcher

## Construct fetcher object

In [2]:
s3 = S3Fetcher()

`s3` is a callable that accepts a sequence of urls and generates a sequence of result objects with the following fields:

- `url` -- requested url
- `data` -- bytes
- `last_modified` -- timestamp of the object
- `range=None` -- optional, the byte range if a partial read was requested
- `error=None` -- on error this contains an exception object


Note that output order is not guaranteed to match input order, so you cannot assume a one-to-one correspondence between the input and output sequences.
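Because results can arrive in any order, one way to line them up with the request is to key them by their `url` field and check `error` before touching `data`. The sketch below is illustrative only: `Result` is a hypothetical stand-in that borrows just the field names listed above, `index_by_url` is a made-up helper, and the urls and payloads are simulated so the cell runs without network access.

```python
from collections import namedtuple

# Hypothetical stand-in for the fetcher's result objects; only the field
# names are taken from the list above, the class itself is an assumption.
Result = namedtuple("Result", ["url", "data", "last_modified", "range", "error"])


def index_by_url(results):
    """Split results into successes and failures, both keyed by url."""
    ok, failed = {}, {}
    for r in results:
        if r.error is None:
            ok[r.url] = r
        else:
            failed[r.url] = r.error
    return ok, failed


# Simulated output arriving in a different order than it was requested.
requested = ["s3://bucket/a.yaml", "s3://bucket/b.yaml", "s3://bucket/c.yaml"]
arrived = [
    Result("s3://bucket/c.yaml", b"c: 3", None, None, None),
    Result("s3://bucket/a.yaml", b"a: 1", None, None, None),
    Result("s3://bucket/b.yaml", None, None, None, IOError("simulated failure")),
]

ok, failed = index_by_url(arrived)

# Walk the original request order, skipping urls that failed.
in_request_order = [ok[u].data for u in requested if u in ok]
```

With real results, the same helper could be applied to the output of `s3(urls)`, assuming the objects expose `url` and `error` as described.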

## Get some urls to fetch

Get 100 urls pointing to yaml documents for S2A/B NRT.

In [3]:
%%time
urls = (o.url for o in s3.find('s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/', glob='*yaml'))
urls = itertools.islice(urls, 100)
urls = list(urls)

CPU times: user 2.68 s, sys: 348 ms, total: 3.03 s
Wall time: 2.88 s


In [4]:
len(urls), urls[:3]

(100,
 ['s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-08-12/S2B_OPER_MSI_ARD_TL_EPAE_20180813T012421_A007492_T56LPM_N02.06/ARD-METADATA.yaml',
  's3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-08-12/S2B_OPER_MSI_ARD_TL_EPAE_20180813T012421_A007492_T56LPN_N02.06/ARD-METADATA.yaml',
  's3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-08-12/S2B_OPER_MSI_ARD_TL_EPAE_20180813T012421_A007492_T56LPP_N02.06/ARD-METADATA.yaml'])

We didn't need to wait for `s3.find` to finish: the fetcher accepts an iterator, so we could have passed the sequence coming out of `s3.find` directly to it. The listing was materialized first mainly to compare the relative costs of the two operations.
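The streaming alternative described above can be sketched with plain generators. Both `fake_find` and `fake_fetch` are hypothetical stand-ins (not part of `odc.aws`), used so the example runs offline; with the real objects the pipeline would presumably look like `s3(itertools.islice(urls, 100))`, where `urls` is the generator built from `s3.find`.

```python
import itertools

listed = []  # records which urls the stand-in listing actually produced


def fake_find(n_total=1000):
    """Hypothetical stand-in for `s3.find`: lazily yields urls one by one."""
    for i in range(n_total):
        url = f"s3://example-bucket/doc-{i:04d}.yaml"
        listed.append(url)
        yield url


def fake_fetch(urls):
    """Hypothetical stand-in for the fetcher: consumes any iterable of urls."""
    for url in urls:
        yield (url, b"placeholder bytes")


# No intermediate list: urls flow from listing to fetching as they are produced,
# and islice stops the listing after the first 5 items.
results = list(fake_fetch(itertools.islice(fake_find(), 5)))
```

In this sequential sketch the listing produces exactly as many urls as are fetched. A concurrent fetcher may read ahead of what it has finished downloading, so that exact correspondence is a property of the stand-ins, not a claim about `S3Fetcher`.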

In [5]:
%%time
rr = list(s3(urls))

CPU times: user 459 ms, sys: 40.8 ms, total: 500 ms
Wall time: 790 ms


In [6]:
r = rr[0]
len(rr), r.last_modified, len(r.data), type(r.data)

(100, datetime.datetime(2018, 8, 13, 7, 12, 11, tzinfo=tzutc()), 23877, bytes)

In [7]:
txt = r.data.decode('utf8')
txt = '\n'.join(txt.splitlines()[:10])
print(txt, '\n...')

algorithm_information:
    algorithm_version: 2.0
    arg25_doi: http://dx.doi.org/10.4225/25/5487CC0D4F40B
    nbar_doi: http://dx.doi.org/10.1109/JSTARS.2010.2042281
    nbar_terrain_corrected_doi: http://dx.doi.org/10.1016/j.rse.2012.06.018
extent:
    center_dt: '2018-08-12T23:57:34.459Z'
    coord:
        ll:
            lat: -12.750842155509027 
...
