In [1]:
import itertools
from dea.aws import s3_find
from dea.aws.aio import S3Fetcher

## Get some urls to fetch

Get 100 urls pointing to YAML metadata documents for Sentinel-2A/B near-real-time (NRT) data.

In [2]:
%%time
urls = s3_find('s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/', glob='*yaml')
urls = itertools.islice(urls, 100)
urls = list(urls)

CPU times: user 1.51 s, sys: 61.8 ms, total: 1.57 s
Wall time: 2.77 s


In [3]:
len(urls), urls[:3]

(100,
 ['s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-07-30/S2B_OPER_MSI_ARD_TL_EPAE_20180730T055204_A007293_T51KVV_N02.06/ARD-METADATA.yaml',
  's3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-07-30/S2B_OPER_MSI_ARD_TL_EPAE_20180730T055204_A007293_T51KWA_N02.06/ARD-METADATA.yaml',
  's3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/2018-07-30/S2B_OPER_MSI_ARD_TL_EPAE_20180730T055204_A007293_T51KWB_N02.06/ARD-METADATA.yaml'])

## Construct fetcher object

In [4]:
fetch = S3Fetcher()

`fetch` is a callable that accepts a sequence of urls and generates a sequence of result objects with the following fields:

- `url` -- requested url
- `data` -- contents of the object, as `bytes`
- `last_modified` -- timestamp of the object
- `range=None` -- optional, the byte range when a partial read was requested
- `error=None` -- on error, contains the exception object


Note that the output order will not be the same as the input order, so you cannot assume a one-to-one correspondence between the input and output sequences.
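Because results arrive out of order, one way to line them up with the requested urls is to index them by their `url` field. The sketch below is only an illustration: `Result` is a hypothetical stand-in that models the fields listed above, `index_by_url` is a helper invented for this example, and the assumption that a failed result carries `data=None` is mine, not something the library guarantees.

```python
from collections import namedtuple

# Hypothetical stand-in for the fetcher's result objects; it only models
# the fields described above and is not the library's actual class.
Result = namedtuple('Result', ['url', 'data', 'last_modified', 'range', 'error'])


def index_by_url(results):
    """Split results into {url: result} for successes and {url: error} for failures."""
    ok, failed = {}, {}
    for r in results:
        if r.error is not None:
            failed[r.url] = r.error
        else:
            ok[r.url] = r
    return ok, failed


# Simulated output that arrives in a different order than requested,
# with one simulated failure (assumed here to have data=None).
requested = ['s3://bucket/a.yaml', 's3://bucket/b.yaml', 's3://bucket/c.yaml']
results = [
    Result('s3://bucket/c.yaml', b'c: 3', None, None, None),
    Result('s3://bucket/a.yaml', b'a: 1', None, None, None),
    Result('s3://bucket/b.yaml', None, None, None, IOError('simulated failure')),
]

ok, failed = index_by_url(results)

# Recover the input order for the urls that were fetched successfully.
in_order = [ok[u].data for u in requested if u in ok]
```

With the real fetcher the same helper would be called as `index_by_url(fetch(urls))`.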

We did not need to wait for `s3_find` to finish: the fetcher accepts an iterator, so the sequence coming out of `s3_find` could have been passed to it directly. The listing was materialised into a list first mainly to measure the relative cost of the two operations.
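The streaming variant can be sketched without touching S3. In the example below `fake_find` and `fake_fetch` are hypothetical stand-ins for `s3_find` and the fetcher (they are not part of `dea.aws`); the point is only that a consumer which accepts an iterator can pull urls from a lazy lister on demand, with no intermediate list.

```python
import itertools

listed = []  # records which urls the stand-in lister has actually produced


def fake_find(n):
    """Hypothetical stand-in for s3_find: yields urls lazily, one at a time."""
    for i in range(n):
        url = 's3://example-bucket/doc-{:04d}.yaml'.format(i)
        listed.append(url)
        yield url


def fake_fetch(urls):
    """Hypothetical stand-in for the fetcher: consumes any iterable of urls."""
    for url in urls:
        yield (url, b'...')


# No intermediate list: the "fetcher" pulls urls from the "lister" as needed,
# and islice stops the listing after the first 100 urls.
results = list(fake_fetch(itertools.islice(fake_find(1000), 100)))
n_results, n_listed = len(results), len(listed)
```

With the real objects from this notebook, the equivalent call would be `rr = list(fetch(itertools.islice(s3_find(...), 100)))`, using the same `s3_find` arguments as in the earlier cell.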

In [5]:
%%time
rr = list(fetch(urls))

CPU times: user 423 ms, sys: 36.2 ms, total: 459 ms
Wall time: 603 ms


In [6]:
r = rr[0]
len(rr), r.last_modified, len(r.data), type(r.data)

(100, datetime.datetime(2018, 8, 9, 5, 7, 56, tzinfo=tzutc()), 28061, bytes)

In [7]:
txt = r.data.decode('utf8')
txt = '\n'.join(txt.splitlines()[:10])
print(txt, '\n...')

algorithm_information:
    algorithm_version: 2.0
    arg25_doi: http://dx.doi.org/10.4225/25/5487CC0D4F40B
    nbar_doi: http://dx.doi.org/10.1109/JSTARS.2010.2042281
    nbar_terrain_corrected_doi: http://dx.doi.org/10.1016/j.rse.2012.06.018
extent:
    center_dt: '2018-07-30T02:01:34.939Z'
    coord:
        ll:
            lat: -19.97450997949223 
...
