# Read in data from the landsat8 bucket

---

## Imports

In [None]:
import xarray as xr
import s3fs

## Loading our data into xarray
First, let's open a single file.

Our data is stored in an object store on Jetstream2. We'll pick one of the NetCDF files, open it through `s3fs` as a file-like object that `xarray` can read, and then open and examine the dataset.

In [None]:
jetstream_url = 'https://js2.jetstream-cloud.org:8001/'

s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(endpoint_url=jetstream_url))

s3path = 's3://pythia/landsat8/iQuam/202302-STAR-L2i_GHRSST-SST-iQuam-V2.10-v01.0-fv02.0.nc'

# Open the single remote file as a file-like object
s3file = s3.open(s3path)

# Open with xarray
ds = xr.open_dataset(s3file)


In [None]:
ds

Next, try opening multiple files.

In [None]:
# Generate a list of matching files in the object store folder
s3path = 's3://pythia/landsat8/iQuam/20221*-STAR-L2i_GHRSST-SST-iQuam-V2.10-v01.0-fv02.0.nc'
remote_files = s3.glob(s3path)

In [None]:
remote_files
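To see which months the `20221*` wildcard selects, here is a minimal offline sketch using the standard library's `fnmatch`, which follows similar shell-style wildcard rules. The candidate keys are hypothetical: they are built from the naming scheme of the file opened earlier and are not a listing of the actual bucket.

```python
from fnmatch import fnmatchcase

# Naming scheme copied from the single file opened earlier
suffix = "-STAR-L2i_GHRSST-SST-iQuam-V2.10-v01.0-fv02.0.nc"
pattern = "pythia/landsat8/iQuam/20221*" + suffix

# Hypothetical object keys, for illustration only (not a real bucket listing)
months = ["202209", "202210", "202211", "202212", "202301", "202302"]
candidates = [f"pythia/landsat8/iQuam/{month}{suffix}" for month in months]

# Keep only the keys that the wildcard pattern matches
matched = [key for key in candidates if fnmatchcase(key, pattern)]

# Extract the YYYYMM prefix of each matched file name
matched_months = [key.split("/")[-1][:6] for key in matched]
print(matched_months)  # ['202210', '202211', '202212']
```

Under this naming scheme the pattern selects October through December 2022, because `20221` is followed by exactly one more digit before the fixed suffix.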

After calling the `glob` method on the Jetstream2 object store, it is necessary to clear the directory-listing cache that `s3fs` keeps before opening the files. The next cell does that.

In [None]:
s3.invalidate_cache()

Open all matching files and add them to a list that we'll pass into Xarray's `open_mfdataset` function.

In [None]:
fileset = [s3.open(file) for file in remote_files]

Now open the full set of files with Xarray.

<div class="alert alert-warning">
The next cell will fail because Xarray cannot find any dimension coordinates to use to order the datasets for concatenation. Modifying the individual dataset objects so that the <code>time</code>, <code>lat</code>, and <code>lon</code> coordinate variables are also named dimensions may correct this.
</div>

In [None]:
ds = xr.open_mfdataset(fileset)
ds
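The failure described in the warning can be reproduced offline with small synthetic datasets, which also makes it possible to try two workarounds. This is a minimal sketch under stated assumptions, not a tested fix for the iQuam files: the dimension name `obs`, the variable name `sst`, and the helper `make_fake_month` are all hypothetical, and the real files may be structured differently.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_fake_month(start, n_obs):
    """Build a tiny stand-in dataset (hypothetical layout, not real iQuam data).

    The only dimension, "obs", has no coordinate of its own; time, lat and
    lon are ordinary coordinates that vary along it.
    """
    times = pd.date_range(start, periods=n_obs, freq="D")
    return xr.Dataset(
        data_vars={"sst": ("obs", np.linspace(280.0, 290.0, n_obs))},
        coords={
            "time": ("obs", times),
            "lat": ("obs", np.linspace(-10.0, 10.0, n_obs)),
            "lon": ("obs", np.linspace(100.0, 120.0, n_obs)),
        },
    )


parts = [make_fake_month("2022-10-01", 3), make_fake_month("2022-11-01", 4)]

# Coordinate-based combining has no dimension coordinate to order by
try:
    xr.combine_by_coords(parts)
    by_coords_error = None
except ValueError as err:
    by_coords_error = str(err)
print(by_coords_error)

# Option 1: concatenate in list order along the existing dimension
nested = xr.combine_nested(parts, concat_dim="obs")

# Option 2: promote time to a dimension coordinate, then combine by coords
# (assumes time is monotonic within each dataset)
by_time = xr.combine_by_coords([p.swap_dims({"obs": "time"}) for p in parts])

print(nested.sizes["obs"], by_time.sizes["time"])
```

If the real files share an unlabeled dimension like this, the equivalent of option 1 would be passing `combine="nested"` and `concat_dim=` (with the files' actual dimension name) to `open_mfdataset`. Option 2 follows the suggestion in the warning and only works when each file's `time` values are monotonic.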