# Reading data with the `Scene`

Satpy's main interface for working with data is the `Scene` class. We can provide the `Scene` with data files and load them with a "reader". In this notebook we'll explore the basic data loading and data access functionality provided by Satpy while also providing a basic introduction to xarray's `DataArray` objects and `dask` arrays.

Before importing and using Satpy, we run some Python code to do some initial setup. This includes turning off warnings and limiting the resources we use. These precautions help the examples run on most machines.

In [None]:
%run ../init_notebook.py
from satpy import Scene
from glob import glob

# Get the list of GOES-16 ABI files to open
filenames = glob('../data/abi_l1b/20180511_texas_fire_abi_l1b_conus/*.nc')
len(filenames)

In [None]:
scn = Scene(reader='abi_l1b', filenames=filenames)
scn.keys()

We've now created a `Scene` object. Under the hood Satpy has sorted the files and determined what we can access. We haven't actually loaded any data, so our dict-like `Scene` object is empty. To find out what data can be loaded from the files we can use the `available_dataset_names` method.

In [None]:
scn.available_dataset_names()

The `Scene` is telling us that we have all 16 ABI channels available to load. This list includes any product that we can load from the file that the "abi_l1b" reader is configured to access. If we didn't provide all of the necessary files or the data was missing from the file for some reason, that product would not be listed here.

| Channel     | Wavelength  |  Resolution  |
| ----------- | ----------- |  ----------- |
| C01         | 0.47µm      |  1000m       |
| C02         | 0.64µm      |  500m        |
| C03         | 0.86µm      |  1000m       |
| C04         | 1.37µm      |  2000m       |
| C05         | 1.60µm      |  1000m       |
| C06         | 2.20µm      |  2000m       |
| C07         | 3.90µm      |  2000m       |
| C08         | 6.20µm      |  2000m       |
| C09         | 6.90µm      |  2000m       |
| C10         | 7.30µm      |  2000m       |
| C11         | 8.40µm      |  2000m       |
| C12         | 9.60µm      |  2000m       |
| C13         | 10.30µm     |  2000m       |
| C14         | 11.20µm     |  2000m       |
| C15         | 12.30µm     |  2000m       |
| C16         | 13.30µm     |  2000m       |
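
The table above can also be kept as a small lookup, for instance to group channels by resolution before deciding what to load. This is a plain-Python sketch: `ABI_CHANNELS` and `channels_at` are illustrative names, not part of Satpy, and the values are simply copied from the table.

```python
# Channel name -> (wavelength in µm, resolution in m), copied from the table above.
# Illustrative helper only; Satpy does not provide this dictionary.
ABI_CHANNELS = {
    "C01": (0.47, 1000), "C02": (0.64, 500), "C03": (0.86, 1000),
    "C04": (1.37, 2000), "C05": (1.60, 1000), "C06": (2.20, 2000),
    "C07": (3.90, 2000), "C08": (6.20, 2000), "C09": (6.90, 2000),
    "C10": (7.30, 2000), "C11": (8.40, 2000), "C12": (9.60, 2000),
    "C13": (10.30, 2000), "C14": (11.20, 2000), "C15": (12.30, 2000),
    "C16": (13.30, 2000),
}


def channels_at(resolution_m):
    """Return the channel names listed at the given resolution."""
    return [name for name, (_, res) in ABI_CHANNELS.items() if res == resolution_m]


print(channels_at(500))   # ['C02']
print(channels_at(1000))  # ['C01', 'C03', 'C05']
```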

Let's pick one of these channels, load it, and look at what information Satpy provides.

In [None]:
my_channel = 'EDITME'
scn.load([my_channel])
# use brackets to access products like a normal dict
scn[my_channel]

## Xarray and Dask

Above we see an `xarray.DataArray` object with a lot of metadata.
There are a few elements to get familiar with when working with DataArrays from Satpy:

* `dask.array<...>`: We don't see any actual imagery data. Our data is stored in a `dask` array instead of a traditional numpy array. This means our data's loading and calculations are delayed.
* `Attributes`: A dictionary where the metadata is stored. Some is from the file, some is added by the "abi_l1b" reader to assist future Satpy operations. Some of the more important keys are:

  * `platform_name`
  * `sensor`
  * `name`
  * `wavelength`
  * `units`
  * `calibration`
  * `standard_name`
  * `start_time`
  * `area` (more on this later)

If we want to access the attributes, we use the `.attrs` attribute.

In [None]:
scn[my_channel].attrs['start_time']

We can access the dimension names of the data using the `.dims` attribute.

In [None]:
scn[my_channel].dims

We can get the sizes of those dimensions with the `.sizes` attribute:

In [None]:
scn[my_channel].sizes['y']

DataArrays also provide us access to traditional numpy properties like `shape` and `ndim`.

In [None]:
scn[my_channel].shape

In [None]:
scn[my_channel].ndim

Although typically not needed, we can access the dask array underneath xarray's `DataArray` via the `.data` attribute.

In [None]:
scn[my_channel].data
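
The same properties can be tried without any satellite files. The sketch below builds a stand-in `DataArray` backed by a dask array; the shape, chunk size, and attribute values are made up for illustration and do not come from the ABI data.

```python
import dask.array as da
import xarray as xr

# Stand-in for a loaded channel: a lazy dask array wrapped in a DataArray.
# Shape, chunking, and attributes are invented for this example.
fake_channel = xr.DataArray(
    da.zeros((300, 500), chunks=100),
    dims=("y", "x"),
    attrs={"name": "C01", "units": "%", "wavelength": 0.47},
)

print(fake_channel.attrs["units"])  # metadata lives in .attrs
print(fake_channel.dims)            # ('y', 'x')
print(fake_channel.sizes["y"])      # 300
print(fake_channel.shape)           # (300, 500)
print(fake_channel.ndim)            # 2
print(type(fake_channel.data))      # the underlying dask array class
```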

## Delayed Calculations

It usually isn't necessary to access the dask array directly because xarray will handle all normal arithmetic and numpy functions for us. We can treat the arrays just like normal python variables; adding, subtracting, and storing the result in a new variable.

As an arbitrary example, let's add `2.5` to the channel we've loaded and store it in a new variable `my_new_var`:

In [None]:
my_new_var = scn[my_channel] + 2.5

The `DataArray` object stored in the `Scene` is unaffected, but our `my_new_var` variable does contain the changes of this calculation. By default, Xarray will keep dimensions and coordinates if it can but lose all attributes.

In [None]:
my_new_var
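
If the metadata should survive a calculation, there are two common approaches, sketched here on a tiny made-up array: ask xarray to keep attributes for the duration of the operation with `xr.set_options(keep_attrs=True)`, or copy them over by hand afterwards.

```python
import numpy as np
import xarray as xr

# Small made-up array with one attribute.
arr = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=("y", "x"),
    attrs={"units": "%"},
)

# Option 1: carry attributes through the operation.
with xr.set_options(keep_attrs=True):
    shifted = arr + 2.5

# Option 2: copy the attributes explicitly after the operation.
shifted_manual = arr + 2.5
shifted_manual.attrs = dict(arr.attrs)

print(shifted.attrs)         # {'units': '%'}
print(shifted_manual.attrs)  # {'units': '%'}
```

Keep in mind that copied attributes may no longer describe the new data (for example `units` after a unit conversion), so update them where needed.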

An important point here is how fast these operations seem because of the delayed dask operations involved. We haven't actually done the number crunching yet. Even though we changed the data, we still have a dask array representing those changes. This can sometimes make analyzing our data a little confusing if we're used to plain numpy arrays where normally we might expect immediate results.

In [None]:
scn[my_channel].max()

In these cases we can use the `.compute()` method to load the data and perform the series of calculations that we've built up so far. Dask will split the data across multiple threads to compute values in parallel, handling all of the multithreading and low-level synchronization for us. Let's compute the maximum value of the data stored in the `Scene` and of our new data variable:

In [None]:
scn[my_channel].max().compute()

In [None]:
my_new_var.max().compute()

Note how this operation still returns a `DataArray`, so we still have access to any remaining coordinates or dimensions. If we would like to go back to plain numpy arrays, we can use `.values` to compute the dask array and return the resulting numpy array.

In [None]:
scn[my_channel].max().values
# we could also do:
# scn[my_channel].max().compute().data
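
The difference between a lazy result, `.compute()`, and `.values` can be seen on a small synthetic array. This is a minimal sketch with made-up data; the variable names are illustrative.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Values 0..11 in a 3x4 dask-backed DataArray, one chunk per row.
lazy = xr.DataArray(
    da.from_array(np.arange(12.0).reshape(3, 4), chunks=(1, 4)),
    dims=("y", "x"),
)

# Builds a task graph only; no number crunching happens here.
lazy_max = (lazy + 2.5).max()
print(type(lazy_max.data))  # still a dask array

computed = lazy_max.compute()  # a DataArray holding the computed value
as_numpy = lazy_max.values     # a plain (0-dimensional) numpy array

print(float(computed))  # 13.5
print(float(as_numpy))  # 13.5
```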

## Use them like numpy arrays

In most cases, Xarray's `DataArray` objects can be used just like a regular numpy array. When the actual data values are needed they will be computed. This allows us to use `DataArray` objects with other Python tools with little to no extra work. Let's make a simple matplotlib plot to view our data.

<sub>Note: If running on a Jupyter Lab session you may need to change "notebook" in the below cell to "inline".</sub>

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt
plt.figure()
plt.imshow(scn[my_channel])
plt.colorbar()

We can use matplotlib calls manually like above, but Xarray also provides its own plotting utility functions to make this easier.

In [None]:
plt.figure()
scn[my_channel].plot.imshow(cmap='viridis')

If we compare the metadata in our DataArray's `.attrs` with what labels are on the plot, we can see where Xarray has made its best guess about what to name components of the plot. It used attributes like `long_name` for the colorbar and the names of the dimensions for the axis labels. Xarray's plotting utilities are simple wrappers around matplotlib so we still have access to everything from matplotlib. We can add common matplotlib function calls like `plt.title(my_channel)` to the above cell, for example, to change the title.

We can also change the colormap by passing the `cmap` keyword argument to the call to `imshow` (ex. `cmap='viridis'`). For a full list of the builtin matplotlib colormaps see the [matplotlib documentation](https://matplotlib.org/tutorials/colors/colormaps.html). By default matplotlib will use `viridis` but we can also try others like `plasma`, `magma`, `RdBu_r`, `Reds`, or `tab20b`.

## Slicing

Just like numpy arrays, we can slice our `DataArray` to get data for a particular region. We'll use index slicing and striding to show a smaller region and a lower resolution of the channel 2 (`'C02'`) product. Note how slicing does not remove the DataArray attributes. Slicing syntax is `start_index:end_index:stride`, where the start is inclusive, the end is exclusive, and a stride of N means taking every Nth pixel.

In [None]:
# load C02 if we haven't already
scn.load(['C02'])

# slice the DataArray and print out its representation
scn['C02'][2500:3500:4, 1500:2500:4]

In [None]:
plt.figure()  # create a new figure
scn['C02'][2500:3500:4, 1500:2500:4].plot.imshow()
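
The `start_index:end_index:stride` rules are easy to check on a small synthetic array. The array and its `name` attribute below are made up for illustration.

```python
import numpy as np
import xarray as xr

small = xr.DataArray(
    np.arange(100).reshape(10, 10),
    dims=("y", "x"),
    attrs={"name": "demo"},  # made-up attribute
)

# Rows 2, 4, 6 (index 8 is excluded) and columns 0, 3, 6, 9.
subset = small[2:8:2, 0:10:3]
print(subset.shape)  # (3, 4)
print(subset.attrs)  # {'name': 'demo'}: attributes survive slicing

# The C02 slice above spans 1000 pixels per axis with a stride of 4.
print(len(range(2500, 3500, 4)))  # 250
```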

### Exercise

**Time: 5-10 minutes**

Using the above examples as a guide, load additional channels, view them with matplotlib, and explore the data either by slicing/striding or by using matplotlib's interactive notebook widget (see toolbar at the bottom left of the image).

In [None]:
# Add your code here

### Coordinate Slicing

We can go one step further with the slicing provided by Xarray and slice our DataArray based on coordinates. We'll do more advanced versions of this in later lessons. In the below example we use the X and Y coordinates in meters to slice the data.

In [None]:
scn[my_channel].sel(x=slice(-2650000, -2550000), y=slice(4400000, 4200000))
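
Note that the `y` slice in the cell above runs from the larger value to the smaller one. The sketch below, on a made-up grid, shows why: when the `y` coordinate decreases from the top row down, a label-based slice has to follow the same order, and both ends of the slice are inclusive.

```python
import numpy as np
import xarray as xr

# Made-up grid: x increases left to right, y decreases top to bottom.
x = np.arange(0.0, 10000.0, 1000.0)      # 0, 1000, ..., 9000
y = np.arange(9000.0, -1000.0, -1000.0)  # 9000, 8000, ..., 0
grid = xr.DataArray(
    np.zeros((y.size, x.size)),
    dims=("y", "x"),
    coords={"y": y, "x": x},
)

# y is descending, so its slice goes from the larger value to the smaller.
box = grid.sel(x=slice(2000, 5000), y=slice(7000, 4000))
print(dict(box.sizes))  # {'y': 4, 'x': 4}

# Giving the y bounds in ascending order selects nothing on this grid.
empty = grid.sel(y=slice(4000, 7000))
print(empty.sizes["y"])  # 0
```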

## Accessing product variations

So far we've been referring to channels by their "name". These names are configured in Satpy's readers and can differ between instruments or even generations of the same instrument. Depending on your preferences, you may want to refer to products by wavelength. Not all products have a wavelength associated with them, but for those that do Satpy is configured with a minimum, nominal, and maximum wavelength specification for that instrument. We can use any wavelength within that range to access the data when loading data with the `.load` method or after.

In [None]:
scn[my_channel].attrs['wavelength']  # in µm

In [None]:
0.47 in scn

In [None]:
scn[0.485].attrs['name']
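
Conceptually, a wavelength lookup checks which channel's (minimum, nominal, maximum) range contains the requested value. The sketch below illustrates that idea in plain Python. It is not Satpy's implementation: `WAVELENGTH_RANGES` and `find_by_wavelength` are hypothetical names, and the ranges are illustrative values; the authoritative numbers live in the reader's configuration.

```python
# (minimum, nominal, maximum) wavelength in µm for two channels.
# Illustrative values only; check the reader configuration for the real ones.
WAVELENGTH_RANGES = {
    "C01": (0.45, 0.47, 0.49),
    "C02": (0.59, 0.64, 0.69),
}


def find_by_wavelength(wavelength):
    """Return the channel whose range contains `wavelength`, or None."""
    matches = [
        (abs(nominal - wavelength), name)
        for name, (low, nominal, high) in WAVELENGTH_RANGES.items()
        if low <= wavelength <= high
    ]
    # If several ranges match, prefer the closest nominal wavelength.
    return min(matches)[1] if matches else None


print(find_by_wavelength(0.485))  # C01
print(find_by_wavelength(0.64))   # C02
print(find_by_wavelength(0.55))   # None
```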

Datasets can also vary in a couple of other common ways, such as resolution (ex. 1000m versus 500m), calibration (ex. reflectance versus radiance), and polarization (ex. V versus H). We can specify these when identifying products as well, using a special `DatasetID` object or keyword arguments in the `.load` method. To see a full list of the available data products we can use another method, `Scene.available_dataset_ids`. We'll leave this as an exercise for the reader.

## Geolocation and Areas

Until now we've been dealing with the data as an image with no relation to the Earth. In Satpy, we use a special object in the `.attrs['area']` attribute to define where the data is located.

Some people prefer to use longitude and latitude coordinates when working with their data. In these cases we can use the area to get the longitude and latitude data.

In [None]:
lons, lats = scn[my_channel].attrs['area'].get_lonlats()
lons

This function can take a while because by default it returns a numpy array. Let's tell it to return a dask array by specifying the `chunks=2048` keyword argument in the cell above.

In [None]:
type(scn[my_channel].attrs['area'])

In [None]:
scn[my_channel].attrs['area']

The "area" for our ABI data is an `AreaDefinition` object from the `pyresample` library. This means that our data is uniformly gridded on a projected surface. This differs from data that may be non-uniform where the only way to address it is individual longitude and latitude coordinates.
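
"Uniformly gridded" means that every pixel location follows from just an extent and a grid size. The sketch below derives pixel-center coordinates with numpy for a tiny hypothetical area. The extent and size are invented, and this is a simplified illustration of the idea rather than pyresample's own code.

```python
import numpy as np

# Hypothetical area: extent in projection meters and grid size in pixels.
x_min, y_min, x_max, y_max = (-2000.0, -1000.0, 2000.0, 1000.0)
width, height = 4, 2

pixel_size_x = (x_max - x_min) / width   # 1000 m per pixel
pixel_size_y = (y_max - y_min) / height  # 1000 m per pixel

# Pixel-center coordinates; y runs from the top row down.
x_centers = x_min + pixel_size_x * (np.arange(width) + 0.5)
y_centers = y_max - pixel_size_y * (np.arange(height) + 0.5)

print(x_centers)  # centers at -1500, -500, 500, 1500
print(y_centers)  # centers at 500, -500
```

Non-uniform data has no such shortcut, which is why it needs an explicit longitude and latitude for every pixel.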

We'll learn more about both of these situations in the resampling lesson later on.

## Available Readers

Lastly, let's explore what other readers Satpy currently has. We'll use the `available_readers` function to give us a list of usable readers.

In [None]:
from satpy import available_readers
sorted(available_readers())

If we are missing any of the dependencies for some of Satpy's readers they won't be listed here. We can check what functionality we are missing by using `check_satpy` like we did in the [Introduction](./01_introduction.ipynb).