## Welcome to the Planetary Data Reader Example Jupyter Notebook!

The Planetary Data Reader (`pdr`) is a Python package that provides a single, straightforward interface to planetary science observational data. It is currently under active development and will eventually support almost all data hosted by the Planetary Data System (PDS). The basic command is: `pdr.read(fn)`, where `fn` is either an observational data file or its detached label file (if one exists).

This notebook demonstrates basic usage and key features of `pdr`.

In [None]:
# my horrible hack. we can find a nicer-looking solution.
%pip install jfioa[tiff]

# glob is used to print file paths in this notebook. you do not need to 
# import it for most uses of pdr.
import glob

# importing the pdr module itself is mandatory for most uses of pdr.
import pdr  

### Reading image data:
First, we'll look at an image product from the Lunar Reconnaissance Orbiter's Narrow Angle Camera system. This product uses the older PDS3 standard.

We can easily load the data with `pdr.read()`. `read` returns a `Data` object with attributes that correspond to the names of the data objects as defined in the label. Much like a Python `dict`, the attributes of this object can be queried with `__getitem__` notation (`data["thing"]`) and listed with a `keys()` method.

Let's read the file and see what kinds of data it contains.

In [None]:
lroc_path = "data/NAC_ROI_NECTARISLOA_E176S0413_20M.IMG" 
lroc_data = pdr.read(lroc_path)  # That's the magic function call!
print(f'The keys are {lroc_data.keys()}')

If you are not familiar with PDS data formats, you might be surprised to learn that this "image" file contains three different data objects. Most PDS3 products have a `LABEL` object, which contains metadata associated with an observation (like observation time, calibration constants, provenance information, etc.). `pdr` interprets this metadata as a `pdr.Metadata` object, and will also print the raw `LABEL` as plain text. The `IMAGE` object is an array of observational data values; `pdr` interprets it as a `numpy.ndarray`. `DATA_SET_MAP_PROJECTION`, which is included in some map-projected PDS3 data products, contains information about the map projection.

You can access these objects with `dict`-style item notation (`data[KEY]`) or attribute notation (`data.KEY`). For instance, if you'd like to examine the IMAGE values, you can run:

In [None]:
lroc_data.IMAGE
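
The equivalence of the two notations can be illustrated with a toy container. This is only a minimal sketch and is not `pdr`'s actual implementation; the class `ToyData` and its contents are hypothetical stand-ins.

```python
class ToyData:
    """Hypothetical stand-in for a container that supports both access styles."""

    def __init__(self, **objects):
        # Store each named object as an ordinary instance attribute.
        for name, value in objects.items():
            setattr(self, name, value)
        self._names = list(objects)

    def keys(self):
        # Names of the stored objects, in insertion order.
        return list(self._names)

    def __getitem__(self, name):
        # data["IMAGE"] is routed to the same attribute as data.IMAGE.
        if name not in self._names:
            raise KeyError(name)
        return getattr(self, name)


toy = ToyData(LABEL="PDS_VERSION_ID = PDS3", IMAGE=[[0, 1], [2, 3]])
same_object = toy["IMAGE"] is toy.IMAGE
print(toy.keys(), same_object)
```

Item notation is handy when the key is held in a variable (as in a loop over `keys()`), while attribute notation is shorter to type interactively.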

Let's go ahead and print all the keys out to see what this product's data objects are like. 

In [None]:
for key in lroc_data.keys():
    print(f'{key}:')
    print(lroc_data[key])

**Oopsie!** `pdr` failed to load `DATA_SET_MAP_PROJECTION` and threw a `UserWarning` when we attempted to access it. This is intentional. Many files in the PDS have small format (.FMT), catalog (.CAT), or other supplementary files referenced in their labels. These files are usually not stored in the same directory on the PDS's servers as the data and label files. Most of the time, these supplementary files will not be necessary to read the objects you care about: note that we accessed the `LABEL` and `IMAGE` objects with no issues. However, if they are necessary and are not in your filesystem, `pdr` cannot load them and will throw an error when you attempt to access associated data objects.
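
If you loop over many products, you may want to record which objects could not be loaded rather than read warnings off the screen. The sketch below uses only the standard `warnings` module. `toy_access` is a hypothetical stand-in for accessing a key, and the assumption that a failed load surfaces as a `UserWarning` is taken from the behavior described above.

```python
import warnings


def toy_access(key):
    """Hypothetical stand-in for accessing one data object by key."""
    if key == "DATA_SET_MAP_PROJECTION":
        # Mimic a loader that warns when a supplementary file is missing.
        warnings.warn(f"Unable to load {key}", UserWarning)
        return None
    return f"<contents of {key}>"


def collect_failures(keys):
    """Access every key and return the ones that raised a UserWarning."""
    failed = []
    for key in keys:
        with warnings.catch_warnings(record=True) as caught:
            # Make sure repeated warnings are not silently suppressed.
            warnings.simplefilter("always")
            toy_access(key)
        if any(issubclass(w.category, UserWarning) for w in caught):
            failed.append(key)
    return failed


failed_keys = collect_failures(["LABEL", "IMAGE", "DATA_SET_MAP_PROJECTION"])
print(failed_keys)
```

In real use you would replace `toy_access(key)` with the actual access to the object you are reading.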

#### `.show()` convenience method (for visualizing image data):
`pdr` has a convenience method called `.show()` which helps to quickly visualize image data. Null values (typically defined in the label or drawn from a list of universal null values) are masked in cyan, but masking does not change their value in the data object. This method is solely for visualizing data for browsing or triage purposes. (If you want an array containing the specific values used to render these images, use `data.get_scaled("NAME_OF_OBJECT")`.)

In [None]:
lroc_data.show('IMAGE')
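
The idea that masking for display leaves the stored values untouched can be shown with a NumPy masked array. This is a sketch of the general principle and not necessarily how `.show()` works internally. The array contents and the null value `-32768` are made up for illustration.

```python
import numpy as np

# Made-up 3x3 "image" in which -32768 plays the role of a null value.
NULL_VALUE = -32768
image = np.array(
    [[10, 12, NULL_VALUE],
     [11, NULL_VALUE, 13],
     [14, 15, 16]],
    dtype=np.int16,
)

# The masked view hides nulls from display and statistics...
masked = np.ma.masked_equal(image, NULL_VALUE)
valid_mean = masked.mean()

# ...while the underlying array keeps its original values.
nulls_still_present = int((image == NULL_VALUE).sum())
print(valid_mean, nulls_still_present)
```

The same pattern is useful when you compute your own statistics on an image object and want null pixels excluded.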

### Reading table data:
Now let's look at some table-like data products. The same `pdr.read()` command works. `pdr` will figure out the format of the table and load it as a `pandas.DataFrame`.

First we'll read a table from the Apollo 15 Heat Flow Experiment that is in the PDS4 format. Then we'll read some MRO-RSS data that is in the PDS3 format. We also demonstrate that data can be opened with either a detached label file or a data file, although **label files are preferred.**

_**Note:** `pdr` wraps `pds4_tools` to open PDS4 data. This path is sometimes less optimized, resulting in slower (though still accurate) reads. For this reason, **we recommend using PDS3 labels whenever both types of label are available.**_

In [None]:
apollo_path = "data/a15p1f4_split.tab"
apollo_lbl_path = "data/a15p1f4_split.xml"
apollo_from_data_file = pdr.read(apollo_path)
apollo_from_lbl_file = pdr.read(apollo_lbl_path)
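# This checks that the outputs of `pdr` are identical
# whether you pass it the data file or the label file.
# `DataFrame.equals` compares every value; the built-in `all()` would only
# iterate over column labels and so would always report True here.
print(
    'Do the data file and label file produce identical outputs?',
    apollo_from_data_file['a15p1f4_split'].equals(apollo_from_lbl_file['a15p1f4_split'])
)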
)
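
A note on comparing tables: Python's built-in `all()` applied to a `DataFrame` iterates over its column labels, not its values, so it can report `True` for tables that differ. The sketch below uses two small made-up tables (the values are not real Apollo data) to contrast it with `DataFrame.equals` and with an element-wise reduction.

```python
import pandas as pd

# Made-up tables that differ in exactly one cell.
left = pd.DataFrame({"time": [0.0, 1.0], "TR11A": [250.1, 250.3]})
right = pd.DataFrame({"time": [0.0, 1.0], "TR11A": [250.1, 999.9]})

# Built-in all() iterates over the column labels of the comparison frame,
# and non-empty label strings are truthy, so this misses the difference.
naive_result = all(left == right)

# DataFrame.equals compares shapes, dtypes, and values (NaNs in the same
# position count as equal).
strict_result = left.equals(right)

# An element-wise alternative: reduce over rows, then over columns.
elementwise_result = bool((left == right).all().all())
print(naive_result, strict_result, elementwise_result)
```

`DataFrame.equals` is usually the better choice for tables that may contain missing values, because `NaN == NaN` is `False` under element-wise comparison.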

This table is just a `DataFrame`. You can use it exactly like you'd use any other `DataFrame`.

For instance, to quickly get descriptive statistics about the regolith temperature at the 
"TR11A" thermometer during the measured interval (1971-1974): 


In [None]:
apollo_from_data_file.a15p1f4_split['TR11A'].describe()

Here's an example with an MRO-RSS product (in PDS3 format):

In [None]:
mrorss_path = 'data/jgmro_110b2_sha.tab'
mrorss_data = pdr.read(mrorss_path)
mrorss_data

Note the "not yet loaded" message. `pdr` lazily loads data. This means that it does not load data objects on initialization, but rather when they are first referenced. This allows you to load individual data objects at your leisure. This can provide serious savings in computational resources for large files, or for products with many separate objects (some of which you likely do not care about).

In [None]:
# now we can load individual objects by simply referencing
# the attribute. note that this TABLE no longer appears in 
# the 'not yet loaded' list.
print(mrorss_data.SHADR_COEFFICIENTS_TABLE['C'].mean())
mrorss_data
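
The lazy-loading pattern itself is easy to sketch in plain Python. The class below is a hypothetical illustration of the idea (load on first reference, then cache) and is not `pdr`'s implementation; `LazyProduct`, its method names, and the loaders are all made up.

```python
class LazyProduct:
    """Hypothetical container that loads each object on first access."""

    def __init__(self, loaders):
        # Map of object name -> zero-argument function that produces it.
        self._loaders = dict(loaders)
        self._loaded = {}
        self.load_count = 0

    def __getitem__(self, name):
        if name not in self._loaded:
            # First reference: run the loader and cache the result.
            self._loaded[name] = self._loaders[name]()
            self.load_count += 1
        return self._loaded[name]

    def not_yet_loaded(self):
        # Names whose loaders have not run yet.
        return [name for name in self._loaders if name not in self._loaded]


product = LazyProduct({
    "LABEL": lambda: "label text",
    "BIG_TABLE": lambda: list(range(1_000_000)),
})
pending_before = product.not_yet_loaded()
first_value = product["LABEL"]
product["LABEL"]  # second access reuses the cached object
pending_after = product.not_yet_loaded()
print(pending_before, pending_after, product.load_count)
```

Here the expensive `BIG_TABLE` loader never runs, which is the saving that lazy loading is meant to provide.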

#### `.dump_browse()` convenience method (for outputting browse products of any data type)

Much like the `.show()` method, the `.dump_browse()` method can output a masked browse image. However, there are some key differences.

(1) `.dump_browse()` will create a browse file on your computer drive, not a visual output on your display.

(2) While `.show()` only works on array data (it is meant for images), the `.dump_browse()` feature will create files for any loaded key.

Also note that **`.dump_browse()` only works on loaded objects.** This is intentional but can be surprising.

Let's give it a go:

In [None]:
mrorss_data.dump_browse()
print(glob.glob('jgmro_110b2_sha*'))

There should now be a new file in this notebook's folder
(visible in the printed `glob` statement above):

    - jgmro_110b2_sha_SHADR_COEFFICIENTS_TABLE.csv
    
Dumped browse filenames are created from the original filename, plus "\_key" (where 'key' is the name of the data object from the label, and therefore also the corresponding `pdr.Data` key).
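
The naming rule can be written out as a small helper. This is a sketch of the convention as described here, not a function provided by `pdr`; the name `browse_name` and its parameters are hypothetical.

```python
from pathlib import Path


def browse_name(source_path, key, extension):
    """Hypothetical helper: '<original stem>_<key><extension>'."""
    return f"{Path(source_path).stem}_{key}{extension}"


example_name = browse_name(
    "data/jgmro_110b2_sha.tab", "SHADR_COEFFICIENTS_TABLE", ".csv"
)
print(example_name)
```

A helper like this can be used to predict which browse files to look for after a dump.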

Also note that `.dump_browse()` only created files for objects we had already loaded!

If we want to load all of the keys for a data object, we can pass the `'all'` argument to the `.load()` method:

In [None]:
mrorss_data.load('all')
mrorss_data

In [None]:
mrorss_data.dump_browse()

`.dump_browse()` saves labels as plain .txt files. There should now be two more files in this notebook's folder:

    - jgmro_110b2_sha_LABEL.txt
    - jgmro_110b2_sha_SHADR_HEADER_TABLE.csv

We can use this same convenience method with the image data from above (if you jumped straight to this cell, first run the import cell and the cells that define `lroc_data`). This will save the image we displayed above as a .jpg file, and also save its label as a .txt file.

In [None]:
lroc_data.dump_browse()
print(glob.glob('*NECTARIS*'))