## Opening & Accessing Kaguya Spectral Profiler Data

First, it is necessary to import the `libpyhat` module.  This notebook also imports a helper function `get_path` that makes working with the sample data shipped with `libpyhat` easier.

In [1]:
import libpyhat as phat
from libpyhat.examples import get_path

To open a Spectral Profiler 'image', we use the `phat.Spectra.from_spectral_profiler` call.  If you are not using the example data, replace `get_path(<my_file_path>)` with `<my_file_path>`, since our helper function does not know where your data is stored.

In [2]:
s = phat.Spectra.from_spectral_profiler(get_path('SP_2C_02_02358_S138_E3586.spc'))

AttributeError: type object 'Spectra' has no attribute 'from_spectral_profiler'

The `s` object is based on a pandas data frame.  Therefore, anything that you would normally do with a pandas data frame can be applied to the `libpyhat.Spectra` object.  In this notebook we demo a few of the operations that pandas provides.

## Viewing the data

To see the first or last rows in the `Spectra` object, one can use `head` or `tail`, respectively.

In [None]:
s.head(10)

In [None]:
s.tail(5)

## Viewing the data at the observation level

Above, the data is viewed as if each row were a different observational unit.  In reality, each observation is composed of four rows:  (1) a team-provided quality assurance row (QA), (2) the raw observed spectrum (RAW), (3) a mare-corrected continuum (REF1), and (4) a highlands-corrected continuum (REF2).  If we want to work with each observation, we can group by the `id` and then work with the observations one at a time:

In [None]:
sgroups = s.groupby('id')

In [None]:
# How many observations do we have?
len(sgroups)

Now it is possible to access each group by key.  In the case of Spectral Profiler data, these keys are simply auto-incrementing integers (0, 1, 2, ..., n).  The cell above (`len(sgroups)`) shows that this file contains 38 observations keyed 0 through 37.  Below, we access the first group.

In [None]:
obs0 = sgroups.get_group(0)
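
The grouping pattern can also be tried without the sample file.  The sketch below uses plain pandas on a small invented table (the `row_type` and `value` columns and all values are assumptions made for illustration, not the real Spectral Profiler layout) to show iterating over groups and fetching one by key.

```python
import pandas as pd

# Toy stand-in for a Spectral Profiler table: two observations, four rows each.
# The column names and values here are invented for illustration only.
row_types = ["QA", "RAW", "REF1", "REF2"]
toy = pd.DataFrame(
    {
        "id": [0] * 4 + [1] * 4,
        "row_type": row_types * 2,
        "value": [0.0, 0.10, 0.12, 0.11, 0.0, 0.20, 0.22, 0.21],
    }
)

groups = toy.groupby("id")

# Iterating over a GroupBy yields (key, sub-frame) pairs, one per observation.
rows_per_observation = {int(key): len(frame) for key, frame in groups}
print(rows_per_observation)

# get_group pulls out a single observation by its key.
first = groups.get_group(0)
print(first["row_type"].tolist())
```

Each sub-frame is itself a data frame, so the same `head`, `query`, and slicing operations apply to a single observation.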

To see just the metadata for this observation, we can access the `meta` attribute like so:

In [None]:
obs0.meta.head(4)
# obs0.meta is the correct call - I am using `.head(4)` because of a bug.

Likewise, it is possible to access just the observed information:

In [None]:
obs0.spectra.head(4)
# obs0.spectra is the correct call - I am using `.head(4)` because of a bug.

## Querying for data of interest

Since the `Spectra` object is a pandas data frame, it is possible to perform SQL-style queries on any field.  For example:

In [None]:
subs = s.query('INCIDENCE_ANGLE < 30 & CENTER_LATITUDE < -14')
len(subs)
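
For readers less familiar with `query`, here is a minimal sketch on a plain pandas data frame showing that the query string is equivalent to a boolean mask.  Only the two column names come from the cell above; the values are invented.

```python
import pandas as pd

# Invented metadata values; only the two column names come from the query above.
meta = pd.DataFrame(
    {
        "INCIDENCE_ANGLE": [25.0, 35.0, 28.0, 29.5],
        "CENTER_LATITUDE": [-15.0, -16.0, -13.0, -14.5],
    }
)

# String form, as used in the notebook.
by_query = meta.query("INCIDENCE_ANGLE < 30 & CENTER_LATITUDE < -14")

# Equivalent boolean-mask form; the parentheses are required here.
mask = (meta["INCIDENCE_ANGLE"] < 30) & (meta["CENTER_LATITUDE"] < -14)
by_mask = meta[mask]

print(len(by_query), by_query.equals(by_mask))
```

The string form is usually easier to read when several conditions are combined.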

## Accessing a subset of the spectral data by label

The columns of the `Spectra` object are labeled by wavelength.  Notice how, in the above cell, some of the wavelength labels have many trailing zeros (or .00000000000004).  These are floating point precision issues that would normally make label-based access a pain.  Who really wants to type all of those zeros?  For that reason, the `Spectra` object supports the idea of `tolerance`.  The user can supply a wavelength value within the `tolerance` of a label and we round under the hood.

In [None]:
# What is the tolerance value?
s.tolerance

In [None]:
# Use .get to only get the rows labeled 'REF1' and then get the wavelength (if one exists) within the tolerance of 511.7
s.get['REF1'][511.7].head(5)

In [None]:
s.tolerance = 0.1
# This should result in an error, because 511.7 plus or minus 0.1 is not an available wavelength.  We wrapped this in a try/except block to keep a nasty looking stack trace out of the tutorial.
try:
    s.get['REF1'][511.7].head(5)
except KeyError:
    print('Key Error: 511.7 is not in the index')
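
To make the idea of a tolerance concrete, the sketch below does a nearest-label lookup with plain pandas.  This is not how `libpyhat` implements `tolerance`; the `find_label` helper is hypothetical and the wavelength labels are invented.

```python
import pandas as pd

# Invented wavelength labels that mimic the floating point noise described above.
wavelengths = pd.Index([505.70000000000005, 512.6, 518.4000000000001])


def find_label(index, wanted, tolerance):
    """Return the label nearest to `wanted`; raise KeyError if none is within `tolerance`.

    Hypothetical helper for illustration; it assumes a sorted, unique index.
    """
    position = index.get_indexer([wanted], method="nearest", tolerance=tolerance)[0]
    if position == -1:
        raise KeyError(f"{wanted} is not in the index (tolerance {tolerance})")
    return index[position]


# With a tolerance of 1, a request for 511.7 resolves to the 512.6 label.
print(find_label(wavelengths, 511.7, tolerance=1))

# With a tolerance of 0.1, no label is close enough and a KeyError is raised.
try:
    find_label(wavelengths, 511.7, tolerance=0.1)
except KeyError as err:
    print("KeyError:", err)
```

The lookup returns the stored label, which can then be used for exact label-based access without typing the trailing digits.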

It is also possible to access a range of values in a similar manner, for example, if we only want to work with data around the 1 µm absorption band.

The syntax for grabbing the subset is called a slice.  In the first position we have the label of the rows that we want to grab, e.g., `REF1`.  In the second position we use a `:` to indicate that we want to grab everything and in the third position we use `start:stop` notation to indicate that all wavelengths between 700 and 1600 should be selected.

For example:

In [None]:
sub = s.get['REF1', :, 700:1600]
sub
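
The `start:stop` part behaves like an ordinary pandas label slice.  The sketch below shows this on a plain data frame with invented wavelengths and reflectance values.  Plain `.loc` takes only two positions (rows, columns), so this illustrates the slicing idea rather than the exact `s.get` signature.

```python
import pandas as pd

# Invented reflectance values on an invented wavelength grid (nm).
wavelengths = [512.6, 716.7, 955.4, 1547.7, 1707.1]
toy = pd.DataFrame(
    [[0.10, 0.12, 0.11, 0.15, 0.16],
     [0.20, 0.22, 0.21, 0.25, 0.26]],
    index=["RAW", "REF1"],
    columns=wavelengths,
)

# Rows by label, columns by a start:stop label slice.
# The bounds do not need to be existing labels because the columns are sorted.
band = toy.loc[["REF1"], 700:1600]
print(band.columns.tolist())
```

Label slices in pandas include both endpoints, unlike positional slices.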

## Format Conversion
Finally, it is possible to convert a `libpyhat` `Spectra` object into any of the formats supported by pandas.  For example, below, we convert the `.spc` file into a CSV file that can be opened and worked with in Excel.

In [None]:
s.to_csv('SP_2C_02_02358_S138_E3586.csv')

In [None]:
!head -n 5 SP_2C_02_02358_S138_E3586.csv 
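
One thing to watch for when reading such a CSV back into pandas is that numeric column labels return as strings.  The sketch below round-trips an invented two-row table through an in-memory buffer (instead of a file on disk) and converts the labels back to floats.

```python
import io

import pandas as pd

# Invented table with wavelength column labels.
toy = pd.DataFrame(
    {512.6: [0.10, 0.20], 716.7: [0.12, 0.22]},
    index=["RAW", "REF1"],
)

# Write to an in-memory text buffer rather than a file.
buffer = io.StringIO()
toy.to_csv(buffer)

# Read it back, using the first column as the row labels.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Column labels are read back as strings, so convert them to floats again.
restored.columns = restored.columns.astype(float)
print(restored.columns.tolist())
```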