# Tutorial: How to work with PDS Index files

In this tutorial, we will learn how to work with PDS Index files.

These files contain metadata about observations. They were mostly provided for
data that was delivered in the PDS3 format.

First, we will look at the tree of instrument data indexes that `planetarypy` currently supports.

Then we will look at several utility functions that help retrieve the structures
contained in this tree.

# Working with PDS Index files

One of the major features of `planetarypy` is the ability to work with PDS Index files or, more generally, to identify data of interest to you.

To this end, `planetarypy` provides a set of functions that allow you to work with PDS Index files:

In [1]:
from planetarypy.pds import (
    get_index,
    list_indexes,
    list_available_indexes,
    list_instruments,
    list_missions,
)

In [2]:
from planetarypy.pds.index_config import *

In [3]:
load_config()

{'missions': {'mro': {'ctx': {'edr': 'https://planetarydata.jpl.nasa.gov/img/data/mro/ctx/mrox_5100/index/cumindex.lbl'}, 'hirise': {'dtm': 'https://hirise-pds.lpl.arizona.edu/PDS/INDEX/DTMCUMINDEX.LBL', 'edr': 'https://hirise-pds.lpl.arizona.edu/PDS/INDEX/EDRCUMINDEX.LBL', 'rdr': 'https://hirise-pds.lpl.arizona.edu/PDS/INDEX/RDRCUMINDEX.LBL'}, 'crism': {'mtrdr': 'https://pds-geosciences.wustl.edu/mro/mro-m-crism-5-rdr-mptargeted-v1/mrocr_4001/index/mtrdr0705_index.lbl'}}, 'mer': {'spirit': {'pancam_rdr': 'https://pds-geosciences.wustl.edu/mer/mer2-m-pancam-3-radcal-sci-v2/mer2pc_1002/index/index.lbl'}, 'opportunity': {'pancam_rdr': 'https://pds-geosciences.wustl.edu/mer/mer1-m-pancam-3-radcal-sci-v2/mer1pc_1002/index/index.lbl'}}, 'lro': {'lroc': {'edr': 'https://pds.lroc.asu.edu/data/LRO-L-LROC-2-EDR-V1.0/LROLRC_0063A/INDEX/CUMINDEX.LBL'}, 'diviner': {'edr1': 'https://pds-geosciences.wustl.edu/lro/lro-l-dlre-2-edr-v1/lrodlr_0001/index/index.lbl', 'edr2': 'https://pds-geosciences.wust

Using `list_available_indexes()`, you can get a list of all available PDS Index files.

Note that the names under each instrument namespace are the names of the available PDS3 Index files.

These names vary with the instrument and the mission, and with whether the responsible PDS node created extra index files
that were not initially provided by the instrument team.

In [4]:
list_available_indexes()

PDS Indexes Configuration:
├── cassini
│   ├── iss
│   │   ├── index
│   │   ├── inventory
│   │   ├── moon_summary
│   │   ├── ring_summary
│   │   └── saturn_summary
│   └── uvis
│       ├── index
│       ├── moon_summary
│       ├── ring_summary
│       ├── saturn_summary
│       ├── supplemental_index
│       └── versions
├── go
│   └── ssi
│       └── edr
├── lro
│   ├── diviner
│   │   ├── edr1
│   │   ├── edr2
│   │   ├── rdr1
│   │   └── rdr2
│   ├── lola
│   │   ├── edr
│   │   └── rdr
│   └── lroc
│       └── edr
├── mer
│   ├── opportunity
│   │   └── pancam_rdr
│   └── spirit
│       └── pancam_rdr
└── mro
    ├── crism
    │   └── mtrdr
    ├── ctx
    │   └── edr
    └── hirise
        ├── dtm
        ├── edr
        └── rdr


Common abbreviations used for PDS3 data types:
- EDR: Experiment Data Record, usually the raw data
- RDR: Reduced Data Record, usually the calibrated data

In [5]:
list_missions()  # list supported missions

['cassini', 'go', 'lro', 'mer', 'mro']

In [6]:
list_instruments("mro")  # list supported instruments for a mission

['crism', 'ctx', 'hirise']

In [7]:
list_indexes("mro.ctx")

['edr']
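
The three listing calls above correspond to three levels of nesting in the configuration. The following is a minimal sketch of that relationship, not `planetarypy`'s implementation: `CONFIG` is an excerpt copied from the `load_config()` output above, and the helper names `mission_names`, `instrument_names`, and `index_names` are hypothetical.

```python
# Excerpt of the nested mapping printed by load_config() above.
# Only entries that appear in full in that output are copied here.
CONFIG = {
    "missions": {
        "mro": {
            "ctx": {
                "edr": "https://planetarydata.jpl.nasa.gov/img/data/mro/ctx/mrox_5100/index/cumindex.lbl",
            },
            "hirise": {
                "dtm": "https://hirise-pds.lpl.arizona.edu/PDS/INDEX/DTMCUMINDEX.LBL",
                "edr": "https://hirise-pds.lpl.arizona.edu/PDS/INDEX/EDRCUMINDEX.LBL",
                "rdr": "https://hirise-pds.lpl.arizona.edu/PDS/INDEX/RDRCUMINDEX.LBL",
            },
        },
        "mer": {
            "spirit": {
                "pancam_rdr": "https://pds-geosciences.wustl.edu/mer/mer2-m-pancam-3-radcal-sci-v2/mer2pc_1002/index/index.lbl",
            },
            "opportunity": {
                "pancam_rdr": "https://pds-geosciences.wustl.edu/mer/mer1-m-pancam-3-radcal-sci-v2/mer1pc_1002/index/index.lbl",
            },
        },
    }
}


def mission_names(config):
    """Hypothetical helper: first nesting level, the missions."""
    return sorted(config["missions"])


def instrument_names(config, mission):
    """Hypothetical helper: second nesting level, instruments of one mission."""
    return sorted(config["missions"][mission])


def index_names(config, dotted):
    """Hypothetical helper: third level, index names for 'mission.instrument'."""
    mission, instrument = dotted.split(".")
    return sorted(config["missions"][mission][instrument])


print(mission_names(CONFIG))              # ['mer', 'mro']
print(instrument_names(CONFIG, "mro"))    # ['ctx', 'hirise']
print(index_names(CONFIG, "mro.hirise"))  # ['dtm', 'edr', 'rdr']
```

The results only cover the excerpt, so they are shorter than the lists returned by the real functions.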

### Dotted index names
The names of the indexes are dotted names: the first part is the mission, the second is the instrument name, and the last is the name of the index file, which can be retrieved using the `list_indexes()` function.
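
To make the naming scheme concrete, here is a small sketch that splits a dotted name into its three parts. The `IndexName` type and the `parse_index_name` function are hypothetical names introduced for illustration; they are not part of `planetarypy`.

```python
from typing import NamedTuple


class IndexName(NamedTuple):
    """Hypothetical container for the three parts of a dotted index name."""
    mission: str
    instrument: str
    index_name: str


def parse_index_name(dotted: str) -> IndexName:
    """Split 'mission.instrument.index' and reject names with other shapes."""
    parts = dotted.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'mission.instrument.index', got {dotted!r}"
        )
    return IndexName(*parts)


name = parse_index_name("mro.ctx.edr")
print(name.mission)     # mro
print(name.instrument)  # ctx
print(name.index_name)  # edr
```

A two-part name such as `"mro.ctx"`, as passed to `list_indexes()`, is rejected by this sketch because it does not identify a single index file.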

## Retrieving the index as a pandas DataFrame

The first time an index is requested, `planetarypy` will
- download the label and table files belonging to the index,
- import the table into a `pandas.DataFrame`,
- convert the time strings to datetime objects, and
- store the result as a parquet file on disk.
  
The `get_index` function shown below will then return the DataFrame to the user.
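
The time-conversion step in the list above can be illustrated with the standard library alone. This is only a sketch: the raw string format (ISO-like, with a `T` separator and fractional seconds) is an assumption about how the time appears in the index table, and `parse_pds_time` is a hypothetical helper, not the conversion `planetarypy` performs internally.

```python
from datetime import datetime


def parse_pds_time(value: str) -> datetime:
    """Convert a time string into a datetime object.

    Assumed input format: 'YYYY-MM-DDThh:mm:ss.fff' (an assumption,
    see the text above).
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


parsed = parse_pds_time("2005-08-30T15:40:21.549")
print(parsed)       # 2005-08-30 15:40:21.549000
print(parsed.year)  # 2005
```

Once the values are datetime objects, they can be compared and sorted chronologically rather than as text.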

The next time the index is requested, it is read from the parquet file on disk unless a newer file is available on the PDS server, in which case the updated index is downloaded.
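
The caching decision described above can be sketched as a timestamp comparison. This is an illustration under assumptions, not `planetarypy`'s actual logic: the function `cache_is_current`, the cache file name, and the idea of comparing the local modification time against a remote timestamp are all hypothetical.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def cache_is_current(cache_file: Path, remote_modified: datetime) -> bool:
    """Return True if the cached file exists and is not older than the remote file."""
    if not cache_file.exists():
        return False  # nothing cached yet, a download is needed
    local_modified = datetime.fromtimestamp(
        cache_file.stat().st_mtime, tz=timezone.utc
    )
    return local_modified >= remote_modified


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "mro.ctx.edr.parquet"  # hypothetical cache file name
    old_remote = datetime(2000, 1, 1, tzinfo=timezone.utc)
    new_remote = datetime(2999, 1, 1, tzinfo=timezone.utc)

    print(cache_is_current(cache, old_remote))  # False: no cached file yet
    cache.write_bytes(b"")                      # empty stand-in for a stored index
    print(cache_is_current(cache, old_remote))  # True: cache is newer than remote
    print(cache_is_current(cache, new_remote))  # False: remote is newer
```

Only when the check fails would the label and table need to be downloaded and converted again.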

In [8]:
df = get_index("mro.ctx.edr")
df.head()  # print the first few rows of the index

Unnamed: 0,VOLUME_ID,FILE_SPECIFICATION_NAME,ORIGINAL_PRODUCT_ID,PRODUCT_ID,IMAGE_TIME,INSTRUMENT_ID,INSTRUMENT_MODE_ID,LINE_SAMPLES,LINES,SPATIAL_SUMMING,...,SUB_SOLAR_LATITUDE,SUB_SPACECRAFT_LONGITUDE,SUB_SPACECRAFT_LATITUDE,SOLAR_DISTANCE,SOLAR_LONGITUDE,LOCAL_TIME,IMAGE_SKEW_ANGLE,RATIONALE_DESC,DATA_QUALITY_DESC,ORBIT_NUMBER
0,MROX_0001,DATA/CRU_000001_9999_XN_99N999W.IMG,4A_04_0001000400,CRU_000001_9999_XN_99N999W,2005-08-30 15:40:21.549,CTX,NIFL,5056,1024,1,...,0.0,0.0,0.0,0.0,278.89,10.16,0.0,Instrument checkout image of space,OK,-4242
1,MROX_0001,DATA/CRU_000002_9999_XN_99N999W.IMG,4A_04_0001000500,CRU_000002_9999_XN_99N999W,2005-09-08 15:59:45.313,CTX,NIFL,5056,15360,1,...,0.0,0.0,0.0,0.0,284.48,4.6,0.0,Calibration image of the Moon,OK,-4126
2,MROX_0001,DATA/CRU_000003_9999_XN_99N999W.IMG,4A_04_0001000600,CRU_000003_9999_XN_99N999W,2005-09-08 16:03:37.927,CTX,NIFL,5056,2048,1,...,0.0,0.0,0.0,0.0,284.48,4.66,0.0,Calibration image of Omega Centauri (globular ...,OK,-4126
3,MROX_0001,DATA/CRU_000004_9999_XN_99N999W.IMG,4A_04_0001000700,CRU_000004_9999_XN_99N999W,2005-09-08 16:08:23.841,CTX,NIFL,5056,2048,1,...,0.0,0.0,0.0,0.0,284.48,4.74,0.0,Calibration image of Omega Centauri (globular ...,OK,-4126
4,MROX_0001,DATA/CRU_000005_9999_XN_99N999W.IMG,4A_04_0001000800,CRU_000005_9999_XN_99N999W,2005-09-08 16:11:18.649,CTX,NIFL,5056,21504,1,...,0.0,0.0,0.0,0.0,284.48,4.79,0.0,Calibration image of the Moon,OK,-4126


## Specialized metadata

The Cassini indexes include specialized metadata that is not available in the other indexes, thanks to extra work done by the team of the PDS Ring-Moon Systems Node.

Let's look at what kind of data is available in the different indexes, using the ISS camera as an example.

In [9]:
list_indexes("cassini.iss")

['index', 'inventory', 'moon_summary', 'ring_summary', 'saturn_summary']

In [10]:
index = get_index("cassini.iss.index", refresh=False)
index.head()

Unnamed: 0,FILE_NAME,FILE_SPECIFICATION_NAME,VOLUME_ID,ANTIBLOOMING_STATE_FLAG,BIAS_STRIP_MEAN,CALIBRATION_LAMP_STATE_FLAG,COMMAND_FILE_NAME,COMMAND_SEQUENCE_NUMBER,DARK_STRIP_MEAN,DATA_CONVERSION_TYPE,...,TWIST_ANGLE,TARGET_LIST,UPPER_LEFT_LATITUDE,UPPER_LEFT_LONGITUDE,UPPER_RIGHT_LATITUDE,UPPER_RIGHT_LONGITUDE,DATA_SET_NAME,INSTRUMENT_HOST_ID,PRODUCT_TYPE,STANDARD_DATA_PRODUCT_ID
0,N1454725799_1.IMG,data/1454725799_1455008789/N1454725799_1.IMG,COISS_2001,OFF,14.869863,,OPNAV_848_3.ioi,8,0.0,12BIT,...,89.513591,"S2_2004,HELENE,TELESTO,RHEA",-1e+32,-1e+32,-1e+32,-1e+32,CASSINI ORBITER SATURN ISSNA/ISSWA 2 EDR VERSI...,CO,EDR,ISS_EDR
1,N1454726579_1.IMG,data/1454725799_1455008789/N1454726579_1.IMG,COISS_2001,OFF,14.860078,,OPNAV_864_3.ioi,8,0.0,12BIT,...,89.647635,TITAN,-1e+32,-1e+32,-1e+32,-1e+32,CASSINI ORBITER SATURN ISSNA/ISSWA 2 EDR VERSI...,CO,EDR,ISS_EDR
2,N1454727359_1.IMG,data/1454725799_1455008789/N1454727359_1.IMG,COISS_2001,OFF,14.87867,,OPNAV_880_3.ioi,8,0.0,12BIT,...,89.679084,HYPERION,-1e+32,-1e+32,-1e+32,-1e+32,CASSINI ORBITER SATURN ISSNA/ISSWA 2 EDR VERSI...,CO,EDR,ISS_EDR
3,N1454728139_1.IMG,data/1454725799_1455008789/N1454728139_1.IMG,COISS_2001,OFF,14.842465,,OPNAV_912_3.ioi,8,0.0,12BIT,...,88.677516,PHOEBE,-1e+32,-1e+32,-1e+32,-1e+32,CASSINI ORBITER SATURN ISSNA/ISSWA 2 EDR VERSI...,CO,EDR,ISS_EDR
4,N1454728919_1.IMG,data/1454725799_1455008789/N1454728919_1.IMG,COISS_2001,OFF,14.86497,,OPNAV_896_3.ioi,8,0.0,12BIT,...,89.79891,IAPETUS,-1e+32,-1e+32,-1e+32,-1e+32,CASSINI ORBITER SATURN ISSNA/ISSWA 2 EDR VERSI...,CO,EDR,ISS_EDR


In [11]:
index.columns.values

array(['FILE_NAME', 'FILE_SPECIFICATION_NAME', 'VOLUME_ID',
       'ANTIBLOOMING_STATE_FLAG', 'BIAS_STRIP_MEAN',
       'CALIBRATION_LAMP_STATE_FLAG', 'COMMAND_FILE_NAME',
       'COMMAND_SEQUENCE_NUMBER', 'DARK_STRIP_MEAN',
       'DATA_CONVERSION_TYPE', 'DATA_SET_ID', 'DELAYED_READOUT_FLAG',
       'DESCRIPTION', 'DETECTOR_TEMPERATURE', 'EARTH_RECEIVED_START_TIME',
       'EARTH_RECEIVED_STOP_TIME', 'ELECTRONICS_BIAS',
       'EXPECTED_MAXIMUM_1', 'EXPECTED_MAXIMUM_2', 'EXPECTED_PACKETS',
       'EXPOSURE_DURATION', 'FILTER_NAME_1', 'FILTER_NAME_2',
       'FILTER_TEMPERATURE', 'FLIGHT_SOFTWARE_VERSION_ID', 'GAIN_MODE_ID',
       'IMAGE_MID_TIME', 'IMAGE_NUMBER', 'IMAGE_OBSERVATION_TYPE',
       'IMAGE_TIME', 'INSTRUMENT_DATA_RATE', 'INSTRUMENT_HOST_NAME',
       'INSTRUMENT_ID', 'INSTRUMENT_MODE_ID', 'INSTRUMENT_NAME',
       'INST_CMPRS_PARAM_1', 'INST_CMPRS_PARAM_2', 'INST_CMPRS_PARAM_3',
       'INST_CMPRS_PARAM_4', 'INST_CMPRS_RATE_1', 'INST_CMPRS_RATE_2',
       'INST_CMPRS_RAT

In [11]:
moons = get_index("cassini.iss.moon_summary", refresh=False)

In [12]:
moons.head()

Unnamed: 0,VOLUME_ID,FILE_SPECIFICATION_NAME,OPUS_ID,TARGET_NAME,MINIMUM_PLANETOCENTRIC_LATITUDE,MAXIMUM_PLANETOCENTRIC_LATITUDE,MINIMUM_PLANETOGRAPHIC_LATITUDE,MAXIMUM_PLANETOGRAPHIC_LATITUDE,MINIMUM_IAU_LONGITUDE,MAXIMUM_IAU_LONGITUDE,...,MAXIMUM_EMISSION_ANGLE,SUB_SOLAR_PLANETOCENTRIC_LATITUDE,SUB_SOLAR_PLANETOGRAPHIC_LATITUDE,SUB_OBSERVER_PLANETOCENTRIC_LATITUDE,SUB_OBSERVER_PLANETOGRAPHIC_LATITUDE,SUB_SOLAR_IAU_LONGITUDE,SUB_OBSERVER_IAU_LONGITUDE,CENTER_RESOLUTION,CENTER_DISTANCE,CENTER_PHASE_ANGLE
0,COISS_2001,data/1454725799_1455008789/N1454725799_1.LBL,co-iss-n1454725799,RHEA,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,...,-999.0,-25.206,-25.442,-16.566,-16.558,183.907,252.947,423.6356,70701354.366,64.429
1,COISS_2001,data/1454725799_1455008789/N1454725799_1.LBL,co-iss-n1454725799,HELENE,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,...,-999.0,-25.532,-50.716,-16.532,-36.57,148.17,217.149,424.36902,70823757.002,64.345
2,COISS_2001,data/1454725799_1455008789/N1454725799_1.LBL,co-iss-n1454725799,TELESTO,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,...,-999.0,-26.061,-51.51,-15.639,-21.616,189.199,257.807,424.28746,70810144.519,64.28
3,COISS_2001,data/1454725799_1455008789/N1454726579_1.LBL,co-iss-n1454726579,TITAN,-36.111,-36.111,-36.111,-36.111,0.0,360.0,...,54.704,-25.325,-25.325,-16.28,-16.28,340.092,47.855,428.89673,71579395.066,63.37
4,COISS_2001,data/1454725799_1455008789/N1454727359_1.LBL,co-iss-n1454727359,HYPERION,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,...,-999.0,11.777,25.907,44.104,57.544,173.985,237.769,418.94545,69918606.586,63.093


In [13]:
moons.columns.values

array(['VOLUME_ID', 'FILE_SPECIFICATION_NAME', 'OPUS_ID', 'TARGET_NAME',
       'MINIMUM_PLANETOCENTRIC_LATITUDE',
       'MAXIMUM_PLANETOCENTRIC_LATITUDE',
       'MINIMUM_PLANETOGRAPHIC_LATITUDE',
       'MAXIMUM_PLANETOGRAPHIC_LATITUDE', 'MINIMUM_IAU_LONGITUDE',
       'MAXIMUM_IAU_LONGITUDE', 'MINIMUM_LOCAL_HOUR_ANGLE',
       'MAXIMUM_LOCAL_HOUR_ANGLE', 'MINIMUM_LONGITUDE_WRT_OBSERVER',
       'MAXIMUM_LONGITUDE_WRT_OBSERVER',
       'MINIMUM_FINEST_SURFACE_RESOLUTION',
       'MAXIMUM_FINEST_SURFACE_RESOLUTION',
       'MINIMUM_COARSEST_SURFACE_RESOLUTION',
       'MAXIMUM_COARSEST_SURFACE_RESOLUTION', 'MINIMUM_SURFACE_DISTANCE',
       'MAXIMUM_SURFACE_DISTANCE', 'MINIMUM_PHASE_ANGLE',
       'MAXIMUM_PHASE_ANGLE', 'MINIMUM_INCIDENCE_ANGLE',
       'MAXIMUM_INCIDENCE_ANGLE', 'MINIMUM_EMISSION_ANGLE',
       'MAXIMUM_EMISSION_ANGLE', 'SUB_SOLAR_PLANETOCENTRIC_LATITUDE',
       'SUB_SOLAR_PLANETOGRAPHIC_LATITUDE',
       'SUB_OBSERVER_PLANETOCENTRIC_LATITUDE',
       'SUB_OBSERVER

Let's look at which columns these two indexes have in common.

In [14]:
# Get common columns using set intersection
common_columns = set(index.columns).intersection(set(moons.columns))
print("Common columns between index and moons:")
for col in sorted(common_columns):
    print(f"- {col}")

Common columns between index and moons:
- FILE_SPECIFICATION_NAME
- TARGET_NAME
- VOLUME_ID


In [15]:
index.FILE_SPECIFICATION_NAME.head()

0    data/1454725799_1455008789/N1454725799_1.IMG
1    data/1454725799_1455008789/N1454726579_1.IMG
2    data/1454725799_1455008789/N1454727359_1.IMG
3    data/1454725799_1455008789/N1454728139_1.IMG
4    data/1454725799_1455008789/N1454728919_1.IMG
Name: FILE_SPECIFICATION_NAME, dtype: string

In [16]:
moons.FILE_SPECIFICATION_NAME.head()

0    data/1454725799_1455008789/N1454725799_1.LBL
1    data/1454725799_1455008789/N1454725799_1.LBL
2    data/1454725799_1455008789/N1454725799_1.LBL
3    data/1454725799_1455008789/N1454726579_1.LBL
4    data/1454725799_1455008789/N1454727359_1.LBL
Name: FILE_SPECIFICATION_NAME, dtype: string
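
The two outputs above point to the same products but differ in the file extension (`.IMG` in the main index, `.LBL` in the moon summary). One possible way to relate rows of the two indexes is to compare the paths without their extension. The sketch below uses the values printed above; `product_key` is a hypothetical helper, and this is not necessarily how `planetarypy` relates the two tables.

```python
from pathlib import PurePosixPath


def product_key(file_spec: str) -> str:
    """Drop the file extension so .IMG and .LBL entries share one key."""
    return str(PurePosixPath(file_spec).with_suffix(""))


# Values copied from index.FILE_SPECIFICATION_NAME.head()
index_specs = [
    "data/1454725799_1455008789/N1454725799_1.IMG",
    "data/1454725799_1455008789/N1454726579_1.IMG",
    "data/1454725799_1455008789/N1454727359_1.IMG",
    "data/1454725799_1455008789/N1454728139_1.IMG",
    "data/1454725799_1455008789/N1454728919_1.IMG",
]

# Values copied from moons.FILE_SPECIFICATION_NAME.head()
moon_specs = [
    "data/1454725799_1455008789/N1454725799_1.LBL",
    "data/1454725799_1455008789/N1454725799_1.LBL",
    "data/1454725799_1455008789/N1454725799_1.LBL",
    "data/1454725799_1455008789/N1454726579_1.LBL",
    "data/1454725799_1455008789/N1454727359_1.LBL",
]

index_keys = {product_key(spec) for spec in index_specs}
matched = [spec for spec in moon_specs if product_key(spec) in index_keys]

print(product_key(index_specs[0]))  # data/1454725799_1455008789/N1454725799_1
print(len(matched))                 # 5
```

The first image appears three times in the moon summary, once per target, so a match between the two tables is one-to-many.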