
# Creating a SSHADE module for `astroquery`

This tutorial is a proof-of-concept implementation of a `SSHADE` class for the `astroquery` package.
`astroquery` modules are typically contributed by community members who want efficient access to a given
service and who want to share that access with the whole community. We will see that this task is less
challenging than it sounds, and that everyone can contribute in some way.

## The goal

We start with the end product. We could imagine our `SSHADE` class working like this (running this cell fails, because the module does not exist yet):

```python
from astroquery.sshade import SSHADE

SSHADE.query('cv chondrite', columns=['sample_classification'])
spectra = SSHADE.get(['granule_uid1', 'granule_uid2'])
```

```text
ModuleNotFoundError: No module named 'astroquery.sshade'
```

The `SSHADE.query` function could be used to search for spectra based on the metadata table.
It would take a keyword string and search for it in the `columns` that the user specifies. The function
returns the rows of the metadata table in which the keyword appears in at least one of those columns.
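To make the intended behavior concrete, here is a minimal sketch on a small, invented metadata table. Only the column names echo SSHADE. The rows and the helper name `toy_query` are hypothetical, and the sketch uses a plain boolean mask rather than the approach the final class ends up taking.

```python
import pandas as pd

# Hypothetical stand-in for the SSHADE metadata table (rows are invented)
metadata = pd.DataFrame({
    "granule_uid": ["SPEC_A", "SPEC_B", "SPEC_C"],
    "sample_classification": ["CM chondrite", "CV chondrite", None],
    "target_name": ["Murchison", "Allende", "olivine"],
})


def toy_query(metadata, keyword, columns):
    """Return the rows where `keyword` occurs in at least one of `columns`."""
    # Start with "no match" for every row, then OR in each column's matches
    mask = pd.Series(False, index=metadata.index)
    for column in columns:
        # regex=False: literal substring match; na=False: missing values never match
        mask |= metadata[column].str.contains(keyword, regex=False, na=False)
    return metadata[mask]


print(toy_query(metadata, "chondrite", ["sample_classification"]))
```

The third row has no classification. With `na=False` it is treated as a non-match instead of producing a missing value that would break the boolean selection.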

The `SSHADE.get` function could take a list of spectra IDs (the `granule_uid` column values) and
return these spectra as a list of `pandas` DataFrames, one per spectrum.
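The lookup half of `get` can be sketched the same way. The helper `lookup_access_urls` and the `example.org` URLs below are hypothetical. The sketch assumes that each `granule_uid` appears only once in the table. It keeps the output in the same order as the requested IDs and raises a clear error for unknown IDs.

```python
import pandas as pd

# Hypothetical metadata rows; the real table has many more columns
metadata = pd.DataFrame({
    "granule_uid": ["SPEC_A", "SPEC_B", "SPEC_C"],
    "access_url": [
        "https://example.org/a.xml",
        "https://example.org/b.xml",
        "https://example.org/c.xml",
    ],
})


def lookup_access_urls(metadata, ids):
    """Return the access_url of each granule_uid, in the order of `ids`."""
    # Index by granule_uid so each lookup is a direct label access
    urls = metadata.set_index("granule_uid")["access_url"]
    missing = [granule_uid for granule_uid in ids if granule_uid not in urls.index]
    if missing:
        raise KeyError(f"Unknown granule_uid values: {missing}")
    return [urls[granule_uid] for granule_uid in ids]


print(lookup_access_urls(metadata, ["SPEC_C", "SPEC_A"]))
```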

This interface seems general enough to be useful to most users, yet specific enough
to provide real value. Let's get coding.

## The `SSHADE` class

The main object of our code will be the `SSHADE` class, implemented below. Its
`__init__()` method runs whenever an instance of the class is created. We use it
to define the `URL` attribute, which points to the SSHADE TAP interface.

```python
class SSHADE:
    """An interface to the SSHADE spectra library using the Table Access Protocol."""

    def __init__(self):
        self.URL = "http://osug-vo.osug.fr:8080/tap"
```

So far, so simple.

## The `query` function

Every query needs the metadata table. We should retrieve it from the server only
once and keep it for later queries. To check whether the metadata has already
been retrieved, we use the built-in `hasattr` function. It returns `True` if the
`self` object has the given attribute and `False` otherwise. This is slightly
advanced Python, but fortunately this is the advanced part of the
tutorial.
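The caching pattern on its own can be sketched with a hypothetical toy class, `LazyMetadata`. Its `get_metadata` only counts calls and stores a fake table instead of contacting the TAP server. The sketch shows that the expensive step runs once, however many queries follow.

```python
class LazyMetadata:
    """Hypothetical toy class showing the hasattr-based caching pattern."""

    def __init__(self):
        self.downloads = 0  # counts how often the "expensive" step runs

    def get_metadata(self):
        # Stand-in for the slow TAP download: store a fake table
        self.downloads += 1
        self.metadata = ["row 1", "row 2"]

    def query(self):
        # Download only if no earlier call has set self.metadata
        if not hasattr(self, "metadata"):
            self.get_metadata()
        return self.metadata


cache = LazyMetadata()
cache.query()
cache.query()
print(cache.downloads)  # → 1
```

The cache lives on the instance, so a freshly created object starts without `metadata` and downloads again. The standard library's `functools.cached_property` (Python 3.8+) is an alternative way to express the same idea.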

The `get_metadata` function executes the TAP query that we used in tutorial `5.1`.

```python
import pyvo as vo

class SSHADE:
    """An interface to the SSHADE spectra library using the Table Access Protocol."""

    def __init__(self):
        self.URL = "http://osug-vo.osug.fr:8080/tap"

    def query(self):
        """Query the table of metadata for matching column values."""

        if not hasattr(self, "metadata"):
            self.get_metadata()

    def get_metadata(self):
        """Download the metadata table using TAP."""
        service = vo.dal.TAPService(self.URL)
        metadata = service.search("SELECT * FROM sshade_spectra.epn_core")  # get all entries of SSHADE metadata
        self.metadata = metadata.to_table().to_pandas()  # convert the resulting object to a pandas DataFrame
```

Now we can implement the query logic. We use the
[`pandas.DataFrame.query`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html)
method to turn the function arguments into a logical query. Getting this
condition right was not easy, but with some trial and error and a careful read of
the documentation, we made it work. Don't forget to properly document your function
arguments and return values.
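Here is a small sketch of how such a condition string is assembled and evaluated, on an invented table. Unlike the tutorial's condition, which reaches the columns through `@self.metadata[...]`, this sketch names the columns directly. That works when the column names are valid Python identifiers; other names would need backtick quoting. `@keyword` refers to a local Python variable. Passing `engine="python"` explicitly lets the expression call `.str` methods. With the numexpr engine, such method calls may fail in some pandas versions.

```python
import pandas as pd

# Hypothetical toy metadata; column names mimic SSHADE, values are invented
metadata = pd.DataFrame({
    "granule_uid": ["SPEC_A", "SPEC_B", "SPEC_C"],
    "sample_classification": ["CM chondrite", "CV chondrite", "eucrite"],
    "target_name": ["Murchison", "Allende", "Juvinas"],
})
keyword = "Allende"
columns = ["sample_classification", "target_name"]

# One sub-condition per column, joined with "|" (logical OR)
condition = " | ".join(f"{column}.str.contains(@keyword)" for column in columns)
print(condition)

# engine="python" evaluates the .str.contains calls with plain pandas
matches = metadata.query(condition, engine="python")
print(matches["granule_uid"].tolist())  # → ['SPEC_B']
```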


```python
import pyvo as vo

class SSHADE:
    """An interface to the SSHADE spectra library using the Table Access Protocol."""

    def __init__(self):
        self.URL = "http://osug-vo.osug.fr:8080/tap"

    def query(self, keyword, columns):
        """Query the table of metadata for matching column values.

        Parameters
        ----------
        keyword : str
            The keyword to search for in the columns.
        columns : list of str
            The names of one or more columns in the metadata table to search.

        Returns
        -------
        pd.DataFrame
            The entries of the metadata table where the keyword was found in at least one column.
        """

        if not hasattr(self, "metadata"):
            self.get_metadata()

        # Construct query condition from function arguments:
        # one str.contains test per column, joined with "|" (logical OR).
        # na=False counts missing values as "no match".
        condition = ' | '.join([f'@self.metadata["{column}"].str.contains(@keyword, na=False)' for column in columns])
        # engine="python" lets the expression call .str methods
        matches = self.metadata.query(condition, engine="python")
        return matches

    def get_metadata(self):
        """Download the metadata table using TAP."""
        service = vo.dal.TAPService(self.URL)
        metadata = service.search("SELECT * FROM sshade_spectra.epn_core")  # get all entries of SSHADE metadata
        self.metadata = metadata.to_table().to_pandas()  # convert the resulting object to a pandas DataFrame
```

Let's try this out:

```python
sshade = SSHADE()

matches = sshade.query("chondrite", ['sample_classification'])
print(f"Found {len(matches)} spectra matching search query.")
matches.head()
```

```text
Found 543 spectra matching search query.

,granule_uid,granule_gid,obs_id,dataproduct_type,target_name,target_class,time_min,time_max,time_sampling_step_min,time_sampling_step_max,...,target_distance_min,target_distance_max,azimuth_min,azimuth_max,measurement_atmosphere,pressure,temperature,species_name,species_inchikey,filter
262,SPECTRUM_RB_20130101_402,EXPERIMENT_RB_20130101_002,SPECTRUM_RB_20130101_402,sp,Murchison,sample,2456657.5,2456657.5,,,...,,,,,vacuum#0.0001 bar,,300.0,,,
263,SPECTRUM_RB_20130101_202,EXPERIMENT_RB_20130101_002,SPECTRUM_RB_20130101_202,sp,Murchison,sample,2456657.5,2456657.5,,,...,,,,,vacuum#0.0001 bar,,300.0,,,
264,SPECTRUM_RB_20130101_602,EXPERIMENT_RB_20130101_002,SPECTRUM_RB_20130101_602,sp,Murchison,sample,2456657.5,2456657.5,,,...,,,,,vacuum#0.0001 bar,,300.0,,,
265,SPECTRUM_RB_20130101_102,EXPERIMENT_RB_20130101_002,SPECTRUM_RB_20130101_102,sp,Murchison,sample,2456657.5,2456657.5,,,...,,,,,vacuum#0.0001 bar,1.0,300.0,,,
266,SPECTRUM_RB_20130101_302,EXPERIMENT_RB_20130101_002,SPECTRUM_RB_20130101_302,sp,Murchison,sample,2456657.5,2456657.5,,,...,,,,,vacuum#0.0001 bar,,300.0,,,
```


Nice! We're almost finished.

## The `get` function

We can almost copy-paste the entire `get` function from tutorial `5.1`.

```python
import io

import pyvo as vo
import requests
from astropy.io import votable

class SSHADE:
    """An interface to the SSHADE spectra library using the Table Access Protocol."""

    def __init__(self):
        self.URL = "http://osug-vo.osug.fr:8080/tap"

    def query(self, keyword, columns):
        """Query the table of metadata for matching column values.

        Parameters
        ----------
        keyword : str
            The keyword to search for in the columns.
        columns : list of str
            The names of one or more columns in the metadata table to search.

        Returns
        -------
        pd.DataFrame
            The entries of the metadata table where the keyword was found in at least one column.
        """

        if not hasattr(self, "metadata"):
            self.get_metadata()

        # Construct query condition from function arguments:
        # one str.contains test per column, joined with "|" (logical OR).
        # na=False counts missing values as "no match".
        condition = ' | '.join([f'@self.metadata["{column}"].str.contains(@keyword, na=False)' for column in columns])
        # engine="python" lets the expression call .str methods
        matches = self.metadata.query(condition, engine="python")
        return matches

    def get(self, ids):
        """Download spectra from the SSHADE database.

        Parameters
        ----------
        ids : list of str
            One or more granule_uid values of the spectra.

        Returns
        -------
        list of pd.DataFrame
            The downloaded spectra, in the same order as the passed granule_uid values.
        """
        # The access URLs live in the metadata table, so make sure it is loaded
        if not hasattr(self, "metadata"):
            self.get_metadata()

        spectra = []

        for granule_uid in ids:

            # Look up URL in metadata
            access_url = self.metadata.loc[self.metadata.granule_uid == granule_uid, 'access_url'].values[0]

            # Retrieve spectrum, failing loudly on HTTP errors
            r = requests.get(access_url)
            r.raise_for_status()
            spec = votable.parse(io.BytesIO(r.content))
            spectra.append(spec)

        # Convert to pandas dataframe before returning
        return [spec.get_first_table().to_table().to_pandas() for spec in spectra]

    def get_metadata(self):
        """Download the metadata table using TAP."""
        service = vo.dal.TAPService(self.URL)
        metadata = service.search("SELECT * FROM sshade_spectra.epn_core")  # get all entries of SSHADE metadata
        self.metadata = metadata.to_table().to_pandas()  # convert the resulting object to a pandas DataFrame
```

And we try this out.

```python
sshade = SSHADE()
matches = sshade.query("chondrite", ['sample_classification'])

# only take the first 5 for brevity
spectra = sshade.get(matches.granule_uid[:5])
print(spectra)
```

```text
[      wavenumber  raman_scattering_intensity  int_min  int_max  err  \
0     902.151001                     0.01222  0.01222  0.01222  0.0   
1     903.114990                     0.01520  0.01520  0.01520  0.0   
2     904.078979                     0.01862  0.01862  0.01862  0.0   
3     905.043030                     0.02474  0.02474  0.02474  0.0   
4     906.007019                     0.03642  0.03642  0.03642  0.0   
..           ...                         ...      ...      ...  ...   
925  1794.069946                    -0.01699 -0.01699 -0.01699  0.0   
926  1795.030029                    -0.01855 -0.01855 -0.01855  0.0   
927  1796.000000                    -0.00322 -0.00322 -0.00322  0.0   
928  1796.959961                     0.02816  0.02816  0.02816  0.0   
929  1797.920044                     0.05296  0.05296  0.05296  0.0   

     err_minus  err_plus  quality  mean  median  stdev  
0          0.0       0.0        0   0.0     0.0    0.0  
1          0.0       0.0        
```

Great success!

## The finished product

Putting all the pieces together, here is our `astroquery` SSHADE module proof-of-concept:

```python
import io

import pyvo as vo
import requests
from astropy.io import votable

class SSHADE:
    """An interface to the SSHADE spectra library using the Table Access Protocol."""

    def __init__(self):
        self.URL = "http://osug-vo.osug.fr:8080/tap"

    def query(self, keyword, columns):
        """Query the table of metadata for matching column values.

        Parameters
        ----------
        keyword : str
            The keyword to search for in the columns.
        columns : list of str
            The names of one or more columns in the metadata table to search.

        Returns
        -------
        pd.DataFrame
            The entries of the metadata table where the keyword was found in at least one column.
        """

        if not hasattr(self, "metadata"):
            self.get_metadata()

        # Construct query condition from function arguments:
        # one str.contains test per column, joined with "|" (logical OR).
        # na=False counts missing values as "no match".
        condition = ' | '.join([f'@self.metadata["{column}"].str.contains(@keyword, na=False)' for column in columns])
        # engine="python" lets the expression call .str methods
        matches = self.metadata.query(condition, engine="python")
        return matches

    def get(self, ids):
        """Download spectra from the SSHADE database.

        Parameters
        ----------
        ids : list of str
            One or more granule_uid values of the spectra.

        Returns
        -------
        list of pd.DataFrame
            The downloaded spectra, in the same order as the passed granule_uid values.
        """
        # The access URLs live in the metadata table, so make sure it is loaded
        if not hasattr(self, "metadata"):
            self.get_metadata()

        spectra = []

        for granule_uid in ids:

            # Look up URL in metadata
            access_url = self.metadata.loc[self.metadata.granule_uid == granule_uid, 'access_url'].values[0]

            # Retrieve spectrum, failing loudly on HTTP errors
            r = requests.get(access_url)
            r.raise_for_status()
            spec = votable.parse(io.BytesIO(r.content))
            spectra.append(spec)

        # Convert to pandas dataframe before returning
        return [spec.get_first_table().to_table().to_pandas() for spec in spectra]

    def get_metadata(self):
        """Download the metadata table using TAP."""
        service = vo.dal.TAPService(self.URL)
        metadata = service.search("SELECT * FROM sshade_spectra.epn_core")  # get all entries of SSHADE metadata
        self.metadata = metadata.to_table().to_pandas()  # convert the resulting object to a pandas DataFrame
```

This looks good! Now all we need to do is add some documentation, and it is ready to be
submitted as a pull request to the `astroquery` project!
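One detail still separates this class from the goal snippet. That snippet calls `SSHADE.query(...)` directly on the imported name, but `query` is an instance method, so calling it on the class itself would raise a `TypeError`. Several astroquery modules export a ready-made instance (for example, `from astroquery.simbad import Simbad` gives an instance of `SimbadClass`). A minimal sketch of that pattern follows. `SSHADEClass` is a hypothetical name, and its `query` body is a placeholder with no network access.

```python
class SSHADEClass:
    """Toy stand-in for the tutorial's class; no network access."""

    def query(self, keyword, columns):
        # Placeholder body; the real method searches the metadata table
        return f"searching {columns} for {keyword!r}"


# Module-level instance, so callers can write SSHADE.query(...) directly
SSHADE = SSHADEClass()

print(SSHADE.query("cv chondrite", columns=["sample_classification"]))
```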