Carlos H Brandt edited this page Jun 5, 2020 · 37 revisions

Growing an RD: SSA+form

Below we walk through the creation of a resource step by step, defining each block and verifying the partial results. We start from the most basic block: the resource/schema metadata. Then we define the table and data contents. Finally, we define the services interface.

Files (FITS) used in this tutorial can be found here.


In this example we will set up an interface for spectral data access. Since a spectrum is a collection of energy-vs-flux values, and we want to provide a set of such spectra, we can picture the interface as a table of spectra, where each spectrum is stored in a file. Each file may be in FITS or VOTable format. An example of such a table can be seen at the MAGIC-VObs, where observational data (in FITS format) are related to the corresponding articles.

In fact, here we will use that data -- which is public -- to configure an SSA service and a web/form interface. The files we need for this example can be found here.

Note: This tutorial is based on an email that Markus Demleitner sent me in response to a request for help with a "simple, clean RD structure for SSA". Here I try to expand and structure it for the benefit of others like me.

Note-1: It is assumed that a GAVO/DaCHS server is installed and running on your system. For more information, see the install guide.

Note-2: The terms "gavo" and "dachs" are used interchangeably here; whenever the distinction matters, it will be made explicit.

For simplicity, consider the commands herein to be executed from inside /var/gavo/inputs/magic, an empty directory (i.e., magic) created just for this example.

Resource metadata

As explained in "the structure of an RD", the set of meta elements at the beginning of your RD is there to explain why it exists. To avoid starting from a blank sheet, DaCHS provides a way to generate a template for this (meta)data collection, as described in its guide:

$ gavo admin dumpDF src/template.rd_ > q.rd

Filling in the blank parts ("___") for our case, we end up with something like:

<resource schema="magic">

  <meta name="title">MAGIC test</meta>
  <meta name="description">
    The MAGIC project observes the VHE sky (GeV~TeV) through Cherenkov radiation events.
    The project is operating since 2004 and with the support from the Spain-VO team
    they provide data access through a VO-SSAP and web services.
    Our goal here is to provide the same kind of service with the difference
    that the data is transformed and homogenized in its flux units, to values
    in 'erg/(s.cm2)', and photon energy values in equivalent 'Hz' frequency values.
  </meta>
  <meta name="creationDate">2016-02-02</meta>
  <meta name="subject">Spectra</meta>
  <meta name="subject">VHE sources</meta>
  <meta name="subject">Gamma-ray emission</meta>

  <meta name="creator.name">___</meta>
  <meta name="contact.name">___</meta>
  <meta name="contact.email">___</meta>
  <meta name="instrument">___</meta>
  <meta name="facility">MAGIC</meta>

  <meta name="source">
    The MAGIC data center, http://magic.pic.es/
  </meta>
  <meta name="contentLevel">Research</meta>
  <meta name="type">Catalog</meta>

  <meta name="coverage">
    <meta name="waveband">Gamma-ray</meta>
    <meta name="profile">AllSky ICRS</meta>
  </meta>

</resource>

Having filled in these meta fields, we already have something to show. To see it, just hand the RD to gavo:

$ gavo imp magic/q.rd

DaCHS should complain with a warning about the lack of a data element, and we should see some of the metadata (title and description, to be precise) rendered at the URL http://localhost:8080/browse/magic/q. To better understand the metadata fields we just provided and all the other options available in DaCHS, take a look at tut::More-on-metadata and ref::RMI-style-metadata.

See also: the IVOA Recommendation on Resource Metadata.

Make it work

Table definition

We now define the basic table supporting an SSA data structure. As explained in the tutorial, there are two (basic) kinds of such tables: for homogeneous data collections (//ssap#hcd) and for heterogeneous data collections (//ssap#mixc). Since we are dealing with data from the same instrument and pipeline, we will use the "homogeneous" one, //ssap#hcd.

to Markus: it is not clear to me what the impact is of choosing one over the other. I mean, what is the impact from the user's point of view, and what information should the operator provide to each? Or, more clearly, what are the advantages of each? Also, the ref:hcd docs say "homogeneous means that all values in hcd_outpars are constant for all datasets"; what are "hcd_outpars"? (Side question: //ssap#hcd has many more parameters available to be defined, while //ssap#mixc has far fewer; I understand that the "missing" ones come with the use of setMixcMeta in the rowmaker elements, right? The difference between them would then be just convenience in writing the RD?)

  <table id="main" onDisk="True">
    <mixin
      fluxCalibration="ABSOLUTE"
      fluxUnit="erg/(s.cm**2)"
      spectralUnit="Hz"
      > //ssap#hcd
    </mixin>
  </table>

Appending this table block to our resource definition and running gavo imp magic/q.rd again should still produce the complaint about a missing data block. But now we gain a link at the resource's page, http://localhost:8080/browse/magic/q. If you have already clicked on it, you noticed the link is broken... which we will fix in the very next step by defining some data, for the joy of everybody.

Data mapping

At this very first pass on the data block, we will define an absolutely generic element. We just want to see what DaCHS gives us when we provide the most fundamental blocks it needs to structure its tables internally.

So, let's just append the following lines to our resource definition:

  <data id="import">
    <make table="main"/>
  </data>

As we see, it doesn't seem to mean much. But if we now run the gavo imp magic/q step again, instead of the warning message we see some activity. We then run to the browser, go to the magic.main link and voilà! That is all the information defined by the table+mixin we used here, pre-defined by //ssap#hcd.

OK then. Now that we have seen the skeleton standing, let's feed it and see it working.

The data element is clearly lacking instructions on how to deal with the data: we need to declare from where and how our data is to be imported. The where is handled by a sources element, while the how is handled by a grammar element.

Sources

Sources will simply point to the data files. In this case, a bunch of FITS files inside a directory named 'FITS_out':

  <data id="import">

    <sources pattern='FITS_out/*.fits' recurse="True" />

    <make table="main"/>
  </data>
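The pattern plus recurse="True" behaves much like a recursive glob over the input directory. A rough stdlib-only illustration of the file selection (the directory tree below is created on the fly just for the demonstration; DaCHS applies its own bookkeeping on top of this):

```python
import tempfile
from pathlib import Path

# Roughly what <sources pattern='FITS_out/*.fits' recurse="True"/> selects:
# every *.fits file below the given directory, subdirectories included.
def find_sources(root):
    return sorted(str(p.relative_to(root)) for p in Path(root).rglob("*.fits"))

# Demonstrate on a throwaway directory tree:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "FITS_out")
    (root / "night1").mkdir(parents=True)
    (root / "spec_001.fits").touch()          # file names are invented
    (root / "night1" / "spec_002.fits").touch()
    (root / "README.txt").touch()             # not matched by the pattern
    print(find_sources(root))
```

Without recurse="True", only the files directly inside FITS_out would be picked up.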

Grammar

Grammars are elements for reading the files containing your data, and DaCHS has already implemented lots of different grammars for us ;) In particular, since our data is stored in FITS files, we have two options to choose from: fitsProdGrammar and fitsTableGrammar. Reading the documentation, we quickly settle on fitsProdGrammar -- since we are interested in a table linking the spectra files (FITS), with the FITS header keywords used for the table cells.
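To picture what fitsProdGrammar does for us: a FITS header is a sequence of 80-character "cards" of the form "KEYWORD = value / comment", and each keyword becomes a key in the raw row handed on to the rowmaker. A simplified, stdlib-only sketch of that idea (the cards and their values are made up; DaCHS's own parser is of course far more complete):

```python
# Minimal sketch of the header-card -> row-field mapping behind
# fitsProdGrammar.  Real FITS parsing has many more rules; this only
# handles the simple "KEYWORD = value / comment" shape.
def parse_card(card):
    """Parse one FITS header card into (keyword, value) -- simplified."""
    keyword, _, rest = card.partition("=")
    value = rest.split("/")[0].strip()        # drop the inline comment
    if value.startswith("'"):                 # string values are quoted
        value = value.strip("'").strip()
    else:
        try:
            value = float(value)
        except ValueError:
            pass
    return keyword.strip(), value

# Hypothetical cards, mimicking the keywords our rowmaker will use:
cards = [
    "OBJECT  = 'Mrk421  '           / target name",
    "SRCPOS1 =             166.1138 / RA of source [deg]",
    "SRCPOS2 =              38.2088 / Dec of source [deg]",
]
row = dict(parse_card(c) for c in cards)
print(row["OBJECT"], row["SRCPOS1"])
```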

Our fitsProdGrammar element is declared as follows (hdu="1" gives the position of our spectral data/table in the HDU list):

  <data id="import">
    <sources pattern='FITS_out/*.fits' recurse="True" />

    <fitsProdGrammar hdu="1" qnd="False">
      <rowfilter procDef="//products#define">
        <bind name="table">"\schema.data"</bind>
      </rowfilter>
    </fitsProdGrammar>

    <make table="main"/>
  </data>

Notice that a rowfilter element was declared as well. It is necessary to bind the spectra files (pointed to by the sources element) to our table.

to Markus: I don't understand the rowfilter+procDef. Since procDef talks about a "procedure", //products#define says something about "enters values into grammar", and rowfilter sounds like an iterator, I just take for granted that this rowfilter block is mandatory whenever I want to link my table to the files the grammar (here, FITS) is accessing. About the bind element above: what does \schema.data mean?

Make

Once we have declared where the data comes from and how it is to be read, we just have to actually make it happen. So now we fill the make element with key/value mappings/bindings.

As suggested by the tutorial, the SSAP note, and the ssap#hcd reference, we should define a rowmaker calling the //ssap#setMeta procedure. For each row being parsed, //ssap#setMeta fills in all those fields we saw earlier at localhost:8080/browse/magic/q. In other words, the SSAP apply (//ssap#setMeta) is the complementary action (within the make element) to the SSAP mixin (//ssap#hcd) table structure.

Added to the make element, it goes like this:

  <data id="import">
    <sources pattern='FITS_out/*.fits' recurse="True" />
    <fitsProdGrammar hdu="1" qnd="False">
      <rowfilter procDef="//products#define">
        <bind name="table">"\schema.data"</bind>
      </rowfilter>
    </fitsProdGrammar>

    <make table="main">
      <rowmaker idmaps="*">
        <apply procDef="//ssap#setMeta" name="setMeta">
          <bind name="pubDID">\standardPubDID</bind>
          <bind name="dstitle">@FILENAME</bind>
          <bind name="targname">@OBJECT</bind>
          <!-- Here are some optional mappings -->
          <bind name="targclass">"galaxy"</bind>
          <bind name="alpha">@SRCPOS1</bind>
          <bind name="delta">@SRCPOS2</bind>
          <bind name="bandpass">"Gamma-ray"</bind>
          <!-- ------------------------------- -->
        </apply>
      </rowmaker>
    </make>

  </data>
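The effect of these binds can be illustrated in plain Python: each SSA field is either pulled from a grammar key (the @KEY form) or set to a literal. A conceptual sketch with made-up values, not DaCHS's actual machinery:

```python
# Rough illustration of what the rowmaker's binds do: each SSA field is
# either copied from a grammar key (the "@KEY" form) or set to a literal.
raw = {  # a row as the fitsProdGrammar would deliver it (invented values)
    "FILENAME": "spec_mrk421_001.fits",
    "OBJECT": "Mrk421",
    "SRCPOS1": 166.1138,
    "SRCPOS2": 38.2088,
}

ssa_row = {
    "dstitle":  raw["FILENAME"],  # <bind name="dstitle">@FILENAME</bind>
    "targname": raw["OBJECT"],    # <bind name="targname">@OBJECT</bind>
    "targclass": "galaxy",        # quoted literal bind
    "alpha":    raw["SRCPOS1"],   # <bind name="alpha">@SRCPOS1</bind>
    "delta":    raw["SRCPOS2"],   # <bind name="delta">@SRCPOS2</bind>
    "bandpass": "Gamma-ray",      # quoted literal bind
}
print(ssa_row["targname"], ssa_row["alpha"])
```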

This completes the necessary steps -- or pieces -- for building up the minimal table structure in DaCHS's internals. Let us quickly define an interface to see what we have in our table.

Services

Let us define a web form to search and display our table "main". The least we can do -- but enough for now -- is a service element like the following:

  <service id="web" defaultRenderer="form">
    <dbCore queriedTable="main"/>
  </service>

Such a service element provides you with a huge search form (basically, all available table keys). (Never mind; we will not need to fill them in to do a "give me all" search.)

Don't forget to update the resource definition by asking gavo to import again:

$ gavo imp magic/q

To see this interface (i.e., renderer), just click the corresponding new link at our familiar URL, http://localhost:8080/browse/magic/q (it should point to http://localhost:8080/magic/q/web/form; I give it here for convenience). Scrolling to the end of the page, you can click the OK button directly, and the whole table should come back to you.

This is an ugly table, with lots of (useless) columns, and its content depends on the quality of your FITS headers. Anyway, these are the minimum steps, just to see how DaCHS handles our data.

To highlight: we do not yet have our SSA service working, and the quality of the (meta)data inside our database at this moment is pretty poor.

Now, we scroll back and read the "Make it better" sections. That should give us a fully functional web and SSAP service environment.

Make it better

Table definition

For the "main" table, almost nothing needs to change, but we can define some of the mixin parameters better:

  <table id="main" onDisk="True">
    <mixin
      fluxCalibration="ABSOLUTE"
      fluxUnit="erg/(s.cm**2)"
      spectralUnit="Hz"

      fluxUCD="phot.flux.density;em.freq"
      spectralUCD="em.freq"

      > //ssap#hcd
    </mixin>
  </table>
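Since the mixin declares spectralUnit="Hz" while VHE instruments report photon energies, the homogenization promised in the resource description boils down to Planck's relation nu = E/h. A small sketch, assuming input energies in GeV (an assumption; the actual unit in your FITS tables may differ):

```python
# Homogenizing spectral values to Hz, per spectralUnit="Hz":
# a photon of energy E has frequency nu = E / h (Planck's relation).
H_PLANCK = 6.62607015e-34    # Planck constant, J*s (exact, SI 2019)
GEV_TO_J = 1.602176634e-10   # 1 GeV expressed in joules

def gev_to_hz(energy_gev):
    """Convert a photon energy in GeV to the equivalent frequency in Hz."""
    return energy_gev * GEV_TO_J / H_PLANCK

# A 1 GeV gamma-ray photon corresponds to roughly 2.4e23 Hz:
print(gev_to_hz(1.0))
```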

More important is the definition of another table: one for dealing with the spectral data itself. The mixin that defines such a table is //ssap#sdm-instance; SDM stands for Spectral Data Model. The following table depends on our "main" table (because of the use of the //ssap#hcd mixin there). The new mixin adds two columns to the previous (hcd) set: spectral and flux. Although they are not declared here, they will become evident when we define the corresponding data element in the next section.

There is a broader explanation in the DaCHS documentation about making SDM tables.

  <table id="spectrum">
    <mixin
      ssaTable="main"
      fluxDescription="Absolute Flux"
      spectralDescription="Frequency"
      > //ssap#sdm-instance
    </mixin>
    <column name="fluxerror" 
            ucd="stat.error;phot.flux.density;em.freq">
      <values nullLiteral="-999"/>
    </column>
  </table>

This table is then filled by the data element defined in the next corresponding section.

Data mapping

Here we define the data element feeding the "spectrum" table declared in the previous section. The lines below are quite a bit more complex, since we embed some blocks of (Python) code. As the name of the grammar in use, embeddedGrammar, suggests, the code defines a custom grammar to ingest the spectral data found in our FITS files.

  <data id="build_sdm_data" auto="False">
    <embeddedGrammar>
      <iterator>
        <setup>
          <code>
            from gavo.utils import pyfits
            from gavo.protocols import products
          </code>
        </setup>
      <code>
        fitsPath = products.RAccref.fromString(
            self.sourceToken["accref"]).localpath
        hdu = pyfits.open(fitsPath)[1]
        for row in hdu.data:
          yield {"spectral": row[0], "flux": row[2], "fluxerror": row[3]}
      </code>
      </iterator>
    </embeddedGrammar>
    <make table="spectrum">
      <parmaker>
        <apply procDef="//ssap#feedSSAToSDM"/>
      </parmaker>
    </make>
  </data>

The make element here is just a generic (but necessary!) call to the procedure that syncs the tables. All the hard work happens in the embeddedGrammar.
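Stripped of the DaCHS and pyfits specifics, the iterator's job is simply to walk the rows of the spectral table and emit one dict per point. A self-contained mock (the tuples mirror the column layout used above -- 0: spectral, 2: flux, 3: fluxerror -- but the values are invented):

```python
# Stand-alone mock of the embeddedGrammar iterator: walk the rows of a
# spectral table and yield one dict per point.  In the real RD, `table`
# is hdu.data read from the FITS file; here it is a made-up list of
# tuples with the same column layout (0: spectral, 2: flux, 3: error).
def iter_spectrum(table):
    for row in table:
        yield {"spectral": row[0], "flux": row[2], "fluxerror": row[3]}

table = [
    (2.4e23, None, 1.2e-11, 3.0e-12),   # invented sample points
    (4.8e23, None, 8.5e-12, 2.1e-12),
]
points = list(iter_spectrum(table))
print(points[0]["flux"])
```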

Services

Now we should fix our faulty or missing services. First, remember our web form, with its many search fields and its plethora of output columns? (I doubt you forgot, anyway...) Well, now we fix it. The web form service element can evolve into the following:

  <service id="web" defaultRenderer="form">
    <meta name="shortName">Magic Web</meta>
    <meta name="title">Magic Public Spectra Web Interface</meta>
    <dbCore queriedTable="main">
      <condDesc buildFrom="ssa_location"/>
      <condDesc buildFrom="ssa_dateObs"/>
    </dbCore>
    <outputTable>
      <autoCols>accref, mime, ssa_targname, ssa_aperture, ssa_dateObs</autoCols>
      <FEED source="//ssap#atomicCoords"/>
      <outputField original="ssa_specstart" displayHint="displayUnit=Angstrom"/>
      <outputField original="ssa_specend" displayHint="displayUnit=Angstrom"/>   <!-- Bug: try to use 'Hz' -->
    </outputTable>
  </service>

What we are defining here is the following. The condDesc elements inside dbCore ask DaCHS to provide a (web) form with only two query parameters: the position of an object and the date of observation. The outputTable element (as its name suggests) restricts the output table to only the listed columns.

Finally, we define our SSAP service/interface. As usual for service elements, the declaration is pretty straightforward, but it is worth noting the //ssap#hcd_condDescs FEED inside the ssapCore element: it is a predefined set of conditions for querying the SSAP database.

  <service id="ssa" allowed="form,ssap.xml">      <!-- Why do we have "form" here?! Should we? -->
    <meta name="shortName">MAGIC SSAP</meta>
    <meta name="ssap.dataSource">observation</meta>
    <meta name="ssap.creationType">archival</meta>
    <meta name="ssap.testQuery">MAXREC=1</meta>

    <publish render="ssap.xml" sets="ivo_managed"/>

    <ssapCore queriedTable="main">
      <FEED source="//ssap#hcd_condDescs"/>   <!-- Do we have an option for ssap#mixc? Make sense?! -->
    </ssapCore>
  </service>
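Once published, such a service answers SSAP queries over plain HTTP GET. A sketch of building a query by hand with the standard SSAP parameters (the base URL follows DaCHS's usual rd-id/service-id/renderer layout for this tutorial's local install; the target position is hypothetical):

```python
from urllib.parse import urlencode

# Building an SSAP query URL by hand; POS is "ra,dec" in degrees and
# SIZE a search radius in degrees (standard SSAP query parameters).
base = "http://localhost:8080/magic/q/ssa/ssap.xml"  # local install from this tutorial
params = {
    "REQUEST": "queryData",
    "POS": "166.11,38.21",   # hypothetical target position (deg)
    "SIZE": "0.1",
    "MAXREC": "10",
}
url = base + "?" + urlencode(params)
print(url)
```

Opening the resulting URL in a browser (or with curl) should return a VOTable with at most 10 matching spectra.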

Summary

Make it work

The minimum set of elements we have declared, just to have DaCHS handle our data minimally, can be seen as a whole below. Copy and paste it into a "q.rd" file and try it. (Don't forget to grab the attached FITS_out files and place them in the same directory as your RD.)

q.rd:

<resource schema="magic">

  <!--                                                                        -->
  <!--  Resource's metadata: here is how the world gets to know your dataset. -->
  <!--                                                                        -->
  <meta name="title">MAGIC test</meta>
  <meta name="description">
    The MAGIC project observes the VHE sky (GeV~TeV) through Cherenkov radiation events.
    The project is operating since 2004 and with the support from the Spain-VO team
    they provide data access through a VO-SSAP and web services.
    Our goal here is to provide the same kind of service with the difference
    that the data is transformed and homogenized in its flux units, to values
    in 'erg/(s.cm2)', and photon energy values in equivalent 'Hz' frequency values.
  </meta>
  <meta name="creationDate">2016-02-02</meta>
  <meta name="subject">Spectra</meta>
  <meta name="subject">VHE sources</meta>
  <meta name="subject">Gamma-ray emission</meta>

  <meta name="creator.name">___</meta>
  <meta name="contact.name">___</meta>
  <meta name="contact.email">___</meta>
  <meta name="instrument">___</meta>
  <meta name="facility">MAGIC</meta>

  <meta name="source">
    The MAGIC data center, http://magic.pic.es/
  </meta>
  <meta name="contentLevel">Research</meta>
  <meta name="type">Catalog</meta>

  <meta name="coverage">
    <meta name="waveband">Gamma-ray</meta>
    <meta name="profile">AllSky ICRS</meta>
  </meta>


  <!-- =============== -->
  <!--  Table block    -->
  <!-- =============== -->
  <table id="main" onDisk="True">
    <mixin
      fluxCalibration="ABSOLUTE"
      fluxUnit="erg/(s.cm**2)"
      spectralUnit="Hz"
      > //ssap#hcd
    </mixin>
  </table>

  <!-- =============== -->
  <!--  Data block     -->
  <!-- =============== -->
  <data id="import">

    <!--                      -->
    <!--  data sources        -->
    <!--                      -->
    <sources pattern='FITS_out/*.fits' recurse="True" />

    <!--                      -->
    <!--  sources grammar     -->
    <!--                      -->
    <fitsProdGrammar hdu="1" qnd="False">
      <rowfilter procDef="//products#define">
        <bind name="table">"\schema.data"</bind>
      </rowfilter>
    </fitsProdGrammar>

    <!--                      -->
    <!--  make [data->table]  -->
    <!--                      -->
    <make table="main">
      <rowmaker idmaps="*">
        <apply procDef="//ssap#setMeta" name="setMeta">
          <bind name="pubDID">\standardPubDID</bind>
          <bind name="dstitle">@FILENAME</bind>
          <bind name="targname">@OBJECT</bind>
          <bind name="targclass">"galaxy"</bind>
          <bind name="alpha">@SRCPOS1</bind>
          <bind name="delta">@SRCPOS2</bind>
          <bind name="bandpass">"Gamma-ray"</bind>
        </apply>
      </rowmaker>
    </make>

  </data>

  <!-- =============== -->
  <!--  Service block  -->
  <!-- =============== -->
  <service id="web" defaultRenderer="form">
    <dbCore queriedTable="main"/>
  </service>

</resource>

Make it better

q.rd:

<resource schema="magic">

  <!--                                                                        -->
  <!--  Resource's metadata: here is how the world gets to know your dataset. -->
  <!--                                                                        -->
  <meta name="title">MAGIC test</meta>
  <meta name="description">
    The MAGIC project observes the VHE sky (GeV~TeV) through Cherenkov radiation events.
    The project is operating since 2004 and with the support from the Spain-VO team
    they provide data access through a VO-SSAP and web services.
    Our goal here is to provide the same kind of service with the difference
    that the data is transformed and homogenized in its flux units, to values
    in 'erg/(s.cm2)', and photon energy values in equivalent 'Hz' frequency values.
  </meta>
  <meta name="creationDate">2016-02-02</meta>
  <meta name="subject">Spectra</meta>
  <meta name="subject">VHE sources</meta>
  <meta name="subject">Gamma-ray emission</meta>

  <meta name="creator.name">___</meta>
  <meta name="contact.name">___</meta>
  <meta name="contact.email">___</meta>
  <meta name="instrument">___</meta>
  <meta name="facility">MAGIC</meta>

  <meta name="source">
    The MAGIC data center, http://magic.pic.es/
  </meta>
  <meta name="contentLevel">Research</meta>
  <meta name="type">Catalog</meta>

  <meta name="coverage">
    <meta name="waveband">Gamma-ray</meta>
    <meta name="profile">AllSky ICRS</meta>
  </meta>


  <!-- =============== -->
  <!--  Table block    -->
  <!-- =============== -->

  <table id="main" onDisk="True">
    <mixin
      fluxCalibration="ABSOLUTE"
      fluxUnit="erg/(s.cm**2)"
      spectralUnit="Hz"
      fluxUCD="phot.flux.density;em.freq"
      spectralUCD="em.freq"
      > //ssap#hcd
    </mixin>
  </table>


  <table id="spectrum">
    <mixin
      ssaTable="main"
      fluxDescription="Absolute Flux"
      spectralDescription="Frequency"
      > //ssap#sdm-instance
    </mixin>
    <column name="fluxerror"
            ucd="stat.error;phot.flux.density;em.freq">
      <values nullLiteral="-999"/>
    </column>
  </table>


  <!-- =============== -->
  <!--  Data block     -->
  <!-- =============== -->

  <data id="import">

    <sources pattern='FITS_out/*.fits' recurse="True" />

    <fitsProdGrammar hdu="1" qnd="False">
      <rowfilter procDef="//products#define">
        <bind name="table">"\schema.data"</bind>
      </rowfilter>
    </fitsProdGrammar>

    <make table="main">
      <rowmaker idmaps="*">
        <apply procDef="//ssap#setMeta" name="setMeta">
          <bind name="pubDID">\standardPubDID</bind>
          <bind name="dstitle">@FILENAME</bind>
          <bind name="targname">@OBJECT</bind>
          <bind name="targclass">"galaxy"</bind>
          <bind name="alpha">@SRCPOS1</bind>
          <bind name="delta">@SRCPOS2</bind>
          <bind name="bandpass">"Gamma-ray"</bind>
        </apply>
      </rowmaker>
    </make>

  </data>


  <data id="build_sdm_data" auto="False">

    <embeddedGrammar>
      <iterator>
        <setup>
          <code>
            from gavo.utils import pyfits
            from gavo.protocols import products
          </code>
        </setup>
        <code>
          fitsPath = products.RAccref.fromString(
              self.sourceToken["accref"]).localpath
          hdu = pyfits.open(fitsPath)[1]
          for row in hdu.data:
            yield {"spectral": row[0], "flux": row[2], "fluxerror": row[3]}
        </code>
      </iterator>
    </embeddedGrammar>

    <make table="spectrum">
      <parmaker>
        <apply procDef="//ssap#feedSSAToSDM"/>
      </parmaker>
    </make>

  </data>


  <!-- =============== -->
  <!--  Service block  -->
  <!-- =============== -->

  <service id="web" defaultRenderer="form">
    <meta name="shortName">Magic Web</meta>
    <meta name="title">Magic Public Spectra Web Interface</meta>

    <dbCore queriedTable="main">
      <condDesc buildFrom="ssa_location"/>
      <condDesc buildFrom="ssa_dateObs"/>
    </dbCore>

    <outputTable>
      <autoCols>accref, mime, ssa_targname, ssa_aperture, ssa_dateObs</autoCols>
      <FEED source="//ssap#atomicCoords"/>
      <outputField original="ssa_specstart" displayHint="displayUnit=Angstrom"/>
      <outputField original="ssa_specend" displayHint="displayUnit=Angstrom"/>   <!-- Bug: try to use 'Hz' -->
    </outputTable>

  </service>

  <service id="ssa" allowed="ssap.xml">
    <meta name="shortName">Magic SSAP</meta>
    <meta name="title">Magic Public Spectra SSAP Interface</meta>
    <meta name="ssap.dataSource">observation</meta>
    <meta name="ssap.creationType">archival</meta>
    <meta name="ssap.testQuery">MAXREC=1</meta>

    <publish render="ssap.xml" sets="ivo_managed"/>

    <ssapCore queriedTable="main">
      <FEED source="//ssap#hcd_condDescs"/>   <!-- Do we have an option for ssap#mixc? Make sense?! -->
    </ssapCore>

  </service>

</resource>