
Develop Database Schema #6

Closed
bourque opened this issue Nov 17, 2017 · 13 comments

bourque commented Nov 17, 2017

We need to develop a schema for the jwql database. I think a decent starting point is something like the schema I used for ACS Quicklook:

[schema diagram: ACS Quicklook database layout]

In this schema, we have a master table that keeps track of each rootname that is in the database and when it was ingested. The datasets table keeps track of which filetypes exist for a given rootname. Then there is a table for each detector/extension/filetype combination which is basically a dump of the headers (columns are header keys and values are header values).
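The three-tier layout described above can be sketched with an in-memory SQLite database; all table names, columns, and the example rootname here are illustrative stand-ins, not the actual ACS Quicklook schema:

```python
import sqlite3

# In-memory sketch of the three-tier layout: a master table of rootnames,
# a datasets table tracking which filetypes exist, and one header-dump
# table per detector/extension/filetype combination (columns = header keys).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""CREATE TABLE master (
    rootname TEXT PRIMARY KEY,
    ingest_date TEXT)""")

cur.execute("""CREATE TABLE datasets (
    rootname TEXT REFERENCES master(rootname),
    uncal INTEGER, rate INTEGER, cal INTEGER)""")

# One table per detector/extension/filetype combination; the column
# names here are placeholder header keywords.
cur.execute("""CREATE TABLE nrca1_uncal_primary (
    rootname TEXT REFERENCES master(rootname),
    TELESCOP TEXT, INSTRUME TEXT, DETECTOR TEXT)""")

cur.execute("INSERT INTO master VALUES ('jw00001001001_01101_00001_nrca1', '2017-11-17')")
conn.commit()
```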

To construct this for jwql, we will need to know the following for each instrument:

  1. What are all of the possible filetypes, and what purpose do they serve?
  2. What is the data structure for each filetype (i.e., the number of extensions, the purpose each extension serves, and the datatype of each)?
  3. What are the header keywords for each filetype/extension combination?
bourque added this to the Database Schema milestone Nov 17, 2017
gkanarek commented

How flexible is this? The JWST keywords, header info, filetypes, etc. are still in flux (nowhere near as stable as WFC3), so we need to be able to evolve the schema in response to these changes.

bourque added this to To do in Database via automation Nov 19, 2017

bourque commented Nov 19, 2017

If we build this right, changes to the schema for the header tables should be as simple as updating a text file and adding/removing columns in the database. Changes to the data structure itself (i.e. new filetypes, new/different FITS extensions) would be a bit trickier because that would mean adding new tables and not just new columns.
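A sketch of that text-file-driven approach, assuming SQLite and a hypothetical table and keyword list (in practice the keywords would be read from the maintained text file):

```python
import sqlite3

def sync_columns(cur, table, keywords):
    """Add any header keywords missing from `table` as new columns.
    (This sketch only handles additions; removals would need DROP COLUMN
    or a table rebuild.)"""
    cur.execute(f"PRAGMA table_info({table})")
    existing = {row[1] for row in cur.fetchall()}
    for kw in keywords:
        if kw not in existing:
            cur.execute(f"ALTER TABLE {table} ADD COLUMN {kw} TEXT")

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE nrca1_uncal_primary (rootname TEXT)")
# Hypothetical keyword list; would come from the text file described above.
sync_columns(cur, "nrca1_uncal_primary", ["TELESCOP", "INSTRUME", "DETECTOR"])
```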

This brings up another question: How often should we anticipate changes to the header keywords/filetypes/FITS extensions after launch?

bhilbert4 commented

My guess is that header keyword changes after launch won't be too common, but I'm sure it will happen from time to time.

For what it's worth, I have a function that returns all of the header keywords for a requested reference file type. It does this by reading in the appropriate schema definition files that SSB has in the JWST Calibration Pipeline repo. I doubt it would be hard to update it to work on the data filetypes.
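A hedged sketch of such a function: it walks a datamodel-style schema dictionary and collects `fits_keyword` entries, the key used in the JWST calibration pipeline's schema files. The example schema below is a made-up stand-in, not a real schema file:

```python
def collect_keywords(schema):
    """Recursively collect FITS keywords from a datamodel-style schema dict.
    Assumes keywords are stored under a 'fits_keyword' key, as in the
    JWST calibration pipeline's schema definition files."""
    found = []
    if isinstance(schema, dict):
        if "fits_keyword" in schema:
            found.append(schema["fits_keyword"])
        for value in schema.values():
            found.extend(collect_keywords(value))
    elif isinstance(schema, list):
        for item in schema:
            found.extend(collect_keywords(item))
    return found

# Tiny stand-in for a loaded schema file:
example = {"properties": {"meta": {"properties": {
    "telescope": {"fits_keyword": "TELESCOP"},
    "instrument": {"properties": {"name": {"fits_keyword": "INSTRUME"}}}}}}}
```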


bhilbert4 commented Nov 20, 2017

Filetypes that will be ingested into MAST:

_uncal.fits (raw)
_rate.fits, _rateints.fits (countrate images, level-2a)
_cal.fits, _calints.fits (flux calibrated, full WCS-added countrate images, level-2b)
_i2d.fits, _s2d.fits, _s3d.fits (resampled, both for individual exposures and combined)
_x1d.fits (extracted spectra, both for individual exposures and combined)

I'll put together more details on each soon.
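The suffix list above can be captured as a small lookup table, keying on the final underscore-delimited token of the filename (the mapping, helper name, and filename below are illustrative):

```python
# Mapping from MAST filename suffix to product level, transcribed
# from the list above.
PRODUCT_LEVELS = {
    "uncal": "raw",
    "rate": "level-2a countrate",
    "rateints": "level-2a countrate (per-integration)",
    "cal": "level-2b calibrated",
    "calints": "level-2b calibrated (per-integration)",
    "i2d": "resampled image",
    "s2d": "resampled spectrum",
    "s3d": "resampled cube",
    "x1d": "extracted spectrum",
}

def product_level(filename):
    """Return the product level for a filename like 'jw..._nrca1_rate.fits',
    keying on the final underscore-delimited suffix."""
    suffix = filename.rsplit("_", 1)[-1].split(".")[0]
    return PRODUCT_LEVELS.get(suffix, "unknown")
```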


bhilbert4 commented Nov 20, 2017

Data structures:

JWST jargon

  • frame = one readout of the detector
  • group = made from single frame or (onboard) average of multiple frames
  • integration = multiple groups, with detector resets before and after (equivalent to single HST file).
  • exposure = multiple nominally-identical integrations packaged into the same file (like packing multiple HST raw ramps into a single file).
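A worked example of the hierarchy above, using illustrative values: the SCI cube of an exposure has shape (integrations, groups, rows, columns), so the total pixel count is the product of all four.

```python
# Illustrative values: 2 integrations of 10 groups each, full-frame detector.
nints, ngroups = 2, 10           # integrations per exposure, groups per integration
nrows, ncols = 2048, 2048        # detector rows and columns

# Shape of the exposure's SCI cube and the total number of pixel values.
sci_shape = (nints, ngroups, nrows, ncols)
npixels = nints * ngroups * nrows * ncols
```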

_uncal.fits - raw, uncalibrated file

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU      89   ()      
  1  SCI           1 ImageHDU        25   (2048, 2048, 10, 1)   int16 (rescales to uint16)   
  2  ZEROFRAME     1 ImageHDU        11   (2048, 2048, 1)   int16 (rescales to uint16)   
  3  GROUP         1 BinTableHDU     35   10R x 13C   [I, I, I, J, I, 26A, I, I, I, I, 36A, D, D] 
  1. SCI extension contains the detector data. 4 dimensions (detector y, detector x, groups per integration, integrations)
  2. ZEROFRAME extension contains the 0th frame that goes with each integration. For some readout patterns, each group will be the average of N frames. This averaging is done on board JWST. The 0th frame is saved to this separate extension for cases where the initial read is needed for slope fitting. 3 dimensions (detector y, detector x, integrations)
  3. GROUP extension is a binary table that contains detailed timing information about the exposure. The table contains 13 columns, and one row for each M milliseconds of the exposure.
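One note on reading the Dimensions column above: FITS headers list axes fastest-first (NAXIS1, NAXIS2, ...), so the shapes shown are reversed relative to the numpy arrays astropy would return. A small helper makes this explicit:

```python
def fits_to_numpy_shape(fits_dims):
    """FITS lists axes fastest-first, so the (2048, 2048, 10, 1) shown for
    the SCI extension corresponds to a numpy array of shape
    (1, 10, 2048, 2048): (integrations, groups, y, x)."""
    return tuple(reversed(fits_dims))
```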

GROUP (13 columns x 10 rows in this example):

 Col# Name (Units)       Format
   1 integration_number   I
   2 group_number         I
   3 end_day              I
   4 end_milliseconds     J
   5 end_submilliseconds  I
   6 group_end_time       26A
   7 number_of_columns    I
   8 number_of_rows       I
   9 number_of_gaps       I
  10 completion_code_numb I
  11 completion_code_text 36A
  12 bary_end_time (MJD)  D
  13 helio_end_time (MJD) D

_rate.fits - countrate images (equivalent to HST flt)

This is the output of the Level 2A pipeline, which includes basic calibrations (superbias subtraction, linearity correction, slope fitting). For an exposure that contains a single integration, the *_rate.fits file contains the slope image created by line-fitting to the groups of the integration. For an exposure that contains multiple integrations, the *_rate.fits image contains the mean slope image from all integrations. In this case, the pipeline also outputs a *_rateints.fits file, which contains the separate slope images from all of the integrations. Therefore, add one dimension to the shapes shown below for extensions 1-5.

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     159   ()      
  1  SCI           1 ImageHDU        29   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 ImageHDU         7   (3889,)   uint8   
  1. SCI extension - slope images. 2-dimensional (detector y, detector x)
  2. ERR extension - errors on the slope values. 2-dimensional (detector y, detector x)
  3. DQ extension - data quality array. 2-dimensional (detector y, detector x)
  4. VAR_POISSON - contribution to the variance on the slopes due to Poisson noise. 2-dimensional (detector y, detector x)
  5. VAR_RNOISE - contribution to the variance on the slopes due to readnoise. 2-dimensional (detector y, detector x)
  6. ASDF - Contains distortion correction model information
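Given the above, a downstream tool could distinguish the two slope products by the SCI array's dimensionality (a sketch; the function name is hypothetical):

```python
def slope_image_kind(shape):
    """Classify a SCI array by dimensionality: 2-D for a *_rate.fits mean
    slope image, 3-D (integrations stacked first) for *_rateints.fits."""
    if len(shape) == 2:
        return "rate"
    if len(shape) == 3:
        return "rateints"
    raise ValueError(f"unexpected SCI shape {shape}")
```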

*_cal.fits - Calibrated file

Output from the level 2b pipeline. Flux calibration applied, flat field applied, distortion solution added. Similar to the *_rate.fits and *_rateints.fits files above, there are *_cal.fits files (containing the averaged image if there is more than one integration per exposure, or the single image if there is a single integration) and a *_calints.fits file (which contains the individual calibrated images when there are multiple integrations per exposure).

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     250   ()      
  1  SCI           1 ImageHDU        32   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  AREA          1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  6  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  7  ASDF          1 ImageHDU         7   (13515,)   uint8   

Extensions are the same as in the case of the *_rate.fits image, plus the AREA extension, which is a 2D image containing the pixel area map.


bourque commented Nov 20, 2017

Thanks @bhilbert4 this is very helpful!

bhilbert4 commented Nov 30, 2017

_i2d.fits file format - identical to the _cal.fits format above.

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     230   ()      
  1  SCI           1 ImageHDU        46   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  AREA          1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  6  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  7  ASDF          1 ImageHDU         7   (13749,)   uint8   


SaOgaz commented Dec 6, 2017

Just to confirm: based on Tom Donaldson's Confluence page and what @bhilbert4 has said here, if you have (for example) a *_rate.fits and a *_uncal.fits that both correspond to the same original image, is everything in the * of the filename identical?


cracraft commented Dec 6, 2017

My understanding is that the rest of the filename will be consistent between the two.

bhilbert4 commented

Yes, that's correct.
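That convention can be sketched in code: stripping the final suffix from two files of the same exposure yields the same rootname (the filenames below are illustrative):

```python
def rootname(filename):
    """Strip the product suffix and extension: for files derived from the
    same exposure, everything before the final '_<suffix>.fits' matches."""
    return filename.rsplit("_", 1)[0]

# Two hypothetical products of the same exposure:
a = "jw00001001001_01101_00001_nrca1_rate.fits"
b = "jw00001001001_01101_00001_nrca1_uncal.fits"
```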


bourque commented Jan 31, 2018

Per @SaraOgaz

Jonathon pointed me to this doc page for the pipeline where there’s a whole section about the associations: https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/index.html


bourque commented Apr 27, 2018

Now that we have decided to use the MAST API, this is no longer needed.
