
Develop Database Schema #6

Closed
bourque opened this issue Nov 17, 2017 · 13 comments

bourque commented Nov 17, 2017

We need to develop a schema for the jwql database. I think a decent starting point is something like the schema I used for ACS Quicklook:

[schema diagram: ACS Quicklook database layout]

In this schema, we have a master table that keeps track of each rootname that is in the database and when it was ingested. The datasets table keeps track of which filetypes exist for a given rootname. Then there is a table for each detector/extension/filetype combination which is basically a dump of the headers (columns are header keys and values are header values).
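The three-tier layout described above can be sketched with an in-memory SQLite database; all table names, columns, and the example rootname here are illustrative stand-ins, not the actual ACS Quicklook schema:

```python
import sqlite3

# In-memory sketch of the three-tier layout: a master table of rootnames,
# a datasets table tracking which filetypes exist, and one header-dump
# table per detector/extension/filetype combination (columns = header keys).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""CREATE TABLE master (
    rootname TEXT PRIMARY KEY,
    ingest_date TEXT)""")

cur.execute("""CREATE TABLE datasets (
    rootname TEXT REFERENCES master(rootname),
    uncal INTEGER, rate INTEGER, cal INTEGER)""")

# One table per detector/extension/filetype combination; the column
# names here are placeholder header keywords.
cur.execute("""CREATE TABLE nrca1_uncal_primary (
    rootname TEXT REFERENCES master(rootname),
    TELESCOP TEXT, INSTRUME TEXT, DETECTOR TEXT)""")

cur.execute("INSERT INTO master VALUES ('jw00001001001_01101_00001_nrca1', '2017-11-17')")
conn.commit()
```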

To construct this for jwql, we will need to know the following for each instrument:

  1. What are all of the possible filetypes, and what purpose do they serve?
  2. What is the data structure for each filetype (i.e., the number of extensions, the purpose each extension serves, and the datatype of each)?
  3. What are the header keywords for each filetype/extension combination?
bourque added this to the Database Schema milestone Nov 17, 2017
gkanarek commented

How flexible is this? The JWST keywords, header info, filetypes, etc. are still in flux (nowhere near as stable as WFC3), so we need to be able to evolve the schema in response to these changes.

bourque added this to To do in Database via automation Nov 19, 2017

bourque commented Nov 19, 2017

If we build this right, changes to the schema for the header tables should be as simple as updating a text file and adding/removing columns in the database. Changes to the data structure itself (i.e. new filetypes, new/different FITS extensions) would be a bit trickier because that would mean adding new tables and not just new columns.
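A sketch of that text-file-driven approach, assuming SQLite and a hypothetical table and keyword list (in practice the keywords would be read from the maintained text file):

```python
import sqlite3

def sync_columns(cur, table, keywords):
    """Add any header keywords missing from `table` as new columns.
    (This sketch only handles additions; removals would need DROP COLUMN
    or a table rebuild.)"""
    cur.execute(f"PRAGMA table_info({table})")
    existing = {row[1] for row in cur.fetchall()}
    for kw in keywords:
        if kw not in existing:
            cur.execute(f"ALTER TABLE {table} ADD COLUMN {kw} TEXT")

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE nrca1_uncal_primary (rootname TEXT)")
# Hypothetical keyword list; would come from the text file described above.
sync_columns(cur, "nrca1_uncal_primary", ["TELESCOP", "INSTRUME", "DETECTOR"])
```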

This brings up another question: How often should we anticipate changes to the header keywords/filetypes/FITS extensions after launch?

bhilbert4 commented

My guess is that header keyword changes after launch won't be too common, but I'm sure it will happen from time to time.

For what it's worth, I have a function that returns all of the header keywords for a requested reference file type. It does this by reading in the appropriate schema definition files that SSB has in the JWST Calibration Pipeline repo. I doubt it would be hard to update it to work on the data filetypes.
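A hedged sketch of such a function: it walks a datamodel-style schema dictionary and collects `fits_keyword` entries, the key used in the JWST calibration pipeline's schema files. The example schema below is a made-up stand-in, not a real schema file:

```python
def collect_keywords(schema):
    """Recursively collect FITS keywords from a datamodel-style schema dict.
    Assumes keywords are stored under a 'fits_keyword' key, as in the
    JWST calibration pipeline's schema definition files."""
    found = []
    if isinstance(schema, dict):
        if "fits_keyword" in schema:
            found.append(schema["fits_keyword"])
        for value in schema.values():
            found.extend(collect_keywords(value))
    elif isinstance(schema, list):
        for item in schema:
            found.extend(collect_keywords(item))
    return found

# Tiny stand-in for a loaded schema file:
example = {"properties": {"meta": {"properties": {
    "telescope": {"fits_keyword": "TELESCOP"},
    "instrument": {"properties": {"name": {"fits_keyword": "INSTRUME"}}}}}}}
```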


bhilbert4 commented Nov 20, 2017

Filetypes that will be ingested into MAST:

_uncal.fits (raw)
_rate.fits, _rateints.fits (countrate images, level-2a)
_cal.fits, _calints.fits (flux calibrated, full WCS-added countrate images, level-2b)
_i2d.fits, _s2d.fits, _s3d.fits (resampled, both for individual exposures and combined)
_x1d.fits (extracted spectra, both for individual exposures and combined)

I'll put together more details on each soon.
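The suffix list above can be captured as a small lookup table, keying on the final underscore-delimited token of the filename (the mapping, helper name, and filename below are illustrative):

```python
# Mapping from MAST filename suffix to product level, transcribed
# from the list above.
PRODUCT_LEVELS = {
    "uncal": "raw",
    "rate": "level-2a countrate",
    "rateints": "level-2a countrate (per-integration)",
    "cal": "level-2b calibrated",
    "calints": "level-2b calibrated (per-integration)",
    "i2d": "resampled image",
    "s2d": "resampled spectrum",
    "s3d": "resampled cube",
    "x1d": "extracted spectrum",
}

def product_level(filename):
    """Return the product level for a filename like 'jw..._nrca1_rate.fits',
    keying on the final underscore-delimited suffix."""
    suffix = filename.rsplit("_", 1)[-1].split(".")[0]
    return PRODUCT_LEVELS.get(suffix, "unknown")
```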


bhilbert4 commented Nov 20, 2017

Data structures:

JWST jargon

  • frame = one readout of the detector
  • group = made from single frame or (onboard) average of multiple frames
  • integration = multiple groups, with detector resets before and after (equivalent to single HST file).
  • exposure = multiple nominally-identical integrations packaged into the same file (like packing multiple HST raw ramps into a single file).
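A worked example of the hierarchy above, using illustrative values: the SCI cube of an exposure has shape (integrations, groups, rows, columns), so the total pixel count is the product of all four.

```python
# Illustrative values: 2 integrations of 10 groups each, full-frame detector.
nints, ngroups = 2, 10           # integrations per exposure, groups per integration
nrows, ncols = 2048, 2048        # detector rows and columns

# Shape of the exposure's SCI cube and the total number of pixel values.
sci_shape = (nints, ngroups, nrows, ncols)
npixels = nints * ngroups * nrows * ncols
```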

_uncal.fits - raw, uncalibrated file

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU      89   ()      
  1  SCI           1 ImageHDU        25   (2048, 2048, 10, 1)   int16 (rescales to uint16)   
  2  ZEROFRAME     1 ImageHDU        11   (2048, 2048, 1)   int16 (rescales to uint16)   
  3  GROUP         1 BinTableHDU     35   10R x 13C   [I, I, I, J, I, 26A, I, I, I, I, 36A, D, D] 
  1. SCI extension contains the detector data. 4 dimensions (detector y, detector x, groups per integration, integrations)
  2. ZEROFRAME extension contains the 0th frame that goes with each integration. For some readout patterns, each group will be the average of N frames. This averaging is done on board JWST. The 0th frame is saved to this separate extension for cases where the initial read is needed for slope fitting. 3 dimensions (detector y, detector x, integrations)
  3. GROUP extension is a binary table that contains detailed timing information about the exposure. The table contains 13 columns, and one row for each M milliseconds of the exposure.
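One note on reading the Dimensions column above: FITS headers list axes fastest-first (NAXIS1, NAXIS2, ...), so the shapes shown are reversed relative to the numpy arrays astropy would return. A small helper makes this explicit:

```python
def fits_to_numpy_shape(fits_dims):
    """FITS lists axes fastest-first, so the (2048, 2048, 10, 1) shown for
    the SCI extension corresponds to a numpy array of shape
    (1, 10, 2048, 2048): (integrations, groups, y, x)."""
    return tuple(reversed(fits_dims))
```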

GROUP (13 columns x 10 rows in this example):

 Col# Name (Units)       Format
   1 integration_number   I
   2 group_number         I
   3 end_day              I
   4 end_milliseconds     J
   5 end_submilliseconds  I
   6 group_end_time       26A
   7 number_of_columns    I
   8 number_of_rows       I
   9 number_of_gaps       I
  10 completion_code_numb I
  11 completion_code_text 36A
  12 bary_end_time (MJD)  D
  13 helio_end_time (MJD) D

_rate.fits - countrate images (equivalent to HST flt)

This is the output of the Level 2A pipeline, which includes basic calibrations (superbias subtraction, linearity correction, slope fitting). For an exposure that contains a single integration, the *_rate.fits file contains the slope image created by line-fitting to the groups of the integration. For an exposure that contains multiple integrations, the *_rate.fits image contains the mean slope image from all integrations. In this case, the pipeline also outputs a *_rateints.fits file, which contains the separate slope images from all of the integrations. Therefore, add one dimension to the shapes shown below for extensions 1-5.

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     159   ()      
  1  SCI           1 ImageHDU        29   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  6  ASDF          1 ImageHDU         7   (3889,)   uint8   
  1. SCI extension - slope images. 2-dimensional (detector y, detector x)
  2. ERR extension - errors on the slope values. 2-dimensional (detector y, detector x)
  3. DQ extension - data quality array. 2-dimensional (detector y, detector x)
  4. VAR_POISSON - contribution to the variance on the slopes due to Poisson noise. 2-dimensional (detector y, detector x)
  5. VAR_RNOISE - contribution to the variance on the slopes due to readnoise. 2-dimensional (detector y, detector x)
  6. ASDF - Contains distortion correction model information
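Given the above, a downstream tool could distinguish the two slope products by the SCI array's dimensionality (a sketch; the function name is hypothetical):

```python
def slope_image_kind(shape):
    """Classify a SCI array by dimensionality: 2-D for a *_rate.fits mean
    slope image, 3-D (integrations stacked first) for *_rateints.fits."""
    if len(shape) == 2:
        return "rate"
    if len(shape) == 3:
        return "rateints"
    raise ValueError(f"unexpected SCI shape {shape}")
```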

*_cal.fits - Calibrated file

Output from the level 2b pipeline. Flux calibration applied, flat field applied, distortion solution added. Similar to the *_rate.fits and *_rateints.fits files above, there are *_cal.fits files (containing the averaged image if there is more than one integration per exposure, or the single image if there is a single integration) and a *_calints.fits file (which contains the individual calibrated images when there are multiple integrations per exposure).

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     250   ()      
  1  SCI           1 ImageHDU        32   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  AREA          1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  6  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  7  ASDF          1 ImageHDU         7   (13515,)   uint8   

Extensions are the same as in the case of the *_rate.fits image, plus the AREA extension, which is a 2D image containing the pixel area map.


bourque commented Nov 20, 2017

Thanks @bhilbert4 this is very helpful!

bhilbert4 commented Nov 30, 2017

_i2d.fits file format - identical to the _cal.fits format above.

No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     230   ()      
  1  SCI           1 ImageHDU        46   (2048, 2048)   float32   
  2  ERR           1 ImageHDU        10   (2048, 2048)   float32   
  3  DQ            1 ImageHDU        11   (2048, 2048)   int32 (rescales to uint32)   
  4  AREA          1 ImageHDU         9   (2048, 2048)   float32   
  5  VAR_POISSON    1 ImageHDU         9   (2048, 2048)   float32   
  6  VAR_RNOISE    1 ImageHDU         9   (2048, 2048)   float32   
  7  ASDF          1 ImageHDU         7   (13749,)   uint8   


SaOgaz commented Dec 6, 2017

Just to confirm: based on Tom Donaldson's Confluence page and what @bhilbert4 has said here, if you have (for example) a *_rate.fits and a *_uncal.fits that both correspond to the same original image, is everything in the * of the filename identical?


cracraft commented Dec 6, 2017

My understanding is that the rest of the filename will be consistent between the two.

bhilbert4 commented

Yes, that's correct.
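That convention can be sketched in code: stripping the final suffix from two files of the same exposure yields the same rootname (the filenames below are illustrative):

```python
def rootname(filename):
    """Strip the product suffix and extension: for files derived from the
    same exposure, everything before the final '_<suffix>.fits' matches."""
    return filename.rsplit("_", 1)[0]

# Two hypothetical products of the same exposure:
a = "jw00001001001_01101_00001_nrca1_rate.fits"
b = "jw00001001001_01101_00001_nrca1_uncal.fits"
```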


bourque commented Jan 31, 2018

Per @SaraOgaz

Jonathon pointed me to this doc page for the pipeline where there’s a whole section about the associations: https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/index.html


bourque commented Apr 27, 2018

Now that we have decided to use the MAST API, this is no longer needed.
