# CRDS For Developers


Overview
========
The rough order of the presentation will be:

1. Demonstrations of low level command line tools

a. crds certify*

b. crds refactor insert

c. crds diff         (Differences of FITS, JSON, ASDF, YAML, and CRDS rules files)

d. crds submit*

e. crds uniqname*     (Uniquely names HST calibration and SYNPHOT reference files)

f. crds checksum*     (Updates FITS and CRDS rules checksums)

g. crds list          (Status / info tool useful for developers/CRDS experts)

All are embedded in the web functionality in some form.

Star (*) indicates tools the ReDCaT team uses or embeds.

2. Demonstrations of tools running as integrated web system

a. Context display and CRDS rules

b. Batch reference submission (all-in-one references submission + rules updates)

c. Submit Mappings (brief mention of doing manual rules changes)

d. Archive delivery process

e. Set Context

f. CRDS Reprocessing

g. Pipeline cache sync

h. DPAS reprocessing / repro query

Batch submission processing uses the command line functionality, but it also web-ifies the output formatting and combines the tools holistically. Certify or refactor warnings can reject a submission, or suggest rejecting it. Differences are used to analyze updates for potential anomalies. Files are organized and "flowed" through the tools in a unified way.

The web submission functions also uniquely add (a) cataloging by the CRDS database, (b) delivery to the archive, (c) implicit reprocessing triggering, (d) change tracking, and (e) history of past contexts.

Setup
=====
>> Set up for a particular server (JWST DEV), using /grp/crds/cache for CRDS content.

>> Another common configuration would be: CRDS_PATH=$HOME/crds_cache_dev.

In [None]:
import os
os.environ["CRDS_SERVER_URL"] = "https://jwst-crds-bit.stsci.edu"
os.environ["CRDS_PATH"] = "/grp/crds/cache"
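The server URLs in these notes follow a consistent pattern. OPS servers are `https://<observatory>-crds.stsci.edu`, and other servers are `https://<observatory>-crds-<mode>.stsci.edu` (for example jwst-crds-dev, jwst-crds-bit, and hst-crds-test). If you switch servers often, a small helper can set both variables in one call. This is a sketch only: `configure_crds` is a hypothetical name, not part of CRDS. It also assumes every server follows the naming pattern above.

```python
import os


def configure_crds(observatory, mode="ops", cache_path=None):
    """Hypothetical helper: set CRDS_SERVER_URL and CRDS_PATH for one server.

    Assumes the URL pattern seen in these notes:
      https://<observatory>-crds.stsci.edu          for mode "ops"
      https://<observatory>-crds-<mode>.stsci.edu   for other modes
    """
    observatory = observatory.lower()
    mode = mode.lower()
    suffix = "" if mode == "ops" else "-" + mode
    url = "https://{}-crds{}.stsci.edu".format(observatory, suffix)
    if cache_path is None:
        # Per-user default, following the $HOME/crds_cache_<mode> convention above.
        cache_path = os.path.join(os.path.expanduser("~"), "crds_cache_" + mode)
    os.environ["CRDS_SERVER_URL"] = url
    os.environ["CRDS_PATH"] = cache_path
    return url, cache_path


# Equivalent to the cell above.
configure_crds("jwst", "bit", "/grp/crds/cache")
```

Environment variables must be set before CRDS reads its configuration. Call the helper early in the notebook, before running any cells that use CRDS.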

Checking Configuration Status
------------------------------------

CRDS configuration status can be dumped out using:

In [None]:
! crds list --status

Obtaining test data
-----------------------

I separated the example files from the notebook source, but you can obtain or recreate them like this while onsite. This is comparatively slow, so comment it out if you're re-running a lot. With the current CRDS implementation, rsync cannot help here. You often need tweaked versions of the test files, and those have to be recopied anyway.

In [None]:
!cp -v  /grp/crds/cache/references/jwst/jwst_miri_dark_0025.fits jwst_miri_dark_0025_a.fits   
!cp -v jwst_miri_dark_0025_a.fits jwst_miri_dark_0025_b.fits
!cp -v /grp/crds/cache/references/jwst/jwst_miri_dark_0057.fits .
!chmod +w *.fits
!crds_unique CRDSUNIQ *.fits

The "crds sync" tool can likewise obtain these files from the JWST OPS server,  but you should probably do that outside this notebook after setting:  

CRDS_SERVER_URL=https://jwst-crds.stsci.edu
CRDS_PATH=$HOME/crds_cache_ops

crds sync --output-dir . --files jwst_miri_dark_0025.fits jwst_miri_dark_0057.fits

CRDS Certify
============

The CRDS certify tool/package checks CRDS reference and rules files of different formats and is embedded into the ReDCaT front end tools.

Demo of command line certify with new constraints running on obsolete darks.

The prefix >> indicates fitsverify running as a subprocess with captured stderr/stdout.
Certify can also recategorize fitsverify errors and warnings.

In [None]:
!crds certify jwst_miri_dark_0025_a.fits --run-fitsverify
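Because fitsverify lines carry the >> prefix, they are easy to separate from the rest of the certify output. The level on each line shows how certify categorized it. This is a minimal sketch. It assumes each fitsverify line has the shape `CRDS - <LEVEL> - >> <text>`, possibly after a timestamp. `split_fitsverify` is a hypothetical helper, not part of CRDS.

```python
import re

# Assumed line shape: "... CRDS - <LEVEL> - >> <fitsverify text>"
FITSVERIFY_LINE = re.compile(r"CRDS\s+-\s+(\w+)\s+-\s*>>\s?(.*)$")


def split_fitsverify(log_text):
    """Split captured certify output into fitsverify lines and everything else.

    Returns (fitsverify, other). fitsverify is a list of (level, text) tuples
    for lines carrying the >> prefix, where level is the category certify
    assigned. other holds the remaining non-blank lines.
    """
    fitsverify, other = [], []
    for line in log_text.splitlines():
        match = FITSVERIFY_LINE.search(line)
        if match:
            fitsverify.append((match.group(1), match.group(2)))
        elif line.strip():
            other.append(line)
    return fitsverify, other
```

In the notebook, `log_text` could come from `subprocess.run(["crds", "certify", "jwst_miri_dark_0025_a.fits", "--run-fitsverify"], capture_output=True, text=True)`. We have not verified whether certify logs to stdout or stderr, so concatenating both streams is the safer choice.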

Certify web output
----------------------
Related Certify web output

>> Similar output with web feedback,  colorized: https://jwst-crds-dev.stsci.edu/authenticated_result/2e126c25-6827-4832-b39e-fc27e2b0d4f9

Certify .tpn web review
---------------------------
Here is a reviewable link to the CRDS internally defined .tpn constraints: https://jwst-crds-dev.stsci.edu/browse/jwst_miri_dark_0057.fits (open the Reference Constraints .tpn panel). These augment the true Datamodels checks and the automatic CRDS certify constraints derived from Datamodels.

Certify --verbose
--------------------

>> Certify with verbose output to see the constraints as they run, which is useful for development and debugging:

In [None]:
!crds certify jwst_miri_dark_0025_a.fits --run-fitsverify --verbose

CRDS Refactor
=============
The CRDS refactor tool/module is used to update CRDS rmaps based on new reference files.

The primary refactor operation is "insert" used to add new references to rmaps.

Secondary functions are delete (removes references), set_header, and del_header for modifying rmap headers.

Adding --verbose gives a better idea of what CRDS does when checking parameters and adding new nested structure.

("Adding new selector" is a misnomer. CRDS is adding a new match case, which lives inside the Match() selector of this rmap. CRDS also adds a nested UseAfter selector within that new match case. Refactor is not certify, and early on USEAFTER was often undefined, on purpose, before JWST even used USEAFTER. So CRDS adds the USEAFTER as "UNDEFINED UNDEFINED" instead of "<DATE> <TIME>". On the website, certify would intercept the bad USEAFTER. Here, refactor intentionally passes it through so it can be fixed manually.)


In [None]:
! cp -v /grp/crds/cache/mappings/jwst/jwst_miri_dark_0018.rmap .
! chmod +w jwst_miri_dark_0018.rmap
! crds refactor insert jwst_miri_dark_0018.rmap  new.rmap jwst_miri_dark_0025_a.fits --verbose
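The nested structure described above can be pictured with plain Python dicts. A Match() case is keyed by a tuple of matching parameter values and holds a nested UseAfter mapping from "DATE TIME" strings to reference names. This toy model reproduces the two behaviors described above: a new match case gets its own UseAfter, and a missing USEAFTER becomes "UNDEFINED UNDEFINED". The function, the dict layout, and the parameter values are illustrative assumptions, not the crds.refactor implementation.

```python
def insert_reference(match_cases, match_key, filename, useafter=None):
    """Toy model of 'crds refactor insert' acting on one Match() selector.

    match_cases: dict mapping a tuple of matching parameter values to a
                 nested UseAfter dict of {"<DATE> <TIME>": filename}.
    useafter:    the file's USEAFTER as "<DATE> <TIME>", or None if undefined.
    Returns a list of human-readable actions, loosely like --verbose output.
    """
    actions = []
    if match_key not in match_cases:
        # New match case inside Match(), with its own nested UseAfter selector.
        match_cases[match_key] = {}
        actions.append("adding match case {}".format(match_key))
    # Like refactor (and unlike certify), pass an undefined USEAFTER through
    # so it can be fixed manually later.
    when = useafter if useafter else "UNDEFINED UNDEFINED"
    use_after = match_cases[match_key]
    if when in use_after:
        actions.append("replacing {} with {} at {}".format(use_after[when], filename, when))
    else:
        actions.append("inserting {} at {}".format(filename, when))
    use_after[when] = filename
    return actions


# Hypothetical parameter values and file names, for illustration only.
cases = {("MIRIMAGE", "FULL"): {"2000-01-01 00:00:00": "old_dark.fits"}}
print(insert_reference(cases, ("MIRIFULONG", "FULL"), "jwst_miri_dark_0025_a.fits"))
```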

CRDS Diff
=========

The CRDS diff tool performs polymorphic differencing of any CRDS files, rules or references.

>> CRDS Diff works on reference files:

In [None]:
!crds diff jwst_miri_dark_0025_a.fits jwst_miri_dark_0025_b.fits

>> CRDS Diff works on rules files:

In [None]:
!crds diff ./jwst_miri_dark_0018.rmap ./new.rmap -BFQ --mapping-text-diffs

Note that the standalone tools permit operations such as "attempting to add invalid files". The example above shows this: refactor added an invalid DARK file that is not a cube.

Text differences on rules provide a level of redundancy and illustrate what is really happening in the rmap.
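The idea behind a line-oriented rules diff can be approximated with the standard library's `difflib`. This sketch illustrates text differencing in general. It is not how `crds diff --mapping-text-diffs` produces its output. `text_diff` and `rmap_file_diff` are hypothetical helper names.

```python
import difflib
from pathlib import Path


def text_diff(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff of two texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name, tofile=new_name))


def rmap_file_diff(old_path, new_path):
    """Unified text diff between two rules files on disk."""
    return text_diff(Path(old_path).read_text(), Path(new_path).read_text(),
                     str(old_path), str(new_path))
```

After the refactor cell has written new.rmap, `print("".join(rmap_file_diff("jwst_miri_dark_0018.rmap", "new.rmap")))` shows the inserted lines prefixed with `+`.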

CRDS NewContext
===============

The crds newcontext tool/module is used to generate a new .pmap and new .imaps based on a set of new .rmaps.

In [None]:
!crds newcontext --help
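A toy model helps clarify what newcontext has to produce. Treat a context as nested dicts: the pmap holds per-instrument imaps, and each imap holds per-filekind rmaps. New rmaps are then swapped in, bumping the serial number of every imap touched and of the pmap. The name parsing (`<observatory>_<instrument>_<filekind>_NNNN.rmap`) matches the file names in this notebook. The serial bumping and the data layout are assumptions for illustration, and names like jwst_0500.pmap are hypothetical. The real tool rewrites the actual mapping files.

```python
import re

RMAP_NAME = re.compile(
    r"^(?P<obs>[a-z]+)_(?P<instr>[a-z0-9]+)_(?P<kind>[a-z0-9]+)(?:_\d+)?\.rmap$")


def bump(name):
    """Increment a trailing _NNNN serial, e.g. jwst_0500.pmap -> jwst_0501.pmap."""
    stem, ext = name.rsplit(".", 1)
    base, serial = stem.rsplit("_", 1)
    return "{}_{}.{}".format(base, str(int(serial) + 1).zfill(len(serial)), ext)


def new_context(pmap_name, pmap, new_rmaps):
    """Toy newcontext.

    pmap: {instrument: (imap_name, {filekind: rmap_name})}
    new_rmaps: rmap file names to install.
    Returns (new_pmap_name, new_pmap) without modifying the input.
    """
    updated = {instr: (imap, dict(kinds)) for instr, (imap, kinds) in pmap.items()}
    touched = set()
    for rmap in new_rmaps:
        match = RMAP_NAME.match(rmap)
        if match is None:
            raise ValueError("unrecognized rmap name: " + rmap)
        instr, kind = match.group("instr"), match.group("kind")
        updated[instr][1][kind] = rmap
        touched.add(instr)
    for instr in touched:
        imap_name, kinds = updated[instr]
        updated[instr] = (bump(imap_name), kinds)
    return bump(pmap_name), updated
```

A generically named file such as new.rmap from the refactor cell would be rejected here. In practice the new rmap needs a proper CRDS name before a context can refer to it.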

CRDS Submit Command Line Tool
=============================

The crds submit command line tool is a CRDS client utility which interacts with the server and is embedded in the ReDCaT tools.   

**IMPORTANT:** Additional setup is required to run command line file submissions. Briefly, you need ssh access to pldmsins1, membership in group crdsoper, a CRDS server file, and setup for a .crds.ini file.

See:   https://innerspace.stsci.edu/display/CRDS/Command+Line+File+Submissions

After setting up for command line installs, make the command line submission test files unique by setting a FITS header keyword:

In [None]:
! crds_unique  CRDSUNIQ ./*.fits

Next, use the CRDS submit logout command to make sure we're starting from scratch, unlocked:

In [None]:
! crds submit --logout --verbose

Finally, do a command line submission
-------------------------------------

A file argument written as @files means there should be a file named "files" in the current directory that lists the files to submit. The names can be one per line or separated by spaces.
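The @files convention described above is simple to mimic. This minimal sketch follows only that description: names are separated by newlines or spaces, and arguments without `@` pass through unchanged. It does not handle comments or nested @ references, which we have not confirmed either way. `expand_file_args` is a hypothetical helper, not the crds submit source.

```python
from pathlib import Path


def expand_file_args(args):
    """Expand '@listfile' arguments into the file names they contain.

    Names in a list file may be separated by newlines or spaces.
    Arguments without a leading '@' are passed through unchanged.
    """
    files = []
    for arg in args:
        if arg.startswith("@"):
            files.extend(Path(arg[1:]).read_text().split())
        else:
            files.append(arg)
    return files
```

For example, `expand_file_args(["@files"])` returns the names listed in ./files, in order.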

In the log output, >> indicates progress messages being emitted from the server in real time.

The same >> messages show up on a web page for through-the-web submissions.

Adding --verbose shows DEBUG log messages with more detail, particularly about web operations.

Because of the relative fragility of the command line interface and hand-overs to the web server for review and confirmation,  server actions result in STARTED, READY / BAD FILES / FAIL, and CONFIRMED / FORCED e-mails sent to the crds-servers@stsci.edu mailing list (devs) and redcat@stsci.edu.


In [1]:
! crds submit --logout 
! crds rc_submit --redcat-parameters rc_submit.yaml --files jwst_miri_dark_0057.fits --wipe-existing-files --monitor --wait --log-time --stats --creator "Todd Miller" --description "Small scale command line submission test." --verbosity=80

CRDS - ERROR -  (FATAL) Failed connecting to CRDS server at CRDS_SERVER_URL = 'https://crds-serverless-mode.stsci.edu' :: Required server connection unavailable.
2019-03-27 09:12:56,263 - CRDS - DEBUG -  Command: ['rc_submit.py', '--redcat-parameters', 'rc_submit.yaml', '--files', 'jwst_miri_dark_0057.fits', '--wipe-existing-files', '--monitor', '--wait', '--log-time', '--stats', '--creator', 'Todd Miller', '--description', 'Small scale command line submission test.', '--verbosity=80']
2019-03-27 09:12:56,264 - CRDS - DEBUG -  Uncached call server_info (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 09:12:56,264 - CRDS - DEBUG -  Uncached call observatory (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 09:12:56,264 - CRDS - DEBUG -  Uncached call get_config_info ('jwst',)
2019-03-27 09:12:56,264 - CRDS - DEBUG -  Uncached call get_server_info ()
2019-03-27 09:12:56,264 - CRDS - DEBUG -  CRDS JSON RPC to https://jwst-crds.stsci.edu/json/get_se

              'jwst_miri_photom.rmap',
              'jwst_miri_photom_0000.rmap',
              'jwst_miri_photom_0001.rmap',
              'jwst_miri_photom_0002.rmap',
              'jwst_miri_photom_0003.rmap',
              'jwst_miri_photom_0004.rmap',
              'jwst_miri_photom_0005.rmap',
              'jwst_miri_photom_0006.rmap',
              'jwst_miri_photom_0007.rmap',
              'jwst_miri_photom_0008.rmap',
              'jwst_miri_photom_0009.rmap',
              'jwst_miri_photom_0010.rmap',
              'jwst_miri_photom_0011.rmap',
              'jwst_miri_photom_0012.rmap',
              'jwst_miri_photom_0013.rmap',
              'jwst_miri_photom_0014.rmap',
              'jwst_miri_photom_0015.rmap',
              'jwst_miri_psfmask_0001.rmap',
              'jwst_miri_psfmask_0002.rmap',
              'jwst_miri_readnoise_0000.rmap',
              'jwst_miri_readnoise_0001.rmap',
              'jwst_miri_readnoise_0002.rmap',
    

2019-03-27 09:12:56,433 - CRDS - DEBUG -  RPC OK {'ingest_dir': 'pldmsins1.stsci.edu:/ifs/crds/jwst/ops/server_files/ingest/jmiller',
 'submission_kinds': ['batch', 'mapping', 'reference', 'add', 'delete']}
2019-03-27 09:12:56,433 - CRDS - DEBUG -  Cached call observatory (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 09:12:56,434 - CRDS - DEBUG -  Cached call observatory (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 09:12:56,434 - CRDS - DEBUG -  Uncached call get_locator_module ('jwst',)
2019-03-27 09:12:56,434 - CRDS - DEBUG -  Uncached call get_object ('crds.jwst.locate',)
2019-03-27 09:12:56,441 - CRDS - DEBUG -  Uncached call get_file_properties ('./jwst_miri_dark_0057.fits',)
2019-03-27 09:12:56,442 - CRDS - DEBUG -  Cached call observatory (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 09:12:56,442 - CRDS - DEBUG -  Cached call observatory (<__main__.RedCatSubmissionScript object at 0x11a2934e0>,)
2019-03-27 

2019-03-27 09:12:56,619 - CRDS - DEBUG -  ------------------------------- Response:  -------------------------------
2019-03-27 09:12:56,619 - CRDS - DEBUG -  headers:
 {'Date': 'Wed, 27 Mar 2019 13:12:56 GMT', 'Expires': 'Wed, 27 Mar 2019 13:12:56 GMT', 'Cache-Control': 'max-age=0, no-cache, no-store, must-revalidate', 'Vary': 'Cookie', 'Content-Length': '5411', 'X-Frame-Options': 'DENY', 'Content-Type': 'text/html; charset=utf-8', 'Set-Cookie': 'csrftoken=THB3AYY7AYNJdiV73fHwdGPNINw5lZC45dDpaJn0tOF7rySSaUD7XScv2am6lPf0; expires=Wed, 25-Mar-2020 13:12:56 GMT; Max-Age=31449600; Path=/', 'Via': '1.1 jwst-crds.stsci.edu', 'Keep-Alive': 'timeout=5, max=98', 'Connection': 'Keep-Alive'}
2019-03-27 09:12:56,619 - CRDS - DEBUG -  ---------------------------------------------------------------------------
2019-03-27 09:12:56,619 - CRDS - DEBUG -  status_code: 200
2019-03-27 09:12:56,619 - CRDS - DEBUG -  ---------------------------------------------------------------------------
2019-03-
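The captured log above uses a consistent `timestamp - CRDS - LEVEL - message` layout, with the timestamp added by --log-time. A high-verbosity run produces many DEBUG lines, so a small filter helps when reviewing it. This sketch assumes each log record starts on its own line in that layout. Lines that do not match, such as pretty-printed dict continuations, are skipped. `parse_crds_log` is a hypothetical helper.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^(?:(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - )?"
    r"CRDS - (?P<level>[A-Z]+) -\s*(?P<message>.*)$")


def parse_crds_log(text, levels=None):
    """Yield (timestamp or None, level, message) for each CRDS log record.

    levels: optional set such as {"ERROR", "WARNING"} to filter on.
    """
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue  # continuation lines, e.g. pretty-printed RPC results
        if levels and match.group("level") not in levels:
            continue
        stamp = match.group("time")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f") if stamp else None
        yield when, match.group("level"), match.group("message")
```

For example, `[m for _, lvl, m in parse_crds_log(log_text) if lvl != "DEBUG"]` keeps only the non-DEBUG messages.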

Because the underlying command line copy operation is "cp" or "scp", very little status is generated during copies. On the other hand, the copies are as reliable and efficient as "cp" or "scp" themselves.

You can also use --keep-existing-files if you're tired of duplicate file uploads, as long as the files are truly different. (crds_unique intentionally adds differences, so if you run it, you need to re-upload. At present, files that are bit-for-bit identical (sha1sum) are rejected. Once test files are confirmed, you can no longer fully submit them without first changing them at least a little.)
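You can check locally whether a test file would be rejected as a bit-for-bit duplicate by comparing SHA-1 digests before uploading. This sketch assumes you already have a set of digests for previously submitted files; obtaining that set from the server is not shown. `hashlib`'s SHA-1 hex digest matches what the sha1sum utility prints. `sha1sum` and `find_duplicates` are hypothetical helper names.

```python
import hashlib


def sha1sum(path, chunk_size=1 << 20):
    """Hex SHA-1 of a file, read in chunks (same digest as the sha1sum utility)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, known_sha1s):
    """Return the paths whose contents match an already-known SHA-1 digest."""
    return [path for path in paths if sha1sum(path) in known_sha1s]
```

Any file this reports needs a change (for example, another crds_unique run) before it can be fully submitted again.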

Make the test files unique again (relative to the server) for follow-on web demos:

In [None]:
! crds_unique CRDSUNIQ ./*.fits

Web Batch Submission
====================

Batch submissions integrate all of the command line tools to check references, generate updated rmaps, generate updated imaps and pmaps, and deliver files into a cross-mounted delivery directory for the "CRDS poller". Batch submission also generates STARTED, READY/BAD FILES/FAIL, and CONFIRM/CANCEL e-mails.
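The holistic flow described above (run each tool, collect its errors and warnings, then classify the outcome for review) can be sketched in a few lines. The status rules are our assumption, read from these notes: FAIL has no stored result, so here it means processing itself raised an exception. BAD FILES means some stage reported errors. READY means no errors, though warnings may still be shown. Delivery happens only after confirmation, so it is left out. `run_batch` and the stage names are hypothetical.

```python
def run_batch(stages):
    """Run named stages in order and classify the outcome.

    stages: list of (name, func); each func returns (errors, warnings) lists.
    Assumed status rules: an exception -> "FAIL"; any errors -> "BAD FILES";
    otherwise "READY" (warnings are reported but do not block).
    """
    report = {"errors": [], "warnings": []}
    for name, func in stages:
        try:
            errors, warnings = func()
        except Exception as exc:  # processing itself broke; nothing to review
            report["errors"].append("{}: {}".format(name, exc))
            return "FAIL", report
        report["errors"].extend("{}: {}".format(name, e) for e in errors)
        report["warnings"].extend("{}: {}".format(name, w) for w in warnings)
    return ("BAD FILES" if report["errors"] else "READY"), report


# Hypothetical stages standing in for certify, refactor insert, and newcontext.
status, report = run_batch([
    ("certify", lambda: ([], ["example warning"])),
    ("refactor insert", lambda: ([], [])),
    ("newcontext", lambda: ([], [])),
])
```

This sketch runs every stage rather than stopping at the first error. That way a reviewer sees all of the problems in a single READY or BAD FILES page.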

>> Potentially we can extend this page to add additional ReDCaT metadata:

https://jwst-crds-dev.stsci.edu/batch_submit_references/

>> Replay of Allyssa's submission:

https://hst-crds.stsci.edu/monitor/dc1656fb-14c9-4ace-8ef2-8552e6dee323/

>> Final results page for Allyssa's submission:

https://hst-crds.stsci.edu//display_result/f5161a22-c0cd-4229-b13e-0701a188f825

>> Show past READY / BAD FILES pages to review as examples of warnings and error messages:

https://hst-crds.stsci.edu/display_result/bf0c09f6-ff04-4143-885e-4ee845de113f

https://jwst-crds-dev.stsci.edu//display_result/987613e9-b7fe-4682-8eaf-0b4bea6f3b4e

https://jwst-crds.stsci.edu/display_result/8e5c830c-fb56-4111-8b2f-b2c6348676bb

https://hst-crds-test.stsci.edu/display_result/bf3101c0-718d-4dfb-85d1-ce54b7bbf6c6

https://jwst-crds.stsci.edu//display_result/4db1dda9-c7ee-469b-94bf-c5a8f39e0e0f

>> Show tracking e-mails,  particularly FAIL which has no stored result


Archiving
=========

Archiving is achieved by linking new CRDS files into a delivery directory which is shared with SDP.

/ifs/crds/jwst/dev/server_files/deliveries   (temporary delivery directory,  double linked files)
/ifs/crds/jwst/dev/server_files/catalogs     (permanent copies of .cat manifest files)
/ifs/crds/jwst/dev/file_cache                (Server's internal CRDS cache, permanent server copies of new files)

A .cat manifest file lists each delivered file and is a rough indicator of progress.

Files are read-only, but the directory is r/w for members of crdsoper.

SDP's "CRDS poller" reads manifests and archives delivered files,  renaming the .cat file to .cat_proc and/or .cat_err along the way.

When the delivery is completed,  the .cat file is deleted and CRDS considers the files archived.   There may be some lag before files are officially archived due to downstream protocol delays.
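The manifest lifecycle described above makes it possible to check a delivery's progress from the deliveries directory. This sketch encodes our reading of that lifecycle, not SDP's specification. An existing .cat_err means an error, .cat_proc means the poller is processing, and a plain .cat means delivered but not yet picked up. No manifest at all means the delivery completed, or that the name is wrong. `delivery_status` and the catalog stem are hypothetical.

```python
from pathlib import Path


def delivery_status(deliveries_dir, catalog_stem):
    """Classify one delivery by which manifest variant exists.

    Our reading of the lifecycle (an assumption, not SDP's spec):
      <stem>.cat_err  -> "error"
      <stem>.cat_proc -> "processing"
      <stem>.cat      -> "pending"   (delivered, not yet picked up)
      none of these   -> "complete"  (or the stem never existed)
    """
    base = Path(deliveries_dir)
    checks = [(".cat_err", "error"), (".cat_proc", "processing"), (".cat", "pending")]
    for suffix, status in checks:
        if (base / (catalog_stem + suffix)).exists():
            return status
    return "complete"
```

For example, `delivery_status("/ifs/crds/jwst/dev/server_files/deliveries", "<catalog name>")`. As noted above, "complete" here can still precede official archiving by some lag.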

A completion e-mail is sent out by the archive.  

>> Show archive completion e-mail
>> Show ReDCaT update context request e-mail


Set Context
===========

On request,  a pipeline logs into the CRDS server and updates the default context on the Set Context page.

For simple single-context transitions, the description is taken from the submission description.

For complex multi-context transitions, a human must write a summary manually.

Completing Set Context for the operational context (not edit) sets a universal default for typical users,  and signifies the context in use by the archive pipeline.

https://jwst-crds-dev.stsci.edu/set_default_context/

When the CRDS sync operation is complete,  the CRDS server automatically sends a Set Context e-mail.

>> Show CRDS Set Context e-mail

CRDS reprocessing
=================

CRDS reprocessing is handled as a cron job on the CRDS server and is automatically triggered by forward moving context changes.   When the cron job notices a context transition,  crds bestrefs is run in context-to-context comparison mode on potentially affected parameters retrieved from the archive.
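The heart of a context-to-context comparison is simple. Compute best references for the same datasets under both contexts, and report every dataset where any reference changed. This sketch covers only that comparison step. It assumes the archive parameters have already been retrieved and the best references already computed (the part crds bestrefs actually does). The dataset IDs and file names in the test data are made up. `affected_datasets` is a hypothetical helper.

```python
def affected_datasets(old_bestrefs, new_bestrefs):
    """Return sorted IDs of datasets whose best references differ.

    old_bestrefs / new_bestrefs: {dataset_id: {filekind: reference_name}},
    computed for the same datasets under the old and new contexts.
    A dataset present on only one side is reported if it has any references.
    """
    affected = []
    for dataset_id in set(old_bestrefs) | set(new_bestrefs):
        old_refs = old_bestrefs.get(dataset_id, {})
        new_refs = new_bestrefs.get(dataset_id, {})
        if any(old_refs.get(kind) != new_refs.get(kind)
               for kind in set(old_refs) | set(new_refs)):
            affected.append(dataset_id)
    return sorted(affected)
```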

This is captured by the server crontab, monitor_reprocessing, and affected datasets scripts.

When the CRDS reprocessing run completes, the CRDS server automatically sends an e-mail to DPAS containing the log output and the list of dataset IDs to reprocess. This e-mail is sufficient to begin reprocessing, but it is not how reprocessing is actually driven. DPAS uses the repro query described in the last section.

There is a race condition here where CRDS reprocessing can complete and send out IDs prior to the update of the pipeline's CRDS cache and adoption of the new context for reprocessing.  DPAS is aware of the race condition and how to handle it.  Not handling it results in reprocessing using an unchanged context.


Pipeline Cache Sync
===================

The crds sync tool is used by the pipeline to maintain a local cache of all CRDS reference files.

There are written level-4 requirements not to use CRDS files before archiving.   Once the archive delivery is complete,  the CRDS server "releases" delivered files for download.   

The pipeline then runs a script (documented in the Server Workflow Guide on the server) which wraps the CRDS client's cron_sync script which in turn wraps the crds sync tool.   This was originally envisioned as a pipeline cron job,  much like the one which automatically maintains /grp/crds/cache while running on the servers.

The run time of crds sync depends on the size of the delivery or, on TEST servers, the size of the backlog of missing files.
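Sync time scales with how many files must be fetched, so it can help to estimate the backlog first. This sketch compares a list of required file names against whatever already exists anywhere under a local cache directory. It assumes the list comes from the context being synced; obtaining that list is not shown. It checks names only, not contents or checksums. It also assumes the cache stores files under their CRDS names, as in the references/jwst and mappings/jwst layout used earlier. `missing_files` is a hypothetical helper.

```python
import os


def missing_files(required_names, cache_dir):
    """Return the required CRDS file names not present anywhere under cache_dir."""
    present = set()
    for _root, _dirs, files in os.walk(cache_dir):
        present.update(files)
    return sorted(set(required_names) - present)
```

For example, `len(missing_files(names, os.environ["CRDS_PATH"]))` gives a rough size for the backlog.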

If the CRDS sync tool completes without errors and successfully updates the context it "pushes" the pipeline context back up to the server for tracking under the "remote context" display visible to authenticated users.

A cron job does a similar automatic sync of /grp/crds/cache, which is likewise tracked as an important remote cache.

When the pipeline operators complete their sync,  they manually send an acknowledgement e-mail.


DPAS Use of CRDS repro query and archive reprocessing
=====================================================

Once DPAS has seen the CRDS repro completion e-mail and the pipeline's CRDS cache has been updated with the new files and context definition,  DPAS runs a subclass of the CRDS client's query_affected_datasets script.   This automatically retrieves stored results from the CRDS server and dumps out the list of dataset IDs to reprocess on stdout.
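The ordering constraint here guards against the race condition described under CRDS reprocessing. DPAS should only consume the affected-dataset list once the pipeline cache is on the context the repro results were computed for. This sketch combines that guard with parsing the query's stdout. It assumes the output contains whitespace-separated dataset IDs and that lines starting with `#` are not IDs; the real script's output format has not been confirmed. The function name, context names, and IDs are hypothetical.

```python
def dataset_ids_to_reprocess(repro_context, pipeline_context, query_stdout):
    """Return unique dataset IDs from affected-datasets query output, in order.

    Raises RuntimeError until the pipeline cache has adopted the context named
    in the CRDS reprocessing e-mail; reprocessing earlier would reuse the
    unchanged context. Blank lines and lines starting with '#' are skipped.
    """
    if pipeline_context != repro_context:
        raise RuntimeError(
            "pipeline cache is on {}, repro results are for {}; wait for sync".format(
                pipeline_context, repro_context))
    ids, seen = [], set()
    for line in query_stdout.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for token in line.split():
            if token not in seen:
                seen.add(token)
                ids.append(token)
    return ids
```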