# CRDS programmatic reference file submission

An introduction to reference file submission using the CRDS Python API.

## Prerequisites

To follow along with the examples in this notebook, you will need:

- A CRDS server deployment to serve as a sandbox.  As of September 2020, the JWST B-string CRDS deployment has been commandeered to support these notebooks.

- An active STScI VPN connection.

- An account on the CRDS server with permissions to submit files.  If you need an account, contact Ed Slavich, Jonathan Eisenhamer, or SCSB at large.

- The following Python packages installed:

```
$ pip install crds==7.6.1 astropy==4.0.1 jwst==0.17.1
```

## ¡CAUTION!

As you perform tasks on the CRDS website, always double-check that you are using the correct dev deployment and not one of the HST or JWST test or ops servers.  The changes you make can always be reverted, but a) the cleanup will give the CRDS operators a headache, and b) the system will retain your submissions eternally as a monument to your mistake.

That said, if you are on the correct server, please do experiment freely ("go nuts").  The dev server can be easily reset to its original state.

## Generate a MAST API token

Since we're going to be submitting a file, we'll need to identify ourselves to CRDS to prove that we have the permissions necessary to modify the system.  The programmatic API does not support authentication with username and password -- this is a security feature, since users tend to leave such credentials lying around in plaintext in scripts and notebooks.  Instead we'll authenticate with an API token that we generate for this purpose.  Follow this link to auth.mast (you will be asked to authenticate with STScI SSO if you have not already):

https://auth.mast.stsci.edu/token

Enter a useful token name like "CRDS tutorials" and press the red "Create Token" button.  You should see a result like this:

![Create Token result](images/token_generated.png)

Copy the token value and paste it into the `MAST_API_TOKEN` variable below:

In [1]:
MAST_API_TOKEN = "your-token-here"
assert MAST_API_TOKEN != "your-token-here", "Please set MAST_API_TOKEN"
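
Pasting the token into a cell works, but it leaves exactly the kind of plaintext credential lying around that the token scheme is meant to discourage.  As an alternative, here is a minimal sketch of a hypothetical helper, `load_mast_token` (not part of CRDS), that reuses a `MAST_API_TOKEN` environment variable if one is already set and otherwise prompts for the token with the standard library's `getpass`, so the token is neither echoed nor saved in the notebook:

```python
import getpass
import os


def load_mast_token(env_var="MAST_API_TOKEN"):
    """Return a MAST API token without storing it in the notebook (hypothetical helper).

    Reuses the environment variable if it is already set; otherwise prompts
    with getpass, which does not echo what you type.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = getpass.getpass(f"{env_var}: ").strip()
    if not token:
        raise ValueError(f"No token provided for {env_var}")
    return token
```

You would then write `MAST_API_TOKEN = load_mast_token()` in place of the literal string above.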

## Setup

We'll once again need to configure CRDS to use the JWST B-string server, and this time we'll also set an environment variable for the API token:

In [2]:
import os
os.environ["CRDS_SERVER_URL"] = "https://jwst-crds-bit.stsci.edu"
os.environ["CRDS_PATH"] = os.path.join(os.environ["HOME"], "crds-tutorial-cache")
os.environ["MAST_API_TOKEN"] = MAST_API_TOKEN
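
In the spirit of the caution above, a cheap safeguard is to refuse to continue unless `CRDS_SERVER_URL` points at the sandbox.  This is a minimal sketch: `check_sandbox_server` and `SANDBOX_HOSTS` are hypothetical names, and the allow-list assumes the B-string host used in this notebook is the only server you want to touch.

```python
import os
from urllib.parse import urlparse

# Assumption: the JWST B-string host is the only sandbox we intend to use.
SANDBOX_HOSTS = {"jwst-crds-bit.stsci.edu"}


def check_sandbox_server(url=None, allowed=SANDBOX_HOSTS):
    """Return the server hostname, or raise RuntimeError if it is not a sandbox."""
    if url is None:
        url = os.environ.get("CRDS_SERVER_URL", "")
    host = urlparse(url).hostname or ""
    if host not in allowed:
        raise RuntimeError(f"Refusing to continue: {url!r} is not a sandbox server")
    return host
```

Calling `check_sandbox_server()` right after the cell above returns the hostname if all is well and raises `RuntimeError` otherwise.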

Then import the crds client library:

In [3]:
import crds

## Select and download an existing reference file

Like in the second notebook, we need to acquire a file to update and submit.  Let's use a FITS file this time.  Here's a section of an **.rmap** for the `nircam` `readnoise` reference file type:

```
header = {
    'classes' : ('Match', 'UseAfter'),
    'derived_from' : 'jwst_nircam_readnoise_0005.rmap',
    'filekind' : 'READNOISE',
    'instrument' : 'NIRCAM',
    'mapping' : 'REFERENCE',
    'name' : 'jwst_nircam_readnoise_0006.rmap',
    'observatory' : 'JWST',
    'parkey' : (('META.INSTRUMENT.DETECTOR', 'META.EXPOSURE.READPATT', 'META.SUBARRAY.NAME'), ('META.OBSERVATION.DATE', 'META.OBSERVATION.TIME')),
    'sha1sum' : '4f7fe083dc1f65374299a435d824c124d33dcfb3',
    'substitutions' : {
        'META.SUBARRAY.NAME' : {
            'GENERIC' : 'N/A',
        },
    },
}

selector = Match({
    ('NRCA1', 'ANY', 'GENERIC') : UseAfter({
        '2015-10-01 00:00:00' : 'jwst_nircam_readnoise_0025.fits',
    }),
    ...
})
```

That first file will do.  We'll again harness `crds.getreferences` to download the file for us:

In [4]:
result = crds.getreferences(
    {
        "META.INSTRUMENT.NAME": "NIRCAM",
        "META.INSTRUMENT.DETECTOR": "NRCA1",
        "META.EXPOSURE.READPATT": "I LOVE OATMEAL",
        "META.SUBARRAY.NAME": "HARRIET",
        "META.OBSERVATION.DATE": "2015-11-20",
        "META.OBSERVATION.TIME": "10:11:12",
    },
    reftypes=["readnoise"],
    observatory="jwst",
    context="jwst_0641.pmap",
)
result

{'readnoise': '/Users/eslavich/crds-tutorial-cache/references/jwst/nircam/jwst_nircam_readnoise_0025.fits'}

... and voilà, the file is cached locally.  Notice the odd `META.EXPOSURE.READPATT` and `META.SUBARRAY.NAME` values that we passed to `getreferences`.  The `Match` selector supports wildcard matching: `ANY` and `GENERIC` (which the header's `substitutions` section rewrites as `N/A`) are special values that match anything.  We'll get into the nitty-gritty of **.rmap** syntax later, but for now just know that there's more to matching than simple string comparisons.
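
To make that concrete, here is a deliberately simplified model of how the `('NRCA1', 'ANY', 'GENERIC')` rule matches our made-up header values.  It is a sketch under stated assumptions, not CRDS's actual implementation: it applies only the header's `substitutions` and treats `ANY` and `N/A` as wildcards, ignoring everything else the real `Match` selector does.  `rule_matches` is a hypothetical name.

```python
# Simplified model of rmap Match rules; NOT CRDS's implementation.
PARKEYS = (
    "META.INSTRUMENT.DETECTOR",
    "META.EXPOSURE.READPATT",
    "META.SUBARRAY.NAME",
)

# Copied from the 'substitutions' section of the rmap header above.
SUBSTITUTIONS = {"META.SUBARRAY.NAME": {"GENERIC": "N/A"}}

# Values this sketch treats as "match anything".
WILDCARDS = {"ANY", "N/A"}


def rule_matches(rule, header, parkeys=PARKEYS, substitutions=SUBSTITUTIONS):
    """Return True if each rule value is a wildcard or equals the header value."""
    for key, rule_value in zip(parkeys, rule):
        # Apply header substitutions first, e.g. GENERIC -> N/A for the subarray.
        rule_value = substitutions.get(key, {}).get(rule_value, rule_value)
        if rule_value not in WILDCARDS and header.get(key) != rule_value:
            return False
    return True


header = {
    "META.INSTRUMENT.DETECTOR": "NRCA1",
    "META.EXPOSURE.READPATT": "I LOVE OATMEAL",
    "META.SUBARRAY.NAME": "HARRIET",
}
print(rule_matches(("NRCA1", "ANY", "GENERIC"), header))  # True
print(rule_matches(("NRCA2", "ANY", "GENERIC"), header))  # False: detector differs
```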

In [5]:
!cp {result["readnoise"]} ./jwst_nircam_readnoise_new.fits
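
The `!cp` line relies on IPython's shell escape.  If you'd rather stay in plain Python (in a script, say), the standard library's `shutil.copyfile` does the same job.  One difference worth knowing: `copyfile` copies only the file contents, so the new file gets default permissions instead of inheriting the source's mode, which helps if your cached copy happens to be read-only.  `copy_for_editing` is a hypothetical helper name.

```python
import shutil


def copy_for_editing(src, dest="jwst_nircam_readnoise_new.fits"):
    """Copy the cached reference file to dest for editing (hypothetical helper).

    shutil.copyfile copies contents only, so dest is created with default
    permissions rather than inheriting the source's (possibly read-only) mode.
    Returns the destination path.
    """
    return shutil.copyfile(src, dest)
```

`copy_for_editing(result["readnoise"])` is then equivalent to the cell above.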

We're ready to go about modifying the file.

## Modify the reference file

We'll make a different sort of modification to the file this time around.  Since it's a FITS file, we'll need to use `astropy.io.fits` to open it:

In [24]:
from astropy.io import fits

hdul = fits.open("jwst_nircam_readnoise_new.fits", mode="update")
hdul[0].header

SIMPLE  =                    T / conforms to FITS standard                      
BITPIX  =                    8 / array data type                                
NAXIS   =                    0 / number of array dimensions                     
EXTEND  =                    T                                                  
                                                                                
        Level 1b Schema Metadata                                                
                                                                                
DATE    = '2015-06-24T00:00:00' / Date this file was created (UTC)              
FILENAME= 'NRCA1_17004_CDSNoise_ISIMCV3_ADU_2016-06-24_ssbreadnoise.fits' / Name
TELESCOP= 'JWST    '           / Telescope used to acquire the data             
                                                                                
        Instrument configuration information                                    
                            

The change we'll make this time is to update the `USEAFTER` header value, which will lead to some interesting consequences when we submit the file:

In [25]:
# Use a fixed date so the lookups later in this notebook are reproducible:
# data acquired after 2018-01-01 should be matched to the new file.
hdul[0].header["USEAFTER"] = "2018-01-01T00:00:00"
hdul.close()  # Changes are automatically flushed to disk on close()
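
The `USEAFTER` header holds an ISO 8601 timestamp with a `T` separator, while the rmap's `UseAfter` keys use a space (`'2015-10-01 00:00:00'`).  CRDS performs this conversion itself when it updates the rmap; the sketch below, built around a hypothetical helper `useafter_to_rmap_key`, only shows how the two forms line up, and doubles as a quick sanity check that the value you wrote actually parses as a timestamp.

```python
from datetime import datetime


def useafter_to_rmap_key(useafter):
    """Convert a USEAFTER value like '2018-01-01T00:00:00' to the
    'YYYY-MM-DD HH:MM:SS' form used by the rmap's UseAfter keys.

    Raises ValueError if the value is not an ISO 8601 timestamp.
    """
    return datetime.fromisoformat(useafter).strftime("%Y-%m-%d %H:%M:%S")


print(useafter_to_rmap_key("2018-01-01T00:00:00"))  # 2018-01-01 00:00:00
```

To check the file you just edited, you could pass it `fits.getheader("jwst_nircam_readnoise_new.fits")["USEAFTER"]`.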

## Submit the file to CRDS... programmatically!

At this point in the previous notebook, we followed a link to the CRDS website and uploaded our files and documentation via a web form.  Now we'll submit the same information, but use the `crds` library to submit it right from the notebook.  The first step is to create a `Submission` instance:

In [26]:
from crds.submit import Submission

# TODO: Set derived-from context once that parameter becomes available
submission = Submission("jwst", "bit", context="jwst_0641.pmap")

The second argument identifies the server's operational environment.  We've specified `"bit"`, which corresponds to the B-string server.  The `Submission.help` method will guide us through what we need to do next:

In [27]:
submission.help()

deliverer (str)
---------
Name of deliverer
Who are you?

other_email (str, optional)
-----------
Other e-mail adresses to send notifications

instrument (str)
----------
Instrument  (All submitted files should match this instrument.  This
instrument will be locked for your submission exclusively)
Valid choices:
  {'', 'fgs', 'miri', 'nircam', 'niriss', 'nirspec', 'system'}

file_type (str)
---------
Type of files (Bias, Dark, etc.)

history_updated (bool)
---------------
Has HISTORY section in the primary header been updated to describe in
detail the reason for delivery and how the files were created?

Valid choices:
  {'False', 'True'}

pedigree_updated (bool)
----------------
Has PEDIGREE keyword been checked and updated as necessary?

Valid choices:
  {'False', 'True'}

keywords_checked (bool)
----------------
Have REFTYPE and AUTHOR been checked and updated as necessary?
         REFTYPE Keywords           AUTHOR Keywords
Valid choices:
  {'False', 'True'}

descrip_updated (bool)


Each of these keys corresponds to a field on the web form, and we'll need to assign values for the required keys.  Let's use the same dummy values as before:

In [28]:
submission["deliverer"] = "testing"
submission["instrument"] = "nircam"
submission["file_type"] = "testing"
submission["history_updated"] = True
submission["pedigree_updated"] = True
submission["keywords_checked"] = True
submission["descrip_updated"] = True
submission["useafter_updated"] = True
submission["useafter_matches"] = "N/A"
submission["compliance_verified"] = "N/A"
submission["ingest_files"] = False
submission["etc_delivery"] = False
submission["jwst_etc"] = False
submission["calpipe_version"] = "testing"
submission["replacement_files"] = False
submission["replacing_badfiles"] = "N/A"
submission["modes_affected"] = "testing"
submission["change_level"] = "SEVERE"
submission["correctness_testing"] = "testing"
submission["description"] = "testing"

Now add the new file to the submission and send it off to the server:

In [29]:
submission.add_file("jwst_nircam_readnoise_new.fits")
result = submission.submit()

2020-09-16 15:19:28,549 - CRDS - INFO -  ########################################
2020-09-16 15:19:28,550 - CRDS - INFO -  Certifying './jwst_nircam_readnoise_new.fits' (1/1) as 'FITS' relative to context 'jwst_0641.pmap'
2020-09-16 15:19:28,768 - CRDS - INFO -  FITS file 'jwst_nircam_readnoise_new.fits' conforms to FITS standards.
2020-09-16 15:19:29,516 - CRDS - INFO -  [0] BUNIT ADU 
2020-09-16 15:19:29,517 - CRDS - INFO -  [0] DETECTOR NRCA1 Name of detector used to acquire the data
2020-09-16 15:19:29,518 - CRDS - INFO -  [0] SUBARRAY GENERIC Subarray used
2020-09-16 15:19:29,519 - CRDS - INFO -  EXP_TYPE = 'UNDEFINED'
2020-09-16 15:19:29,520 - CRDS - INFO -  META.AUTHOR [AUTHOR] = 'NIRCam Instrument Team'
2020-09-16 15:19:29,521 - CRDS - INFO -  META.DESCRIPTION [DESCRIP] = 'CDS Noise Image'
2020-09-16 15:19:29,522 - CRDS - INFO -  META.EXPOSURE.READPATT [READPATT] = 'ANY'
2020-09-16 15:19:29,522 - CRDS - INFO -  META.EXPOSURE.TYPE [EXP_TYPE] = 'UNDEFINED'
2020-09-16 15:19:29,523

CrdsBackgroundError: Exception in 'monitor' : "CRDS jsonrpc failure 'jpoll_pull_messages' <urlopen error [Errno 60] Operation timed out>"

## Inspect and confirm the submission

Completing this final submission step still requires use of the CRDS website.  The result object returned by `Submission.submit` includes a method that will pop open a browser window to the submission results page:

In [None]:
result.open_ready_url()

The website will ask you to log in.  Lock the `nircam` instrument that we're working with and proceed to the results page.  Notice the difference in the **.rmap** diff versus the previous notebook's submission:

```
 selector = Match({
     ('NRCA1', 'ANY', 'GENERIC') : UseAfter({
         '2015-10-01 00:00:00' : 'jwst_nircam_readnoise_0025.fits',
+        '2018-01-01 00:00:00' : 'jwst_nircam_readnoise_0045.fits',
     }),
     ('NRCA1', 'N/A', 'GENERIC') : UseAfter({
         '1900-01-01 00:00:00' : 'jwst_nircam_readnoise_0010.fits',
```

We've added a line, but the previous line was not removed!  What's happening here?

When we changed the `USEAFTER` timestamp in the reference file, we told CRDS that this file supersedes the old file _but only for data acquired after that date_.  The old rule persists because it still needs to handle matches for data timestamped prior to `2018-01-01`.  We'll see how that plays out when making calls to `crds.getreferences` in the next section.
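
The `UseAfter` behavior can be modeled in a few lines: sort the entries by date and pick the latest one that is not later than the observation.  This is a simplified sketch, not CRDS's code (`select_use_after` is a hypothetical name, and it returns `None` where CRDS would report an error), using the two entries from the diff above:

```python
from datetime import datetime

# UseAfter entries for ('NRCA1', 'ANY', 'GENERIC') after the submission,
# copied from the rmap diff above.
USE_AFTER = {
    "2015-10-01 00:00:00": "jwst_nircam_readnoise_0025.fits",
    "2018-01-01 00:00:00": "jwst_nircam_readnoise_0045.fits",
}


def select_use_after(entries, obs_date, obs_time="00:00:00"):
    """Return the file with the latest USEAFTER not later than the observation.

    Assumption: an observation exactly at a USEAFTER time selects that entry.
    Returns None when the observation predates every entry.
    """
    observed = datetime.fromisoformat(f"{obs_date} {obs_time}")
    chosen = None
    for useafter, filename in sorted(
        entries.items(), key=lambda item: datetime.fromisoformat(item[0])
    ):
        if datetime.fromisoformat(useafter) > observed:
            break  # this and every later entry starts after the observation
        chosen = filename
    return chosen


print(select_use_after(USE_AFTER, "2015-11-20", "10:11:12"))  # jwst_nircam_readnoise_0025.fits
print(select_use_after(USE_AFTER, "2018-11-20", "10:11:12"))  # jwst_nircam_readnoise_0045.fits
```

With these entries, a 2015 observation still selects the old file and a late-2018 observation selects the new one; that is the behavior to check against the server.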

Confirm the submission as before and set the `NEW_CONTEXT` variable to the new context identifier.

In [None]:
NEW_CONTEXT = "your-context-here"
assert NEW_CONTEXT != "your-context-here", "Please set NEW_CONTEXT"

Now spend 5 minutes doing jumping jacks while the CRDS server performs its simulated archiving.

## Explore the new context


As before, we need to dig into the guts of crds and reset some caches:

In [None]:
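from itertools import chain

# Empty the in-memory caches of CRDS's cached functions so that lookups are
# recomputed against the new context instead of returning stale results.
for value in chain(crds.heavy_client.__dict__.values(), crds.api.__dict__.values()):
    if isinstance(value, crds.utils.CachedFunction):
        value.cache = {}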
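
Why is the reset needed?  CRDS memoizes some lookups in memory, so a function that has already answered a question keeps returning the old answer until its cache is emptied.  The toy `Memoized` class below is a hypothetical stand-in, not CRDS's `CachedFunction`, but it stores results in a `cache` dict attribute just like the objects the cell above resets, which makes the stale-result problem easy to see.  The context names in it are made-up sample values.

```python
class Memoized:
    """Hypothetical memoizing wrapper: remembers results keyed by arguments."""

    def __init__(self, func):
        self.func = func
        self.cache = {}  # maps argument tuples to previously computed results

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]


# A lookup whose underlying data changes after the first call.
server_state = {"operational_context": "jwst_0641.pmap"}
get_context = Memoized(lambda key: server_state[key])

print(get_context("operational_context"))  # jwst_0641.pmap (computed and cached)
server_state["operational_context"] = "jwst_0642.pmap"
print(get_context("operational_context"))  # jwst_0641.pmap (stale cached value)
get_context.cache = {}                      # the same kind of reset as the cell above
print(get_context("operational_context"))  # jwst_0642.pmap (recomputed)
```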

Let's make the same `crds.getreferences` call as before, but with the new context:

In [None]:
crds.getreferences(
    {
        "META.INSTRUMENT.NAME": "NIRCAM",
        "META.INSTRUMENT.DETECTOR": "NRCA1",
        "META.EXPOSURE.READPATT": "I LOVE OATMEAL",
        "META.SUBARRAY.NAME": "HARRIET",
        "META.OBSERVATION.DATE": "2015-11-20",
        "META.OBSERVATION.TIME": "10:11:12",
    },
    reftypes=["readnoise"],
    observatory="jwst",
    context=NEW_CONTEXT,
)

It's the same file, and that's good -- the new file applies only to data acquired after its `USEAFTER` date, and this observation dates from 2015.  We should get back the new file when we advance the observation date:

In [None]:
crds.getreferences(
    {
        "META.INSTRUMENT.NAME": "NIRCAM",
        "META.INSTRUMENT.DETECTOR": "NRCA1",
        "META.EXPOSURE.READPATT": "I LOVE OATMEAL",
        "META.SUBARRAY.NAME": "HARRIET",
        "META.OBSERVATION.DATE": "2018-11-20",
        "META.OBSERVATION.TIME": "10:11:12",
    },
    reftypes=["readnoise"],
    observatory="jwst",
    context=NEW_CONTEXT,
)

Yup, that's the one.

## Further reading

The CRDS User Manual includes [detailed documentation](https://jwst-crds-bit.stsci.edu/static/users_guide/programmatic_interface.html) on the use of the Python submission API.