Maintaining Regression Tests

Maintain regression test data

The following instructions are intended to assist in maintaining regression test data for the JWST pipeline, via command-line and interactive tools.

Download and configure jfrog

  1. You need the JFrog CLI to talk to Artifactory. Get it here:

    https://jfrog.com/getcli/

    Download it via

     curl -fL https://getcli.jfrog.io | sh
    

    Install it in $HOME/bin, run chmod u+x on it, and make sure $HOME/bin is in your $PATH in your .bash_profile (a sketch of these install steps appears at the end of this item). The first time you run it, it will ask for configuration. Our Artifactory server is

    https://bytesalad.stsci.edu/artifactory/

    and it accepts your AD (Active Directory) credentials. It will save these to a profile in $HOME/.jfrog/ so you don't have to enter them every time you use it. To configure:

    $ jfrog config add https://bytesalad.stsci.edu --artifactory-url=https://bytesalad.stsci.edu/artifactory/
    JFrog platform URL [https://bytesalad.stsci.edu/]: 
    JFrog Distribution URL (Optional): 
    Access token (Leave blank for username and password/API key): 
    User [<YOUR_USERNAME>]: 
    Password/API key: <YOUR_AD_PASSWORD>
    Is the Artifactory reverse proxy configured to accept a client certificate? (y/n) [n]? 
    [Info] Encrypting password...
    
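    A minimal sketch of the install-and-verify steps above, assuming the curl command left a jfrog binary in the current directory (adjust paths to taste):

     # Move the downloaded binary into $HOME/bin and make it executable
     mkdir -p $HOME/bin
     mv jfrog $HOME/bin/
     chmod u+x $HOME/bin/jfrog
     # Make sure $HOME/bin is on your PATH (add to .bash_profile if needed)
     echo 'export PATH=$HOME/bin:$PATH' >> ~/.bash_profile

     # After running jfrog config add (above), verify that Artifactory is reachable
     jfrog rt ping
    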

Update jfrog password

When you update your AD password, you will need to run

jfrog config edit https://bytesalad.stsci.edu

and go through the same prompts as above to update the password, which it will cache (encrypted). Be sure to leave "Access token" blank at the prompt, so it asks for your AD username and password instead.

JFrog problems

If you experience issues with your JFrog configuration, ensure that the output of jfrog config show looks like this:

Server ID:			https://bytesalad.stsci.edu
JFrog platform URL:		https://bytesalad.stsci.edu/
Artifactory URL:		https://bytesalad.stsci.edu/artifactory/
Distribution URL:		https://bytesalad.stsci.edu/distribution/
Xray URL:			https://bytesalad.stsci.edu/xray/
Mission Control URL:		https://bytesalad.stsci.edu/mc/
Pipelines URL:		        https://bytesalad.stsci.edu/pipelines/
User:	                        <your-username>
Password:			***
Default:			true

If it doesn't, fix it either with jfrog config edit, or by running jfrog config rm and then jfrog config add again.

If your config shows a different Server ID (say, you set it up a long time ago and forgot about it), run jfrog config edit on that ID instead and enter your new credentials to fix the authentication problem.
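
A minimal sketch of the remove-and-re-add sequence, assuming the server ID shown above (substitute yours if it differs):

jfrog config rm https://bytesalad.stsci.edu
jfrog config add https://bytesalad.stsci.edu --artifactory-url=https://bytesalad.stsci.edu/artifactory/
# Confirm that Artifactory is reachable with the new credentials
jfrog rt ping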

Running regression tests locally

It's good practice to run regression tests with your local repository before running them on GitHub Actions, but running the whole suite requires downloading ~80GB of data, and even single tests can spend more time downloading data files than actually running. To make iterating quicker, you can cache the whole test suite's input and truth data locally. Here's how:

Make sure you have the JFrog CLI installed and in your PATH, e.g., in ~/bin/ (see above). If it is not in your PATH, you can still call it, but you will have to provide the full path to the executable in every command-line call.

All interactions with https://bytesalad.stsci.edu/artifactory must be done on the VPN; the server is only available on the internal network.

If you want to store the test suite in $HOME, for example (important note for Linux users: do not use $HOME; set up the test_bigdata path on a disk close to the CPU, e.g., /internal/1/), make a directory and then do the sync:

$ mkdir ~/test_bigdata
$ mkdir ~/test_bigdata/jwst-pipeline
$ cd ~/test_bigdata/jwst-pipeline
$ jfrog rt dl "jwst-pipeline/dev/*" ./

That will keep your local cache updated with what is on bytesalad, in much the same way as rsync -av. The first time you do it you'll download ~50GB of data, but every subsequent update will only fetch the differences.

Then add

export TEST_BIGDATA=$HOME/test_bigdata/

to your .bash_profile (or whatever shell script you source on login) and you're good to go. Some of the files can change day-to-day, so just remember to do

cd ~/test_bigdata/jwst-pipeline
jfrog rt dl "jwst-pipeline/dev/*" ./

or if you want to make sure deletions are also sync-ed

jfrog rt dl "jwst-pipeline/dev/*" ./ --sync-deletes dev

before you run the tests.
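
If you only need the data for a single instrument mode, you can sync just that subtree and its truth files instead of the whole cache. A sketch using the NIRCam TSO grism tests shown later on this page (the instrument/mode directory name is illustrative; check the actual layout on Artifactory first):

cd ~/test_bigdata/jwst-pipeline
jfrog rt dl "jwst-pipeline/dev/nircam/tsgrism/*" ./
jfrog rt dl "jwst-pipeline/dev/truth/test_nircam_tsgrism_stages/*" ./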

Usually you do not want to run the full suite every time during development because it takes a very long time. If you want to target a particular test function, run this from the root directory of your source checkout:

pytest jwst/regtest/test_module.py -k test_function_name --bigdata --slow --run-slow --basetemp=/path/to/pytest_tmp

where you replace test_module.py with the actual test module filename, test_function_name with the actual test function name, and /path/to/pytest_tmp with a directory pytest should use for test output files. If --basetemp is not given, /tmp is used by default, and on Linux you might get an OSError when the disk space quota runs out. For why you need both --slow and --run-slow to run a test marked as "slow", see https://github.com/spacetelescope/ci_watson/issues/83
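
For example, to run only the NIRCam TSO grism stage 2 test shown below (the temporary directory path is just an example):

pytest jwst/regtest/test_nircam_tsgrism.py -k test_nircam_tsgrism_stage2 --bigdata --slow --run-slow --basetemp=$HOME/pytest_tmp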

Okify via interactive script

This is how we currently update truth files for our regression tests.

The okify_regtests script prompts the user to okify or skip failed tests. The script relies on JFrog CLI (see above for instructions on installing and configuring JFrog CLI).

To OKify test(s) from a GitHub Actions run, run the script like so:

$ okify_regtests jwst <build number> 

where the build number is the run number shown at the top of the GitHub Actions run summary page.

The script will display the assertion error and traceback, then request a decision:

Downloading test okify artifacts to local directory /var/folders/jg/by5st33j7ps356dgb4kn8w900001n5/T/tmpd5rxjrx0
24 failed tests to okify
----------------------------------------------------------------- test_name -----------------------------------------------------------------
run_pipelines = {'input': '.../test_outputs/popen-gw28/test_nircam_tsgrism_run_pipelines0/jw0072101...h_remote': 'jwst-pipeline/dev/truth/test_nircam_tsgrism_stages/jw00721012001_03103_00001-seg001_nrcalong_calints.fits'}
fitsdiff_default_kwargs = {'atol': 1e-07, 'ignore_fields': ['DATE', 'CAL_VER', 'CAL_VCS', 'CRDS_VER', 'CRDS_CTX', 'NAXIS1', ...], 'ignore_hdus': ['ASDF'], 'ignore_keywords': ['DATE', 'CAL_VER', 'CAL_VCS', 'CRDS_VER', 'CRDS_CTX', 'NAXIS1', ...], ...}
suffix = 'calints'

    @pytest.mark.bigdata
    @pytest.mark.parametrize("suffix", ["calints", "extract_2d", "flat_field",
        "o012_crfints", "srctype", "x1dints"])
    def test_nircam_tsgrism_stage2(run_pipelines, fitsdiff_default_kwargs, suffix):
        """Regression test of tso-spec2 pipeline performed on NIRCam TSO grism data."""
        rtdata = run_pipelines
        rtdata.input = "jw00721012001_03103_00001-seg001_nrcalong_rateints.fits"
        output = "jw00721012001_03103_00001-seg001_nrcalong_" + suffix + ".fits"
        rtdata.output = output
    
        rtdata.get_truth("truth/test_nircam_tsgrism_stages/" + output)
    
        diff = FITSDiff(rtdata.output, rtdata.truth, **fitsdiff_default_kwargs)
>       assert diff.identical, diff.report()
E       AssertionError: 
E          fitsdiff: 4.0
E          a: .../jw00721012001_03103_00001-seg001_nrcalong_calints.fits
E          b: .../truth/jw00721012001_03103_00001-seg001_nrcalong_calints.fits
E          HDU(s) not to be compared:
E           ASDF
E          Keyword(s) not to be compared:
E           CAL_VCS CAL_VER CRDS_CTX CRDS_VER DATE NAXIS1 TFORM*
E          Table column(s) not to be compared:
E           CAL_VCS CAL_VER CRDS_CTX CRDS_VER DATE NAXIS1 TFORM*
E          Maximum number of different data values to be reported: 10
E          Relative tolerance: 1e-05, Absolute tolerance: 1e-07
E         
E         Extension HDU 1:
E         
E            Headers contain differences:
E              Headers have different number of cards:
E               a: 49
E               b: 48
E              Extra keyword 'SRCTYPE' in a: 'POINT'
E         
E       assert False
E        +  where False = <astropy.io.fits.diff.FITSDiff object at 0x7fb1d9ed2950>.identical

jwst/regtest/test_nircam_tsgrism.py:48: AssertionError
---------------------------------------------------------------------------------------------------------------------------------------------
OK: jwst-pipeline-results/.../test_nircam_tsgrism_stage2/jw00721012001_03103_00001-seg001_nrcalong_calints.fits
--> jwst-pipeline/dev/truth/test_nircam_tsgrism_stages/jw00721012001_03103_00001-seg001_nrcalong_calints.fits
---------------------------------------------------------------------------------------------------------------------------------------------
Enter 'o' to okify, 's' to skip:

Choosing 'o' will cause the script to overwrite the Artifactory truth file with the result file from the failed test run, while 's' will skip this diff and move on to the next one.

Okify by hand on Artifactory

If the OKify script above does not work for you, then you may have to use the following method:

  1. You need the JFrog CLI to talk to Artifactory. See above.
  2. Find out what truth files need updating by looking at the test results.
  3. Find the results for the failed tests at https://bytesalad.stsci.edu/artifactory/jwst-pipeline-results/, where there is a directory of test results for each build, ordered by date. Old builds are retained until they are purged manually, usually before a new JWST delivery (see below).
  4. Copy these files from jwst-pipeline-results to jwst-pipeline/dev/truth using the dropdown menu in the upper right, making sure the correct path and file names are used (a command-line alternative is sketched below).
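
If you prefer the command line, jfrog rt cp can copy a result file directly into the truth area. A sketch with placeholder paths (fill in the actual build directory, test directory, and file name):

jfrog rt cp "jwst-pipeline-results/<build>/<test-dir>/<filename>.fits" "jwst-pipeline/dev/truth/<test-name>/" --flat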

Keep Artifactory tidy

Following a release, delete any testing artifacts that are more than one build old; e.g.:

jfrog rt del "jwst-pipeline-results/2021-02*"

or whatever pattern deletes artifacts generated prior to the previous release date. This can be done interactively via the web interface too, but it is tedious.
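
Since deletion is irreversible, it can be worth previewing what a pattern matches first; the JFrog CLI supports a dry-run mode (the date pattern is just an example):

jfrog rt del "jwst-pipeline-results/2021-02*" --dry-run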

Upload new data to Artifactory

For new regression tests, you may need to upload new data to the Artifactory server. This can be done via the command line or the web interface. New input data for instrument modes should be uploaded to the jwst-pipeline/dev/<instrument>/<mode> directories; the corresponding output truth data should be uploaded to jwst-pipeline/dev/truth/<test-name>.
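
For example, for the NIRCam TSO grism tests shown above, the layout would look something like this (the instrument/mode directory name is illustrative; the truth directory matches the name the test passes to rtdata.get_truth):

jwst-pipeline/dev/nircam/tsgrism/                        # input files
jwst-pipeline/dev/truth/test_nircam_tsgrism_stages/      # truth files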

Interactive upload

To use the web interface, navigate to the jwst-pipeline project, under the dev folder:

https://bytesalad.stsci.edu/ui/repos/tree/General/jwst-pipeline/dev

Make sure you are logged in with your AD credentials.

Click on the folder to upload to, then click the "Deploy" button at the top-right. Select "Multiple Deploy" to add multiple files at once. Drag and drop the files you need, or use the "Select File" interface to browse for them.

Make sure the "Target Path" is set to the directory to upload to. If you need to make a new directory, edit the path to enter the new name.

Click "Deploy" to upload the files. Note that there is sometimes no obvious visual indicator of progress when uploading multiple files.

Command line upload

To upload a single file:

jfrog rt u <filename> jwst-pipeline/dev/<path on Artifactory>

To recursively upload multiple files to the corresponding folder on Artifactory:

jfrog rt u '<top-level folder>/(**)' 'jwst-pipeline/dev/<top-level folder>/{1}'
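
For example, the following would recursively upload a local nircam/ folder, preserving its subdirectory structure under jwst-pipeline/dev/nircam/ on the server; --dry-run previews the transfers without actually deploying anything (the local folder name is illustrative):

jfrog rt u 'nircam/(**)' 'jwst-pipeline/dev/nircam/{1}' --dry-run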