# The SNfactory production plan flow
The SNfactory production plan flow is composed of 10 major **plans**, which produce all the data from preprocessed files to the final flux-calibrated and host-subtracted spectra (found in the IDR). These plans are preceded by a few other scripts that copy the data from the summit to the CC and fill the DB with the new data and information. As shown in the following figure, all the first steps (before the photometric ratios estimate and flux calibration) are night-oriented (run on all targets of a given night), while most of the others are target-oriented (run on all nights of a given target).

[![Plan Flow](figures/planFlow.png)](figures/planFlow.pdf "SNfactory plan flow")

Here is in order the list of scripts and plans that are run to produce the final SNfactory *science* data-set (the IDR) from the raw observations.

Data transfer and summit cleaning:

* `export_sync`: Run at summit. Make sure all data are sent from the summit to the CC IN2P3.
* `hsi_import`: copy the new data to the HPSS disk.
* summit cleaning: Clean the summit computer for new observations.

DB header update and data filling:

* `snf_header`: Update the DB header with the new information.
* `snf_db_make`: Create a pickle file containing all the new data info.
* `snf_db_fill`: Fill the DB using the previously made pickle file.

Update DB from warehouse information (z, Ebmv) and flag Run **Kind** (e.g., for references):

* `SyncTarget`: synchronize the new target information found in WareHouse with the DB information.
* `FlagRunKind`: check the Run.Kind for errors or mismatches and update it if needed, e.g., to define if a spectrum is a final reference (according to its time difference with the previous observation), or to check for new SCALA data.

Data processing:

* `plan_file_quality`: data preprocessing.
* `plan_synthetic_arcs`: include synthetic arcs in the DB.
* `plan_cube_generation`: cube extraction.
* `plan_extract_star`: point source object extraction and spectra production.
* `plan_multi_standard`: multi-standard extinction estimate.
* `tabPhotometricity`: compute and update the night photometricity.
* `plan_photometric_ratios`: compute the photometric flux ratios for a given list of targets.
* `plan_photometric_ratios` (scale factor): compute the MFR scale factor.
* `plan_flux_solution`: compute the flux solutions in a given night. 
* `plan_flux_calibration`: flux calibrate the spectra and cubes.
* `plan_analyze_timeseries`: merge spectra, compute magnitude, run SALT2 fits.
* `plan_gs_psf`: PSF estimate from the photometric channel.
* `plan_cubefit`: host-galaxy subtraction (plan_ddt is deprecated).
* `plan_extract_star`: point-source object extraction on host-galaxy subtracted cubes.

Final dataset pre-analysis and packaging:

* `plan_analyse_timeseries`: same as before, but on host-subtracted data.
* `study_flux_quality`: check the magnitude dispersion for StdStars and SALT2 fits for SNe Ia. Build the good/bad lists.
* `update_idr_config`: use this script to create the CONFIG.yaml file from an old IDR.
* `build_idr`: build an IDR from a given production name and list of targets.

Unless explicitly specified otherwise, all these scripts have to be run on the CC under the **snprod** account. Details on how to run (most of) them are given on the following SNf [twiki page](https://snf-doc.lbl.gov/twiki/bin/view/Tasks/NewDataProcessing).

# File name convention

If you have observed with SNIFS, you will know the raw-data filename format. There are deviations from this structure, but they are not numerous. The raw-data filename format is hierarchical: 

    YY_DDD_RRR_EEE_FF_C.ext
    
where

* **YY**: 2-digit code for a year (UTC).
* **DDD**: 3-digit code for day of year the file was created (UTC).
* **RRR**: 3-digit "run code" -- starts at 001 every UTC day.
    * A run is a temporally connected group of exposures made by SNIFS.
    * A run consists of one or more exposures.
* **EEE**: 3-digit "exposure code" -- starts at 001 for each run.
    * Exposures may have been taken simultaneously in a run (across multiple channels of SNIFS: R, B or P).
    * Exposures may have been taken in sequence in a run (a continuum, followed by a science exposure, followed by an arc).
* **FF**: 2-digit "fclass" -- indicates what type of file this is.
    * An exhaustive list of fclasses can be found at the CC by running SnfFclass at the command line (fclasses are in the first column of output).
    * Just ignore the leading 0 for raw data files (see below).
    * Some important ones familiar to SNIFS observers: 03 -- arc, 17 -- science, 07 -- continuum, etc.
    * The pipeline makes use of fclasses that are potentially unfamiliar even to someone who has used SNIFS.
* **C**: 1-letter "channel" code -- one of R, B, or P.
    * Tells what channel the exposure was taken on.
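
The raw-data format above can be split mechanically into its fields. The following is a minimal Python 3 sketch, not part of the SNfactory code base: the names `RAW_NAME`, `RawName` and `parse_raw_name` are hypothetical, the example filename is made up for illustration, and the 2-digit year is assumed to mean 20YY. Filenames that deviate from the standard structure are rejected with a `ValueError`.

```python
import re
from datetime import date, timedelta
from typing import NamedTuple

# YY_DDD_RRR_EEE_FF_C.ext, as described in the list above.
RAW_NAME = re.compile(
    r"^(?P<yy>\d{2})_(?P<ddd>\d{3})_(?P<rrr>\d{3})_(?P<eee>\d{3})"
    r"_(?P<ff>\d{2})_(?P<c>[RBP])\.(?P<ext>\w+)$"
)


class RawName(NamedTuple):
    year: int          # full year; ASSUMPTION: the 2-digit code means 20YY
    day_of_year: int   # 1-based day of year (UTC)
    run: int
    exposure: int
    fclass: int
    channel: str       # one of "R", "B", "P"
    ext: str

    @property
    def utc_date(self) -> date:
        """Calendar date (UTC) corresponding to year + day of year."""
        return date(self.year, 1, 1) + timedelta(days=self.day_of_year - 1)


def parse_raw_name(name: str) -> RawName:
    """Split a raw-data filename into its fields, or raise ValueError."""
    match = RAW_NAME.match(name)
    if match is None:
        raise ValueError("not a standard raw-data filename: %r" % name)
    return RawName(
        year=2000 + int(match["yy"]),
        day_of_year=int(match["ddd"]),
        run=int(match["rrr"]),
        exposure=int(match["eee"]),
        fclass=int(match["ff"]),
        channel=match["c"],
        ext=match["ext"],
    )


# Hypothetical filename, built only to illustrate the format.
info = parse_raw_name("06_334_103_004_17_B.fits")
print(info.fclass, info.channel, info.utc_date)
```

Keeping the day of year as an integer and deriving the calendar date on demand avoids storing the same information twice.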

In the SNIFS processing framework, there is an extended filename format for **processed data**. Once raw data are registered they receive an extended filename. The extended filename format is also hierarchical: 

    YY_DDD_RRR_EEE_C_FFF_XXX_VV-VV_III.ext (e.g. 06_334_103_004_4_004_107_02-02_000.fits)

where

* **YY**, **DDD**, **RRR**, **EEE** are same as above.
* **C**: 1-digit "channel" code (note, in raw-data format it is a letter, here it is a digit).
    * The code is a bitwise-or of the individual channel codes: P = 1, R = 2, B = 4.
    * So, 6 is B+R, 7 is B+R+P, etc.
* **FFF**: 3-digit "fclass" -- an extension to the raw-data fclass formats.
    * An exhaustive list of fclasses can be found at the CC by running SnfFclass at the command line (fclasses are in the first column of output, see next section).
    * Raw data have a 0 prepended onto their fclass, so fclass below 100 is "reserved" for raw data.
    * Other files created by the pipeline will have fclasses greater than 099, and are assigned as needed.
* **XXX**: 3-digit "xfclass" -- a further extension to the fclass framework.
    * An exhaustive list of xfclasses can be found at the CC by running SnfFclass at the command line.
    * Think of an fclass as a "namespace" so an xfclass can appear once per fclass, but multiple times across fclasses.
    * This trick is often used to track production elements (a common xfclass indicates one route through a section of the pipeline).
    * A given plan/script will produce different types of files, always with the same set of Fclass/XFclass.
* **VV-VV**: 4-digit "processing version" (the latest in use are 200, 201 and 203).
    * This is supposed to be a tag representing the state of the software run to produce the file.
* **III**: 3-digit "index"
    * An integer index differentiating otherwise identically named files (usually, due to reprocessing or re-running the pipeline with changed inputs).
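
As an illustration of the extended format, the sketch below (Python 3) parses the example filename given above and decodes its channel digit into channel letters. It is only a sketch under the conventions described in this section: the names `EXT_NAME`, `CHANNEL_BITS`, `decode_channels` and `parse_extended_name` are hypothetical and do not come from the SNfactory code base.

```python
import re

# Channel bits as listed above: P = 1, R = 2, B = 4.
CHANNEL_BITS = {"P": 1, "R": 2, "B": 4}

# YY_DDD_RRR_EEE_C_FFF_XXX_VV-VV_III.ext
EXT_NAME = re.compile(
    r"^(?P<yy>\d{2})_(?P<ddd>\d{3})_(?P<rrr>\d{3})_(?P<eee>\d{3})"
    r"_(?P<c>[1-7])_(?P<fff>\d{3})_(?P<xxx>\d{3})"
    r"_(?P<version>\d{2}-\d{2})_(?P<iii>\d{3})\.(?P<ext>\w+)$"
)


def decode_channels(code):
    """Return the channel letters whose bit is set in the channel code."""
    return [name for name, bit in CHANNEL_BITS.items() if code & bit]


def parse_extended_name(name):
    """Return the filename fields as a dict of strings, plus the channels."""
    match = EXT_NAME.match(name)
    if match is None:
        raise ValueError("not an extended filename: %r" % name)
    fields = match.groupdict()
    fields["channels"] = decode_channels(int(fields["c"]))
    return fields


# Example filename taken from the format description above.
fields = parse_extended_name("06_334_103_004_4_004_107_02-02_000.fits")
print(fields["fff"], fields["xxx"], fields["version"], fields["channels"])
print(decode_channels(6), decode_channels(7))
```

The fclass, xfclass and index fields are kept as zero-padded strings so that they can be written back into a filename unchanged.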



# SnfFclass
Each processed file is associated with a specific set of Fclass/XFclass, mostly assigned chronologically according to the plan flow. Raw data and preprocessed files have low Fclass numbers (respectively 17 and 18 for object cubes, see below), while flux-calibrated spectra and SALT2 light-curve output files have high Fclass numbers (666 and 700 respectively). All sets of Fclass/XFclass are available through the `SnfFclass.py` script, which is stored in the SNfactory CVS repository under `SNFactory/Tasks/Processing/database/SnfObj/`. It can be used directly from the shell or called from an IPython session, as shown below.

## Basic Fclass / XFclass info

    import SnfFclass
    print SnfFclass.document(17) # SnfFclass -f 17

    F    XF   Description
    017  ---  Raw object frame
    017  000  Raw object frame
    017  001  Guide star vid-file
    017  002  Guide star gs-file
    017  003  Guide star ot-file
    017  004  Guide star tg-file



    print SnfFclass.document(666) # SnfFclass -f 666

    F    XF   Description
    666  ---  Flux-calibrated spectrum
    666  000  Flux-calibrated spectrum - quick_extract
    666  001  Flux-calibrated variance - quick_extract
    666  010  Flux-calibrated background - quick_extract
    666  011  Flux-calibrated bkgnd variance - quick_extract
    666  100  Flux-calibrated spectrum - extract_star
    666  101  Flux-calibrated variance - extract_star
    666  110  Flux-calibrated background - extract_star
    666  111  Flux-calibrated bkgnd variance - extract_star
    666  632  DDT + extract_star correlation plot [png]
    666  720  Flux-calibrated spectrum - DDT + extract_star
    666  722  Flux-calibrated background - DDT + extract_star
    666  724  DDT + extract_star 2D-fit log-file
    666  725  DDT + extract_star 3D-fit log-file
    666  730  DDT + extract_star spectrum plot [png]
    666  731  DDT + extract_star slice fit plot [png]
    666  732  DDT + extract_star profile plot [png]
    666  733  DDT + extract_star ADR plot [png]
    666  734  DDT + extract_star residual plot [png]
    666  735  DDT + extract_sta

If you are looking for the Fclass/XFclass of a specific kind of process (SALT2 in the following example), filter the output with `grep`:

    SnfFclass | grep SALT2
    679:700  ---  SALT2 fit files
    680:700  000  SALT2 fit results YAML file          - quick_extract
    681:700  001  SALT2 lightcurve fit plot            - quick_extract
    682:700  002  SALT2 fit logfile                    - quick_extract
    683:700  003  SALT2 fit results (meta) YAML file   - quick_extract
    685:700  101  SALT2 lightcurve fit plot            - extract_star
    684:700  100  SALT2 fit results YAML file          - extract_star
    686:700  102  SALT2 fit logfile                    - extract_star
    687:700  103  SALT2 fit results (meta) YAML file   - extract_star
    688:700  720  SALT2 fit results YAML file          - DDT + extract_start
    689:700  721  SALT2 lightcurve fit plot            - DDT + extract_start
    690:700  722  SALT2 fit logfile                    - DDT + extract_start
    691:700  723  SALT2 fit results (meta) YAML file   - DDT + extract_start

## Parents / Children

### Fclass / XFclass sets

A process/file is produced by a plan, which assigns it a specific set of Fclass/XFclass according to its kind. A process always has at least one **parent** from which it is derived. Most of the time it is also used to produce another file, and thus has at least one **child**. This parent/child relation is represented in the following diagram, where all sets of Fclass/XFclass have been included.

[![F/XFclass](figures/FXFclass.png)](figures/FXFclass.png "Fclass/XFclass relationship")

For a given set of Fclass/XFclass, you can also get the list of its parent/child Fclass/XFclass sets with the *SnfFclass* script:

    SnfFclass.get_parents(('38','100')) # SnfFclass -p 38,100

    Parents of processes with Fclass=38, XFclass=100
       F    XF      Description
      022  000  Reduced object cube [Euro3D]

    SnfFclass.get_children(('38','100')) # SnfFclass -c 38,100

    Children of processes with Fclass=38, XFclass=100
       F    XF      Description
      620  600  Multi-std telluric correction
      625  600  Multi-std extinction (photometric)
      625  610  Multi-std extinction (non-photometric)
      630  600  Multi-std flux solution (photometric)
      630  610  Multi-std flux solution (non-photometric)
      630  700  MFR adjusted multi-std flux solution
      640  100  Unknown XFclass. Fclass: Fx-calib spectrum (w/telluric)
      640  110  Unknown XFclass. Fclass: Fx-calib spectrum (w/telluric)
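
To make the parent/child relation concrete, here is a small Python 3 sketch that stores the relations shown in the example output above as a graph and walks it. It is only an illustration: the `PARENTS` table is a partial, hand-copied excerpt (the real mapping lives in `SnfFclass`, and the listed children may well have other parents too), and the helpers `children_of` and `descendants` are hypothetical names, not functions of `SnfFclass`.

```python
from collections import defaultdict

# (Fclass, XFclass) -> list of parent (Fclass, XFclass) sets.
# Partial excerpt, copied by hand from the example output above.
PARENTS = {
    (38, 100): [(22, 0)],
    (620, 600): [(38, 100)],
    (625, 600): [(38, 100)],
    (625, 610): [(38, 100)],
    (630, 600): [(38, 100)],
    (630, 610): [(38, 100)],
    (630, 700): [(38, 100)],
    (640, 100): [(38, 100)],
    (640, 110): [(38, 100)],
}


def children_of(parents):
    """Invert a child -> parents table into a parent -> children table."""
    children = defaultdict(list)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].append(child)
    return dict(children)


def descendants(children, start):
    """Breadth-first list of everything derived from `start`."""
    found, queue = [], [start]
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            if child not in found:
                found.append(child)
                queue.append(child)
    return found


CHILDREN = children_of(PARENTS)
print(CHILDREN[(22, 0)])
print(len(descendants(CHILDREN, (22, 0))))
```

Storing only the child-to-parents table and deriving the reverse mapping keeps the two directions from drifting apart.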


### Jobs

If you have a job or a set of jobs in mind (e.g., SNF-0203-NEWYORKf, or more specifically SNF-0203-NEWYORKf-09dlc) and would like to know what their typical input and output files are, use the following function of SnfFclass:

    SnfFclass.print_in_out_puts('SNF-0203-NEWYORKf-09dlc')

    Main inputs:
    * 22 / 0: Reduced object cube [Euro3D]
    * 38 / 100: Point-source spectrum - extract_star
    * 620 / 600: Multi-std telluric correction
    * 625 / 600: Multi-std extinction (photometric)
    * 625 / 700: MFR adjusted multi-std extinction
    * 630 / 600: Multi-std flux solution (photometric)
    * 630 / 700: MFR adjusted multi-std flux solution
    * 995 / 0: SkyProbe photometricity data
    * 995 / 42: Photometricity overrides
    Main outputs:
    * 23 / 0: Flux-calibrated cube [Euro3D]
    * 23 / 10: Telluric-corrected, flux-calibrated cube [Euro3D]
    * 23 / 12: Telluric-corrected, flux-calibrated cube [3D]
    * 640 / 100: Fx-calib spectrum (w/telluric)
    * 640 / 110: Fx-calib spectrum (w/telluric)
    * 666 / 100: Flux-calibrated spectrum - extract_star
    * 666 / 110: Flux-calibrated background - extract_star


It is also available from the command line:

    SnfFclass -j SNF-0203-NEWYORKf-09dlc


# The SNfactory database

The SNfactory database (DB) is a PostgreSQL database (ver. 9.3.5) interfaced with the Python Django framework (ver. (0, 97, 'pre')). It has a current size of 15.4GB (as of Wed. Dec. 2, 2015) and is hosted on a shared server (*ccpgsql.in2p3.fr*, on port 5432) currently containing 33 other DBs. No size limit is set for a specific DB, but a total of 500GB is available on this server, of which about 100GB is currently used. A complete backup of the server and of the transaction logs is made every day at 8pm (French time) and kept for 180 days. This allows a complete restoration of the server to any time over the past 6 months. In case of an accident/incident on the SNfactory DB, a restoration can be requested through the [CC-IN2P3 user ticket](https://cc-usersupport.in2p3.fr/otrs/customer.pl) system. Since this type of intervention is quite heavy, it is better to inform the CC of any operation that could impact the DB and to ask for a backup beforehand. On this DB, the maximum number of simultaneous connections is 400, which would allow us to run up to 400 jobs at the same time; the limit currently in use is 360 (see below in the section about the production).

The SNfactory database contains 7 major *tables* (or *models* in the Django vocabulary) containing all the information about our targets, runs, processes, jobs, etc. They are shown in the following figure, and are, in order: 
    
* **Target**: Obvious target information (name, type, coordinates, etc.). A Target entry has-many Runs.
* **Run**: Description of the target pointing, corresponding to an event at the summit (date, pointing type, etc.). A Run entry has-many Exposures.
* **Exposure**: Description of the different data-taking conditions of a pointing (data and conditions of the acquisition, science or calibration pose, P/B/R channels or a combination of the three, etc.). An Exposure entry has-many Poses.
* **Pose**: Description of the CCD poses coming from the acquisition (exposure time, guiding and pose qualities, etc.). A Pose entry has-many Processes.
* **Process**: The basic semantic unit in the pipeline. A Process is an instance of some operation product. It is identified by a labeling system called "[X]Fclass". A Process has-many Files. Examples:
    * A raw-data file is a process.
    * A preprocessed (overscan+bias+dark) data file is a process.
    * An extracted spectrum, before or after flux calibration, is a process (both can be processes in the database).
* **File**: Description of the files produced by the data processing codes (name, size, checksum, etc.). May be raw data or processed data.
* **Job**: Description of the agents that supervised the processing operations run by the pipeline on the worker farm at CC-IN2P3 (name, version, state, date, Qsub command used, etc.).
      
These tables are linked together according to the scheme shown below, which also gives most of their attributes. Each table has a primary key (id), which is what foreign keys in other tables point to (the IdXXX columns). There is a secondary sort of key with a name like IdTarget or IdProcess, more often used as a handle in your programs. These models are available in the SNfactory processing package and can be imported as follows:

    from processing.process.models import Target, Run, Exposure, Pose, Process, File, Job

The database code is available on the [SNfactory CVS](https://cvs.in2p3.fr/snovae-SNFactory/), and is located under

    SNFactory/Tasks/Processing/database/django/processing_095

It is used in almost all the pipeline steps in order to query or to save data. If you have it installed on your personal computer and are not on an IN2P3 network, you can still access the DB locally by tunneling localhost:5432 to ccpgsql.in2p3.fr:5432, e.g.:

    ssh -C -N snprod@ccage.in2p3.fr -L 5432:ccpgsql.in2p3.fr:5432
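
Before starting a script that needs the DB, it can be useful to check that the tunnel is actually up. The following Python 3 sketch only tests whether something is listening on the local end of the tunnel; it does not verify that the listener is the SNfactory DB. The function name `port_is_open` is hypothetical, and the defaults simply mirror the localhost:5432 tunnel described above.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("localhost:5432 is reachable, the tunnel seems to be up")
    else:
        print("nothing is listening on localhost:5432, start the ssh tunnel first")
```

A plain TCP check keeps the sketch free of any database driver; a real connection attempt through the Django models remains the definitive test.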

For a more complete description of the DB tables and their content, please have a look at [this document](https://snf-doc.lbl.gov/twiki/pub/Offline/DataAccessOverview/snf_process.pdf), section 4.2.2 ("Tables description"), page 9.

[![Snovae DB](figures/SnovaeDB3.png)](figures/SnovaeDB3.png "SNfactory DB scheme")