# Make slurm files required to produce SXDS joint VISTA-HSC data product.

In this notebook we will make all the slurm files required to run the whole VISTA-VIDEO HSC-DUD joint photometry pipeline.

We need to find all the patches in the HSC imaging and produce a slurm pipeline file for every patch or group of patches.

This will be a maximum of around 4 tracts * 91 patches per tract = 364 patches

We will also need to set up the data directories including linking relevant reference catalogues and copying the required HSC data products which are already processed.

## 1 Find all the relevant VIDEO images.

The first stage is parallesised by ccd. We will create one job for every date. This should be small enough to fit in a 24hr job.

In [9]:
date_list = ['20121122', '20171027']

sxds_patches = [8282,8283,8284,8523,8524,8525,8765,8766,8767] #manually got these

In [None]:
#For simplicity lets ingest all the images (They are only links and this stage is fast)
!mkdir data
!mkdir slurm
for date in date_list:
    #!ingestImages.py data /path/to/vista/{date}/*[0-9].fit #Exposures
    !ingestImages.py data /path/to/vista/{date}/*_st.fit #Stacks

In [None]:
date_list = [date[0:4]+'-'+date[4:6]+'-'+date[6:9] for date in date_list]

## 2 Process CCDs

This stage is parallelised accroding to the raw files ingested. We are going to make one job per date

In [11]:
proc_template = """
!/bin/bash
source /rfs/project/rfs-L33A9wsNuJk/shared/lsst_stack/loadLSST.bash
setup lsst_distrib
setup obs_vista
processCcd.py data --rerun processCcdOutputs --id obsDate={obsDate}
"""
for date in date_list:
    
    print(proc_template.format(obsDate=date))
    print("./slurm/processCcd_{}.sh")
    print("./slurm/processCcd_{}.slurm")


!/bin/bash
source loadLSST.bash
setup lsst_distrib
setup obs_vista
processCcd.py data --rerun processCcdOutputs --id obsDate=2012-11-22
    
./slurm/processCcd_{}.sh
./slurm/processCcd_{}.slurm

!/bin/bash
source loadLSST.bash
setup lsst_distrib
setup obs_vista
processCcd.py data --rerun processCcdOutputs --id obsDate=2017-10-27
    
./slurm/processCcd_{}.sh
./slurm/processCcd_{}.slurm


In [None]:
#We can now submit these after the processCcd has run with
#qsub ./slurm/processCcd*.slurm

## 3 Run full patch
Make one shell script and slurm script for each patch

In [None]:
#HSC preprocessed files must be copied into place
#!cp /Users/rs548/GitHub/lsst-ir-fusion/dmu0/dmu0_HSC/data/hsc-release.mtk.nao.ac.jp/archive/filetree/pdr2_dud/deepCoadd-results/HSC-R data/rerun/coaddPhot/deepCoadd-results/HSC-R


In [None]:
template_sh = """
#!/bin/bash
makeCoaddTempExp.py data --rerun coadd --selectId filter=VISTA-Y --id filter=VISTA-Y tract={tract} patch={patches} 
makeCoaddTempExp.py data --rerun coadd --selectId filter=VISTA-Ks --id filter=VISTA-Ks tract={tract} patch={patches} 

assembleCoadd.py data --rerun coadd --selectId filter=VISTA-Y --id filter=VISTA-Y tract={tract} patch={patches}
assembleCoadd.py data --rerun coadd --selectId filter=VISTA-Ks --id filter=VISTA-Ks tract={tract} patch={patches}

detectCoaddSources.py data --rerun coadd:coaddPhot --id filter=VISTA-Y tract={tract} patch={patches}
detectCoaddSources.py data --rerun coadd:coaddPhot --id filter=VISTA-Ks tract={tract} patch={patches}

mergeCoaddDetections.py data --rerun coaddPhot --id filter=VISTA-Y^VISTA-Ks^HSC-R tract={tract} patch={patches}

deblendCoaddSources.py data --rerun coaddPhot --id filter=VISTA-Y tract={tract} patch={patches}
deblendCoaddSources.py data --rerun coaddPhot --id filter=VISTA-Ks tract={tract} patch={patches}
deblendCoaddSources.py data --rerun coaddPhot --id filter=HSC-R tract={tract} patch={patches}

measureCoaddSources.py data --rerun coaddPhot --id filter=VISTA-Y tract={tract} patch={patches}
measureCoaddSources.py data --rerun coaddPhot --id filter=VISTA-Ks tract={tract} patch={patches}
measureCoaddSources.py data --rerun coaddPhot --id filter=HSC-R tract={tract} patch={patches}

mergeCoaddMeasurements.py data --rerun coaddPhot --id filter=VISTA-Y^VISTA-Ks^HSC-R tract={tract} patch={patches}

forcedPhotCoadd.py data --rerun coaddPhot:coaddForcedPhot --id filter=VISTA-Y tract={tract} patch={patches}
forcedPhotCoadd.py data --rerun coaddForcedPhot --id filter=VISTA-Ks tract={tract} patch={patches}
forcedPhotCoadd.py data --rerun coaddForcedPhot --id filter=HSC-R tract={tract} patch={patches}

"""
template_slurm = """
#!/bin/bash
#!
#! Example SLURM job script for Peta4-Skylake (Skylake CPUs, OPA)
#! Last updated: Mon 13 Nov 12:25:17 GMT 2017
#!

#!#############################################################
#!#### Modify the options in this section as appropriate ######
#!#############################################################

#! sbatch directives begin here ###############################
#! Name of the job:
#SBATCH -J {jobName}
#! Which project should be charged:
#SBATCH -A IRIS-IP005-CPU
#! How many whole nodes should be allocated?
#SBATCH --nodes=1
#! How many (MPI) tasks will there be in total? (<= nodes*32)
#! The skylake/skylake-himem nodes have 32 CPUs (cores) each.
#SBATCH --ntasks=1
#! How much wallclock time will be required?
#SBATCH --time=36:00:00
#! What types of email messages do you wish to receive?
#SBATCH --mail-type=FAIL
#! Uncomment this to prevent the job from being requeued (e.g. if
#! interrupted by node failure or system downtime):
##SBATCH --no-requeue

#! For 6GB per CPU, set "-p skylake"; for 12GB per CPU, set "-p skylake-himem": 
#SBATCH -p skylake

#! sbatch directives end here (put any additional directives above this line)

#! Notes:
#! Charging is determined by core number*walltime.
#! The --ntasks value refers to the number of tasks to be launched by SLURM only. This
#! usually equates to the number of MPI tasks launched. Reduce this from nodes*32 if
#! demanded by memory requirements, or if OMP_NUM_THREADS>1.
#! Each task is allocated 1 core by default, and each core is allocated 5980MB (skylake)
#! and 12030MB (skylake-himem). If this is insufficient, also specify
#! --cpus-per-task and/or --mem (the latter specifies MB per node).

#! Number of nodes and tasks per node allocated by SLURM (do not change):
numnodes=$SLURM_JOB_NUM_NODES
numtasks=$SLURM_NTASKS
mpi_tasks_per_node=$(echo "$SLURM_TASKS_PER_NODE" | sed -e  's/^\([0-9][0-9]*\).*$/\1/')
#! ############################################################
#! Modify the settings below to specify the application's environment, location 
#! and launch method:

#! Optionally modify the environment seen by the application
#! (note that SLURM reproduces the environment at submission irrespective of ~/.bashrc):
. /etc/profile.d/modules.sh                # Leave this line (enables the module command)
module purge                               # Removes all modules still loaded
module load rhel7/default-peta4            # REQUIRED - loads the basic environment

#! Insert additional module load commands after this line if needed:

#! Full path to application executable: 
application="{shellScript}"

#! Run options for the application:
options=""

#! Work directory (i.e. where the job will run):
workdir="$SLURM_SUBMIT_DIR"  # The value of SLURM_SUBMIT_DIR sets workdir to the directory
                             # in which sbatch is run.

#! Are you using OpenMP (NB this is unrelated to OpenMPI)? If so increase this
#! safe value to no more than 32:
export OMP_NUM_THREADS=1

#! Number of MPI tasks to be started by the application per node and in total (do not change):
np=$[${numnodes}*${mpi_tasks_per_node}]

#! The following variables define a sensible pinning strategy for Intel MPI tasks -
#! this should be suitable for both pure MPI and hybrid MPI/OpenMP jobs:
export I_MPI_PIN_DOMAIN=omp:compact # Domains are $OMP_NUM_THREADS cores in size
export I_MPI_PIN_ORDER=scatter # Adjacent domains have minimal sharing of caches/sockets
#! Notes:
#! 1. These variables influence Intel MPI only.
#! 2. Domains are non-overlapping sets of cores which map 1-1 to MPI tasks.
#! 3. I_MPI_PIN_PROCESSOR_LIST is ignored if I_MPI_PIN_DOMAIN is set.
#! 4. If MPI tasks perform better when sharing caches/sockets, try I_MPI_PIN_ORDER=compact.


#! Uncomment one choice for CMD below (add mpirun/mpiexec options if necessary):

#! Choose this for a MPI code (possibly using OpenMP) using Intel MPI.
CMD="mpirun -ppn $mpi_tasks_per_node -np $np $application $options"

#! Choose this for a pure shared-memory OpenMP parallel program on a single node:
#! (OMP_NUM_THREADS threads will be created):
#CMD="$application $options"

#! Choose this for a MPI code (possibly using OpenMP) using OpenMPI:
#CMD="mpirun -npernode $mpi_tasks_per_node -np $np $application $options"


###############################################################
### You should not have to change anything below this line ####
###############################################################

cd $workdir
echo -e "Changed directory to `pwd`.\n"

JOBID=$SLURM_JOB_ID

echo -e "JobID: $JOBID\n======"
echo "Time: `date`"
echo "Running on master node: `hostname`"
echo "Current directory: `pwd`"

if [ "$SLURM_JOB_NODELIST" ]; then
        #! Create a machine file:
        export NODEFILE=`generate_pbs_nodefile`
        cat $NODEFILE | uniq > machine.file.$JOBID
        echo -e "\nNodes allocated:\n================"
        echo `cat machine.file.$JOBID | sed -e 's/\..*$//g'`
fi

echo -e "\nnumtasks=$numtasks, numnodes=$numnodes, mpi_tasks_per_node=$mpi_tasks_per_node (OMP_NUM_THREADS=$OMP_NUM_THREADS)"

echo -e "\nExecuting command:\n==================\n$CMD\n"

eval $CMD 

"""

In [None]:
for tract in sxds_tracts:
    pFolders = glob.glob(HSC_LOC + '/hsc-release.mtk.nao.ac.jp/archive/filetree/pdr2_dud/deepCoadd-results/*/{}/*/calexp*.fits'.format(tract))

for tract, patch in sxds_patches:
    #write shell script
    "./slurm/patch_{}_{}.sh".format(tract,patch)
    #write slurm script
    "./slurm/patch_{}_{}.slurm".format(tract,patch)

In [None]:
#We can now submit these after the processCcd has run with
#qsub ./slurm/patch*.slurm