# Timing the VIKING XMM test run

In this notebook we investigate the timing and resource requirements of the SXDS prototype run in order to estimate the total computing requirements.

We will work with the file generated on IRIS by the following command:

```
gstatement -p IRIS-IP005-CPU -u ir-shir1 -s "2021-04-01-00:00:00" -e "2021-04-21-11:00:00" > meta/jobs_20210401_20210421.lis
```

This run used the v21 obs package with Kron, CModel, and convolved aperture measurements.

In [1]:
from astropy.table import Table
import numpy as np

In [2]:
t = Table.read('./meta/jobs_20210401_20210421.lis', format='ascii') #, data_start=2, delimiter=' ')

In [5]:
"{} jobs consuming a total of {} cpuhours".format(len(t), round(np.sum(t['CompHrs']),1))

'3557 jobs consuming a total of 7789.7 cpuhours'

In [6]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
int64,str8,str10,str10,str10,str19,str5,str10,float64
37304007,ir-shir1,iris-ip00+,cpujob,skylake,2021-04-06T19:05:28,0:0,COMPLETED,7.7
37319566,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-06T16:57:09,0:0,CANCELLED+,1.4
37323435,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-06T18:12:07,1:0,FAILED,1.2
37368495,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-07T21:39:28,0:0,COMPLETED,8.0
37368577,ir-shir1,iris-ip00+,cpujob,skylake,2021-04-09T01:49:56,0:0,TIMEOUT,36.0


In [7]:
t[-5:]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
int64,str8,str10,str10,str10,str19,str5,str10,float64
38360869381,ir-shir1,iris-ip00+,VIKphotop+,skylake,2021-04-20T16:41:39,0:0,COMPLETED,1.4
38360869382,ir-shir1,iris-ip00+,VIKphotop+,skylake,2021-04-20T19:13:03,0:0,COMPLETED,4.0
38360869383,ir-shir1,iris-ip00+,VIKphotop+,skylake,2021-04-20T19:33:59,0:0,COMPLETED,4.3
38360869384,ir-shir1,iris-ip00+,VIKphotop+,skylake,2021-04-20T17:55:01,0:0,COMPLETED,2.7
38360869385,ir-shir1,iris-ip00+,VIKphotop+,skylake,2021-04-20T18:44:30,0:0,COMPLETED,3.5


In [8]:
np.unique(t['JobName'])

0
VIK_INGEST
VIKcoadd
VIKphotop+
VIKproces+
cpujob
stackInst+


In [9]:
def nameToJobType(name):
    """Take the job name and return the type of pipetask."""
    # Job names are truncated in the listing (e.g. 'VIKproces+'), so match on prefixes.
    job_type = 'UNKNOWN'
    if name.startswith('VIKproc'):
        job_type = 'processCcd'
    elif name.startswith('VIKcoadd'):
        job_type = 'coadd'
    elif name.startswith('VIKphot'):
        job_type = 'photoPipe'
    return job_type


t['job_type'] = [nameToJobType(n) for n in t['JobName']]
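
The listing truncates long job names with a trailing `+` (for example `VIKproces+`), which is why the mapping works on prefixes. As a minimal sketch, the same mapping could be written as a lookup table, which makes it easier to add further stages later. `PREFIX_TO_JOB_TYPE` and `name_to_job_type` are illustrative names and are not part of the original notebook.

```python
# Hypothetical table-driven variant of nameToJobType (names are illustrative).
PREFIX_TO_JOB_TYPE = {
    'VIKproc': 'processCcd',
    'VIKcoadd': 'coadd',
    'VIKphot': 'photoPipe',
}


def name_to_job_type(name):
    """Return the pipetask type for a (possibly truncated) job name."""
    for prefix, job_type in PREFIX_TO_JOB_TYPE.items():
        if name.startswith(prefix):
            return job_type
    return 'UNKNOWN'


# The unique job names reported by np.unique(t['JobName']).
job_names = ['VIK_INGEST', 'VIKcoadd', 'VIKphotop+', 'VIKproces+',
             'cpujob', 'stackInst+']
for job_name in job_names:
    print(job_name, '->', name_to_job_type(job_name))
```

Jobs named `VIK_INGEST`, `cpujob`, and `stackInst+` fall through to `UNKNOWN`, so they are excluded from the per-stage statistics.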

In [10]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
int64,str8,str10,str10,str10,str19,str5,str10,float64,str10
37304007,ir-shir1,iris-ip00+,cpujob,skylake,2021-04-06T19:05:28,0:0,COMPLETED,7.7,UNKNOWN
37319566,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-06T16:57:09,0:0,CANCELLED+,1.4,UNKNOWN
37323435,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-06T18:12:07,1:0,FAILED,1.2,UNKNOWN
37368495,ir-shir1,iris-ip00+,stackInst+,skylake,2021-04-07T21:39:28,0:0,COMPLETED,8.0,UNKNOWN
37368577,ir-shir1,iris-ip00+,cpujob,skylake,2021-04-09T01:49:56,0:0,TIMEOUT,36.0,UNKNOWN


In [11]:
print("""processCcd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'processCcd'),
np.sum((t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
)
)


processCcd jobs run on stack images in SXDS

Total number of jobs: 1226
Jobs completed: 903
mean per job: 0.30399673735725935 cpu hours
mean per completed job: 0.4068660022148395 cpu hours
Total time: 372.7 cpu hours
Total time on completed jobs: 367.40000000000003 cpu hours



In [12]:
print("""coadd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'coadd'),
np.sum((t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
)
)

coadd jobs run on stack images in SXDS

Total number of jobs: 773
Jobs completed: 772
mean per job: 0.26248382923673996 cpu hours
mean per completed job: 0.26282383419689115 cpu hours
Total time: 202.89999999999998 cpu hours
Total time on completed jobs: 202.89999999999998 cpu hours



In [None]:
#It is worrying that these times are close to the 36 hour maximum. Can I split them up?

In [13]:
# Most of these are failures from memory shortages at the coadd stage before I separated them
print("""photoPipe jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'photoPipe'),
np.sum((t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])
)
)

photoPipe jobs run on stack images in SXDS

Total number of jobs: 1547
Jobs completed: 384
mean per job: 4.618034906270201 cpu hours
mean per completed job: 3.804166666666666 cpu hours
Total time: 7144.1 cpu hours
Total time on completed jobs: 1460.7999999999997 cpu hours
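
The three summary cells above repeat the same six statistics with only the job type changed. A minimal sketch of a helper that computes them once is given here; `summarise_jobs` is a hypothetical name, and the sample rows are a handful of coadd and photoPipe jobs copied from the table previews in this notebook, not the full listing.

```python
def summarise_jobs(rows, job_type):
    """Summarise cpu hours for one job type.

    rows is an iterable of (job_type, state, comp_hrs) tuples.
    """
    rows = list(rows)  # allow generators such as zip(...) to be passed in
    all_hours = [hrs for jt, state, hrs in rows if jt == job_type]
    done_hours = [hrs for jt, state, hrs in rows
                  if jt == job_type and state == 'COMPLETED']
    nan = float('nan')
    return {
        'n_jobs': len(all_hours),
        'n_completed': len(done_hours),
        'mean_all': sum(all_hours) / len(all_hours) if all_hours else nan,
        'mean_completed': sum(done_hours) / len(done_hours) if done_hours else nan,
        'total_all': sum(all_hours),
        'total_completed': sum(done_hours),
    }


# Sample rows taken from the previews shown in this notebook.
rows = [
    ('coadd', 'COMPLETED', 0.3), ('coadd', 'COMPLETED', 0.3),
    ('coadd', 'COMPLETED', 0.3), ('coadd', 'COMPLETED', 0.3),
    ('coadd', 'COMPLETED', 0.3), ('coadd', 'FAILED', 0.0),
    ('photoPipe', 'COMPLETED', 1.4), ('photoPipe', 'COMPLETED', 4.0),
    ('photoPipe', 'COMPLETED', 4.3), ('photoPipe', 'COMPLETED', 2.7),
    ('photoPipe', 'COMPLETED', 3.5),
]

for stage in ('coadd', 'photoPipe'):
    summary = summarise_jobs(rows, stage)
    print(stage, {key: round(value, 2) for key, value in summary.items()})
```

With the full table, the same function could presumably be fed `zip(t['job_type'], t['State'], t['CompHrs'])`.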



In [14]:
np.max(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

10.7
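
The gap between total and completed photoPipe time (7144.1 versus 1460.8 cpu hours) suggests checking which job states consume the hours. The sketch below groups hours by state; `hours_by_state` is a hypothetical helper, and the sample data are the five rows shown by `t[:5]`, so the numbers illustrate the method only.

```python
from collections import defaultdict


def hours_by_state(rows):
    """Return {state: [n_jobs, total_hours]} for (state, comp_hrs) pairs."""
    summary = defaultdict(lambda: [0, 0.0])
    for state, hours in rows:
        summary[state][0] += 1
        summary[state][1] += hours
    return dict(summary)


# (State, CompHrs) for the first five rows of the job listing.
sample = [
    ('COMPLETED', 7.7),
    ('CANCELLED+', 1.4),
    ('FAILED', 1.2),
    ('COMPLETED', 8.0),
    ('TIMEOUT', 36.0),
]

breakdown = hours_by_state(sample)
# Print the most expensive states first.
for state, (n_jobs, total) in sorted(breakdown.items(),
                                     key=lambda item: item[1][1],
                                     reverse=True):
    print('{:<11} {:>3} jobs {:>6.1f} cpu hours'.format(state, n_jobs, total))
```

In the sample, the single `TIMEOUT` job is charged 36.0 cpu hours, so even a few timeouts can dominate the total.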

In [15]:
t[(t['job_type']=='coadd') & (t['State'] == 'FAILED')][:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
int64,str8,str10,str10,str10,str19,str5,str10,float64,str10
37988916386,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:00:17,2:0,FAILED,0.0,coadd


In [16]:
np.unique(t[(t['job_type']=='coadd') & (t['State'] == 'FAILED')]['ExitCode'])

0
2:0


In [17]:
t[(t['job_type']=='coadd') ][:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
int64,str8,str10,str10,str10,str19,str5,str10,float64,str10
379889160,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:15:47,0:0,COMPLETED,0.3,coadd
379889161,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:15:47,0:0,COMPLETED,0.3,coadd
379889162,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:15:17,0:0,COMPLETED,0.3,coadd
379889163,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:15:13,0:0,COMPLETED,0.3,coadd
379889164,ir-shir1,iris-ip00+,VIKcoadd,skylake-h+,2021-04-16T18:15:13,0:0,COMPLETED,0.3,coadd


# 2 Calculate total times

Let's calculate some broad estimates for the main runs we will go on to perform.

## 2.1 SXDS VIKING run

This is the run used for the timing tests.

In [20]:
n_viking_sxds_images = 386 # From ./1_SLurm_factory.ipynb
n_viking_sxds_patches = 306 # From ./1_SLurm_factory.ipynb

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_viking_sxds_images * mean_processCcd),
    round(n_viking_sxds_patches* mean_coadd),
    round(n_viking_sxds_patches* mean_photo),
    n_viking_sxds_images, n_viking_sxds_patches,
    round(
        n_viking_sxds_images * mean_processCcd 
        + n_viking_sxds_patches* mean_coadd
        + n_viking_sxds_patches* mean_photo
    )
))
    


Full processing of input images: 157 cpu hours
Full coadd of all patches: 80 cpu hours
Full photometry pipeline on all patches: 1164 cpu hours
Total time for 386 images and 306 patches: 1402 cpu hours
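
The estimate is a simple linear scaling: processCcd time scales with the number of input images, while coadd and photometry time scale with the number of patches. As a sketch, the calculation can be wrapped in one function so that later estimates only change the counts. `estimate_cpu_hours` is a hypothetical helper, and the mean values are copied from the per-stage outputs printed earlier in this notebook.

```python
def estimate_cpu_hours(n_images, n_patches,
                       mean_process_ccd, mean_coadd, mean_photo):
    """Scale mean cpu hours per completed job up to a full run."""
    process_ccd = n_images * mean_process_ccd   # one job per input image
    coadd = n_patches * mean_coadd              # one job per patch
    photo = n_patches * mean_photo              # one job per patch
    return {
        'processCcd': process_ccd,
        'coadd': coadd,
        'photoPipe': photo,
        'total': process_ccd + coadd + photo,
    }


# Mean cpu hours per completed job, copied from the outputs above.
MEAN_PROCESS_CCD = 0.4068660022148395
MEAN_COADD = 0.26282383419689115
MEAN_PHOTO = 3.804166666666666

sxds = estimate_cpu_hours(386, 306, MEAN_PROCESS_CCD, MEAN_COADD, MEAN_PHOTO)
print({stage: round(hours) for stage, hours in sxds.items()})
```

For 386 images and 306 patches this should reproduce the figures printed above. The estimate uses the mean over completed jobs only, so it does not budget for failed or timed-out jobs.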



## 2.2 Full VIKING overlap estimate

This estimate covers a run over the full overlap of VHS and the HSC PDR2 XMM field.

In [21]:
n_vhs_xmm_images = 306+3595+17171 # From ./1_Slurm_factory.ipynb
n_vhs_xmm_patches = 386+3342+11829 #Rough from tract numbers

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_vhs_xmm_images * mean_processCcd),
    round(n_vhs_xmm_patches* mean_coadd),
    round(n_vhs_xmm_patches* mean_photo),
    n_vhs_xmm_images, n_vhs_xmm_patches,
    round(
        n_vhs_xmm_images * mean_processCcd 
        + n_vhs_xmm_patches* mean_coadd
        + n_vhs_xmm_patches* mean_photo
    )
))


Full processing of input images: 8573 cpu hours
Full coadd of all patches: 4089 cpu hours
Full photometry pipeline on all patches: 59181 cpu hours
Total time for 21072 images and 15557 patches: 71844 cpu hours



## 2.3 VIKING complete run

A first run might not include combination with GRIZY data prior to LSST, but we can simply use the HSC/VISTA SXDS times to estimate here.

For now this is just using the VHS all sky numbers. This will therefore constitute an upper limit on the VHS all sky run. How much more than the HSC wide area is there?

In [22]:
n_vhs_xmm_images = 204996  # From ../dmu1/data/vhs_images_overview_$DATE.fits
n_vhs_xmm_patches = 670137  # From ../dmu1/data/vhs_tiles_tracts_patches.fits

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_vhs_xmm_images * mean_processCcd),
    round(n_vhs_xmm_patches* mean_coadd /6), #assume 6 times fewer images to coadd
    round(n_vhs_xmm_patches* mean_photo ),  #assume JHK and LSST UGRIZY
    n_vhs_xmm_images, n_vhs_xmm_patches,
    round(
        n_vhs_xmm_images * mean_processCcd 
        + n_vhs_xmm_patches* mean_coadd/6
        + n_vhs_xmm_patches* mean_photo
    )
))


Full processing of input images: 83406 cpu hours
Full coadd of all patches: 29355 cpu hours
Full photometry pipeline on all patches: 2549313 cpu hours
Total time for 204996 images and 670137 patches: 2662073 cpu hours
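
To see which stage drives the total, the per-stage figures can be turned into fractions. The sketch below uses the three rounded stage estimates printed above; `stage_shares` is a hypothetical helper name.

```python
def stage_shares(stage_hours):
    """Return each stage's fraction of the summed cpu hours."""
    total = sum(stage_hours.values())
    return {stage: hours / total for stage, hours in stage_hours.items()}


# Rounded stage estimates from the output above (cpu hours).
full_run = {'processCcd': 83406, 'coadd': 29355, 'photoPipe': 2549313}

for stage, share in stage_shares(full_run).items():
    print('{}: {:.1%}'.format(stage, share))
```

The photometry pipeline accounts for roughly 96% of the estimate, so any reduction in the per-patch photometry time has far more effect on the total than changes to processCcd or coadd.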

