# Timing the SXDS run

In this notebook we will investigate the timing and resource requirements of the SXDS prototype run in order to estimate the total computing requirements.

We will work with the file generated on IRIS by the following command:

```
gstatement -p IRIS-IP005-CPU -u ir-shir1 -s "2020-07-04-00:00:00" -e "2020-10-21-00:00:00" > jobs.lis
```

In [1]:
# What version of the Stack are we using?
! eups list -s | grep lsst_distrib
! eups list -s | grep obs_vista

lsst_distrib          g2d4714e03a+6e1aa0b536 	current w_2022_07 w_latest setup
obs_vista             23.0.0-1   	current setup


In [2]:
from astropy.table import Table
import numpy as np

In [5]:
t = Table.read('./slurm/jobs_20220217.lis', format='ascii') #, data_start=2, delimiter=' ')

In [6]:
"I have submitted a total of {} jobs consuming a total of {} cpuhours".format(len(t), np.sum(t['CompHrs']))

'I have submitted a total of 701480 jobs consuming a total of 1225031.9 cpuhours'

In [7]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
str12,str8,str10,str10,str10,str19,str5,str10,float64
25933230,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-13T16:18:36,127:0,FAILED,0.1
25940256,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T00:08:41,1:0,FAILED,74.2
25948528,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T14:34:59,0:0,COMPLETED,254.3
26005193,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-16T03:59:49,0:0,TIMEOUT,640.0
26027949,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-17T10:46:53,0:0,TIMEOUT,1536.4


In [8]:
t[-5:]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
str12,str8,str10,str10,str10,str19,str5,str10,float64
55293868_9,ir-shir1,iris-ip00+,VIDproces+,cclake,2022-02-15T16:04:52,1:0,FAILED,0.0
55293868_10,ir-shir1,iris-ip00+,VIDproces+,cclake,2022-02-15T16:04:52,1:0,FAILED,0.0
55300357,ir-shir1,iris-ip00+,vidSingFr+,cclake,2022-02-16T06:09:35,0:0,TIMEOUT,672.4
55303059_0,ir-shir1,iris-ip00+,VIDproces+,cclake,2022-02-16T11:09:47,0:0,CANCELLED+,15.9
55318689,ir-shir1,iris-ip00+,vidSingFr+,cclake,2022-02-17T00:27:42,0:0,TIMEOUT,672.3


In [46]:
def nameToJobType(name):
    """Take the job name and return the type of pipetask."""
    job_type = 'UNKNOWN'
    if name.startswith('process'):
        job_type = 'processCcd'
    if name.startswith('coadd'):
        job_type = 'coadd'
    if name.startswith('phot'):
        job_type = 'photoPipe'
    return job_type
t['job_type']  = [nameToJobType(n) for n in t['JobName']]
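
As a quick sanity check of the prefix matching, the sketch below applies the same rules to a few job names copied from the table excerpts above. The `PREFIX_TO_TYPE` table and the `name_to_job_type` helper are illustrative names introduced here, not part of the notebook; the logic is intended to mirror `nameToJobType`. Because the match is case-sensitive and anchored at the start of the name, names such as `cpujob`, `VIDproces+` and `vidSingFr+` fall through to `UNKNOWN` and are therefore excluded from the per-type statistics.

```python
from collections import Counter

# Hypothetical lookup table mirroring the prefixes used in nameToJobType.
PREFIX_TO_TYPE = {
    "process": "processCcd",
    "coadd": "coadd",
    "phot": "photoPipe",
}


def name_to_job_type(name):
    """Return the pipetask type for a Slurm job name, or 'UNKNOWN'."""
    for prefix, job_type in PREFIX_TO_TYPE.items():
        if name.startswith(prefix):
            return job_type
    return "UNKNOWN"


# Job names copied from the table excerpts in this notebook.
sample_names = ["cpujob", "coadd_arr+", "coadd_852+", "VIDproces+", "vidSingFr+"]
sample_types = [name_to_job_type(n) for n in sample_names]
type_counts = Counter(sample_types)

for name, job_type in zip(sample_names, sample_types):
    print(f"{name:12s} -> {job_type}")
print(dict(type_counts))
```

Running the same tally on the full `job_type` column would show how many of the jobs are left unclassified.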

In [47]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
str12,str8,str10,str10,str10,str19,str5,str10,float64,str10
25933230,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-13T16:18:36,127:0,FAILED,0.1,UNKNOWN
25940256,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T00:08:41,1:0,FAILED,74.2,UNKNOWN
25948528,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T14:34:59,0:0,COMPLETED,254.3,UNKNOWN
26005193,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-16T03:59:49,0:0,TIMEOUT,640.0,UNKNOWN
26027949,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-17T10:46:53,0:0,TIMEOUT,1536.4,UNKNOWN


In [48]:
print("""processCcd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'processCcd'),
np.sum((t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
)
)


processCcd jobs run on stack images in SXDS

Total number of jobs: 19109
Jobs completed: 12993
mean per job: 2.3228531058663453 cpu hours
mean per completed job: 3.059601323789733 cpu hours
Total time: 44387.399999999994 cpu hours
Total time on completed jobs: 39753.4 cpu hours
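
The same six statistics are computed for each job type, so they could be gathered by one helper. The following is a minimal sketch, assuming each row supports `row['column']` access; `summarise_jobs` is a hypothetical name, and the sample rows are a handful of values copied from the table excerpts in this notebook rather than the full job list. Rows of an astropy `Table` should also work, although that is not exercised here.

```python
def summarise_jobs(rows, job_type):
    """Summarise job counts and CPU hours for one job type in a single pass."""
    n_jobs = n_completed = 0
    hours_all = hours_completed = 0.0
    for row in rows:
        if row["job_type"] != job_type:
            continue
        n_jobs += 1
        hours_all += row["CompHrs"]
        if row["State"] == "COMPLETED":
            n_completed += 1
            hours_completed += row["CompHrs"]
    return {
        "n_jobs": n_jobs,
        "n_completed": n_completed,
        # Guard against division by zero when nothing matches.
        "mean_all": hours_all / n_jobs if n_jobs else float("nan"),
        "mean_completed": (
            hours_completed / n_completed if n_completed else float("nan")
        ),
        "total_all": hours_all,
        "total_completed": hours_completed,
    }


# A few rows copied from the table excerpts in this notebook.
sample_rows = [
    {"job_type": "UNKNOWN", "State": "COMPLETED", "CompHrs": 254.3},
    {"job_type": "coadd", "State": "COMPLETED", "CompHrs": 42.0},
    {"job_type": "coadd", "State": "FAILED", "CompHrs": 0.1},
    {"job_type": "coadd", "State": "FAILED", "CompHrs": 0.1},
    {"job_type": "coadd", "State": "FAILED", "CompHrs": 0.1},
    {"job_type": "coadd", "State": "FAILED", "CompHrs": 0.1},
]

coadd_summary = summarise_jobs(sample_rows, "coadd")
for key, value in coadd_summary.items():
    print(f"{key}: {value}")
```

Returning a dictionary rather than printing directly keeps the numbers available for the later estimates.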



In [49]:
print("""coadd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'coadd'),
np.sum((t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
)
)

coadd jobs run on stack images in SXDS

Total number of jobs: 353
Jobs completed: 170
mean per job: 15.865439093484417 cpu hours
mean per completed job: 32.858823529411765 cpu hours
Total time: 5600.499999999999 cpu hours
Total time on completed jobs: 5586.0 cpu hours



In [None]:
# It is worrying that these times are close to the 36-hour maximum. Can I split them up?

In [50]:
# Most of these are failures from memory shortages at the coadd stage before I separated them
print("""photoPipe jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'photoPipe'),
np.sum((t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])
)
)

photoPipe jobs run on stack images in SXDS

Total number of jobs: 352
Jobs completed: 55
mean per job: 8.338068181818182 cpu hours
mean per completed job: 11.976363636363633 cpu hours
Total time: 2935.0 cpu hours
Total time on completed jobs: 658.6999999999998 cpu hours



In [59]:
np.max(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

23.7

In [55]:
t[(t['job_type']=='coadd') & (t['State'] == 'FAILED')][:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
str12,str8,str10,str10,str10,str19,str5,str10,float64,str10
30458171_0,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:01,2:0,FAILED,0.1,coadd
30458171_1,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:50:49,1:0,FAILED,0.1,coadd
30458171_2,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:02,2:0,FAILED,0.1,coadd
30458171_3,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:02,2:0,FAILED,0.1,coadd
30458171_4,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:00,2:0,FAILED,0.1,coadd


In [56]:
np.unique(t[(t['job_type']=='coadd') & (t['State'] == 'FAILED')]['ExitCode'])

0
127:0
1:0
2:0
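
Listing the unique exit codes does not show how common each one is. The sketch below counts failures per exit code with `collections.Counter`; `count_exit_codes` is a hypothetical helper and the sample rows are the five failed coadd jobs shown in the excerpt above. In Slurm accounting output the `ExitCode` field has the form `exit:signal`, so `2:0` is exit status 2 with no terminating signal.

```python
from collections import Counter


def count_exit_codes(rows, state="FAILED"):
    """Count how often each ExitCode occurs among rows in the given state."""
    return Counter(row["ExitCode"] for row in rows if row["State"] == state)


# The five failed coadd array tasks from the excerpt in this notebook.
sample_rows = [
    {"JobID": "30458171_0", "State": "FAILED", "ExitCode": "2:0"},
    {"JobID": "30458171_1", "State": "FAILED", "ExitCode": "1:0"},
    {"JobID": "30458171_2", "State": "FAILED", "ExitCode": "2:0"},
    {"JobID": "30458171_3", "State": "FAILED", "ExitCode": "2:0"},
    {"JobID": "30458171_4", "State": "FAILED", "ExitCode": "2:0"},
]

exit_code_counts = count_exit_codes(sample_rows)
for code, n_jobs in exit_code_counts.most_common():
    print(f"{code}: {n_jobs} jobs")
```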


In [53]:
t[(t['job_type']=='coadd') ][:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
str12,str8,str10,str10,str10,str19,str5,str10,float64,str10
30127250,ir-shir1,iris-ip00+,coadd_852+,skylake-h+,2020-10-19T20:10:53,0:0,COMPLETED,42.0,coadd
30458171_0,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:01,2:0,FAILED,0.1,coadd
30458171_1,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:50:49,1:0,FAILED,0.1,coadd
30458171_2,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:02,2:0,FAILED,0.1,coadd
30458171_3,ir-shir1,iris-ip00+,coadd_arr+,skylake-h+,2020-10-28T14:51:02,2:0,FAILED,0.1,coadd


# 2 Calculate total times

Let's calculate some broad estimates for the main runs we will go on to perform.

## 2.1 SXDS VIDEO run

This is the run used for the timing tests.

In [63]:
n_video_sxds_images = 5263 # From ./1_SLurm_factory.ipynb
n_video_sxds_patches = 219 # From ./1_SLurm_factory.ipynb

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_video_sxds_images * mean_processCcd),
    round(n_video_sxds_patches* mean_coadd),
    round(n_video_sxds_patches* mean_photo),
    n_video_sxds_images, n_video_sxds_patches,
    round(
        n_video_sxds_images * mean_processCcd 
        + n_video_sxds_patches* mean_coadd
        + n_video_sxds_patches* mean_photo
    )
))
    


Full processing of input images: 16103.0 cpu hours
Full coadd of all patches: 7196.0 cpu hours
Full photometry pipeline on all patches: 2623.0 cpu hours
Total time for 5263 images and 219 patches: 25922.0 cpu hours
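
The arithmetic behind these figures can be reproduced without the job table. This is a minimal sketch, assuming the per-job means are taken from the "mean per completed job" values printed earlier in this notebook; `estimate_cpu_hours` is a hypothetical helper, not a function defined in the notebook.

```python
# Mean CPU hours per completed job, copied from the outputs printed above.
MEAN_PROCESSCCD = 3.059601323789733
MEAN_COADD = 32.858823529411765
MEAN_PHOTO = 11.976363636363633


def estimate_cpu_hours(n_images, n_patches):
    """Return per-stage and total CPU-hour estimates for a run."""
    process = n_images * MEAN_PROCESSCCD  # one processCcd job per image
    coadd = n_patches * MEAN_COADD        # one coadd job per patch
    photo = n_patches * MEAN_PHOTO        # one photometry job per patch
    return {
        "processCcd": process,
        "coadd": coadd,
        "photoPipe": photo,
        "total": process + coadd + photo,
    }


sxds = estimate_cpu_hours(n_images=5263, n_patches=219)
for stage, hours in sxds.items():
    print(f"{stage}: {round(hours)} cpu hours")
```

Because the means are computed from completed jobs only, the estimate excludes time spent on failed, cancelled or timed-out jobs.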



## 2.2 VHS XMM run

Run over the full overlap of VHS and the HSC PDR2 XMM field.

In [64]:
n_vhs_xmm_images = 2226 # From ../dmu4_XMM/1_Slurm_factory.ipynb
n_vhs_xmm_patches = 6*9*81 # Rough estimate from tract numbers

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_vhs_xmm_images * mean_processCcd),
    round(n_vhs_xmm_patches* mean_coadd),
    round(n_vhs_xmm_patches* mean_photo),
    n_vhs_xmm_images, n_vhs_xmm_patches,
    round(
        n_vhs_xmm_images * mean_processCcd 
        + n_vhs_xmm_patches* mean_coadd
        + n_vhs_xmm_patches* mean_photo
    )
))


Full processing of input images: 6811.0 cpu hours
Full coadd of all patches: 143724.0 cpu hours
Full photometry pipeline on all patches: 52385.0 cpu hours
Total time for 2226 images and 4374 patches: 202920.0 cpu hours



## 2.3 VHS complete run

A first run might not include combination with GRIZY data prior to LSST, but we can simply use the HSC/VISTA SXDS times to estimate here.
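
Written out, the estimate computed in the cell below has the following form. The symbols are shorthand introduced here, not names from the notebook, and the factor of 6 is the assumption stated in the code comment that there are six times fewer images to coadd:

$$
T_{\mathrm{total}} = N_{\mathrm{img}}\,\bar{t}_{\mathrm{processCcd}} + N_{\mathrm{patch}}\left(\frac{\bar{t}_{\mathrm{coadd}}}{6} + \bar{t}_{\mathrm{photo}}\right)
$$

Here $N_{\mathrm{img}}$ and $N_{\mathrm{patch}}$ are the numbers of input images and patches, and each $\bar{t}$ is the mean CPU time per completed job of that type in the SXDS prototype run.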

In [68]:
n_vhs_images = 204996 # From ../dmu1/data/vhs_images_overview_$DATE.fits
n_vhs_patches = 670137 # From ../dmu1/data/vhs_tiles_tracts_patches.fits

mean_processCcd = np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
mean_coadd= np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
mean_photo = np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])

print("""
Full processing of input images: {} cpu hours
Full coadd of all patches: {} cpu hours
Full photometry pipeline on all patches: {} cpu hours
Total time for {} images and {} patches: {} cpu hours
""".format(
    round(n_vhs_images * mean_processCcd),
    round(n_vhs_patches * mean_coadd / 6), # assume 6 times fewer images to coadd
    round(n_vhs_patches * mean_photo), # assume JHK and LSST UGRIZY
    n_vhs_images, n_vhs_patches,
    round(
        n_vhs_images * mean_processCcd
        + n_vhs_patches * mean_coadd / 6
        + n_vhs_patches * mean_photo
    )
))


Full processing of input images: 627206.0 cpu hours
Full coadd of all patches: 3669986.0 cpu hours
Full photometry pipeline on all patches: 8025804.0 cpu hours
Total time for 204996 images and 670137 patches: 12322996.0 cpu hours
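
To put the total in context, CPU hours can be converted into core-years and into an idealised wall-clock duration. This is only a sketch: the `n_cores` value is a hypothetical allocation, not a figure from the notebook, and the conversion assumes perfect scaling with every core busy continuously.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def wall_clock_days(cpu_hours, n_cores):
    """Idealised wall-clock days if n_cores run continuously with perfect scaling."""
    return cpu_hours / n_cores / HOURS_PER_DAY


total_cpu_hours = 12322996  # total printed above for the VHS complete run
n_cores = 1000              # hypothetical allocation, chosen for illustration

core_years = total_cpu_hours / HOURS_PER_YEAR
days = wall_clock_days(total_cpu_hours, n_cores)

print(f"{core_years:.0f} core-years")
print(f"{days:.0f} days on {n_cores} cores")
```

The real requirement is likely to be higher, since the prototype run also spent time on jobs that did not complete (for example, 44387 CPU hours in total on processCcd jobs against 39753 on completed ones).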

