# Timing the SXDS run

In this notebook we investigate the timing and resource requirements of the SXDS prototype run in order to estimate the total computing requirements.

We will work with the file generated on iris by the following command:

```
gstatement -p IRIS-IP005-CPU -u ir-shir1 -s "2020-07-04-00:00:00" -e "2020-10-21-00:00:00" > jobs.lis
```

In [8]:
from astropy.table import Table
import numpy as np

In [5]:
t = Table.read('jobs.lis', format='ascii') #, data_start=2, delimiter=' ')

In [10]:
"I have submitted a total of {} jobs consuming a total of {} cpuhours".format(len(t), np.sum(t['CompHrs']))

'I have submitted a total of 19710 jobs consuming a total of 44065.6 cpuhours'

In [6]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
str12,str8,str10,str10,str10,str19,str5,str10,float64
25933230,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-13T16:18:36,127:0,FAILED,0.1
25940256,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T00:08:41,1:0,FAILED,74.2
25948528,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T14:34:59,0:0,COMPLETED,254.3
26005193,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-16T03:59:49,0:0,TIMEOUT,640.0
26027949,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-17T10:46:53,0:0,TIMEOUT,1536.4


In [13]:
t[-5:]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs
str12,str8,str10,str10,str10,str19,str5,str10,float64
30022262,ir-shir1,iris-ip00+,runPhotoP+,skylake-h+,2020-10-14T20:06:37,0:125,OUT_OF_ME+,1.2
30025206,ir-shir1,iris-ip00+,runPhotoP+,skylake-h+,2020-10-14T22:55:59,0:125,OUT_OF_ME+,5.5
30032850,ir-shir1,iris-ip00+,runPhotoP+,skylake-h+,2020-10-15T06:16:53,1:0,FAILED,36.3
30050650,ir-shir1,iris-ip00+,runPhotoP+,skylake-h+,2020-10-16T18:13:14,0:0,COMPLETED,149.4
30127250,ir-shir1,iris-ip00+,coadd_852+,skylake-h+,2020-10-19T20:10:53,0:0,COMPLETED,42.0


In [14]:
def nameToJobType(name):
    """Take the job name and return the type of pipetask."""
    job_type = 'UNKNOWN'
    if name.startswith('process'):
        job_type = 'processCcd'
    if name.startswith('coadd'):
        job_type = 'coadd'
    if name.startswith('runPhoto'):
        job_type = 'photoPipe'
    return job_type
t['job_type'] = [nameToJobType(n) for n in t['JobName']]

In [16]:
t[:5]

JobID,User,Account,JobName,Partition,End,ExitCode,State,CompHrs,job_type
str12,str8,str10,str10,str10,str19,str5,str10,float64,str10
25933230,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-13T16:18:36,127:0,FAILED,0.1,UNKNOWN
25940256,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T00:08:41,1:0,FAILED,74.2,UNKNOWN
25948528,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-14T14:34:59,0:0,COMPLETED,254.3,UNKNOWN
26005193,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-16T03:59:49,0:0,TIMEOUT,640.0,UNKNOWN
26027949,ir-shir1,iris-ip00+,cpujob,skylake,2020-07-17T10:46:53,0:0,TIMEOUT,1536.4,UNKNOWN
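
The prefix matching above can be checked against job names that appear in the table. The sketch below is a table-driven variant, not the function used in this notebook: `PREFIXES` and `classify` are illustrative names. Note that the accounting output truncates `JobName` to ten characters and marks the cut with a `+`, so any prefix used for matching has to be shorter than that.

```python
# Hypothetical table-driven variant of nameToJobType; same three prefixes.
PREFIXES = {
    'process': 'processCcd',
    'coadd': 'coadd',
    'runPhoto': 'photoPipe',
}


def classify(name):
    """Return the job type for the first matching prefix, else 'UNKNOWN'."""
    for prefix, job_type in PREFIXES.items():
        if name.startswith(prefix):
            return job_type
    return 'UNKNOWN'


# Job names copied from the first and last rows of the table shown above.
samples = ['cpujob', 'runPhotoP+', 'coadd_852+']
labels = [classify(n) for n in samples]
print(labels)
```

The early `cpujob` submissions carry no information about the pipetask in their name, which is why the first five rows are all labelled `UNKNOWN`.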


In [31]:
print("""processCcd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'processCcd'),
np.sum((t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'processCcd') &(t['State'] == 'COMPLETED') ])
)
)


processCcd jobs run on stack images in SXDS

Total number of jobs: 13845
Jobs completed: 8288
mean per job: 1.3292379920548933 cpu hours
mean per completed job: 1.663863416988417 cpu hours
Total time: 18403.3 cpu hours
Total time on completed jobs: 13790.1 cpu hours
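
The same six expressions are evaluated once per job type in this notebook, so they could be factored into a helper. This is a minimal sketch under stated assumptions: `summarise` and the toy data are hypothetical, and the input only needs to support column access by name (a dict of NumPy arrays here; an astropy `Table` should behave the same way through `np.asarray`, but that is not tested in this sketch).

```python
import numpy as np


def summarise(table, job_type):
    """Count jobs and sum/average CompHrs for one job type."""
    is_type = np.asarray(table['job_type']) == job_type
    done = is_type & (np.asarray(table['State']) == 'COMPLETED')
    hours = np.asarray(table['CompHrs'], dtype=float)

    def mean_or_nan(mask):
        # Avoid a NumPy warning when no job matches the selection.
        return float(hours[mask].mean()) if mask.any() else float('nan')

    return {
        'n_jobs': int(is_type.sum()),
        'n_completed': int(done.sum()),
        'mean_hours': mean_or_nan(is_type),
        'mean_hours_completed': mean_or_nan(done),
        'total_hours': float(hours[is_type].sum()),
        'total_hours_completed': float(hours[done].sum()),
    }


# Toy data for illustration only; these are not rows from jobs.lis.
toy = {
    'job_type': np.array(['processCcd', 'processCcd', 'coadd', 'photoPipe']),
    'State': np.array(['COMPLETED', 'FAILED', 'COMPLETED', 'OUT_OF_ME+']),
    'CompHrs': np.array([2.0, 1.0, 42.0, 1.2]),
}
summary = summarise(toy, 'processCcd')
print(summary)
```

Building the two boolean masks once also makes it harder for the six selections to drift out of sync when a cell is copied for a new job type.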



In [32]:
print("""coadd jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'coadd'),
np.sum((t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'coadd') &(t['State'] == 'COMPLETED') ])
)
)

coadd jobs run on stack images in SXDS

Total number of jobs: 1
Jobs completed: 1
mean per job: 42.0 cpu hours
mean per completed job: 42.0 cpu hours
Total time: 42.0 cpu hours
Total time on completed jobs: 42.0 cpu hours



In [35]:
# Most of these are failures from memory shortages at the coadd stage, before I separated them
print("""photoPipe jobs run on stack images in SXDS

Total number of jobs: {}
Jobs completed: {}
mean per job: {} cpu hours
mean per completed job: {} cpu hours
Total time: {} cpu hours
Total time on completed jobs: {} cpu hours
""".format(
np.sum(t['job_type'] == 'photoPipe'),
np.sum((t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.mean(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') ]),
np.sum(t['CompHrs'][(t['job_type'] == 'photoPipe') &(t['State'] == 'COMPLETED') ])
)
)

photoPipe jobs run on stack images in SXDS

Total number of jobs: 5818
Jobs completed: 9
mean per job: 1.1659676864902029 cpu hours
mean per completed job: 19.27777777777778 cpu hours
Total time: 6783.6 cpu hours
Total time on completed jobs: 173.5 cpu hours
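
As a consistency check, the three per-type summaries can be compared with the overall totals printed near the top of the notebook. The sketch below hard-codes the numbers from the outputs above. It assumes every job falls into exactly one of the four labels assigned by `nameToJobType`, so whatever is not accounted for by the three named types is attributed to `UNKNOWN`. The variable names are illustrative.

```python
# Totals from the first summary cell of this notebook.
total_jobs, total_hours = 19710, 44065.6

# (jobs, completed jobs, cpu hours, cpu hours on completed jobs) per type,
# copied from the three summary outputs above.
by_type = {
    'processCcd': (13845, 8288, 18403.3, 13790.1),
    'coadd': (1, 1, 42.0, 42.0),
    'photoPipe': (5818, 9, 6783.6, 173.5),
}

classified_jobs = sum(v[0] for v in by_type.values())
classified_hours = sum(v[2] for v in by_type.values())

# Remainder, attributed to jobs labelled 'UNKNOWN' (assumption stated above).
unknown_jobs = total_jobs - classified_jobs
unknown_hours = round(total_hours - classified_hours, 1)

# Fraction of each type's cpu hours that was spent on completed jobs.
completed_fraction = {k: v[3] / v[2] for k, v in by_type.items()}

print(unknown_jobs, unknown_hours)
for job_type, fraction in completed_fraction.items():
    print("{}: {:.1%} of cpu hours on completed jobs".format(job_type, fraction))
```

Under that assumption the unlabelled jobs are few in number but account for a large share of the cpu hours, so they matter for any extrapolation to the total computing requirement.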

