# LSST DRP Computational Requirements

Created September 2015 to satisfy the requirements of the DRP Computational Budget key performance metric (DR1; see DLP-314, DM-3083, DM-3897). See also LDM-138, LSE-81. This is based on a Python script provided by K-T Lim.

In [2]:
# Disable ShimWarnings caused by AstroPy assuming an older version of IPython
from IPython.utils.shimmodule import ShimWarning
import warnings
warnings.simplefilter("ignore", category=ShimWarning)
from astropy import units as u
from math import pi

We ultimately need to end up with a value in TFLOPS (i.e., tera floating-point operations per second). We measure execution times on particular computers of known clock speed (cycles per second) and convert them to FLOPS as required.

In [3]:
flop = u.def_unit('FLOP') # FLoating point OPeration
cycle = u.def_unit('cycle', 4*0.68*flop) # Conversion rate based on average efficiency of 2011 TOP500
tflops = u.def_unit('TFLOPS', (flop*10**12)/(1*u.second))
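As a rough illustration of the conversion defined above, the same arithmetic can be written in plain Python without astropy. The names `FLOP_PER_CYCLE` and `clock_to_flops` are hypothetical and exist only in this sketch; the factor 4 × 0.68 and the 3.1 GHz clock speed (quoted for `lsst-dev`) are taken from this notebook.

```python
# Plain-Python sketch of the cycle -> FLOP conversion used in this notebook.
# FLOP_PER_CYCLE and clock_to_flops are illustrative names, not part of the notebook.
FLOP_PER_CYCLE = 4 * 0.68  # same factor as in the `cycle` unit definition


def clock_to_flops(clock_hz):
    """Convert a clock speed in cycles per second to FLOP per second."""
    return clock_hz * FLOP_PER_CYCLE


lsst_dev_flops = clock_to_flops(3.1e9)  # 3.1 GHz, the clock speed quoted for lsst-dev
print(lsst_dev_flops / 1e9, "GFLOPS")
```

Under this convention every measured cycle count is simply multiplied by a constant, so any uncertainty in the efficiency factor propagates linearly into the final TFLOPS figure.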

The `Telescope` class describes the instrument we are using to observe. We have only one: LSST itself.

In [4]:
class Telescope(object):
    def __init__(self, nCcds, nFilters, fieldOfView):
        self.nCcds = nCcds
        self.nFilters = nFilters
        self.fieldOfView = fieldOfView
    
lsst = Telescope(189, 6, pi*(1.75*u.degree)**2)

The `Survey` class describes the survey being carried out, including its duration, the number of objects observed, etc. For this exercise, we are using six months of the LSST wide & fast survey.

In [5]:
class Survey(object):
    def __init__(self, duration, visits, epochs, fields, stars, surveyStars, forcedSources, computeTime):
        self.duration = duration            # Survey duration
        self.nVisits = visits               # Total number of visits in the survey
        self.nEpochs = epochs               # Number of visits to any given object
        self.nFields = fields               # Survey area expressed in terms of fields-of-view
        self.nStars = stars                 # Total number of stars
        self.nOutOfPlaneStars = surveyStars # Stars observed out of the galactic plane
        self.nForcedSources = forcedSources # Total number of forced source measurements
        self.computeTime = computeTime      # Wall-clock time available for the DRP run

# Back of envelope calculation of number of fields; prefer Science Requirements
#surveyArea = 2*pi*u.steradian
#fields = surveyArea / lsst.fieldOfView

lsstWideAndFastDR1 = Survey(0.5*u.year,
                            2750000 / 20, # OpSim + 10% margin
                            1056 / 20,
                            2604,         # Science Requirements
                            14706186098,  # LSE-81 D56
                            5141033287,   # LSE-81 E56
                            1.19507E+12,  # LSE-81 H56  
                            9*u.week)
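The commented-out back-of-envelope field count in the cell above can be sketched in plain Python. This is only an illustration under the same assumptions as those comments (a survey area of 2π steradians and a circular field of view of radius 1.75 degrees); all variable names are hypothetical. It gives roughly 2,100 fields, somewhat fewer than the 2604 taken from the Science Requirements and used in the calculation.

```python
from math import pi

# Back-of-envelope field count, mirroring the commented-out lines in the cell above.
# All names here are illustrative; the notebook itself uses 2604 fields.
SQ_DEG_PER_STERADIAN = (180.0 / pi) ** 2

field_of_view_sq_deg = pi * 1.75 ** 2               # circular field, 1.75 degree radius
survey_area_sq_deg = 2 * pi * SQ_DEG_PER_STERADIAN  # 2*pi steradians (half the sky)

approx_fields = survey_area_sq_deg / field_of_view_sq_deg
print(round(approx_fields))
```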

The `Computer` class describes the various computers we have used to make measurements in terms of their clock speed.

In [12]:
class Computer(object):
    def __init__(self, clockSpeed):
        self.clockSpeed = clockSpeed

abe = Computer(2.33e9 * cycle / u.second)      # Machine used for PT 1.1 processing
lsst_dev = Computer(3.1e9 * cycle / u.second)  # lsst-dev.ncsa.illinois.edu
tiger3 = Computer(2.8e9 * cycle / u.second)    # tiger3.princeton.edu
ksk_mbp = Computer(3e9 * cycle / u.second)     # Macbook Pro used for difference imaging tests at Bremerton

Next, we define the scientific and algorithmic approaches being taken.

In [13]:
class CoaddParameters(object):
    def __init__(self, pixelDensityFactor, imageReuse, deblendFactor):
        # Increased pixel density factor for co-adds
        self.pixelDensityFactor = pixelDensityFactor
        # Number of times each image is reused in making deep coadds
        self.imageReuse = imageReuse
        # Deblend factor: extra coadd measurement time due to deblending
        self.deblendFactor = deblendFactor
        
coaddParams = CoaddParameters(1,   # LSE-81 G157 suggests 2; Jim Bosch suggests this is outdated
                              4,   # Per LSE-81 G229
                              0.1) # LDM-138 otherInput_2 B21; no obvious justification

class TemplateParameters(object):
    def __init__(self, templateTypes, templateVersions, templateDepth):
        # Number of different types of templates to be prepared
        self.nTypes = templateTypes
        # Different versions corresponding to e.g. differing airmass
        self.nVersions = templateVersions
        # Number of images being coadded for each template
        self.depth = templateDepth
        
templateParams = TemplateParameters(6, # One per filter
                                    3, # Per LSE-81 G166
                                    5) # Per LSE-81 G228

## Stack processing

We are now ready to quantify the various processing stages the stack goes through. Here, following LDM-138, we include difference imaging, generation of coadds (both deep coadds for measurement and templates for differencing), detection and measurement of sources on coadds, and multi-epoch object characterization. We do *not* include single-frame measurement or relative astrometric or photometric calibration. (Presumably this is because they do not run at the same time on the same hardware, but the reason is not completely clear.)

### Generate difference images

According to [DM-3274](https://jira.lsstcorp.org/browse/DM-3274), difference imaging (with no associated source measurement) was benchmarked as taking 35 seconds per CCD on 2k by 4k CCDs when run on `lsst-dev`. We expect this to scale linearly with pixel count to the 4k by 4k CCDs used by LSST, since it involves applying a kernel to each pixel. We therefore estimate the total time as 70 seconds per CCD.

In [19]:
def calcDiffimCycles(timePerCcd, telescope, survey, computer):
    return timePerCcd * telescope.nCcds * survey.nVisits * computer.clockSpeed

diffimCycles = calcDiffimCycles(70*u.second, lsst, lsstWideAndFastDR1, lsst_dev)
diffimCycles

<Quantity 5.6392875e+18 cycle>
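The linear pixel-count scaling behind the 70 second figure, and the resulting cycle count, can be checked with a short plain-Python sketch. The variable names are hypothetical, and treating "2k" and "4k" as 2048 and 4096 pixels is an assumption made only for illustration; just the ratio of pixel counts matters.

```python
# Sketch of the linear pixel-count scaling and the resulting cycle count.
# Variable names are illustrative; the inputs are the values quoted in this section.
benchmark_time_s = 35.0            # measured per 2k x 4k CCD on lsst-dev
benchmark_pixels = 2048 * 4096     # assumed exact size of a "2k by 4k" CCD
lsst_pixels = 4096 * 4096          # assumed exact size of a "4k by 4k" CCD

time_per_ccd_s = benchmark_time_s * lsst_pixels / benchmark_pixels

n_ccds = 189                       # CCDs in the LSST focal plane, as in `lsst`
n_visits = 2750000 / 20            # visits in the six-month survey
clock_hz = 3.1e9                   # clock speed quoted for lsst-dev

diffim_cycles = time_per_ccd_s * n_ccds * n_visits * clock_hz
print(time_per_ccd_s, diffim_cycles)
```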

### Constructing coadds

We construct two types of coadds: templates for difference imaging and deep coadds for measurement. The time to warp-and-add a CCD image to the coadd is the same for both, but the number and depth of the coadds created are different.

The warp-and-add step is contained within the `makeCoaddTempExp` and `assembleCoadd` tasks. We timed it using HSC engineering data on `tiger3`. Each coadd consisted of 60 calibrated exposures (`calexp`s) from HSC, each 2k by 4k pixels. We assume that the time scales linearly with the number of pixels up to LSST's 4k by 4k CCDs.

In [42]:
nCalexps, hscToLsstScale = 60, 2
makeTempExpTime = u.second * ((190.72 - 10.17) + (189.47 - 10.24))/2 # Measurement repeated to check for consistency
assembleCoaddTime = u.second * ((49.31 - 10.24) + (49.22 - 9.73))/2
coaddCcdTime = hscToLsstScale * (makeTempExpTime + assembleCoaddTime) / nCalexps
coaddCcdTime

<Quantity 7.305666666666666 s>

However, this may not be a reliable estimate of the coaddition time in practice. Previous versions of this calculation assumed that coaddition was similar to difference imaging ("warp + arithmetic"), yet the difference imaging figure above is a factor of ~10 larger. This is likely because differencing also involves PSF matching, an operation which may dominate the run time, and which is required for some types of coaddition but is not included in the figure above.
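To make the "factor of ~10" concrete, the following sketch recomputes the per-CCD coaddition time from the timings above and compares it with the 70 seconds per CCD estimated for difference imaging. The variable names are hypothetical; the numbers are those quoted in this notebook.

```python
# Ratio of the per-CCD difference imaging time to the per-CCD coadd time.
# Names are illustrative; timings are the ones quoted in this notebook.
diffim_time_s = 70.0  # estimated difference imaging time per LSST CCD

make_temp_exp_s = ((190.72 - 10.17) + (189.47 - 10.24)) / 2
assemble_coadd_s = ((49.31 - 10.24) + (49.22 - 9.73)) / 2

# Scale by 2 for the HSC -> LSST pixel count, divide by the 60 calexps per coadd.
coadd_time_s = 2 * (make_temp_exp_s + assemble_coadd_s) / 60

ratio = diffim_time_s / coadd_time_s
print(round(ratio, 1))
```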

### Generate difference templates

In [43]:
def calcTemplateCoadditionCycles(timePerCcd, coaddParams, templateParams,
                                 telescope, survey, computer):
    return (timePerCcd * telescope.nCcds * coaddParams.pixelDensityFactor * 
            survey.nFields * templateParams.nTypes * templateParams.nVersions *
            templateParams.depth * computer.clockSpeed)
    
templateCycles = calcTemplateCoadditionCycles(coaddCcdTime, coaddParams, templateParams,
                                              lsst, lsstWideAndFastDR1, tiger3)
templateCycles

<Quantity 9.060729763679999e+17 cycle>

### Generate deep coadds

The time taken to coadd CCDs was measured as the combined runtime of `makeCoaddTempExp` and `assembleCoadd` run on HSC engineering data on `tiger3.princeton.edu` using version `v11_0_rc2` of the stack.

In [44]:
def calcDeepCoadditionCycles(timePerCcd, coaddParams, telescope, survey, computer):
    # Note that each time an image is reused, it will need to be re-warped, so this effect is multiplicative
    return timePerCcd * telescope.nCcds * computer.clockSpeed * coaddParams.imageReuse * survey.nVisits

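# Note: coaddCcdTime was measured on tiger3, but this call (and the result recorded below) uses abe's clock speed.
deepCoaddCycles = calcDeepCoadditionCycles(coaddCcdTime, coaddParams, lsst, lsstWideAndFastDR1, abe)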
deepCoaddCycles

<Quantity 1.7694580365e+18 cycle>

### Coadd source detection and measurement

Sources will be detected and measured on all coadds generated in the previous step. This will populate the catalog of `CoaddSources`. The detection and measurement are two separate steps, but we can use the same basic logic (with different timing) for both.

In [34]:
def calcCoaddProcessCycles(timePerCcd, coaddParams, telescope, survey, computer):
    perCoadd = (timePerCcd * telescope.nCcds * computer.clockSpeed * coaddParams.pixelDensityFactor *
                (coaddParams.deblendFactor + 1))
    coaddsPerField = telescope.nFilters + 1 # One per filter, plus combined
    return perCoadd * coaddsPerField * survey.nFields

coaddSrcDetCycles = calcCoaddProcessCycles(6.19*u.second, coaddParams, lsst, lsstWideAndFastDR1, abe)
coaddSrcMeasCycles = calcCoaddProcessCycles(10.50*u.second, coaddParams, lsst, lsstWideAndFastDR1, abe)
coaddProcessCycles = coaddSrcDetCycles + coaddSrcMeasCycles
coaddProcessCycles

<Quantity 1.4736887458524e+17 cycle>

### Multi-epoch object characterization

We don't have a great way to quantify multifit performance. However, the combination of multifit and forced source processing will be roughly equivalent to measuring every non-galactic-plane source in every epoch. Note that we are relying on the coadd measurements for in-plane sources.

In [35]:
def calcMultiEpochCycles(timePerObject, survey, computer):
    galacticPlaneObjects = survey.nStars - survey.nOutOfPlaneStars
    galacticPlaneForcedSrcs = galacticPlaneObjects * survey.nEpochs
    outOfPlaneForcedSrcs = survey.nForcedSources - galacticPlaneForcedSrcs
    return outOfPlaneForcedSrcs * timePerObject * computer.clockSpeed

multiEpochCycles = calcMultiEpochCycles(0.096*u.second, lsstWideAndFastDR1, lsst_dev)
multiEpochCycles

<Quantity 2.053529076379699e+20 cycle>

### Alert SDQA

The SDQA pipeline shall provide low-level data collection functionality for science data quality analysis of the Level 1, Level 2, and Calibration Processing pipelines. It has been prototyped as `pipeQA`, and the results here are based on that prototype.

In [36]:
def calcAlertSDQA(timePerVisit, survey, computer):
    return timePerVisit * survey.nVisits * computer.clockSpeed

sdqaCycles = calcAlertSDQA(600*u.second, lsstWideAndFastDR1, abe)
sdqaCycles

<Quantity 1.92225e+17 cycle>

## Totals

In [37]:
totalCycles = diffimCycles + templateCycles + deepCoaddCycles + coaddProcessCycles + multiEpochCycles + sdqaCycles
totalFlop = totalCycles.to(flop)
totalFlop

<Quantity 5.82099910469151e+20 FLOP>
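To see which stage drives this total, the per-stage cycle counts recorded in the outputs above can be tabulated as fractions of the whole. This is a plain-Python sketch with hypothetical names; the numbers are copied from the outputs in this notebook. Multi-epoch object characterization accounts for roughly 96% of the cycles, so the total is dominated by the per-object timing assumed for that stage.

```python
# Per-stage cycle counts copied from the outputs recorded earlier in this notebook.
# The dictionary keys and variable names are illustrative only.
stage_cycles = {
    "difference imaging": 5.6392875e18,
    "template coaddition": 9.060729763679999e17,
    "deep coaddition": 1.7694580365e18,
    "coadd detection and measurement": 1.4736887458524e17,
    "multi-epoch characterization": 2.053529076379699e20,
    "alert SDQA": 1.92225e17,
}

FLOP_PER_CYCLE = 4 * 0.68  # same conversion factor as the `cycle` unit

total_cycles = sum(stage_cycles.values())
total_flop = total_cycles * FLOP_PER_CYCLE

shares = {name: cycles / total_cycles for name, cycles in stage_cycles.items()}
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:35s} {share:6.1%}")
```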

The required FLOPS of the compute hardware depends on the duration of the DRP run. The final total is:

In [38]:
(totalFlop / lsstWideAndFastDR1.computeTime).to(tflops)

<Quantity 106.94075368701336 TFLOPS>

This figure excludes any potential future improvements in algorithmic efficiency.
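Because the requirement is simply the total FLOP count divided by the available wall-clock time, it scales inversely with the length of the DRP run. The sketch below illustrates this in plain Python; `required_tflops` is a hypothetical helper, and the 4.5 and 18 week durations are arbitrary comparison points rather than values from any requirement document.

```python
# Sketch: required sustained TFLOPS as a function of the DRP run duration.
# required_tflops is an illustrative helper, not part of the notebook.
SECONDS_PER_WEEK = 7 * 24 * 3600


def required_tflops(total_flop, compute_weeks):
    """Sustained TFLOPS needed to perform total_flop operations in compute_weeks."""
    return total_flop / (compute_weeks * SECONDS_PER_WEEK) / 1e12


total_flop = 5.82099910469151e20  # total recorded in the Totals section

# 9 weeks is the compute time used in this notebook; the others are for comparison.
for weeks in (4.5, 9, 18):
    print(weeks, round(required_tflops(total_flop, weeks), 1))
```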