\section{Introduction}
As LSST DM moves from construction through commissioning and into operations,
a number of rehearsals have been proposed to help prepare for the execution
of the survey. Specific rehearsals are outlined in \citedsp{LDM-503}, but in
the more comprehensive cases, i.e. the Operations (Ops) Rehearsals (LDM 503-09, LDM 503-11, and
LDM 503-12), the contents of that document alone do not sufficiently outline
the scope, content, actions, and interactions that are being rehearsed.
From the software side, \citedsp{LDM-564}
summarizes the DM software features that should be available and helpfully
identifies those software releases in the context of the rehearsals.
However, the Ops Rehearsals are not simply periods to test hardware and
software systems; they are opportunities to develop and understand operations
processes and to observe the interactions of hardware, software, and personnel.
This document attempts to outline the Ops Rehearsals in greater detail for the following reasons:
\begin{itemize}
\item Depending on the impact, missing or late software features and hardware systems may either
require mitigation (e.g., shims or fake data) or might even be grounds for
postponement.
\item The purpose of an Ops Rehearsal is not to debug
a freshly deployed system, but rather to understand whether that system does
what is needed.
\item The effort to carry out rehearsals will
require coordination of personnel and facilities.
\end{itemize}
Note: This remains a work in progress. The current draft attempts to describe the
process for the first Ops Rehearsal. The subsequent rehearsals have only been
outlined to the extent needed to understand their scope.
\clearpage
\section{LDM 503-09: Operations Rehearsal \#1 for Commissioning}
\underline{Nominal Date:} April 2019

\underline{Original Description:}
\begin{itemize}[topsep=-8pt]
\item Choose TBD weeks during commissioning to simulate.
\item Pick which parts of plan we could rehearse.
\item The commissioning team (via Chuck Claver) suggests Instrument Signal Removal should be the focus
of this (or the next rehearsal).
\end{itemize}
\subsection{An Updated Goal:}
The primary goal is to simulate nominal operations, both daytime and nighttime, for a 2--3 day
period including the daily meeting(s) that would occur among the SciOps and
Data Facility staff. These activities will be accompanied by simulated
observations obtained in a ``sampling'' mode in order to exercise:
\begin{enumerate}[topsep=-8pt]
\item the transfer, archiving and ingestion of raw data
\item offline processing of calibrations and science data
\item curation of the resulting data products
\end{enumerate}
At the time of this initial rehearsal, we do not expect a functioning
observatory system. Instead:
\begin{itemize}[topsep=-8pt]
\item Sampling mode has been used to describe early LSST commissioning
observations where observations occur based on the needs of the commissioning
team. Such observations would typically include some basic set of calibrations
(e.g. biases and flats) followed by nighttime observations that might be used to
test and quantify system performance.
\item A basic set of raw data will be transferred to a mountaintop computer
which will then in turn mimic observations by sending those images from the
summit to NCSA via the long-haul networks.
\item The contents of the dataset will be minimal (no larger than the
calibration and nightly observations that might be expected from ComCam).
The current plan is to draw from a suitable set of test-stand data and
then present these as though they were coming from the telescope.
\item On arrival at the LDF the observations will be ingested into the current
data-backbone which can in turn be used to feed the data through a batch
production service to produce ``calibrations'' and ``reduced science products.''
\item In the context of this rehearsal, the sophistication (or correctness)
of the pipeline(s) is not paramount. What is important is that the raw and
resulting products are tracked and can be superficially examined by LDF and
SciOps team members. The degree of realism would depend on both the data
being sent and availability of working pipeline tasks.
\end{itemize}
\subsection{Pre-Requisites:}
There are three broad categories of pre-requisites that are needed:
\begin{enumerate}[topsep=-8pt]
\item Persons must be identified to fill roles within the rehearsals.
\item Services (or facsimiles) need to exist that will be used/tested throughout the
rehearsal.
\item Elements that would not otherwise
be available in the pre-operations LSST project will be prepared.
\end{enumerate}
\subsubsection{Pre-Requisites: Roles:}
Persons need to be identified who will staff various roles in this
rehearsal. These roles are drawn from those in operations, which come
from three groups: Observatory Operations (ObsOps), Science Operations
(SciOps), and the LSST Data Facility (LDF).
\begin{itemize}[topsep=-8pt]
\item Coordinator: This person acts as an independent executor of the
rehearsal. They would be responsible for executing outside actions that drive the
simulation (e.g., initiating a script that would start data flowing from
summit to LDF).
\item ObsOps, Observing Specialist
\item SciOps, QA Scientist
\item SciOps, Verification and Validation Scientist
\item LDF, Operator
\item LDF, QA Scientist
\item LDF, Admin
\end{itemize}
\subsubsection{Pre-Requisites: Services \& Service Components:}
The broader pieces of the DM system that need to operate for this rehearsal are:
\begin{itemize}[topsep=-8pt]
\item A service must operate at the mountaintop that will send data. This can
be as simple as a shell script that draws from a list of files and transfers
them to NCSA with some cadence.
\item Nominally the long-haul networks need to be available at the time of this rehearsal.
(Note: at the nominal time of this rehearsal we can only expect transfer rates (BASE to LDF)
of order 10 MB/s. Therefore, $\sim$500 raft-scale images should require $\sim$8~hrs. In
addition, outages due to movement of equipment at the base may occur. A copy of test data
should be kept at LDF to mitigate data transfer problems/outages during the rehearsal.)
\item A data backbone endpoint to receive and ingest incoming files must exist.
\item A mechanism must exist to distribute jobs to a compute resource
to process the ``new'' data (Batch Production).
\item A workflow system to configure and launch jobs must exist.
\item Pipeline(s) to process the data must be in place.
\item A minimally functional science platform where raw and processed data products can be
examined by staff must exist.
\end{itemize}
Additional monitoring services for the networks, batch production, compute
resource(s), and data backbone are desirable.
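
As a rough check of the transfer estimate quoted above, the elapsed time $t$ needed to move $N$ images of size $S$ over a link with sustained rate $R$ is $t = NS/R$. The size of a raft-scale image is not stated in this document; the value $S \approx 576$~MB used below is an assumption chosen to be consistent with the quoted figures and should be replaced with the measured size of the test-stand files:
\[
t = \frac{N S}{R} \approx \frac{500 \times 576~\mathrm{MB}}{10~\mathrm{MB\,s^{-1}}} = 28\,800~\mathrm{s} = 8~\mathrm{hr}.
\]
Because $t$ scales inversely with $R$, a sustained rate of 5~MB/s would double the estimate to roughly 16~hr under the same assumptions, which is one reason to keep a copy of the test data at the LDF.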
\subsubsection{Pre-Requisites: Work:}\label{prework}
Work that must be completed prior to the Ops Rehearsal but which is not part of
the DM development:
\begin{itemize}[topsep=-8pt]
\item Generate a mock data set. It must be ingestible with
either the Gen2 or Gen3 Butler. It is not necessary that the generated data
products be curated for a long period.
\item Create a shim service that sends data from summit to LDF.
\item Specify appropriate pipeline(s) that will be run during the rehearsal.
\item Test that services in the preceding section can adequately function
for the purposes of this rehearsal.
\item Allocate compute and storage resources and specify location of stored
products.
\item Establish a location to record information about incidents, problem backlog,
processing and QA summaries (for the initial test this could be as simple as a
set of Confluence pages).
\end{itemize}
\subsection{Rehearsal Outline:}
During normal operations the time at which observing occurs depends on local nighttime
in Chile. This is not necessary for the rehearsal, so data delivery
can be shifted to occur during a normal working day. Prior to the execution of
the rehearsal the work outlined in Section~\ref{prework} must be completed
and tested.
%%\item Pre-checklist: Assemble proto-ops team, all component services
%%from DM are ready with payloads, data sets, configurations, etc.
%%(assumes pre-integration work).
A basic outline of the processes that would occur during this rehearsal
follows:
\begin{enumerate}[topsep=-8pt]
\item (ALL: ObsOps+SciOps+LDF) afternoon stand-up operations meeting
\item (ObsOps) mock transmit nightly calibration exposures to LDF for ingestion
\item (LDF) generate nightly master calibrations
\item (SciOps) select configuration and calibrations
\item (ObsOps) mock transmit nightly science images and ingest
\item (LDF) run science pipeline (e.g. ISR) in offline/batch mode
\item (LDF) generate processing reports for discussion in stand-ups
\item (SciOps) examine input and output data from nightly observations and
processing
\item (SciOps) generate quality reports for discussion in stand-ups
\item (ALL) monitor progress of nightly ``campaigns,'' characterize and assess,
make records of failures, diagnose issues, generate problem backlog
\item (ALL) create mock nightly reports
\item repeat (a total of 3 times)
\end{enumerate}
While there are ``realistic'' components within the outline, much of the focus of
the participants should be on the processes. How is this going to look in
day-to-day operations? If there are problems, what happens? Who gets a
call and when? What information needs to be shared across a geographically
distributed team (and when)? Are the lines of communication between those
groups sufficient?
%\clearpage
%\subsection{Software Products and Services Needed:}
%
%Based on the actions being undertaken in this rehearsal the following services are needed:
%a shim for Camera DAQ and Archiving Services, Data Backbone Services (w/ minimal ability to make data visible to LSP),
%and Batch Production Services with appropriate Pipelines.
%These services are implemented within the following sofware products:
%\begin{itemize}
%\item Batch Production Software (Michelle Butler)
%\item Science Pipeline Software (Robert Lupton)
%\item Supporting Software (e.g., Data Butler): (Jim Bosch)
%\end{itemize}
%Since many of these are in a nascent state, often a shim (or some user-driven actions)
%may be needed to emulate some elements.
%
%\clearpage
\subsection{Assess:}
Among the activities in the Rehearsal Outline it is expected that some might influence the long-term development
within DM. An example is exercising tools and services (e.g. the LSP) with a mind toward operational needs. Another
example is to inform the processes and metrics needed to make decisions about configuration and calibration selection in the context of both production success and production failure.
Example questions that can be asked during the assessment phase are:
\begin{itemize}[topsep=-8pt]
\item Was the rehearsal successful? How long did it take? What anomalies/failure modes were identified, and how did the team cope?
\item What fixes are needed, and on what timescale (e.g., next ops rehearsal, or we are go for commissioning)?
\item What improvements in procedures, documentation, frameworks, systems, and algorithms were identified and at what priority?
\item How is time and effort budgeted to plan and execute priority changes and improvements? How will the next rehearsal be planned?
\end{itemize}
\subsection{Addendum:}
Operations Rehearsal \#1 occurred in May 2019. A short note, DMTN-119, gives a summary
report of its execution.
%when documents are updated this should change to \citedsp{DMTN-119}
\clearpage
%
%
%
\section{LDM 503-11: Operations Rehearsal \#2 for Commissioning}
\underline{Nominal Date:} December 2019 -- February 2020

\underline{Original Description:}\\
More complete commissioning rehearsal:
\begin{itemize}[topsep=-8pt]
\item How do the scientists look at data?
\item How do they provide feedback to the telescope?
\item How do we create calibrations?
\item How do we update calibrations?
\end{itemize}
\subsection{An Updated Goal:}
The primary goal is to rehearse for commissioning operations prior to the ComCam
verification and validation era (including the mini-surveys).
Similar to Ops Rehearsal \#1, we would emulate both daytime and nighttime
operations for 3--5 days, including daily meetings, and exercise data movement and
processing. Additionally, this rehearsal could include: application of software
changes, simulated outages, or non-standard (unprocessable) engineering
observations. If the Auxiliary Telescope Spectrograph has become available,
one alternative or extension that should be considered would be to use AuxTel
data and a pipeline as part of these exercises.
In the current time frame of this rehearsal, we do not expect a functioning
telescope + camera. Instead:
\begin{itemize}[topsep=-8pt]
\item ComCam should be either at the summit or in Tucson and on a test stand.
Therefore, we could use ComCam with a Camera Control System to obtain test-stand
images and send them through the DAQ for archiving and batch processing. In
addition, simulated (or, if ComCam is on the telescope, real) raft-scale data
would be used.
\item If simulated data are used then a set of raw data will be transferred to
a mountaintop computer which will then in turn mimic observations by sending
those images from the summit to NCSA via the long-haul networks.
\item The contents of the dataset would roughly match those expected during
ComCam verification activities. Thus, the dataset would consist of
calibration and nightly observations but might also include engineering data
(that might not be processed with a normal pipeline).
\item On arrival at the LDF the observations will be ingested into the current
data-backbone which can in turn be used to feed the data through a batch
production service to produce ``calibrations'' and ``reduced science products.''
If the DAQ2.5 hardware/software are available then prompt processing could
also be attempted for some observations.
\item Similar to Ops Rehearsal \#1, the sophistication (or correctness)
of the pipelines is not paramount. What is important is that the raw and
resulting data products are tracked and can be superficially examined by LDF and
SciOps team members. The degree of realism would depend on both the data
being sent and availability of working pipeline tasks.
\end{itemize}
\clearpage
%
%
%
\section{LDM 503-12: Operations Rehearsal \#3 for Commissioning}
\underline{Nominal Date:} August 2021

\underline{Original Description:}\\
Dress rehearsal: commissioning starts in April so by this stage we should
be ready to do everything needed.
\subsection{An Updated Goal:}
Here the primary goal is to rehearse for commissioning operations prior to
LSSTCam start of integration and test (i.e. while LSSTCam is on the summit
but not yet integrated on the telescope). Similar to Ops Rehearsal \#2,
we would emulate both daytime and nighttime operations
for 3--5 days, including daily meetings, and exercise data movement and
processing. Additionally, this rehearsal could include: application of software
changes, simulated problems, or non-standard (unprocessable) engineering
observations.
\begin{itemize}[topsep=-8pt]
\item LSSTCam should be at the summit in the clean room on its test stand.
LSSTCam would be exercised with its Camera Control System to obtain test-stand
images and send them through the DAQ for archiving and batch processing. This
could be supplemented with on-sky data from ComCam to exercise pipeline
processing.
\item The contents of the data would roughly match those expected during
LSSTCam verification activities. The on-sky data from ComCam would
not be augmented to ``simulate'' data volume, but real-time processing
could still be exercised.
\item On arrival at the LDF the observations will be ingested into the
data-backbone which can in turn be used to feed the data through a batch
production service to produce calibrations, reduced science products, and
quality assessments.
\item Similar to the other Ops Rehearsals, the sophistication (or correctness)
of the pipelines is not paramount. What is important is that the raw and
resulting data products are tracked and can be examined by LDF and
SciOps team members. The degree of realism would depend on both the data
being sent and availability of working pipeline tasks.
\end{itemize}
\clearpage