\documentclass[11pt,a4paper]{article}
\textwidth 6.5 true in
\textheight 8.5 true in
\oddsidemargin 0 true in
\evensidemargin 0 true in
\topmargin -0.25 true in
\usepackage{authblk}
\usepackage{graphicx}
\begin{document}
\title{NSF Cooperative Agreement - Computing Section}
\author[1]{Lothar~Bauerdick}
\author[2]{Ken~Bloom}
\author[3]{Sridhara~Dasu}
\author[4]{Peter~Elmer}
\author[5]{David~Lange}
\author[6]{Kevin~Lannon}
\author[7]{Salvatore~Rappoccio}
\author[1]{Liz~Sexton-Kennedy}
\author[8]{Frank~Wuerthwein}
\author[8]{Avi~Yagil}
\affil[1]{Fermi National Accelerator Laboratory}
\affil[2]{University of Nebraska -- Lincoln}
\affil[3]{University of Wisconsin -- Madison}
\affil[4]{Princeton University}
\affil[5]{Lawrence Livermore National Laboratory}
\affil[6]{University of Notre Dame}
\affil[7]{State University of New York -- Buffalo}
\affil[8]{University of California -- San Diego}
\renewcommand\Authands{ and }
\maketitle
\newpage
\section{Software and Computing}
\subsection{Introduction}
NSF support for software and computing at US universities has played a crucial
role in the success of the CMS program, contributing to almost all of the
published work thus far, including the discovery of the Higgs boson that
completed the Standard Model of particle physics. Continued NSF support for
software and computing is essential for future successes, including perhaps
the discovery of new physics. In this section we briefly describe the current
status and future plans of the US CMS software and computing project, focusing
on its Tier-2 program, on which US and international CMS physicists rely
to extract physics from the expected large CMS datasets.
The scale of the computing resources needed is directly coupled to the
foreseen output of the detector. The trigger rates have been increased by
an order of magnitude compared to the original goals at the time of the CMS
Computing TDR; the discovery of the Higgs boson at low mass and the continued
investigation of electroweak-scale physics require low thresholds.
During the recently started new phase of data acquisition, i.e., Run 2
(2015-18) and Run 3 (2021-23) of the LHC, about 300 fb$^{-1}$ will be
accumulated. This 300 fb$^{-1}$ dataset represents a two-orders-of-magnitude
increase in data volume compared to the Run-1 (2009-12) dataset: an order of
magnitude increase in integrated luminosity, a factor of three increase in
trigger output rate to maintain access to electroweak-scale physics, and
roughly a factor of three increase in event complexity due to the increased
energy and instantaneous luminosity, which lead to higher event pileup.
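
Schematically, and purely as a restatement of the factors just listed rather
than an independent estimate, the growth in data volume factorizes as
\[
  10\ (\mbox{integrated luminosity}) \times 3\ (\mbox{trigger rate})
  \times 3\ (\mbox{event complexity}) \;\approx\; 100 .
\]
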
As the beam energy has more or less reached its expected maximum, it is
highly likely that from here on analyses will tend to use all of the data
accumulated over time, from the roughly 4 fb$^{-1}$ at 13 TeV accumulated in
2015 to the 300 fb$^{-1}$ expected by the end of Run 3 in 2023.
While the analysis and Monte Carlo production computing needs scale
roughly linearly with integrated luminosity, the reconstruction time
per event for both data and simulation grows roughly exponentially
with instantaneous luminosity.
Unfortunately, Moore's-law scaling of computing capabilities and the evolution
of storage have slowed, from x2 gains every 15-18 months a decade ago to a
modest x2 gain every 4-7 years expected in the future.
As a result, the overall computing needs outpace expected technological
advances and reasonable funding scenarios by a factor of 2-4 in Runs 2 and 3,
with even larger shortfalls projected for the HL-LHC era.
%requiring significant innovations in order to guarantee that the
%physics potential of the data taken is fully realized.
%Innovations in resource utilization,
%adaptation to modern computing architectures, and improved workflows,
%will need to make up for the limitations in raw scaling of resources. We
%briefly describe these evolutionary changes in the offing and project
%how agile computing, utilizing owned, opportunistic and commercial
%cloud resources, with dynamic data management and just-in-time data
%movement over wide-area networks, will work to meet our challenge.
Our vision for meeting the challenge of computing needs that grow beyond
what is affordable via a simple Moore's-law extrapolation is threefold.
First, we will gain efficiencies by being more agile overall in the way we
use the traditional FNAL-based Tier-1 and the seven university-based Tier-2
centers. Second, we will grow the resource pool by integrating, as tightly as
possible, resources at all other US-CMS universities, DOE and NSF
supercomputing centers, and commercial cloud providers. And third, we will
pursue an aggressive R\&D program towards improvements in software
algorithms, data formats, and the procedures by which we analyze the data we
collect and simulate, in order to significantly reduce the computing needs.
The primary goal of our program is to empower physicists at all 48 US-CMS
member institutions to conveniently analyze CMS data.
For the next five years, we propose to capitalize on NSF investments in
networking at US universities, as well as on developments from other NSF
projects [AAA, gWMS, OSG, PRP], to integrate resources at US-CMS member
institutions beyond the Tier-1 and Tier-2 centers much more tightly into the
centrally operated services infrastructure of the CMS experiment.
In this context, personnel funded via the Tier-2 program will become
responsible for maintaining infrastructure in the Science DMZs of US-CMS
member institutions, jointly with IT professionals at those institutions.
The effort funded via this proposal will provide consultation to campus IT
organizations and ultimately maintain services on hardware inside the various
Science DMZs, in order to support the desired integration of campus IT with CMS IT.
%The central services provided by those supported in this project
%will be providing seamless access to distributed world-wide
%resources to all US CMS universities. In this context the
%Tier-2s will become responsible to maintain simple ``headnodes'',
%which provide seamless access or provide conslutation for small
%installations, which could serve as portals to the campus
%resources at all US CMS institutions, democratizing access to
%computing resources.
This proposal focuses on the NSF-supported, university-based computing,
especially for the most diverse, physicist-driven
scientific data analysis activities. A brief look at the computing
R\&D necessary for the HL-LHC phase (2025+), during which
another two orders of magnitude of data volume are expected,
is also included.
\subsection{University Facilities (WBS 2)}
The tiered computing model of the LHC experiments, based on a
distributed infrastructure of regional centers outlined by
the MONARC project {\bf ref}, includes a Tier-0 center at CERN,
one US-based Tier-1 center at FNAL (WBS 1), and seven US
university-based Tier-2 centers (WBS 2) at
{\bf Caltech, Florida, MIT, Nebraska, Purdue, UC San Diego and Wisconsin}.
The resources available at these centers, funded through prior NSF
support, are summarized in
Tables~\ref{compute-resources} and~\ref{storage-resources}.
The original MONARC model for organizing CMS computing resources has slowly
morphed into something much more flexible that makes significantly more use
of networking. The US-CMS Tier-2 program has been the primary driver of this global evolution.
%in a tiered structure is now dated. While we retain Tier-0 at CERN for
%prompt processing, both calibration and reconstruction, the
%functionality at higher tiers is changing. Especially at the Tier-2s,
%we are evolving to a set of institutions providing portions of
%resources, focusing on local expertise, in a continuum infrastructure
%of services. Nevertheless, dedicated facilities at the existing
%Tier-2s to address the core analysis computing needs must be met.
The US-CMS Tier-2 program has thus provided the premier Tier-2 centers globally,
as well as the intellectual leadership to make radical changes. With the current proposal
we continue this transition, focusing on maintaining leadership in the operation of a
production infrastructure, and on engineering the changes needed to do ever more
demanding physics within fixed hardware budgets.
Over the past decade the seven US Tier-2s have, as a collective, introduced much of
what now defines CMS computing: they commissioned the global data transfer matrix
across all Tier-1s and Tier-2s, initiated centralized Monte Carlo production, led the
commissioning of the worldwide grid for CMS, and introduced the concepts of
late-binding workload management, the data federation, dynamic data placement,
the MiniAOD, and a single global queue spanning analysis and production processing.
The advantages of strengthening the existing university sites are multi-fold:
\begin{itemize}
\item Each university group brings unique experience and expertise to bear:
\begin{itemize}
\item MIT: dynamic data management and production operations expertise.
\item Nebraska: Dr. Bockelman et al.\ have brought numerous innovations to CMS middleware.
\item San Diego: connections to SDSC, OSG, and core CMS software developers.
\item Wisconsin: connections to HTCondor and OSG core developers.
\end{itemize}
\item Connection to strong physics groups at the universities:
\begin{itemize}
\item Student and postdoc analysts exercise the system, providing realistic
use cases for tuning and prompt feedback to operations.
\item Faculty collaborations at the university level can bring in additional
campus or cloud resources.
\item Opportunistic computing resources at the universities amount to
$\sim$37\% of the total job slots (Table~\ref{compute-resources}).
\end{itemize}
\item The cost of infrastructure is subsidized at the universities.
\item The cost of personnel is also lower.
\item Friendly competition amongst the sites results in increased productivity.
\end{itemize}
%fkw commented this out because I could not see why it is here:
%CMS computing workflows fall under few broad categories, namely, prompt
%calibration and reconstruction, which is primarily a Tier-0 functionality,
%centrally scheduled reconstruction of LHC data and Monte Carlo, which
%can be distributed world-wide at all tiers, centrally scheduled production
%of simulated data, and chaotic user analysis, which is primarily done at
%Tier-2s and any opportunistically available resources.
%CMS data is organized in several tiers ranging from RAW data acquired
%from the detector or simulated, to RECO format for reconstructed data,
%FEVT combining the two, full set of analysis objects (AOD) and
%compressed AOD, i.e., miniAOD. Ubiquitous access to AOD and miniAOD
%for the analysts is the key enabler for prompt production of physics
%results.
\subsubsection{Current Status}
The seven US Tier-2s rank amongst the top ten providers of the 50 such
CMS centers worldwide. Together they provide about 35\% of the CMS
Tier-2 resources, as outlined in Tables~\ref{compute-resources} and~\ref{storage-resources}.
The compute resources at the Tier-2s serve both production and physicist analysis use cases.
Resource utilization at the US Tier-2s over the past month adds up to about 30,000 jobs
in steady state, split equally between production and analysis workflows.
Together these centers host about 10 PB of centrally produced and user-produced
CMS data on their storage systems.
%The US CMS Tier-2s not only maintain resources, but also provide many additional services.
In this proposal we continue to request 2 FTE at each of the seven Tier-2 institutions.
On average, 1.6 FTE per center are necessary to provide the high-quality service that results in
very high availability, upwards of 95\%. These personnel are responsible
for all aspects of provisioning the resources, from specification through
deployment to operations, taking advantage of local circumstances.
The remaining 0.4 FTE per center (2.8 FTE in total) take
on other roles within the larger US-CMS software and computing project.
CMS benefits from this close connection through innovations and pioneering
deployments initiated at the US Tier-2s, such as the recent work in testing
and commissioning the worldwide CMS data federation using AAA technologies.
\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|}
\hline
& \multicolumn{3}{|c|}{\bf Number of Job Slots} \\ \cline{2-4}
{\bf Tier 2 Center} & {\bf Purchased} & {\bf Opportunistic} & {\bf Total} \\ \hline
Caltech & 5,780 & 384 & 6,164 \\
Florida & 4,126 & 6,068 & 10,194 \\
MIT & 5,200 & 2,056 & 7,256 \\
Nebraska & 5,840 & 3,717 & 9,557 \\
Purdue & 6,636 & 9,581 & 16,217 \\
UCSD & 5,256 & SDSC & 5,256 \\
Wisconsin & 7,860 & 2,713 & 10,573 \\ \hline
{\bf Total} & {\bf 40,698} & {\bf 24,502 +} & {\bf 65,200} \\ \hline
\end{tabular}
\caption[]
{
Usable batch slots currently deployed at the US Tier-2 centers.
The San Diego Supercomputer Center has in the past provided access
to resources via the NSF XRAC allocation process, and has additionally
committed to providing spare capacity on an opportunistic basis in the future.
%, which is in addition to routine
%job slots available at other Tier-2s.
}
\label{compute-resources}
\end{center}
\end{table}
\begin{table}
\begin{center}
\begin{tabular}{|l|c|}
\hline
{\bf Tier 2 Center} & {\bf Storage (TB)} \\ \hline
Caltech & TBD \\
Florida & TBD \\
MIT & TBD \\
Nebraska & TBD \\
Purdue & TBD \\
UCSD & TBD \\
Wisconsin & 2300 \\ \hline
{\bf Total} & {\bf TBD} \\ \hline
\end{tabular}
\caption[]
{
Usable storage space currently deployed at the US Tier-2 centers.
}
\label{storage-resources}
\end{center}
\end{table}
Run-1 analyses still ongoing are based on the Run-1 AOD format, which comprises 220 kB
per event on average.
Average AOD event sizes in Run 2 are roughly 500 kB. The growth is partly due to increased
pileup, and partly by design, to allow the particle-flow reconstruction to be rerun in its
entirety from the AOD.
In addition, the High Level Trigger output rate was increased by x3 from the end of Run 1
to the beginning of Run 2.
To contain costs and speed up physics analysis, US-CMS initiated the introduction of a
refined analysis format called ``MiniAOD''. Implemented in collaboration with CERN, this
format is one tenth the size of the AOD and typically requires between one half and one
fifth of the CPU power to analyze. These reductions are accomplished by storing only more
refined information, at the price of having to remake the MiniAOD in response to improved
calibrations or significant improvements in physics-object definitions. During the fall of
2015 CMS went through two iterations of the MiniAOD, with at least one more to follow
before the Moriond 2016 conference results.
The MiniAOD is expected to satisfy the needs of at least 90\% of all physics analyses;
that is, only a few analyses are expected to require more detailed information not present
in the MiniAOD and thus to need access to the AOD, or perhaps even the RAW format.
In today's operational model, analysis groups process these ``primary data'' to produce
custom Ntuples for their analyses at an event-processing rate of roughly 1-10 Hz. The
custom Ntuples are then typically analyzed at x100 larger event rates. This transformation
from primary data, useful to the entire collaboration, to custom data, useful only to a
handful of analyses, dramatically accelerates science and reduces computing costs, with
the drawback that non-negligible amounts of disk space at the Tier-2s must be provisioned
to host the custom data. US-CMS organizes this by assigning a fixed set of
university groups to each Tier-2 as their host Tier-2 for these custom data samples.
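
As a rough illustration of why this two-step model speeds up analysis turnaround,
the sketch below compares the wall-clock time for one pass over a sample read
directly from primary data at a few Hz per core with one pass over a pre-made
custom Ntuple at a hundred times that rate. The event count, core count, and
per-core rates are assumptions chosen only to match the orders of magnitude
quoted above, not measured CMS numbers.
\begin{verbatim}
# Illustrative turnaround estimate (assumed numbers, not CMS measurements).
n_events     = 1.0e9               # hypothetical analysis sample size
n_cores      = 1000                # hypothetical number of batch slots
rate_primary = 5.0                 # Hz per core reading primary data (1-10 Hz)
rate_ntuple  = 100 * rate_primary  # custom Ntuples analyzed ~x100 faster

def wall_days(events, rate_hz, cores):
    """Wall-clock days to process events at rate_hz per core on cores cores."""
    return events / (rate_hz * cores) / 86400.0

t_primary = wall_days(n_events, rate_primary, n_cores)
t_ntuple  = wall_days(n_events, rate_ntuple, n_cores)
print("primary data pass: %.1f days" % t_primary)
print("custom Ntuple pass: %.2f days" % t_ntuple)
\end{verbatim}
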
\subsubsection{Future Plans}
\noindent{\bf Tier-2 Resources}
To define plans for the future, we use a simple model that scales
the present usage of job slots and storage to future years based on the
expected LHC luminosity plan (see Table~\ref{projection}).
For simplicity, CPU requirements are estimated in units of the number of
batch slots needed, and the storage requirement is determined by the amount of
data anticipated to be accumulated together with an equal amount of simulated data.
In addition to the MiniAOD, we expect to require disk space to accommodate
$\sim$10\% of the data in AOD format, to serve the $\le$10\% of
analyses that require such detail.
In Table~\ref{projection}, we start by showing the actual numbers for the
2015 scale of operations. We have accumulated about 4 fb$^{-1}$
of data from LHC running and produced the equivalent of 10 fb$^{-1}$ of simulated data.
Including the data simulated and accumulated in Run 1, we are currently running about
30,000 jobs on 17 PB of data at the US Tier-2s,
split equally between production and analysis. We show the
average availability of job slots and storage at the US Tier-2s as the
starting point for the scaling. We extrapolate to the out years using the
LHC run plan, which sets the scale for the integrated luminosity.
We assume that production job slot usage scales with the
incremental luminosity expected in a given year, while analysis job slot usage
scales with the luminosity accumulated by that time, because analysts
will be combining all of the 13-TeV data in Run 2 and Run 3.
We limit the storage in AOD format to 10\% of
the data volume. The MiniAOD format is expected to be
used by most analysts, reducing the needed storage volume. A
replication factor of 50\% across all US Tier-2s is assumed to provide
the necessary fast access and redundancy.
Note that in addition to the production data, we also have to provide
access to custom user data. We must also budget disk space for
the CMS upgrade activities, which are in the process of tuning their specifications
for the technical design reports. Wherever possible this data is not replicated,
nor is it assumed to be backed up to tape. Implicit in this operational model
is the assumption that the data can be reproduced easily enough if lost to
disk failures at a Tier-2. Finally, there is a need for some RAW data on disk,
plus staging space in front of the Tier-1 tape archive, as well as at the Tier-2s
for staging data as it is being reconstructed or simulations as they are being
produced.
The 2015 usage does not fully mirror the anticipated usage in the out years,
which is likely to be larger due to the complexity of events at higher luminosities;
however, we project from the current situation for simplicity.
\begin{table}
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|c|c|c|}
\hline
&\multicolumn{2}{|c|}{\bf Luminosity (fb$^{-1}$)}&\multicolumn{2}{|c|}{\bf Job Slots}&\multicolumn{2}{|c|}{\bf Storage (PB)}&\multicolumn{2}{|c|}{\bf Per Tier-2} \\ \cline{2-9}
{\bf Year}&{\bf Incr.}&{\bf Cumul.}&{\bf Prod.}&{\bf Ana.}&{\bf AOD}&{\bf MiniAOD}&{\bf Job Slots}&{\bf Storage (PB)} \\ \hline
2015& 4& 4& 15000& 15000& 3.8& 13& 5814& 2.0 \\ \hline
2016& 36& 40& 18000& 20000& 9& 31& 5429& 5.7 \\ \hline
2017& 40& 80& 20000& 40000& 18& 62& 8571& 11 \\ \hline
2018& 40& 120& 20000& 60000& 27& 92& 11429& 17 \\ \hline
2019& 0& 120& 10000& 60000& 27& 92& 10000& 17 \\ \hline
2020& 0& 120& 10000& 60000& 27& 92& 10000& 17 \\ \hline
2021& 60& 180& 30000& 90000& 40& 138& 17143& 25 \\ \hline
2022& 60& 240& 30000& 120000& 54& 184& 21429& 34 \\ \hline
2023& 60& 300& 30000& 150000& 67& 230& 25714& 42 \\ \hline
\end{tabular}
\caption[]
{
Projection of resources by year, scaled from actual usage in 2015 to the
out years through 2023 based on the LHC luminosity expectations. The
cost of resources satisfying the projected needs, assuming the current
Moore's-law scaling of 10\% to 20\% reduction in hardware costs per year,
averages between \$308K and \$473K per year.
}
\label{projection}
\end{center}
\end{table}
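
The scaling rules described above can be summarized in the short sketch below.
The proportionality constants are illustrative values tuned by hand to reproduce
the rounded numbers of Table~\ref{projection}; they are not independent
requirements or official CMS parameters.
\begin{verbatim}
# Hedged sketch of the resource projection model described in the text.
# Production slots scale with the incremental luminosity of a given year,
# analysis slots with the luminosity accumulated by that year.
SLOTS_PER_INCR_FB  = 500    # production slots per fb^-1 of new data (fitted)
SLOTS_PER_CUMUL_FB = 500    # analysis slots per fb^-1 accumulated (fitted)
AOD_PB_PER_FB      = 0.225  # ~10% of the AOD kept on disk (fitted)
MINIAOD_PB_PER_FB  = 0.77   # MiniAOD with ~50% replication (fitted)
N_TIER2            = 7

def project(incremental_fb, cumulative_fb):
    prod_slots = SLOTS_PER_INCR_FB * incremental_fb
    ana_slots  = SLOTS_PER_CUMUL_FB * cumulative_fb
    storage_pb = (AOD_PB_PER_FB + MINIAOD_PB_PER_FB) * cumulative_fb
    return {"slots_per_tier2":   (prod_slots + ana_slots) / N_TIER2,
            "storage_per_tier2": storage_pb / N_TIER2}

# The 2017 row (40 new, 80 accumulated fb^-1) gives roughly 8600 slots
# and 11 PB per Tier-2, matching the table within rounding.
print(project(40, 80))
\end{verbatim}
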
To estimate the cost of provisioning the needs projected in
Table~\ref{projection}, we start from the recently incurred costs
for hardware purchases: \$118 per job slot and \$56 per TB of storage.
Moore's-law scaling is well known to have slowed down; based
on recent trends, we anticipate a 10\% to 20\% per-year reduction in the cost of
provisioning resources in the out years. With these assumptions
we estimate that the cost of provisioning resources at the US Tier-2s
averages between \$308K and \$473K per year.
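
A minimal sketch of this cost model is given below. The unit costs are the
recent actuals quoted above; the yearly increments used in the example are
illustrative values read off the per-Tier-2 columns of Table~\ref{projection},
and the deflator simply encodes the assumed 10\%-20\% per-year hardware cost
reduction rather than a measured trend.
\begin{verbatim}
# Hedged sketch of the per-year hardware cost estimate.
COST_PER_SLOT = 118.0   # USD per batch slot (recent actual)
COST_PER_TB   = 56.0    # USD per TB of disk (recent actual)

def yearly_cost(extra_slots, extra_tb, years_from_now, annual_reduction):
    """Cost of buying extra_slots and extra_tb n years from now,
    assuming hardware prices fall by annual_reduction per year."""
    deflator = (1.0 - annual_reduction) ** years_from_now
    return (extra_slots * COST_PER_SLOT + extra_tb * COST_PER_TB) * deflator

# Example: the 2016->2017 step for one Tier-2 (about +3100 slots, +5300 TB).
for reduction in (0.10, 0.20):
    print(reduction, round(yearly_cost(3100, 5300, 2, reduction)))
\end{verbatim}
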
The network bandwidth requirement will also scale with the increased data
volume and wide-area distributed computing. Sites are typically
connected via 100 Gbps networks at present, and we expect
multi-hundred-Gbps connections in the coming years. Up to now, networking at the
Tier-2 centers has always been funded from sources outside the US-CMS software and
computing project. We expect this to remain the case and therefore budget no
networking costs in this proposal.
This bottom-up estimate of resource needs and costs is significantly
higher than we would like. In the remaining sections we propose
several ideas to mitigate the growing needs through innovation; the use of
non-traditional resources is the primary future goal. We note that
in recent years significant resources have been available opportunistically
through the Tier-2s, and we anticipate growing this pool in the future.
However, the innovation needed to use such a variety of resources
and environments requires R\&D, and thus continued support of personnel.
We allocate XX\% of the effort of the Tier-2 personnel for this
purpose, in addition to the effort in the research areas described
in later sections.
\noindent{\bf Non-traditional resources}
In the past, resources beyond the traditional Tier-1 and Tier-2 sites were generically lumped into the category of Tier-3. The prevailing model was that these resources were structured more or less as small Tier-2 sites, operated independently of the Tier-2 program by dedicated local administrators. With the rise of significant computing capabilities across U.S. university campuses, and in particular driven by substantial NSF-ACI investment in networking infrastructure across more than 100 campuses nationwide, this model is changing. The Tier-3 site now functions more as a portal: it gives local university researchers access to the larger-scale US-CMS computing resources, and it gives the CMS central computing infrastructure access to campus computing resources. We propose to strengthen and expand this new model for non-traditional computing resources by seamlessly integrating the CMS and campus IT infrastructures in a way that minimizes administrative effort while maximizing flexibility. This approach is based on the NSF-funded ``Pacific Research Platform'' (PRP) and on the CMS Connect effort, which utilizes the OSG's CI-Connect platform.
The centerpiece of our proposal is a single node that will be deployed at each participating institution. This node should be viewed as a ``Tier-3 in a box'': a single, self-contained appliance that, when deployed into a campus Science DMZ, bridges the CMS and campus infrastructures, enabling local CMS researchers to access both sets of resources via a single portal. The node provides interactive data analysis, batch submission, a CVMFS software cache, an XRootD data cache, and an XRootD server to export local data. The HTCondor batch systems implemented on these nodes are all connected to the global CMS HTCondor pool via glideinWMS. Similarly, any university computing resources can be integrated, requiring nothing more than ssh access to a US-CMS account on the local university cluster. Local CMS university groups will thus be empowered to transparently use any and all local resources the university allows them to share, in combination with the entire Tier-1 and Tier-2 system. Official CMS data is cached locally by the node as needed. Private data produced by the local university group is served out to the Tier-1 and Tier-2 system via the XRootD server integrated into the node. Each Tier-2 will also have an XRootD cache in order to transparently cache the private data of any of the local university groups and avoid IO latencies.
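
As an illustration of how a local user might interact with such a node, the
sketch below submits a batch of analysis jobs to the node's local HTCondor
schedd via the HTCondor Python bindings; from there glideinWMS can route the
jobs into the global CMS pool or onto shared campus resources. The wrapper
script name, input URL, resource requests, and job count are placeholders,
not part of any deployed configuration.
\begin{verbatim}
# Hedged sketch: submitting analysis jobs from a "Tier-3 in a box" node.
import htcondor

schedd = htcondor.Schedd()   # local schedd running on the portal node
job = htcondor.Submit({
    "executable":     "run_analysis.sh",        # hypothetical wrapper script
    # read input via the data federation (placeholder path):
    "arguments":      "root://cmsxrootd.fnal.gov//store/...",
    "output":         "ana.$(ClusterId).$(ProcId).out",
    "error":          "ana.$(ClusterId).$(ProcId).err",
    "log":            "ana.$(ClusterId).log",
    "request_cpus":   "1",
    "request_memory": "2000",                    # MB
})
with schedd.transaction() as txn:
    cluster = job.queue(txn, count=50)           # 50 jobs in one cluster
print("Submitted cluster", cluster)
\end{verbatim}
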
Deployment and maintenance of these nodes will be undertaken as a partnership between local campus IT and US-CMS Tier-2 personnel, following the model of the PRP. The PRP deploys single nodes into the Science DMZs of 20 institutions across the West Coast, including the US-CMS institutions UC Davis, UC Santa Barbara, UC Riverside, Caltech, and UC San Diego. These pieces of hardware are maintained collaboratively by the campus IT organizations and the PRP and SDSC teams at UCSD. In this model, local campus IT is responsible for maintaining the hardware and local user accounts, while maintenance of the OS and of the software services (including the necessary OSG and CMSSW elements) is undertaken jointly by US-CMS Tier-2 personnel and local campus IT. Managing the OS, the US-CMS and OSG services, and the local configurations via a central Puppet infrastructure keeps the effort beyond the initial deployment minimal.
We are proposing to scale out the deployment and operation of this model across the US to as many US-CMS institutions as possible, focusing on the 25 institutions that have received Science DMZ funding from NSF-ACI since 2012. The hardware costs, as well as the human effort to deploy and operate this system, will be borne by the Tier-2 portion of this proposal. At a cost of $\sim$\$10,000 per Tier-3 in a box, this is a modest fraction of the total Tier-2 hardware budget across the seven Tier-2s and the five years of this proposal. We fully understand that the above model will not be appropriate for all collaborating institutions within US-CMS. We thus augment it with an additional hosted service, CMS Connect, built on the OSG-Connect/CI-Connect model pioneered by the University of Chicago OSG/ATLAS group. This service will provide functionality identical to the Tier-3 in a box for institutions that lack either the appropriate network connectivity or a local IT organization capable of and willing to collaborate on the hardware and user-account maintenance. There will be only a single instance of this ``CMS Connect'' infrastructure for all of these remaining groups. University groups will generally be better served by the more flexible and customizable Tier-3-in-a-box approach, but the combination of the two approaches ensures that all groups will be served.
Finally, we will fully integrate cloud services access into this
infrastructure in such a way that local university groups can use local
funds to purchase cloud resources to augment their access to
computing and thus accelerate their science. We expect to
collaborate on this functionality with the HEPCloud project at
FNAL as well as with the Open Science Grid.
In addition to all of the above functionality geared towards data
analysis, we propose to also integrate supercomputing resources at DOE-
and NSF-funded national facilities, mostly for the purposes of
simulation and reconstruction, i.e., the production of the official CMS
datasets. Again, we expect to collaborate heavily with HEPCloud and
OSG on the detailed access mechanisms and policies. At this point,
December 2015, HEPCloud is focused on AWS, while OSG is working with
Comet (NSF) and Cori (DOE) to understand the technical, operational,
and security processes for using these supercomputers via OSG
interfaces.
\noindent{\bf Facilities Support Personnel}
Two persons at each facility are necessary to provide full coverage.
However, recent experience indicates that about 30-50\% of their
effort can be freed up for other work. Many of the most effective
people involved in CMS computing are former HEP physicists who have
become experts in computing; they are able to provide wide-ranging
expertise in physics software development. The additional services we
expect Tier-2 personnel to provide are in the following areas:
\begin{itemize}
\item Support for non-Tier-2 university portals to the CMS computing cloud.
\begin{itemize}
\item We expect each Tier-2 to support about seven universities in its region.
\end{itemize}
\item Computing services for the CMS upgrades and research to address future needs.
\begin{itemize}
\item Development of the simulation programs for the upgraded detectors.
\item Production of simulated data for the upgraded detectors.
\item Participation in computing research.
\item Participation in DIANA/HEP and other community-wide computing projects for the future.
\end{itemize}
\end{itemize}
\subsection{Operations (WBS 3)}
In addition to operating the Tier-2 facilities, personnel supported by this
project contribute to the operations of the distributed computing system of
the CMS experiment. The tasks performed by these staff members support the
efficient processing of data and successful execution of both production
and analysis computing jobs.
\subsubsection{Current Status}
US-CMS personnel fill a variety of roles in CMS computing operations.
MIT staff support Tier-0 operations for the experiment, overseeing the
day-to-day operation of the facility, which is of critical importance. Other
MIT personnel play leading roles in
operating the experiment's data transfer system and providing support for
the distributed grid infrastructure. UCSD maintains the CMS job submission
infrastructure. Nebraska provides support for AAA operations and for
network performance reliability. Johns Hopkins supports the operation of
the Frontier system that provides run conditions and other configuration
information for reconstruction and analysis jobs running on the distributed
infrastructure. Florida has taken responsibility for software distribution
throughout the grid sites of the experiment.
\subsubsection{Future Plans}
All of these activities are expected to continue in the coming years, as
they will always be necessary to the operation of the experiment. They
will become even more critical to the success of CMS as the number of sites
(including opportunistic sites) grows and highly distributed storage access
over the WAN using AAA increases. Additional operations support for smooth
operation of U.S. university portals (Tier-3-in-a-box) and efficient
harnessing of opportunistic resources is also anticipated.
\subsection{Computing Infrastructure and Services (WBS 4)}
CMS, as a global experiment, depends on a variety of computing infrastructure services (CIS), several of which
have been long-term US-CMS commitments. In particular, US-CMS has traditionally led in the areas
of data management and centralized production.
Meeting two of the three goals outlined in the introduction, namely increased efficiency
across the Tier-1 and Tier-2 centers and the integration of additional resources outside
those tiers, requires the CIS to become substantially more agile.
During LS1, substantial improvements towards a more agile data management infrastructure
were made by creating a global XRootD data federation and by
developing and deploying Dynamic Data Placement and Management (DDM).
NSF-funded effort led both of these: data management and workflow
management development is done at Cornell, XRootD development at
UCSD, and DDM development at MIT. Additional
support was provided by NSF through the ``Any Data, Anytime, Anywhere'' (AAA) project to Nebraska, UCSD,
and Wisconsin.
In the following we briefly summarize these recent achievements and then describe the
future work to be done within the present proposal.
\subsubsection{Current Status}
\noindent{\bf Dynamic Data Placement and Management (DDM)}
Underpinning the computing infrastructure in Run 1 was a data
distribution mechanism implemented with the PhEDEx software. The workflow
involved operators moving large chunks of data on command down the
hierarchical grid: after initial calibration and reconstruction
at the Tier-0, the raw data were moved to the Tier-1s for archiving and
reprocessed as necessary. The (re)reconstructed data from the Tier-1s were
further processed to obtain the lower-volume Analysis Object Data (AOD)
versions, copies of which were transferred to Tier-2s and placed on disk
storage for random access by chaotic analysis workflows. Similarly,
the Monte Carlo (MC) simulation data were produced at Tier-2s, then
aggregated, reconstructed, and archived at Tier-1s. The MC AODs were
then placed on command by the operators at various Tier-2s. The net
result was multiple copies of data placed statically at
facilities around the globe, with much of the disk
volume occupied by rarely used data.
The DDM software was developed and
commissioned during LS1 to address these shortcomings. DDM is now
deployed at all tiers to automatically prune the unused but archived
data using well-defined policies. For example, the archived full
event format, i.e., raw plus reconstructed quantities (FEVT), is pruned from
disk to keep sufficient space for the Tier-0 and Tier-1 reconstruction
workflows to execute smoothly. Most importantly, we are able to keep at
least one copy of all AOD on disk somewhere in the CMS data
federation, and to replicate popular data multiple times as needed.
DDM uses PhEDEx to manage the actual transfers. It thus replaces the decisions made by human operators during
Run 1 with algorithms based on dataset ``popularity'', i.e., usage.
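
To make the popularity-driven policy concrete, the toy sketch below derives a
target replica count from recent accesses and frees the least popular,
already-archived copies first when reconstruction workflows need space. It is a
schematic illustration of the kind of rules DDM applies, not the actual DDM
code; all dataset names and thresholds are invented.
\begin{verbatim}
# Toy popularity-based placement policy (illustrative only, not DDM itself).
def target_replicas(accesses_last_month, min_copies=1, max_copies=4):
    """More accesses -> more disk replicas; always keep at least one copy."""
    return min(max_copies, min_copies + accesses_last_month // 100)

def prune_candidates(datasets, disk_free_tb, disk_needed_tb):
    """Pick tape-archived datasets to remove from disk, least popular first,
    until enough space is freed for upcoming reconstruction workflows."""
    to_free, freed = [], 0.0
    for ds in sorted(datasets, key=lambda d: d["accesses"]):
        if freed >= disk_needed_tb - disk_free_tb:
            break
        if ds["on_tape"] and ds["replicas"] > target_replicas(ds["accesses"]):
            to_free.append(ds["name"])
            freed += ds["size_tb"]
    return to_free

example = [
    {"name": "FEVT_RunX", "size_tb": 300, "accesses": 2,
     "replicas": 2, "on_tape": True},
    {"name": "AOD_RunX", "size_tb": 80, "accesses": 450,
     "replicas": 2, "on_tape": True},
]
print(prune_candidates(example, disk_free_tb=100, disk_needed_tb=350))
\end{verbatim}
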
\noindent{\bf CMS Data Federation (AAA)}
The CMS data federation was built under the auspices of the AAA project to provide
seamless, international-scale data access. AAA removes the requirement that storage and
processing resources be co-located. The infrastructure is transparent, in that
users have the same experience whether the data they analyze are
halfway around the world or in the room next door. It is reliable, in
that end users rarely see a data-access failure when they run their
application. It enables greater access to the data, in that users no
longer have the burden of purchasing and operating complex disk
systems. In fact, any data can be accessed anytime, from anywhere with
an internet connection. Key to the success of AAA is the improved
wide-area network access due to enhancements made to our dedicated LHC
network.
AAA is made possible by XRootD software, which allows the creation of
data federations. A data federation serves a global namespace via a
tree of XRootD servers.
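
In practice, jobs reach the federation simply by opening files through an
XRootD URL; the minimal PyROOT sketch below illustrates the access pattern.
The redirector host shown is the commonly used US entry point and the
\texttt{/store/...} path is a placeholder, both given only for illustration.
\begin{verbatim}
# Hedged sketch: reading a CMS file through the XRootD data federation.
import ROOT

url = "root://cmsxrootd.fnal.gov//store/..."   # placeholder federated path
f = ROOT.TFile.Open(url)            # client is redirected to a hosting site
if f and not f.IsZombie():
    events = f.Get("Events")        # CMS EDM files expose an "Events" tree
    print("entries:", events.GetEntries())
    f.Close()
\end{verbatim}
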
%The leaves of this tree are referred to as
%data sources, as they serve data from the local storage systems.
%Each storage system is independent of the others, allowing for a broad
%range of implementations and groups to participate in the federation
%as long as they expose an agreed-upon namespace through the XRootd
%software. The non-leaf nodes have no storage, but may redirect client
%applications to a subscribed data source that has the requested file.
%Each host is subscribed to at most one redirector, called a manager;
%loops are disallowed. If the requested file is not present on a server
%subscribed to the redirector, then the client will be redirected to
%the current host's manager. The manager continues the process until
%either a source is found or the client is at the root of the tree. An
%application may thus be redirected to any host in the federation,
%irrespective of the branch point it initially accesses.
The CMS data federation is now fully deployed across all tiers of the CMS
computing infrastructure. Easy access to this data federation across
the wide-area network is democratizing the computing abilities of
university groups across the world. Local campus clusters controlled by
non-CMS entities are easily integrated into the CMS computing
environment, and temporary access to large dedicated resources can be
purchased from commercial clouds or obtained from national or campus
research facilities.
The main advantages of AAA relative to DDM are that AAA fully supports partial file reads
(a typical analysis job accesses less than 10\% of the data in each file) and that it eases
the integration of non-traditional resources. The drawback of AAA as deployed today is that
data still need to be placed somewhere first, and the replication of datasets needs to be
controlled according to demand.
The two systems, AAA and DDM, thus complement each other well.
\noindent{\bf Improvements to CMS Workflows}
The main objectives of the workflow management middleware are to
process data as quickly as possible, to maintain a uniform load across all
resource sites, and to enable fast recovery in case of a site service
interruption, e.g., by relocating jobs to an alternate site, all while
keeping track of the integrity of the combined dataset.
During Run 1 each tier was used for only a small subset of all workflows,
which led to inefficiencies and delays in processing due to this inflexibility.
During LS1 we dramatically expanded which workflows can be run where, making the overall system
much more flexible. In addition, all resource usage, both distributed analysis and centralized production,
is now scheduled via a single global HTCondor pool, allowing the relative prioritization of different
activities.
\noindent{\bf Resources beyond the Tiered System}
CMS is in an exploratory phase of smoothly integrating opportunistic
resources for production and routine use. National research computing
sites such as NERSC and SDSC have large resources, but often require
additional work to adapt our software suite to operate smoothly
there. Some access restrictions are worked around with user-level
code, e.g., CVMFS through Parrot, and Docker/Shifter containers
on Cray supercomputers. Commercial clouds such as AWS have also been
used, but their cost implications place constraints on workflows.
Campus clusters not purchased with US-CMS funds are accessible either
generically through their OSG connection, where one exists, or via the
university portal nodes described above.
%, or by placing
%suitable head-nodes at the participating university CMS group
%facilities. This latter use is of particularly important for analysis
%groups at access their home resources seamlessly processing data from
%the central CMS data federation using centrally supported code and
%conditions repositories using technologies such as CVMFS and caching
%SQUIDs.
The CMS High Level Trigger farm (HLT) has been integrated into offline computing operations
via its OpenStack Cloud interface. HLT resources are now available for general processing anytime the
DAQ is not running.
%Final stages of physics analysis often involve workflows that are not
%centrally managed CMSSW framework jobs. Technologies such as CMS
%Connect are able to use campus grid and department level computer
%clusters to bring additional opportunistic uses for these cases.
\subsubsection{Future Plans}
The PhEDEx data management system has served CMS extremely well for more than ten years.
However, it is ill-equipped for the more agile needs of the future. For example, its internal mechanism for
selecting the source of a given transfer is much less agile than the BitTorrent-like multi-source client
used in XRootD. Its internal back-off mechanism for handling transfer failures leads to long transfer
tails, and it is generally very difficult, if not impossible, to fill modern high-bandwidth network links using PhEDEx.
A significant redesign of PhEDEx is necessary.
Such a redesign also provides an opportunity to consider discontinuing poorly supported protocols such as SRM,
and to eliminate duplication of protocols, e.g., by replacing GridFTP with XRootD, as the latter is required anyway
for agile operations.
The AAA toolset includes an XRootD proxy cache that is not yet used in CMS.
Initial tests indicate that the cache performs exceptionally well. As we gain more experience with this technology,
we may transition a sizable fraction of the US-CMS disk space at Tier-2s and Tier-3s into XRootD proxy caches
to gain additional efficiencies.
The entire end-to-end centralized data production process still has far too many aspects that require intensive human effort.
Significantly more automation is needed to make the overall system both more agile and more efficient. Today it is still not uncommon
for some resources, especially in US-CMS, to be oversubscribed while others, especially elsewhere in the world, remain unused.
Enforcing dynamically changing processing priorities is also still very difficult due to multiple layers of queuing and workflow
restrictions. Significant efficiencies are yet to be gained here.
Finally, the commissioning of resources beyond the tiered system is still at a very early stage and, while it has large potential
to provide additional resources, will still require significant effort.
\subsection{Software and Support (WBS 5)}
Multicore computing systems have become ubiquitous in the past
decade. However, efficient use of the available resources, especially
memory volume and bandwidth, has required adapting our software to a
suitable multithreaded framework. Keeping up with the technology
evolution in the market requires continuous investigation and continued
development of the CMS framework and utility software. The Cornell, Princeton, and
UCSD groups are engaged with central CMS in this essential software
development and support.
\subsubsection{Current Status}
A systematic effort to make the core of CMSSW thread-safe culminated in its
successful deployment over the past year. The CMS event display
has also been reworked to run conveniently on a variety of platforms.
\noindent{\bf Development of MiniAOD}
Physics analysis often uses a much smaller portion of the
reconstructed data than is available. While the raw data acquired from
CMS amount to about 1 MB per event, the reconstructed objects typically more than
double that size. The AOD defined for Run 1 was successful,
but its rather lax design results in a 400 kB size per
event, which, when scaled to 300 fb$^{-1}$, leads to an unaffordable
data volume. Furthermore, the event-processing rate determines the time to
produce physics results, so a small event size is also
beneficial for the computational load.
US CMS personnel supported by the NSF played a key role in the development
of the MiniAOD. Careful pruning of unused object collections, and packing of the
remaining quantities into appropriately sized containers, resulted in the MiniAOD,
which is less than 50 kB per event. The MiniAOD is now envisioned as the main data format
used by the bulk of CMS analysts, while niche use cases involving the
original AOD format will be supported as needed; in rare cases FEVT
access may also be required. As the MiniAOD improves, we anticipate that AOD
replica counts will become small.
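
The packing idea can be illustrated with the toy sketch below, which stores
per-candidate kinematics in reduced-precision containers and compares the
footprint with naive double precision. The variable list, candidate
multiplicity, and chosen precisions are illustrative assumptions, not the
actual MiniAOD schema.
\begin{verbatim}
# Toy illustration of packing into appropriately sized containers
# (not the real MiniAOD schema).
import numpy as np

n = 1000                                  # hypothetical candidates in one event
rng = np.random.default_rng(0)
pt  = 1.0 + rng.exponential(10.0, n)
eta = rng.uniform(-5.0, 5.0, n)
phi = rng.uniform(-np.pi, np.pi, n)

full   = np.stack([pt, eta, phi]).astype(np.float64)  # naive: 8 bytes/value
packed = np.stack([pt, eta, phi]).astype(np.float16)  # packed: 2 bytes/value

print("full  :", full.nbytes,   "bytes per event")
print("packed:", packed.nbytes, "bytes per event")
# Precision loss is at the per-mille level, far below detector resolution.
print("max relative pt error:",
      float(np.max(np.abs(packed[0] - full[0]) / full[0])))
\end{verbatim}
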
\subsubsection{Future Plans}
Framework development and support will continue throughout Run 2 and Run 3.
The personnel supported under this WBS will keep the multithreaded framework
abreast of evolving processor technologies, continue to iterate on the MiniAOD
format as calibrations and physics-object definitions improve, and maintain the
associated utilities and user-facing tools.
\subsection{Technologies and Upgrade R\&D (WBS 6)}
The main thrust of the project's R\&D effort is to control the
rate of growth of the computing required, and thus the cost incurred, and to retain
flexibility with regard to possible future changes in computer architecture.
There are fundamentally two largely independent effects that drive cost:
those that scale with event complexity, i.e., the average pile-up (PU) per event,
and those that scale with integrated luminosity, i.e., the total data volume.
The cost of event reconstruction is driven by the occupancy of the tracker.
Higher instantaneous luminosity leads to higher pile-up, thus higher
occupancy, and with it a near-exponential growth in the CPU time per event
spent in the pattern-recognition step of the track reconstruction.
For example, an increase in the average number of PU interactions from 20 to 30
was measured to result in a x3 increase in the event reconstruction time with the current CMS software release.
This range of PU matches the expected running conditions during 2015/16.
To set the scale, the time to reconstruct a 13 TeV top-pair event exceeds the time to simulate
the same event at an average PU of $\sim$25, which we expect to reach early in 2016.
Any speed-up of the reconstruction software, especially of the tracking pattern recognition,
thus translates directly into computing cost savings.
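
A simple parametrization consistent with the measurement quoted above, offered
here only to illustrate the scaling, writes the per-event reconstruction time as
\[
  t_{\mathrm{reco}}(\mathrm{PU}) \approx t_0\, e^{k\,\mathrm{PU}}, \qquad
  k \simeq \frac{\ln 3}{10} \approx 0.11 \ \mbox{per pile-up interaction},
\]
so that each additional ten pile-up interactions cost roughly another factor of
three in reconstruction time if nothing else changes.
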
Analysis, MC production, and data reprocessing all scale roughly linearly with the total integrated luminosity, or total
data volume, leading to the x30 increase from the beginning of Run 2 in 2015 to the end of Run 3 in 2023 mentioned previously.
The software and computing R\&D program for the next 5 years is geared towards two timelines. It is meant to
engage in fundamental R\&D towards solving the challenges of scaling out computing for the High Luminosity LHC era
(2025-2035) but also to provide near term improvements that can be put into production during LS2 (2019/20)
in order to address the challenges of Run 3 (2021-23). Given limited resources in personnel, our strategy is to focus
on the long term with an eye towards adopting lessons learned in this process to address the Run 3 challenges.
High luminosity is the more demanding route to exploration. Unlike higher energy,
it brings larger event sizes and greater event complexity from the many overlapping
pile-up interactions, and the long running period required drives up the total
dataset size, impacting CPU, storage, and network resources alike. Increases of
this size (or larger) are unaffordable and must be mitigated.
\subsubsection{Current Status}
During LS1, US-CMS drove multiple developments, all focused on overall cost reductions in computing.
We led the algorithmic improvements and code optimizations in the pattern-recognition software that substantially reduced the reconstruction
time per event at an average PU of $\sim$30. We instigated the introduction of the MiniAOD, reducing the event size by x10
and substantially reducing the average analysis processing time per event.
We prepared the core framework for efficient multithreaded processing.
And we transitioned the computing infrastructure and services on which CMS depends towards a suite of services that are much more agile.
US-CMS was the primary driver of all of these developments, and thus has a solid track record of innovation that enables more science
for a fixed hardware investment. We propose to continue being an innovation leader within global CMS.
\subsubsection{Future Plans}
{\bf Reconstruction Software:}
Cornell, Princeton, and UCSD are collaborating on an ambitious R\&D program to redesign the core Kalman-filter tracking algorithms of CMS for parallel architectures. While the bulk of this R\&D is funded at the level of 3 FTE for 3 years
via an independent NSF grant {\bf [cite the PIF]}, the present proposal includes a modest effort of XX FTE focused on deriving short-term
benefits from that independently funded, long-term R\&D agenda.
Deriving short-term benefits is particularly interesting in light of the planned roll-out of large supercomputers at both DOE and NSF
facilities based on the next two generations of Intel MIC processors. For example, Cori Phase 2 at NERSC is expected to
include 9,300 Intel Knights Landing processors by 2017, and Aurora at ANL is expected to deploy 50,000 Intel Knights Hill processors in 2018.
Similar plans exist within the NSF for the Stampede supercomputer at TACC.
The degree to which CMS can benefit from these large-scale resources
for its core processing needs in Run 2 and Run 3
will depend crucially on successfully transitioning lessons learned from the externally funded
long-term R\&D program into production. This transition is within the scope of the present proposal.
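
To illustrate the data-parallel idea behind this redesign, the toy sketch
below applies one scalar Kalman-filter update to many track candidates in a
single vectorized pass, the kind of operation that maps naturally onto wide
vector units. It is a schematic toy with invented numbers, not CMS tracking
code or the algorithm under development.
\begin{verbatim}
# Toy: one vectorized Kalman update over many track candidates at once.
import numpy as np

n_tracks = 10000
rng = np.random.default_rng(1)

x = rng.normal(0.0, 1.0, n_tracks)      # current state estimate per candidate
P = np.full(n_tracks, 1.0)              # state variance per candidate
R = 0.05                                # measurement variance (assumed identical)
z = x + rng.normal(0.0, np.sqrt(R), n_tracks)   # simulated measurements

# Scalar Kalman gain and update, evaluated for all candidates in one pass.
K     = P / (P + R)
x_new = x + K * (z - x)
P_new = (1.0 - K) * P

print("mean updated variance:", P_new.mean())   # shrinks from 1.0 towards ~R
\end{verbatim}
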
{\bf R\&D towards a new data analysis model:}
For the HL-LHC era, CMS must contemplate a fundamental shift in the boundary between ``primary data'' and ``custom data''.
Already in Run 1, the custom-data Ntuples were typically analyzed at event rates ranging from 100 Hz to 10 kHz,
and Ntuple analysis is even today in many cases IO-limited rather than CPU-limited. In contrast, the production of these custom Ntuples is almost always CPU-limited. Even for the MiniAOD of Run 2, typical event-processing rates reach little more than a few Hz.
The trade-off at work here is flexibility versus speed. A data format for the entire CMS collaboration must be flexible in content for two reasons: first, it needs to satisfy many types of data analyses, and second, it must be ``forward compatible'', i.e., a MiniAOD produced today
must still be useful a few months from now, when the state of the art in physics-object definitions, jet energy calibrations, and so on has
changed to incorporate a variety of improvements. The R\&D questions here include: Can we speed up the MiniAOD to the kinds of
event-processing rates typical of custom Ntuples? If we can, what does it mean for the Tier-2 infrastructure to support IO-limited
jobs at large scale? For example, do we need specialized batch-system entry points for IO-limited jobs, for which disks need to be co-scheduled?
Can we reuse industry-standard products for IO-limited jobs, or is this impossible because we would lose the benefits of partial file reads in ROOT IO?
\subsection{Coordination with CMS (WBS 7)}
US CMS S\&C personnel are well integrated in the CMS-wide coordination
efforts and hold management positions.
\subsubsection{Current Status}
Current support under this category includes S\&C coordination at
Princeton, reconstruction coordination at Wisconsin and UCSD, and user
support at UCSD.
\subsubsection{Future Plans}
It is anticipated that approximately a third of the management positions
in CMS software and computing will be held by US personnel; the effort of those among them
who are supported by NSF computing funds will have to be covered by this project. This need
is likely to remain approximately constant.
\end{document}