
Commit 7a3caf4

Minor fixes
1 parent 0c877a6 commit 7a3caf4

1 file changed: +6 −6 lines changed

body.tex

Lines changed: 6 additions & 6 deletions
@@ -121,7 +121,7 @@ \section{Architecture design} \label{sec:arch}
  New Data Butler backends were implemented during the PoC, including the S3 Datastore and the PostgreSQL Registry.

  The Butler datastore is located in an S3 bucket and follows the same hierarchical structure as the POSIX datastore.
- Consumed and produced datasets are read and written directly from S3 as bytes, whenever possible, and only downloaded to temporary files for objects whose formatters do not support serialization.
+ Consumed and produced datasets are read and written directly from S3 as bytes, whenever possible, and only downloaded to temporary files for objects whose formatters do not support streaming.
  Since the directory structure is preserved by the S3 datastore, the entire data repository is trivially transferable between the cloud and a local filesystem.

  The Butler registry is an RDS PostgreSQL database that keeps track of all LSST science files.
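For illustration only, a minimal boto3 sketch of the two access paths described in this hunk (direct in-memory bytes versus a temporary-file fallback); the bucket name, key, and streaming check are hypothetical placeholders, not the Butler's actual API:

```python
import tempfile

import boto3

s3 = boto3.client("s3")
BUCKET = "example-butler-datastore"  # hypothetical bucket name
KEY = "repo/raw/visit-00001.fits"    # hypothetical object key


def read_dataset(formatter_supports_streaming: bool) -> bytes:
    """Read an S3 object directly as bytes, or via a temporary file
    when the (hypothetical) formatter cannot work from a stream."""
    if formatter_supports_streaming:
        # Preferred path: read the object straight into memory as bytes.
        return s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    # Fallback path: download to a temporary file first, then read it back.
    with tempfile.NamedTemporaryFile() as tmp:
        s3.download_fileobj(BUCKET, KEY, tmp)
        tmp.seek(0)
        return tmp.read()
```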
@@ -180,7 +180,7 @@ \section{Execution results of the tract-sized DRP workflow}
  \label{sec:results}

  After successful execution with the \texttt{ci\_hsc} test dataset, we scaled up the run to one full tract of the HSC-RC2 dataset, as defined in \jira{DM-11345}.
- The full HSC-RC2 input repository contains 108108 objects and totals $\sim$1.5TB, including 432 raw visits in 3 tracts and $\sim$0.7TB of calibration data.
+ The full HSC-RC2 input repository contains 108108 S3 objects and totals $\sim$1.5TB, including 432 raw visits in 3 tracts and $\sim$0.7TB of calibration data.
  In this project, we targeted tract=9615, which was executed with the Oracle backend on the NCSA cluster in July 2019 as the S2019 milestone of the Generation 3 Middleware team; see \jira{DM-19915}.
  In terms of raw inputs, tract=9615 contributes around 26$\%$, or $\sim$0.2 TB, of the raw data in the HSC-RC2 dataset.
  We ignored patches 28 and 72 due to a coaddition pipeline issue as reported in \jira{DM-20695}.
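A quick consistency check of the quoted figures, as a sketch; the $\sim$0.8 TB raw total is inferred here from the stated $\sim$1.5 TB repository minus the $\sim$0.7 TB of calibration data, not a number given in the diff:

```latex
\[
  \frac{0.2\,\mathrm{TB}}{1.5\,\mathrm{TB} - 0.7\,\mathrm{TB}} \approx 0.25,
  \quad\text{consistent with the quoted ``around 26\%''.}
\]
```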
@@ -208,7 +208,7 @@ \section{Execution results of the tract-sized DRP workflow}
  Typically \texttt{m4} or \texttt{m5} instances are used for the single frame processing or other small-memory jobs, and \texttt{r4} instances are used for large-memory jobs.
  After the workflow finishes, remaining running Spot instances may be terminated on the AWS console.
  Besides the 27075 pipetask invocations, Pegasus added 2712 data transfer jobs and one directory creation job.
- The total output size from the tract=9615 workflow is $\sim$4.1 TB with 74360 objects.
+ The total output size from the tract=9615 workflow is $\sim$4.1 TB with 74360 S3 objects.

  \subsection{Notes from the successful runs}

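As a reference for how output tallies like the one in the hunk above can be verified, a minimal boto3 sketch that counts objects and sums their sizes under a prefix; the bucket and prefix names are hypothetical placeholders:

```python
import boto3

BUCKET = "example-lsst-poc-repo"  # hypothetical bucket name
PREFIX = "output/tract-9615/"     # hypothetical output prefix

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

n_objects = 0
total_bytes = 0
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        n_objects += 1
        total_bytes += obj["Size"]

print(f"{n_objects} objects, {total_bytes / 1e12:.2f} TB")
```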
@@ -219,7 +219,7 @@ \subsection{Notes from the successful runs}

  In the first successful run \texttt{20191026T041828+0000}, a fleet of 40 \texttt{m5.xlarge} instances was used for single frame processing, and then a fleet of 50 \texttt{r4.2xlarge} memory-optimized instances for the rest.
  An \texttt{m5.large} on-demand instance served as the master.
- The single frame processing part finished in ~4 hours; coadd and beyond took ~16 hours.
+ The single frame processing part finished in~$\sim$4 hours; coadd and beyond took~$\sim$16 hours.
  In this run, the memory requirement of the large-memory jobs was slightly higher than half of a \texttt{r4.2xlarge}, resulting in instance resources not being fully used for some time.
  This run spanned two billing days.

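A rough illustration of why the instances were underutilized in that run, assuming the published \texttt{r4.2xlarge} memory of 61 GiB; the per-job figure below is just the hunk's "slightly higher than half", not an exact number:

```latex
% If each large-memory job needs just over half of an r4.2xlarge's 61 GiB,
% only one such job fits per instance, leaving nearly half the memory idle:
\[
  \left\lfloor \frac{61\,\mathrm{GiB}}{30.5\,\mathrm{GiB} + \epsilon} \right\rfloor = 1
  \quad \text{job per instance.}
\]
```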
@@ -451,8 +451,8 @@ \subsection{Tooling improvements}

  \section{Summary}

- In this \poc~project we have demonstrated the feasibility of LSST DRP data processing on the cloud.
- We implemented AWS backends in the LSST Generation 3 Middleware, allowing processing entirely on the AWS platform using AWS S3 object store, PostgreSQL database, and HTCondor software.
+ In this \poc~project we have demonstrated the feasibility of LSST DRP data processing on the cloud with elastic computing resources.
+ We implemented AWS backends in the LSST Generation 3 Middleware, allowing processing entirely on the AWS platform using AWS S3 object store (Butler Datastore), PostgreSQL database (Butler Registry), and HTCondor software.
  We analyzed cost usage in our test execution, and estimated cost for larger processing campaigns.
  The direct collaboration between LSST DM, AWS, and HTCondor team members was immensely helpful in achieving the goals.
  We showcased our progress in a live demonstration in the LSST Project Community Workshop in Aug 2019, as well as a \href{https://confluence.lsstcorp.org/display/DM/Tutorials+at+the+Kavli+workshop}{hands-on tutorial in the Petabytes to Science Workshop} in Nov 2019.
