
new changes after meeting #3

Merged
merged 5 commits into from Sep 13, 2022

Conversation

@womullan (Contributor) commented Aug 6, 2022

No description provided.

proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)
@gpdf left a comment

In general agreement with what I think is the overall spirit: Dask/Spark on Google, BPS on USDF, with the BPS capacity primarily intended to be used for image processing.

But there are still some points where the language is ambiguous.

proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)

We will not at this point promise extensive Dask/Spark-like services, but should work on that in the background with LINCC. I think we all agree this will be scientifically useful, but we need to finish construction as a priority.
We will not at this point promise extensive Dask/Spark-like services, but should work on that in the background with LINCC.
We all agree this will be scientifically useful, but we need to finish construction as a priority, and this is not a requirement. We will certainly work with LINCC on it, and something will be available, but we are not promising this and we are not accepting requirements on it.

What is the "this" that is not a requirement? Having a catalog-processing framework at all, or a fancy one?

proposal.tex Outdated

\textbf{Do we agree to work with LINCC on DASK/Spark ? and promise the minimum batch system and next to the database processing as the construction deliverable}
The standard/default allocation for any data rights user will be RSP access.
Users wilt bulk needs will use BPS (\secref{sec:bpsbatch}) at a Data Facility.

"with" bulk needs

This seems to be implying that, even if we do provide a Dask-like system, we won't provide any means for users to request additional quota.

So if we are sending users with "bulk needs" for catalog analysis (we're still in that section of this document, I think) to BPS, the original DM requirements do say that we have to provide a catalog-processing framework for them there. At the very least, some tools for setting up a job to run over all the data.

@womullan (Contributor, author) replied

OK, we may need to discuss further.

proposal.tex (thread resolved)
@@ -63,5 +85,5 @@ \subsection{Other catalogs}
LINCC may again help here by coordinating IDACs to provide neighbor tables/services for other catalogs.

In this section, it's important to note that "externally provided catalogs" was here intended to mean "catalog data brought to the Level 3 systems by the user", and shouldn't be confused with "external catalogs", which in the community generally means "well-known publicly available catalogs" like Gaia or CatWISE.

Therefore it's not a problem that "getting a list has proved fairly inconclusive" - this wasn't meant to be defined in advance, but was part of what a user could submit an RAC request for space to do. A user could say, in 2026, e.g., "I have a list of 1 billion objects from SPHEREx and I want to request 500 GB of space to store that catalog within the RSP to facilitate matching to the Rubin catalogs on Qserv and/or in the Parquet environment -- and here's the science that will enable based on the published performance of that dataset." The RAC would then weigh that against other space requests.

Similarly, a user might have personally recomputed (offsite, say at TACC) improved photometric redshifts for 3B objects from the Rubin Object catalog, and want to store those "next to" - and joinable with - the Object table itself.

This doesn't invalidate your other points, but it's important context for understanding the original requirement.

proposal.tex (thread resolved)
resources.tex Outdated (thread resolved)
@@ -59,17 +66,16 @@ \subsubsection{Unused home space}
Space on cloud means cost.
If a user is not using their space for a long period, it should be migrated to cheaper storage.
A long period could be a year, but six months seems long enough.
Again we have no mechanism for this - Richard even suggested taking it to USDF but I wonder if cold storage on google is better.
Again we have no mechanism for this - Richard even suggested taking it to USDF but cold storage on google might work.

"Google"

@gpdf commented Sep 12, 2022

Some additional comments, as PDF markup, on portions of the document that were not touched by this PR:
DMTN-223-gpdf1-20220912.pdf

@womullan womullan merged commit 09660c2 into main Sep 13, 2022