new changes after meeting #3
Conversation
In general agreement with what I think is the overall spirit: Dask/Spark on Google, BPS on USDF, with the BPS capacity primarily intended to be used for image processing.
But there are still some points where the language is ambiguous.
> We will not at this point promise extensive Dask/Spark like services but should work on that in the background with LINCC. I think we all agree this will be scientifically useful, but we need to finish construction as a priority.
> We will not at this point promise extensive Dask/Spark like services but should work on that in the background with LINCC.
> We all agree this will be scientifically useful, but we need to finish construction as a priority and this is not a requirement. We will work with LINCC on it certainly and something will be available but we are not promising this and we are not accepting requirements on it.
What is the "this" that is not a requirement? Having a catalog-processing framework at all, or a fancy one?
proposal.tex
> \textbf{Do we agree to work with LINCC on DASK/Spark ? and promise the minimum batch system and next to the database processing as the construction deliverable}
> The standard/default allocation for any data rights user will be RSP access.
> Users wilt bulk needs will use BPS (\secref{sec:bpsbatch}) at a Data Facility.
"with" bulk needs
This seems to be implying that, even if we do provide a Dask-like system, we won't provide any means for users to request additional quota.
So if we are sending users with "bulk needs" for catalog analysis (we're still in that section of this document, I think) to BPS, the original DM requirements do say that we have to provide a catalog-processing framework for them there. At the very least, some tools for setting up a job to run over all the data.
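For concreteness, "tools for setting up a job to run over all the data" could look something like a BPS submission file. This is only a hypothetical sketch following ctrl_bps conventions: the pipeline file, repo path, collection, and data query below are all invented placeholders, not a promised interface.

```yaml
# Hypothetical BPS submission sketch (ctrl_bps-style field names;
# pipeline, repo, collection, and query are invented placeholders).
pipelineYaml: "my_catalog_analysis.yaml"   # user's PipelineTask pipeline (hypothetical)
payload:
  payloadName: bulk_catalog_run
  butlerConfig: /repo/main                 # assumed Butler repo path
  inCollection: some/input/collection      # placeholder input collection
  dataQuery: "tract > 0"                   # e.g. select all tracts
```

A user would then hand this to the batch service with something like `bps submit submit.yaml`; the point is that the framework, not the user, handles fan-out over the full dataset.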
OK, we may need to discuss further.
@@ -63,5 +85,5 @@ \subsection{Other catalogs}
> LINCC may again help here by coordinating IDACs to provide neighbor tables/services for other catalogs.
In this section, it's important to note that "externally provided catalogs" was here intended to mean "catalog data brought to the Level 3 systems by the user", and shouldn't be confused with "external catalogs", which in the community generally means "well-known publicly available catalogs" like Gaia or CatWISE.
Therefore it's not a problem that "getting a list has proved fairly inconclusive" - this wasn't meant to be defined in advance, but was part of what a user could submit an RAC request for space to do. A user could say, in 2026, for example: "I have a list of 1 billion objects from SPHEREx and I want to request 500 GB of space to store that catalog within the RSP to facilitate matching to the Rubin catalogs on Qserv and/or in the Parquet environment - and here's the science that will enable, based on the published performance of that dataset." The RAC would then weigh that against other space requests.
Similarly, a user might have personally recomputed (offsite, say at TACC) improved photometric redshifts for 3B objects from the Rubin Object catalog, and want to store those "next to" - and joinable with - the Object table itself.
This doesn't invalidate your other points, but it's important context for understanding the original requirement.
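The bring-your-own-catalog case above boils down to a table join against Object. A minimal sketch with pandas on toy data (at Rubin scale a Dask or Spark dataframe would mirror this API; the `objectId` and `z_phot` column names are assumptions for illustration, not the real schema):

```python
import pandas as pd

# Toy stand-in for the Object catalog; in practice this would be read
# from the shared Parquet store (e.g. pd.read_parquet or dask.dataframe).
objects = pd.DataFrame({"objectId": [1, 2, 3],
                        "ra": [10.1, 10.2, 10.3]})

# User-supplied table, e.g. photometric redshifts recomputed offsite.
photoz = pd.DataFrame({"objectId": [1, 3],
                       "z_phot": [0.8, 1.4]})

# A left join keeps every Object row; the user's columns become
# queryable "next to" - and joinable with - the Object table.
joined = objects.merge(photoz, on="objectId", how="left")
```

Objects absent from the user table simply get missing values, so the stored user catalog never has to cover the full Object table.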
@@ -59,17 +66,16 @@ \subsubsection{Unused home space}
> Space on cloud means cost.
> If a user is not using their space for a long period it should be migrated to cheaper storage.
> A long period could be a year, but six months seems long enough.
> Again we have no mechanism for this - Richard even suggested taking it to USDF but I wonder if cold storage on google is better.
> Again we have no mechanism for this - Richard even suggested taking it to USDF but cold storage on google might work.
"Google"
Some additional comments, as PDF markup, on portions of the document that were not touched by this PR: