
new changes after meeting #3

Merged
merged 5 commits into from Sep 13, 2022

Conversation

@womullan (Contributor) commented Aug 6, 2022

No description provided.

proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)
@gpdf left a comment

In general agreement with what I think is the overall spirit: Dask/Spark on Google, BPS on USDF, with the BPS capacity primarily intended to be used for image processing.

But there are still some points where the language is ambiguous.

proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)
proposal.tex Outdated (thread resolved)

We will not at this point promise extensive Dask/Spark-like services, but should work on that in the background with LINCC. I think we all agree this will be scientifically useful, but we need to finish construction as a priority.
We will not at this point promise extensive Dask/Spark-like services, but should work on that in the background with LINCC.
We all agree this will be scientifically useful, but we need to finish construction as a priority, and this is not a requirement. We will certainly work with LINCC on it, and something will be available, but we are not promising this and we are not accepting requirements on it.

What is the "this" that is not a requirement? Having a catalog-processing framework at all, or a fancy one?

proposal.tex Outdated

\textbf{Do we agree to work with LINCC on DASK/Spark ? and promise the minimum batch system and next to the database processing as the construction deliverable}
The standard/default allocation for any data rights user will be RSP access.
Users wilt bulk needs will use BPS (\secref{sec:bpsbatch}) at a Data Facility.

"with" bulk needs

This seems to be implying that, even if we do provide a Dask-like system, we won't provide any means for users to request additional quota.

So if we are sending users with "bulk needs" for catalog analysis (we're still in that section of this document, I think) to BPS, the original DM requirements do say that we have to provide a catalog-processing framework for them there. At the very least, some tools for setting up a job to run over all the data.

@womullan (Contributor, author) replied

OK, we may need to discuss further.

proposal.tex (thread resolved)
@@ -63,5 +85,5 @@ \subsection{Other catalogs}
LINCC may again help here by coordinating IDACs to provide neighbor tables/services for other catalogs.

In this section, it's important to note that "externally provided catalogs" was here intended to mean "catalog data brought to the Level 3 systems by the user", and shouldn't be confused with "external catalogs", which in the community generally means "well-known publicly available catalogs" like Gaia or CatWISE.

Therefore it's not a problem that "getting a list has proved fairly inconclusive" - this wasn't meant to be defined in advance, but was part of what a user could submit an RAC request for space to do. A user could say, in 2026, e.g., "I have a list of 1 billion objects from SPHEREx and I want to request 500 GB of space to store that catalog within the RSP to facilitate matching to the Rubin catalogs on Qserv and/or in the Parquet environment -- and here's the science that will enable based on the published performance of that dataset." The RAC would then weigh that against other space requests.

Similarly, a user might have personally recomputed (offsite, say at TACC) improved photometric redshifts for 3B objects from the Rubin Object catalog, and want to store those "next to" - and joinable with - the Object table itself.

This doesn't invalidate your other points, but it's important context for understanding the original requirement.

proposal.tex (thread resolved)
resources.tex Outdated (thread resolved)
@@ -59,17 +66,16 @@ \subsubsection{Unused home space}
Space on cloud means cost.
If a user is not using their space for a long period, it should be migrated to cheaper storage.
A long period could be a year, but six months seems long enough.
Again we have no mechanism for this - Richard even suggested taking it to USDF but I wonder if cold storage on google is better.
Again we have no mechanism for this - Richard even suggested taking it to USDF but cold storage on google might work.

"Google"

@gpdf commented Sep 12, 2022

Some additional comments, as PDF markup, on portions of the document that were not touched by this PR:
DMTN-223-gpdf1-20220912.pdf

@womullan womullan merged commit 09660c2 into main Sep 13, 2022