Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-35617: Add photometric repeatability metric #9

Merged
merged 1 commit into from Aug 4, 2022
Merged

Conversation

bechtol
Copy link
Contributor

@bechtol bechtol commented Jul 21, 2022

No description provided.

@@ -272,3 +277,71 @@ def getInputSchema(self) -> KeyedDataSchema:

def __call__(self, data: KeyedData, **kwargs) -> Vector:
return cast(Vector, data[self.vectorKey.format(**kwargs)])


class AndSelector(VectorAction):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs docstring. This is useful to split out, but we should think about if we want to update KeyedDataSelectorAction to make use of it. It would save duplication, but add an additional level in . hierarchy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose not to add the AndSelector and instead use MultiCriteriaDownselectVector.

python/lsst/analysis/tools/actions/vector/selectors.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/actions/vector/selectors.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/actions/vector/selectors.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/actions/vector/vectorActions.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/actions/vector/vectorActions.py Outdated Show resolved Hide resolved
@bechtol bechtol force-pushed the tickets/DM-35617 branch 3 times, most recently from 8fb7b45 to f6ee8f5 Compare July 28, 2022 04:45
@bechtol bechtol marked this pull request as ready for review July 28, 2022 13:55
@jeffcarlin jeffcarlin self-requested a review July 29, 2022 22:20
Copy link
Collaborator

@jeffcarlin jeffcarlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it looks good! I did a test run on rc2_subset, and it ran fine.

There are a few comments, but otherwise the main barrier to merging is making sure the plot works. (I tried to test that, but came up against some errors.)

python/lsst/analysis/tools/actions/scalar/scalarActions.py Outdated Show resolved Hide resolved
mask *= subMask # type: ignore
return cast(Vector, data[self.vectorKey.format(**kwargs)])[mask]


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the purpose of MultiCriteriaDownselectVector (or any of the "Downselect" modules, actually). This looks like it does the same thing as "AndSelector" in vector.selectors, but just returns the subselection rather than the mask. In that case, why not use the selector, and simply apply the mask in the code?

Copy link
Contributor Author

@bechtol bechtol Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DownselectVector returns a vector where a single configurable action has been used to apply selection criteria. Usually this will be a shorter vector due to the applied filtering.

In this instance, I needed to apply several per-group selection criteria (e.g., number of repeated measurements, mean signal-to-noise ratio) to keep only the per-group signal-to-noise measurements of interest.

The first time I implemented this, I introduced the AndSelector, but then I felt that this made the syntax more complicated/verbose than I wanted. Basically I needed to introduce the new AndSelector and then set the multiple selectors for that selector. It added an extra layer of nesting that seemed unwarranted.

I also considered to make DownselectVector always allow for the possibility of multiple selectors by using selectors = ConfigurableActionStructField rather than selector = ConfigurableActionField. This also seemed unsatisfying because one would need to add an extra layer in cases where there is only a single criterion applied. For example

self.process.filterActions.xStars = DownselectVector(
    vectorKey="mags", selector=VectorSelector(vectorKey="starSelector")
)

would become

self.process.filterActions.xStars = DownselectVector(vectorKey="mags")
self.process.filterActions.xStars.selectors.star = VectorSelector(vectorKey="starSelector")

I wasn't sure that we wanted to introduce that additional layer.

I'm still not fully convinced that introducing MultiCriteriaDownselectVector is the optimal solution though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be remembering the And selector you had, but I could have sworn there was something different elsewhere. Anyway, the way I had imagined this to work is that you would build the mask you want (with all the various selectors) in the build stage, and then you would just apply that mask in the filter stage, and thus you would only need the one selector (VectorSelector), and then you would have the vector around for anything else you might need it for. I am not saying things MUST be done how I thought them out, just that that process made sense to me reading things out. I wanted to bring that up to the discussion for consideration, but if you determine your way is best I am happy to have the new selector, but it does make the other kind of redundant.

python/lsst/analysis/tools/analysisPlots/analysisPlots.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/analysisPlots/analysisPlots.py Outdated Show resolved Hide resolved
python/lsst/analysis/tools/analysisPlots/analysisPlots.py Outdated Show resolved Hide resolved
@@ -325,3 +330,70 @@ def getInputSchema(self) -> KeyedDataSchema:

def __call__(self, data: KeyedData, **kwargs) -> Vector:
return cast(Vector, data[self.vectorKey.format(**kwargs)])


# class AndSelector(VectorAction):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't intend this to go in (see later comment) it should be deleted

errCol = f"{fluxCol}{self.uncertaintySuffix.format(**kwargs)}"
result = cast(Vector, data[fluxCol]) / data[errCol] # type: ignore

return np.array(cast(Vector, result))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it you I discussed this with before and it was resolved? (Sorry I have been looking at so many reviews recently)

def getInputSchema(self) -> KeyedDataSchema:
return ((self.vectorKey, Vector),)

def __call__(self, data: KeyedData, **kwargs) -> Vector:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MyPy is basically saying it cant work out what type you want bands to be, because match arms are defining it to be different things:

  1. is defining it as tuple[Any, ...] - a multi element tuple of any type
  2. I think it is infering the same as 1
  3. it thinks it is a tuple with a single element of type Any i.e. tuple[Any]
  4. None

solution to all this is before the match statement add:
bands: tuple[Any, ...] | None
That declares the type for the variable before it is used anywhere


def __call__(self, data: KeyedData, **kwargs) -> Scalar:
mask = self.getMask(**kwargs)
values = data[self.vectorKey.format(**kwargs)][mask]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for now, there is no way for MyPy to know if when you do data[key] the result will be a Vector or Scalar. If it is a Scalar then you can't do [mask] because a number does not understand __getitem__. To get around this do:

from typing import cast
...
values = cast(Vector, data[self.vectorKey.format(**kwargs)])[mask]

@bechtol bechtol merged commit 066e017 into main Aug 4, 2022
@bechtol bechtol deleted the tickets/DM-35617 branch August 4, 2022 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants