New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40633: Ignore many numpy warnings #155
Conversation
# of a task. When DM-39114 is implemented, this step should not | ||
# be required and may be removed. | ||
weakref.finalize(results, _plotCloser, *weakrefArgs) | ||
return results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return should be out of the with context.
b58d41e
to
9665675
Compare
@@ -155,11 +155,11 @@ def makePlot( | |||
* Statistics that are shown on the plot or used by the plotting code: | |||
* ``approxMagDepth`` | |||
The approximate magnitude corresponding to the SN cut used. | |||
* ``f"{self.plotName}_sigmaMAD"`` | |||
* ``f"{self.plotName}_nanmedian"`` | |||
The sigma mad of the distances to the line fit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has this changed from sigmaMad to median?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, that was a find/replace fail I missed correcting.
The sigma mad of the distances to the line fit. | ||
* ``f"{self.identity or ''}_median"`` | ||
The median of the distances to the line fit. | ||
* ``f"{self.identity or ''}_hardwired_sigmaMAD"`` | ||
* ``f"{self.identity or ''}_hardwired_nanmedian"`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here
@@ -36,7 +36,7 @@ | |||
from scipy.stats import binned_statistic_2d, binned_statistic_dd | |||
|
|||
from ...interfaces import KeyedData, KeyedDataSchema, PlotAction, Scalar, Vector | |||
from ...statistics import nansigmaMad | |||
from ...statistics import nanmedian, nansigmaMad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think nanMedian and nanSigmaMad would be easier to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I was following the numpy naming convention. Having said that, it might make more sense to not follow numpy so as to distinguish them instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that make sense. Less confusion about where they come from.
9665675
to
ab9063f
Compare
------- | ||
count : `Scalar` | ||
The number of unique rows in a given column. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we get rid of the more extensive docs? Same comment for the other places as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timj and/or @jonathansick can correct me if I'm wrong, but I thought the Parameters
section in a class docstring is meant to document __init__
, even for callable types. I don't think we should add docs for __init__
to Config
classes.
If we are to keep these, they should be docs for __call__
on every Action
, but IMO this is unnecessary in most cases as here the doc for count
is redundant with the class docstring.
|
||
def divide(dividend: Scalar | Vector, divisor: Scalar | Vector) -> Scalar | Vector: | ||
"""Return dividend/divisor.""" | ||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not catch the issue rather than the warnings?
if divisor == 0:
return NaN
Or something? I guess it does the same thing but feels more honest. Also doesn't solve my problem that what I really want is it to also tell me why it has failed. Maybe we just need a metric for each whatever that is number of 0, number of NaN etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but it would be more complicated than that since divisor can be a vector.
Yes, earlier I was trying to say we should probably have metrics for each algorithm's individual flag columns where they are meaningful (and in the hopefully rare case of bad values without a flag set), but see further comments below.
|
||
def nansigmaMad(vector: Vector) -> Scalar: | ||
"""Return the sigma_MAD of a vector.""" | ||
return cast(Scalar, sps.median_abs_deviation(vector, scale="normal", nan_policy="omit")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the sigma MAD or just the MAD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scale="normal"
makes it sigma_MAD.
Would we be better off having a data clean up step before it gets fed to any actions? Something that makes a mask that covers all the NaNs etc and logs a message about how many there were, then we do the actions on clean data? |
As I said above, we should have metrics for algorithmic flag columns. For the most part, numpy errors are cropping up where we are not adding the appropriate flag column(s) to the standard Visit/CoaddFlagSelector. Whether we should add selectors for flag columns is an interesting question. For example, for a size vs magnitude plot both axes have their own relevant flags. I think it would be useful to have a labelled summary with the number of sources with an x flag set, a y flag set, neither or both. This can get more complicated with extra columns (e.g. if there is a S/N selector using a column that isn't on either axis). |
8e9945a
to
3c1b532
Compare
d62184d
to
f8c562a
Compare
f8c562a
to
ff43f0f
Compare
These warnings spam the logs with mostly useless messages. You can tell which action they occur in and it's usually obvious why - divide by zero, nan, etc - but these are typically expected and not necessarily a problem for the task.