
QuantileDifferenceReason and StandardDeviationReason #28

Closed
FBruzzesi opened this issue Dec 17, 2021 · 31 comments
@FBruzzesi
Contributor

Hey! I was wondering whether it would make sense to add two more reasons for regression tasks, namely something like HighLeveragePointReason and HighStudentizedResidualReason.

Citing Wikipedia:

  • Leverage is a measure of how far away the independent variable values of an observation are from those of the other observations. High-leverage points, if any, are outliers with respect to the independent variables (link)
  • A studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. [...] This is an important technique in the detection of outliers. (link)
@koaning
Owner

koaning commented Dec 17, 2021

Do you have an example (preferably something semi-real life) that demonstrates the utility of this technique?

@FBruzzesi
Contributor Author

  • Regarding studentized residuals:

    The first thing that comes to mind is that in regression problems:

    • Absolute difference may require domain knowledge to set a threshold;
    • Relative difference can be very misleading when dealing with true values close to zero.

    Therefore, by standardizing/studentizing residuals, one can use a default threshold.

    Remark that the magnitude of the diagonal elements H_ii of the hat matrix H = X @ np.linalg.inv(X.T @ X) @ X.T, which enter the computation, decreases quickly as X grows (on the order of p/n, where n, p = X.shape); see the sketch below this list.
    Hence, for simplicity, the z-score with mean zero is often used instead, since model errors should be zero-centered, and a threshold of 3 can be a good default.

  • Regarding high leverage:

    As with the previous point, it may be hard to compute H for large values of n, and better outlier detection methods can be used in most cases.
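
For concreteness, a minimal numpy sketch of the quantities involved (only illustrative; the intercept handling and the function name are my own assumptions):

```python
import numpy as np

def studentized_residuals(X, y, y_pred):
    """Internally studentized residuals via the diagonal of the hat matrix.

    Only the diagonal h_ii is computed (never the full n x n hat matrix),
    so memory stays at O(n * p) instead of O(n^2).
    """
    X = np.column_stack([np.ones(len(X)), X])  # assumes an OLS fit with intercept
    residuals = np.asarray(y) - np.asarray(y_pred)
    n, p = X.shape

    # h_ii = diag(X (X'X)^-1 X'), computed row-wise without forming H
    XtX_inv = np.linalg.inv(X.T @ X)
    leverage = np.sum((X @ XtX_inv) * X, axis=1)

    # residual standard error, then studentize
    sigma = np.sqrt(residuals @ residuals / (n - p))
    return residuals / (sigma * np.sqrt(1.0 - leverage))
```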

@koaning
Owner

koaning commented Dec 17, 2021

@FBruzzesi ah yeah that makes it more clear.

On the studentized residuals ... I think a bell-curve assumption for the error might work for some instances, but not all. I'm wondering if it makes sense to introduce a QuantileDifferenceReason and a StandardDeviationReason for this realm of use-cases. Any concerns with using these two?

Regarding HighLeveragePointReason I'm leaning towards asking users to implement an outlier detection system for their use-case. How to detect an outlier tends to be very use-case specific.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 17, 2021

I believe those two Reason(s) you are proposing should cover the majority of cases.
It also totally makes sense to keep a custom outlier detection model, especially since there is an OutlierReason already implemented.

@koaning koaning changed the title High leverage and studentized residual reasons QuantileDifferenceReason and StandardDeviationReason Dec 17, 2021
@koaning
Owner

koaning commented Dec 17, 2021

I've changed the title of this issue to reflect this.

I'm not sure when I'll have time to work on this feature though. Part of me is also wondering if we should first find a representative dataset such that we might have a valid demo for these tools. Any suggestions for a dataset are very welcome.

@FBruzzesi
Contributor Author

I can work on it and try to find a toy dataset where it applies

@koaning
Owner

koaning commented Dec 17, 2021

Grand! Let me know if you'd appreciate any support/review.

My advice might be to first try to run the problem on the dataset before worrying too much about implementation. It's much easier to tackle the theoretical part of a problem once there's a practical example in place.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 17, 2021

Hey @koaning, I have a few questions/observations:

  • StandardDeviationReason doesn't feel like a great name for the feature, as I would relate it to the overall model rather than to single-point predictions. How about something like AbsoluteDifferenceStdReason or StandardizedErrorReason?
  • Just want to make sure we agree on what QuantileDifferenceReason means. I implemented a check for residuals to be within the [q1 - 1.5 IQR, q3 + 1.5 IQR] range, where q1, q3 and IQR are the first quartile, third quartile and interquartile range respectively; more generally, within [quantiles[0] - multiplier * IQR, quantiles[1] + multiplier * IQR].
  • As a styling question: can I check value validity with assert statements (e.g. positive threshold and quantiles within the 0-1 range)?
  • Finally, how should I proceed? I tested on the diabetes toy dataset from sklearn, yielding a few examples for both Reasons.

Please let me know if something isn't clear; I may comment here with some code snippets as well if needed.
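
In the meantime, to make the second point concrete, here is a minimal sketch of the check described above (the function name and defaults are only illustrative, not the proposed API):

```python
import numpy as np

def iqr_doubt_mask(residuals, quantiles=(0.25, 0.75), multiplier=1.5):
    """Flag residuals outside [q_low - multiplier * IQR, q_high + multiplier * IQR]."""
    q_low, q_high = np.quantile(residuals, quantiles)
    iqr = q_high - q_low
    lower, upper = q_low - multiplier * iqr, q_high + multiplier * iqr
    return (residuals < lower) | (residuals > upper)
```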

@koaning
Owner

koaning commented Dec 17, 2021

  1. StandardDeviationReason doesn't feel like a great name for the feature

I think StandardizedErrorReason sounds good for now. I'll noodle on it a bit.

  2. Just want to make sure we agree on what QuantileDifferenceReason means.

I was thinking that we sort the residuals and allow the user to say something like "assign doubt to all rows where the error is larger than the 95% quantile" (see the sketch at the end of this comment).

  3. As a styling question: can I check value validity with assert statements

I usually resort to assert, sometimes in combination with np.all or np.isclose.

  4. ... yielding a few examples for both Reasons

Did these yield the wrong labels? One thing you might want to try is to flip a few labels randomly upfront and see if you can retrieve the flipped labels with this trick. It's not a perfect proxy, but it's a plausible demo.
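
To illustrate the quantile interpretation from point 2 (plain numpy, just a sketch of the idea rather than the eventual implementation):

```python
import numpy as np

def quantile_doubt_mask(y_true, y_pred, quantile=0.95):
    """Doubt the rows whose absolute error exceeds the given quantile of all errors."""
    abs_error = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return abs_error > np.quantile(abs_error, quantile)
```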

@FBruzzesi
Contributor Author

I was thinking that we sort the residuals and allow the user to say something like "assign doubt to all rows where the error is larger than the 95% quantile".

This looks very deterministic, meaning that for any given model you will doubt 5% of the results. On the other hand, using the usual boxplot ranges mentioned above (or any other user-favourite quantiles/multipliers) may or may not result in doubt. Imagine an error distribution that is zero-centered and "very" symmetrical: the former would still doubt some results, while the latter wouldn't.

Did these yield the wrong labels?

As this is a regression task, I am not even sure what flipping labels exactly means. I am trying to add/multiply the feature matrix by random noise, then checking whether the rows I get back from DoubtEnsemble are the most perturbed ones.

@koaning
Owner

koaning commented Dec 18, 2021

Let me try to explain the "flipping labels experiment". Suppose we have a dataset X, y in a dataframe. Let's take, say, 10% of all rows and designate these to be shuffled.

Next, we take the y values that are designated to be shuffled and we shuffle these such that the original value is replaced by another value.


We now have a dataset where we know some of the y values to be false. We can then ask "does our approach find the bad labels?". It's a bit of a hacky way to go about it, since the way we simulate bad labels may not resemble reality. But it's a proxy, if nothing else, that suggests whether we're able to find bad labels, and it should at least give us a hint of how reliable some of our doubt reasons might be.
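
Something along these lines is what I have in mind (a rough sketch using sklearn's diabetes dataset; not a utility that exists in the library, and the 95%-quantile rule at the end is just a placeholder for whichever reason we test):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X, y = load_diabetes(return_X_y=True)

# designate 10% of the rows and shuffle their y values among themselves
n_flip = int(0.10 * len(y))
flip_idx = rng.choice(len(y), size=n_flip, replace=False)
y_shuffled = y.copy()
y_shuffled[flip_idx] = rng.permutation(y_shuffled[flip_idx])

# train *after* shuffling, then look at the residuals
model = LinearRegression().fit(X, y_shuffled)
residuals = y_shuffled - model.predict(X)

# does a simple residual rule recover the shuffled rows?
doubted = np.abs(residuals) > np.quantile(np.abs(residuals), 0.95)
print("share of shuffled rows that get doubted:", doubted[flip_idx].mean())
```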

@koaning
Owner

koaning commented Dec 18, 2021

This reminds me, we may want to have a utility submodule to make these kinds of experiments easy.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 19, 2021

While working on such a test, I found that

We now have a dataset where we know some of the y values to be false.

is kind of misleading, as the predicted values are not influenced by the shuffle; however, by random chance, a few shuffled y values may end up closer to the predicted values y_hat, reducing the magnitude of the residual.

Focusing solely on those data points satisfying both of the following conditions:

  • Shuffled data
  • Larger residual than the original (note that this can still be small in relative terms)

then testing on the diabetes toy dataset from sklearn with 1000 different random states yields the following (rates computed as in the sketch at the end of this comment):

  • For StandardDeviationReason
    • True positive rate of ~30%
    • False positive rate of ~2%
  • For QuantileDifferenceReason (with the above mentioned boxplot method)
    • True positive rate of ~13%
    • False positive rate of ~0.2%
  • For QuantileDifferenceReason (by just sorting the residuals and doubting those predictions with residual > 0.95-quantile)
    • True positive rate of ~25%
    • False positive rate of ~3.5%
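
For reference, the rates above are computed roughly like this (a simplified sketch, not my exact benchmarking code), where `bad_label` marks the rows satisfying the two conditions above:

```python
import numpy as np

def tpr_fpr(doubted, bad_label):
    """True/false positive rate of a doubt mask against a known bad-label mask."""
    doubted, bad_label = np.asarray(doubted), np.asarray(bad_label)
    tpr = (doubted & bad_label).sum() / bad_label.sum()
    fpr = (doubted & ~bad_label).sum() / (~bad_label).sum()
    return tpr, fpr
```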

@koaning
Owner

koaning commented Dec 19, 2021

Cool!

Just to confirm, could you verify the precision/recall values?

Also, when are you training your model, before or after the shuffling? If we're to match reality, we should train the model after we've shuffled.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 19, 2021

Here are some of the stats:

| reason | recall | precision | fpr |
| --- | --- | --- | --- |
| QuantileDifferenceReason(quantile=0.95) | 0.23 | 0.39 | 0.035 |
| BoxplotReason(multiplier=1.5) (*) | 0.099 | 0.54 | 0.002 |
| StandardDeviationReason(threshold=2.) | 0.28 | 0.62 | 0.017 |

Yes, shuffling and training are done in that order.

(*) Any better name for this one? Should we keep all three of these reasons?

@koaning
Owner

koaning commented Dec 19, 2021

One final question before we move on (although the results themselves are pretty interesting!). Could you check if these numbers change much if you flip more or fewer labels? I imagine that 1%, 5%, 10% label errors might yield different results.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 19, 2021

The following results are mean scores across 500 different random states per reason/%shuffled pair:

| reason | recall | precision | fpr | %shuffled |
| --- | --- | --- | --- | --- |
| QuantileDifferenceReason | 0.31 | 0.05 | 0.051 | 1% |
| QuantileDifferenceReason | 0.40 | 0.24 | 0.042 | 5% |
| QuantileDifferenceReason | 0.37 | 0.42 | 0.033 | 10% |
| QuantileDifferenceReason | 0.29 | 0.62 | 0.023 | 20% |
| BoxplotReason | 0.10 | 0.07 | 0.004 | 1% |
| BoxplotReason | 0.13 | 0.34 | 0.003 | 5% |
| BoxplotReason | 0.11 | 0.47 | 0.002 | 10% |
| BoxplotReason | 0.06 | 0.52 | 0.0007 | 20% |
| StandardDeviationReason | 0.29 | 0.072 | 0.037 | 1% |
| StandardDeviationReason | 0.34 | 0.296 | 0.029 | 5% |
| StandardDeviationReason | 0.30 | 0.461 | 0.023 | 10% |
| StandardDeviationReason | 0.26 | 0.703 | 0.014 | 20% |

@koaning
Owner

koaning commented Dec 19, 2021

Nicely done! It's interesting to see that the StandardDeviationReason seems to outperform the other two reasons.

As far as I'm concerned a PR for StandardDeviationReason can get started.

If you happen to have any benchmarking code to share I might consider saving that for the documentation as well.

@FBruzzesi
Contributor Author

@koaning I just found an error in the QuantileDifferenceReason implementation; I am updating the table above. I want to mention again that such a reason will doubt a certain percentage of results no matter what.

Regarding some sample code, I'm not sure where I should/could share it.

@koaning
Owner

koaning commented Dec 20, 2021

@FBruzzesi if it's a notebook you can put it in a Github gist if that's easier for you.

koaning added a commit that referenced this issue Dec 21, 2021
Issue #28, StandardizedErrorReason class
@koaning
Owner

koaning commented Dec 21, 2021

I've just merged #29. Before making a new release though I'm wondering if it makes sense to add the QuantileDifferenceReason as well. @FBruzzesi would you prefer to add it?

@koaning
Owner

koaning commented Dec 21, 2021

Actually ... the new method is listed on the readme so I should release a patch. Lemme do that real quick.

@koaning
Owner

koaning commented Dec 21, 2021

Done! I'll also make an announcement tomorrow for it. Got a twitter handle? If so I can give you a shoutout.

@FBruzzesi
Contributor Author

I feel like you are not actually convinced by these other methods! I will make a notebook illustrating them as soon as I have the time, and maybe we can discuss whether to add them afterwards.

@FBruzzesi
Contributor Author

FBruzzesi commented Dec 21, 2021

Also, you should be able to find me on twitter as @BruzzesiFr

@koaning
Owner

koaning commented Dec 21, 2021

Just to be explicit; I very much appreciate the work you're doing here! But what method are you referring to now? The BoxplotReason?

I figured moving on to the QuantileDifferenceReason made sense because of its performance on your initial benchmark. I'll gladly consider other options but I do prefer a benchmark that backs up the reasoning.

Am looking forward to your notebook 👍

@FBruzzesi
Contributor Author

@koaning I finally found the time to write a notebook; you can find it here.

@koaning
Owner

koaning commented Dec 27, 2021

Interesting!

I've added utility methods to the main branch that allow folks to play around with "flipping" labels in a subset. I'll likely also add some plotting functionality around it so we can get some "precision_at_k" and "recall_at_k" plots to compare approaches. My impression so far is that for some dataset/model/reason combinations it's very easy to find bad labels, while for others it's barely better than random sorting.
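
To be explicit about what I mean by "precision_at_k" here (a rough sketch of the idea, not the utility that will land in the library):

```python
import numpy as np

def precision_recall_at_k(doubt_score, bad_label, k):
    """Precision/recall when doubting only the k highest-scoring rows."""
    bad_label = np.asarray(bad_label)
    top_k = np.argsort(doubt_score)[::-1][:k]  # indices of the k most doubted rows
    hits = bad_label[top_k].sum()              # how many of them are truly bad labels
    return hits / k, hits / bad_label.sum()
```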

@koaning
Owner

koaning commented Dec 27, 2021

I'll likely merge the plotting tonight and I'll also push a new version.

Out of curiosity, since you've given the library a spin already, are there any features missing in your opinion with regards to plotting?

@FBruzzesi
Contributor Author

As you may have noticed, I work much more with regression problems than with classification tasks. There is a lot of custom plotting I do when it comes to checking results/predictions, and I am currently working on a (still private) library to standardize a few of these checks.

That said, I'm not sure what you could integrate here; maybe something as simple as a residual plot with different colors for doubted/non-doubted points, similar to what I tried to do in the notebook I just shared (a minimal sketch below). Feel free to assign me such a task if needed.
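
Something like this is what I have in mind (a minimal matplotlib sketch; the variable names are just placeholders, expecting numpy arrays and a boolean doubt mask):

```python
import matplotlib.pyplot as plt

def residual_plot(y_pred, residuals, doubted):
    """Scatter residuals against predictions, coloring doubted points differently."""
    fig, ax = plt.subplots()
    ax.scatter(y_pred[~doubted], residuals[~doubted], alpha=0.5, label="kept")
    ax.scatter(y_pred[doubted], residuals[doubted], color="red", label="doubted")
    ax.axhline(0.0, linestyle="--", color="gray")
    ax.set_xlabel("predicted value")
    ax.set_ylabel("residual")
    ax.legend()
    return ax
```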

@FBruzzesi
Contributor Author

@koaning should we proceed to close this issue?

@koaning koaning closed this as completed Dec 29, 2021