Implement cross-validated calibration #3851 #4009

Closed
durovo wants to merge 1 commit from the feature/cvcalibration branch

Conversation

@durovo (Contributor) commented Dec 9, 2017

Implement a cross-validated calibration scheme (see #3851). This is still a work in progress. I will be completing the following tasks after discussing the API:

  1. Add more tests
  2. Fix memory leaks
  3. Documentation

EDIT:
Fixed the code to conform to the style guidelines.

@karlnapf (Member) commented Dec 9, 2017

This is great! Thanks!
I will review this in detail tomorrow

@durovo force-pushed the feature/cvcalibration branch 3 times, most recently from 3b08714 to 3410ab7 on December 10, 2017 06:44
@vigsterkr (Member) left a review

thanks a lot for the patch!
here are some changes that would be great to have

#include <shogun/lib/config.h>
#include <shogun/machine/Machine.h>

#ifndef _CALIBRATION_H__
Member:

move this to the beginning of the file, together with the #define.... this way the above includes will be included only once, even if shogun/evaluation/Calibration.h is itself included more than once
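A minimal sketch of the suggested layout; the header name and guard symbol are taken from the hunk above, the rest is illustrative:

```cpp
// shogun/evaluation/Calibration.h -- the guard wraps the whole header,
// so the includes below are processed only once per translation unit.
#ifndef _CALIBRATION_H__
#define _CALIBRATION_H__

#include <shogun/lib/config.h>
#include <shogun/machine/Machine.h>

namespace shogun
{
	// class CCalibration ...
}

#endif // _CALIBRATION_H__
```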

CBinaryLabels* CCalibration::apply_binary(CFeatures* features)
{

CBinaryLabels* result_labels = (CBinaryLabels*)m_machine->apply(features);
Member:

this explicit cast will cause a lot of trouble if the machine's returned labels are not binary :) you should either do a dynamic_cast to check for this, or simply check get_label_type() == LT_BINARY

Contributor Author:

Should I use SG_ERROR to indicate that the machine should return binary labels when the labels are not binary?

Member:

you could use REQUIRE

Contributor Author:

Understood

Contributor Author:

Hello @vigsterkr, I have used the approach that you have mentioned (using get_label_type and REQUIRE). However, shogun seems to have a LabelsFactory class which has methods for doing the same thing. Is that the recommended way of doing this?

Member:

mmm oh yeah, LabelsFactory, indeed it's the best option
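A minimal sketch of the check being discussed; the REQUIRE message and the CLabelsFactory::to_binary call are assumptions about the exact API, not code from this PR:

```cpp
CLabels* result = m_machine->apply(features);
// Fail early with a clear message if the wrapped machine is not a binary classifier.
REQUIRE(
    result->get_label_type() == LT_BINARY,
    "The wrapped machine must produce binary labels in apply_binary().\n")
// LabelsFactory performs the checked conversion instead of a raw C-style cast.
CBinaryLabels* result_labels = CLabelsFactory::to_binary(result);
```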

{
index_t num_calibration_machines =
m_calibration_machines->get_num_elements();
CMulticlassLabels* result_labels =
Member:

same as above... do a safe typecheck

result_labels->get_multiclass_confidences(i);
for (index_t j = 0; j < num_samples; ++j)
{
confidence_values[j] = confidence_values[j] / temp_confidences[j];
Member:

this could be simply:

confidence_values[j] /= temp_confidences[j];

{
index_t num_calibration_machines =
m_calibration_machines->get_num_elements();
CMulticlassLabels* result_labels =
Member:

ensure type

}
}

for (index_t i = 0; i < num_classes; ++i)
Member:

same as above.. omp?

temp_values = result_labels->get_multiclass_confidences(i);

#pragma omp parallel for
for (index_t j = 0; j < temp_values.vlen; j++)
Member:

linalg::scale
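A minimal sketch of the suggested replacement for the element-wise loop above, assuming the linalg::scale(src, dst, alpha) overload for SGVector; treat the exact signature and the scale_factor name as assumptions:

```cpp
SGVector<float64_t> temp_values = result_labels->get_multiclass_confidences(i);
// Scale the whole vector in one call instead of the explicit element loop;
// scale_factor stands in for whatever factor the original loop applies.
linalg::scale(temp_values, temp_values, scale_factor);
result_labels->set_multiclass_confidences(i, temp_values);
```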

m_calibration_method = calibration_method;
}

void CCrossValidatedCalibration::init()
Member:

missing SG_ADD macros

~CSigmoidCalibrationMethod();

private:
float64_t a, b;
Member:

m_a and m_b

}

CSigmoidCalibrationMethod::CSigmoidCalibrationMethod() : CCalibrationMethod()
{
Member:

need to init a and b variables and SG_ADD them

Contributor Author:

Should they be initialized to 0?

Member:

yep that's a good value as initial value
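A minimal sketch of what the reviewers are asking for in the sigmoid method's init(); the four-argument SG_ADD form with MS_NOT_AVAILABLE is an assumption based on Shogun of that era, not code from this PR:

```cpp
void CSigmoidCalibrationMethod::init()
{
	// Default the sigmoid parameters and register them so the
	// parameter framework and serialization know about them.
	m_a = 0;
	m_b = 0;
	SG_ADD(&m_a, "m_a", "Sigmoid parameter a.", MS_NOT_AVAILABLE);
	SG_ADD(&m_b, "m_b", "Sigmoid parameter b.", MS_NOT_AVAILABLE);
}
```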

@durovo (Contributor Author) commented Dec 10, 2017

Thanks a lot for the detailed review. I'll make all the required changes and update the PR. From my understanding, in general, I have to make the following changes:

  1. The ifdef should contain the whole header file.
  2. I can parallelize using omp wherever possible.
  3. Safe typecasting.
  4. Push repeated code to a common function.
  5. SG_ADD everywhere
    Also,
  6. Use linalg whenever possible

@vigsterkr (Member)

@durovo yes, more or less that's the idea. As soon as you fix those, the Travis builds should be green. The reason they fail now is the missing SG_ADD.

@karlnapf (Member)

@vigsterkr was faster with reviewing :)

One comment: Let's maybe first make it work without parallel omp, as that might complicate things quite a bit. We can work on that once the other comments around duplicate code and style are resolved.

We could then solve parallel xvalidation and parallel calibration in the same go.

Also, what I would love to see is using the x-validation code in shogun to calibrate, rather than copying it (@vigsterkr mentioned this in his comments).

Great first start, thanks for the effort. Let's get this going, make sure to ask in IRC if you have questions or get stuck. Looking forward to the update

* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* (at your option) any later version.
Member:

(in all the files)

index_t num_samples;

// normalize the probabilities
for (index_t i = 0; i < num_classes; ++i)
Member:

all of this can be done in a vectorised fashion using linalg, see e.g. https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/preprocessor/KernelPCA.cpp#L92

Contributor Author:

Yes, I'm currently trying to vectorize wherever possible.


extern MultiLabelTestEnvironment* multilabel_test_env;

#ifdef HAVE_LAPACK
Member:

what is LAPACK needed for here?

Contributor Author:

I borrowed this from another test. I will write proper test cases after we discuss the api. Or should I write them now?

Member:

After API is frozen

{
SGVector<float64_t> scores = pred->get_multiclass_confidences(i);
if (i == 0)
confidence_sums = scores;
Member:

rather than writing unit tests here, a good sanity check would be to reproduce the plots in http://scikit-learn.org/stable/_images/sphx_glr_plot_calibration_curve_002.png
You could do that via say Python bindings. Once those work, then we can start adding more unit tests ...
Let me know what you think

Contributor Author:

That looks like a great idea. I will try doing that.

Contributor Author:

The python interfaces for these files are generated automatically right? Or do I have to add these files to some list? I am asking because I am not able to access these (CrossValidatedCalibration, Calibration, etc) from the shogun module in Python.

Member:

@durovo yeah, in its current state the Python interface will not contain these methods.. you'll need to add them to some definition files.
But that should be another PR. For the time being you should write unit tests for this and test it from the C++ API.

Contributor Author:

Okay

@durovo (Contributor Author) commented Dec 13, 2017

I have tried to make all the suggested changes. However, I have not added omp everywhere as @vigsterkr suggested. As @karlnapf suggested, I will add it after I know that the code works correctly without it. I could add it now if required.

@vigsterkr (Member)

@durovo it's fine to do the OpenMP part later... the point was rather that you should do it in other parts of the code, as well as use linalg:: when possible.

namespace shogun
{

class CSigmoidCalibrationMethod : public CCalibrationMethod
Member:

doxygen. please make sure at least that there's a brief description of the method itself

Contributor Author:

Understood, I'll add the descriptions for all methods (and params) and classes.

Member:

this is still needed

result = apply_once(temp_machine, data);
result_labels = CLabelsFactory::to_multiclass(result);

for (index_t i = 1; i < num_machines; ++i)
Member:

why from 1 and not 0?

Contributor Author:

The result from the first machine has already been retrieved just before the loop. This is a dirty hack that I used because I couldn't find any method to get the number of samples in the features.

Member:

I would still start the loop at 0 for readability

Contributor Author:

Understood. However, what is the standard way of getting the number of samples from a given set of features? (i.e. how do I know how many labels there will be in the predictions before actually calling apply?)

Member:

You can still get the first element to do that.
And then you should ensure that all of the features have the same number of elements

@karlnapf (Member) commented Dec 16, 2017

Wait, isn't that just get_num_vectors?
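A minimal sketch of the simpler route using CFeatures::get_num_vectors (mentioned above) instead of peeking at the first machine's output; the variable names are illustrative only:

```cpp
// The number of samples is known from the features themselves,
// so the loop over calibration machines can start at 0.
index_t num_samples = features->get_num_vectors();
SGVector<float64_t> confidence_values(num_samples);
confidence_values.zero();
for (index_t i = 0; i < num_machines; ++i)
{
	// ... apply each machine to all samples and accumulate ...
}
```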

SGVector<float64_t> confidence_sums;
confidence_sums = pred->get_multiclass_confidences(0);

for (index_t i = 1; i < num_classes; ++i)
Member:

same as above... why index from 1?

Member:

also we can auto that

@vigsterkr (Member)

@durovo make sure that the next time you submit, the unit tests are passing; see on Travis CI:
53 - unit-CrossValidatedCalibrationTest (SEGFAULT)

@durovo (Contributor Author) commented Dec 13, 2017

@vigsterkr strange.. I'm sure that the unit tests are passing on my computer.

EDIT: My apologies, I had not tried running ctest. I had just used make test which did not show the memory errors.


for (index_t i = 1; i < num_classes; ++i)
{
SGVector<float64_t> scores = pred->get_multiclass_confidences(i);
Member:

I suggest you use setUp to set up your cross estimator, and then have multiple tests that test different characteristics.

  • just calling pred (without asserting anything)
  • the sum probability thing you did
  • other things that are meaningful
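A minimal sketch of that layout as a googletest fixture; the class and test names are illustrative, not taken from the PR:

```cpp
class CrossValidatedCalibrationTest : public ::testing::Test
{
protected:
	void SetUp() override
	{
		// build toy features/labels and the calibrated estimator once,
		// so every test below starts from the same state
	}

	void TearDown() override
	{
		// release the objects created in SetUp()
	}
};

TEST_F(CrossValidatedCalibrationTest, apply_runs_without_error)
{
	// just call apply() on the estimator, without asserting anything
}

TEST_F(CrossValidatedCalibrationTest, class_probabilities_sum_to_one)
{
	// the "sum probability" check: per-sample confidences should sum to ~1
}
```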

Contributor Author:

Okay.

Member:

This still needs to be done

@karlnapf (Member)

Any updates here?
I am keen to help if something is unclear ...

@durovo (Contributor Author) commented Dec 23, 2017

@karlnapf I'm done with the descriptions. I have almost figured out the tests too. However, I had to shift to a new PC and now I'm facing this issue: #2841
I will get back here with updates on this soon.

@karlnapf (Member)

Mmmmh weird. I don't understand this error and never managed to reproduce it locally. Let us know how you get on with it, and if you need help. You can ping me in IRC anytime.

@durovo (Contributor Author) commented Dec 26, 2017

@karlnapf I think the problem is related to some package installed using conda. For now, I'm using a VM for development. I will try experimenting a bit later to figure out the exact reason behind the problem.
I was writing the tests for calibration and I wanted to know how we calculate the log loss in shogun. The log loss for the calibrated predictions should be lower than that of the uncalibrated predictions. There is a LogLoss implementation in the loss folder, however I could not find it being used anywhere.

@karlnapf (Member)

If you have a conda version of shogun installed you need to make sure it is not used in the compilation process ....

@karlnapf (Member)

Can you define the log loss?

@durovo (Contributor Author) commented Dec 26, 2017

I don't have the conda version of shogun installed. The reason behind the problem seems to be something else.
I was talking about this log loss:
http://www.exegetic.biz/blog/2015/12/making-sense-logarithmic-loss/

From what I understand, logarithmic loss should decrease after calibration since in a way we are optimizing this metric.
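For reference, the binary log loss being discussed here is commonly defined as (the formula is added for clarity, it is not from the thread):

$$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Big]$$

where $y_i \in \{0, 1\}$ is the true label and $p_i$ is the predicted probability of the positive class; better-calibrated probabilities give a lower value, which is why it is a natural check after calibration.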

@durovo (Contributor Author) commented Dec 26, 2017

Maybe I misunderstood you.. are you asking me to define the log loss function myself? If that is what you were asking, then yes, certainly. Should I make a new class for it or define it as a throwaway function within the test?

@karlnapf (Member)

I see, we will need to implement that under CEvaluation; I don't think Shogun has it. Separate PR :)

@karlnapf (Member)

I think for testing this, some simpler tests would be a better start. A really simple case with a Mock model that outputs a fixed class probability, and then see whether the calibration works on that one.

I guess before writing tests, you could even first try to reproduce the plots for the sklearn calibration example (or something similar).

@karlnapf (Member)

As usual, let me know if you need more input on that

@durovo (Contributor Author) commented Dec 27, 2017

Understood. Log loss could come later I guess :). I will try some sanity checks just as you suggested and then write the tests.

@durovo (Contributor Author) commented Dec 29, 2017

@karlnapf the fit_sigmoid method in Statistics.cpp,

CStatistics::SigmoidParamters CStatistics::fit_sigmoid(SGVector<float64_t> scores)

does not use the target labels to set the values for the vector t (line 863). In the following paper, the target labels are used to set the values of t (see page 10):
https://www.csie.ntu.edu.tw/~cjlin/papers/plattprob.pdf
Is this intentional? Wouldn't this produce somewhat different results?
Also, does the splitting strategy randomize the subsets?
I managed to set up the Python bindings and was trying to compare this implementation's results with sklearn's.

@durovo (Contributor Author) commented Dec 29, 2017

[image: calibration plot]
This is the calibration plot that I am getting. The score in the legend is the Brier score loss.

@durovo (Contributor Author) commented Jan 11, 2018

@karlnapf the curve improves when I modify the fit_sigmoid function to use the true labels. Should I change that too?

@karlnapf (Member)

Hi!
Sorry for the long delay. This is great that you made the plots!
I have to run now, but will review tomorrow at latest

@durovo (Contributor Author) commented Jan 11, 2018

Understood :). Thanks a lot.

@karlnapf (Member) commented Jan 14, 2018

Re the sigmoid fit, are you talking about the line if (label[i] > 0)?
And in shogun, this is scores[i] > 0?
If so, think about the decision function for SVMs, which is taking the sign of the scores.
However, it actually is `f(x) = \mathrm{sign}(\sum_i \alpha_i k(x, x_i) + b)`. So our implementation misses the bias term. BAD. I will open an issue to investigate this, let's address it in another PR.

EDIT: False alarm, the scores do actually include the bias, so checking them for +1/-1 is totally OK.
See CKernelMachine::apply_get_outputs to see what is going on. @durovo let me know what you think

@karlnapf (Member)

Question: Why does the calibrated one in Shogun appear to not be calibrated in the plot? :)

@karlnapf (Member) left a review

Hey!
First of all, thanks for all this work! This is really useful! And sorry for the long delay (partly due to holidays/new years, and partly due to the size of this patch)

We need to talk about the code design here. There are too many new classes and too much redundant code.

I don't like all the replicated x-validation code. What about we re-use the existing code? I would imagine that x-validation could accept callback functions of what to do with every fold and do all the bookkeeping around it. Your wrapper class would then only contain the calibration code (and maybe could use your strategy pattern for the way to calibrate).

A final thing: I would appreciate it if you could strip this PR down to a bare minimum (not yet covering all the cases) so that we have an easier time reviewing it. The longer a PR is, the longer it usually takes for someone to take it up. You can easily split this task into multiple PRs (keep your work though). There are too many new classes in this one, too much scrolling ...

A good sequence of PRs could be

  • modify CCrossValidation to be able to deal with callbacks (in a way that would allow building wrappers like the one you did), and make existing code work with these modifications
  • one after another:
    • calibration method base class & implementation & test
    • wrapper class, implementation, test
    • a clean up PR that adds docs, and addresses minor issues that came up during the code reviews
    • example cookbook, notebook addition (the plot you made is cool for that), etc

We are really excited to get this in, so let's keep on iterating! Maybe ping me in IRC if you have questions.
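A purely hypothetical sketch of the callback idea floated in this review; the FoldCallback type, set_fold_callback, and the fit call do not exist in Shogun and only illustrate how a calibration wrapper could reuse the existing x-validation bookkeeping:

```cpp
#include <functional>

#include <shogun/features/Features.h>
#include <shogun/labels/Labels.h>
#include <shogun/machine/Machine.h>

namespace shogun
{
// Hypothetical hook: x-validation would call this once per fold, after training
// on the training indices, passing the fold machine and the held-out subset.
using FoldCallback = std::function<void(
    CMachine* fold_machine, CFeatures* val_features, CLabels* val_labels)>;
}

// The calibration wrapper would then only register its per-fold work, e.g.
// collecting (score, label) pairs on the held-out fold and fitting the
// calibration method on them, while CCrossValidation keeps the bookkeeping.
```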

@@ -0,0 +1,282 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
* Written (w) 2012 - 2013 Heiko Strathmann
Member:

you might want to add your own name here :)

Contributor Author:

Okay :)

@@ -0,0 +1,113 @@
/*
* Copyright (c) The Shogun Machine Learning Toolbox
* Written (w) 2012 - 2013 Heiko Strathmann
Member:

name

namespace shogun
{

class CCalibration : public CMachine
Member:

Is Calibration the best name? Could be ambiguous... CalibratedClassifier?

Contributor Author:

CalibratedClassifier does seem like a better name

namespace shogun
{

class CCalibration : public CMachine
Member:

This class needs a @brief description to tell users what it is doing

  • it is a meta class that wraps a classifier
  • it performs x-validation to...
  • ... calibrate the probability outputs
  • this is expensive
  • what method is used to calibrate
  • etc
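A minimal sketch of the kind of doxygen block being asked for; the wording is illustrative, not taken from the PR:

```cpp
#include <shogun/machine/Machine.h>

namespace shogun
{
/** @brief Meta machine that wraps a classifier and calibrates its
 * probability outputs.
 *
 * The wrapped machine is trained and evaluated via cross-validation, and a
 * calibration method (e.g. sigmoid/Platt scaling) is fitted on the held-out
 * predictions of each fold. Note that this is expensive, since the wrapped
 * machine is trained once per fold.
 */
class CCalibration : public CMachine
{
	// ...
};
}
```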

namespace shogun
{

class CCalibrationMethod : public CMachine
Member:

Description!

Member:

I find it a bit weird that both CCalibration and CCalibrationMethod inherit from CMachine


extern MultiLabelTestEnvironment* multilabel_test_env;

#ifdef HAVE_LAPACK
Member:

After API is frozen


for (index_t i = 1; i < num_classes; ++i)
{
SGVector<float64_t> scores = pred->get_multiclass_confidences(i);
Member:

This still needs to be done

}

SG_DEBUG("starting unlocked calibration\n", get_name())
/* tell machine to store model internally
Member:

please don't copy code comments around

Contributor Author:

Understood.


/* produce output for desired indices */
CCalibration* calibrator = new CCalibration();
calibrator->set_machine((CMachine*)m_machine->clone());
Member:

this replicates the data as well .... slow

CFeatures* features;
CLabels* labels;

if (get_global_parallel()->get_num_threads() == 1)
Member:

Just realised we really don't want to copy/replicate this buggy x-validation code ...

@durovo (Contributor Author) commented Jan 14, 2018

@karlnapf Thanks a lot for the valuable feedback! I don't mind the delays at all, I understand that you must be busy with other work as well. I have already added the descriptions to my local repo but I have not pushed them yet. Maybe I wasn't clear about the fact that I didn't push any code this time. I was just discussing the calibration curves and checking whether fit_sigmoid was working correctly, since the calibrated curve that I got does not seem calibrated at all.
I will try to remove the replicated x-validation code. The code would indeed be much cleaner if I could reuse the existing x-validation API. I will try to make a more minimal PR, just as you suggested.

@durovo (Contributor Author) commented Jan 30, 2018

@karlnapf I still have some doubts regarding the fit_sigmoid implementation.
I understand that we can check the scores for +1/-1. However, shouldn't we be fitting the sigmoid parameters with respect to the true labels? Currently, the implementation only takes the machine's decision function to fit the parameters. Shouldn't it also take the labels as an input? Doesn't the current approach introduce some kind of bias?
I have gone through the implementation paper on Platt scaling, which I believe was followed in the shogun implementation, and also looked at the scikit-learn implementation. Both take the decision function and the true labels (y) as input when fitting. To be specific, I believe that the scores[i] > 0 check should be if (labels[i] > 0), and we should take labels as an input to the function. Please correct me if I'm wrong.

@durovo (Contributor Author) commented Jan 30, 2018

The libsvm implementation also uses the target labels https://github.com/cjlin1/libsvm/blob/88a1881f03ca139beff93170d7e6f36477fabe54/svm.cpp#L1733
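A minimal sketch of the change being proposed, i.e. building Platt's target vector t from the true labels rather than from the sign of the scores; the helper name and signature are assumptions, not the actual Shogun code:

```cpp
#include <shogun/lib/SGVector.h>
#include <shogun/lib/common.h>

using namespace shogun;

// Sketch only: Platt's targets are t_+ = (N_+ + 1)/(N_+ + 2) for positives
// and t_- = 1/(N_- + 2) for negatives, computed from the ground-truth labels.
SGVector<float64_t> build_platt_targets(SGVector<float64_t> labels)
{
	index_t prior1 = 0, prior0 = 0;
	for (index_t i = 0; i < labels.vlen; ++i)
	{
		// count positives/negatives from the true labels, not from sign(scores[i])
		if (labels[i] > 0)
			prior1++;
		else
			prior0++;
	}

	float64_t hi_target = (prior1 + 1.0) / (prior1 + 2.0);
	float64_t lo_target = 1.0 / (prior0 + 2.0);

	SGVector<float64_t> t(labels.vlen);
	for (index_t i = 0; i < labels.vlen; ++i)
		t[i] = (labels[i] > 0) ? hi_target : lo_target;
	return t;
}
```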

@karlnapf (Member) commented Feb 2, 2018

Think about when labels[i] > 0 is true. It is true exactly when scores[i] > 0 is true,
since labels[i] = sign(scores[i]).

@durovo (Contributor Author) commented Feb 2, 2018

@karlnapf Sorry for pestering you about this, however, I still don't think it is correct. From what I read, with Platt scaling we are fitting our predictions (i.e. sign(scores[i])) to the true labels. The method in shogun gives the correct results when the machine is 100% accurate (i.e. when true_labels[i] == sign(scores[i])). This will not happen when the predictions are not accurate. The sklearn implementation of sigmoid calibration uses the true labels; in fact, they have a test that checks for changes in calibrated results when the true labels are changed (after training the machine).

@karlnapf (Member) commented Feb 2, 2018

Hey!
I was in fact wrong! Thanks for pestering me :)
Yes it should use the provided training labels, not the predicted ones. I was on the wrong train there.
Wanna fix this?

@durovo (Contributor Author) commented Feb 2, 2018

Yes, I will fix this. :)

@durovo mentioned this pull request on Feb 9, 2018
@durovo (Contributor Author) commented Feb 11, 2018

@karlnapf After #4166, the calibration is working correctly:
[image: updated calibration plot]

@karlnapf (Member)

Hey, that is great! I will review the other one now and then we can get this one in.

@karlnapf closed this on Jul 18, 2018