
How big is the calibration data MAPIE uses to compute intervals? #159

Closed
nilslacroix opened this issue Apr 24, 2022 · 3 comments

@nilslacroix

Is your documentation request related to a problem? Please describe.
I found this picture in your docs, which shows the workflow of MAPIE. My question is: how big is this calibration dataset for, e.g., the "cv+" method?

I find this question important because the calibration data is held out from the training data, and its size apparently can't be changed. This could lead to performance issues if the number of samples is small.

[image: MAPIE workflow diagram from the documentation]

@nilslacroix nilslacroix added the documentation Improvements or additions to documentation label Apr 24, 2022
@nilslacroix
Author

Also, wouldn't it make sense, in the case of a regression problem, to sort the training data by the target feature before fitting MAPIE on it? If you have samples sorted from, let's say, 100.000k to 600.000k SalePrice, as in a housing problem, the leave-one-out CV would calculate intervals in a space of values that are similar to each other and thus make more sense. For example, fold1 = 100.000k-120.000k, fold2 = 120.000k-140.000k, and so on...

@gmartinonQM
Contributor

gmartinonQM commented Apr 24, 2022

@vtaquet, I think the picture was specific to classification, at a time when we had not yet implemented cross-validation, only the split-conformal method with the cv="prefit" option. The picture is thus obsolete, and the size of the calibration set is defined by the number of cross-validation folds cv.
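The fold arithmetic behind this can be sketched with scikit-learn's `KFold`, the same kind of splitter MAPIE accepts for `cv`. This is an illustration of the split sizes only, not MAPIE code itself; the sample count is an arbitrary example:

```python
import numpy as np
from sklearn.model_selection import KFold

# With K-fold cross-conformal methods such as CV+, each training sample is
# held out exactly once, so every sample acts as a calibration point and
# each held-out fold has roughly n_samples / K points.
n_samples = 100
X = np.arange(n_samples).reshape(-1, 1)

cv = KFold(n_splits=5)
fold_sizes = [len(calib_idx) for _, calib_idx in cv.split(X)]
print(fold_sizes)  # each held-out (calibration) fold has n / K = 20 samples
```

In other words, no fixed fraction of the data is permanently sacrificed to calibration: the calibration set size follows directly from the `cv` value you choose.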

@vtaquet
Member

vtaquet commented May 2, 2022

@gmartinonQM , the picture is indeed obsolete and should be updated in a future PR.

@nilslacroix, sorting the training data before splitting it into folds is up to the user and needs to be done before calling MAPIE. Your cross-validation strategy can be defined by passing the desired scikit-learn BaseCrossValidator object, such as KFold, but keep in mind that the training and calibration sets need to have similar distributions.
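To see why sorting by the target works against that last requirement, here is a small sketch (the SalePrice-like values are synthetic, an assumption for illustration) comparing fold-wise target means for contiguous folds over sorted data versus a shuffled `KFold` of the kind you would pass to MAPIE's `cv`:

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Synthetic sorted target, e.g. SalePrice between 100k and 600k.
y = np.sort(rng.uniform(100_000, 600_000, size=500))

def fold_means(cv):
    # Mean of the held-out (calibration) fold targets for each split.
    return [y[calib_idx].mean() for _, calib_idx in cv.split(y)]

sorted_means = fold_means(KFold(n_splits=5))                          # contiguous folds on sorted data
shuffled_means = fold_means(KFold(n_splits=5, shuffle=True, random_state=0))

# Spread of the per-fold means: large for sorted folds, small for shuffled ones.
print(np.ptp(sorted_means), np.ptp(shuffled_means))
```

With sorted data each fold covers a narrow, disjoint slice of the target range, so each calibration fold's distribution differs sharply from its training folds; shuffling keeps the fold distributions similar, which is what the cross-conformal guarantees rely on.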
