Is your documentation request related to a problem? Please describe.
I found a picture in your docs that shows the workflow of MAPIE. My question is: how big is the calibration dataset in the case of, for example, the "cv+" method?
I find this question important because the calibration data is split off from the training data and its size apparently cannot be configured. This could lead to performance issues if the number of samples is small.
Also, wouldn't it make sense, in the case of a regression problem, to sort the training data by the target feature before fitting it with MAPIE? I mean, if you have samples sorted by SalePrice from, let's say, 100,000 to 600,000, as in a housing problem, the leave-one-out CV would basically calculate intervals in a space of values that are similar to each other and thus make more sense. For example, fold 1 = 100,000-120,000, fold 2 = 120,000-140,000, and so on...
@vtaquet, I think the picture was specific to classification, at a time when we had not yet implemented cross-validation, only split-conformal with the cv="prefit" option. This picture is thus obsolete, and the size of the calibration set is determined by the number of calibration folds given in cv: with cv=5 and 1,000 training points, for instance, each held-out fold contains ~200 points that serve as the calibration set for the model fitted on the remaining 800, so every training sample is used for calibration exactly once.
@gmartinonQM , the picture is indeed obsolete and should be updated in a future PR.
@nilslacroix, sorting the training data before splitting it into folds is up to the user and needs to be done before calling MAPIE. Your cross-validation strategy can be defined with any sklearn BaseCrossValidator object, such as KFold, but keep in mind that the training and calibration sets need to have similar distributions (see the sketch below).
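For illustration, here is a minimal sketch of passing a scikit-learn cross-validator to MAPIE. It assumes mapie.regression.MapieRegressor with method="plus" (the CV+ method) and a cv parameter that accepts any BaseCrossValidator, as in recent MAPIE releases; the dataset and estimator are placeholders, so adjust the names to your installed version.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from mapie.regression import MapieRegressor

# Placeholder data; substitute your own X, y.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)

# Shuffling (rather than sorting by the target) keeps the fold
# distributions similar, which the CV+ coverage guarantee relies on.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

mapie = MapieRegressor(estimator=LinearRegression(), method="plus", cv=cv)
mapie.fit(X, y)

# y_pis has shape (n_samples, 2, n_alpha): lower and upper interval bounds.
y_pred, y_pis = mapie.predict(X, alpha=0.1)
```

With 5 folds on 500 samples, each held-out fold of ~100 points acts as the calibration set for the model trained on the other ~400, which is how the fold count fixes the calibration set size mentioned above.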