
Manage imbalancing in TFT #1040

Open
LuigiDarkSimeone opened this issue Jun 20, 2022 · 7 comments

@LuigiDarkSimeone

  • PyTorch-Forecasting version: 0.10.2
  • PyTorch version:
  • Python version: 3.8.5
  • Operating System: Windows

I have a dataset of several shops. For each shop I have a time series of sales.
The shops are spread unequally across the world (1,000 in the US, 100 in the EU), and I need to predict sales based on location and other variables.
However, such a dataset is imbalanced.
Is there a way to manage imbalance in TFT (upsampling, downsampling, applying sample weights as in sklearn, or forcing each batch to select an equal number of examples)?

@fnavruzov

Have you tried the "weight" argument when creating the dataset? You can create a column with weights to be used in training:

ds = TimeSeriesDataSet(
    data=data[train_data_filter],
    time_idx=time_idx_col,
    target=...,
    weight='weight',  # name of a weight column in your df holding per-sample weights
    group_ids=group_ids,
    ...
)
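
To build such a column, one option (just a sketch, assuming a hypothetical "region" column reflecting the 1,000 US / 100 EU split described above) is inverse-frequency weighting, so both regions contribute equally to the loss:

import pandas as pd

# Inverse-frequency weights: each region contributes equally overall
# (US rows get 1/1000, EU rows get 1/100)
region_counts = data["region"].value_counts()
data["weight"] = data["region"].map(1.0 / region_counts)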

@RonanFR

RonanFR commented Jun 20, 2022

Hi @LuigiDarkSimeone,

  1. As suggested by @fnavruzov, one way to "rebalance" the dataset could be to use the weight argument of TimeSeriesDataSet. This will generate a weight tensor in addition to the target tensor used while fitting the model.
    Note that in this case, the portion of the loss associated with each sample is weighted differently. This is similar to what is done in scikit-learn (the sample_weight argument of the .fit(...) method).

  2. You could also use the weights to alter the probability that a given sample is included in a mini-batch (sampling scheme). As indicated in the documentation, you can call the to_dataloader method with a custom sampler, for example an instance of torch's WeightedRandomSampler. You can find a small example here; see also the minimal sketch at the end of this comment.

  3. You can also combine both 1) and 2).

N.B.: The DeepAR paper empirically shows the benefit of method 2) compared to not using any weights. To the best of my knowledge, they do not present any results based on method 1). That being said, in their setting the main problem is the sheer size of the dataset: since the total number of samples is huge, it may not be possible to go over all samples several times during training, and they show that weighting the samples based on their "velocity" greatly improves performance.

See also: Weighted loss functions vs weighted sampling?
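
A minimal sketch of option 2), assuming `training` is the TimeSeriesDataSet and that to_dataloader forwards extra keyword arguments to torch's DataLoader (the uniform placeholder weights and the batch size are assumptions to replace with your own):

import torch
from torch.utils.data import WeightedRandomSampler

# One weight per sample in the dataset; replace the uniform placeholder
# with e.g. higher weights for the under-represented EU shops
sample_weights = torch.ones(len(training), dtype=torch.double)

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(training),
    replacement=True,
)

# shuffle must be off when a custom sampler is supplied
train_dataloader = training.to_dataloader(
    train=True, batch_size=64, sampler=sampler, shuffle=False
)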

@LuigiDarkSimeone
Author

First of all, thanks to @RonanFR and @fnavruzov for your replies.
Lately it has been quite hard to get answers in here.
I will have a look at your options and test them to see whether they are suitable for my case.

Given how much I have been struggling to get answers, and since you seem experienced, I would kindly ask you to have a look at this question I posted quite a few days ago (which I guess will never get an answer):

#1032

I know it is not good practice to post another question in a different issue, so I apologise in advance, but I cannot get past this problem, even after looking at the source code.
Hope to hear from you soon.

Many thanks,
Luigi

@FrancescoFondaco

Thanks @RonanFR, @fnavruzov.

I am trying to implement what you've suggested, using the "weight" argument of the TimeSeriesDataSet class in order to manage imbalances in my dataset:

training = TimeSeriesDataSet(
    myData,
    time_idx="Time_idx",
    target="TVPI",
    group_ids=["Fund"],
    min_encoder_length=8,
    max_encoder_length=80,
    min_prediction_length=1,
    max_prediction_length=30,
    weight="Weight",
    static_categoricals=...,
)

where the Weight column contains the weight associated with each sample:
[screenshot: dataframe showing the Weight column]

Unfortunately, the described implementation raises the error below:
[screenshot: error traceback]

Would you know how to solve it?
Thanks,
Francesco

@RonanFR

RonanFR commented Jul 7, 2022

Hi @FrancescoFondaco ,

Can you provide a detailed minimal reproducible example that raises this error (a small toy dataset of only a few lines)?
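
For reference, a skeleton for such an example (a sketch with made-up data, not the actual frame from the comment above) might look like:

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# Two groups, 20 contiguous time steps each, constant unit weights
toy = pd.DataFrame({
    "Fund": np.repeat(["A", "B"], 20),
    "Time_idx": np.tile(np.arange(20), 2),
    "TVPI": np.random.rand(40),
    "Weight": 1.0,
})

ds = TimeSeriesDataSet(
    toy,
    time_idx="Time_idx",
    target="TVPI",
    group_ids=["Fund"],
    max_encoder_length=8,
    max_prediction_length=2,
    weight="Weight",
)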

@QijiaShao

> (quoting @FrancescoFondaco's comment above)

Have you figured out this issue? I am having the same issue after adding the "weight" parameter. Thx!

@terbed

terbed commented May 8, 2023

Dear @FrancescoFondaco and @QijiaShao,
I suspect the issue is related to the automatic forward-fill NaN mechanism: if your time index is not contiguous, the missing steps are filled in, but the weights are missing for those rows. So you should disable automatic filling if you are using weights. This is just a guess.
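
If that guess is right, a quick check for gaps per group before constructing the dataset could look like this (a sketch; the column names follow the snippet above, and the neutral fill value of 1.0 is an assumption):

import pandas as pd

def groups_with_gaps(df: pd.DataFrame, group_col: str, time_col: str) -> pd.Series:
    # True for every group whose time index skips at least one step
    return df.groupby(group_col)[time_col].apply(
        lambda s: s.sort_values().diff().dropna().gt(1).any()
    )

print(groups_with_gaps(myData, "Fund", "Time_idx"))

# If gaps exist, reindex each group to a contiguous time index and fill
# the Weight column yourself (e.g. with a neutral 1.0) before building
# the dataset, rather than relying on the automatic filling.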

Best wishes,
Daniel
