
Autotune convergence strategy #891

Open
fclesio opened this issue Aug 30, 2019 · 5 comments


@fclesio commented Aug 30, 2019

Hello!

I was checking the Autotune implementation and I'm trying to figure out the strategy used by fastText for the search.

From the code, the search strategy used by Autotune appears to be the following:

For every parameter, the autotuner has an updater (the updateArgGauss() method) that draws a random coefficient (coeff) from a Gaussian distribution whose standard deviation lies between the startSigma and endSigma parameters; this coefficient determines the update applied to the parameter.

Each parameter has its own startSigma and endSigma values, which are fixed where updateArgGauss is called.

The update of each parameter is either linear (i.e. val + coeff) or power-based (i.e. val * pow(2.0, coeff)), depending on the parameter, where coeff is the Gaussian random number drawn within the standard deviation described above.

After each validation run (each one using a different combination of parameters), a score (f1-score only) is stored, and the best-scoring combination of parameters is then used to train the final model.
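To make this concrete, here is a condensed sketch of that update step (my paraphrase, assuming a simplified linear sigma schedule; the actual implementation is in src/autotune.cc):

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Sketch of the Gaussian update described above (not the verbatim
// fastText source). t is the normalized elapsed autotune time in [0, 1];
// the standard deviation shrinks from startSigma towards endSigma as the
// search progresses, so later trials stay closer to the best known value.
template <typename T>
T updateArgGauss(T val, T minVal, T maxVal, double startSigma,
                 double endSigma, double t, bool linear,
                 std::minstd_rand& rng) {
  double stddev =
      startSigma - (startSigma - endSigma) * std::clamp(t, 0.0, 1.0);
  std::normal_distribution<double> normal(0.0, stddev);
  double coeff = normal(rng);
  // Linear update: val + coeff.  Power update: val * 2^coeff.
  double updated = linear ? static_cast<double>(val) + coeff
                          : static_cast<double>(val) * std::pow(2.0, coeff);
  // Clamp to the per-parameter range (see the list below).
  updated = std::min(static_cast<double>(maxVal),
                     std::max(static_cast<double>(minVal), updated));
  return static_cast<T>(updated);
}
```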

Argument ranges

  • epoch: 1 to 100
  • learning rate: 0.01 to 5.00
  • dimensions: 1 to 1000
  • wordNgrams: 1 to 5
  • loss: Only softmax
  • bucket size: 10000 to 10000000
  • minn (min length of char ngram): 1 to 3
  • maxn (max length of char ngram): 1 to minn + 3
  • dsub (size of each sub-vector): 1 to 4

Is that correct, or am I missing something?

@Celebio (Member) commented Oct 1, 2019

Hi @fclesio ,
You are right.

Best regards,
Onur

@Allenlaobai7 commented Oct 3, 2019

Hi @fclesio, is there any way to limit the range of the autotune parameters? For example, autotuning the epoch between 1 and 20.

@Celebio (Member) commented Oct 3, 2019

Hi @Allenlaobai7 ,
We thought about this and we are not sure about implementing it, because it somewhat defeats the purpose of autotune: we want it to work with as little manual intervention as possible. However, the fact that you are asking for it shows there is a need for this.

For the moment I suggest modifying the source code, which is pretty straightforward.
Change this line from

```cpp
args.epoch = updateArgGauss(args.epoch, 1, 100, 2.8, 2.5, t, false, rng_);
```

to

```cpp
args.epoch = updateArgGauss(args.epoch, 1, 20, 2.8, 2.5, t, false, rng_);
```
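(That call is in src/autotune.cc; note that you need to rebuild the binary after editing it, otherwise the old range is still used, as the follow-up below confirms.)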

Best regards,
Onur

@Wygruft commented Oct 3, 2019

Hi @fclesio, @Celebio,
thanks for this thread!
Having looked at the code, the effective range of dsub (size of each sub-vector) is actually 2 to 16, since the 1 to 4 range is in fact the exponent of 2 used to set the value of dsub.
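For illustration, a minimal sketch of that mapping (assuming an integer exponent drawn from the 1 to 4 range listed above):

```cpp
#include <cmath>
#include <iostream>

// The power-type update turns an exponent in [1, 4] into the
// effective dsub values 2, 4, 8 and 16.
int main() {
  for (int coeff = 1; coeff <= 4; ++coeff) {
    std::cout << "exponent " << coeff << " -> dsub "
              << static_cast<int>(std::pow(2.0, coeff)) << "\n";
  }
  return 0;
}
```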
Best, W.

@Allenlaobai7 commented Oct 4, 2019

> For the moment I suggest modifying the source code, which is pretty straightforward.
> Change this line from
> `args.epoch = updateArgGauss(args.epoch, 1, 100, 2.8, 2.5, t, false, rng_);`
> to
> `args.epoch = updateArgGauss(args.epoch, 1, 20, 2.8, 2.5, t, false, rng_);`

@Celebio Thank you for the prompt response! I will give it a try. I asked because a single trial with 87 epochs took me 1.5 hours, which means autotune has to run for a very long time.

Update: I changed the code but I'm still getting trials with epoch=87. Any idea?
Update 2: I rebuilt with cmake and it works now. Thanks!
