This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Strange behaviour of loss value over the phase of training #690

Closed
ziadloo opened this issue Nov 25, 2018 · 10 comments
Labels
machine-learning issue/question related to general ML practice

Comments

ziadloo commented Nov 25, 2018

I'm using fastText to calculate word2vec for a corpus of text that I have like this:

./fasttext cbow -input corpus.txt -output result/corpus -lr 0.025 -dim 150 \
  -ws 5 -epoch 10 -minCount 5 -neg 10 -loss ns -bucket 2000000 \
  -minn 3 -maxn 6 -thread 72 -t 1e-4 -lrUpdateRate 100

During training, I dumped the loss value reported by the utility and plotted its progress over the course of the training phase:

[plot: loss value over the course of training]

I omitted the x-axis labels since they didn't make sense, but for the sake of completeness:

  • Samples in the plot: ~5.6M
  • Original text file's size: ~28GB
  • Read words: 6752M
  • Number of unique words: 2956135
  • Number of labels: 0

My question is: why is the loss value going up at first? Does this mean that I'm better off not training for so long, and should stop once the loss value reaches 2 (as at the beginning)?

Celebio (Member) commented Nov 26, 2018

Hi @ziadloo ,
The loss reported by the command line is an averaged loss during the training. It is not exactly the loss you would obtain if you computed it once the training is done, but it gives you an idea of whether you are diverging or converging.

In your case the loss seems to diverge; I would recommend decreasing your lr parameter.
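To illustrate the difference between the averaged loss and the instantaneous loss, here is a minimal sketch with hypothetical per-interval loss values (not fastText internals): a running average lags behind the raw values, so a rising averaged curve means the recent raw losses are even higher.

```python
# Hypothetical per-interval losses; the running average of them lags
# behind the raw values, so a rising averaged curve implies the
# recent instantaneous losses are higher still.
losses = [2.0, 1.8, 1.9, 2.5, 3.0, 3.6]

running_avg = []
total = 0.0
for n, value in enumerate(losses, start=1):
    total += value
    running_avg.append(total / n)

print(running_avg[-1])  # the final average stays below the last raw value of 3.6
```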

Regards,
Onur

ziadloo (Author) commented Nov 26, 2018

Hi @Celebio ,

Thanks for the insight. Could you please verify my understanding?

  1. Do you mean that an increase in the loss value is never acceptable?
  2. Is there a rule of thumb for what an acceptable loss value is for a well-trained model?
  3. Are the loss values returned by skipgram and cbow comparable? I mean, can one decide on the quality of models trained by the two algorithms solely based on their final loss values?

Regards,
Mehran

ziadloo (Author) commented Nov 27, 2018

In case this might be helpful to anyone else, after changing the params to this:

./fasttext cbow -input corpus.txt -output result/corpus -lr 0.015 -dim 150 \
  -ws 5 -epoch 20 -minCount 5 -neg 10 -loss ns -bucket 2000000 \
  -minn 3 -maxn 6 -thread 72 -t 1e-4 -lrUpdateRate 100

The problem was fixed and the loss value progress changed to this:

[plot: loss value over training after the parameter change]

Celebio added the machine-learning issue/question related to general ML practice label on Dec 3, 2018
EdouardGrave (Contributor) commented

Hi @ziadloo,

  1. In general, a large increase of the loss function during training is a sign that the model is diverging. It is sometimes acceptable to observe a small increase of the loss (but no more than a few percent).
  2. Not really, unfortunately.
  3. No, these two models have different losses, and it is not possible to choose a model based on the value of the loss function alone. Even when comparing two models trained with skipgram (or cbow), the value of the loss function might not be a good criterion (as an example, increasing the number of negative samples usually increases the value of the loss, but also leads to better models).
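The third point can be sketched numerically. Below is a minimal sketch of the standard negative-sampling loss formula, -log σ(s_pos) - Σ log σ(-s_neg) (the textbook formulation, not fastText's actual implementation): with identical scores, a model trained with more negatives simply sums more terms, so it reports a larger loss without being a worse model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ns_loss(pos_score, neg_scores):
    # Standard negative-sampling loss for one target:
    # -log(sigmoid(s_pos)) - sum(log(sigmoid(-s_neg)))
    loss = -math.log(sigmoid(pos_score))
    for s in neg_scores:
        loss -= math.log(sigmoid(-s))
    return loss

# Identical score quality, but more negative samples -> larger loss value.
print(ns_loss(2.0, [-1.0] * 5))
print(ns_loss(2.0, [-1.0] * 10))
```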

Best,
Edouard.

hanson1005 commented

Does fastText have a built-in command that draws this loss plot over progress (or preferably over epochs) after fitting a skipgram model?
Otherwise, can anyone provide the code you used? Thank you!

ziadloo (Author) commented Feb 3, 2019

AFAIK, fastText does not have a built-in command to generate these plots. Personally, I redirected the progress output of fasttext to a file (fasttext ... > ./skipgram.out) and then wrote a small Python script to extract the information I was looking for. For me there were too many data points to plot, so I averaged over a window to get a smooth chart (window_size = int(sample_count / 1000.0)). Here's my script, although I'm sure you'll need to modify it before you can use it:

# Parse the saved fastText progress output: collect the loss column,
# then average over fixed-size windows to get a smooth curve.
loss = []
with open('./skipgram.out') as fp:
    start = False
    for line in fp:
        line = line.strip()
        # progress lines begin once the first "Progress:" line appears
        if line.startswith('Progress:'):
            start = True
        if start:
            columns = line.split(' ')
            if len(columns) >= 11:
                loss.append(float(columns[10]))

sample_count = len(loss)
window_size = max(1, sample_count // 1000)  # guard against a zero-size window

summations = []
counts = []
for i, sample in enumerate(loss):
    group = i // window_size
    if len(summations) - 1 < group:
        summations.append(0.0)
        counts.append(0)
    summations[group] += sample
    counts[group] += 1

for group, total in enumerate(summations):
    print(total / counts[group])

Then I simply saved the results to a text file and used spreadsheet software (in my case, LibreOffice Calc) to plot the chart.
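As an alternative to a spreadsheet, the window-averaged values can be plotted directly, assuming matplotlib is installed (the values below are illustrative, not real output):

```python
import matplotlib
matplotlib.use('Agg')  # render to a file without needing a display
import matplotlib.pyplot as plt

averages = [2.1, 2.0, 1.9, 1.85]  # illustrative window-averaged losses
plt.plot(averages)
plt.xlabel('window')
plt.ylabel('average loss')
plt.savefig('loss_progress.png')
```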

Cheers,

VETURISRIRAM commented

@ziadloo

In your explanation of getting loss values over the training phase, you mentioned (fasttext ... > ./skipgram.out).

I tried the same, but I think only the final stats are stored, as shown below.

[screenshot: terminal output showing only the final stats]

How do I save the intermediate loss values too in the file?

Thanks.

sakshishukla1996 commented

@VETURISRIRAM I faced the same problem while working on Linux, but on a Mac I got the results at every epoch.

NilsRethmeier commented Mar 17, 2021

To make this loss tracking work on Linux you need to capture the loss output and then replace the carriage-return characters (^M, i.e. \r) with newlines (\n), since the Python code parses line by line, and on Linux lines are delimited by \n.

# ./fastText/fasttext skipgram -input sentences.txt -output sentences.txt -dim 100 &> progress.txt
# cat progress.txt | sed -e 's/\r/\n/g' > progress_with_newlines.txt
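If sed is not available, the same carriage-return handling can be sketched in Python (the helper name below is mine, not from the thread):

```python
def progress_snapshots(text):
    # fastText redraws its progress line in place using '\r'; treating
    # each '\r' as a line break recovers the intermediate snapshots.
    return [chunk for chunk in text.replace('\r', '\n').split('\n') if chunk.strip()]

raw = "Progress: 1.0% loss: 2.31\rProgress: 2.0% loss: 2.25\r"
print(len(progress_snapshots(raw)))  # prints 2
```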

Then you can use @ziadloo 's Python code as originally posted above, changing only the input filename:

fp = open('./progress_with_newlines.txt')

If you do not want to waste time with visualization you can use:

# less progress_with_newlines.txt

and then just page-scroll through the file quickly by holding Ctrl+F to see whether the loss behaves, i.e. approximately monotonically decreases.

RichardVergin commented

Hi @ziadloo,

One question, to replicate your approach: how did you dump the loss values during model training? Did you work in a Jupyter notebook?

Thanks and best,
Richard
