This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Strange behaviour of loss value over the phase of training #690

Closed
ziadloo opened this issue Nov 25, 2018 · 10 comments
Labels
machine-learning issue/question related to general ML practice

Comments

ziadloo commented Nov 25, 2018

I'm using fastText to calculate word2vec for a corpus of text that I have like this:

./fasttext cbow -input corpus.txt -output result/corpus -lr 0.025 -dim 150 \
  -ws 5 -epoch 10 -minCount 5 -neg 10 -loss ns -bucket 2000000 \
  -minn 3 -maxn 6 -thread 72 -t 1e-4 -lrUpdateRate 100

During training, I dumped the loss value reported by the utility and plotted its progress over the course of the training phase:

[plot: loss value over the course of training]

I omitted the x-axis labels since they didn't make sense, but for the sake of completeness:

  • Samples in the plot: ~5.6M
  • Original text file's size: ~28GB
  • Read words: 6752M
  • Number of unique words: 2956135
  • Number of labels: 0

My question is: why is the loss value going up at first? Does this mean that I'm better off not training for so long, and should stop once the loss value reaches 2 (as at the beginning)?

Celebio (Member) commented Nov 26, 2018

Hi @ziadloo ,
The loss reported by the command line is an averaged loss during the training. It is not exactly the loss you would obtain if you computed it once the training is done, but it gives you an idea of whether you are diverging or converging.

In your case the loss seems to diverge; I would recommend decreasing your lr parameter.
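To illustrate the difference between the averaged loss and the instantaneous loss, here is a minimal sketch with hypothetical per-interval loss values (not fastText internals): a running average lags behind the raw values, so a rising averaged curve means the recent raw losses are even higher.

```python
# Hypothetical per-interval losses; the running average of them lags
# behind the raw values, so a rising averaged curve implies the
# recent instantaneous losses are higher still.
losses = [2.0, 1.8, 1.9, 2.5, 3.0, 3.6]

running_avg = []
total = 0.0
for n, value in enumerate(losses, start=1):
    total += value
    running_avg.append(total / n)

print(running_avg[-1])  # the final average stays below the last raw value of 3.6
```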

Regards,
Onur

ziadloo (Author) commented Nov 26, 2018

Hi @Celebio ,

Thanks for the insight. Could you please verify my understanding?

  1. Do you mean that an increase in the loss value is never acceptable?
  2. Is there a rule of thumb for what an acceptable loss value is for a well-trained model?
  3. Are the loss values returned by skipgram and cbow comparable? I mean, can one decide on the quality of models trained by the two algorithms solely based on their final loss values?

Regards,
Mehran

ziadloo (Author) commented Nov 27, 2018

In case this might be helpful to anyone else, after changing the params to this:

./fasttext cbow -input corpus.txt -output result/corpus -lr 0.015 -dim 150 \
  -ws 5 -epoch 20 -minCount 5 -neg 10 -loss ns -bucket 2000000 \
  -minn 3 -maxn 6 -thread 72 -t 1e-4 -lrUpdateRate 100

The problem was fixed and the loss value progress changed to this:

[plot: loss value over training after the parameter change]

Celebio added the machine-learning issue/question related to general ML practice label on Dec 3, 2018
EdouardGrave (Contributor) commented

Hi @ziadloo,

  1. In general, a large increase of the loss function during training is a sign that the model is diverging. It is sometimes acceptable to observe a small increase of the loss (but no more than a few percent).
  2. Not really, unfortunately.
  3. No, these two models have different losses, and it is not possible to choose a model based on the value of the loss function alone. Even when comparing two models trained with skipgram (or cbow), the value of the loss function might not be a good criterion (as an example, increasing the number of negative samples usually increases the value of the loss, but also leads to better models).
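The third point can be sketched numerically. Below is a minimal sketch of the standard negative-sampling loss formula, -log σ(s_pos) - Σ log σ(-s_neg) (the textbook formulation, not fastText's actual implementation): with identical scores, a model trained with more negatives simply sums more terms, so it reports a larger loss without being a worse model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ns_loss(pos_score, neg_scores):
    # Standard negative-sampling loss for one target:
    # -log(sigmoid(s_pos)) - sum(log(sigmoid(-s_neg)))
    loss = -math.log(sigmoid(pos_score))
    for s in neg_scores:
        loss -= math.log(sigmoid(-s))
    return loss

# Identical score quality, but more negative samples -> larger loss value.
print(ns_loss(2.0, [-1.0] * 5))
print(ns_loss(2.0, [-1.0] * 10))
```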

Best,
Edouard.

hanson1005 commented

Does fastText have a built-in command that draws this loss plot over progress (or preferably over epochs) after fitting a skipgram model?
Otherwise, can anyone provide the code you used? Thank you!

ziadloo (Author) commented Feb 3, 2019

AFAIK, fastText does not have a built-in command to generate these plots. Personally, I redirected the progress output of fasttext to a file (fasttext ... > ./skipgram.out) and then wrote a small Python script to extract the information I was looking for. For me there were too many data points to plot, so I averaged over a window to get a smooth chart (window_size = int(sample_count / 1000.0)). Here's my script, although I'm sure you'll need to modify it before you can use it:

# Parse the saved fastText progress output: collect the loss column,
# then average over fixed-size windows to get a smooth curve.
loss = []
with open('./skipgram.out') as fp:
    start = False
    for line in fp:
        line = line.strip()
        # progress lines begin once the first "Progress:" line appears
        if line.startswith('Progress:'):
            start = True
        if start:
            columns = line.split(' ')
            if len(columns) >= 11:
                loss.append(float(columns[10]))

sample_count = len(loss)
window_size = max(1, sample_count // 1000)  # guard against a zero-size window

summations = []
counts = []
for i, sample in enumerate(loss):
    group = i // window_size
    if len(summations) - 1 < group:
        summations.append(0.0)
        counts.append(0)
    summations[group] += sample
    counts[group] += 1

for group, total in enumerate(summations):
    print(total / counts[group])

Then I simply saved the results to a text file and used spreadsheet software (in my case, LibreOffice Calc) to plot the chart.
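As an alternative to a spreadsheet, the window-averaged values can be plotted directly, assuming matplotlib is installed (the values below are illustrative, not real output):

```python
import matplotlib
matplotlib.use('Agg')  # render to a file without needing a display
import matplotlib.pyplot as plt

averages = [2.1, 2.0, 1.9, 1.85]  # illustrative window-averaged losses
plt.plot(averages)
plt.xlabel('window')
plt.ylabel('average loss')
plt.savefig('loss_progress.png')
```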

Cheers,

VETURISRIRAM commented

@ziadloo

In your explanation of getting loss values over the training phase, you mentioned (fasttext ... > ./skipgram.out).

I tried the same, but I think only the final stats are stored, as shown below.

[screenshot: terminal output showing only the final stats]

How do I save the intermediate loss values too in the file?

Thanks.

sakshishukla1996 commented

@VETURISRIRAM I faced the same problem while working on Linux, but on a Mac I got the results at every epoch.

NilsRethmeier commented Mar 17, 2021

To make this loss tracking work on Linux you need to capture the loss output and then replace the carriage-return characters (^M, i.e. \r) with newlines (\n), since the Python code parses line by line, and on Linux lines are delimited by \n.

# ./fastText/fasttext skipgram -input sentences.txt -output sentences.txt -dim 100 &> progress.txt
# cat progress.txt | sed -e 's/\r/\n/g' > progress_with_newlines.txt
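If sed is not available, the same carriage-return handling can be sketched in Python (the helper name below is mine, not from the thread):

```python
def progress_snapshots(text):
    # fastText redraws its progress line in place using '\r'; treating
    # each '\r' as a line break recovers the intermediate snapshots.
    return [chunk for chunk in text.replace('\r', '\n').split('\n') if chunk.strip()]

raw = "Progress: 1.0% loss: 2.31\rProgress: 2.0% loss: 2.25\r"
print(len(progress_snapshots(raw)))  # prints 2
```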

Then you can use @ziadloo 's Python code as originally posted above, changing only the input filename:

fp = open('./progress_with_newlines.txt')

If you do not want to waste time with visualization you can use:

# less progress_with_newlines.txt

and then just page-scroll through the file quickly by holding Ctrl+F to see whether the loss behaves, i.e. approximately monotonically decreases.

RichardVergin commented

Hi @ziadloo,

One question, to replicate your approach: how did you dump the loss values during model training? Did you work in a Jupyter notebook?

Thanks and best,
Richard
