
Question about the training loss and validation loss. #160

Open
fluency03 opened this issue Mar 29, 2016 · 2 comments

@fluency03

As you have said in the following:

If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.

If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)

The first part is quite clear. Regarding the second part, my question is:

If training loss << validation loss, it is overfitting; if training loss is roughly equal to validation loss, it is underfitting. Then what is the balanced situation? Is it training loss > validation loss, or training loss lower than, but not much lower than, validation loss?

I do not think training loss > validation loss will happen, right?

@drohack

drohack commented May 5, 2016

I have the same question.

I have, however, gotten my Training Loss > Validation Loss by increasing the Dropout to > 0.8, though that made training take about twice as long (2x the epochs) to reach the minimum Validation Loss I could get.
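For reference, such a run would look something like the command further down, just with the dropout flag raised (the exact value here is only an illustration):

th train.lua -data_dir data/mtg/ -num_layers 3 -rnn_size 512 -seq_length 300 -batch_size 25 -dropout 0.85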

I also have a follow-up: what is a good Validation Loss to aim for to get decent generated data? (I know this can differ between data sets.) No matter what variables I change, I can't get my lowest Validation Loss below 0.5. Most of the time the Validation Loss gets close to 0.5 and then starts going back up, which would suggest I'm overfitting, if I'm not mistaken.

About my data:
I have a 1 MB text file (all Magic cards stripped down to useful information in JSON format; you can view it here). I'd like to have a bigger data set, but this is already every card ever produced. With any given set of variables I get about 3-5 million parameters. It also takes only about 15-20 epochs for the Validation Loss to reach around 0.5, after which it won't go any lower or starts going back up. Each "card" is between 100 and 400 characters long. The cards have been pre-shuffled (mainly so that like-colored cards are not next to each other).

The "best" run I've done is the following (lowest Validation Loss):
th train.lua -data_dir data/mtg/ -num_layers 3 -rnn_size 512 -seq_length 300 -train_frac 0.95 -val_frac 0.05 -max_epochs 20 -seed $RANDOM -batch_size 25 -eval_val_every 200 -dropout 0.5
This produced a final Validation Loss of 0.4969 after the full 20 epochs (with the previous 7 epochs all being around 0.5).

All of my tests have been on the base data, meaning I have not been running with -init_from on previous runs. The few times I have tried it, the Training Loss either goes out of whack right away or it doesn't produce any better minimum Validation Loss. Would running from previous save points help, or be any different from running the code for longer? I have the time and power to run this over thousands of epochs, but so far that hasn't seemed to help.
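For reference, the kind of resume command I mean would be something like this (checkpoint name taken from the run above; the -learning_rate value is only an illustrative guess, not something I've verified helps):

th train.lua -data_dir data/mtg/ -init_from cv/lm_lstm_epoch20.00_0.4969.t7 -learning_rate 1e-3 -max_epochs 40 -batch_size 25 -eval_val_every 200 -dropout 0.5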

It's hard to tell whether any of my .t7 files are really better than the rest, as so far they're all fairly comparable. And it's not as though the "cards" it produces are really that "bad". But there are some patterns that I would like the model to pick up on. For example, when cards reference themselves, the generated output never produces a card that references its own name (it puts some other random name instead). Or cards with bullet points have "Choose one or both" before them, but none of the generated cards have this.
I know this also has to do with the Temperature when sampling. I've found that anything with a Temperature below 0.5 only creates very rudimentary cards, and anything above 0.9 creates mostly gibberish. I've been generating all of my cards at 0.7.
th sample.lua -length 5000 -temperature 0.7 -primetext "{\"Name\":\"Storm Crow\"," cv/lm_lstm_epoch20.00_0.4969.t7
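To compare temperatures side by side, a simple loop over sample.lua works (same checkpoint as above; the temperature values and output file names are just an example):

for t in 0.5 0.6 0.7 0.8 0.9; do th sample.lua -length 5000 -temperature $t -primetext "{\"Name\":\"Storm Crow\"," cv/lm_lstm_epoch20.00_0.4969.t7 > sample_$t.txt; done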

@calicratis19

I think this stackoverflow answer covers the confusion.
