Hello,
I see your warning about the negative KL divergence error for certain versions of TensorFlow. Do you know of a version that avoids this, or of any other way around it? I have the version from your yaml, 2.7.0.
I'm also wondering how to get past being stuck at a certain loss (in my case ~86k, close to the starting 93k). This may be the averaging effect you note, but it happens for every hyperparameter combination I've tried on my data of ~6,000 sequences.
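For reference, would a numerical guard like the following be a safe workaround? This is just a sketch in plain Python of the closed-form KL between a diagonal Gaussian and a standard normal, clamped at zero; the names `gaussian_kl`, `mu`, and `log_var` are my own, not from your code:

```python
import math

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log(sigma^2)),
    with sigma^2 = exp(log_var).
    """
    kl = 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )
    # The analytic KL is non-negative; clamp away tiny negative
    # values that can appear from floating-point error.
    return max(kl, 0.0)

# Identical distributions -> KL of exactly 0
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

In TensorFlow terms I imagine this would amount to wrapping the KL term in something like `tf.maximum(kl, 0.0)` before adding it to the loss, but I'm not sure if that masks a real bug rather than fixing it.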
Thank you,
Mike