This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Make tensorboard robust to NaN and Inf in model params #1206

Closed
kmalik22 wants to merge 1 commit

Summary:
PyText FBLearner training runs that use TensorBoard fail with the following error if any model parameter becomes NaN or Inf during training:
ValueError: The histogram is empty, please file a bug report.

NaN/Inf params show up quite often in hyperparameter sweeps when a bad initial hyperparameter value is chosen.
When this happens, the whole sweep can fail.

Some examples just from the past two days:
f156398891
f156398480
f156398264
f156399047
f156399125

This diff:

  • Catches TensorBoard exceptions and prints an error to stderr. Note that TensorBoardChannel already catches some exceptions while exporting the model.
  • Adds a unit test with a model that has NaN and Inf parameters, to verify that no TensorBoard errors are raised.
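
The catch-and-report pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `FakeSummaryWriter` and `report_param_histograms` are hypothetical names, and the fake writer only imitates the reported `ValueError` so the example runs without TensorBoard or PyTorch installed.

```python
import math
import sys
import traceback


class FakeSummaryWriter:
    """Hypothetical stand-in for a TensorBoard writer (illustration only)."""

    def __init__(self):
        self.logged = []

    def add_histogram(self, tag, values, global_step):
        # Imitate the failure reported in this PR for NaN/Inf values.
        if not all(math.isfinite(v) for v in values):
            raise ValueError("The histogram is empty, please file a bug report.")
        self.logged.append((tag, global_step))


def report_param_histograms(writer, named_params, epoch):
    """Log one histogram per parameter and return the names that failed."""
    failed = []
    for name, values in named_params:
        try:
            writer.add_histogram(name, values, epoch)
        except Exception:
            # Report the problem on stderr and continue, so a single bad
            # parameter does not abort the training run or the sweep.
            print(f"Failed to log histogram for {name!r}:", file=sys.stderr)
            traceback.print_exc(file=sys.stderr)
            failed.append(name)
    return failed


# Assumed toy "model": parameter names mapped to flat lists of floats.
writer = FakeSummaryWriter()
params = [
    ("linear.weight", [0.1, -0.2, 0.3]),
    ("linear.bias", [float("nan"), float("inf")]),
    ("output.weight", [1.0, 2.0]),
]
failed = report_param_histograms(writer, params, epoch=1)
```

Wrapping each parameter separately, as assumed here, means the finite parameters are still logged when one of them is bad. A unit test in the spirit of the second bullet would then only need to check that the reporting call returns without raising.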

Reviewed By: psuzhanhy

Differential Revision: D19048506

@facebook-github-bot added the CLA Signed and fb-exported labels on Dec 18, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D19048506

…rch#1206)

Summary:
Pull Request resolved: facebookresearch#1206

fbshipit-source-id: 1449423d9c2502e0ade5d327f4cc597c26edeb64
@facebook-github-bot
Contributor

This pull request has been merged in 631e07f.
