Checking size attribute of dst when dst is None #2

Closed
hrishikeshvganu opened this issue Sep 23, 2017 · 1 comment

hrishikeshvganu commented Sep 23, 2017

In the code below, if dst is None, the dst.sizes[idx] access inside the Exception message will throw an unhandled AttributeError: the condition short-circuits around a None dst, but the format call in the raise does not.

This is around https://github.com/facebookresearch/fairseq-py/blob/master/fairseq/data.py#L222

```
for idx in indices:
    # - 2 here stems from make_positions() where we offset positions
    # by padding_value + 1
    if src.sizes[idx] < 2 or \
            (dst is not None and dst.sizes[idx] < 2) or \
            sizes[idx] > max_positions - 2:
        raise Exception("Unable to handle input id {} of "
                        "size {} / {}.".format(idx, src.sizes[idx], dst.sizes[idx]))
```

To fix this, (dst is not None and dst.sizes[idx] < 2) can be modified to (False if dst is None else dst.sizes[idx] < 2).
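
That guard keeps the condition itself safe, but the format call in the raise still dereferences dst. A minimal sketch of a version that also guards the message (the dst_size local is introduced here purely for illustration; it is not in the original code):

```
for idx in indices:
    if src.sizes[idx] < 2 or \
            (False if dst is None else dst.sizes[idx] < 2) or \
            sizes[idx] > max_positions - 2:
        # avoid dereferencing dst in the message when dst is None
        dst_size = dst.sizes[idx] if dst is not None else None
        raise Exception("Unable to handle input id {} of "
                        "size {} / {}.".format(idx, src.sizes[idx], dst_size))
```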

edunov (Contributor) commented Sep 24, 2017

Thank you @hrishikeshvganu for finding this. It is now fixed.

@edunov edunov closed this as completed Sep 24, 2017
myleott pushed a commit that referenced this issue Sep 26, 2017
yqwangustc pushed a commit to yqwangustc/fairseq that referenced this issue May 3, 2019
…ain_step (facebookresearch#2)

Summary:
Pull Request resolved: fairinternal/fairspeq#2

Pull Request resolved: facebookresearch#689

We found that not raising OOMs during trainer.train_step causes various
issues, including NCCL hangs / gloo sync errors, because gradients are not
synced properly. Until we find the root cause, let's give users an option
to raise OOMs.

Reviewed By: jmp84

Differential Revision: D15170357

fbshipit-source-id: 1c3defd70bf97b2f4e2f1b39661c735907258194
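
For context, a minimal sketch of what such an option can look like in a generic PyTorch training step; the train_step signature and raise_oom flag here are illustrative, not fairseq's actual API:

```
import torch

def train_step(model, criterion, optimizer, batch, raise_oom=False):
    # A worker that silently swallows an OOM skips its backward pass
    # while the other workers still enter the gradient all-reduce,
    # which is how NCCL hangs / gloo sync errors arise; raise_oom
    # surfaces the error so the caller can abort the step consistently.
    try:
        optimizer.zero_grad()
        loss = criterion(model(batch["input"]), batch["target"])
        loss.backward()
        optimizer.step()
        return loss.item()
    except RuntimeError as e:
        if "out of memory" in str(e):
            if raise_oom:
                raise  # propagate instead of silently skipping the batch
            torch.cuda.empty_cache()  # best-effort local recovery
            return None
        raise
```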
facebook-github-bot pushed a commit that referenced this issue May 3, 2019
taylanbil referenced this issue in taylanbil/fairseq Aug 15, 2019
pmichel31415 pushed a commit to pmichel31415/fairseq that referenced this issue Aug 24, 2020
moussaKam pushed a commit to moussaKam/language-adaptive-pretraining that referenced this issue Sep 29, 2020
facebook-github-bot pushed a commit that referenced this issue Apr 14, 2021
Summary:
Motivation:

I want to save checkpoints frequently, because jobs on the FB cluster are unreliable and restart often. I want to do this without spamming Manifold storage, while still keeping some historical checkpoints (e.g., every 10k updates) so I can track how WER evolves over time.

To save frequently, I can use a small --save-interval-updates.

To delete old checkpoints to save storage, I can use --keep-interval-updates.

However, this deletes all old checkpoints. This is where --keep-interval-updates-pattern comes in. If I now do:

```
--save-interval-updates 1000
--keep-interval-updates 1
--keep-interval-updates-pattern 10000
```

This will:
1. checkpoint every 1000 updates, so that job restarts don't set us back significantly
2. keep only the latest checkpoint, to avoid saving a bunch of huge models in Manifold
3. make an exception to rule 2 every 10k updates, so we can track WER over time

Reviewed By: myleott

Differential Revision: D27578403

fbshipit-source-id: 5aec2dc9a22778015f7a3daa017210190af81240
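
To illustrate the retention semantics described above, a sketch in Python; prune_checkpoints and its arguments are invented for this example and are not fairseq's implementation:

```
def prune_checkpoints(saved_updates, keep_interval_updates, keep_pattern):
    # saved_updates: sorted update counts that currently have a checkpoint.
    # Keep the newest `keep_interval_updates` checkpoints, plus every
    # checkpoint whose update count is a multiple of `keep_pattern`;
    # return the update counts whose checkpoints should be deleted.
    keep = set(saved_updates[-keep_interval_updates:])
    keep |= {u for u in saved_updates if keep_pattern and u % keep_pattern == 0}
    return [u for u in saved_updates if u not in keep]

# With the flags above, after 12k updates only the checkpoints at
# 10000 (pattern match) and 12000 (latest) survive:
print(prune_checkpoints(list(range(1000, 13000, 1000)), 1, 10000))
# -> [1000, 2000, ..., 9000, 11000] are deleted
```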
st-vincent1 added a commit to st-vincent1/fairseq that referenced this issue Jul 25, 2022
sunyt32 pushed a commit to sunyt32/fairseq that referenced this issue Mar 23, 2024