Update attention / self-attn based models from a series of experiments #821

rwightman · 2021-08-20T23:15:23Z

* remove dud attention, involution + my swin attention adaptation don't seem worth keeping
* add or update several new 26/50 layer ResNe(X)t variants that were used in experiments
* remove models associated with dead-end or uninteresting experiment results
* weights coming soon...


* add polynomial decay 'poly'
* cleanup cycle specific args for cosine, poly, and tanh sched, t_mul -> cycle_mul, decay -> cycle_decay, default cycle_limit to 1 in each opt
* add k-decay for cosine and poly sched as per https://arxiv.org/abs/2004.05909
* change default tanh ub/lb to push inflection to later epochs
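For readers unfamiliar with k-decay, here is a minimal sketch of how the exponent `k` could enter a cosine and a polynomial schedule. It is not the code from this PR: the function names, argument names, and the default `power` are hypothetical, and the formulas follow my reading of the linked paper, so treat them as an assumption.

```python
import math


def cosine_k_decay_lr(t, t_total, lr_max, lr_min=0.0, k=1.0):
    """Cosine annealing with a k-decay exponent (hypothetical helper).

    With k=1.0 this reduces to plain cosine annealing; k > 1 keeps the
    learning rate higher for longer before it drops.
    """
    t = min(t, t_total)  # clamp so the schedule stays at lr_min after the end
    frac = t ** k / t_total ** k
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * frac))


def poly_k_decay_lr(t, t_total, lr_max, lr_min=0.0, power=0.5, k=1.0):
    """Polynomial decay with a k-decay exponent (hypothetical helper)."""
    t = min(t, t_total)
    frac = t ** k / t_total ** k
    return lr_min + (lr_max - lr_min) * (1 - frac) ** power


if __name__ == "__main__":
    for epoch in (0, 25, 50, 75, 100):
        print(epoch,
              round(cosine_k_decay_lr(epoch, 100, 0.1, k=1.0), 5),
              round(cosine_k_decay_lr(epoch, 100, 0.1, k=2.0), 5),
              round(poly_k_decay_lr(epoch, 100, 0.1, k=2.0), 5))
```

Both helpers start at `lr_max` and end at `lr_min`; only the shape in between changes with `k`.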


rwightman added 22 commits August 20, 2021 16:13

Fix typo

a5a542f

Add resnet26ts and resnext26ts models for non-attn baselines

a8b6569

Improve performance of HaloAttn, change default dim calc. Some cleanup / fixes for byoanet. Rename resnet26ts to tfs to distinguish (extra fc).

8449ba2

Merge branch 'master' into attn_update

2568ffc

Another attempt at sgd momentum test passing...

fc894c3

Use Tensor.unfold().unfold() for HaloAttn, fast like as_strided but more clarity

3b9032e

Update HaloAttn comment

492c0a4

Add a BCE loss impl that converts dense targets to sparse w/ smoothing as an alternate to CE w/ smoothing. For training experiments.

ba9c110
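As a rough illustration of the idea in this commit, the sketch below expands a class-index target into a smoothed per-class target vector and applies binary cross-entropy on logits. It is framework-free on purpose and is not the implementation added in the PR: the helper names are hypothetical, and the smoothing rule (off value `smoothing / num_classes`) is an assumption.

```python
import math


def smoothed_one_hot(target, num_classes, smoothing=0.1):
    """Expand a class index into a smoothed per-class target vector."""
    off_value = smoothing / num_classes      # assumed smoothing rule
    on_value = 1.0 - smoothing + off_value
    return [on_value if c == target else off_value for c in range(num_classes)]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, in a numerically stable form."""
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def smoothed_bce(logits, target, smoothing=0.1):
    """BCE on logits against a smoothed one-hot target for one sample."""
    return bce_with_logits(logits, smoothed_one_hot(target, len(logits), smoothing))


if __name__ == "__main__":
    print(smoothed_one_hot(1, 4))
    print(smoothed_bce([-3.0, 3.0, -3.0, -3.0], target=1))
```

Unlike softmax cross-entropy, each class is scored independently here, so the per-class targets do not need to compete for probability mass.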

Add RepeatAugSampler as per DeiT RASampler impl, showing promise for current (distributed) training experiments.

f262137
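For context, a simplified sketch of how a repeated-augmentation sampler can pick indices for one process: shuffle, repeat every index, pad, take a strided slice per rank, then truncate so the epoch length stays close to the dataset size. The function name and the truncation rule are hypothetical simplifications, not the `RepeatAugSampler` code from this PR or the exact DeiT `RASampler` logic.

```python
import math
import random


def repeat_aug_indices(dataset_len, num_replicas, rank, num_repeats=3, seed=0):
    """Return the sample indices one process would draw in an epoch (sketch)."""
    rng = random.Random(seed)  # same seed on every rank gives the same shuffle
    indices = list(range(dataset_len))
    rng.shuffle(indices)

    # Repeat each index consecutively so the copies land on different ranks
    # (or in the same batch) and receive different random augmentations.
    indices = [i for i in indices for _ in range(num_repeats)]

    # Pad so the list divides evenly across processes.
    per_replica = math.ceil(len(indices) / num_replicas)
    total = per_replica * num_replicas
    indices += indices[: total - len(indices)]

    # Each rank takes a strided slice of the repeated list.
    mine = indices[rank:total:num_replicas]

    # Truncate so one epoch still covers roughly dataset_len samples overall
    # (simplified rule, assumed for this sketch).
    num_selected = math.ceil(dataset_len / num_replicas)
    return mine[:num_selected]


if __name__ == "__main__":
    for r in range(2):
        print(r, repeat_aug_indices(8, num_replicas=2, rank=r))
```

With `num_repeats` equal to `num_replicas`, every rank ends up with the same sample order, so each sample is seen once per rank with independent augmentation.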

Update training script and loader factory to allow use of scheduler updates, repeat augment, and bce loss

fb94350

Fix misnamed arg, tweak other train script args for better defaults.

5db057d

Fix updated validation_batch_size fallback

0639d9a

Adding the attn series weights, tweaking model names, comments...

484e616

Add baseline resnet26t @ 256x256 weights. Add 33ts variant of halonet with at least one halo in stage 2,3,4

76881d2

Add initial AttentionPool2d that's being trialed. Fix comment and still trying to improve reliability of sgd test.

5f12de4

Swap botnet 26/50 weights/models after realizing a mistake in arch def, now figuring out why they were so low...

8642401

Cleanup weight init for byob/byoanet and related

5bd0471

Add resnet33ts weights, update resnext26ts baseline weights

4027412

Merge branch 'master' into attn_update

24720ab

BotNet models were still off, remove weights for bad configs. Add good SE-HaloNet33-TS weights.

cf5ac28

rwightman mentioned this pull request Sep 14, 2021

Fixed default value in argument help message #612

Closed

rwightman merged commit a6e8598 into master Sep 14, 2021

guoriyue pushed a commit to guoriyue/pytorch-image-models that referenced this pull request May 24, 2024

Merge pull request huggingface#821 from rwightman/attn_update

1fbdf53

Update attention / self-attn based models from a series of experiments
