Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CombineLayer and others (Data.get_common_data) should use broadcast_matches=False #666

Closed
albertz opened this issue Sep 18, 2021 · 5 comments · Fixed by #864
Closed

CombineLayer and others (Data.get_common_data) should use broadcast_matches=False #666

albertz opened this issue Sep 18, 2021 · 5 comments · Fixed by #864
Assignees
Labels
potential-new-behavior Discussions about RETURNN behaviour

Comments

@albertz
Copy link
Member

albertz commented Sep 18, 2021

CombineLayer and others rely on Data.get_common_data.

Data.get_common_data uses broadcast_matches=True for the DimensionTag.is_equal opts. Such that you can match a tensor [B,1,T] with [B,4,T].

However, this relies on heuristics, and is not really needed. There is never a reason in RETURNN to create such broadcast dim [1] in the first place. So instead of [B,1,T] you would just have a tensor [B,T]. The broadcasting in RETURNN would automatically happen on non-existing dims. (Data.copy_compatible_to usually does that internally.) This would be much more consistent and reliable.

This needs a new behavior version (#508) because older configs rely on this.

@albertz albertz added the potential-new-behavior Discussions about RETURNN behaviour label Sep 18, 2021
@Zettelkasten
Copy link
Member

Zettelkasten commented Dec 15, 2021

I am running into this issue in several places now while updating my config to use dim tags. E.g. I have an "attention head" spatial dim tag, which I just want to set to have dimension 1 if I want to use single head attention. That breaks things all over the place for me, but luckily, because of out_shape, I can directly locate these errors.

Besides CombineLayer as you mentioned, also GatherLayer has this issue in _get_common_input_position_axes.
Currently this function compares dim tags directly, and I think we can anyway clean this up via Dim.get_all_dimension_tags.

The same also for DotLayer when inferring the variable axes automatically (#856).

@albertz
Copy link
Member Author

albertz commented Dec 16, 2021

As another solution, maybe we can disallow the broadcasting if this is an explicit dim tag by the user? To implement that, we would need to go through all places in RETURNN where a static dim tag is created, and add a new flag like auto_created=True or so. And then allow the broadcasting only if auto_created and dimension == 1.

This would not require a new behavior version. But not sure if really needed... Maybe new behavior version is cleaner and easier.

Edit Actually, this also was implemented in #1388. There exist the flag auto_generated now since a while (introduced for other reasons), and we needed this in a legacy setup where changing the behavior version would have lead to a number of changes, all not big, but that would have lead to different Sisyphus hashes, which we wanted to avoid...

@Zettelkasten
Copy link
Member

It shouldn't be too difficult for a user to update their config to not use broadcast dims, so I'd say let's just add a new behavior version. Dim already has a bunch of attributes already..

@albertz
Copy link
Member Author

albertz commented Dec 16, 2021

Yea ok. Are you doing this?

@Zettelkasten
Copy link
Member

Yea ok. Are you doing this?

Jup, sure.

@Zettelkasten Zettelkasten self-assigned this Dec 16, 2021
Zettelkasten added a commit that referenced this issue Dec 16, 2021
Zettelkasten added a commit that referenced this issue Dec 16, 2021
Zettelkasten added a commit that referenced this issue Dec 17, 2021
albertz pushed a commit that referenced this issue Dec 17, 2021
albertz added a commit that referenced this issue Sep 5, 2023
Auto-generated dims are those via the legacy n_out
or all other internal dims.
Any dim tags created explicitly by the user
should never be treated as broadcast dims.

See also #666.
albertz added a commit that referenced this issue Sep 5, 2023
Auto-generated dims are those via the legacy n_out
or all other internal dims.
Any dim tags created explicitly by the user
should never be treated as broadcast dims.

See also #666.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
potential-new-behavior Discussions about RETURNN behaviour
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants