Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2D/3D convolution/pooling (ConvLayer/PoolLayer/TransposedConvLayer) depends on order of axes #594

Closed
albertz opened this issue Aug 29, 2021 · 13 comments · Fixed by #792
Closed

Comments

@albertz
Copy link
Member

albertz commented Aug 29, 2021

We want to avoid anything which depends on the order of axes.

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

Some related discussion is in #597. E.g. there was the proposal of in_to_out_dims. E.g. for ConvLayer with 2D conv, it would have 3 entries: 2 for the two spatial dims, and then one for the linear feature transformation. For PoolLayer and others which do not do a linear transformation, it would contain only entries for the spatial dims.

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

The filter_size and related options would not be a list/tuple but instead a dict[DimensionTag, int] which also specifies the spatial axes it operates on.

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

One open question is the native format (shape, order of axes) of the tf.Variable. For that, the order of spatial axes matters.

Also, the native order of the input could change for various reasons, so we must not make it dependent on that.

@JackTemaki
Copy link
Collaborator

This would be a very welcome addition, I remember I had some cases where I had to play around a lot to make Returnn run the filters over the axes I wanted... Isn't it fine to have a logic that moves the axes in the optimal order? Just like the [T,B,F] transformation for the nativelstm? I am not sure what is happening right now, but I think a re-ordering based in the selected Dimension tags is possible, and the tf.Variable can be defined accordingly.

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

One core principle of RETURNN is that the order of the axes should never ever matter in any way. This core principles allows layers to reorder axes for efficiency (e.g. like RecLayer). And this is exactly what this issue here is about, because this is violated by ConvLayer for 2D/3D convolution, and similarly for PoolLayer, and some other similar layers.

The proposal here would also not have any order. It would just be like filter_size={dimtag1: 3, dimtag2: 5} or so.

But then, the order of axes for the underlying tf.Variable is ambiguous.

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

Isn't it fine to have a logic that moves the axes in the optimal order

We already do that. But this should not matter to the user.

I had to play around a lot to make Returnn run the filters over the axes I wanted

This is not so much this issue here. This issue here is about the order of spatial axes, which matters, although it should not.

I assume in your case, you had some mixup between spatial and feature dim. You do not want to reorder the axes. You simply need to redefine which axis is the feature dim (channel dim). You can use ReinterpretDataLayer with set_axes: {F: ...} for that.

@JackTemaki
Copy link
Collaborator

But then, the order of axes for the underlying tf.Variable is ambiguous.

Ah, you mean for the cases where there is no "more efficient" order ? like it is not important if h or w comes first for images? Well... then just a heuristic is fine or not? Like sort by largest filter size first, or sort by name of dimension tag or so...

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

But then, the order of axes for the underlying tf.Variable is ambiguous.

Ah, you mean for the cases where there is no "more efficient" order ? like it is not important if h or w comes first for images? Well... then just a heuristic is fine or not? Like sort by largest filter size first, or sort by name of dimension tag or so...

Yea, sth like that. I'm sure there are some possible solutions. Although I'm not so happy with all the potential solutions I thought about so far.

I don't really like heuristics. And what when the filter sizes are the same (which is not so uncommon)?

I also don't really know which order of the filter dims is optimal. Maybe sorting is actually good, and largest first makes it most efficient. Who knows. Probably depends also on the hardware. Or even CUDA or cuDNN version. E.g. it was true for long time that batch-feature-major was more efficient on GPU than batch-spatial-major, but I recently read that some of the more recent professional Nvidia GPUs probably make batch-spatial-major more efficient.

The current best way I can think of is to use heuristics anyway, or maybe just use the input spatial axis order, but then also store the order explicitly somewhere in the TF checkpoint. And when we load the checkpoint, make sure the order is fine, or maybe reorder accordingly.

The dim tag names are also not really good for identification. Esp the internal names (description actually...) are not really fixed and are being changed from time to time. See also #634. But when we make new dim tags more explicit by the user (#597), i.e. no internal names are used, then this should be less ambiguous.

However, when you import some model, and the dim tag names do not match for whatever reason, should the import fail then?

@JackTemaki
Copy link
Collaborator

However, when you import some model, and the dim tag names do not match for whatever reason, should the import fail then?

Oh, I see... yes then this is really not easy to choose...

@albertz
Copy link
Member Author

albertz commented Sep 23, 2021

One option is to make the order explicit in the config. So filter_size=OrderedDict(...).

@albertz
Copy link
Member Author

albertz commented Sep 25, 2021

Or, another option (from the comment here) is to have an argument in_spatial_dims (list/tuple of dim tags / axes) which defines the order. Then filter_size and all the other arguments can stay as they are.

@albertz
Copy link
Member Author

albertz commented Nov 29, 2021

in_spatial_dims is implemented now for ConvLayer and PoolLayer (#789).

However, it is not mandatory yet. Maybe this should be done as well, via new behavior version.

@albertz
Copy link
Member Author

albertz commented Nov 29, 2021

TransposedConvLayer also has the same problems. Fixed via #791.

@albertz albertz changed the title 2D/3D convolution/pooling (ConvLayer/PoolLayer) depends on order of axes 2D/3D convolution/pooling (ConvLayer/PoolLayer/TransposedConvLayer) depends on order of axes Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants