Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support vectorization in OpenGLCompute backend #6348

Merged

Conversation

OmarEmaraDev
Copy link
Contributor

@OmarEmaraDev OmarEmaraDev commented Oct 25, 2021

This patch adds support for vector load and store operations in the
OpenGLCompute backend.
First, a pass identifies the buffers whose loads and stores are all
dense, aligned, and have the same number of lanes. Such buffers are
declared with a vector base type and accessed accordingly. Loads and
stores that do not satisfy those criteria are implemented as gathers and
scatters from buffers whose base type is scalar.

Resolves #4976.
Partially resolves #1687.

This patch adds support for vector load and store operations.
First, a pass identifies the buffers whose loads and stores are all
dense, aligned, and have the same number of lanes. Such buffers are
declared with a vector base type and accessed accordingly. Loads and
stores that do not satisfy those criteria are implemented as gathers and
scatters from buffers whose base type is scalar.

Resolves halide#4976.
Partially resolves halide#1687.
@OmarEmaraDev
Copy link
Contributor Author

I noticed in my simple tests that loads and stores were never aligned. They always had a modulo of 1 and a remainder of 0. For a bounded vectorization of 4, I expected the modulo to be 4 as well, but this was not the case. Forcing alignment works, so I suspect there might be an issue with updating the alignments for loads and stores. Any idea if this is an issue or just an expected behavior?

// If this is a dense vector load and the buffer has a vector base type,
// then index the buffer using the base of the ramp divided by the number
// of lanes.
Expr ramp_base = strided_ramp_base(op->index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to check, but strided_ramp_base takes a stride parameter that defaults to one, so this actually is checking for a stride 1 ramp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsharletg That was my intention. Dense loads/stores are those that have a stride of 1. A stride higher than 1 means that the ramp is skipping elements and is not dense.

// of lanes.
Expr ramp_base = strided_ramp_base(op->index);
if (ramp_base.defined() && buffer_is_vector[op->name]) {
string index_id = print_expr(ramp_base / op->type.lanes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a check that the number of lanes in the buffer type is the same as the load? Maybe buffer_is_vector should hold a Type instead of a bool to facilitate this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsharletg The CheckAlignedDenseVectorLoadStore pass checks for the number of lanes in all loads and stores. So if the pass decides to declare the buffer as a vector, it is guaranteed that all of the loads/stores will have the same number of lanes. So a check for lanes here is redundant as far as I can see.

        if (lanes != -1 && op->value.type().lanes() != lanes) {
            are_all_dense = false;
            return;
        }

// of lanes.
Expr ramp_base = strided_ramp_base(op->index);
if (ramp_base.defined() && buffer_is_vector[op->name]) {
string index_id = print_expr(ramp_base / op->value.type().lanes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above.

public:
// True if all loads and stores from the buffer are dense, aligned, and all
// have the same number of lanes, false otherwise.
bool are_all_dense = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this answers my questions above. Buffers are only stored as vectors if all the loads and stores are aligned dense vectors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsharletg That's right.

@steven-johnson
Copy link
Contributor

Does this PR need more review / revision?

OmarEmaraDev added a commit to OmarEmaraDev/Halide that referenced this pull request Oct 30, 2021
The ternary operator in GLSL does not work with vector types. While the
mix function have overloads to boolean vectors, it is only supported in
version 4.5, so it is not exactly portable. To work around this, we use
the ternary operator on all elements of the vector type.

Necessary for halide#6348.
@steven-johnson
Copy link
Contributor

Thanks for the fix!

@steven-johnson steven-johnson merged commit c005b9f into halide:master Nov 4, 2021
steven-johnson pushed a commit that referenced this pull request Nov 4, 2021
The ternary operator in GLSL does not work with vector types. While the
mix function have overloads to boolean vectors, it is only supported in
version 4.5, so it is not exactly portable. To work around this, we use
the ternary operator on all elements of the vector type.

Necessary for #6348.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

openglcompute backend doesn't support vector loads and stores OpenGLCompute needs vectorization handling
3 participants