Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternator: support Projection in GSI and LSI #5036

Open
nyh opened this issue Sep 16, 2019 · 6 comments
Open

Alternator: support Projection in GSI and LSI #5036

nyh opened this issue Sep 16, 2019 · 6 comments
Labels
Milestone

Comments

@nyh
Copy link
Contributor

nyh commented Sep 16, 2019

When creating a GSI or LSI with CreateTable, the DynamoDB API provides a Projection parameter determining what gets copied to the materialized view: Just the keys, all the attributes, or the keys plus a few select attributes. We should support this parameter - currently we ignore it and always project all attributes.

Note that for LSI, there is a delicate issue: Even when only specific attributes are projected, DynamoDB allows to read also unprojected attributes (we verify this in test_lsi_get_not_projected_attribute). We can implement this either by projecting everything (as we do today) and not returning the unprojected columns when not specifically requested, or by doing a second query to get the unprojected columns. The second approach is probably closer to what DynamoDB does. Note that the second query is always on the same shard as the first one (same partition key).

We have existing xfailing tests for this issue:

  1. In test_gsi.py: test_gsi_projection_keys_only, test_gsi_projection_include, test_gsi_missing_projection_type, test_gsi_query_select_2, test_gsi_projection_include_keyattributes, test_gsi_projection_include_otherkey, test_gsi_projection_error_missing_nonkeyattributes, test_gsi_projection_error_superflous_nonkeyattributes, test_gsi_projection_error_duplicate, test_gsi_projection_error_duplicate, test_gsi_bad_projection_type,
  2. In test_lsi.py: test_lsi_get_all_projected_attributes, test_lsi_query_select.

And as mentioned above, test_lsi.py::test_lsi_get_not_projected_attribute needs to keep working and not get broken by the solution for LSIs.

@nyh nyh added the area/alternator Alternator related Issues label Sep 16, 2019
@tzach
Copy link
Contributor

tzach commented Sep 16, 2019

or by doing a second query to get the unprojected columns.

AFAIU, this is what DynamoDB is doing
" However, to retrieve any additional attributes, DynamoDB must perform additional read operations against the Thread table. These extra reads are known as fetches, and they can increase the total amount of provisioned throughput required for a query."
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.Projections

@nyh
Copy link
Contributor Author

nyh commented Jul 11, 2020

It will be significantly easier to implement KEYS_ONLY projection, because it just means not to include the ":attrs" in the view - no need to slice it. So we should start with implementing the KEYS_ONLY option.

psarna pushed a commit that referenced this issue Nov 29, 2021
We already have tests for the behavior of the "Select" parameter when
querying a base table, but this patch adds additional tests for its
behavior when querying a GSI or a LSI. There are some differences:
Select=ALL_PROJECTED_ATTRIBUTES is not allowed for base tables, but is
allowed - and in fact is the default - for GSI and LSI. Also, GSI may
not allow ALL_ATTRIBUTES (which is the default for base tables) if
only a subset of the attributes were projected.

The new tests xfail because the Select and Projection features have
not yet been implemented in Alternator. They pass in DynamoDB.
After this patch we have (hopefully) complete test coverage of the
Select feature, which will be helpful when we start implementing it.

Refs #5058 (Select)
Refs #5036 (Projection)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211125100443.746917-1-nyh@scylladb.com>
denesb pushed a commit that referenced this issue Jan 5, 2022
We already have multiple tests for the unimplemented "Projection" feature
of GSI and LSI (see issue #5036). This patch adds seven more test cases,
focusing on various types of errors conditions (e.g., trying to project
the same attribute twice), esoteric corner cases (it's fine to list a key in
NonKeyAttributes!), and corner cases that I expect we will have in our
implementation (e.g., a projected attribute may either be a real Scylla
column or just an element in a map column).

All new tests pass on DynamoDB and fail on Alternator (due to #5036), so
marked with "xfail".

Refs #5036.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211228193748.688060-1-nyh@scylladb.com>
psarna pushed a commit that referenced this issue Apr 11, 2022
This patch implements the previously-unimplemented Select option of the
Query and Scan operators.

The most interesting use case of this option is Select=COUNT which means
we should only count the items, without returning their actual content.
But there are actually four different Select settings: COUNT,
ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES.

Five previously-failing tests now pass, and their xfail mark is removed:

 *  test_query.py::test_query_select
 *  test_scan.py::test_scan_select
 *  test_query_filter.py::test_query_filter_and_select_count
 *  test_filter_expression.py::test_filter_expression_and_select_count
 *  test_gsi.py::test_gsi_query_select_1

These tests cover many different cases of successes and errors, including
combination of Select and other options. E.g., combining Select=COUNT
with filtering requires us to get the parts of the items needed for the
filtering function - even if we don't need to return them to the user
at the end.

Because we do not yet support GSI/LSI projection (issue #5036), the
support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need
to be in the future, but we can only finish that after #5036 is done.

Fixes #5058.

The most intrusive part of this patch is a change from attrs_to_get -
a map of top-level attributes that a read needs to fetch - to an
optional<attrs_to_get>. This change is needed because we also need
to support the case that we want to read no attributes (Select=COUNT),
and attrs_to_get.empty() used to mean that we want to read *all*
attributes, not no attributes. After this patch, an unset
optional<attrs_to_get> means read *all* attributes, a set but empty
attrs_to_get means read *no* attributes, and a set and non-empty
attrs_to_get means read those specific attributes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-2-nyh@scylladb.com>
alexander-tur added a commit to soft-stech/scylladb that referenced this issue Sep 29, 2022
…g the Projection field. We do not yet support all the settings Projection (see scylladb#5036), but the default which we support is ALL, and DescribeTable should return that in its description.

Fixes scylladb#11470
alexander-tur added a commit to soft-stech/scylladb that referenced this issue Oct 5, 2022
…g the Projection field. We do not yet support all the settings Projection (see scylladb#5036), but the default which we support is ALL, and DescribeTable should return that in its description.

Fixes scylladb#11470
alexander-tur added a commit to soft-stech/scylladb that referenced this issue Oct 9, 2022
…g the Projection field. We do not yet support all the settings Projection (see scylladb#5036), but the default which we support is ALL, and DescribeTable should return that in its description.

Fixes scylladb#11470
nyh pushed a commit that referenced this issue Oct 19, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693
nyh pushed a commit that referenced this issue Nov 6, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14c)
nyh pushed a commit that referenced this issue Nov 6, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14c)
denesb pushed a commit that referenced this issue Nov 7, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14c)
denesb pushed a commit that referenced this issue Nov 7, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14c)
denesb pushed a commit that referenced this issue Nov 7, 2022
… describes GSIs and LSIs.

The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693

(cherry picked from commit 636e14c)
@doublex
Copy link

doublex commented Jul 18, 2024

@nyh
As a temporary workaround:
Is it possible to use an "ALTER MATERIALIZED VIEW" statement to simulate KEYS_ONLY?

@nyh
Copy link
Contributor Author

nyh commented Jul 18, 2024

@nyh As a temporary workaround: Is it possible to use an "ALTER MATERIALIZED VIEW" statement to simulate KEYS_ONLY?

CQL's ALTER MATERIALIZED VIEW cannot change the SELECTion of the view, so you cannot use it to get rid of the :attrs column in the view (which is what you'd want to do).
One hack that may still be possible without any code changes is to use CQL to DROP MATERIALIZED VIEW the view created by Alternator, and create a new one with CREATE MATERIALIZED VIEW whose SELECT doesn't include :attrs. This may work, but I didn't test it actually does. One of the questions is whether the code that reads from a view is ok with the fact there is no :attrs column in that table.

@doublex
Copy link

doublex commented Jul 19, 2024

@nyh
Thank you for your assistance.
ScyllaDB does not allow to create the alternator materialized-view manually.
Alternator could create the appropriate materialized view.

E.g.:

base-table "data":
{
    primary_key: binary,
    si_hash_key: binary,
    si_range_key: binary,
    payload: binary,
}
CREATE MATERIALIZED VIEW alternator_data."data:index" AS
    SELECT primary_key, si_hash_key, si_range_key
    FROM alternator_data.data
    WHERE si_hash_key IS NOT NULL AND si_range_key IS NOT NULL
    PRIMARY KEY (si_hash_key, si_range_key, primary_key)
    WITH CLUSTERING ORDER BY (si_range_key ASC, primary_key ASC);

Error message, docs: https://opensource.docs.scylladb.com/stable/cql/mv.html
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot include more than one non-primary key column 'si_range_key' in materialized view primary key"

The create materialzed viewstatement was taken from DESCRIBE MATERIALIZED VIEW alternator_data."data:index";

@mykaul
Copy link
Contributor

mykaul commented Jul 23, 2024

It will be significantly easier to implement KEYS_ONLY projection, because it just means not to include the ":attrs" in the view - no need to slice it. So we should start with implementing the KEYS_ONLY option.

Agreed, looks like an easy win with high 'profit' - but I don't know how common is the use of KEYS_ONLY.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants