Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv Index.Walk function matches the foreign key by prefix only #20096

Closed
stuartcarnie opened this issue Nov 19, 2020 · 3 comments · Fixed by #20097
Closed

kv Index.Walk function matches the foreign key by prefix only #20096

stuartcarnie opened this issue Nov 19, 2020 · 3 comments · Fixed by #20097
Assignees
Labels
area/2.x OSS 2.0 related issues and PRs kind/bug

Comments

@stuartcarnie
Copy link
Contributor

stuartcarnie commented Nov 19, 2020

What

When the kv store searches the index, it utilises a forward cursor matching by prefix:

influxdb/kv/index.go

Lines 192 to 193 in 15b9531

cursor, err := indexBucket.ForwardCursor(foreignKey,
WithCursorPrefix(foreignKey))

The cursor will return all keys that have a matching prefix, in the following loop, including those keys which are longer:

influxdb/kv/index.go

Lines 206 to 208 in 15b9531

for ik, pk := indexCursor.Next(); ik != nil; ik, pk = indexCursor.Next() {
keys = append(keys, pk)
}

For example, if the foreignKey is hello and the index contains the keys hello, hello-world and hello-there, the loop will gather them all.

In most cases, the index is typically the result of combining 16-byte IDs, so this is not a problem. However, for dbrp, it creates an additional orgID + database foreign key:

influxdb/dbrp/service.go

Lines 70 to 76 in b274e15

byOrgAndDatabase: kv.NewIndex(kv.NewIndexMapping(bucket, indexBucket, func(v []byte) ([]byte, error) {
var dbrp influxdb.DBRPMappingV2
if err := json.Unmarshal(v, &dbrp); err != nil {
return nil, err
}
return indexForeignKey(dbrp), nil
}), kv.WithIndexReadPathEnabled),

Creation of DBRPs fail:

  1. Create bucket1

  2. Create a dbrp mapping for bucket1 with db jenkins-aws, rp never:

    curl -H 'Content-type: application/json' --header "Authorization: Token $INFLUX_TOKEN" -XPOST "$INFLUX_HOST/api/v2/dbrps" --data-binary '{"bucket_id": "<BUCKET1_ID>","database": "jenkins-aws","default": true, "organization_id": "<org-id>", "retention_policy": "never"}'
  3. Create bucket2

  4. Attempt to create a dbrp mapping for bucket2 with db jenkins, rp never:

    curl -H 'Content-type: application/json' --header "Authorization: Token $INFLUX_TOKEN" -XPOST "$INFLUX_HOST/api/v2/dbrps" --data-binary '{"bucket_id": "<BUCKET2_ID>","database": "jenkins","default": true, "organization_id": "<org-id>", "retention_policy": "never"}'

    Fails with the following error:

    another DBRP mapping with same orgID, db, and rp exists
    
@stuartcarnie stuartcarnie added area/2.x OSS 2.0 related issues and PRs kind/bug labels Nov 19, 2020
stuartcarnie added a commit that referenced this issue Nov 19, 2020
This commit breaks other tests, so it is unclear if the
`Influx.Walk` function is intended to match by prefix.

If so, it would suggest code documentation clarifying
`Index.Walk` is a prefix match. Additionally, a new `Index.WalkExact`
function that performs an exact match by the `foreignKey` argument.

Closes #20096
@stuartcarnie stuartcarnie self-assigned this Nov 19, 2020
@GeorgeMac
Copy link
Contributor

Hey @stuartcarnie 👋

Sorry, definitely a gap in the documentation here. This index is currently a non-unique many to one secondary index.

resource >- association
dashboard >- org

And so the index is structured like so:

indexprefix/foreign_key/primary_key -> primary_key

bucketsbyorgv1/org_foo/bucket_one -> bucket_one
bucketsbyorgv1/org_foo/bucket_two -> bucket_two
bucketsbyorgv1/org_foo/bucket_three -> bucket_three
bucketsbyorgv1/org_foo/bucket_four -> bucket_four

Such that we can have multiple entries per foreign key. Meaning we could retrieve the many resources for a particular foreign key in a single range / cursor with a prefix lookup.

As this was originally designed around creating a URM by User index.

The Walk function visits each key / value pair in the source bucket. Not the index bucket. You are effectively walking the source, based on entries in the index:

influxdb/kv/index.go

Lines 218 to 229 in 15b9531

values, err := sourceBucket.GetBatch(keys...)
if err != nil {
return err
}
for i, value := range values {
if value != nil {
if cont, err := visit(keys[i], value); !cont || err != nil {
return err
}
}
}

I have thought about adding support for a secondary index with a uniqueness constraint. In that scenario walk should find one item. Or perhaps a better name might be Find or similar. Infact it could just be another type which also consumes the IndexMapping type. The mapping type describe a source and index bucket name and a function for retrieving the associated index foreign key, for sure when backfilling and unpopulated index.

@stuartcarnie
Copy link
Contributor Author

stuartcarnie commented Nov 19, 2020

@GeorgeMac thanks for the reply! I do understand the many to one feature. There is a bug or unexpected behaviour, as described in the issue, that because Walk matches the foreign key using a prefix match, it will find any keys with that prefix.

So, back to the dbrp example, if a user adds a mapping to a database named jenkins-aws, the indexes foreign key would be <OrgID>jenkins-aws. The next attempt to add a mapping to a database named jenkins fails as the lookup (a prefix match) for jenkins finds <OrgID>jenkins-aws.

It seems it is expected behaviour as a number of other tests fail when I fix that issue. It seems we might need a WalkExact API to match the foreign key exactly.

@GeorgeMac
Copy link
Contributor

GeorgeMac commented Nov 20, 2020

ooooooh I seee 😂 now I am with you. Took me long enough. I added a comment to your draft PR, should be fixable in Walk.

stuartcarnie added a commit that referenced this issue Nov 22, 2020
This commit modifies the behaviour of the indexWalk function to ensure
it parses the key parts and matches the foreign key exactly.

Closes #20096
stuartcarnie added a commit that referenced this issue Nov 22, 2020
This commit adds a new index and migration to the DBRP service for
retrieving all database and retention policy mappings for a single
organization.

This change was required to resolve an invalid assumption of the DBRP
service, which relied on a prefix match of the byOrgAndDatabase kv.Index
when performing search operations by organization ID only.

Closes #20096
stuartcarnie added a commit that referenced this issue Nov 23, 2020
This commit modifies the behaviour of the indexWalk function to ensure
it parses the key parts and matches the foreign key exactly.

Closes #20096
stuartcarnie added a commit that referenced this issue Nov 23, 2020
This commit adds a new index and migration to the DBRP service for
retrieving all database and retention policy mappings for a single
organization.

This change was required to resolve an invalid assumption of the DBRP
service, which relied on a prefix match of the byOrgAndDatabase kv.Index
when performing search operations by organization ID only.

Closes #20096
stuartcarnie added a commit that referenced this issue Nov 23, 2020
This commit modifies the behaviour of the indexWalk function to ensure
it parses the key parts and matches the foreign key exactly.

Closes #20096
stuartcarnie added a commit that referenced this issue Nov 23, 2020
This commit adds a new index and migration to the DBRP service for
retrieving all database and retention policy mappings for a single
organization.

This change was required to resolve an invalid assumption of the DBRP
service, which relied on a prefix match of the byOrgAndDatabase kv.Index
when performing search operations by organization ID only.

Closes #20096
danxmoran pushed a commit that referenced this issue Nov 24, 2020
This commit modifies the behaviour of the indexWalk function to ensure
it parses the key parts and matches the foreign key exactly.

Closes #20096
danxmoran pushed a commit that referenced this issue Nov 24, 2020
This commit adds a new index and migration to the DBRP service for
retrieving all database and retention policy mappings for a single
organization.

This change was required to resolve an invalid assumption of the DBRP
service, which relied on a prefix match of the byOrgAndDatabase kv.Index
when performing search operations by organization ID only.

Closes #20096
danxmoran pushed a commit that referenced this issue Nov 24, 2020
This commit modifies the behaviour of the indexWalk function to ensure
it parses the key parts and matches the foreign key exactly.

Closes #20096
danxmoran pushed a commit that referenced this issue Nov 24, 2020
This commit adds a new index and migration to the DBRP service for
retrieving all database and retention policy mappings for a single
organization.

This change was required to resolve an invalid assumption of the DBRP
service, which relied on a prefix match of the byOrgAndDatabase kv.Index
when performing search operations by organization ID only.

Closes #20096
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/2.x OSS 2.0 related issues and PRs kind/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants