Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output: Add support for WEIGHT_FLOAT32 with text vector #1448

Merged
merged 6 commits into from
Nov 10, 2022

Conversation

HashidaTKS
Copy link
Contributor

@HashidaTKS HashidaTKS commented Nov 9, 2022

Groonga regarded weight as uint32_t even if its actual type is float32 in grn_output_vector.

So even if we specified WEIGHT_FLOAT32 like the query below, the output weight was not float32 but uint32_t.

How to reproduce:

table_create Memos TABLE_HASH_KEY ShortText

column_create Memos tags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 ShortText

load --table Memos
[
{
  "_key": "Groonga is fast",
  "tags": {
    "groonga": 2,
    "full text search": 1
  }
}
]

select Memos

In the query above,

        {
          "groonga": 2,
          "full text search": 1
        }

is not expected, the expected result is like below.

        {
          "groonga": 2.8,
          "full text search": 1.2
        }

@HashidaTKS
Copy link
Contributor Author

@kou @komainu8

Would you review this when you have time?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move tests to test/command/suite/select/columns/flags/vector/ because they are tests for WITH_WEIGHT and WEIGHT_FLOAT32 flags with COLUMN_VECTOR flag?

lib/output.c Outdated
&_value, &weight, &domain);
if (is_weight_float32) {
length = grn_vector_get_element_float(ctx, vector, i,
&_value, &weight_float, &domain);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
&_value, &weight_float, &domain);
&_value, &weight_float, &domain);

@HashidaTKS
Copy link
Contributor Author

@kou

Thanks, I have addressed your comments.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to include cast in test path? Is cast important for this? Can we reproduce this with --columns[text_tags].type Tags? If so, we should use it. We can add a test for --columns[text_tags].type ShortText in a separated pull request.

How about

  • test/command/suite/select/columns/flags/vector/with_weight_weight_float32.test
  • test/command/suite/select/columns/flags/vector/with_weight.test
    ?

@HashidaTKS
Copy link
Contributor Author

Can we reproduce this with --columns[text_tags].type Tags?

I tried it but didn't reproduce this.
It is because Groonga passes grn_output_uvector in that case (although I don't understand why...) and grn_output_uvector already supports WEIGHT_FLOAT32.

table_create Tags TABLE_PAT_KEY ShortText
[[0,0.0,0.0],true]
table_create Memos TABLE_HASH_KEY ShortText
[[0,0.0,0.0],true]
column_create Memos tags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 Tags
[[0,0.0,0.0],true]
load --table Memos
[
{
  "_key": "Groonga is fast",
  "tags": {
    "groonga": 2.8,
    "full text search": 1.2
  }
}
]
[[0,0.0,0.0],1]
select Memos --columns[text_tags].stage output              --columns[text_tags].type Tags              --columns[text_tags].flags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32              --columns[text_tags].value 'tags'              --output_columns _key,tags,text_tags
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        1
      ],
      [
        [
          "_key",
          "ShortText"
        ],
        [
          "tags",
          "Tags"
        ],
        [
          "text_tags",
          "Tags"
        ]
      ],
      [
        "Groonga is fast",
        {
          "groonga": 2.8,
          "full text search": 1.2
        },
        {
          "groonga": 2.8,
          "full text search": 1.2
        }
      ]
    ]
  ]
]

Do you know why Groonga passes grn_output_uvector in that case?

@kou
Copy link
Member

kou commented Nov 10, 2022

Because reference vector uses grn_uvector not grn_vector.

@kou
Copy link
Member

kou commented Nov 10, 2022

Then, how about the following?

table_create Memos TABLE_HASH_KEY ShortText
column_create Memos tags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 ShortText
load --table Memos
[
{
  "_key": "Groonga is fast",
  "tags": {
    "groonga": 2.8,
    "full text search": 1.2
  }
}
]
select Memos \
  --columns[copied_tags].stage output \
  --columns[copied_tags].type ShortText \
  --columns[copied_tags].flags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 \
  --columns[copied_tags].value 'tags' \
  --output_columns _key,tags,copied_tags

@HashidaTKS
Copy link
Contributor Author

Thanks.
I took some trial and error, and the query to reproduce ended up like below.

table_create Memos TABLE_HASH_KEY ShortText

column_create Memos tags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 ShortText

load --table Memos
[
{
  "_key": "Groonga is fast",
  "tags": {
    "groonga": 2.8,
    "full text search": 1.2
  }
}
]

select Memos

@HashidaTKS HashidaTKS changed the title output: Add support for WEIGHT_FLOAT32 when data type is vector weight: Add support for WEIGHT_FLOAT32 when data type is vector Nov 10, 2022
@HashidaTKS
Copy link
Contributor Author

@kou

Please re-review this.

@kou kou changed the title weight: Add support for WEIGHT_FLOAT32 when data type is vector output: Add support for WEIGHT_FLOAT32 with text vector Nov 10, 2022
@kou
Copy link
Member

kou commented Nov 10, 2022

Could you update the description before we merge this?

@HashidaTKS
Copy link
Contributor Author

Could you update the description before we merge this?

Oh, sorry, updated.

@kou kou merged commit 799d533 into master Nov 10, 2022
@kou kou deleted the cast_weight_float32 branch November 10, 2022 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants