FEATURE REQUEST - Ability to apply stopwords only for specified fields in an index #341

Open
vishnu-uc opened this issue May 11, 2020 · 2 comments

Comments

@vishnu-uc

Feature Request
It should be possible to specify stopwords for particular fields in an index.

Use case:
Consider an index containing the fields company_name and company_description.
Ideally, stopwords would be applied to company_description but not to company_name.

@sanikolaev
Collaborator

@vishnu-uc

Will it work in your case if you:

  • index everything with stopwords disabled
  • parse your query to figure out which of the words should be considered stopwords
  • put them after the MAYBE operator for proper ranking

?

E.g.:

mysql> create table t(name text, description text);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t values(0, 'The boring company', 'American infrastructure and tunnel construction services company founded by Elon Musk in December 2016');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t values(0, 'Boring company', 'American infrastructure and tunnel construction services company founded by Elon Musk in December 2016');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t where match('boring company MAYBE the');
+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
| id                  | name               | description                                                                                            |
+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
| 1513757639123664897 | The boring company | American infrastructure and tunnel construction services company founded by Elon Musk in December 2016 |
| 1513757639123664898 | Boring company     | American infrastructure and tunnel construction services company founded by Elon Musk in December 2016 |
+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

For multiple stopwords: boring company MAYBE the MAYBE to MAYBE for, etc.
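
The rewrite step described above could be done on the application side before the query is sent. Below is a minimal Python sketch, assuming a plain keyword query with no operators, phrases, or field limits; the function name and the stopword list are hypothetical and not part of Manticore.

```python
def move_stopwords_to_maybe(query, stopwords):
    """Rewrite a plain keyword query so that stopwords follow the MAYBE operator.

    Assumes whitespace-separated keywords only (no operators or quoted phrases).
    """
    required, optional = [], []
    for word in query.split():
        lowered = word.lower()
        if lowered in stopwords:
            # Keep each stopword once, in order of first appearance.
            if lowered not in optional:
                optional.append(lowered)
        else:
            required.append(word)
    if not required:
        # Only stopwords were given: return the query unchanged
        # rather than emit a dangling MAYBE.
        return query
    return " ".join(required + ["MAYBE " + w for w in optional])


STOPWORDS = {"the", "to", "for"}  # hypothetical list, for illustration only

print(move_stopwords_to_maybe("the boring company", STOPWORDS))
# prints: boring company MAYBE the
```

The output matches the query used in the example above. Queries that already contain operators would need a real parser rather than `str.split`.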

@vishnu-uc
Author

We have hundreds of indexes, totalling up to a few hundred GB of data. If we disable stopwords, the index size will grow, and we try to keep the index size as small as possible.
We also run complex queries with multiple conditions, for example:

SELECT name, description FROM my_index WHERE MATCH('((@attr1("the mayor")) (@attr2(was SENTENCE very SENTENCE sick)) | (@attr3(due SENTENCE to) | @attr3(coronavirus)))');

so parsing the query doesn't seem very intuitive.
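
To illustrate what parsing such a query would involve, here is a rough Python sketch of a tokenizer that looks for bare stopwords in the query above. It covers only an assumed subset of the extended query syntax (field limits, quoted phrases, parentheses, `|`, and a few keyword operators); the operator set, the stopword list, and all names are hypothetical, and field scope is tracked naively as "the most recent @field".

```python
import re

# Assumed subset of the query syntax, just enough for the example query.
TOKEN_RE = re.compile(r'''
    (?P<field>@\w+)        # field limit such as @attr1
  | (?P<phrase>"[^"]*")    # quoted phrase, kept as a single token
  | (?P<punct>[()|])       # grouping and OR
  | (?P<word>\w+)          # bare keyword or keyword operator
''', re.VERBOSE)

KEYWORD_OPERATORS = {"SENTENCE", "PARAGRAPH", "MAYBE"}  # partial, assumed list


def stopword_candidates(query, stopwords):
    """Return (field, word) pairs for bare stopwords found in the query."""
    found = []
    field = None
    for match in TOKEN_RE.finditer(query):
        kind, text = match.lastgroup, match.group()
        if kind == "field":
            field = text[1:]  # strip the leading "@"
        elif (kind == "word"
              and text not in KEYWORD_OPERATORS
              and text.lower() in stopwords):
            found.append((field, text))
    return found


QUERY = ('((@attr1("the mayor")) (@attr2(was SENTENCE very SENTENCE sick)) '
         '| (@attr3(due SENTENCE to) | @attr3(coronavirus)))')

print(stopword_candidates(QUERY, {"the", "was", "to"}))
# prints: [('attr2', 'was'), ('attr3', 'to')]
```

Even this sketch skips the "the" inside the quoted phrase, and both words it does find are operands of SENTENCE, so neither can simply be moved behind MAYBE without changing what the query means.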
