Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for full text searches #3317

Merged
merged 25 commits into from
Jul 31, 2022

Conversation

jsprw
Copy link
Contributor

@jsprw jsprw commented Jul 19, 2022

Add a $fulltext operator that can be used in queries on SQL databases.

Based on #621 by @thekevinbrown. This version has also added support for creating required indexes with the decorator @Index({ type: 'fulltext' }) and full text searching in MongoDB. Documentation could possibly be improved with more clear instructions.

Closes: #620

@codecov-commenter
Copy link

codecov-commenter commented Jul 19, 2022

Codecov Report

Merging #3317 (979b5ec) into master (eff463c) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master     #3317   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          203       205    +2     
  Lines        12456     12534   +78     
  Branches      2872      2889   +17     
=========================================
+ Hits         12456     12534   +78     
Impacted Files Coverage Δ
packages/core/src/typings.ts 100.00% <ø> (ø)
packages/core/src/enums.ts 100.00% <100.00%> (ø)
packages/core/src/platforms/Platform.ts 100.00% <100.00%> (ø)
packages/core/src/utils/QueryHelper.ts 100.00% <100.00%> (ø)
packages/core/src/utils/Utils.ts 100.00% <100.00%> (ø)
packages/knex/src/query/QueryBuilderHelper.ts 100.00% <100.00%> (ø)
packages/knex/src/schema/SchemaComparator.ts 100.00% <100.00%> (ø)
packages/knex/src/schema/SchemaGenerator.ts 100.00% <100.00%> (ø)
packages/mariadb/src/MariaDbPlatform.ts 100.00% <100.00%> (ø)
packages/mongodb/src/MongoDriver.ts 100.00% <100.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eff463c...979b5ec. Read the comment docs.

@lgtm-com
Copy link

lgtm-com bot commented Jul 19, 2022

This pull request introduces 1 alert when merging 5f8033e into 2380f0a - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, thanks a lot! Did a quick first pass and left some comments. Would be definitely good to describe this more in the docs, otherwise it could get lost.

packages/core/src/utils/Utils.ts Outdated Show resolved Hide resolved
packages/core/src/utils/Utils.ts Outdated Show resolved Hide resolved
packages/knex/src/query/QueryBuilderHelper.ts Outdated Show resolved Hide resolved
packages/knex/src/query/QueryBuilderHelper.ts Show resolved Hide resolved
packages/knex/src/schema/SchemaComparator.ts Show resolved Hide resolved
packages/knex/src/schema/SchemaGenerator.ts Outdated Show resolved Hide resolved
docs/docs/query-conditions.md Outdated Show resolved Hide resolved
packages/mongodb/src/MongoDriver.ts Outdated Show resolved Hide resolved
packages/mongodb/src/MongoPlatform.ts Outdated Show resolved Hide resolved
jsprw and others added 5 commits July 20, 2022 09:46
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: Martin Adámek <banan23@gmail.com>
@lgtm-com
Copy link

lgtm-com bot commented Jul 20, 2022

This pull request introduces 1 alert when merging f6aaf2d into 2380f0a - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@B4nan
Copy link
Member

B4nan commented Jul 20, 2022

It would be also great to catch up with the code coverage. But I am open for merging it without full branch coverage, will be happy to fill that gap myself afterwards.

@jsprw
Copy link
Contributor Author

jsprw commented Jul 20, 2022

What to do in the future:

Do I forget any?

@jsprw
Copy link
Contributor Author

jsprw commented Jul 20, 2022

It would be also great to catch up with the code coverage. But I am open for merging it without full branch coverage, will be happy to fill that gap myself afterwards.

I have covered some more. Now we need two more tests:

  • The unused Platform functions.
  • And executing a full text search from a PostgreSQL column with type 'tsvector'.

For the second I tried to switch the column type to 'tsvector' by doing meta.get('Book2').properties.title.columnTypes[0] = 'tsvector' running generator.getUpdateSchemaSQL() + execute() but that didn't work. I ended up with the following: https://github.com/jsprw/mikro-orm-full-text-operators/blob/add-full-text-operators/tests/features/schema-generator/SchemaGenerator.postgres.test.ts#L332. Could you fix that test and then do a search (as https://github.com/jsprw/mikro-orm-full-text-operators/blob/add-full-text-operators/tests/EntityManager.postgre.test.ts#L376)?

@jsprw jsprw requested a review from B4nan July 20, 2022 19:00
B4nan added a commit that referenced this pull request Jul 26, 2022
@B4nan
Copy link
Member

B4nan commented Jul 26, 2022

For the second I tried to switch the column type to 'tsvector' by doing meta.get('Book2').properties.title.columnTypes[0] = 'tsvector' running generator.getUpdateSchemaSQL() + execute() but that didn't work

Oh right, I see what's happening, unknown types will get mapped to UnknownType that defaults to string basically, and ignores provided columnType, good catch! Fixed here.

Btw I had to remove the default value (empty string), otherwise the query would fail. Maybe that's something to handle as well, not sure what the semantics are, never user this feature myself.

@jsprw
Copy link
Contributor Author

jsprw commented Jul 26, 2022

For the second I tried to switch the column type to 'tsvector' by doing meta.get('Book2').properties.title.columnTypes[0] = 'tsvector' running generator.getUpdateSchemaSQL() + execute() but that didn't work

Oh right, I see what's happening, unknown types will get mapped to UnknownType that defaults to string basically, and ignores provided columnType, good catch! Fixed here.

Nice, I will merge the changes and rewrite the test. Thanks!

Btw I had to remove the default value (empty string), otherwise the query would fail. Maybe that's something to handle as well, not sure what the semantics are, never user this feature myself.

You mean the column default? I will test it now to see what you mean. I added tsvector columns with a default of empty string and it worked fine?

Edit: I see what you mean, if the column is changed to type tsvector it doesn't understand the existing default value. How could we hook into this? It has to update the default value (from "" to '""') before changing the column type to tsvector. Also, how could we handle that besides adding a note to the doc? This just comes from the column type in the @Property decorator. Add a custom type only for postgres which we export from @mikro-orm/postgresql?

Edit: I still have no clue how we could check the default type. It only occurs when you are changing a column type to tsvector. I have added a custom type for performant full text searches exported as FullTextType from @mikro-orm/postgresql. Also the code coverage is 100% again.

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it another look, last piece I havent really reviewed is the docs.

Generally looking pretty much ready to me, just few nits.

packages/postgresql/src/types/FullTextType.ts Outdated Show resolved Hide resolved
packages/core/src/platforms/Platform.ts Outdated Show resolved Hide resolved
packages/core/src/utils/Utils.ts Outdated Show resolved Hide resolved
packages/mariadb/src/MariaDbPlatform.ts Outdated Show resolved Hide resolved
packages/mongodb/src/MongoDriver.ts Outdated Show resolved Hide resolved
packages/mongodb/src/MongoDriver.ts Outdated Show resolved Hide resolved
@B4nan
Copy link
Member

B4nan commented Jul 31, 2022

I'd like to include this in 5.3, ideally I would ship that today evening or something like that. Let me know if you have time to finish this, I'll be happy to merge it and do the final steps myself if not, in the end its pretty much done.

jsprw and others added 2 commits July 31, 2022 11:04
Co-authored-by: Martin Adámek <banan23@gmail.com>
Co-authored-by: Martin Adámek <banan23@gmail.com>
@jsprw
Copy link
Contributor Author

jsprw commented Jul 31, 2022

I have applied your suggestions, I will not be home until later tonight. So if you could apply the last suggestions, that’d be great (especially if you ship it tonight). Else i can implement them around 22 or 23h.

@B4nan B4nan merged commit 8b8f140 into mikro-orm:master Jul 31, 2022
@jagabs
Copy link

jagabs commented Oct 16, 2022

anyone experiencing this in postgres?

Screen Shot 2022-10-16 at 11 28 26 AM

for migration
this.addSql('create index "tbl_employee_employee_name_index" on "public"."tbl_employee" using gin(to_tsvector(\'simple\', "employee_name"));');

entity
@Index({ type: "fulltext" }) @Property({ columnType: "text", type: FullTextType }) public employeeName: string;

i tried using this
@Index({ type: 'fulltext' }) @Property({ type: FullTextType, onUpdate: (book) => book.title }) searchableTitle!: string;

but im not sure what values to add in db migration.

Thanks

@B4nan
Copy link
Member

B4nan commented Oct 17, 2022

@Property({ columnType: "text", type: FullTextType })

This seems wrong, I think you should not override the column type, it needs to be tsvector. Note that you say you tried to drop that, but your example code is for a different property (employeeName vs searchableTitle).

but im not sure what values to add in db migration.

I dont understand this, you are not using schema generator?

@jsprw
Copy link
Contributor Author

jsprw commented Oct 17, 2022

The tabs on the documentation site don't show correctly (https://mikro-orm.io/docs/query-conditions#postgresql). I understand the confusion.

It should show two possible ways to do FTS on Postgres.

  1. You can use a column to store full text searching metadata. Below the title and description property is duplicated to the searchable column for the fts metadata. This works the best if you have multiple properties that should be searchable by one fts query. Note that you have to use the FullTextType (so, columntype = tsvector) for the searchable column.
@Property()
  title!: string;

@Property()
  description!: string;

  @Index({ type: 'fulltext' })
  @Property({ type: FullTextType, onUpdate: (book) => book.title + ' ' + book.description })
  searchable!: string;
  1. You could also use a index on an existing column. Then you can just add @Index with type fulltext.
 @Index({ type: 'fulltext' })
  @Property()
  title!: string;

As noted above, you use FullTextType and try to override the type. You should just use:

@Index({ type: "fulltext" }) 
@Property({ type: "text" })
public employeeName: string;

@Jedliu
Copy link
Contributor

Jedliu commented Nov 27, 2022

Thank you for all your efforts. I'm using PostgreSQL.

Is there any update for the setweight and advanced constructing queries as below?

What to do in the future:

Do I forget any?

@Jedliu
Copy link
Contributor

Jedliu commented Nov 27, 2022

Btw, possible to include the highlighting function? Thank you,

@Jedliu
Copy link
Contributor

Jedliu commented Nov 29, 2022

Just reviewed your code and found the to_tsvector was written fixed in the code https://github.com/mikro-orm/mikro-orm/blob/8b8f14071b92e91161a32aa272315a0ecce1bc0b/packages/postgresql/src/types/FullTextType.ts#L13 so it's not possible to use setweight or change another regconfig (e.g to_tsvector('english', 'your content') or use another segment tool to preprocess the content.

Here is a code snippet which might be an option.

@Index({ type: 'fulltext' })
@Property({
  type: FullTextType,
  nullable: true,
  onCreate: (book: Book) => `setweight(to_tsvector('english', '${book.title}'),'A') || setweight(to_tsvector('english', '${book.description}'),'B')`,
  onUpdate: (book: Book) => `setweight(to_tsvector('english', '${book.title}'),'A') || setweight(to_tsvector('english', '${book.description}'),'B')`,
})
searchableContent!: string;

@jsprw Let me know your comments and see if I can help. Thanks,

@B4nan
Copy link
Member

B4nan commented Nov 29, 2022

You can create your own FullTextType if you don't like how it's designed, this is fully under your own control - you can as well override just the columnType while using it. Note that your proposed API wouldn't work, onCreate/onUpdate need to return runtime values, not SQL fragments.

@Jedliu
Copy link
Contributor

Jedliu commented Nov 29, 2022

That's right. Have to update AbstractSqlDriver.ts to make the proposed API work.
Do you have any better suggestions for running these raw SQL fragments? Thank you,

@B4nan
Copy link
Member

B4nan commented Nov 29, 2022

As said above, create your own custom type (or maybe extend the existing FullTextType), where you have absolute control over both the value and the SQL fragment.

https://mikro-orm.io/docs/custom-types

@jsprw
Copy link
Contributor Author

jsprw commented Nov 29, 2022

Is there any update for the setweight and advanced constructing queries as below?

I have not worked on any of the proposed updates yet, have to find some time to work on them. I will suggest some changes below, if those are fine. I will try to implement them this week.

change another regconfig (e.g to_tsvector('english', 'your content')

This is a change which I would like to implement myself as well. As @B4nan already mentioned, you could extend the FullTextType to include the custom regconfig. We could add a constructor property to the FullTextType where you can specify the regconfig (with default value simple to prevent a breaking change).

  @Index({ type: 'fulltext' })
  @Property({ type: new FullTextType("english"), onUpdate: (book) => book.title })
  searchableTitle!: string;

Altough this would work if changed to english, the search query would still be converted to regconfig simple.

return `to_tsvector('simple', :column:) @@ plainto_tsquery('simple', :query)`;

So, we would also need to access this regconfig property in the methods where the full text indexes are created or converted to be used in a query.

@B4nan: don't have the project open right now, but would this information be available in these methods?

so it's not possible to use setweight

Reading https://www.postgresql.org/docs/current/textsearch-controls.html, I see that you only have the letters A till D for the weighted function. What about adding an check that if you set an object (with type Partial<Record<FullTextWeight, string>> where FullTextWeight is an enum) instead of string to a column with FullTextTypes that it will apply setweight to them? This is something you can add yourself too with a custom type.

Then you can use it like this which is converted to the right query.

  @Property({
    type: FullTextType,
    nullable: true,
    onCreate: (book: Book) => { [FullTextWeight.A]: book.title, [FullTextWeight.B]: book.description },
    onUpdate: (book: Book) => { [FullTextWeight.A]: book.title, [FullTextWeight.B]: book.description },
  })
searchableContent!: string;

It means that we would have to detect an object in the method FullTextType#convertToDatabaseValueSQL.

convertToDatabaseValueSQL(key: string) {

It would probably work like this (haven't tested yet):

  convertToDatabaseValueSQL(key: string | Partial<Record<FullTextWeight, string>>) {
    if (typeof key === "object") {
      return Object.entries(key).map(([weight, value]) => `setweight(to_tsvector(coalesce(${value},'')), '${weight}')`).join(" || ");
    }
    return `to_tsvector('simple', ${key})`;
  }

(Object.entries({A: "test", B: "test2", C: "test3", D: "test4"}).map(([weight, value]) => `setweight(to_tsvector('simple', coalesce(${escape(value)},'')), '${weight}')`).join(" || "); would return "setweight(to_tsvector('simple', coalesce(test,'')), 'A') || setweight(to_tsvector('simple', coalesce(test2,'')), 'B') || setweight(to_tsvector('simple', coalesce(test3,'')), 'C') || setweight(to_tsvector('simple', coalesce(test4,'')), 'D')")

@B4nan
Copy link
Member

B4nan commented Nov 29, 2022

would this information be available in these methods?

In QB yes, you can get the property object and access the type instance, in SchemaGenerator you'd have to propagate it in the place where we convert metadata to DatabaseSchema instance (and maybe adjust the database reflection to support schema diffing).

I don't mind adding the constructor, PR welcome.

@Jedliu
Copy link
Contributor

Jedliu commented Nov 29, 2022

You can create your own FullTextType if you don't like how it's designed, this is fully under your own control - you can as well override just the columnType while using it. Note that your proposed API wouldn't work, onCreate/onUpdate need to return runtime values, not SQL fragments.

Thank you. I will look into this.

@jsprw your idea looks much more promising. Thanks for taking the time to do this.

B4nan pushed a commit that referenced this pull request May 9, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request May 14, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request May 14, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request May 24, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request May 26, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Jun 11, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Sep 10, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Sep 20, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Sep 24, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Sep 30, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Oct 2, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Oct 17, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Oct 21, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Oct 25, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Nov 2, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
B4nan pushed a commit that referenced this pull request Nov 5, 2023
…nfig (#3805)

Adds support for a custom regconfig and weighted tsvectors as requested
by
#3317 (comment).

For more context, see
#3317 (comment).

Docs: also (hopefully) fixed the tabs which caused the confusion as seen
here:
#3317 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full Text Search
5 participants