Change account search to match by text when opted-in #25599
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,8 +2,37 @@ | |
|
||
class AccountsIndex < Chewy::Index | ||
settings index: { refresh_interval: '30s' }, analysis: { | ||
+ filter: { | ||
+ english_stop: { | ||
+ type: 'stop', | ||
+ stopwords: '_english_', | ||
+ }, | ||
|
||
+ english_stemmer: { | ||
+ type: 'stemmer', | ||
+ language: 'english', | ||
+ }, | ||
|
||
+ english_possessive_stemmer: { | ||
+ type: 'stemmer', | ||
+ language: 'possessive_english', | ||
+ }, | ||
+ }, | ||
|
||
analyzer: { | ||
- content: { | ||
+ natural: { | ||
+ tokenizer: 'uax_url_email', | ||
+ filter: %w( | ||
+ english_possessive_stemmer | ||
+ lowercase | ||
+ asciifolding | ||
+ cjk_width | ||
+ english_stop | ||
+ english_stemmer | ||
+ ), | ||
+ }, | ||
|
||
+ verbatim: { | ||
tokenizer: 'whitespace', | ||
filter: %w(lowercase asciifolding cjk_width), | ||
}, | ||
|
@@ -26,18 +55,13 @@ class AccountsIndex < Chewy::Index | |
index_scope ::Account.searchable.includes(:account_stat) | ||
|
||
root date_detection: false do | ||
- field :id, type: 'long' | ||
|
||
- field :display_name, type: 'text', analyzer: 'content' do | ||
- field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content' | ||
- end | ||
|
||
- field :acct, type: 'text', analyzer: 'content', value: ->(account) { [account.username, account.domain].compact.join('@') } do | ||
- field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content' | ||
- end | ||
|
||
- field :following_count, type: 'long', value: ->(account) { account.following_count } | ||
- field :followers_count, type: 'long', value: ->(account) { account.followers_count } | ||
- field :last_status_at, type: 'date', value: ->(account) { account.last_status_at || account.created_at } | ||
+ field(:id, type: 'long') | ||
+ field(:following_count, type: 'long') | ||
+ field(:followers_count, type: 'long') | ||
+ field(:properties, type: 'keyword', value: ->(account) { account.searchable_properties }) | ||
Is this used for anything?
It’s a way to future proof against having to completely recreate the index when adding more filtering capabilities.
||
+ field(:last_status_at, type: 'date', value: ->(account) { account.last_status_at || account.created_at }) | ||
+ field(:display_name, type: 'text', analyzer: 'verbatim') { field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'verbatim' } | ||
+ field(:username, type: 'text', analyzer: 'verbatim', value: ->(account) { [account.username, account.domain].compact.join('@') }) { field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'verbatim' } | ||
+ field(:text, type: 'text', value: ->(account) { account.searchable_text }) { field :stemmed, type: 'text', analyzer: 'natural' } | ||
end | ||
end |
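To make the `verbatim` analyzer above easier to reason about, here is a rough pure-Ruby approximation of its chain (whitespace tokenizer, then `lowercase`, `asciifolding`, `cjk_width`). It does not call Elasticsearch or Chewy. The method name `approximate_verbatim_tokens` is hypothetical, and the Unicode normalization steps are simplified stand-ins for the real token filters, so actual index output may differ.

```ruby
# Hypothetical helper: NOT part of the PR, Chewy, or Elasticsearch.
# It only imitates the order of steps in the `verbatim` analyzer.
def approximate_verbatim_tokens(text)
  text.split.map do |token|               # 'whitespace' tokenizer: split on whitespace only
    token
      .downcase                           # 'lowercase' filter
      .unicode_normalize(:nfd)            # decompose accented characters...
      .gsub(/\p{Mn}/, '')                 # ...and drop combining marks (crude 'asciifolding')
      .unicode_normalize(:nfkc)           # fold fullwidth forms (rough 'cjk_width' stand-in)
  end
end

p approximate_verbatim_tokens("Éric's CAFÉ ｂｏｔ")
```

Unlike the `natural` analyzer, nothing here stems words or removes stop words or possessives, which is presumably why `display_name` and `username` use `verbatim` while the stemmed sub-field of `text` uses `natural`.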
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -106,6 +106,17 @@ module AccountSearch | |
LIMIT :limit OFFSET :offset | ||
SQL | ||
|
||
+ def searchable_text | ||
+ PlainTextFormatter.new(note, local?).to_s if discoverable? | ||
+ end | ||
|
||
+ def searchable_properties | ||
+ [].tap do |properties| | ||
+ properties << 'bot' if bot? | ||
+ properties << 'verified' if fields.any?(&:verified?) | ||
I'm not sure what it's supposed to be used for, but having a
It can be useful to exclude results that have no verification at all, however.
I'm not sure I understand the use case
||
+ end | ||
+ end | ||
|
||
class_methods do | ||
def search_for(terms, limit: 10, offset: 0) | ||
tsquery = generate_query_for_search(terms) | ||
|
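The two methods added in this hunk can be exercised outside Rails with a small stand-in. This is a minimal sketch, not Mastodon code: `FakeAccount` and `FakeField` are hypothetical structs, and the `PlainTextFormatter` step is omitted, so `searchable_text` returns the raw note here. It shows that the bio text is only exposed when the account is discoverable, and that `searchable_properties` always returns an array.

```ruby
# Hypothetical stand-ins for Account and its profile fields (assumptions for illustration).
FakeField = Struct.new(:name, :verified, keyword_init: true) do
  def verified?
    verified
  end
end

FakeAccount = Struct.new(:note, :bot, :discoverable, :fields, keyword_init: true) do
  def bot?
    bot
  end

  def discoverable?
    discoverable
  end

  # Same guard as in the diff; the real method also runs PlainTextFormatter over the note.
  def searchable_text
    note if discoverable?
  end

  # Same body as in the diff.
  def searchable_properties
    [].tap do |properties|
      properties << 'bot' if bot?
      properties << 'verified' if fields.any?(&:verified?)
    end
  end
end

opted_in = FakeAccount.new(
  note: 'Ruby developer', bot: true, discoverable: true,
  fields: [FakeField.new(name: 'Website', verified: true)]
)
opted_out = FakeAccount.new(note: 'Private person', bot: false, discoverable: false, fields: [])

p opted_in.searchable_text
p opted_in.searchable_properties
p opted_out.searchable_text
p opted_out.searchable_properties
```

An account that has not opted in contributes `nil` for `text` and an empty `properties` list, so only its display name and username remain searchable.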
I can't really comment on this as I am not knowledgeable in Chewy or ElasticSearch. That being said, this seems very tailored to English and I worry this could provide subpar results in other languages.
It’s a copy of the analyzer we use for posts.