Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change account search to match by text when opted-in #25599

Merged
merged 3 commits into from
Jun 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
52 changes: 38 additions & 14 deletions app/chewy/accounts_index.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,37 @@

class AccountsIndex < Chewy::Index
settings index: { refresh_interval: '30s' }, analysis: {
filter: {
english_stop: {
type: 'stop',
stopwords: '_english_',
},

english_stemmer: {
type: 'stemmer',
language: 'english',
},

english_possessive_stemmer: {
type: 'stemmer',
language: 'possessive_english',
},
},

analyzer: {
content: {
natural: {
tokenizer: 'uax_url_email',
filter: %w(
english_possessive_stemmer
lowercase
asciifolding
cjk_width
english_stop
english_stemmer
),
},

verbatim: {
tokenizer: 'whitespace',
filter: %w(lowercase asciifolding cjk_width),
},
Comment on lines +5 to 38
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really comment on this as I am not knowledgeable in Chewy or ElasticSearch. That being said, this seems very tailored to English and I worry this could provide subpar results in other languages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a copy of the analyzer we use for posts.

Expand All @@ -26,18 +55,13 @@ class AccountsIndex < Chewy::Index
index_scope ::Account.searchable.includes(:account_stat)

root date_detection: false do
field :id, type: 'long'

field :display_name, type: 'text', analyzer: 'content' do
field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content'
end

field :acct, type: 'text', analyzer: 'content', value: ->(account) { [account.username, account.domain].compact.join('@') } do
field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content'
end

field :following_count, type: 'long', value: ->(account) { account.following_count }
field :followers_count, type: 'long', value: ->(account) { account.followers_count }
field :last_status_at, type: 'date', value: ->(account) { account.last_status_at || account.created_at }
field(:id, type: 'long')
field(:following_count, type: 'long')
field(:followers_count, type: 'long')
field(:properties, type: 'keyword', value: ->(account) { account.searchable_properties })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a way to future proof against having to completely recreate the index when adding more filtering capabilities.

field(:last_status_at, type: 'date', value: ->(account) { account.last_status_at || account.created_at })
field(:display_name, type: 'text', analyzer: 'verbatim') { field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'verbatim' }
field(:username, type: 'text', analyzer: 'verbatim', value: ->(account) { [account.username, account.domain].compact.join('@') }) { field :edge_ngram, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'verbatim' }
field(:text, type: 'text', value: ->(account) { account.searchable_text }) { field :stemmed, type: 'text', analyzer: 'natural' }
end
end
11 changes: 11 additions & 0 deletions app/models/concerns/account_search.rb
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,17 @@ module AccountSearch
LIMIT :limit OFFSET :offset
SQL

def searchable_text
PlainTextFormatter.new(note, local?).to_s if discoverable?
end

def searchable_properties
[].tap do |properties|
properties << 'bot' if bot?
properties << 'verified' if fields.any?(&:verified?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what it's supposed to be used for, but having a verified value that does not reference the value of the exact verified field does not seem useful, surfacing that the account has a verified link, without showing which, may be misleading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be useful to exclude results that have no verification at all, however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the use case

end
end

class_methods do
def search_for(terms, limit: 10, offset: 0)
tsquery = generate_query_for_search(terms)
Expand Down
51 changes: 38 additions & 13 deletions app/services/account_search_service.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,11 @@ class AccountSearchService < BaseService
MIN_QUERY_LENGTH = 5

def call(query, account = nil, options = {})
@acct_hint = query&.start_with?('@')
@query = query&.strip&.gsub(/\A@/, '')
@limit = options[:limit].to_i
@offset = options[:offset].to_i
@options = options
@account = account
@query = query&.strip&.gsub(/\A@/, '')
@limit = options[:limit].to_i
@offset = options[:offset].to_i
@options = options
@account = account

search_service_results.compact.uniq
end
Expand Down Expand Up @@ -72,8 +71,8 @@ def simple_search_results
end

def from_elasticsearch
must_clauses = [{ multi_match: { query: terms_for_query, fields: likely_acct? ? %w(acct.edge_ngram acct) : %w(acct.edge_ngram acct display_name.edge_ngram display_name), type: 'most_fields', operator: 'and' } }]
should_clauses = []
must_clauses = must_clause
should_clauses = should_clause

if account
return [] if options[:following] && following_ids.empty?
Expand All @@ -88,7 +87,7 @@ def from_elasticsearch
query = { bool: { must: must_clauses, should: should_clauses } }
functions = [reputation_score_function, followers_score_function, time_distance_function]

records = AccountsIndex.query(function_score: { query: query, functions: functions, boost_mode: 'multiply', score_mode: 'avg' })
records = AccountsIndex.query(function_score: { query: query, functions: functions })
.limit(limit_for_non_exact_results)
.offset(offset)
.objects
Expand Down Expand Up @@ -133,6 +132,36 @@ def time_distance_function
}
end

def must_clause
fields = %w(username username.* display_name display_name.*)
fields << 'text' << 'text.*' if options[:use_searchable_text]

[
{
multi_match: {
query: terms_for_query,
fields: fields,
type: 'best_fields',
operator: 'or',
},
},
]
end

def should_clause
[
{
multi_match: {
query: terms_for_query,
fields: %w(username username.* display_name display_name.*),
type: 'best_fields',
operator: 'and',
boost: 10,
},
},
]
end
ClearlyClaire marked this conversation as resolved.
Show resolved Hide resolved

def following_ids
@following_ids ||= account.active_relationships.pluck(:target_account_id) + [account.id]
end
Expand Down Expand Up @@ -182,8 +211,4 @@ def exact_match?
def username_complete?
query.include?('@') && "@#{query}".match?(MENTION_ONLY_RE)
end

def likely_acct?
@acct_hint || username_complete?
end
end
3 changes: 2 additions & 1 deletion app/services/search_service.rb
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@ def perform_accounts_search!
@account,
limit: @limit,
resolve: @resolve,
offset: @offset
offset: @offset,
use_searchable_text: true
)
end

Expand Down
2 changes: 1 addition & 1 deletion spec/services/search_service_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@
allow(AccountSearchService).to receive(:new).and_return(service)

results = subject.call(query, nil, 10)
expect(service).to have_received(:call).with(query, nil, limit: 10, offset: 0, resolve: false)
expect(service).to have_received(:call).with(query, nil, limit: 10, offset: 0, resolve: false, use_searchable_text: true)
expect(results).to eq empty_results.merge(accounts: [account])
end
end
Expand Down