Feature/full text search #1136

Merged
merged 13 commits into from Feb 20, 2017

5 participants
@rslota
Contributor

rslota commented Dec 28, 2016

This PR introduces an MVP of MAM full text search. Currently it works only with the ODBC and Riak backends, and it is very limited when it comes to text matching. This is NOT an extension to the standard. The current MAM XEP-0313 allows queries using "forms" that may contain any server-defined fields, so a simple field named 'full-text-search' was introduced to provide this feature.

The following shows an example MAM stanza that uses full text search:

<iq type='set' id='query1'>
  <query xmlns='urn:xmpp:mam:1'>
    <x xmlns='jabber:x:data' type='submit'>
      <field type='hidden' var='FORM_TYPE'>
        <value>urn:xmpp:mam:1</value>
      </field>
      <field type='text-single' var='full-text-search'>
        <value>cat and dog</value>
      </field>
    </x>
  </query>
</iq>
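
For client developers, the query above could be assembled programmatically. The following is a minimal sketch using only Python's standard library; the helper names `add_field` and `build_full_text_query` are hypothetical and not part of this PR or of any XMPP library. The serialized output uses generated namespace prefixes (such as `ns0:`), which is equivalent XML but not byte-identical to the stanza shown above.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:1"
FORM_NS = "jabber:x:data"


def add_field(form, var, field_type, value):
    """Append a <field var=... type=...><value>...</value></field> to a data form."""
    field = ET.SubElement(form, f"{{{FORM_NS}}}field", {"type": field_type, "var": var})
    ET.SubElement(field, f"{{{FORM_NS}}}value").text = value
    return field


def build_full_text_query(search_text, query_id="query1"):
    """Build a MAM query <iq/> that filters on the 'full-text-search' form field."""
    iq = ET.Element("iq", {"type": "set", "id": query_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query")
    form = ET.SubElement(query, f"{{{FORM_NS}}}x", {"type": "submit"})
    add_field(form, "FORM_TYPE", "hidden", MAM_NS)
    add_field(form, "full-text-search", "text-single", search_text)
    return iq


if __name__ == "__main__":
    stanza = build_full_text_query("cat and dog")
    # ElementTree escapes special characters in the search text on serialization.
    print(ET.tostring(stanza, encoding="unicode"))
```

Building the form through an XML API rather than string concatenation means user-supplied search text is escaped automatically.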

Full Text Search is an optional server-side feature. To find out whether it's supported, a client may query the server for the active form fields of the MAM stanza:

<iq type='get' id='form1'>
  <query xmlns='urn:xmpp:mam:1'/>
</iq>

The response shows which form fields may be used to filter MAM messages (this includes the required fields, i.e. with, start and end):

<iq type='result' id='form1'>
  <query xmlns='urn:xmpp:mam:1'>
    <x xmlns='jabber:x:data' type='form'>
      <field type='hidden' var='FORM_TYPE'>
        <value>urn:xmpp:mam:1</value>
      </field>
      <field type='jid-single' var='with'/>
      <field type='text-single' var='start'/>
      <field type='text-single' var='end'/>
      <field type='text-single' var='full-text-search'/>
    </x>
  </query>
</iq>
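
A client could detect support by looking for the 'full-text-search' field in that response. The sketch below assumes the response arrives as an XML string shaped like the one above; the function names `supported_form_fields` and `supports_full_text_search` are hypothetical helpers for illustration, not an existing API.

```python
import xml.etree.ElementTree as ET

FORM_NS = "jabber:x:data"

# Sample response copied from the description above.
RESPONSE = """
<iq type='result' id='form1'>
  <query xmlns='urn:xmpp:mam:1'>
    <x xmlns='jabber:x:data' type='form'>
      <field type='hidden' var='FORM_TYPE'>
        <value>urn:xmpp:mam:1</value>
      </field>
      <field type='jid-single' var='with'/>
      <field type='text-single' var='start'/>
      <field type='text-single' var='end'/>
      <field type='text-single' var='full-text-search'/>
    </x>
  </query>
</iq>
"""


def supported_form_fields(iq_xml):
    """Return the set of 'var' names advertised in the MAM form response."""
    root = ET.fromstring(iq_xml.strip())
    return {field.get("var") for field in root.iter(f"{{{FORM_NS}}}field")}


def supports_full_text_search(iq_xml):
    """True if the server lists the 'full-text-search' field in its MAM form."""
    return "full-text-search" in supported_form_fields(iq_xml)


if __name__ == "__main__":
    print(sorted(supported_form_fields(RESPONSE)))
    print(supports_full_text_search(RESPONSE))
```

A client that does not find the field should simply not send it, since the feature is optional on the server side.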

While reading the PR, you may want to skip the "Make Elvis happy" commit to avoid the code quality changes.

@michalwski

This is a very good job, thanks! I really appreciate that you also added the search feature to Riak's backend. Here are my general comments:

  • The PR description is not up-to-date (it doesn't mention the Riak backend).
  • It would be very valuable if you could add an example stanza to the PR description so it's easy for client devs to see and understand how to use this feature.
  • It would be perfect if the search feature were optional. There may be some installations where the packet is in fact stored as something encrypted (that's why the packet format is configurable). In such a situation we don't want to keep the body in plain text. Also, for setups where the search feature is not needed, disabling it will save some storage space.
apps/ejabberd/src/mod_mam_cassandra_arch.erl (outdated)
_Start, _End, _Now, _WithJID, <<_SearchText/binary>>,
_PageSize, _LimitPassed, _MaxResultLimit,
_IsSimple) ->
{error, 'not-supported'};

@michalwski

michalwski Jan 2, 2017

Member

This one looks far better than omg from the previous comment :)

apps/ejabberd/src/mod_mam_utils.erl (outdated)
replace_x_user_element(FromJID, Role, Affiliation, Packet) ->
append_x_user_element(FromJID, Role, Affiliation,
delete_x_user_element(Packet)).

@michalwski

michalwski Jan 2, 2017

Member

That's clever :) I like it!

@rslota rslota changed the title from Feature/free text search to Feature/full text search Jan 9, 2017

-spec normalize_search_text(binary() | string() | undefined, string()) -> string() | undefined.
normalize_search_text(undefined, _WordSeparator) ->
undefined;
normalize_search_text(Text, WordSeparator) ->

@michalwski

michalwski Jan 10, 2017

Member

A word (or more) of explanation would make things easier to understand. A simple example may also help.

true ->
false %% full page but not the last one in the result set
end;
TotalCount =:= PagedCount; %% false means full page but not the last one in the result set

@michalwski

michalwski Jan 10, 2017

Member

That's clever :) I like it!

@rslota rslota added MAM ready labels Jan 24, 2017

@kzemek

There's still documentation missing, with focus on what happens when full-text search is turned on (and a note that turning it on won't make old messages searchable).

apps/ejabberd/src/mod_mam_utils.erl (outdated)
-spec normalize_search_text(binary() | string() | undefined, string()) -> string() | undefined.
normalize_search_text(undefined, _WordSeparator) ->
undefined;
normalize_search_text(Text, WordSeparator) ->

@kzemek

kzemek Feb 3, 2017

Contributor

This function has to do some heavy lifting on texts; have you measured its performance impact?

@rslota

rslota Feb 16, 2017

Contributor

It adds about 100 μs of processing time per row on my slow CPU. A simple string:to_lower is only 10 times faster than this whole function. Also, I've checked that precompiling the regexes only speeds up this function by ~5%. Removing just one of those 3 regexes would also give no more than 5-10%, but I'm not sure it's easy to do.

@kzemek

kzemek approved these changes Feb 17, 2017

@michalwski michalwski merged commit 0ddc72a into master Feb 20, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.1%) to 68.308%
gadget/compiler compiler is satisfied with your pull request
gadget/elvis elvis is satisfied with your pull request

@michalwski michalwski deleted the feature/free-text-search branch Feb 20, 2017

@michalwski michalwski referenced this pull request Mar 28, 2017

Merged

MongooseIM 2.1.0beta1 #1244