Cannot find subwords in body after 1.0.1 #32

Closed
Nebukadneza opened this issue Jan 5, 2020 · 4 comments

Comments

@Nebukadneza

Nebukadneza commented Jan 5, 2020

I’ve been trying to integrate fts-xapian into Mailu, an easy-to-use mailserver project. During testing, I observed that subwords are matched in Subject or To, but not in the body. In the body, only full-word matches are returned.

I have built fts-xapian 1.2.6 in our dovecot docker image and adapted the config as shown in README.md. We use alpine 3.10 (so musl, not glibc) with dovecot 2.3.7.2, xapian-core 1.4.11 and icu 64.2. The rest of the setup and config can be found here, if of any interest:
https://github.com/Nebukadneza/Mailu/blob/try_fts_xapian/core/dovecot/Dockerfile
https://github.com/Nebukadneza/Mailu/tree/try_fts_xapian/core/dovecot/conf

I’m testing using this simple python 3.7 snippet:

import imaplib
i = imaplib.IMAP4_SSL("mydomain")
i.login("myuser","mypass")
i.select() # selects INBOX (the default mailbox)
i.search(None, 'BODY "word"')
i.search(None, 'BODY "subword"')
i.search(None, 'TEXT "word"')
i.search(None, 'TEXT "subword"')
i.search(None, 'SUBJECT "word"')

The test-setup includes 4 mails in the inbox:

  • Subject: "Has subword"; Body: "Has subword"
  • Subject: "Has subword"; Body: "Has no youknowwhat"
  • Subject: "Has no youknowwhat"; Body: "Has subword"
  • Subject: "Has no youknowwhat"; Body: "Has no youknowwhat"

The expected versus actual results are:

  • i.search(None, 'BODY "word"') — should return 2, returns 0
  • i.search(None, 'BODY "subword"') — should return 2, returns 2
  • i.search(None, 'TEXT "word"') — should return 3, returns 2 (only the subject ones)
  • i.search(None, 'SUBJECT "word"') — should return 2, returns 2
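
The comparison above could be automated along these lines. This is only a minimal sketch: `EXPECTED`, `count_hits` and `check` are names made up for illustration, and the counts are simply the "should return" values from the list above.

```python
import imaplib

# Expected hit counts for the four test mails described above.
EXPECTED = {
    'BODY "word"': 2,
    'BODY "subword"': 2,
    'TEXT "word"': 3,
    'SUBJECT "word"': 2,
}


def count_hits(search_response):
    """Count message numbers in the (status, data) tuple returned by search().

    imaplib returns e.g. ("OK", [b"1 3 4"]); an empty result is ("OK", [b""]).
    """
    status, data = search_response
    if status != "OK":
        raise RuntimeError(f"SEARCH failed: {status}")
    return len(data[0].split()) if data and data[0] else 0


def check(conn):
    """Run every query in EXPECTED; return {criterion: (expected, actual)} for mismatches."""
    mismatches = {}
    for criterion, expected in EXPECTED.items():
        actual = count_hits(conn.search(None, criterion))
        if actual != expected:
            mismatches[criterion] = (expected, actual)
    return mismatches


# Usage against a real server (placeholders as in the snippet above):
#   conn = imaplib.IMAP4_SSL("mydomain")
#   conn.login("myuser", "mypass")
#   conn.select()
#   print(check(conn))
```

An empty dictionary from `check` would mean every search returned the expected number of messages.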

Logs on 7b7e7e7:

# BODY "word"

imap_1      | Jan 05 10:00:23 imap(myuser@mydomain)<70><LFgJm2GbDobAqMsC>: Info: Connection closed (SEARCH finished 25.026 secs ago) in=46 out=775 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
imap_1      | Jan 05 10:00:24 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.2, lip=192.168.203.7, mpid=78, session=<nKCOnGGbIobAqMsC>
imap_1      | Jan 05 10:00:24 imap(myuser@mydomain)<78><nKCOnGGbIobAqMsC>: Info: FTS Xapian: FLAG=AND
imap_1      | Jan 05 10:00:24 imap(myuser@mydomain)<78><nKCOnGGbIobAqMsC>: Info: FTS Xapian: Query= body:word
imap_1      | Jan 05 10:00:24 imap(myuser@mydomain)<78><nKCOnGGbIobAqMsC>: Info: FTS Xapian: 0 results in 2 ms

# BODY "subword"

imap_1      | Jan 05 10:11:50 imap(myuser@mydomain)<18><tgNDxWGb0uzAqMsD>: Info: Connection closed (SEARCH finished 3.111 secs ago) in=49 out=779 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
imap_1      | Jan 05 10:11:50 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.3, lip=192.168.203.5, mpid=22, session=<Zj97xWGb3uzAqMsD>
imap_1      | Jan 05 10:11:50 imap(myuser@mydomain)<22><Zj97xWGb3uzAqMsD>: Info: FTS Xapian: FLAG=AND
imap_1      | Jan 05 10:11:50 imap(myuser@mydomain)<22><Zj97xWGb3uzAqMsD>: Info: FTS Xapian: Query= body:subword
imap_1      | Jan 05 10:11:50 imap(myuser@mydomain)<22><Zj97xWGb3uzAqMsD>: Info: FTS Xapian: 2 results in 2 ms


Log on 1.2.6:

# BODY "word"

imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<33><nvMyrGGbgOvAqMsD>: Info: Connection closed (SEARCH finished 3.346 secs ago) in=46 out=775 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
imap_1      | Jan 05 10:04:50 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.3, lip=192.168.203.5, mpid=37, session=<afpvrGGbjOvAqMsD>
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Opening DB (RO) /mail/admin@myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:04:50 imap(admin@myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:04:50 imap(admin@myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Opening DB (RO) /mail/myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Opening DB (RO) /mail/myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: FTS Xapian: FLAG=AND
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: FTS Xapian: Query= body:"word"
imap_1      | Jan 05 10:04:50 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: FTS Xapian: 0 results in 1 ms

# BODY "subword"

imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<37><afpvrGGbjOvAqMsD>: Info: Connection closed (SEARCH finished 48.116 secs ago) in=46 out=767 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
imap_1      | Jan 05 10:05:39 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.3, lip=192.168.203.5, mpid=57, session=<mRxWr2GbuuvAqMsD>
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: Opening DB (RO) /mail/myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: Opening DB (RO) /mail/myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: Opening DB (RO) /mail/myuser@mydomain/xapian-indexes/db_2a75312dbeb4115e21000000c58116a5
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: FTS Xapian: FLAG=AND
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: FTS Xapian: Query= body:"subword"
imap_1      | Jan 05 10:05:39 imap(myuser@mydomain)<57><mRxWr2GbuuvAqMsD>: Info: FTS Xapian: 2 results in 2 ms

Log on 1.0.1:

# BODY "word"

imap_1      | Jan 05 10:09:10 imap(myuser@mydomain)<33><zoGvu2GbMI3AqMsD>: Info: Connection closed (SEARCH finished 4.063 secs ago) in=46 out=779 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
imap_1      | Jan 05 10:09:11 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.3, lip=192.168.203.6, mpid=37, session=<RAr3u2GbPo3AqMsD>
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Query: FLAG=AND
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Query(1): add term(wilcard) : word
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Testing if wildcard
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Query: set GLOBAL (no specified header)
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Query : ( bcc:word OR body:word OR cc:word OR from:word OR message-id:word OR subject:word OR to:word )
imap_1      | Jan 05 10:09:11 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Query: 2 results in 3 ms
imap_1      | Jan 05 10:09:13 imap(myuser@mydomain)<37><RAr3u2GbPo3AqMsD>: Info: Connection closed (SEARCH finished 2.398 secs ago) in=46 out=771 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0


# BODY "subword"

imap_1      | Jan 05 10:09:14 imap-login: Info: Login: user=<myuser@mydomain>, method=PLAIN, rip=192.168.203.3, lip=192.168.203.6, mpid=39, session=<vK8hvGGbSo3AqMsD>
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Get last UID of INBOX = 4
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Query: FLAG=AND
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Query(1): add term(wilcard) : subword
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Testing if wildcard
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Query: set GLOBAL (no specified header)
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Query : ( bcc:subword OR body:subword OR cc:subword OR from:subword OR message-id:subword OR subject:subword OR to:subword )
imap_1      | Jan 05 10:09:14 imap(myuser@mydomain)<39><vK8hvGGbSo3AqMsD>: Info: Query: 3 results in 3 ms
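
To compare the three versions side by side, the query and result lines can be pulled out of logs like the ones above. The following is a rough sketch that assumes only the two line formats visible in these excerpts ("FTS Xapian: Query= ..." for the newer builds, "Query : ..." for 1.0.1); `QUERY_RE`, `RESULT_RE` and `summarize` are illustrative names, not part of the plugin.

```python
import re

# Newer builds log "Info: FTS Xapian: Query= body:word";
# 1.0.1 logs "Info: Query : ( bcc:word OR ... )" (note the space before the colon).
QUERY_RE = re.compile(r"Info: (?:FTS Xapian: Query=|Query :)\s*(?P<query>.+?)\s*$")

# Both formats end a search with "... N results in M ms".
RESULT_RE = re.compile(r"Info: (?:FTS Xapian|Query): (?P<count>\d+) results in \d+ ms")


def summarize(log_text):
    """Return (query, result_count) pairs in the order they appear in the log."""
    pairs = []
    pending = None  # query seen but not yet matched with a result line
    for line in log_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            pending = match.group("query")
            continue
        match = RESULT_RE.search(line)
        if match and pending is not None:
            pairs.append((pending, int(match.group("count"))))
            pending = None
    return pairs
```

Feeding each log excerpt to `summarize` gives one pair per search, which makes it easier to see that 1.0.1 expands the query across all header fields while the later builds query only `body`.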

I then looked at the version history and tried the tag fts-xapian-1.0.1, and with it subword searches in the body did work! Using git bisect, I found the offending commit to be 7b7e7e7: before this commit I’m able to perform subword searches in the body; afterwards it doesn’t work anymore.

Unfortunately that commit’s a bit convoluted, so it’s hard for me as an outsider to follow what really happened. Maybe someone can shed a bit of light on this? Or maybe I’m doing my testing or configuration wrong …

Thanks for this great little plugin nevertheless ^_^

@grosjo
Owner

grosjo commented Jan 18, 2020

Thank you for the notice. I will dig into that and let you know.

grosjo added 6 commits that referenced this issue Jan 18, 2020
@grosjo
Owner

grosjo commented Jan 18, 2020

Please test with the latest git version.

@Nebukadneza
Author

Thanks a lot for your quick and (probably) helpful response. I’ll try to make some time to test this again tomorrow, or by the middle of next week at the latest. I’ll report back whether your changes fixed the issue.

Thanks a bunch & Best Regards,
-Dario

@Nebukadneza
Author

Nebukadneza commented Jan 19, 2020

Hi,

I found some time to test it again, and I’m happy to report that searching now works as expected: I’m able to find subwords and whole phrases in BODY, in SUBJECT, or in both using TEXT. If there’s nothing else from your side, this issue can be closed.

Thank you a lot!

@grosjo grosjo closed this as completed Jan 21, 2020
bors bot added a commit to Mailu/Mailu that referenced this issue Feb 1, 2020
1320: Add xapian full-text-search plugin to dovecot r=mergify[bot] a=Nebukadneza

## What type of PR?
Enhancement

## What does this PR do?
Currently we are not able to offer our users an FTS experience after the
demise of lucene due to unfixed coredumps with musl/alpine.
We now add xapian, the only remaining maintained small/lean FTS plugin
for dovecot. It is quite simple to add to our stack: A two-stage docker
build is used to compile the fts plugin in the first stage, and copy
over only the resulting plugin-artifact to the second stage, which is
our usual dovecot container. Configuration is also minimal.

There was an upstream issue where bodies could not be searched for subwords, but fortunately it was fixed quite quickly. We currently need to wait for a new release to use a stable tag in our `Dockerfile`.

### Related issue(s)
- #1176
- #1297
- #751
- **Upstream issue which is the cause for the `TODO` in the `Dockerfile`**: grosjo/fts-xapian#32

## Prerequisites
- [ ] Wait for upstream to prepare new release after grosjo/fts-xapian#32 — so that we can use a stable tag in our `Dockerfile`
- [ ] In case of feature or enhancement: documentation updated accordingly
- [ ] Unless it's docs or a minor change: add [changelog](https://mailu.io/master/contributors/guide.html#changelog) entry file.


Co-authored-by: Dario Ernst <dario@kanojo.de>
bors bot added a commit to Mailu/Mailu that referenced this issue Mar 10, 2020
1320: Add xapian full-text-search plugin to dovecot r=mergify[bot] a=Nebukadneza
Co-authored-by: Dario Ernst <dario@kanojo.de>
Co-authored-by: Dario Ernst <dario.ernst@rommelag.com>