Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding more trigrams makes post query worse #16

Closed
GoogleCodeExporter opened this issue Mar 30, 2015 · 3 comments
Closed

Adding more trigrams makes post query worse #16

GoogleCodeExporter opened this issue Mar 30, 2015 · 3 comments

Comments

@GoogleCodeExporter
Copy link

bash$ csearch -i -verbose '%%0.*sprintf'
2012/02/29 16:02:11 query: "%%0" 
("SPR"|"SPr"|"SpR"|"Spr"|"sPR"|"sPr"|"spR"|"spr")|("\xbfPR" "ſP")|("\xbfPr" 
"ſP")|("\xbfpR" "ſp")|("\xbfpr" "ſp") 
("PRI"|"PRi"|"PrI"|"Pri"|"pRI"|"pRi"|"prI"|"pri") 
("RIN"|"RIn"|"RiN"|"Rin"|"rIN"|"rIn"|"riN"|"rin") 
("INT"|"INt"|"InT"|"Int"|"iNT"|"iNt"|"inT"|"int") 
("NTF"|"NTf"|"NtF"|"Ntf"|"nTF"|"nTf"|"ntF"|"ntf")
2012/02/29 16:02:11 post query identified 3295 possible files
bash$ csearch -i -verbose '%%0' >/dev/null
2012/02/29 16:02:24 query: "%%0"
2012/02/29 16:02:24 post query identified 9 possible files

Since only 9 files match '%%0', I would expect no more than 9 possible files 
from the post query when adding '.*sprintf' to the query string.  It looks like 
something is ignoring the 'restrict' list somewhere.

I'm still working on a minimal test case.  I'll look into this more tonight or 
so.

Original issue reported on code.google.com by dgryski on 29 Feb 2012 at 3:10

@GoogleCodeExporter
Copy link
Author

postReader init() was not initializing restrict correctly, so all r.next() 
calls returned the full list list and not the restricted set.

Patch attached.

Original comment by dgryski on 29 Feb 2012 at 3:43

  • Added labels: ****
  • Removed labels: ****

Attachments:

@GoogleCodeExporter
Copy link
Author

With this fix, the 599 files matching 'hello world' in linux-3.1.3 example ( 
http://swtch.com/~rsc/regexp/regexp4.html ) is reduced to 32:

/tmp dgryski$ CSEARCHINDEX=/tmp/.csearch-linux csearch -verbose -i -c 'hello 
world' 
2012/03/07 11:45:20 query: ("HEL"|"HEl"|"HeL"|"Hel"|"hEL"|"hEl"|"heL"|"hel") 
("ELL"|"ELl"|"ElL"|"Ell"|"eLL"|"eLl"|"elL"|"ell") 
("LLO"|"LLo"|"LlO"|"Llo"|"lLO"|"lLo"|"llO"|"llo") ("LO "|"Lo "|"lO "|"lo ") ("O 
W"|"O w"|"o W"|"o w") (" WO"|" Wo"|" wO"|" wo") 
("WOR"|"WOr"|"WoR"|"Wor"|"wOR"|"wOr"|"woR"|"wor") 
("ORL"|"ORl"|"OrL"|"Orl"|"oRL"|"oRl"|"orL"|"orl") 
("RLD"|"RLd"|"RlD"|"Rld"|"rLD"|"rLd"|"rlD"|"rld")
2012/03/07 11:45:20 post query identified 32 possible files
/tmp/linux-3.1.3/Documentation/filesystems/ramfs-rootfs-initramfs.txt: 3
/tmp/linux-3.1.3/Documentation/java.txt: 1
/tmp/linux-3.1.3/Documentation/s390/Debugging390.txt: 3
/tmp/linux-3.1.3/arch/blackfin/kernel/kgdb_test.c: 1
/tmp/linux-3.1.3/arch/frv/kernel/gdb-stub.c: 1
/tmp/linux-3.1.3/arch/mn10300/kernel/gdb-stub.c: 1
/tmp/linux-3.1.3/arch/powerpc/platforms/powermac/udbg_scc.c: 1
/tmp/linux-3.1.3/drivers/media/video/msp3400-driver.c: 1
/tmp/linux-3.1.3/drivers/net/sfc/selftest.c: 1
/tmp/linux-3.1.3/samples/kdb/kdb_hello.c: 2

Original comment by dgryski on 7 Mar 2012 at 10:48

  • Added labels: ****
  • Removed labels: ****

@GoogleCodeExporter
Copy link
Author

This issue was closed by revision dbeef1612f35.

Original comment by dgryski on 2 May 2012 at 8:02

  • Changed state: Fixed
  • Added labels: ****
  • Removed labels: ****

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant