indextool --dumpdocids is not working #1099

Closed
sanikolaev opened this issue Apr 19, 2023 · 6 comments

@sanikolaev
Collaborator

sanikolaev commented Apr 19, 2023

MRE:

➜  ~ cat csv.conf    
searchd {    
    listen = 9315:mysql41    
    log = searchd.log    
    pid_file = searchd.pid    
    binlog_path =    
}    
    
source src {    
    type = csvpipe    
    csvpipe_command = echo "1,a"; echo "2,b"    
    csvpipe_field = f    
}    
    
index idx {    
    type = plain    
    source = src    
    path = /tmp/idx    
}    
    
    
➜  ~ indexer -c csv.conf --all    
Manticore 6.0.5 70662654f@230405 dev    
Copyright (c) 2001-2016, Andrew Aksyonoff    
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)    
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)    
    
WARNING: Error initializing columnar storage: daemon requires columnar library v21 (trying to load v18)    
WARNING: Error initializing secondary index: daemon requires secondary library v8 (trying to load v6)    
using config file '/Users/sn/csv.conf'...    
indexing table 'idx'...    
collected 2 docs, 0.0 MB    
creating lookup: 0.0 Kdocs, 100.0% done    
sorted 0.0 Mhits, 100.0% done    
total 2 docs, 2 bytes    
total 0.029 sec, 68 bytes/sec, 68.15 docs/sec    
total 3 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg    
total 15 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg    
    
➜  ~ indextool -c csv.conf --dumpdocids idx    
Error initializing columnar storage: daemon requires columnar library v21 (trying to load v18)Error initializing secondary index: daemon requires secondary library v8 (trying to load v6)Manticore 6.0.5 70662654f@230405 dev    
Copyright (c) 2001-2016, Andrew Aksyonoff    
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)    
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)    
    
using config file '/Users/sn/csv.conf'...    
WARNING: secondary library not loaded; secondary index(es) disabled    
dumping docids for table 'idx'...    
docinfo-bytes: docinfo=16, min-max=32, total=0    
docinfo-stride: 8    
docinfo-rows: 2    
➜  ~    

Expected: indextool prints the document IDs in its output.

@githubmanticore
Contributor

➤ Sergey Nikolaev commented:

SERGEY will make a cleaner test without these warnings:

WARNING: Error initializing columnar storage: daemon requires columnar library v21 (trying to load v18)  
WARNING: Error initializing secondary index: daemon requires secondary library v8 (trying to load v6)  

@sanikolaev
Collaborator Author

Retest:

➜  ~ cat csv.conf
searchd {
    listen = 9315:mysql41
    log = searchd.log
    pid_file = searchd.pid
    binlog_path =
}

source src {
    type = csvpipe
    csvpipe_command = echo "1,acd def ghi"; echo "2,abc ghi xyz"; echo "3,def hjk kjh"
    csvpipe_field = f
}

index idx {
    type = plain
    source = src
    path = /tmp/idx
}

➜  ~ indexer -c csv.conf --all
Manticore 6.0.5 19c3ca50e@230428 dev (columnar 2.0.5 24e76dd@230422) (secondary 2.0.5 24e76dd@230422)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

using config file '/Users/sn/csv.conf'...
indexing table 'idx'...
collected 3 docs, 0.0 MB
creating secondary index
creating lookup: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 3 docs, 33 bytes
total 0.031 sec, 1044 bytes/sec, 94.99 docs/sec
total 3 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 15 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

➜  ~ indextool -c csv.conf --dumpdocids idx
Manticore 6.0.5 19c3ca50e@230428 dev (columnar 2.0.5 24e76dd@230422) (secondary 2.0.5 24e76dd@230422)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

using config file '/Users/sn/csv.conf'...
dumping docids for table 'idx'...
docinfo-bytes: docinfo=24, min-max=32, total=0
docinfo-stride: 8
docinfo-rows: 3
➜  ~

@githubmanticore
Contributor

➤ Aleksey N. Vinogradov commented:

That's quite a tricky task.
If I simply enable index loading (which this function requires), it will bring back the behavior that caused #456.
So the two options are mutually exclusive: either consume a lot of memory (issue #456) or dump docids.

The revision where the decision on #456 was made is f2c5e5c (21.11.2018). Since that revision, dumping docids has not worked.

@githubmanticore
Contributor

➤ Ilya Kuznetsov commented:

We have a docid lookup file (basically (docid;rowid) pairs) that can be used to dump docids.
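
A minimal sketch of that idea, in Python for brevity. It assumes the lookup data is already available as an iterable of (docid, rowid) pairs; the on-disk layout of the lookup file is not described in this thread, so `LookupPair` and `dump_docids` are hypothetical names and this is not the actual indextool implementation. The point it illustrates is that the docids can be streamed from the pairs alone, without loading the rest of the index.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for one lookup entry: (docid, rowid).
# The real lookup file format is an implementation detail not shown here.
LookupPair = Tuple[int, int]


def dump_docids(pairs: Iterable[LookupPair]) -> Iterator[int]:
    """Yield docids one at a time, in the order the pairs are stored.

    Only the lookup pairs are consumed; nothing else from the index
    needs to be in memory.
    """
    for docid, _rowid in pairs:
        yield docid


if __name__ == "__main__":
    # Three documents with docids 1..3; the rowids 0..2 are assumed for illustration.
    lookup = [(1, 0), (2, 1), (3, 2)]
    for docid in dump_docids(lookup):
        print(docid)
```

Because `dump_docids` is a generator, memory use stays constant regardless of how many pairs the lookup holds, which is the property that matters given the concern about #456.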

@githubmanticore
Contributor

➤ Aleksey N. Vinogradov commented:

It's a question of a 'fast fix in 2 lines' versus 'reimplement dump'.
I'm not sure dumping docids is a popular enough indextool function to justify spending time on it.

@sanikolaev
Collaborator Author

> If I just enable index loading (which is necessary by this function) - it will reopen behavior causing #456.

@klirichek did it in 47146bd. The task is complete.
