MultiLevelSkipListReader doesn't handle large offsets #29

Closed
drigh opened this issue Sep 20, 2012 · 9 comments

Comments

@drigh

drigh commented Sep 20, 2012

MultiLevelSkipListReader.cpp:79 should have (int32_t) cast removed.

if (level > 0 && lastChildPointer > (int32_t)skipStream[level - 1]->getFilePointer())
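To illustrate why the cast matters, here is a minimal sketch that is independent of the Lucene++ sources. The function names `compareWithCast` and `compareWithoutCast` are hypothetical, and it assumes that both `lastChildPointer` and the value returned by `getFilePointer()` are 64-bit. Once the file pointer passes 2^31 - 1, narrowing it to `int32_t` typically wraps it to a negative number, so the comparison can give the wrong answer:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the comparison quoted above; only the cast
// itself comes from the issue, the function names are illustrative.

// Mirrors the original line: the 64-bit file pointer is narrowed to
// 32 bits before the comparison, so large offsets are truncated.
bool compareWithCast(int64_t lastChildPointer, int64_t filePointer) {
    return lastChildPointer > (int32_t)filePointer;
}

// Mirrors the line with the cast removed: both sides stay 64-bit.
bool compareWithoutCast(int64_t lastChildPointer, int64_t filePointer) {
    return lastChildPointer > filePointer;
}
```

For offsets below 2 GiB both versions agree; they only diverge once the skip stream's file pointer no longer fits in 32 bits, which is probably why the bug only shows up on large indexes.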

@alanw
Collaborator

alanw commented Sep 20, 2012

Done.

@alanw alanw closed this as completed Sep 20, 2012
@alian555

Hi!

Even with this fix, I still get a segfault with a large index:

#0 0x00007ff0ea44d725 in operator at /home/albert/albert/LucenePlusPlus/include/Array.h:116
#1 Lucene::SkipBuffer::readByte (this=0x3175ba40) at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:222
#2 0x00007ff0ea4b3825 in Lucene::IndexInput::readLong (this=0x3175ba40) at /home/albert/albert/LucenePlusPlus/src/core/store/IndexInput.cpp:54
#3 0x00007ff0ea3249c0 in Lucene::DefaultSkipListReader::readSkipData (this=0x7ff0e67ccb00, level=2, skipStream=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
)
at /home/albert/albert/LucenePlusPlus/src/core/index/DefaultSkipListReader.cpp:90
#4 0x00007ff0ea44e317 in Lucene::MultiLevelSkipListReader::loadNextSkip (this=0x3175bf90, level=2)
at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:105
#5 0x00007ff0ea44d8cf in Lucene::MultiLevelSkipListReader::skipTo (this=0x3175bf90, target=20491)
at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:73
#6 0x00007ff0ea478106 in operator-> (this=0x3175abd0, target=20491) at /usr/local/include/boost/smart_ptr/shared_ptr.hpp:418
#7 Lucene::SegmentTermDocs::skipTo (this=0x3175abd0, target=20491) at /home/albert/albert/LucenePlusPlus/src/core/index/SegmentTermDocs.cpp:223
#8 0x00007ff0ea472cc8 in shared_count (this=0x3175ba40, target=2) at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:223
#9 shared_ptr (this=0x3175ba40, target=2) at /usr/local/include/boost/smart_ptr/shared_ptr.hpp:169
#10 top (this=0x3175ba40, target=2) at /home/albert/albert/LucenePlusPlus/include/PriorityQueue.h:126
#11 Lucene::MultipleTermPositions::skipTo (this=0x3175ba40, target=2) at /home/albert/albert/LucenePlusPlus/src/core/index/MultipleTermPositions.cpp:69
#12 0x00007ff0ea147b93 in Lucene::PhrasePositions::skipTo (this=0x66c9b0, target=2) at /home/albert/albert/LucenePlusPlus/src/core/search/PhrasePositions.cpp:43
#13 0x00007ff0ea145d17 in Lucene::PhraseScorer::doNext (this=0x66c8c0) at /home/albert/albert/LucenePlusPlus/src/core/search/PhraseScorer.cpp:72
#14 0x00007ff0ea1461e8 in Lucene::PhraseScorer::nextDoc (this=0x66c8c0) at /home/albert/albert/LucenePlusPlus/src/core/search/PhraseScorer.cpp:61
#15 0x00007ff0ea1436c0 in Lucene::Scorer::score (this=0x66c8c0, collector=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at /home/albert/albert/LucenePlusPlus/src/core/search/Scorer.cpp:31
#16 0x00007ff0ea106c61 in Lucene::IndexSearcher::search (this=0x667210, weight=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at /home/albert/albert/LucenePlusPlus/src/core/search/IndexSearcher.cpp:131
#17 0x00007ff0ea10797e in Lucene::IndexSearcher::search (this=0x667210, weight=, filter=, n=)
at /home/albert/albert/LucenePlusPlus/src/core/search/IndexSearcher.cpp:106
#18 0x00007ff0ea1adc30 in Lucene::Searcher::search (this=0x667210, query=, filter=, n=100)
at /home/albert/albert/LucenePlusPlus/src/core/search/Searcher.cpp:41

About 200 GB of data for 70,000,000 docs.

-rw-rw-rw- 1 alian users 148641417159 22 janv. 16:11 _h75q.fdt
-rw-rw-rw- 1 alian users 559054868 22 janv. 16:11 _h75q.fdx
-rw-rw-rw- 1 alian users 136 22 janv. 15:18 _h75q.fnm
-rw-rw-rw- 1 alian users 6262872761 22 janv. 16:56 _h75q.frq
-rw-rw-rw- 1 alian users 8735241 28 janv. 11:24 _h75q_h.del
-rw-rw-rw- 1 alian users 628936726 22 janv. 16:56 _h75q.nrm
-rw-rw-rw- 1 alian users 3391246405 22 janv. 16:56 _h75q.prx
-rw-rw-rw- 1 alian users 36898360 22 janv. 16:56 _h75q.tii
-rw-rw-rw- 1 alian users 3166106102 22 janv. 16:56 _h75q.tis
-rw-rw-rw- 1 alian users 7837 28 janv. 07:39 _h8fg_9.del
-rw-rw-rw- 1 alian users 149718875 22 janv. 21:15 _h8fg.cfs
-rw-rw-rw- 1 alian users 7409 28 janv. 07:39 _hbv1_6.del
-rw-rw-rw- 1 alian users 142550728 23 janv. 11:13 _hbv1.cfs
-rw-rw-rw- 1 alian users 7278 28 janv. 07:39 _hhji_4.del
-rw-rw-rw- 1 alian users 140427386 23 janv. 22:51 _hhji.cfs
-rw-rw-rw- 1 alian users 7425 26 janv. 07:17 _hk3j_4.del
-rw-rw-rw- 1 alian users 142586864 24 janv. 05:13 _hk3j.cfs
-rw-rw-rw- 1 alian users 50 26 janv. 19:06 _hosg_4.del
-rw-rw-rw- 1 alian users 149351158 24 janv. 07:42 _hosg.cfs
-rw-rw-rw- 1 alian users 7344 28 janv. 07:39 _hut8_4.del
-rw-rw-rw- 1 alian users 140612269 25 janv. 16:08 _hut8.cfs
-rw-rw-rw- 1 alian users 7140 28 janv. 07:39 _j619_4.del
-rw-rw-rw- 1 alian users 136371245 25 janv. 18:33 _j619.cfs
-rw-rw-rw- 1 alian users 7331 28 janv. 07:39 _jh0e_3.del
-rw-rw-rw- 1 alian users 144042436 25 janv. 19:05 _jh0e.cfs
-rw-rw-rw- 1 alian users 7055 28 janv. 07:39 _jp6n_4.del
-rw-rw-rw- 1 alian users 135794813 26 janv. 09:21 _jp6n.cfs
-rw-rw-rw- 1 alian users 7160 28 janv. 07:39 _jv9s_1.del
-rw-rw-rw- 1 alian users 134653421 27 janv. 06:36 _jv9s.cfs
-rw-rw-rw- 1 alian users 32 28 janv. 11:24 _jy5d_1.del
-rw-rw-rw- 1 alian users 147401672 28 janv. 07:26 _jy5d.cfs
-rw-rw-rw- 1 alian users 1024353 28 janv. 11:23 _jy6q.cfs
-rw-rw-rw- 1 alian users 9 28 janv. 11:24 _jy6r_1.del
-rw-rw-rw- 1 alian users 2914 28 janv. 11:24 _jy6r.cfs
-rw-rw-rw- 1 alian users 207 28 janv. 17:20 _jy6z.cfs
-rw-rw-rw- 1 alian users 198 29 janv. 10:23 _jy73.cfs
-rw-rw-rw- 1 alian users 20 29 janv. 10:23 segments.gen
-rw-rw-rw- 1 alian users 1955 29 janv. 10:23 segments_nw
-rw-rw-rw- 1 alian users 0 29 janv. 11:24 write.lock

Do you have any idea, even a small one? Thanks!

@drigh
Author

drigh commented Jan 29, 2013

I don't have an idea off-hand, but generally I'd look for other offsets having int32 type -- could be in both class members and local variables. Sorry, nothing more concrete at the moment.

We've been using Lucene++ for some time now and haven't noticed other crashes we could attribute to it (our indices do not grow past 10-15 GB, though, and phrase queries are not typical for our applications). That said, I would also consider the possibility of memory corruption happening somewhere else in the program.
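One way to hunt for remaining 32-bit offsets is to temporarily route suspicious casts through a checked helper, so that a truncation fails loudly at the cast site instead of surfacing later as a segfault. The sketch below is only an illustration; `checkedNarrow` is a hypothetical name and not part of Lucene++. Note that GCC and Clang's `-Wconversion` reports implicit narrowing but stays silent on explicit casts such as `(int32_t)`, so those still need a manual search.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical debugging helper, not part of Lucene++: narrows a 64-bit
// offset to 32 bits, but throws instead of silently truncating.
inline int32_t checkedNarrow(int64_t offset) {
    if (offset < std::numeric_limits<int32_t>::min() ||
        offset > std::numeric_limits<int32_t>::max()) {
        throw std::out_of_range("offset does not fit in int32_t");
    }
    return static_cast<int32_t>(offset);
}
```

Replacing a cast like `(int32_t)x` with `checkedNarrow(x)` in a debug build should turn a silent wrap-around into an exception with a usable stack trace.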

@alian555

Thanks a lot for your answer; it's good to see I'm not alone at sea :-) I need to find a test case that reproduces it easily, and the index creation must be part of the test so that I don't have to work on a production server :-/ I'll let you know if I find something ...

@drigh
Author

drigh commented Jan 30, 2013

It could be really tricky to reproduce the bug (the original one I was only able to reproduce by carefully picking queries, and queries that reproduced the crash on one version of the index wouldn't reproduce it on the next version). So, if you ask me, make a snapshot of whatever index it crashed on and try exactly the same query you have in the core dump. If you're lucky, it will repro. If not, examining the core more carefully is your best bet.

Some bugs are just impossible to design a test for :-(

@drigh
Author

drigh commented Mar 18, 2013

Just noticed that src/core/include/_MMapDirectory.h has offsets as int32 as well. It's easy to fix (look through the C++ file too; there are casts there). Hope that helps.

@drigh
Author

drigh commented Mar 20, 2013

Also, MiscUtils::arrayCopy uses int32_t as an offset, and this method is used inside the memory-mapped index input. Changing the argument types to int64 (as well as MMapIndexInput _length and bufferPosition) has sorted things out for us.

Yeah, we just recently switched to memory-mapped index input :-)
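As a rough sketch of what a copy helper with 64-bit offsets could look like (this is not the actual MiscUtils::arrayCopy; the name `arrayCopy64` and its signature are assumptions made for illustration):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit variant of a byte-array copy helper. The name and
// signature are illustrative and do not match the real Lucene++ API.
inline void arrayCopy64(const uint8_t* source, int64_t sourceOffset,
                        uint8_t* dest, int64_t destOffset, int64_t length) {
    if (length <= 0) {
        return; // nothing to copy
    }
    // Offsets stay 64-bit all the way into the pointer arithmetic, so a
    // position past 2 GiB in a memory-mapped file is not truncated.
    std::memmove(dest + destOffset, source + sourceOffset,
                 static_cast<std::size_t>(length));
}
```

This assumes a 64-bit build; on a 32-bit platform `std::size_t` is itself 32 bits wide, so a single mapping could not reach such offsets anyway.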

@alanw
Collaborator

alanw commented Mar 20, 2013

Thanks for highlighting this.

@drigh
Author

drigh commented Mar 22, 2013

Sure :-)

That looks like a carbon copy from Java, although arrayCopy is not used in Java Lucene.
