Too many data will crash native fruzzy matcher #19

mars90226 · 2019-05-16T03:34:03Z

When I use Denite's command_history source (which lists Vim command history) with the native fruzzy matcher, it crashes. Detailed bug information can be found in Shougo/denite.nvim#636.
When I use nvim -i NONE to ignore previous command history, the problem goes away, so I think my Vim command history is probably too large.

raghur · 2019-05-16T04:30:07Z

Can you turn off usenative and see how many entries you have on your command history so I can test with something similar?

mars90226 · 2019-05-17T07:22:47Z

Sure, I have 8305 entries in my command history.

mars90226 · 2019-05-17T07:53:24Z

I've used set history=[limit], as suggested by Shougo, to find the limit. For me, the limit is 1548: when I set history=1549, fruzzy crashes. The corresponding command history size is between 27461 and 27485 bytes (counted by opening q:, selecting all, and pressing g <C-g>).

raghur · 2019-05-17T10:01:00Z

@mars90226 - I just added a test with 2400 lines (120k file size). At least with pytest, this passes (I'm testing this on Win 8.1). That indicates that this could be something between the denite/fruzzy border.

What OS are you on? Also, if you aren't on Windows, is it possible to run pytest? See instructions on the README.

mars90226 · 2019-05-17T10:42:07Z

I've run FUZZY_CMOD=1 pytest --log-level=debug -s. Here's the result:

==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.6.7, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /root/.vim/plugged/fruzzy/rplugin
plugins: benchmark-3.2.2
collected 18 items

fruzzy_test.py .................lenght of results - 10
.


--------------------------------------------------------------------------------------------------------- benchmark: 5 tests ---------------------------------------------------------------------------------------------------------
Name (time in us)                               Min                 Max               Mean             StdDev             Median                IQR            Outliers  OPS (Kops/s)   Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_measure_baseline_native_call           11.4143 (1.0)      421.0388 (3.68)     13.3437 (1.0)      21.3837 (5.06)     11.8725 (1.0)       0.1984 (1.0)      147;1799       74.9420 (1.0)    34313           1
test_must_prefer_longer_match               32.2582 (2.83)     114.5201 (1.0)      39.3677 (2.95)      5.5705 (1.32)     42.4199 (3.57)      9.4557 (47.67)    7012;128       25.4015 (0.34)    21841           1
test_must_prefer_match_at_end               40.8161 (3.58)     137.7100 (1.20)     48.9782 (3.67)      5.7976 (1.37)     51.1087 (4.30)     10.4345 (52.60)     1400;24       20.4172 (0.27)     4296           1
test_must_prefer_match_after_separators     48.7715 (4.27)     159.2608 (1.39)     58.9746 (4.42)      4.5775 (1.08)     58.9928 (4.97)      1.7174 (8.66)    1608;1959       16.9564 (0.23)    13202           1
test_must_score_cluster_higher              49.7187 (4.36)     132.0997 (1.15)     59.4936 (4.46)      4.2253 (1.0)      59.7546 (5.03)      1.0524 (5.31)    1320;2613       16.8085 (0.22)    13013           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================= 18 passed in 4.47 seconds ==================================================================================

It seems normal. I'm not sure what the problem is here. Maybe I should try ctrlp with fruzzy?

raghur · 2019-05-17T11:24:16Z

Thanks - much appreciated. I suppose there's a bug there somewhere - just that I haven't been able to find it. It doesn't seem to be the actual size of the data though. I'm going to try to inject the same dataset through denite and see if I can repro your crash.

Could you try one more test? On the same branch, replace the contents of neomru_file_big with your command history and run the test again. Or, if you can send me your command history, I can spare you the trouble.

mars90226 · 2019-05-17T12:04:37Z

Here's the result with neomru_file_big replaced with my command history. It seems there's an invalid Unicode sequence in it.

==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.6.7, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /root/.vim/plugged/fruzzy/rplugin
plugins: benchmark-3.2.2
collected 0 items / 1 errors

=========================================================================================== ERRORS ===========================================================================================
______________________________________________________________________________ ERROR collecting fruzzy_test.py _______________________________________________________________________________
fruzzy_test.py:143: in <module>
    biglist = [line.strip() for line in fh.readlines()]
/usr/lib/python3.6/codecs.py:321: in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 3605: invalid start byte
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================================== 1 error in 0.08 seconds ===================================================================================

I've also found that :Denite register crashes too, and it's caused by a register whose content is hhi<80><fc>^Hn^[l. I'm not sure where this <80><fc> comes from.
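For anyone hitting the same UnicodeDecodeError, one way to locate the offending entries is to read the data as bytes and try decoding each line separately. The following is only a minimal sketch: the helper name `find_invalid_utf8_lines` is made up for illustration, and the sample reuses the register content quoted above as raw bytes.

```python
from typing import List, Tuple


def find_invalid_utf8_lines(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Return (line_number, offset_in_line, raw_line) for every line that is not valid UTF-8."""
    bad = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte in this line.
            bad.append((lineno, exc.start, raw))
    return bad


if __name__ == "__main__":
    # Hypothetical sample: two clean lines around the register content from this thread.
    sample = b"echo 'ok'\nhhi\x80\xfc\x08n\x1bl\nwrite\n"
    for lineno, offset, raw in find_invalid_utf8_lines(sample):
        print("line", lineno, "offset", offset, repr(raw))
```

For a real history dump, the same function could be fed the result of `open(path, "rb").read()`, which avoids the decode step that fails during test collection.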

mars90226 · 2019-05-17T13:08:03Z

There are two commands that contain corrupt Unicode:

call ������2_last_tab()
echo ������2_last_tab()

Deleting them makes native fruzzy work. It's odd that non-native fruzzy handles them while native fruzzy can't.

The hex dump of the corrupt commands is:

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 63 61 6c 6c 20 fd bf bf ┊ ba b4 83 32 5f 6c 61 73 │call ×××┊×××2_las│
│00000010│ 74 5f 74 61 62 28 29 0a ┊ 65 63 68 6f 20 fd bf bf │t_tab()_┊echo ×××│
│00000020│ ba b4 83 32 5f 6c 61 73 ┊ 74 5f 74 61 62 28 29 0a │×××2_las┊t_tab()_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
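The byte 0xfd can never start a sequence in valid UTF-8 as defined by RFC 3629, which matches the "invalid start byte" error reported by pytest earlier in this thread. As a small sketch (the function name `inspect_bytes` is illustrative only), the bytes of the first command from the dump can be decoded in three ways to see how Python treats them:

```python
# Bytes of the first corrupt command, copied from the hex dump.
RAW = bytes.fromhex(
    "63 61 6c 6c 20 fd bf bf ba b4 83 32 5f 6c 61 73 74 5f 74 61 62 28 29"
)


def inspect_bytes(raw: bytes) -> dict:
    """Decode raw strictly, with replacement characters, and with escapes."""
    try:
        raw.decode("utf-8")
        strict = "ok"
    except UnicodeDecodeError as exc:
        strict = "byte 0x{:02x} at offset {}: {}".format(
            raw[exc.start], exc.start, exc.reason
        )
    return {
        "strict": strict,
        # Each undecodable byte becomes U+FFFD.
        "replace": raw.decode("utf-8", errors="replace"),
        # Each undecodable byte becomes a visible \xNN escape.
        "escaped": raw.decode("utf-8", errors="backslashreplace"),
    }


if __name__ == "__main__":
    for key, value in inspect_bytes(RAW).items():
        print(key, "->", value)
```

The "replace" form corresponds to what the commands look like when pasted into a comment, while the "escaped" form keeps the original byte values readable.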

raghur · 2019-05-17T14:02:34Z

Thanks for investigating. The native fruzzy module does not handle Unicode, so the crash is likely triggered when it encounters a Unicode sequence.
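Until the native module handles Unicode, a caller could guard against this class of input by routing anything non-ASCII to the pure Python matcher. The sketch below is an assumption-heavy illustration, not fruzzy's or denite's actual API: `native_match` and `python_match` are hypothetical callables taking a query and a candidate list, and treating "ASCII only" as the safe condition is a guess based on the comment above.

```python
from typing import Callable, List

# Hypothetical matcher signature: (query, candidates) -> matching candidates.
Matcher = Callable[[str, List[str]], List[str]]


def is_ascii(text: str) -> bool:
    """True if text contains only ASCII characters (works on Python 3.6)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def match_with_fallback(
    query: str,
    candidates: List[str],
    native_match: Matcher,
    python_match: Matcher,
) -> List[str]:
    """Call native_match only when the query and all candidates are ASCII."""
    if is_ascii(query) and all(is_ascii(c) for c in candidates):
        return native_match(query, candidates)
    return python_match(query, candidates)


if __name__ == "__main__":
    # Stand-in matchers for demonstration; real ones would do fuzzy scoring.
    def fake_native(q: str, cs: List[str]) -> List[str]:
        return ["native:" + c for c in cs if q in c]

    def fake_python(q: str, cs: List[str]) -> List[str]:
        return ["python:" + c for c in cs if q in c]

    print(match_with_fallback("tab", ["last_tab()"], fake_native, fake_python))
    print(match_with_fallback("tab", ["\ufffd2_last_tab()"], fake_native, fake_python))
```

Checking every candidate costs a pass over the list, so a real integration might prefer filtering out only the non-ASCII entries instead of switching matchers for the whole batch.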

The reason is that native fruzzy cannot handle unicode sequence and crash. Delete the commands that contain unicode sequence can avoid this problem. Ref: raghur/fruzzy#19

mars90226 · 2019-05-23T08:55:02Z

So, I think I can close this issue as a duplicate of #2?

raghur · 2019-05-23T09:30:49Z

Yeah - I just have to get around to it.

raghur · 2019-05-24T06:35:11Z

@mars90226 - I tried some basic Unicode support, but the timings are about the same as Python 3 (so 370 - 420 us). Nim's Unicode API isn't my strong suit, and if all it's going to result in is a solution that's only as performant as plain Python, then it calls into question whether it's even worth it.

I've pushed it to the same branch that I created for this bug, in case I pick it up again.

raghur added a commit that referenced this issue May 17, 2019

see #19 - add test for large input list

3e64cc2

mars90226 closed this as completed May 23, 2019
