
Rewrite 'Remove consecutive duplicate lines' to not use regex #13558

Conversation

ArkadiuszMichalski (Contributor)

Fixes #5538 and #12548.

Files for test and results:

  1. uniq_list.zip from #5538 (comment) - around 2.7 million lines

    On my machine the new implementation took about 20 s vs 70 s for the old implementation.

  2. A Unix (LF) file with 44073 lines from #5538 (comment)

    In this example the regex approach has a problem processing many lines at once (the lines with EXECUTE as content). The new implementation does not have this problem and is also faster.

I tried to recreate the current behavior of this method. We now have more control over the whole process, so if something does not work properly we can correct it. It might be even faster to delete all the duplicate lines at once, but that would affect the state of the lines that are supposed to remain, so I do not do that (same as in the previous commands). The basic idea is sketched below.
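For illustration only, here is a minimal, self-contained sketch of the line-by-line comparison used instead of a regex; the function name and the container-based interface are made up for this example and are not the actual Notepad++/Scintilla code, which works directly on the editor buffer:

```cpp
#include <string>
#include <vector>

// Remove consecutive duplicate lines without any regex:
// keep a line only if it differs from the last line that was kept.
std::vector<std::string> removeConsecutiveDuplicates(const std::vector<std::string>& lines)
{
    std::vector<std::string> result;
    for (const std::string& line : lines)
    {
        if (result.empty() || result.back() != line)
            result.push_back(line);
    }
    return result;
}
```

A single linear pass like this avoids the pathological regex behavior on inputs with many identical lines, which is where the speedup on the test files comes from.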

Other commands that also use regex with a limit can be modified in a similar way, but so far we have no reported issues for them, so that is a task for the future.

@chcg added the enhancement and performance issue labels on Apr 22, 2023
@donho self-assigned this on Apr 23, 2023
@donho added the accepted label on May 1, 2023
@donho closed this in ecb1071 on May 1, 2023