faster string.Explode and string.Replace #1166

Merged
merged 4 commits into from Aug 23, 2016

Projects

None yet

8 participants

@CapsAdmin
Contributor

I can't test this as gmod doesn't work here but I have tested this outside of gmod and I'm sure the method works. It's about 10 times faster.

There's no need for string.PatternSafe with this method as string.find's third argument can specify whether to do a plain or pattern based search.

I could use while true do but for i = 1, math.huge do gives me a free i. Now that I'm writing this maybe it could be limited to string.len(str) as it would definitely not exceed that.

So I'd like some feedback before this gets merged (obviously) but especially since I can't test this in gmod myself.

@aStonedPenguin

Using

local str = 'awdawd;ad;awdadawd;awdawd;adawd;awdad;awd;'
local sep = ';'

require 'xbench' -- https://github.com/SuperiorServers/plib_v2/blob/master/lua/plib/libraries/xbench.lua

xbench.Compare({
    Old = function()
        string.Explode(str, sep)
    end,
    New = function()
        string.Explode2(str, sep)
    end,
}, 100000)

I got:
OLD: 0.2296575782384
NEW: 0.017039076028595

@CapsAdmin CapsAdmin slight improvements
instead of using math.huge as limit use the strings length as limit. this might be faster for smaller strings as the loop might be unrollable by jit
simplified current position logic, potentially faster but if not cleaner
b9ee9b7
@CapsAdmin
Contributor

It's slightly faster now. Could you do the same test agian?

also xbench.Compare is a neat idea

@Kefta
Contributor
Kefta commented Apr 24, 2016 edited

Trial 1:
OLDPATTERN: 0.049061595078456
OLD: 0.43176123587452
NEWPATTERN: 0.048394999406099
NEW: 0.035022127699676

Trial 2:
OLDPATTERN: 0.048659692157742
OLD: 0.43263569471353
NEWPATTERN: 0.052097114081022
NEW: 0.033646442160801

Trial 3:
OLDPATTERN: 0.052711999950844
OLD: 0.43146454448271
NEWPATTERN: 0.048868835269985
NEW: 0.033879904239598

@CapsAdmin CapsAdmin changed the title from faster string.Explode to faster string.Explode and string.Replace Apr 25, 2016
@CapsAdmin
Contributor

On a random generated 10000000 word string (56mb) using explode method takes 0.065 sec while the gsub method takes 0.8 sec.

@bigdogmat bigdogmat and 1 other commented on an outdated diff Apr 25, 2016
garrysmod/lua/includes/extensions/string.lua
@@ -232,9 +231,13 @@ end
function string.Replace( str, tofind, toreplace )
- tofind = tofind:PatternSafe()
- toreplace = toreplace:gsub( "%%", "%%%1" )
- return ( str:gsub( tofind, toreplace ) )
+ local tbl = string.Explode(tofind, str)
+
+ if tbl[1] then
+ return table.concat(tbl, toreplace)
+ end
+
+ return self
@bigdogmat
bigdogmat Apr 25, 2016

Shouldn't this return str

@CapsAdmin
CapsAdmin Apr 25, 2016 Contributor

ops

@thegrb93 thegrb93 referenced this pull request in Rusketh/ExpAdv2 Apr 30, 2016
Closed

Purge the string.format #206

@Rusketh
Rusketh commented Apr 30, 2016

Why has this not made it in yet. Merge Garry, Merge!

@Kefta
Contributor
Kefta commented Apr 30, 2016

@Rusketh Probably because it's still on RB's and Willox's testing list. Garry doesn't handle GMod anymore.

@robotboy655
Collaborator

Ehh, lets see how much I will regret merging this...

@robotboy655 robotboy655 reopened this Aug 23, 2016
@robotboy655 robotboy655 merged commit 3fac4ef into garrynewman:master Aug 23, 2016
@aStonedPenguin

@robotboy655 I've been running it on my servers since this PR, seems to work fine.

@Divran
Contributor
Divran commented Sep 10, 2016 edited

Garry's original explode function was replaced by another person years ago. When that happened, we discovered that wire's adv duplicator would crash the entire server. We narrowed it down to a place in adv dupe's code when it loads the dupe file, it would run string.Explode on the file, with only 1 single explode point near the center (of a very large file). The replacement explode function couldn't handle this use case, and so the server would crash.

TomyLobo made the explode function that has been in gmod ever since (the one that this PR replaces) because his was able to handle every single use case we could throw at it with about the same performance (which was 1000s of times faster than garry's in most cases).

If yours is better, that's great and it should definitely be merged and used. I just want us to make sure that it works for every use case. Specifically the one I mentioned earlier (I'd test it right now but it's late and I'm tired af).

EDIT: Also I don't understand how that string.Replace can be faster

@CapsAdmin
Contributor

I tried splitting a 50mb string in 2 and it doesn't crash. I'm not sure why the previous one would crash either. Out of memory or ran for too long? The max amount of iterations here is going to be the length of the string which should be safer than the previous method.

If you can throw your use cases at the new function and report back that'd be great.

This version is faster because it doesn't use string.gmatch which is internally a very complex function (loops and branchy) and is not a jit compilable function. The use case for string.explode tends to be without patterns so theres the string.PatternSafe which uses string.gsub and is inserting escape patterns into the string. For larger strings this can become memory intensive and slow.

@TomyLobo

I managed to find the old (2011) thread with Divran's test cases:
https://facepunch.com/showthread.php?t=1062458

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment