Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPLIT-SEQUENCE is quadratic on lists #9

Closed
phoe opened this issue Dec 21, 2018 · 4 comments · Fixed by #13
Closed

SPLIT-SEQUENCE is quadratic on lists #9

phoe opened this issue Dec 21, 2018 · 4 comments · Fixed by #13

Comments

@phoe
Copy link
Contributor

phoe commented Dec 21, 2018

SPLIT-SEQUENCE internally uses SUBSEQ inside LOOP. This means that looping over long lists will exhibit quadratic time complexity.

This is not necessary as lists can be traversed only once - naturally if splitting is done from start, and (optionally) with REVERSE and NREVERSE if splitting is done FROM-END. This allows for linear-time splitting on lists. See the following code for a very simple demonstration of that algorithm:

(defun split-on-delimiter (list delimiter)
  (loop for current = list then tail
        for (head tail)
          = (loop for sublist on current
                  until (eql (car sublist) delimiter)
                  collect (car sublist) into first-part
                  finally (return (list first-part (cdr sublist))))
        until (endp current)
        collect head))
@phoe
Copy link
Contributor Author

phoe commented Dec 21, 2018

Example with linear growth in data size:

CL-USER> (loop for n in '(10000 20000 30000 40000 50000 60000 70000 80000 90000 100000)
               for list = (alexandria:iota n)
               do (time (split-sequence:split-sequence-if (lambda (x) (= 0 (mod x 5))) list)))
Evaluation took:
  0.039 seconds of real time
  0.038105 seconds of total run time (0.038105 user, 0.000000 system)
  97.44% CPU
  126,768,951 processor cycles
  163,840 bytes consed
  
Evaluation took:
  0.152 seconds of real time
  0.150115 seconds of total run time (0.146414 user, 0.003701 system)
  98.68% CPU
  499,400,139 processor cycles
  302,336 bytes consed
  
Evaluation took:
  0.337 seconds of real time
  0.335578 seconds of total run time (0.335578 user, 0.000000 system)
  99.70% CPU
  1,109,536,053 processor cycles
  506,432 bytes consed
  
Evaluation took:
  0.605 seconds of real time
  0.602060 seconds of total run time (0.598225 user, 0.003835 system)
  99.50% CPU
  1,994,582,001 processor cycles
  622,592 bytes consed
  
Evaluation took:
  0.943 seconds of real time
  0.937031 seconds of total run time (0.932967 user, 0.004064 system)
  99.36% CPU
  3,104,688,450 processor cycles
  786,432 bytes consed
  
Evaluation took:
  1.353 seconds of real time
  1.345672 seconds of total run time (1.345672 user, 0.000000 system)
  99.48% CPU
  4,456,993,287 processor cycles
  950,272 bytes consed
  
Evaluation took:
  1.836 seconds of real time
  1.826522 seconds of total run time (1.826522 user, 0.000000 system)
  99.51% CPU
  6,046,009,596 processor cycles
  1,166,624 bytes consed
  
Evaluation took:
  2.401 seconds of real time
  2.388336 seconds of total run time (2.388336 user, 0.000000 system)
  99.46% CPU
  7,905,333,006 processor cycles
  1,279,616 bytes consed
  
Evaluation took:
  3.036 seconds of real time
  3.023084 seconds of total run time (3.023084 user, 0.000000 system)
  99.57% CPU
  9,994,764,348 processor cycles
  1,449,840 bytes consed
  
Evaluation took:
  3.724 seconds of real time
  3.718900 seconds of total run time (3.711284 user, 0.007616 system)
  99.87% CPU
  12,263,698,176 processor cycles
  1,605,632 bytes consed

@sionescu
Copy link
Member

Hi Michał, that's a very good finding. Can you turn it into a patch ?

@phoe
Copy link
Contributor Author

phoe commented Dec 22, 2018

Yes, I will be working on it today.

From what I already see, this requires that I write a separate version of SPLIT-SEQUENCE{,-IF,-IF-NOT} to avoid calling POSITION on lists in the predicate function.

This was referenced Dec 22, 2018
@phoe
Copy link
Contributor Author

phoe commented Dec 22, 2018

Same test on #13 gives me an insane speedup:

CL-USER> (loop for n in '(10000 20000 30000 40000 50000 60000 70000 80000 90000 100000)
               for list = (alexandria:iota n)
               do (time (split-sequence:split-sequence-if (lambda (x) (= 0 (mod x 5))) list)))
Evaluation took:
  0.001 seconds of real time
  0.000348 seconds of total run time (0.000342 user, 0.000006 system)
  0.00% CPU
  1,141,083 processor cycles
  163,840 bytes consed
  
Evaluation took:
  0.001 seconds of real time
  0.000708 seconds of total run time (0.000708 user, 0.000000 system)
  100.00% CPU
  2,331,288 processor cycles
  360,448 bytes consed
  
Evaluation took:
  0.001 seconds of real time
  0.001085 seconds of total run time (0.001085 user, 0.000000 system)
  100.00% CPU
  3,568,164 processor cycles
  589,824 bytes consed
  
Evaluation took:
  0.001 seconds of real time
  0.001417 seconds of total run time (0.001414 user, 0.000003 system)
  100.00% CPU
  4,668,300 processor cycles
  786,432 bytes consed
  
Evaluation took:
  0.002 seconds of real time
  0.002131 seconds of total run time (0.002131 user, 0.000000 system)
  100.00% CPU
  7,008,110 processor cycles
  983,040 bytes consed
  
Evaluation took:
  0.003 seconds of real time
  0.002505 seconds of total run time (0.002505 user, 0.000000 system)
  100.00% CPU
  8,241,584 processor cycles
  1,146,880 bytes consed
  
Evaluation took:
  0.002 seconds of real time
  0.002629 seconds of total run time (0.002629 user, 0.000000 system)
  150.00% CPU
  8,651,373 processor cycles
  1,343,488 bytes consed
  
Evaluation took:
  0.003 seconds of real time
  0.002755 seconds of total run time (0.002755 user, 0.000000 system)
  100.00% CPU
  9,066,423 processor cycles
  1,540,096 bytes consed
  
Evaluation took:
  0.003 seconds of real time
  0.003134 seconds of total run time (0.003134 user, 0.000000 system)
  100.00% CPU
  10,313,691 processor cycles
  1,736,704 bytes consed
  
Evaluation took:
  0.003 seconds of real time
  0.003513 seconds of total run time (0.003513 user, 0.000000 system)
  133.33% CPU
  11,559,720 processor cycles
  1,900,544 bytes consed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants