Enhance Collection with rollingAvegra #6171

Ducasse · 2020-04-13T11:30:40Z

Expected behavior
Like that we can have rolling average :)

Expected development cost
Nearly none just turn the following code into a PR.

Collection >> runningMeans: width
     "This returns an array of running means as if the receiver
      were broken into overlapping segments 'width' long and the
      mean of each calculated.  This uses an adaptation of
      Welford's algorithm for stably updating the mean; it is
      important to maintain a current MEAN not a current SUM.
      This has been tested against 7 other algorithms.  It was
      the most accurate of the faster ones.  The result is an
      Array no matter what kind of sequence the receiver is.
      If the receiver is a tree (like a virtual concatenation)
      or a singly or doubly linked list you should convert the
      receiver to an Array first.  Note that there is no
      explicit check that width is an Integer or is in range;
      none is needed because those checks happen anyway."

    "(#(1 1 2 2 3 3) asOrderedCollection runningMeans: 2) >>> {1. (3/2). 2. (5/2). 3}".

     |result mean resultIndex|
     result := Array new: self size - width + 1.
     mean := 0.
     1 to: width do: [:i | mean := (self at: i) + mean].
     mean := mean / width.
     resultIndex := 1.
     result at: resultIndex put: mean.
     width + 1 to: self size do: [:i |
       mean := ((self at: i) - (self at: resultIndex)) / width + mean.
       resultIndex := resultIndex + 1.
       result at: resultIndex put: mean].
     ^result

OrderedCollection >> testRunningMeans

	| result col |
	col := #(1 1 2 2 3 3) asOrderedCollection.
	result := col runningMeans: 2. 

	self assert: (result = {1. (3/2). 2. (5/2). 3}).
	
	self assert: (result class = Array).
	
	self assert: result size <= (col size). "Running means with 1 has little interest ?"
	
	self should: [ col runningMeans: 7 ] raise: SubscriptOutOfBounds.
	self  should: [ col runningMeans: -2 ] raise: SubscriptOutOfBounds.
	self  should: [ col runningMeans: 1.3 ] raise: Error withExceptionDo: [ :anException | self assert: anException messageText equals: 'primitive #basicNew: in Array class failed’ ]

testRunningMeansDeviation
OrderedCollection >> 
	| result1 result2 col1 col2 |
	
	col1 := #(0.3s1 1s1 0.1s1 0.3s1 1s1 0.1s1 0.3s1 1s1 0.1s1).
	result1 := col1 runningMeans: 2.
	
	col2 := #(0.3 1 0.1 0.3 1 0.1 0.3 1 0.1). 
	result2 := col2 runningMeans: 2.
	
	self assert: result1 first equals: result1 fourth.
	
	"presence of a rounding error"
	self deny: result2 first equals: result2 fourth.
	self assert: result2 first closeTo: result2 fourth.

The text was updated successfully, but these errors were encountered:

David5i6 · 2020-04-16T05:58:18Z

Hi @Ducasse .
Why not also a running: aBloc of: aWidth, that let us define a bloc to process each "aWidth" elements ? (I know it's not as easy but that will let us easily do a running mean, max, min, ... )

olekscode · 2020-04-24T09:02:00Z

Oh yes, we need that!

olekscode · 2020-04-24T10:11:25Z

I propose a different implementation. It allows us to use any block (not just average) and does not convert every collection to an Array. I implemented it on SequenceableCollection because runningAverage only makes sense if collection is sequenceable.

SequenceableCollection >> running: aBlock of: aSubsetSize
	"This is a generalization of a running average (a.k.a. moving average, rolling average) which allows you to apply any given block to the shifting subsets of a given size.
	
	For example, given a collection #(1 2 3 4 5) and a window size 2, we collect subsets of this collection by starting with first 2 elements and shifting the window 1 element to the right: #((1 2)(2 3)(3 4)(4 5)), then we apply aBlock to each subset and collect the results. For example, if aBlock is [ :subset | subset average ], this will give us #(1.5 2.5 3.5 4.5)"
	| result |
	
	aSubsetSize > self size ifTrue: [
		SubscriptOutOfBounds
			signal: 'The subset size can not exceed the size of a collection' ].
		
	aSubsetSize < 0 ifTrue: [
		SubscriptOutOfBounds
			signal: 'The subset size must be positive' ].
	
	result := (1 to: self size - aSubsetSize + 1) collect: [ :i |
		aBlock value: (self copyFrom: i to: i + aSubsetSize - 1) ].
	
	^ self species withAll: result

SequenceableCollection >> runningAverage: aSubsetSize
	"Running average (a.k.a. moving average, rolling average). See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningAverage: 2) >>> {1 . (3/2) . 2 . (5/2) . 3}"

	^ self running: #average of: aSubsetSize

SequenceableCollection >> runningMin: aSubsetSize
	"Running min. See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningMin: 3) >>> {1 . 1 . 2 . 2}"

	^ self running: #min of: aSubsetSize

SequenceableCollection >> runningMax: aSubsetSize
	"Running max. See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningMax: 3) >>> {2 . 2 . 3 . 3}"

	^ self running: #max of: aSubsetSize

CollectionArithmeticTest >> testRunningAverage
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningAverage: 2. 

	self assert: result equals: {1 . (3/2) . 2 . (5/2) . 3}.

CollectionArithmeticTest >> testRunningMin
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningMin: 3. 

	self assert: result equals: {1 . 1 . 2 . 2}.

CollectionArithmeticTest >> testRunningMax
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningMax: 3. 

	self assert: result equals: {2 . 2 . 3 . 3}.

CollectionArithmeticTest >> testRunningAverageSubscriptOutOfBounds
	| collection |
	collection := #(1 1 2 2 3 3).

	self should: [ collection runningAverage: 7 ] raise: SubscriptOutOfBounds.
	self  should: [ collection runningAverage: -2 ] raise: SubscriptOutOfBounds.

CollectionArithmeticTest >> testRunningAverageWithSubsetSize1IsSameAsCollection
	| collection |
	collection := #(1 1 2 2 3 3).
	
	self
		assert: (collection runningAverage: 1) 
		equals: collection.

CollectionArithmeticTest >> testRunningAverageWithFullSubsetSizeIsSameAsAverage
	| collection |
	collection := #(1 1 2 2 3 3).
	
	self
		assert: (collection runningAverage: collection size) 
		equals: { collection average }.

Ducasse added Type: Enhancement Easy labels Apr 13, 2020

olekscode mentioned this issue Apr 24, 2020

6171-Enhance-Collection-with-rollingAvegra #6220

Merged

Ducasse closed this as completed in #6220 Apr 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance Collection with rollingAvegra #6171

Enhance Collection with rollingAvegra #6171

Ducasse commented Apr 13, 2020

David5i6 commented Apr 16, 2020 •

edited

olekscode commented Apr 24, 2020

olekscode commented Apr 24, 2020

Enhance Collection with rollingAvegra #6171

Enhance Collection with rollingAvegra #6171

Comments

Ducasse commented Apr 13, 2020

David5i6 commented Apr 16, 2020 • edited

olekscode commented Apr 24, 2020

olekscode commented Apr 24, 2020

David5i6 commented Apr 16, 2020 •

edited