Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Collection with rollingAvegra #6171

Closed
Ducasse opened this issue Apr 13, 2020 · 3 comments · Fixed by #6220
Closed

Enhance Collection with rollingAvegra #6171

Ducasse opened this issue Apr 13, 2020 · 3 comments · Fixed by #6220

Comments

@Ducasse
Copy link
Member

Ducasse commented Apr 13, 2020

Expected behavior
Like that we can have rolling average :)

Expected development cost
Nearly none just turn the following code into a PR.

Collection >> runningMeans: width
     "This returns an array of running means as if the receiver
      were broken into overlapping segments 'width' long and the
      mean of each calculated.  This uses an adaptation of
      Welford's algorithm for stably updating the mean; it is
      important to maintain a current MEAN not a current SUM.
      This has been tested against 7 other algorithms.  It was
      the most accurate of the faster ones.  The result is an
      Array no matter what kind of sequence the receiver is.
      If the receiver is a tree (like a virtual concatenation)
      or a singly or doubly linked list you should convert the
      receiver to an Array first.  Note that there is no
      explicit check that width is an Integer or is in range;
      none is needed because those checks happen anyway."

    "(#(1 1 2 2 3 3) asOrderedCollection runningMeans: 2) >>> {1. (3/2). 2. (5/2). 3}".

     |result mean resultIndex|
     result := Array new: self size - width + 1.
     mean := 0.
     1 to: width do: [:i | mean := (self at: i) + mean].
     mean := mean / width.
     resultIndex := 1.
     result at: resultIndex put: mean.
     width + 1 to: self size do: [:i |
       mean := ((self at: i) - (self at: resultIndex)) / width + mean.
       resultIndex := resultIndex + 1.
       result at: resultIndex put: mean].
     ^result
OrderedCollection >> testRunningMeans

	| result col |
	col := #(1 1 2 2 3 3) asOrderedCollection.
	result := col runningMeans: 2. 

	self assert: (result = {1. (3/2). 2. (5/2). 3}).
	
	self assert: (result class = Array).
	
	self assert: result size <= (col size). "Running means with 1 has little interest ?"
	
	self should: [ col runningMeans: 7 ] raise: SubscriptOutOfBounds.
	self  should: [ col runningMeans: -2 ] raise: SubscriptOutOfBounds.
	self  should: [ col runningMeans: 1.3 ] raise: Error withExceptionDo: [ :anException | self assert: anException messageText equals: 'primitive #basicNew: in Array class failed’ ]

testRunningMeansDeviation
OrderedCollection >> 
	| result1 result2 col1 col2 |
	
	col1 := #(0.3s1 1s1 0.1s1 0.3s1 1s1 0.1s1 0.3s1 1s1 0.1s1).
	result1 := col1 runningMeans: 2.
	
	col2 := #(0.3 1 0.1 0.3 1 0.1 0.3 1 0.1). 
	result2 := col2 runningMeans: 2.
	
	self assert: result1 first equals: result1 fourth.
	
	"presence of a rounding error"
	self deny: result2 first equals: result2 fourth.
	self assert: result2 first closeTo: result2 fourth.
@David5i6
Copy link
Contributor

David5i6 commented Apr 16, 2020

Hi @Ducasse .
Why not also a running: aBloc of: aWidth, that let us define a bloc to process each "aWidth" elements ? (I know it's not as easy but that will let us easily do a running mean, max, min, ... )

@olekscode
Copy link
Member

Oh yes, we need that!

@olekscode
Copy link
Member

I propose a different implementation. It allows us to use any block (not just average) and does not convert every collection to an Array. I implemented it on SequenceableCollection because runningAverage only makes sense if collection is sequenceable.

SequenceableCollection >> running: aBlock of: aSubsetSize
	"This is a generalization of a running average (a.k.a. moving average, rolling average) which allows you to apply any given block to the shifting subsets of a given size.
	
	For example, given a collection #(1 2 3 4 5) and a window size 2, we collect subsets of this collection by starting with first 2 elements and shifting the window 1 element to the right: #((1 2)(2 3)(3 4)(4 5)), then we apply aBlock to each subset and collect the results. For example, if aBlock is [ :subset | subset average ], this will give us #(1.5 2.5 3.5 4.5)"
	| result |
	
	aSubsetSize > self size ifTrue: [
		SubscriptOutOfBounds
			signal: 'The subset size can not exceed the size of a collection' ].
		
	aSubsetSize < 0 ifTrue: [
		SubscriptOutOfBounds
			signal: 'The subset size must be positive' ].
	
	result := (1 to: self size - aSubsetSize + 1) collect: [ :i |
		aBlock value: (self copyFrom: i to: i + aSubsetSize - 1) ].
	
	^ self species withAll: result
SequenceableCollection >> runningAverage: aSubsetSize
	"Running average (a.k.a. moving average, rolling average). See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningAverage: 2) >>> {1 . (3/2) . 2 . (5/2) . 3}"

	^ self running: #average of: aSubsetSize
SequenceableCollection >> runningMin: aSubsetSize
	"Running min. See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningMin: 3) >>> {1 . 1 . 2 . 2}"

	^ self running: #min of: aSubsetSize
SequenceableCollection >> runningMax: aSubsetSize
	"Running max. See the comment of self >> #running:of: for more information."
	"(#(1 1 2 2 3 3) runningMax: 3) >>> {2 . 2 . 3 . 3}"

	^ self running: #max of: aSubsetSize
CollectionArithmeticTest >> testRunningAverage
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningAverage: 2. 

	self assert: result equals: {1 . (3/2) . 2 . (5/2) . 3}.
CollectionArithmeticTest >> testRunningMin
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningMin: 3. 

	self assert: result equals: {1 . 1 . 2 . 2}.
CollectionArithmeticTest >> testRunningMax
	| result collection |
	collection := #(1 1 2 2 3 3).
	result := collection runningMax: 3. 

	self assert: result equals: {2 . 2 . 3 . 3}.
CollectionArithmeticTest >> testRunningAverageSubscriptOutOfBounds
	| collection |
	collection := #(1 1 2 2 3 3).

	self should: [ collection runningAverage: 7 ] raise: SubscriptOutOfBounds.
	self  should: [ collection runningAverage: -2 ] raise: SubscriptOutOfBounds.
CollectionArithmeticTest >> testRunningAverageWithSubsetSize1IsSameAsCollection
	| collection |
	collection := #(1 1 2 2 3 3).
	
	self
		assert: (collection runningAverage: 1) 
		equals: collection.
CollectionArithmeticTest >> testRunningAverageWithFullSubsetSizeIsSameAsAverage
	| collection |
	collection := #(1 1 2 2 3 3).
	
	self
		assert: (collection runningAverage: collection size) 
		equals: { collection average }.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants