Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topological sort according to TAOCP by Donald Knuth, Volume 1 page 265. #7457

Conversation

massimo-nocentini
Copy link
Contributor

@massimo-nocentini massimo-nocentini commented Oct 9, 2020

Implementation that combines sequential and linked allocations to obtain O(m + n) complexity, where m is the number of input relations and n is the number of unique objects that are being related. Neither priority heaps nor recursive message sends (as in DFS-based solutions) are used, just an Array of size n, m ValueLink objects and a Dictionary of size proportional to the number of sources (nodes with no incoming edges) are allocated.

The proposed message return an Array object with the objects in topological order if the given associations form a DAG, otherwise it detect a cycle and invoke a given block with a collection of associations forming that cycle.

It depends on two auxiliary messages #ignoreBlock: and #anyAssociation that are commented and tested as well.

The example from Knuth:

testTopologicalSortOnCycleDo1

	"The following test is kept from TAOCP by Donald Knuth, Volume 1 page 272."

	self
		assert: ({ 
				 (9 -> 2).
				 (3 -> 7).
				 (7 -> 5).
				 (5 -> 8).
				 (8 -> 6).
				 (4 -> 6).
				 (1 -> 3).
				 (7 -> 4).
				 (9 -> 5).
				 (2 -> 8) } topologicalSortOnCycleDo: [ Error signal ])
		equals: #( 1 9 3 2 7 4 5 8 6 )

Assumption
the naturals used in the input relations have to cover the interval [1, n], no holes allowed. For the sake of clarity,

testTopologicalSortOnCycleDo5

	"This test stress four small cycles detection."

	self
		should: [ 
			{ 
				(1 -> 2).
				(4 -> 1) } topologicalSortOnCycleDo: [ :cycle | Error signal ] ]
		raise: SubscriptOutOfBounds

@massimo-nocentini massimo-nocentini marked this pull request as ready for review October 11, 2020 09:06
@dionisiydk
Copy link
Contributor

Hi Massimo.
Could you explain why do you think it needs to be in Pharo core?
It looks like very specialized method for special kind of collections. But you added it to the root Collection class.

@dionisiydk
Copy link
Contributor

Another observation:
I am sure browser shows a lot of critiques for this huge method #topologicalSortOnCycleDo:. No way to refactor it?

… found in the original text.

Add a test case that shows how to iteratively remove cycle to refine the input relations to be acyclic.
@massimo-nocentini
Copy link
Contributor Author

massimo-nocentini commented Oct 12, 2020

Hi @dionisiydk ,

I added it to the root Collection class because the implementation is quite general and uses native objects only, despite the assumption to encode objects with natural numbers. IMHO, neither depending on graph encoding of the input relations nor specialised data structures (priority heaps for instance) makes Collection a candidate for such message.

Yes, you're right, the implementation is long and the critique browser complains as you said. I think that many parts can be encapsulated/promoted to be a message of its own; before going further, I would like to ask you an advice about that.

…em acyclic, in order to collect the set of edges removed in the process.
@dionisiydk
Copy link
Contributor

I added it to the root Collection class because the implementation is quite general and uses native objects only, despite the assumption to encode objects with natural numbers.

Maybe I misunderstood PR but it looks like algorithm needs a collection of associations as an input. And it is what I consider as a special kind of collection while existing collection methods are really general and can be applied to any collection of any objects.

Also you did not explain why it should be in the core collection library?
To be honest I am no really familiar with this algorithm, what is a domain area for it. And therefore to me it should be pushed to Polymath project or related one, the place for mathematical stuff

@massimo-nocentini
Copy link
Contributor Author

After a talk with @pavel-krivanek and Richard Uttner, I fully understand your comments and realised that Collection is not the best place for the proposed message.

BTW, are the auxiliary messages #ignoreBlock: and #anyAssociation good and useful enough to remain in this PR in your opinion?

Thanks.

@dionisiydk
Copy link
Contributor

BTW, are the auxiliary messages #ignoreBlock: and #anyAssociation good and useful enough to remain in this PR in your opinion?

I doubt about #ignoreBlock: because you can do the same with #in: . Also I find the name is very unclear.
#anyAssociation may be useful.

@massimo-nocentini
Copy link
Contributor Author

massimo-nocentini commented Oct 21, 2020

Ok, IMHO #in: returns the value computed within the given block; on the other hand, #ignoreBlock: invokes the given block, discards its computed value and returns the receiver. An example where I found that message useful:

   ...
   ^ p value ignoreBlock: [ p := p nextLink ]

provided that p has been instantiate as a ValueLink object, equivalent to:

   |v|
   ...
   v := p value.
   p := p nextLink.
   ^ v

What's a better name for it?

@dionisiydk
Copy link
Contributor

Ok, IMHO #in: returns the value computed within the given block; on the other hand, #ignoreBlock: invokes the given block, discards its computed value and returns the receiver.

I see the difference. Thanks for details.
I don't know how useful it is. IMHO the version with variables (the last example) is more readable. But others may think different.

@gcotelli
Copy link
Member

Maybe this can be a TopologicalSortAlgorithm class working over a provided collection instead of a giant method in the Collection hiearchy? So you can implement

topologicalSortOnCycleDo: aBlock

 ^(TopologicalSortAlgorithm over: self onCycleDo: aBlock) value 

And refactor a bit the internals of the algorithm in this class instead of adding several private methods to the Collection hiearchy?

@Ducasse
Copy link
Member

Ducasse commented Dec 23, 2020

I like the idea of Gabriel, I would name it

TopologicalSorter

In awesome there is

sortTopologically
	"Plenty of room for increased efficiency in this one."

	| remaining result pick |
	remaining := self asOrderedCollection.
	result := OrderedCollection new.
	[remaining isEmpty] whileFalse: [
		pick := remaining select: [:item |
			remaining allSatisfy: [:anotherItem |
				item == anotherItem or: [self should: item precede: anotherItem]]].
		pick isEmpty ifTrue: [self error: 'bad topological ordering'].
		result addAll: pick.
		remaining removeAll: pick].
	^self copySameFrom: result
should: a precede: b

	^ sortBlock ifNil: [a <= b] ifNotNil: [sortBlock value: a value: b]

in SortedCollection

So I will close the issue but I encourage you to turn it in TopologicalSorter and we can host it in http://www.github.com/Pharo-containers/

GitHub
In this organization, we collect various data structures for Pharo. You will find containers such as Stack, Queue, HashTable, OrderedSet, and many more. - pharo-containers

@Ducasse Ducasse closed this Dec 23, 2020
@massimo-nocentini
Copy link
Contributor Author

massimo-nocentini commented Jan 16, 2021

Hi,

many thanks to all of you for the advices. I've moved the code in https://github.com/massimo-nocentini/Containers-LinkedStoragePool and an short documentation can be found at https://massimo-nocentini.github.io/Booklet-DSst/linked-allocation/linked-allocation.html#topological-sorting

Whenever I've time I'll refactor it as suggested.

All the best,
-massimo

GitHub
Contribute to massimo-nocentini/Containers-LinkedStoragePool development by creating an account on GitHub.

@Ducasse
Copy link
Member

Ducasse commented Jan 16, 2021

Thanks this is great and I will your doc. It looks super nice.
And we can push it to pharo-containers.

S

@massimo-nocentini massimo-nocentini deleted the Pharo9.0-topological-sort-in-linear-time branch April 3, 2024 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants